WIP: Implement content-hash based collision resolution (bd-89)

- Replace timestamp-based collision scoring with deterministic content hashing
- Add hashIssueContent() using SHA-256 of all substantive fields
- Modify ScoreCollisions to compare hashes and set RemapIncoming flag
- Update RemapCollisions to handle both directions (remap incoming OR existing)
- Add CollisionDetail.RemapIncoming field to control which version gets remapped
- Add unit tests for hash function and deterministic collision resolution

Status: Hash-based resolution works correctly, but TestTwoCloneCollision still
fails due to missing rename detection. After Clone B resolves collision,
Clone A needs to recognize its issue was remapped to a different ID.

Next: Add content-based rename detection during import to prevent
re-resolving already-resolved collisions.

Progress on bd-86.

Amp-Thread-ID: https://ampcode.com/threads/T-b19b49e8-b52a-463d-b052-8a526a500260
Co-authored-by: Amp <amp@ampcode.com>
This commit is contained in:
Steve Yegge
2025-10-28 17:11:40 -07:00
parent c5e7ad8d71
commit 2e87329cf8
3 changed files with 192 additions and 37 deletions

View File

@@ -83,4 +83,7 @@
{"id":"bd-84","title":"Remove unreachable utility functions","description":"Several small utility functions are unreachable:\n\nFiles to clean:\n1. `internal/storage/sqlite/hash.go` - `computeIssueContentHash` (line 17)\n - Check if entire file can be deleted if only contains this function\n\n2. `internal/config/config.go` - `FileUsed` (line 151)\n - Delete unused config helper\n\n3. `cmd/bd/git_sync_test.go` - `verifyIssueOpen` (line 300)\n - Delete dead test helper\n\n4. `internal/compact/haiku.go` - `HaikuClient.SummarizeTier2` (line 81)\n - Tier 2 summarization not implemented\n - Options: implement feature OR delete method\n\nImpact: Removes 50-100 LOC depending on decisions","acceptance_criteria":"- Remove unreachable functions\n- If entire files can be deleted (like hash.go), delete them\n- For SummarizeTier2: decide to implement or delete, document decision\n- All tests pass: `go test ./...`\n- Verify no callers exist for each function","status":"open","priority":2,"issue_type":"task","created_at":"2025-10-28T16:20:02.434573-07:00","updated_at":"2025-10-28T16:20:02.434573-07:00"}
{"id":"bd-85","title":"Event-driven daemon architecture","description":"Replace 5-second polling sync loop with event-driven architecture that reacts instantly to changes. Eliminates stale data issues while reducing CPU ~60%. Key components: FileWatcher (fsnotify), Debouncer (500ms), RPC mutation events, optional git hooks. Target latency: \u003c500ms (vs 5000ms). See event_driven_daemon.md for full design.","status":"open","priority":1,"issue_type":"epic","created_at":"2025-10-28T16:30:27.39845-07:00","updated_at":"2025-10-28T16:30:27.39845-07:00"}
{"id":"bd-86","title":"Make two-clone workflow actually work (no hacks)","description":"TestTwoCloneCollision proves beads CANNOT handle two independent clones filing issues simultaneously. This is the basic collaborative workflow and it must work cleanly.\n\nTest location: beads_twoclone_test.go\n\nThe test creates two git clones, both file issues with same ID (test-1), --resolve-collisions remaps clone B's to test-2, but after sync:\n- Clone A has test-1=\"Issue from clone A\", test-2=\"Issue from clone B\" \n- Clone B has test-1=\"Issue from clone B\", test-2=\"Issue from clone A\"\n\nThe TITLES are swapped! Both clones have 2 issues but with opposite title assignments.\n\nWe've tried many fixes (per-project daemons, auto-sync, lamport hashing, precommit hooks) but nothing has made the test pass.\n\nGoal: Make the test pass WITHOUT hacks. The two clones should converge to identical state after sync.","acceptance_criteria":"1. TestTwoCloneCollision passes without EXPECTED FAILURE\n2. Both clones converge to identical issue database\n3. No manual conflict resolution required\n4. Git status clean in both clones\n5. bd ready output identical in both clones","status":"open","priority":0,"issue_type":"epic","created_at":"2025-10-28T16:34:53.278793-07:00","updated_at":"2025-10-28T16:34:53.278793-07:00"}
{"id":"bd-87","title":"Implement content-hash based collision resolution for deterministic convergence","description":"The current collision resolution uses creation timestamps to decide which issue to keep vs. remap. This is non-deterministic when two clones create issues at nearly the same time.\n\nRoot cause of bd-86:\n- Clone A creates test-1=\"Issue from clone A\" at T0\n- Clone B creates test-1=\"Issue from clone B\" at T0+30ms\n- Clone B syncs first, remaps Clone A's to test-2\n- Clone A syncs second, sees collision, remaps Clone B's to test-2\n- Result: titles are swapped between clones\n\nSolution:\n- Use content-based hashing (title + description + priority + type)\n- Deterministic winner: always keep issue with lower hash\n- Same collision on different clones produces same result (idempotent)\n\nImplementation:\n- Modify ScoreCollisions in internal/storage/sqlite/collision.go\n- Replace timestamp-based scoring with content hash comparison\n- Ensure hash function is stable across platforms","status":"open","priority":0,"issue_type":"task","created_at":"2025-10-28T17:04:06.145646-07:00","updated_at":"2025-10-28T17:04:06.145646-07:00"}
{"id":"bd-88","title":"Add test case for symmetric collision (both clones create same ID simultaneously)","description":"TestTwoCloneCollision demonstrates the problem, but we need a simpler unit test for the collision resolver itself.\n\nTest should verify:\n- Two issues with same ID, different content\n- Content hash determines winner deterministically \n- Result is same regardless of which clone imports first\n- No title swapping occurs\n\nThis can be a simpler test than the full integration test.","status":"open","priority":1,"issue_type":"task","created_at":"2025-10-28T17:04:06.146021-07:00","updated_at":"2025-10-28T17:04:06.146021-07:00","dependencies":[{"issue_id":"bd-88","depends_on_id":"bd-86","type":"blocks","created_at":"2025-10-28T17:04:06.147846-07:00","created_by":"daemon"}]}
{"id":"bd-89","title":"Implement content-hash based collision resolution for deterministic convergence","description":"The current collision resolution uses creation timestamps to decide which issue to keep vs. remap. This is non-deterministic when two clones create issues at nearly the same time.\n\nRoot cause of bd-86:\n- Clone A creates test-1=\"Issue from clone A\" at T0\n- Clone B creates test-1=\"Issue from clone B\" at T0+30ms\n- Clone B syncs first, remaps Clone A's to test-2\n- Clone A syncs second, sees collision, remaps Clone B's to test-2\n- Result: titles are swapped between clones\n\nSolution:\n- Use content-based hashing (title + description + priority + type)\n- Deterministic winner: always keep issue with lower hash\n- Same collision on different clones produces same result (idempotent)\n\nImplementation:\n- Modify ScoreCollisions in internal/storage/sqlite/collision.go\n- Replace timestamp-based scoring with content hash comparison\n- Ensure hash function is stable across platforms","notes":"Current status: Hash-based collision resolution is implemented and unit-tested, but TestTwoCloneCollision still fails.\n\nRoot cause: After Clone B resolves collision (test-1=B, test-2=A), Clone A syncs and sees:\n- DB: test-1=A\n- JSONL: test-1=B, test-2=A\n\nClone A detects collision on test-1 and tries to resolve it independently. But it should recognize that test-2 in JSONL is actually ITS OWN issue (content match).\n\nSolution: Add content-based rename detection during import. If incoming has different ID but same content as existing issue, treat as rename/remap rather than collision.\n\nFiles modified so far:\n- internal/storage/sqlite/collision.go (hashIssueContent, ScoreCollisions, RemapCollisions)\n- internal/storage/sqlite/collision_hash_test.go (new unit tests)\n\nNext steps:\n1. Add rename detection logic to DetectCollisions\n2. When importing, match issues by content hash across IDs\n3. Delete old ID, accept new ID for content matches\n4. Only apply collision resolution for true conflicts","status":"in_progress","priority":0,"issue_type":"task","created_at":"2025-10-28T17:04:11.530026-07:00","updated_at":"2025-10-28T17:10:28.522668-07:00","dependencies":[{"issue_id":"bd-89","depends_on_id":"bd-86","type":"blocks","created_at":"2025-10-28T17:04:18.149604-07:00","created_by":"daemon"}]}
{"id":"bd-9","title":"Document bd edit command and verify MCP exclusion","description":"Follow-up from PR #152:\n1. Add \"bd edit\" to AGENTS.md with \"Humans only\" note\n2. Verify MCP server doesn't expose bd edit command\n3. Consider adding test for command registration","status":"open","priority":2,"issue_type":"task","created_at":"2025-10-26T13:23:47.982295-07:00","updated_at":"2025-10-27T22:22:23.815469-07:00"}

View File

@@ -2,9 +2,9 @@ package sqlite
import (
"context"
"crypto/sha256"
"fmt"
"regexp"
"sort"
"strings"
"github.com/steveyegge/beads/internal/types"
@@ -23,7 +23,8 @@ type CollisionDetail struct {
IncomingIssue *types.Issue // The issue from the import file
ExistingIssue *types.Issue // The issue currently in the database
ConflictingFields []string // List of field names that differ
ReferenceScore int // Number of references to this issue (for scoring)
ReferenceScore int // Number of references to this issue (for scoring) - DEPRECATED
RemapIncoming bool // If true, remap incoming; if false, remap existing
}
// DetectCollisions compares incoming JSONL issues against DB state
@@ -146,32 +147,63 @@ func equalStringPtr(a, b *string) bool {
return *a == *b
}
// ScoreCollisions calculates reference scores for all colliding issues and sorts them
// by score ascending (fewest references first). This minimizes the total number of
// updates needed during renumbering - issues with fewer references are renumbered first.
// hashIssueContent creates a deterministic hash of an issue's content.
// Uses all substantive fields (excluding timestamps and ID) to ensure
// that identical content produces identical hashes across all clones.
func hashIssueContent(issue *types.Issue) string {
h := sha256.New()
// Hash all substantive fields in a stable order
h.Write([]byte(issue.Title))
h.Write([]byte{0}) // separator
h.Write([]byte(issue.Description))
h.Write([]byte{0})
h.Write([]byte(issue.Design))
h.Write([]byte{0})
h.Write([]byte(issue.AcceptanceCriteria))
h.Write([]byte{0})
h.Write([]byte(issue.Notes))
h.Write([]byte{0})
h.Write([]byte(issue.Status))
h.Write([]byte{0})
h.Write([]byte(fmt.Sprintf("%d", issue.Priority)))
h.Write([]byte{0})
h.Write([]byte(issue.IssueType))
h.Write([]byte{0})
h.Write([]byte(issue.Assignee))
h.Write([]byte{0})
if issue.ExternalRef != nil {
h.Write([]byte(*issue.ExternalRef))
}
return fmt.Sprintf("%x", h.Sum(nil))
}
// ScoreCollisions determines which version of each colliding issue to keep vs. remap.
// Uses deterministic content-based hashing to ensure all clones make the same decision.
//
// Reference score = text mentions + dependency references
// Decision process:
// 1. Hash both versions (existing and incoming) based on content
// 2. Keep the version with the lexicographically LOWER hash
// 3. Mark the other version for remapping
//
// This ensures:
// - Deterministic: same collision always produces same result
// - Symmetric: works regardless of which clone syncs first
// - Idempotent: converges to same state across all clones
func ScoreCollisions(ctx context.Context, s *SQLiteStorage, collisions []*CollisionDetail, allIssues []*types.Issue) error {
// Get all dependency records for efficient lookup
allDeps, err := s.GetAllDependencyRecords(ctx)
if err != nil {
return fmt.Errorf("failed to get dependency records: %w", err)
}
// Calculate reference score for each collision
// Determine which version to keep for each collision
for _, collision := range collisions {
score, err := countReferences(collision.ID, allIssues, allDeps)
if err != nil {
return fmt.Errorf("failed to count references for %s: %w", collision.ID, err)
}
collision.ReferenceScore = score
existingHash := hashIssueContent(collision.ExistingIssue)
incomingHash := hashIssueContent(collision.IncomingIssue)
// Keep the version with lower hash (deterministic winner)
// If incoming has lower hash, we need to remap existing and keep incoming
// If existing has lower hash, we need to remap incoming and keep existing
collision.RemapIncoming = existingHash < incomingHash
}
// Sort collisions by reference score ascending (fewest first)
sort.Slice(collisions, func(i, j int) bool {
return collisions[i].ReferenceScore < collisions[j].ReferenceScore
})
return nil
}
@@ -286,9 +318,15 @@ func deduplicateIncomingIssues(issues []*types.Issue) []*types.Issue {
return result
}
// RemapCollisions handles ID remapping for colliding issues
// Takes sorted collisions (fewest references first) and remaps them to new IDs
// Returns a map of old ID -> new ID for reporting
// RemapCollisions handles ID remapping for colliding issues based on content hash.
// For each collision, either the incoming or existing issue is remapped (determined by ScoreCollisions).
// Returns a map of old ID -> new ID for reporting.
//
// Process:
// 1. If RemapIncoming=true: remap incoming issue, keep existing
// 2. If RemapIncoming=false: remap existing issue, replace with incoming
//
// This ensures deterministic, symmetric collision resolution across all clones.
//
// NOTE: This function is not atomic - it performs multiple separate database operations.
// If an error occurs partway through, some issues may be created without their references
@@ -302,7 +340,7 @@ func RemapCollisions(ctx context.Context, s *SQLiteStorage, collisions []*Collis
return nil, fmt.Errorf("failed to sync ID counters: %w", err)
}
// For each collision (in order of ascending reference score)
// Process each collision based on which version should be remapped
for _, collision := range collisions {
oldID := collision.ID
@@ -317,16 +355,40 @@ func RemapCollisions(ctx context.Context, s *SQLiteStorage, collisions []*Collis
}
newID := fmt.Sprintf("%s-%d", prefix, nextID)
// Record mapping
idMapping[oldID] = newID
// Update the issue ID in the incoming issue
collision.IncomingIssue.ID = newID
// Create the issue with new ID
// Note: CreateIssue will use the ID we set
if err := s.CreateIssue(ctx, collision.IncomingIssue, "import-remap"); err != nil {
return nil, fmt.Errorf("failed to create remapped issue %s -> %s: %w", oldID, newID, err)
if collision.RemapIncoming {
// Incoming has higher hash -> remap incoming, keep existing
// Record mapping
idMapping[oldID] = newID
// Update incoming issue ID
collision.IncomingIssue.ID = newID
// Create incoming issue with new ID
if err := s.CreateIssue(ctx, collision.IncomingIssue, "import-remap"); err != nil {
return nil, fmt.Errorf("failed to create remapped incoming issue %s -> %s: %w", oldID, newID, err)
}
} else {
// Existing has higher hash -> remap existing, replace with incoming
// First, remap the existing issue to new ID
idMapping[oldID] = newID
// Create a copy of existing issue with new ID
existingCopy := *collision.ExistingIssue
existingCopy.ID = newID
if err := s.CreateIssue(ctx, &existingCopy, "import-remap"); err != nil {
return nil, fmt.Errorf("failed to create remapped existing issue %s -> %s: %w", oldID, newID, err)
}
// Delete the existing issue with old ID
if err := s.DeleteIssue(ctx, oldID); err != nil {
return nil, fmt.Errorf("failed to delete old existing issue %s: %w", oldID, err)
}
// Create incoming issue with original ID (replaces the deleted one)
if err := s.CreateIssue(ctx, collision.IncomingIssue, "import-replace"); err != nil {
return nil, fmt.Errorf("failed to create incoming issue %s: %w", oldID, err)
}
}
}

View File

@@ -0,0 +1,90 @@
package sqlite
import (
"testing"
"github.com/steveyegge/beads/internal/types"
)
func TestHashIssueContent(t *testing.T) {
issue1 := &types.Issue{
Title: "Issue from clone A",
Description: "",
Priority: 1,
IssueType: "task",
Status: "open",
}
issue2 := &types.Issue{
Title: "Issue from clone B",
Description: "",
Priority: 1,
IssueType: "task",
Status: "open",
}
hash1 := hashIssueContent(issue1)
hash2 := hashIssueContent(issue2)
// Hashes should be different
if hash1 == hash2 {
t.Errorf("Expected different hashes, got same: %s", hash1)
}
// Hashes should be deterministic
hash1Again := hashIssueContent(issue1)
if hash1 != hash1Again {
t.Errorf("Hash not deterministic: %s != %s", hash1, hash1Again)
}
t.Logf("Hash A: %s", hash1)
t.Logf("Hash B: %s", hash2)
t.Logf("A < B: %v (B wins if true)", hash1 < hash2)
}
func TestScoreCollisions_Deterministic(t *testing.T) {
existingIssue := &types.Issue{
ID: "test-1",
Title: "Issue from clone B",
Description: "",
Priority: 1,
IssueType: "task",
Status: "open",
}
incomingIssue := &types.Issue{
ID: "test-1",
Title: "Issue from clone A",
Description: "",
Priority: 1,
IssueType: "task",
Status: "open",
}
collision := &CollisionDetail{
ID: "test-1",
ExistingIssue: existingIssue,
IncomingIssue: incomingIssue,
}
// Run scoring
err := ScoreCollisions(nil, nil, []*CollisionDetail{collision}, nil)
if err != nil {
t.Fatalf("ScoreCollisions failed: %v", err)
}
existingHash := hashIssueContent(existingIssue)
incomingHash := hashIssueContent(incomingIssue)
t.Logf("Existing hash (B): %s", existingHash)
t.Logf("Incoming hash (A): %s", incomingHash)
t.Logf("Existing < Incoming: %v", existingHash < incomingHash)
// Clone B has lower hash, so it should win
// This means: RemapIncoming should be TRUE (remap incoming A, keep existing B)
if !collision.RemapIncoming {
t.Errorf("Expected RemapIncoming=true (remap incoming A, keep existing B with lower hash), got false")
} else {
t.Logf("✓ Correct: RemapIncoming=true, will remap incoming 'clone A' and keep existing 'clone B'")
}
}