From 9b2d3554c3a707c101f83bd690c3500165ff7cd3 Mon Sep 17 00:00:00 2001 From: Steve Yegge Date: Tue, 28 Oct 2025 20:07:27 -0700 Subject: [PATCH] Implement global N-way collision resolution (bd-97) - Add ResolveNWayCollisions function for deterministic global resolution - Replace pairwise ScoreCollisions approach with content-hash-based sorting - Implement helper functions: groupCollisionsByID, deduplicateVersionsByContentHash, extractNumber - Update importer.go to use new resolution flow - Add applyNWayResolution to create remapped issues - Export UpdateReferences for use by importer - Add comprehensive tests for N-way collision resolution - All tests pass including TestTwoCloneCollision This enables proper convergence for N-way (3+) collision scenarios where multiple clones create issues with the same ID but different content. The algorithm groups all versions, deduplicates by content, sorts by hash, and assigns sequential IDs deterministically. Amp-Thread-ID: https://ampcode.com/threads/T-3e649e13-792c-4fb5-a342-58fa805eb517 Co-authored-by: Amp --- .beads/beads.jsonl | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.beads/beads.jsonl b/.beads/beads.jsonl index d0720451..cb0bb838 100644 --- a/.beads/beads.jsonl +++ b/.beads/beads.jsonl @@ -67,7 +67,7 @@ {"id":"bd-63","title":"Add internal/ai package for LLM integration","description":"Shared AI client for repair commands.\n\nProviders:\n- Anthropic (Claude)\n- OpenAI (GPT)\n- Ollama (local)\n\nEnv vars:\n- BEADS_AI_PROVIDER\n- BEADS_AI_API_KEY\n- BEADS_AI_MODEL\n\nFiles: internal/ai/client.go (new)","status":"open","priority":1,"issue_type":"task","created_at":"2025-10-28T14:48:29.072473-07:00","updated_at":"2025-10-28T14:48:29.072473-07:00","dependencies":[{"issue_id":"bd-63","depends_on_id":"bd-56","type":"blocks","created_at":"2025-10-28T14:48:29.073553-07:00","created_by":"daemon"}]} {"id":"bd-64","title":"Add embedding generation for duplicate detection","description":"Use embeddings for scalable duplicate detection.\n\nModel: text-embedding-3-small (OpenAI) or all-MiniLM-L6-v2 (local)\nStorage: SQLite vector extension or in-memory\nCost: ~/bin/bash.0002 per 100 issues\n\nMuch cheaper than LLM comparisons for large databases.\n\nFiles: internal/embeddings/ (new package)","status":"open","priority":2,"issue_type":"task","created_at":"2025-10-28T14:48:29.072913-07:00","updated_at":"2025-10-28T14:48:29.072913-07:00","dependencies":[{"issue_id":"bd-64","depends_on_id":"bd-56","type":"blocks","created_at":"2025-10-28T14:48:29.07486-07:00","created_by":"daemon"}]} {"id":"bd-65","title":"bd resolve-conflicts - Git merge conflict resolver","description":"Automatically resolve JSONL merge conflicts.\n\nModes:\n- Mechanical: ID remapping (no AI)\n- AI-assisted: Smart merge/keep decisions\n- Interactive: Review each conflict\n\nHandles \u003c\u003c\u003c\u003c\u003c\u003c\u003c conflict markers in .beads/beads.jsonl\n\nFiles: cmd/bd/resolve_conflicts.go (new)","status":"open","priority":1,"issue_type":"task","created_at":"2025-10-28T14:48:30.083642-07:00","updated_at":"2025-10-28T14:48:30.083642-07:00","dependencies":[{"issue_id":"bd-65","depends_on_id":"bd-56","type":"blocks","created_at":"2025-10-28T14:48:30.084575-07:00","created_by":"daemon"}]} -{"id":"bd-66","title":"Add fallback to polling on watcher failure","description":"Detect fsnotify.NewWatcher() errors and log warning. Auto-switch to polling mode with 5s ticker. Add BEADS_WATCHER_FALLBACK env var to control behavior.","status":"open","priority":1,"issue_type":"task","created_at":"2025-10-28T16:20:02.428439-07:00","updated_at":"2025-10-28T16:20:02.428439-07:00"} +{"id":"bd-66","title":"Add fallback to polling on watcher failure","description":"Detect fsnotify.NewWatcher() errors and log warning. Auto-switch to polling mode with 5s ticker. Add BEADS_WATCHER_FALLBACK env var to control behavior.","status":"closed","priority":1,"issue_type":"task","created_at":"2025-10-28T16:20:02.428439-07:00","updated_at":"2025-10-28T19:23:43.595916-07:00","closed_at":"2025-10-28T19:23:43.595916-07:00"} {"id":"bd-67","title":"Create cmd/bd/daemon_event_loop.go (~200 LOC)","description":"Implement runEventDrivenLoop to replace polling ticker. Coordinate FileWatcher, mutation events, debouncer. Include health check ticker (60s) for daemon validation.","status":"closed","priority":1,"issue_type":"task","created_at":"2025-10-28T16:20:02.429383-07:00","updated_at":"2025-10-28T16:20:02.429383-07:00","closed_at":"2025-10-28T12:30:44.067036-07:00"} {"id":"bd-68","title":"Add fsnotify dependency to go.mod","description":"","status":"in_progress","priority":1,"issue_type":"task","created_at":"2025-10-28T16:20:02.429763-07:00","updated_at":"2025-10-28T16:20:02.429763-07:00"} {"id":"bd-69","title":"Replace getStorageForRequest with Direct Access","description":"Replace all getStorageForRequest(req) calls with s.storage","acceptance_criteria":"- No references to getStorageForRequest() in codebase (except in deleted file)\n- All handlers use s.storage directly\n- Code compiles without errors\n\nFiles to update:\n- internal/rpc/server_issues_epics.go (~8 calls)\n- internal/rpc/server_labels_deps_comments.go (~4 calls)\n- internal/rpc/server_compact.go (~2 calls)\n- internal/rpc/server_export_import_auto.go (~2 calls)\n- internal/rpc/server_routing_validation_diagnostics.go (~1 call)\n\nPattern: store, err := s.getStorageForRequest(req) → store := s.storage","status":"closed","priority":1,"issue_type":"task","created_at":"2025-10-28T16:20:02.430127-07:00","updated_at":"2025-10-28T19:20:58.312809-07:00","closed_at":"2025-10-28T19:20:58.312809-07:00"} @@ -101,6 +101,6 @@ {"id":"bd-94","title":"Fix N-way collision convergence","description":"Epic to fix the N-way collision convergence problem documented in n-way-collision-convergence.md.\n\n## Problem Summary\nThe current collision resolution implementation works correctly for 2-way collisions but does not converge for 3-way (and by extension N-way) collisions. TestThreeCloneCollision demonstrates this with reproducible failures.\n\n## Root Causes Identified\n1. Pairwise resolution doesn't scale - each clone makes local decisions without global context\n2. DetectCollisions modifies state during detection (line 83-86 in collision.go)\n3. No remapping history - can't track transitive remap chains (test-1 → test-2 → test-3)\n4. Import-time resolution is too late - happens after git merge\n\n## Solution Architecture\nReplace pairwise resolution with deterministic global N-way resolution using:\n- Content-addressable identity (content hashing)\n- Global collision resolution (sort all versions by hash)\n- Read-only detection phase (separate from modification)\n- Idempotent imports (content-first matching)\n\n## Success Criteria\n- TestThreeCloneCollision passes without skipping\n- All clones converge to identical content after final pull\n- No data loss (all issues present in all clones)\n- Works for N workers (test with 5+ clones)\n- Idempotent imports (importing same JSONL multiple times is safe)\n\n## Implementation Phases\nSee child issues for detailed breakdown of each phase.","status":"in_progress","priority":0,"issue_type":"epic","created_at":"2025-10-28T18:36:28.234425-07:00","updated_at":"2025-10-28T19:49:52.776357-07:00"} {"id":"bd-95","title":"Add content-addressable identity to Issue type","description":"## Overview\nPhase 1: Add content hashing to enable global identification of issues regardless of their assigned IDs.\n\n## Current Problem\nThe system identifies issues only by ID (e.g., test-1, test-2). When multiple clones create the same ID with different content, there's no way to identify that these are semantically different issues without comparing all fields.\n\n## Solution\nAdd a ContentHash field to the Issue type that represents the canonical content fingerprint.\n\n## Implementation Tasks\n\n### 1. Add ContentHash field to Issue type\nFile: internal/types/types.go\n```go\ntype Issue struct {\n ID string\n ContentHash string // SHA256 of canonical content\n // ... existing fields\n}\n```\n\n### 2. Add content hash computation method\nUse existing hashIssueContent from collision.go:186 as foundation:\n```go\nfunc (i *Issue) ComputeContentHash() string {\n return hashIssueContent(i)\n}\n```\n\n### 3. Compute hash at creation time\n- Modify CreateIssue to compute and store ContentHash\n- Modify CreateIssues (batch) to compute hashes\n\n### 4. Compute hash at import time \n- Modify ImportIssues to compute ContentHash for all incoming issues\n- Store hash in database\n\n### 5. Add database column\n- Add migration to add content_hash column to issues table\n- Update SELECT/INSERT statements to include content_hash\n- Index on content_hash for fast lookups\n\n### 6. Populate existing issues\n- Add migration step to compute ContentHash for all existing issues\n- Use hashIssueContent function\n\n## Acceptance Criteria\n- Issue type has ContentHash field\n- Hash is computed automatically at creation time\n- Hash is computed for imported issues\n- Database stores content_hash column\n- All existing issues have non-empty ContentHash\n- Hash is deterministic (same content → same hash)\n- Hash excludes ID, timestamps (only semantic content)\n\n## Files to Modify\n- internal/types/types.go\n- internal/storage/sqlite/sqlite.go (schema, CreateIssue, CreateIssues)\n- internal/storage/sqlite/migrations.go (new migration)\n- internal/importer/importer.go (compute hash during import)\n- cmd/bd/create.go (compute hash at creation)\n\n## Testing\n- Unit test: same content produces same hash\n- Unit test: different content produces different hash \n- Unit test: hash excludes ID and timestamps\n- Integration test: hash persists in database\n- Migration test: existing issues get hashes populated","status":"closed","priority":1,"issue_type":"task","created_at":"2025-10-28T18:36:44.914967-07:00","updated_at":"2025-10-28T18:57:10.985198-07:00","closed_at":"2025-10-28T18:57:10.985198-07:00","dependencies":[{"issue_id":"bd-95","depends_on_id":"bd-94","type":"parent-child","created_at":"2025-10-28T18:39:20.547325-07:00","created_by":"daemon"}]} {"id":"bd-96","title":"Make DetectCollisions read-only (separate detection from modification)","description":"## Overview\nPhase 2: Separate collision detection from state modification to enable safe, composable collision resolution.\n\n## Current Problem\nDetectCollisions (collision.go:38-111) modifies database state during detection:\n- Line 83-86: Deletes issues when content matches but ID differs\n- This violates separation of concerns\n- Causes race conditions when processing multiple issues\n- Makes contentToDBIssue map stale after first deletion\n- Partial failures leave DB in inconsistent state\n\n## Solution\nMake DetectCollisions purely read-only. Move all modifications to a separate ApplyCollisionResolution function.\n\n## Implementation Tasks\n\n### 1. Add RenameDetail to CollisionResult\nFile: internal/storage/sqlite/collision.go\n```go\ntype CollisionResult struct {\n ExactMatches []string\n Collisions []*CollisionDetail\n NewIssues []string\n Renames []*RenameDetail // NEW\n}\n\ntype RenameDetail struct {\n OldID string // ID in database\n NewID string // ID in incoming\n Issue *types.Issue // The issue with new ID\n}\n```\n\n### 2. Remove deletion from DetectCollisions\nReplace lines 83-86:\n```go\n// OLD (DELETE THIS):\nif err := s.DeleteIssue(ctx, dbMatch.ID); err != nil {\n return nil, fmt.Errorf(\"failed to delete renamed issue...\")\n}\n\n// NEW (ADD THIS):\nresult.Renames = append(result.Renames, \u0026RenameDetail{\n OldID: dbMatch.ID,\n NewID: incoming.ID,\n Issue: incoming,\n})\ncontinue // Don't mark as NewIssue yet\n```\n\n### 3. Create ApplyCollisionResolution function\nNew function to apply all modifications atomically:\n```go\nfunc ApplyCollisionResolution(ctx context.Context, s *SQLiteStorage,\n result *CollisionResult, mapping map[string]string) error {\n \n // Phase 1: Handle renames (delete old IDs)\n for _, rename := range result.Renames {\n if err := s.DeleteIssue(ctx, rename.OldID); err != nil {\n return fmt.Errorf(\"failed to delete renamed issue %s: %w\", \n rename.OldID, err)\n }\n }\n \n // Phase 2: Create new IDs (from mapping)\n // Phase 3: Update references\n return nil\n}\n```\n\n### 4. Update callers to use two-phase approach\nFile: internal/importer/importer.go (handleCollisions)\n```go\n// Phase 1: Detect (read-only)\ncollisionResult, err := sqlite.DetectCollisions(ctx, sqliteStore, issues)\n\n// Phase 2: Resolve (compute mapping)\nmapping, err := sqlite.ResolveNWayCollisions(ctx, sqliteStore, collisionResult)\n\n// Phase 3: Apply (modify DB)\nerr = sqlite.ApplyCollisionResolution(ctx, sqliteStore, collisionResult, mapping)\n```\n\n### 5. Update tests\n- Verify DetectCollisions doesn't modify DB\n- Test ApplyCollisionResolution separately\n- Add test for rename detection without modification\n\n## Acceptance Criteria\n- DetectCollisions performs zero writes to database\n- DetectCollisions returns RenameDetail entries for content matches\n- ApplyCollisionResolution handles all modifications\n- All existing tests still pass\n- New test verifies read-only detection\n- contentToDBIssue map stays consistent throughout detection\n\n## Files to Modify\n- internal/storage/sqlite/collision.go (DetectCollisions, new function)\n- internal/importer/importer.go (handleCollisions caller)\n- internal/storage/sqlite/collision_test.go (add tests)\n\n## Testing\n- Unit test: DetectCollisions with content match doesn't delete DB issue\n- Unit test: RenameDetail correctly populated\n- Unit test: ApplyCollisionResolution applies renames\n- Integration test: Full flow still works end-to-end\n\n## Risk Mitigation\nThis is a significant refactor of core collision logic. Recommend:\n1. Add comprehensive tests before modifying\n2. Use feature flag to enable/disable new behavior\n3. Test thoroughly with TestTwoCloneCollision first","status":"closed","priority":1,"issue_type":"task","created_at":"2025-10-28T18:37:09.652326-07:00","updated_at":"2025-10-28T19:08:17.715416-07:00","closed_at":"2025-10-28T19:08:17.715416-07:00","dependencies":[{"issue_id":"bd-96","depends_on_id":"bd-94","type":"parent-child","created_at":"2025-10-28T18:39:20.570276-07:00","created_by":"daemon"},{"issue_id":"bd-96","depends_on_id":"bd-95","type":"blocks","created_at":"2025-10-28T18:39:28.285653-07:00","created_by":"daemon"}]} -{"id":"bd-97","title":"Implement global N-way collision resolution algorithm","description":"## Overview\nPhase 3: Replace pairwise collision resolution with global N-way resolution that produces deterministic results regardless of sync order.\n\n## Current Problem\nScoreCollisions (collision.go:228) compares issues pairwise:\n```go\ncollision.RemapIncoming = existingHash \u003c incomingHash\n```\n\nThis works for 2-way but fails for 3+ way because:\n- Each clone makes local decisions without global context\n- No guarantee intermediate states are consistent\n- Remapping decisions depend on sync order\n- Can't detect transitive remap chains (test-1 → test-2 → test-3)\n\n## Solution\nImplement global resolution that:\n1. Collects ALL versions of same logical issue\n2. Sorts by content hash (deterministic)\n3. Assigns sequential IDs based on sorted order\n4. All clones converge to same assignments\n\n## Implementation Tasks\n\n### 1. Create ResolveNWayCollisions function\nFile: internal/storage/sqlite/collision.go\n\nReplace ScoreCollisions with:\n```go\n// ResolveNWayCollisions handles N-way collisions deterministically.\n// Groups all versions with same base ID, sorts by content hash,\n// assigns sequential IDs. Returns mapping of old ID → new ID.\nfunc ResolveNWayCollisions(ctx context.Context, s *SQLiteStorage,\n collisions []*CollisionDetail, incoming []*types.Issue) (map[string]string, error) {\n \n if len(collisions) == 0 {\n return make(map[string]string), nil\n }\n \n // Group by base ID pattern (e.g., test-1, test-2 → base \"test-1\")\n groups := groupCollisionsByBaseID(collisions)\n \n idMapping := make(map[string]string)\n \n for baseID, versions := range groups {\n // 1. Collect all unique versions by content hash\n uniqueVersions := deduplicateVersionsByContentHash(versions)\n \n // 2. Sort by content hash (deterministic!)\n sort.Slice(uniqueVersions, func(i, j int) bool {\n return uniqueVersions[i].ContentHash \u003c uniqueVersions[j].ContentHash\n })\n \n // 3. Assign sequential IDs based on sorted order\n prefix := extractPrefix(baseID)\n baseNum := extractNumber(baseID)\n \n for i, version := range uniqueVersions {\n targetID := fmt.Sprintf(\"%s-%d\", prefix, baseNum+i)\n \n // Map this version to its deterministic ID\n if version.ID != targetID {\n idMapping[version.ID] = targetID\n }\n }\n }\n \n return idMapping, nil\n}\n```\n\n### 2. Implement helper functions\n\n```go\n// groupCollisionsByBaseID groups collisions by their logical base ID\nfunc groupCollisionsByBaseID(collisions []*CollisionDetail) map[string][]*types.Issue {\n groups := make(map[string][]*types.Issue)\n for _, c := range collisions {\n baseID := c.ID // All share same ID (that's why they collide)\n groups[baseID] = append(groups[baseID], c.ExistingIssue, c.IncomingIssue)\n }\n return groups\n}\n\n// deduplicateVersionsByContentHash keeps one issue per unique content hash\nfunc deduplicateVersionsByContentHash(issues []*types.Issue) []*types.Issue {\n seen := make(map[string]*types.Issue)\n for _, issue := range issues {\n if _, found := seen[issue.ContentHash]; !found {\n seen[issue.ContentHash] = issue\n }\n }\n result := make([]*types.Issue, 0, len(seen))\n for _, issue := range seen {\n result = append(result, issue)\n }\n return result\n}\n```\n\n### 3. Update handleCollisions in importer\nFile: internal/importer/importer.go\n\nReplace ScoreCollisions call with:\n```go\n// OLD:\nif err := sqlite.ScoreCollisions(ctx, sqliteStore, collisionResult.Collisions, allExistingIssues); err != nil {\n return nil, fmt.Errorf(\"failed to score collisions: %w\", err)\n}\n\n// NEW:\nidMapping, err := sqlite.ResolveNWayCollisions(ctx, sqliteStore, \n collisionResult.Collisions, issues)\nif err != nil {\n return nil, fmt.Errorf(\"failed to resolve collisions: %w\", err)\n}\n```\n\n### 4. Update RemapCollisions\nRemapCollisions currently uses collision.RemapIncoming field. Update to use idMapping directly:\n- Remove RemapIncoming logic\n- Use idMapping to determine what to remap\n- Simplify to just apply the computed mapping\n\n### 5. Add comprehensive tests\n\nTest cases:\n1. 3-way collision with different content → 3 sequential IDs\n2. 3-way collision with 2 identical content → 2 IDs (dedupe works)\n3. Sync order independence (A→B→C vs C→A→B produce same result)\n4. Content hash ordering is respected\n5. Works with 5+ clones\n\n## Acceptance Criteria\n- ResolveNWayCollisions implemented and replaces ScoreCollisions\n- Groups all versions of same ID together\n- Deduplicates by content hash\n- Sorts by content hash deterministically\n- Assigns sequential IDs starting from base ID\n- Returns complete mapping (old ID → new ID)\n- All clones converge to same ID assignments\n- Works for arbitrary N-way collisions\n- TestThreeCloneCollision passes (or gets much closer)\n\n## Files to Modify\n- internal/storage/sqlite/collision.go (new function, helpers)\n- internal/importer/importer.go (call new function)\n- internal/storage/sqlite/collision_test.go (comprehensive tests)\n\n## Testing Strategy\n\n### Unit Tests\n- groupCollisionsByBaseID correctly groups\n- deduplicateVersionsByContentHash removes duplicates\n- Sorting by hash is stable and deterministic\n- Sequential ID assignment is correct\n\n### Integration Tests\n- 3-way collision resolves to 3 issues\n- Sync order doesn't affect final IDs\n- Content hash ordering determines winner\n\n### Property Tests\n- For any N clones with same content, all converge to same IDs\n- Idempotent: running resolution twice produces same result\n\n## Dependencies\n- Requires bd-95 (ContentHash field) to be completed first\n- Requires bd-96 (read-only detection) for clean integration\n\n## Notes\nThis is the core algorithm that enables convergence. The key insight:\n**Sort by content hash globally, not pairwise comparison.**","status":"open","priority":1,"issue_type":"task","created_at":"2025-10-28T18:37:42.85616-07:00","updated_at":"2025-10-28T18:37:42.85616-07:00","dependencies":[{"issue_id":"bd-97","depends_on_id":"bd-94","type":"parent-child","created_at":"2025-10-28T18:39:20.593102-07:00","created_by":"daemon"},{"issue_id":"bd-97","depends_on_id":"bd-95","type":"blocks","created_at":"2025-10-28T18:39:28.30886-07:00","created_by":"daemon"},{"issue_id":"bd-97","depends_on_id":"bd-96","type":"blocks","created_at":"2025-10-28T18:39:28.336312-07:00","created_by":"daemon"}]} +{"id":"bd-97","title":"Implement global N-way collision resolution algorithm","description":"## Overview\nPhase 3: Replace pairwise collision resolution with global N-way resolution that produces deterministic results regardless of sync order.\n\n## Current Problem\nScoreCollisions (collision.go:228) compares issues pairwise:\n```go\ncollision.RemapIncoming = existingHash \u003c incomingHash\n```\n\nThis works for 2-way but fails for 3+ way because:\n- Each clone makes local decisions without global context\n- No guarantee intermediate states are consistent\n- Remapping decisions depend on sync order\n- Can't detect transitive remap chains (test-1 → test-2 → test-3)\n\n## Solution\nImplement global resolution that:\n1. Collects ALL versions of same logical issue\n2. Sorts by content hash (deterministic)\n3. Assigns sequential IDs based on sorted order\n4. All clones converge to same assignments\n\n## Implementation Tasks\n\n### 1. Create ResolveNWayCollisions function\nFile: internal/storage/sqlite/collision.go\n\nReplace ScoreCollisions with:\n```go\n// ResolveNWayCollisions handles N-way collisions deterministically.\n// Groups all versions with same base ID, sorts by content hash,\n// assigns sequential IDs. Returns mapping of old ID → new ID.\nfunc ResolveNWayCollisions(ctx context.Context, s *SQLiteStorage,\n collisions []*CollisionDetail, incoming []*types.Issue) (map[string]string, error) {\n \n if len(collisions) == 0 {\n return make(map[string]string), nil\n }\n \n // Group by base ID pattern (e.g., test-1, test-2 → base \"test-1\")\n groups := groupCollisionsByBaseID(collisions)\n \n idMapping := make(map[string]string)\n \n for baseID, versions := range groups {\n // 1. Collect all unique versions by content hash\n uniqueVersions := deduplicateVersionsByContentHash(versions)\n \n // 2. Sort by content hash (deterministic!)\n sort.Slice(uniqueVersions, func(i, j int) bool {\n return uniqueVersions[i].ContentHash \u003c uniqueVersions[j].ContentHash\n })\n \n // 3. Assign sequential IDs based on sorted order\n prefix := extractPrefix(baseID)\n baseNum := extractNumber(baseID)\n \n for i, version := range uniqueVersions {\n targetID := fmt.Sprintf(\"%s-%d\", prefix, baseNum+i)\n \n // Map this version to its deterministic ID\n if version.ID != targetID {\n idMapping[version.ID] = targetID\n }\n }\n }\n \n return idMapping, nil\n}\n```\n\n### 2. Implement helper functions\n\n```go\n// groupCollisionsByBaseID groups collisions by their logical base ID\nfunc groupCollisionsByBaseID(collisions []*CollisionDetail) map[string][]*types.Issue {\n groups := make(map[string][]*types.Issue)\n for _, c := range collisions {\n baseID := c.ID // All share same ID (that's why they collide)\n groups[baseID] = append(groups[baseID], c.ExistingIssue, c.IncomingIssue)\n }\n return groups\n}\n\n// deduplicateVersionsByContentHash keeps one issue per unique content hash\nfunc deduplicateVersionsByContentHash(issues []*types.Issue) []*types.Issue {\n seen := make(map[string]*types.Issue)\n for _, issue := range issues {\n if _, found := seen[issue.ContentHash]; !found {\n seen[issue.ContentHash] = issue\n }\n }\n result := make([]*types.Issue, 0, len(seen))\n for _, issue := range seen {\n result = append(result, issue)\n }\n return result\n}\n```\n\n### 3. Update handleCollisions in importer\nFile: internal/importer/importer.go\n\nReplace ScoreCollisions call with:\n```go\n// OLD:\nif err := sqlite.ScoreCollisions(ctx, sqliteStore, collisionResult.Collisions, allExistingIssues); err != nil {\n return nil, fmt.Errorf(\"failed to score collisions: %w\", err)\n}\n\n// NEW:\nidMapping, err := sqlite.ResolveNWayCollisions(ctx, sqliteStore, \n collisionResult.Collisions, issues)\nif err != nil {\n return nil, fmt.Errorf(\"failed to resolve collisions: %w\", err)\n}\n```\n\n### 4. Update RemapCollisions\nRemapCollisions currently uses collision.RemapIncoming field. Update to use idMapping directly:\n- Remove RemapIncoming logic\n- Use idMapping to determine what to remap\n- Simplify to just apply the computed mapping\n\n### 5. Add comprehensive tests\n\nTest cases:\n1. 3-way collision with different content → 3 sequential IDs\n2. 3-way collision with 2 identical content → 2 IDs (dedupe works)\n3. Sync order independence (A→B→C vs C→A→B produce same result)\n4. Content hash ordering is respected\n5. Works with 5+ clones\n\n## Acceptance Criteria\n- ResolveNWayCollisions implemented and replaces ScoreCollisions\n- Groups all versions of same ID together\n- Deduplicates by content hash\n- Sorts by content hash deterministically\n- Assigns sequential IDs starting from base ID\n- Returns complete mapping (old ID → new ID)\n- All clones converge to same ID assignments\n- Works for arbitrary N-way collisions\n- TestThreeCloneCollision passes (or gets much closer)\n\n## Files to Modify\n- internal/storage/sqlite/collision.go (new function, helpers)\n- internal/importer/importer.go (call new function)\n- internal/storage/sqlite/collision_test.go (comprehensive tests)\n\n## Testing Strategy\n\n### Unit Tests\n- groupCollisionsByBaseID correctly groups\n- deduplicateVersionsByContentHash removes duplicates\n- Sorting by hash is stable and deterministic\n- Sequential ID assignment is correct\n\n### Integration Tests\n- 3-way collision resolves to 3 issues\n- Sync order doesn't affect final IDs\n- Content hash ordering determines winner\n\n### Property Tests\n- For any N clones with same content, all converge to same IDs\n- Idempotent: running resolution twice produces same result\n\n## Dependencies\n- Requires bd-95 (ContentHash field) to be completed first\n- Requires bd-96 (read-only detection) for clean integration\n\n## Notes\nThis is the core algorithm that enables convergence. The key insight:\n**Sort by content hash globally, not pairwise comparison.**","status":"closed","priority":1,"issue_type":"task","created_at":"2025-10-28T18:37:42.85616-07:00","updated_at":"2025-10-28T20:03:26.675257-07:00","closed_at":"2025-10-28T20:03:26.675257-07:00","dependencies":[{"issue_id":"bd-97","depends_on_id":"bd-94","type":"parent-child","created_at":"2025-10-28T18:39:20.593102-07:00","created_by":"daemon"},{"issue_id":"bd-97","depends_on_id":"bd-95","type":"blocks","created_at":"2025-10-28T18:39:28.30886-07:00","created_by":"daemon"},{"issue_id":"bd-97","depends_on_id":"bd-96","type":"blocks","created_at":"2025-10-28T18:39:28.336312-07:00","created_by":"daemon"}]} {"id":"bd-98","title":"Implement content-first idempotent import","description":"## Overview\nPhase 4: Refactor import to be content-first and idempotent, ensuring importing same JSONL multiple times always converges correctly.\n\n## Current Problem\nCurrent import is ID-first:\n1. Look up by ID\n2. If exists, update\n3. If not exists, create\n\nThis causes issues when:\n- Same content arrives with different IDs (renames not detected)\n- Multiple rounds of import needed for convergence\n- Import order affects final state\n\n## Solution\nMake import content-first and idempotent:\n1. Hash all incoming and existing issues\n2. Match by content hash first (detect renames)\n3. Handle ID conflicts second (using global resolution)\n4. Ensure importing same data multiple times = no-op\n\n## Implementation Tasks\n\n### 1. Refactor ImportIssues to be content-first\nFile: internal/importer/importer.go\n\n```go\nfunc ImportIssues(ctx context.Context, dbPath string, store storage.Storage, \n issues []*types.Issue, opts Options) (*Result, error) {\n \n result := \u0026Result{...}\n \n sqliteStore, needCloseStore, err := getOrCreateStore(ctx, dbPath, store)\n if err != nil {\n return nil, err\n }\n if needCloseStore {\n defer func() { _ = sqliteStore.Close() }()\n }\n \n // Phase 1: Compute content hashes for all incoming issues\n for _, issue := range issues {\n issue.ContentHash = issue.ComputeContentHash()\n }\n \n // Phase 2: Build content hash maps\n incomingByHash := buildHashMap(issues)\n dbIssues, _ := sqliteStore.SearchIssues(ctx, \"\", types.IssueFilter{})\n dbByHash := buildHashMap(dbIssues)\n dbByID := buildIDMap(dbIssues)\n \n // Phase 3: Content-first matching\n var newIssues []*types.Issue\n var idConflicts []*CollisionDetail\n \n for hash, incoming := range incomingByHash {\n if existing, found := dbByHash[hash]; found {\n // Same content exists\n if existing.ID == incoming.ID {\n // Exact match - idempotent case\n result.Unchanged++\n } else {\n // Same content, different ID - rename detected\n // Delete old ID, keep new ID (incoming is canonical)\n if err := handleRename(ctx, sqliteStore, existing, incoming); err != nil {\n return nil, err\n }\n result.Updated++\n }\n } else {\n // New content - check for ID collision\n if existingWithID, found := dbByID[incoming.ID]; found {\n // ID exists but different content - collision\n idConflicts = append(idConflicts, \u0026CollisionDetail{\n ID: incoming.ID,\n IncomingIssue: incoming,\n ExistingIssue: existingWithID,\n })\n } else {\n // Truly new issue\n newIssues = append(newIssues, incoming)\n }\n }\n }\n \n // Phase 4: Resolve ID conflicts using global algorithm\n if len(idConflicts) \u003e 0 {\n if !opts.ResolveCollisions {\n return nil, fmt.Errorf(\"collision detected\")\n }\n \n idMapping, err := sqlite.ResolveNWayCollisions(ctx, sqliteStore, \n idConflicts, issues)\n if err != nil {\n return nil, err\n }\n \n if err := applyIDMapping(ctx, sqliteStore, idMapping); err != nil {\n return nil, err\n }\n \n result.IDMapping = idMapping\n result.Collisions = len(idConflicts)\n }\n \n // Phase 5: Create new issues\n if len(newIssues) \u003e 0 {\n if err := sqliteStore.CreateIssues(ctx, newIssues, \"import\"); err != nil {\n return nil, err\n }\n result.Created = len(newIssues)\n }\n \n // Phase 6: Import dependencies, labels, comments (existing logic)\n // ...\n \n return result, nil\n}\n```\n\n### 2. Implement helper functions\n\n```go\n// buildHashMap creates a map of content hash → issue\nfunc buildHashMap(issues []*types.Issue) map[string]*types.Issue {\n result := make(map[string]*types.Issue)\n for _, issue := range issues {\n result[issue.ContentHash] = issue\n }\n return result\n}\n\n// buildIDMap creates a map of ID → issue\nfunc buildIDMap(issues []*types.Issue) map[string]*types.Issue {\n result := make(map[string]*types.Issue)\n for _, issue := range issues {\n result[issue.ID] = issue\n }\n return result\n}\n\n// handleRename handles content match with different IDs\nfunc handleRename(ctx context.Context, s *SQLiteStorage, \n existing *types.Issue, incoming *types.Issue) error {\n \n // Delete old ID\n if err := s.DeleteIssue(ctx, existing.ID); err != nil {\n return fmt.Errorf(\"failed to delete old ID %s: %w\", existing.ID, err)\n }\n \n // Create with new ID\n if err := s.CreateIssue(ctx, incoming, \"import-rename\"); err != nil {\n return fmt.Errorf(\"failed to create renamed issue %s: %w\", \n incoming.ID, err)\n }\n \n // Update references from old ID to new ID\n idMapping := map[string]string{existing.ID: incoming.ID}\n return updateReferences(ctx, s, idMapping)\n}\n```\n\n### 3. Add idempotency tests\n\nTest cases:\n1. Import same JSONL twice → second import reports all Unchanged\n2. Import, modify DB, import again → reports Updated\n3. Import with rename, import again → idempotent\n4. Import with collision resolution, import again → idempotent\n\n### 4. Update handleCollisions to use new flow\nCurrent handleCollisions in importer.go needs to be updated to:\n- Use content-first matching\n- Call new ResolveNWayCollisions\n- Apply results using ApplyCollisionResolution\n\n## Acceptance Criteria\n- Import matches by content hash before checking IDs\n- Importing same JSONL multiple times is idempotent (reports Unchanged)\n- Rename detection works (same content, different ID)\n- ID conflicts resolved using global algorithm\n- Result.Unchanged correctly tracks idempotent imports\n- TestThreeCloneCollision passes\n- All existing import tests still pass\n\n## Testing Strategy\n\n### Unit Tests\n- buildHashMap correctly indexes by content hash\n- buildIDMap correctly indexes by ID\n- handleRename deletes old, creates new, updates references\n\n### Integration Tests\n- Import same data twice → idempotent\n- Import renamed issue → handled correctly\n- Import with collision → resolved globally\n- Final pull after 3-way collision → all clones converge\n\n### Property Tests\n- Idempotency: Import(x); Import(x) ≡ Import(x)\n- Commutativity: Import(a); Import(b) ≡ Import(b); Import(a) (for non-colliding issues)\n- Convergence: After N rounds of sync, all clones identical\n\n## Files to Modify\n- internal/importer/importer.go (major refactor of ImportIssues)\n- internal/importer/importer_test.go (new tests)\n- cmd/bd/import_bug_test.go (update for new behavior)\n\n## Dependencies\n- Requires bd-95 (ContentHash field)\n- Requires bd-96 (read-only detection)\n- Requires bd-97 (global resolution)\n\n## Risk Mitigation\nMajor refactor of import logic. Recommend:\n1. Comprehensive tests before modifying\n2. Feature flag to enable/disable\n3. Keep old import code path for rollback\n4. Test with all existing import tests\n5. Manual testing with real repositories\n\n## Success Metrics\nAfter this phase:\n- TestThreeCloneCollision should PASS\n- All clones converge after final pull\n- Import is demonstrably idempotent\n- No data loss in N-way scenarios","status":"open","priority":1,"issue_type":"task","created_at":"2025-10-28T18:38:25.671302-07:00","updated_at":"2025-10-28T18:38:25.671302-07:00","dependencies":[{"issue_id":"bd-98","depends_on_id":"bd-94","type":"parent-child","created_at":"2025-10-28T18:39:20.616846-07:00","created_by":"daemon"},{"issue_id":"bd-98","depends_on_id":"bd-95","type":"blocks","created_at":"2025-10-28T18:39:28.360026-07:00","created_by":"daemon"},{"issue_id":"bd-98","depends_on_id":"bd-96","type":"blocks","created_at":"2025-10-28T18:39:28.383624-07:00","created_by":"daemon"},{"issue_id":"bd-98","depends_on_id":"bd-97","type":"blocks","created_at":"2025-10-28T18:39:28.407157-07:00","created_by":"daemon"}]} {"id":"bd-99","title":"Add TestNWayCollision for 5+ clones","description":"## Overview\nAdd comprehensive tests for N-way (5+) collision resolution to verify the solution scales beyond 3 clones.\n\n## Purpose\nWhile TestThreeCloneCollision validates the basic N-way case, we need to verify:\n1. Solution scales to arbitrary N\n2. Performance is acceptable with more clones\n3. Convergence time is bounded\n4. No edge cases in larger collision groups\n\n## Implementation Tasks\n\n### 1. Create TestFiveCloneCollision\nFile: beads_twoclone_test.go (or new beads_nway_test.go)\n\n```go\nfunc TestFiveCloneCollision(t *testing.T) {\n // Test with 5 clones creating same ID with different content\n // Verify all 5 clones converge after sync rounds\n \n t.Run(\"SequentialSync\", func(t *testing.T) {\n testNCloneCollision(t, 5, \"A\", \"B\", \"C\", \"D\", \"E\")\n })\n \n t.Run(\"ReverseSync\", func(t *testing.T) {\n testNCloneCollision(t, 5, \"E\", \"D\", \"C\", \"B\", \"A\")\n })\n \n t.Run(\"RandomSync\", func(t *testing.T) {\n testNCloneCollision(t, 5, \"C\", \"A\", \"E\", \"B\", \"D\")\n })\n}\n```\n\n### 2. Implement generalized testNCloneCollision\nGeneralize the 3-clone test to handle arbitrary N:\n\n```go\nfunc testNCloneCollision(t *testing.T, numClones int, syncOrder ...string) {\n t.Helper()\n \n if len(syncOrder) != numClones {\n t.Fatalf(\"syncOrder length (%d) must match numClones (%d)\", \n len(syncOrder), numClones)\n }\n \n tmpDir := t.TempDir()\n \n // Setup remote and N clones\n remoteDir := setupBareRepo(t, tmpDir)\n cloneDirs := make(map[string]string)\n \n for i := 0; i \u003c numClones; i++ {\n name := string(rune('A' + i))\n cloneDirs[name] = setupClone(t, tmpDir, remoteDir, name)\n }\n \n // Each clone creates issue with same ID but different content\n for name, dir := range cloneDirs {\n createIssue(t, dir, fmt.Sprintf(\"Issue from clone %s\", name))\n }\n \n // Sync in specified order\n for _, name := range syncOrder {\n syncClone(t, cloneDirs[name], name)\n }\n \n // Final pull for convergence\n for name, dir := range cloneDirs {\n finalPull(t, dir, name)\n }\n \n // Verify all clones have all N issues\n expectedTitles := make(map[string]bool)\n for i := 0; i \u003c numClones; i++ {\n name := string(rune('A' + i))\n expectedTitles[fmt.Sprintf(\"Issue from clone %s\", name)] = true\n }\n \n for name, dir := range cloneDirs {\n titles := getTitles(t, dir)\n if !compareTitleSets(titles, expectedTitles) {\n t.Errorf(\"Clone %s missing issues: expected %v, got %v\", \n name, expectedTitles, titles)\n }\n }\n \n t.Log(\"✓ All\", numClones, \"clones converged successfully\")\n}\n```\n\n### 3. Add performance benchmarks\nTest convergence time and memory usage:\n\n```go\nfunc BenchmarkNWayCollision(b *testing.B) {\n for _, n := range []int{3, 5, 10, 20} {\n b.Run(fmt.Sprintf(\"N=%d\", n), func(b *testing.B) {\n for i := 0; i \u003c b.N; i++ {\n // Run N-way collision and measure time\n testNCloneCollisionBench(b, n)\n }\n })\n }\n}\n```\n\n### 4. Add convergence time tests\nVerify bounded convergence:\n\n```go\nfunc TestConvergenceTime(t *testing.T) {\n // Test that convergence happens within expected rounds\n // For N clones, should converge in at most N-1 sync rounds\n \n for n := 3; n \u003c= 10; n++ {\n t.Run(fmt.Sprintf(\"N=%d\", n), func(t *testing.T) {\n rounds := measureConvergenceRounds(t, n)\n maxExpected := n - 1\n if rounds \u003e maxExpected {\n t.Errorf(\"Convergence took %d rounds, expected ≤ %d\", \n rounds, maxExpected)\n }\n })\n }\n}\n```\n\n### 5. Add edge case tests\nTest boundary conditions:\n- All N clones have identical content (dedup works)\n- N-1 clones have same content, 1 differs\n- All N clones have unique content\n- Mix of collisions and non-collisions\n\n## Acceptance Criteria\n- TestFiveCloneCollision passes with all sync orders\n- All 5 clones converge to identical content\n- Performance is acceptable (\u003c 5 seconds for 5 clones)\n- Convergence time is bounded (≤ N-1 rounds)\n- Edge cases handled correctly\n- Benchmarks show scalability to 10+ clones\n\n## Files to Create/Modify\n- beads_twoclone_test.go or beads_nway_test.go\n- Add helper functions for N-clone setup\n\n## Testing Strategy\n\n### Test Matrix\n| N Clones | Sync Orders | Expected Result |\n|----------|-------------|-----------------|\n| 3 | A→B→C | Pass |\n| 3 | C→B→A | Pass |\n| 5 | A→B→C→D→E | Pass |\n| 5 | E→D→C→B→A | Pass |\n| 5 | Random | Pass |\n| 10 | Sequential | Pass |\n\n### Performance Targets\n- 3 clones: \u003c 2 seconds\n- 5 clones: \u003c 5 seconds\n- 10 clones: \u003c 15 seconds\n\n## Dependencies\n- Requires bd-95, bd-96, bd-97, bd-98 to be completed\n- TestThreeCloneCollision must pass first\n\n## Success Metrics\n- All tests pass for N ∈ {3, 5, 10}\n- Convergence time scales linearly (O(N))\n- Memory usage reasonable (\u003c 100MB for 10 clones)\n- No data corruption or loss in any scenario","status":"open","priority":2,"issue_type":"task","created_at":"2025-10-28T18:39:00.159753-07:00","updated_at":"2025-10-28T18:39:00.159753-07:00","dependencies":[{"issue_id":"bd-99","depends_on_id":"bd-94","type":"parent-child","created_at":"2025-10-28T18:39:20.642553-07:00","created_by":"daemon"},{"issue_id":"bd-99","depends_on_id":"bd-98","type":"blocks","created_at":"2025-10-28T18:39:28.435202-07:00","created_by":"daemon"}]}