From a51a9297d6014b52a230b256ec1ed40e579a7ff0 Mon Sep 17 00:00:00 2001 From: Steve Yegge Date: Tue, 28 Oct 2025 19:54:04 -0700 Subject: [PATCH] bd sync: 2025-10-28 19:54:04 --- .beads/beads.jsonl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.beads/beads.jsonl b/.beads/beads.jsonl index d91e2df1..79e60af3 100644 --- a/.beads/beads.jsonl +++ b/.beads/beads.jsonl @@ -98,7 +98,7 @@ {"id":"bd-91","title":"Fix TestTwoCloneCollision to compare content not timestamps","description":"The test at beads_twoclone_test.go:204-207 currently compares full JSON output including timestamps, causing false negative failures.\n\nCurrent behavior:\n- Both clones converge to identical semantic content\n- Clone A: test-2=\"Issue from clone A\", test-1=\"Issue from clone B\"\n- Clone B: test-1=\"Issue from clone B\", test-2=\"Issue from clone A\"\n- Titles match IDs correctly, no data corruption\n- Only timestamps differ (expected and acceptable)\n\nFix needed:\n- Replace exact JSON comparison with content-aware comparison\n- Normalize or ignore timestamp fields when asserting convergence\n- Test should PASS after this fix\n\nThis blocks completion of bd-86.","acceptance_criteria":"- Test compares issue content (title, description, status, priority) not timestamps\n- TestTwoCloneCollision passes\n- Both clones shown to have identical semantic content\n- Timestamps explicitly documented as acceptable difference","status":"closed","priority":1,"issue_type":"task","created_at":"2025-10-28T17:58:52.057194-07:00","updated_at":"2025-10-28T18:01:38.751895-07:00","closed_at":"2025-10-28T18:01:38.751895-07:00","dependencies":[{"issue_id":"bd-91","depends_on_id":"bd-90","type":"parent-child","created_at":"2025-10-28T17:58:52.058202-07:00","created_by":"stevey"},{"issue_id":"bd-91","depends_on_id":"bd-86","type":"blocks","created_at":"2025-10-28T17:58:52.05873-07:00","created_by":"stevey"}]} {"id":"bd-92","title":"Add TestThreeCloneCollision for regression protection","description":"Add a 3-clone collision test to document behavior and provide regression protection.\n\nPurpose:\n- Verify content convergence regardless of sync order\n- Document the ID non-determinism behavior (IDs may be assigned differently based on sync order)\n- Provide regression protection for multi-way collisions\n\nTest design:\n- 3 clones create same ID with different content\n- Test two different sync orders (A→B→C vs C→A→B)\n- Assert content sets match (ignore specific ID assignments)\n- Add comment explaining ID non-determinism is expected behavior\n\nKnown limitation:\n- Content always converges correctly (all issues present with correct titles)\n- Numeric ID assignments (test-2 vs test-3) depend on sync order\n- This is acceptable if content convergence is the primary goal","acceptance_criteria":"- TestThreeCloneCollision added to beads_twoclone_test.go (or new file)\n- Tests 3 clones with same ID collision\n- Tests two different sync orders\n- Asserts content convergence (all issues present, correct titles)\n- Documents ID non-determinism in test comments\n- Test passes consistently","status":"closed","priority":2,"issue_type":"task","created_at":"2025-10-28T17:59:05.941735-07:00","updated_at":"2025-10-28T18:09:12.717604-07:00","closed_at":"2025-10-28T18:09:12.717604-07:00","dependencies":[{"issue_id":"bd-92","depends_on_id":"bd-90","type":"parent-child","created_at":"2025-10-28T17:59:05.942783-07:00","created_by":"stevey"}]} {"id":"bd-93","title":"Document 3-clone ID non-determinism in collision resolution","description":"Document the known behavior of 3+ way collision resolution where ID assignments may vary based on sync order, even though content always converges correctly.\n\nUpdates needed:\n- Update bd-86 notes to mark 2-clone case as solved\n- Document 3-clone ID non-determinism as known limitation\n- Add explanation to ADVANCED.md or collision resolution docs\n- Explain why this happens (pairwise hash comparison is deterministic, but multi-way ID allocation uses sync-order dependent counters)\n- Clarify trade-offs: content convergence ✅ vs ID stability ❌\n\nKey points to document:\n- Hash-based resolution is pairwise deterministic\n- Content always converges correctly (all issues present with correct data)\n- Numeric ID assignments in 3+ way collisions depend on sync order\n- This is acceptable for most use cases (content convergence is primary goal)\n- Full determinism would require complex multi-way comparison","acceptance_criteria":"- bd-86 updated with notes about 2-clone solution being complete\n- 3-clone ID non-determinism documented in ADVANCED.md or similar\n- Explanation includes why it happens and trade-offs\n- Links to TestThreeCloneCollision as demonstration\n- Users understand this is expected behavior, not a bug","status":"open","priority":2,"issue_type":"task","created_at":"2025-10-28T17:59:21.93014-07:00","updated_at":"2025-10-28T17:59:21.93014-07:00","dependencies":[{"issue_id":"bd-93","depends_on_id":"bd-90","type":"parent-child","created_at":"2025-10-28T17:59:21.938709-07:00","created_by":"stevey"}]} -{"id":"bd-94","title":"Fix N-way collision convergence","description":"Epic to fix the N-way collision convergence problem documented in n-way-collision-convergence.md.\n\n## Problem Summary\nThe current collision resolution implementation works correctly for 2-way collisions but does not converge for 3-way (and by extension N-way) collisions. TestThreeCloneCollision demonstrates this with reproducible failures.\n\n## Root Causes Identified\n1. Pairwise resolution doesn't scale - each clone makes local decisions without global context\n2. DetectCollisions modifies state during detection (line 83-86 in collision.go)\n3. No remapping history - can't track transitive remap chains (test-1 → test-2 → test-3)\n4. Import-time resolution is too late - happens after git merge\n\n## Solution Architecture\nReplace pairwise resolution with deterministic global N-way resolution using:\n- Content-addressable identity (content hashing)\n- Global collision resolution (sort all versions by hash)\n- Read-only detection phase (separate from modification)\n- Idempotent imports (content-first matching)\n\n## Success Criteria\n- TestThreeCloneCollision passes without skipping\n- All clones converge to identical content after final pull\n- No data loss (all issues present in all clones)\n- Works for N workers (test with 5+ clones)\n- Idempotent imports (importing same JSONL multiple times is safe)\n\n## Implementation Phases\nSee child issues for detailed breakdown of each phase.","status":"in_progress","priority":1,"issue_type":"epic","created_at":"2025-10-28T18:36:28.234425-07:00","updated_at":"2025-10-28T18:43:11.067618-07:00"} +{"id":"bd-94","title":"Fix N-way collision convergence","description":"Epic to fix the N-way collision convergence problem documented in n-way-collision-convergence.md.\n\n## Problem Summary\nThe current collision resolution implementation works correctly for 2-way collisions but does not converge for 3-way (and by extension N-way) collisions. TestThreeCloneCollision demonstrates this with reproducible failures.\n\n## Root Causes Identified\n1. Pairwise resolution doesn't scale - each clone makes local decisions without global context\n2. DetectCollisions modifies state during detection (line 83-86 in collision.go)\n3. No remapping history - can't track transitive remap chains (test-1 → test-2 → test-3)\n4. Import-time resolution is too late - happens after git merge\n\n## Solution Architecture\nReplace pairwise resolution with deterministic global N-way resolution using:\n- Content-addressable identity (content hashing)\n- Global collision resolution (sort all versions by hash)\n- Read-only detection phase (separate from modification)\n- Idempotent imports (content-first matching)\n\n## Success Criteria\n- TestThreeCloneCollision passes without skipping\n- All clones converge to identical content after final pull\n- No data loss (all issues present in all clones)\n- Works for N workers (test with 5+ clones)\n- Idempotent imports (importing same JSONL multiple times is safe)\n\n## Implementation Phases\nSee child issues for detailed breakdown of each phase.","status":"in_progress","priority":0,"issue_type":"epic","created_at":"2025-10-28T18:36:28.234425-07:00","updated_at":"2025-10-28T19:49:52.776357-07:00"} {"id":"bd-95","title":"Add content-addressable identity to Issue type","description":"## Overview\nPhase 1: Add content hashing to enable global identification of issues regardless of their assigned IDs.\n\n## Current Problem\nThe system identifies issues only by ID (e.g., test-1, test-2). When multiple clones create the same ID with different content, there's no way to identify that these are semantically different issues without comparing all fields.\n\n## Solution\nAdd a ContentHash field to the Issue type that represents the canonical content fingerprint.\n\n## Implementation Tasks\n\n### 1. Add ContentHash field to Issue type\nFile: internal/types/types.go\n```go\ntype Issue struct {\n ID string\n ContentHash string // SHA256 of canonical content\n // ... existing fields\n}\n```\n\n### 2. Add content hash computation method\nUse existing hashIssueContent from collision.go:186 as foundation:\n```go\nfunc (i *Issue) ComputeContentHash() string {\n return hashIssueContent(i)\n}\n```\n\n### 3. Compute hash at creation time\n- Modify CreateIssue to compute and store ContentHash\n- Modify CreateIssues (batch) to compute hashes\n\n### 4. Compute hash at import time \n- Modify ImportIssues to compute ContentHash for all incoming issues\n- Store hash in database\n\n### 5. Add database column\n- Add migration to add content_hash column to issues table\n- Update SELECT/INSERT statements to include content_hash\n- Index on content_hash for fast lookups\n\n### 6. Populate existing issues\n- Add migration step to compute ContentHash for all existing issues\n- Use hashIssueContent function\n\n## Acceptance Criteria\n- Issue type has ContentHash field\n- Hash is computed automatically at creation time\n- Hash is computed for imported issues\n- Database stores content_hash column\n- All existing issues have non-empty ContentHash\n- Hash is deterministic (same content → same hash)\n- Hash excludes ID, timestamps (only semantic content)\n\n## Files to Modify\n- internal/types/types.go\n- internal/storage/sqlite/sqlite.go (schema, CreateIssue, CreateIssues)\n- internal/storage/sqlite/migrations.go (new migration)\n- internal/importer/importer.go (compute hash during import)\n- cmd/bd/create.go (compute hash at creation)\n\n## Testing\n- Unit test: same content produces same hash\n- Unit test: different content produces different hash \n- Unit test: hash excludes ID and timestamps\n- Integration test: hash persists in database\n- Migration test: existing issues get hashes populated","status":"closed","priority":1,"issue_type":"task","created_at":"2025-10-28T18:36:44.914967-07:00","updated_at":"2025-10-28T18:57:10.985198-07:00","closed_at":"2025-10-28T18:57:10.985198-07:00","dependencies":[{"issue_id":"bd-95","depends_on_id":"bd-94","type":"parent-child","created_at":"2025-10-28T18:39:20.547325-07:00","created_by":"daemon"}]} {"id":"bd-96","title":"Make DetectCollisions read-only (separate detection from modification)","description":"## Overview\nPhase 2: Separate collision detection from state modification to enable safe, composable collision resolution.\n\n## Current Problem\nDetectCollisions (collision.go:38-111) modifies database state during detection:\n- Line 83-86: Deletes issues when content matches but ID differs\n- This violates separation of concerns\n- Causes race conditions when processing multiple issues\n- Makes contentToDBIssue map stale after first deletion\n- Partial failures leave DB in inconsistent state\n\n## Solution\nMake DetectCollisions purely read-only. Move all modifications to a separate ApplyCollisionResolution function.\n\n## Implementation Tasks\n\n### 1. Add RenameDetail to CollisionResult\nFile: internal/storage/sqlite/collision.go\n```go\ntype CollisionResult struct {\n ExactMatches []string\n Collisions []*CollisionDetail\n NewIssues []string\n Renames []*RenameDetail // NEW\n}\n\ntype RenameDetail struct {\n OldID string // ID in database\n NewID string // ID in incoming\n Issue *types.Issue // The issue with new ID\n}\n```\n\n### 2. Remove deletion from DetectCollisions\nReplace lines 83-86:\n```go\n// OLD (DELETE THIS):\nif err := s.DeleteIssue(ctx, dbMatch.ID); err != nil {\n return nil, fmt.Errorf(\"failed to delete renamed issue...\")\n}\n\n// NEW (ADD THIS):\nresult.Renames = append(result.Renames, \u0026RenameDetail{\n OldID: dbMatch.ID,\n NewID: incoming.ID,\n Issue: incoming,\n})\ncontinue // Don't mark as NewIssue yet\n```\n\n### 3. Create ApplyCollisionResolution function\nNew function to apply all modifications atomically:\n```go\nfunc ApplyCollisionResolution(ctx context.Context, s *SQLiteStorage,\n result *CollisionResult, mapping map[string]string) error {\n \n // Phase 1: Handle renames (delete old IDs)\n for _, rename := range result.Renames {\n if err := s.DeleteIssue(ctx, rename.OldID); err != nil {\n return fmt.Errorf(\"failed to delete renamed issue %s: %w\", \n rename.OldID, err)\n }\n }\n \n // Phase 2: Create new IDs (from mapping)\n // Phase 3: Update references\n return nil\n}\n```\n\n### 4. Update callers to use two-phase approach\nFile: internal/importer/importer.go (handleCollisions)\n```go\n// Phase 1: Detect (read-only)\ncollisionResult, err := sqlite.DetectCollisions(ctx, sqliteStore, issues)\n\n// Phase 2: Resolve (compute mapping)\nmapping, err := sqlite.ResolveNWayCollisions(ctx, sqliteStore, collisionResult)\n\n// Phase 3: Apply (modify DB)\nerr = sqlite.ApplyCollisionResolution(ctx, sqliteStore, collisionResult, mapping)\n```\n\n### 5. Update tests\n- Verify DetectCollisions doesn't modify DB\n- Test ApplyCollisionResolution separately\n- Add test for rename detection without modification\n\n## Acceptance Criteria\n- DetectCollisions performs zero writes to database\n- DetectCollisions returns RenameDetail entries for content matches\n- ApplyCollisionResolution handles all modifications\n- All existing tests still pass\n- New test verifies read-only detection\n- contentToDBIssue map stays consistent throughout detection\n\n## Files to Modify\n- internal/storage/sqlite/collision.go (DetectCollisions, new function)\n- internal/importer/importer.go (handleCollisions caller)\n- internal/storage/sqlite/collision_test.go (add tests)\n\n## Testing\n- Unit test: DetectCollisions with content match doesn't delete DB issue\n- Unit test: RenameDetail correctly populated\n- Unit test: ApplyCollisionResolution applies renames\n- Integration test: Full flow still works end-to-end\n\n## Risk Mitigation\nThis is a significant refactor of core collision logic. Recommend:\n1. Add comprehensive tests before modifying\n2. Use feature flag to enable/disable new behavior\n3. Test thoroughly with TestTwoCloneCollision first","status":"closed","priority":1,"issue_type":"task","created_at":"2025-10-28T18:37:09.652326-07:00","updated_at":"2025-10-28T19:08:17.715416-07:00","closed_at":"2025-10-28T19:08:17.715416-07:00","dependencies":[{"issue_id":"bd-96","depends_on_id":"bd-94","type":"parent-child","created_at":"2025-10-28T18:39:20.570276-07:00","created_by":"daemon"},{"issue_id":"bd-96","depends_on_id":"bd-95","type":"blocks","created_at":"2025-10-28T18:39:28.285653-07:00","created_by":"daemon"}]} {"id":"bd-97","title":"Implement global N-way collision resolution algorithm","description":"## Overview\nPhase 3: Replace pairwise collision resolution with global N-way resolution that produces deterministic results regardless of sync order.\n\n## Current Problem\nScoreCollisions (collision.go:228) compares issues pairwise:\n```go\ncollision.RemapIncoming = existingHash \u003c incomingHash\n```\n\nThis works for 2-way but fails for 3+ way because:\n- Each clone makes local decisions without global context\n- No guarantee intermediate states are consistent\n- Remapping decisions depend on sync order\n- Can't detect transitive remap chains (test-1 → test-2 → test-3)\n\n## Solution\nImplement global resolution that:\n1. Collects ALL versions of same logical issue\n2. Sorts by content hash (deterministic)\n3. Assigns sequential IDs based on sorted order\n4. All clones converge to same assignments\n\n## Implementation Tasks\n\n### 1. Create ResolveNWayCollisions function\nFile: internal/storage/sqlite/collision.go\n\nReplace ScoreCollisions with:\n```go\n// ResolveNWayCollisions handles N-way collisions deterministically.\n// Groups all versions with same base ID, sorts by content hash,\n// assigns sequential IDs. Returns mapping of old ID → new ID.\nfunc ResolveNWayCollisions(ctx context.Context, s *SQLiteStorage,\n collisions []*CollisionDetail, incoming []*types.Issue) (map[string]string, error) {\n \n if len(collisions) == 0 {\n return make(map[string]string), nil\n }\n \n // Group by base ID pattern (e.g., test-1, test-2 → base \"test-1\")\n groups := groupCollisionsByBaseID(collisions)\n \n idMapping := make(map[string]string)\n \n for baseID, versions := range groups {\n // 1. Collect all unique versions by content hash\n uniqueVersions := deduplicateVersionsByContentHash(versions)\n \n // 2. Sort by content hash (deterministic!)\n sort.Slice(uniqueVersions, func(i, j int) bool {\n return uniqueVersions[i].ContentHash \u003c uniqueVersions[j].ContentHash\n })\n \n // 3. Assign sequential IDs based on sorted order\n prefix := extractPrefix(baseID)\n baseNum := extractNumber(baseID)\n \n for i, version := range uniqueVersions {\n targetID := fmt.Sprintf(\"%s-%d\", prefix, baseNum+i)\n \n // Map this version to its deterministic ID\n if version.ID != targetID {\n idMapping[version.ID] = targetID\n }\n }\n }\n \n return idMapping, nil\n}\n```\n\n### 2. Implement helper functions\n\n```go\n// groupCollisionsByBaseID groups collisions by their logical base ID\nfunc groupCollisionsByBaseID(collisions []*CollisionDetail) map[string][]*types.Issue {\n groups := make(map[string][]*types.Issue)\n for _, c := range collisions {\n baseID := c.ID // All share same ID (that's why they collide)\n groups[baseID] = append(groups[baseID], c.ExistingIssue, c.IncomingIssue)\n }\n return groups\n}\n\n// deduplicateVersionsByContentHash keeps one issue per unique content hash\nfunc deduplicateVersionsByContentHash(issues []*types.Issue) []*types.Issue {\n seen := make(map[string]*types.Issue)\n for _, issue := range issues {\n if _, found := seen[issue.ContentHash]; !found {\n seen[issue.ContentHash] = issue\n }\n }\n result := make([]*types.Issue, 0, len(seen))\n for _, issue := range seen {\n result = append(result, issue)\n }\n return result\n}\n```\n\n### 3. Update handleCollisions in importer\nFile: internal/importer/importer.go\n\nReplace ScoreCollisions call with:\n```go\n// OLD:\nif err := sqlite.ScoreCollisions(ctx, sqliteStore, collisionResult.Collisions, allExistingIssues); err != nil {\n return nil, fmt.Errorf(\"failed to score collisions: %w\", err)\n}\n\n// NEW:\nidMapping, err := sqlite.ResolveNWayCollisions(ctx, sqliteStore, \n collisionResult.Collisions, issues)\nif err != nil {\n return nil, fmt.Errorf(\"failed to resolve collisions: %w\", err)\n}\n```\n\n### 4. Update RemapCollisions\nRemapCollisions currently uses collision.RemapIncoming field. Update to use idMapping directly:\n- Remove RemapIncoming logic\n- Use idMapping to determine what to remap\n- Simplify to just apply the computed mapping\n\n### 5. Add comprehensive tests\n\nTest cases:\n1. 3-way collision with different content → 3 sequential IDs\n2. 3-way collision with 2 identical content → 2 IDs (dedupe works)\n3. Sync order independence (A→B→C vs C→A→B produce same result)\n4. Content hash ordering is respected\n5. Works with 5+ clones\n\n## Acceptance Criteria\n- ResolveNWayCollisions implemented and replaces ScoreCollisions\n- Groups all versions of same ID together\n- Deduplicates by content hash\n- Sorts by content hash deterministically\n- Assigns sequential IDs starting from base ID\n- Returns complete mapping (old ID → new ID)\n- All clones converge to same ID assignments\n- Works for arbitrary N-way collisions\n- TestThreeCloneCollision passes (or gets much closer)\n\n## Files to Modify\n- internal/storage/sqlite/collision.go (new function, helpers)\n- internal/importer/importer.go (call new function)\n- internal/storage/sqlite/collision_test.go (comprehensive tests)\n\n## Testing Strategy\n\n### Unit Tests\n- groupCollisionsByBaseID correctly groups\n- deduplicateVersionsByContentHash removes duplicates\n- Sorting by hash is stable and deterministic\n- Sequential ID assignment is correct\n\n### Integration Tests\n- 3-way collision resolves to 3 issues\n- Sync order doesn't affect final IDs\n- Content hash ordering determines winner\n\n### Property Tests\n- For any N clones with same content, all converge to same IDs\n- Idempotent: running resolution twice produces same result\n\n## Dependencies\n- Requires bd-95 (ContentHash field) to be completed first\n- Requires bd-96 (read-only detection) for clean integration\n\n## Notes\nThis is the core algorithm that enables convergence. The key insight:\n**Sort by content hash globally, not pairwise comparison.**","status":"open","priority":1,"issue_type":"task","created_at":"2025-10-28T18:37:42.85616-07:00","updated_at":"2025-10-28T18:37:42.85616-07:00","dependencies":[{"issue_id":"bd-97","depends_on_id":"bd-94","type":"parent-child","created_at":"2025-10-28T18:39:20.593102-07:00","created_by":"daemon"},{"issue_id":"bd-97","depends_on_id":"bd-95","type":"blocks","created_at":"2025-10-28T18:39:28.30886-07:00","created_by":"daemon"},{"issue_id":"bd-97","depends_on_id":"bd-96","type":"blocks","created_at":"2025-10-28T18:39:28.336312-07:00","created_by":"daemon"}]}