Document N-way collision convergence problem for investigation

This commit is contained in:
Steve Yegge
2025-10-28 18:24:00 -07:00
parent aab5be6fb1
commit f384fb239e

View File

@@ -0,0 +1,95 @@
# N-Way Collision Convergence Problem
## Summary
The current collision resolution implementation (`--resolve-collisions`) works correctly for 2-way collisions but **does not converge** for 3-way (and by extension N-way) collisions. This is a critical limitation for parallel worker scenarios where multiple agents file issues simultaneously.
## Test Evidence
`TestThreeCloneCollision` in `beads_twoclone_test.go` demonstrates the problem with 3 clones creating the same issue ID (`test-1`) with different content.
### Observed Behavior
**Sync Order A→B→C:**
- Clone A: 0 issues (empty database after final pull)
- Clone B: 2 issues (missing "Issue from clone C")
- Clone C: 3 issues (has all issues)
**Sync Order C→A→B:**
- Clone A: 2 issues (missing "Issue from clone B")
- Clone B: 3 issues (has all issues)
- Clone C: 0 issues (empty database after final pull)
**Pattern:** The middle clone in the sync order gets all issues, but the first and last clones end up with incomplete data. This behavior is **100% reproducible** across all test runs.
## Root Cause Analysis
When the third clone pulls and resolves collisions:
1. It correctly remaps its conflicting issue to a new ID (e.g., `test-1``test-3`)
2. It imports the issues from the other two clones
3. It pushes the merged state
However, when the first clone pulls this merged state:
1. The import sees new issues that collide with its local database
2. The resolution logic doesn't properly handle issues that were already remapped upstream
3. The database ends up in an inconsistent state (often empty or partially populated)
## Why This Matters
This prevents reliable N-way parallel worker scenarios:
- Multiple AI agents filing issues simultaneously
- Distributed teams working on different clones
- CI/CD systems creating issues in parallel builds
**Current workaround:** Only works reliably with 2 workers or sequential issue creation.
## What Needs To Be Fixed
### 1. Import Logic Enhancement
The `--resolve-collisions` import needs to:
- Detect when incoming issues were already remapped upstream
- Preserve the remapping chain (track `test-1``test-2``test-3`)
- Not re-remap already-remapped issues
### 2. Convergence Algorithm
Implement a proper convergence algorithm that ensures:
- All clones eventually have the same complete set of issues
- Idempotent imports (importing the same JSONL multiple times is safe)
- Transitive collision resolution (if A remaps to B, and B exists, handle gracefully)
### 3. Test Requirements
The fix should make `TestThreeCloneCollision` pass without skipping:
- All three clones must have all three issues (by title)
- Content must match across all clones (ignoring timestamps and specific ID assignments)
- Must work for both sync orders (A→B→C and C→A→B)
### 4. Extend to N-Way
Once 3-way works, verify it generalizes to N workers:
- Test with 5+ clones
- Test with different sync order permutations
- Ensure convergence time is bounded
## Files To Examine
- **`beads_twoclone_test.go`**: Contains `TestThreeCloneCollision` that reproduces the issue
- **`cmd/bd/import.go`**: Import logic with `--resolve-collisions` flag
- **`internal/storage/sqlite/sqlite.go`**: Database operations for collision detection
- **`cmd/bd/sync.go`**: Sync workflow that calls import/export
## Success Criteria
1. `TestThreeCloneCollision` passes without skipping
2. All clones converge to identical content after final pull
3. No data loss (all issues present in all clones)
4. ID assignments can be non-deterministic, but content must match
5. Works for N workers (extend test to 5+ clones)
## Current Test Status
```bash
go test -v -run TestThreeCloneCollision
# Both subtests SKIP with message:
# "KNOWN LIMITATION: 3-way collisions may require additional resolution logic"
```
The test is designed to skip when convergence fails, so it won't break CI, but it documents the limitation clearly.