Document N-way collision convergence problem for investigation
This commit is contained in:
95
n-way-collision-convergence.md
Normal file
95
n-way-collision-convergence.md
Normal file
@@ -0,0 +1,95 @@
|
||||
# N-Way Collision Convergence Problem
|
||||
|
||||
## Summary
|
||||
|
||||
The current collision resolution implementation (`--resolve-collisions`) works correctly for 2-way collisions but **does not converge** for 3-way (and by extension N-way) collisions. This is a critical limitation for parallel worker scenarios where multiple agents file issues simultaneously.
|
||||
|
||||
## Test Evidence
|
||||
|
||||
`TestThreeCloneCollision` in `beads_twoclone_test.go` demonstrates the problem with 3 clones creating the same issue ID (`test-1`) with different content.
|
||||
|
||||
### Observed Behavior
|
||||
|
||||
**Sync Order A→B→C:**
|
||||
- Clone A: 0 issues (empty database after final pull)
|
||||
- Clone B: 2 issues (missing "Issue from clone C")
|
||||
- Clone C: 3 issues (has all issues)
|
||||
|
||||
**Sync Order C→A→B:**
|
||||
- Clone A: 2 issues (missing "Issue from clone B")
|
||||
- Clone B: 3 issues (has all issues)
|
||||
- Clone C: 0 issues (empty database after final pull)
|
||||
|
||||
**Pattern:** The middle clone in the sync order gets all issues, but the first and last clones end up with incomplete data. This behavior is **100% reproducible** across all test runs.
|
||||
|
||||
## Root Cause Analysis
|
||||
|
||||
When the third clone pulls and resolves collisions:
|
||||
1. It correctly remaps its conflicting issue to a new ID (e.g., `test-1` → `test-3`)
|
||||
2. It imports the issues from the other two clones
|
||||
3. It pushes the merged state
|
||||
|
||||
However, when the first clone pulls this merged state:
|
||||
1. The import sees new issues that collide with its local database
|
||||
2. The resolution logic doesn't properly handle issues that were already remapped upstream
|
||||
3. The database ends up in an inconsistent state (often empty or partially populated)
|
||||
|
||||
## Why This Matters
|
||||
|
||||
This prevents reliable N-way parallel worker scenarios:
|
||||
- Multiple AI agents filing issues simultaneously
|
||||
- Distributed teams working on different clones
|
||||
- CI/CD systems creating issues in parallel builds
|
||||
|
||||
**Current workaround:** Only works reliably with 2 workers or sequential issue creation.
|
||||
|
||||
## What Needs To Be Fixed
|
||||
|
||||
### 1. Import Logic Enhancement
|
||||
The `--resolve-collisions` import needs to:
|
||||
- Detect when incoming issues were already remapped upstream
|
||||
- Preserve the remapping chain (track `test-1` → `test-2` → `test-3`)
|
||||
- Not re-remap already-remapped issues
|
||||
|
||||
### 2. Convergence Algorithm
|
||||
Implement a proper convergence algorithm that ensures:
|
||||
- All clones eventually have the same complete set of issues
|
||||
- Idempotent imports (importing the same JSONL multiple times is safe)
|
||||
- Transitive collision resolution (if A remaps to B, and B exists, handle gracefully)
|
||||
|
||||
### 3. Test Requirements
|
||||
The fix should make `TestThreeCloneCollision` pass without skipping:
|
||||
- All three clones must have all three issues (by title)
|
||||
- Content must match across all clones (ignoring timestamps and specific ID assignments)
|
||||
- Must work for both sync orders (A→B→C and C→A→B)
|
||||
|
||||
### 4. Extend to N-Way
|
||||
Once 3-way works, verify it generalizes to N workers:
|
||||
- Test with 5+ clones
|
||||
- Test with different sync order permutations
|
||||
- Ensure convergence time is bounded
|
||||
|
||||
## Files To Examine
|
||||
|
||||
- **`beads_twoclone_test.go`**: Contains `TestThreeCloneCollision` that reproduces the issue
|
||||
- **`cmd/bd/import.go`**: Import logic with `--resolve-collisions` flag
|
||||
- **`internal/storage/sqlite/sqlite.go`**: Database operations for collision detection
|
||||
- **`cmd/bd/sync.go`**: Sync workflow that calls import/export
|
||||
|
||||
## Success Criteria
|
||||
|
||||
1. `TestThreeCloneCollision` passes without skipping
|
||||
2. All clones converge to identical content after final pull
|
||||
3. No data loss (all issues present in all clones)
|
||||
4. ID assignments can be non-deterministic, but content must match
|
||||
5. Works for N workers (extend test to 5+ clones)
|
||||
|
||||
## Current Test Status
|
||||
|
||||
```bash
|
||||
go test -v -run TestThreeCloneCollision
|
||||
# Both subtests SKIP with message:
|
||||
# "KNOWN LIMITATION: 3-way collisions may require additional resolution logic"
|
||||
```
|
||||
|
||||
The test is designed to skip when convergence fails, so it won't break CI, but it documents the limitation clearly.
|
||||
Reference in New Issue
Block a user