# N-Way Collision Convergence Problem

## Summary

The current collision resolution implementation (`--resolve-collisions`) works correctly for 2-way collisions but **does not converge** for 3-way (and, by extension, N-way) collisions. This is a critical limitation for parallel worker scenarios where multiple agents file issues simultaneously.

## Test Evidence

`TestThreeCloneCollision` in `beads_twoclone_test.go` demonstrates the problem: three clones each create an issue with the same ID (`test-1`) but different content.

### Observed Behavior

**Sync Order A→B→C:**

- Clone A: 0 issues (empty database after final pull)
- Clone B: 2 issues (missing "Issue from clone C")
- Clone C: 3 issues (has all issues)

**Sync Order C→A→B:**

- Clone A: 2 issues (missing "Issue from clone B")
- Clone B: 3 issues (has all issues)
- Clone C: 0 issues (empty database after final pull)

**Pattern:** The last clone in the sync order ends up with all three issues. The middle clone is missing the issue created by the last clone, and the first clone ends up with an empty database. This behavior is **100% reproducible** across all test runs.

## Root Cause Analysis

When the third clone pulls and resolves collisions:

1. It correctly remaps its conflicting issue to a new ID (e.g., `test-1` → `test-3`)
2. It imports the issues from the other two clones
3. It pushes the merged state

However, when the first clone pulls this merged state:

1. The import sees new issues that collide with its local database
2. The resolution logic doesn't properly handle issues that were already remapped upstream
3. The database ends up in an inconsistent state (often empty or partially populated)

## Why This Matters

This prevents reliable N-way parallel worker scenarios:

- Multiple AI agents filing issues simultaneously
- Distributed teams working on different clones
- CI/CD systems creating issues in parallel builds

**Current workaround:** Use at most 2 workers, or create issues sequentially. Only these setups work reliably today.

## What Needs To Be Fixed

### 1. Import Logic Enhancement

The `--resolve-collisions` import needs to:

- Detect when incoming issues were already remapped upstream
- Preserve the remapping chain (track `test-1` → `test-2` → `test-3`)
- Avoid remapping an already-remapped issue a second time
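
A minimal sketch of the remapping-chain bookkeeping described above. It assumes remaps are recorded as a simple old-ID → new-ID map. `RemapLog`, `Record`, `Resolve`, and `WasRemappedUpstream` are hypothetical names and are not part of bd. The sketch does not cover how such a log would be persisted or shared between clones.

```go
package main

import "fmt"

// RemapLog is a HYPOTHETICAL record of collision remappings (old ID -> new ID).
// It is not part of bd; it only illustrates tracking chains like test-1 -> test-2 -> test-3.
type RemapLog map[string]string

// Record notes that an issue was moved from oldID to newID during collision resolution.
func (r RemapLog) Record(oldID, newID string) {
	r[oldID] = newID
}

// Resolve follows the chain starting at id and returns the final ID.
// It returns an error if the chain loops back on itself.
func (r RemapLog) Resolve(id string) (string, error) {
	seen := map[string]bool{id: true}
	for {
		next, ok := r[id]
		if !ok {
			return id, nil // end of the chain
		}
		if seen[next] {
			return "", fmt.Errorf("remap cycle detected at %q", next)
		}
		seen[next] = true
		id = next
	}
}

// WasRemappedUpstream reports whether id is the target of a recorded remap. If it is, an
// incoming issue with this ID has already been through collision resolution and should
// be imported as-is instead of being remapped again.
func (r RemapLog) WasRemappedUpstream(id string) bool {
	for _, to := range r {
		if to == id {
			return true
		}
	}
	return false
}

func main() {
	remaps := RemapLog{}
	remaps.Record("test-1", "test-2")
	remaps.Record("test-2", "test-3")

	final, err := remaps.Resolve("test-1")
	fmt.Println(final, err)                           // test-3 <nil>
	fmt.Println(remaps.WasRemappedUpstream("test-3")) // true
	fmt.Println(remaps.WasRemappedUpstream("test-1")) // false
}
```

Cycle detection matters because a buggy import that remaps back and forth would otherwise loop forever while resolving an ID.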

### 2. Convergence Algorithm

Implement a proper convergence algorithm that ensures:

- All clones eventually have the same complete set of issues
- Idempotent imports (importing the same JSONL multiple times is safe)
- Transitive collision resolution (if an issue is remapped from ID `X` to ID `Y` and `Y` already exists, resolve that collision too, without overwriting or dropping either issue)

### 3. Test Requirements

The fix should make `TestThreeCloneCollision` pass without skipping:

- All three clones must have all three issues (by title)
- Content must match across all clones (ignoring timestamps and specific ID assignments)
- Must work for both sync orders (A→B→C and C→A→B)

### 4. Extend to N-Way

Once 3-way works, verify it generalizes to N workers:

- Test with 5+ clones
- Test with different sync order permutations
- Ensure convergence time is bounded

## Files To Examine

- **`beads_twoclone_test.go`**: Contains `TestThreeCloneCollision`, which reproduces the issue
- **`cmd/bd/import.go`**: Import logic with the `--resolve-collisions` flag
- **`internal/storage/sqlite/sqlite.go`**: Database operations for collision detection
- **`cmd/bd/sync.go`**: Sync workflow that calls import/export

## Success Criteria

1. `TestThreeCloneCollision` passes without skipping
2. All clones converge to identical content after the final pull
3. No data loss (all issues present in all clones)
4. ID assignments can be non-deterministic, but content must match
5. Works for N workers (extend test to 5+ clones)

## Current Test Status

```bash
go test -v -run TestThreeCloneCollision

# Both subtests SKIP with message:
# "KNOWN LIMITATION: 3-way collisions may require additional resolution logic"
```

The test is designed to skip when convergence fails. This keeps CI from breaking while still documenting the limitation clearly.