beads/.beads/bd-9-design.md

# BD-9: Collision Resolution Design Document

**Status**: In progress, design complete, ready for implementation
**Date**: 2025-10-12
**Issue**: bd-9 - Build collision resolution tooling for distributed branch workflows

## Problem Statement

When branches diverge and both create issues, auto-incrementing IDs collide on merge:
- Branch A creates bd-10, bd-11, bd-12
- Branch B (diverged) creates bd-10, bd-11, bd-12 (different issues!)
- On merge: 6 issues, but 3 duplicate IDs
- References to "bd-10" in descriptions/dependencies are now ambiguous

## Design Goals

1. **Preserve brevity** - Keep bd-302 format, not bd-302-branch-a-uuid-mess
2. **Minimize disruption** - Renumber issues with fewer references
3. **Update all references** - Text fields AND dependency table
4. **Atomic operation** - All or nothing
5. **Clear feedback** - User must understand what changed

## Algorithm Design

### Phase 1: Collision Detection

```
Input: JSONL issues + current DB state
Output: Set of colliding issues

for each issue in JSONL:
  if DB contains issue.ID:
    if DB issue == JSONL issue:
      skip (already imported, idempotent)
    else:
      mark as COLLISION
```

### Phase 2: Reference Counting (The Smart Part)

Renumber issues with FEWER references first because:
- If bd-10 is referenced 20 times and bd-11 once
- Renumbering bd-11→bd-15 updates 1 reference
- Renumbering bd-10→bd-15 updates 20 references

```
for each colliding_issue:
  score = 0

  // Count text references in OTHER issues
  for each other_issue in JSONL:
    score += count_mentions(other_issue.all_text, colliding_issue.ID)

  // Count dependency references
  deps = DB.get_dependents(colliding_issue.ID)  // who depends on me?
  score += len(deps)

  // Store score
  collision_scores[colliding_issue.ID] = score

// Sort ascending: lowest score = fewest references = renumber first
sorted_collisions = sort_by(collision_scores)
```

### Phase 3: ID Allocation

```
id_mapping = {}  // old_id -> new_id
next_num = DB.get_next_id_number()

for collision in sorted_collisions:
  // Find next available ID
  while true:
    candidate = f"{prefix}-{next_num}"
    if not DB.exists(candidate) and candidate not in id_mapping.values():
      id_mapping[collision.ID] = candidate
      next_num++
      break
    next_num++
```

### Phase 4: Reference Updates

This is the trickiest part - must update:
1. Issue IDs themselves
2. Text field references (description, design, notes, acceptance_criteria)
3. Dependency records (when they reference old IDs)

```
updated_issues = []
reference_update_count = 0

for issue in JSONL:
  new_issue = clone(issue)

  // 1. Update own ID if it collided
  if issue.ID in id_mapping:
    new_issue.ID = id_mapping[issue.ID]

  // 2. Update text field references
  for old_id, new_id in id_mapping:
    for field in [title, description, design, notes, acceptance_criteria]:
      if field:
        pattern = r'\b' + re.escape(old_id) + r'\b'
        new_text, count = re.subn(pattern, new_id, field)
        field = new_text
        reference_update_count += count

  updated_issues.append(new_issue)
```

### Phase 5: Dependency Handling

**Approach A: Export dependencies in JSONL** (PREFERRED)
- Extend export to include `"dependencies": [{...}]` per issue
- Import dependencies along with issues
- Update dependency records during phase 4

Why preferred:
- Self-contained JSONL (better for git workflow)
- Easier to reason about
- Can detect cross-file dependencies

### Phase 6: Atomic Import

```
transaction:
  for issue in updated_issues:
    if issue.ID was remapped:
      DB.create_issue(issue)
    else:
      DB.upsert_issue(issue)

  // Update dependency table
  for issue in updated_issues:
    for dep in issue.dependencies:
      // dep IDs already updated in phase 4
      DB.create_or_update_dependency(dep)

  commit
```

### Phase 7: User Reporting

```
report = {
  collisions_detected: N,
  remappings: [
    "bd-10 → bd-15 (Score: 3 references)",
    "bd-11 → bd-16 (Score: 15 references)",
  ],
  text_updates: M,
  dependency_updates: K,
}
```

## Edge Cases

1. **Chain dependencies**: bd-10 depends on bd-11, both collide
   - Sorted renumbering handles this naturally
   - Lower-referenced one renumbered first

2. **Circular dependencies**: Shouldn't happen (DB has cycle detection)

3. **Partial ID matches**: "bd-1" shouldn't match "bd-10"
   - Use word boundary regex: `\bbd-10\b`

4. **Case sensitivity**: IDs are case-sensitive (bd-10 ≠ BD-10)

5. **IDs in code blocks**: Will be replaced
   - Could add `--preserve-code-blocks` flag later

6. **Triple merges**: Branch A, B, C all create bd-10
   - Algorithm handles N collisions

7. **Dependencies pointing to DB-only issues**:
   - JSONL issue depends on bd-999 (only in DB)
   - No collision, works fine

## Performance Considerations

- O(N*M) for reference counting (N issues × M collisions)
- For 1000 issues, 10 collisions: 10,000 text scans
- Acceptable for typical use (hundreds of issues)
- Could optimize with index/trie if needed

## API Design

```bash
# Default: fail on collision (safe)
bd import -i issues.jsonl
# Error: Collision detected: bd-10 already exists

# With auto-resolution
bd import -i issues.jsonl --resolve-collisions
# Resolved 3 collisions:
#   bd-10 → bd-15 (3 refs)
#   bd-11 → bd-16 (1 ref)
#   bd-12 → bd-17 (7 refs)
# Imported 45 issues, updated 23 references

# Dry run (preview changes)
bd import -i issues.jsonl --resolve-collisions --dry-run
```

## Implementation Breakdown

### Child Issues to Create

1. **bd-10**: Extend export to include dependencies in JSONL
   - Modify export.go to include dependencies array
   - Format: `{"id":"bd-10","dependencies":[{"depends_on_id":"bd-5","type":"blocks"}]}`
   - Priority: 1, Type: task

2. **bd-11**: Implement collision detection in import
   - Create collision.go with detectCollisions() function
   - Compare incoming JSONL against DB state
   - Distinguish: exact match (skip), collision (flag), new (create)
   - Priority: 1, Type: task

3. **bd-12**: Implement reference scoring algorithm
   - Count text mentions + dependency references
   - Sort collisions by score ascending (fewest refs first)
   - Minimize total updates during renumbering
   - Priority: 1, Type: task

4. **bd-13**: Implement ID remapping with reference updates
   - Allocate new IDs for colliding issues
   - Update text field references with word-boundary regex
   - Update dependency records
   - Build id_mapping for reporting
   - Priority: 1, Type: task

5. **bd-14**: Add --resolve-collisions flag and user reporting
   - Add import flags: --resolve-collisions, --dry-run
   - Display clear report with remappings and counts
   - Default: fail on collision (safe)
   - Priority: 1, Type: task

6. **bd-15**: Write comprehensive collision resolution tests
   - Test cases: simple/multiple collisions, dependencies, text refs
   - Edge cases: partial matches, case sensitivity, triple merges
   - Add to import_test.go and collision_test.go
   - Priority: 1, Type: task

7. **bd-16**: Update documentation for collision resolution
   - Update README.md with collision resolution section
   - Update CLAUDE.md with new workflow
   - Document flags and example scenarios
   - Priority: 1, Type: task

### Additional Issue: Add Design Field Support

**NEW ISSUE**: Add design field to bd update command
- Currently: `bd update` doesn't support --design flag (discovered during work)
- Need: Allow updating design, notes, acceptance_criteria fields
- This would make bd-9's design easier to attach to the issue itself
- Priority: 2, Type: feature

## Current State

- bd-9 is in_progress (claimed)
- bd-10 was successfully created (first child issue)
- bd-11+ creation failed with UNIQUE constraint (collision!)
  - This demonstrates the exact problem we're solving
  - Need to manually create remaining issues with different IDs
  - Or implement collision resolution first! (chicken/egg)

## Data Structures Involved

- **Issues table**: id, title, description, design, notes, acceptance_criteria, status, priority, issue_type, assignee, estimated_minutes, created_at, updated_at, closed_at
- **Dependencies table**: issue_id, depends_on_id, type, created_at, created_by
- **Text fields with ID references**: description, design, notes, acceptance_criteria (title too?)

## Files to Modify

1. `cmd/bd/export.go` - Add dependency export
2. `cmd/bd/import.go` - Call collision resolution
3. `cmd/bd/collision.go` - NEW FILE - Core algorithm
4. `cmd/bd/collision_test.go` - NEW FILE - Tests
5. `internal/types/types.go` - May need collision report types
6. `README.md` - Documentation
7. `CLAUDE.md` - AI agent workflow docs

## Next Steps

1. ✅ Design complete
2. 🔄 Create child issues (bd-10 created, bd-11+ need different IDs)
3. ⏳ Implement Phase 1: Export enhancement
4. ⏳ Implement Phase 2-7: Core algorithm
5. ⏳ Tests
6. ⏳ Documentation
7. ⏳ Export issues to JSONL before committing

## Meta: Real Collision Encountered!

While creating child issues, we hit the exact problem:
- bd-10 was created successfully
- bd-11, bd-12, bd-13, bd-14, bd-15, bd-16 all failed with "UNIQUE constraint failed"
- This means the DB already has bd-11+ from a previous session/import
- Perfect demonstration of why we need collision resolution!

Resolution: Create remaining child issues manually with explicit IDs after checking what exists.