# Code Review: Auto-Import Collision Detection (bd-379)

## Executive Summary

The auto-import collision detection implementation is **functionally working but has several correctness and robustness issues** that should be addressed. Rating: **3.5/5** - works in the happy path but is vulnerable to edge cases.

## Critical Issues (P0-P1)
### 1. **Metadata Error Handling is Too Conservative** (P0)

**Current behavior:** If `GetMetadata()` fails, auto-import is skipped entirely.

**Problem:** If metadata is corrupted or missing, auto-import stops working until someone intervenes manually.

**Fix:** Treat a metadata read failure as a first import:

```go
lastHash, err := store.GetMetadata(ctx, "last_import_hash")
if err != nil {
	if os.Getenv("BD_DEBUG") != "" {
		fmt.Fprintf(os.Stderr, "Debug: metadata read failed (%v), assuming first import\n", err)
	}
	lastHash = "" // Treat as first import
}
```

### 2. **Hash Not Updated on Partial Success** (P0)

**Problem:** If the import succeeds but the update to `last_import_hash` fails, auto-import will retry the same import forever.

**Current behavior:** The hash update happens at the end (line ~404) but is not error-checked.

**Fix:** Track import success/failure and only update the hash on full success:

```go
// After all imports complete successfully
if err := store.SetMetadata(ctx, "last_import_hash", currentHash); err != nil {
	fmt.Fprintf(os.Stderr, "Warning: failed to update import hash: %v\n", err)
	fmt.Fprintf(os.Stderr, "Next auto-import may re-import these issues.\n")
}
```

### 3. **No Transaction for Multi-Issue Import** (P1)

**Problem:** If the import fails midway, the database is left in an inconsistent state.

**Current behavior:** Each issue is imported separately (lines 346-401).

**Fix:** Wrap the entire import in a transaction or use batch operations.
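
One way to get the all-or-nothing behavior is to stage every write and commit only after the whole batch validates. The store's real API is not shown in this review, so the sketch below models the pattern with an in-memory map; all names are illustrative:

```go
package main

import (
	"errors"
	"fmt"
)

// memStore is a stand-in for the real store; hypothetical type.
type memStore struct {
	issues map[string]string
}

// importAll stages every issue first and commits only if all of them
// validate, so a mid-import failure leaves the store untouched.
func (s *memStore) importAll(batch map[string]string) error {
	staged := make(map[string]string, len(batch))
	for id, body := range batch {
		if body == "" { // stand-in for a per-issue validation/insert error
			return errors.New("import failed for " + id + "; no changes applied")
		}
		staged[id] = body
	}
	// Commit phase: every insert succeeded, apply them all.
	for id, body := range staged {
		s.issues[id] = body
	}
	return nil
}

func main() {
	s := &memStore{issues: map[string]string{}}
	// A failing batch leaves the store empty rather than half-imported.
	err := s.importAll(map[string]string{"bd-1": "ok", "bd-2": ""})
	fmt.Println(err != nil, len(s.issues))
}
```

With a SQL backend the same shape falls out of `BEGIN`/`COMMIT`/`ROLLBACK`; the point is that partial progress is never visible.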
### 4. **N+1 Query Pattern** (P1)

**Problem:** Line 347: `store.GetIssue(ctx, issue.ID)` is called for every issue, i.e. O(n) queries.

**Impact:** With 1000+ issues this is slow and hammers the database.

**Fix:** Batch-fetch all existing issues upfront and check membership with an O(1) map lookup (note the original snippet also dropped the `SearchIssues` error):

```go
existingIDs := make(map[string]*types.Issue)
allExisting, err := store.SearchIssues(ctx, "", types.IssueFilter{})
if err != nil {
	return fmt.Errorf("failed to list existing issues: %w", err)
}
for _, issue := range allExisting {
	existingIDs[issue.ID] = issue
}
```

## Medium Issues (P2)

### 5. **Scanner Uses String Conversion** (P2)

**Line 233:** `strings.NewReader(string(jsonlData))`

**Problem:** Unnecessarily converts the byte slice to a string, copying the whole payload.

**Fix:** `bytes.NewReader(jsonlData)`
### 6. **Verbose Output on Every Auto-Import** (P2)

**Current:** Prints a remapping summary to stderr on every collision (lines 309-329).

**Problem:** For frequent auto-imports with collisions, this gets noisy.

**Fix:** Gate detailed output behind `BD_DEBUG`; show a one-line summary by default:

```go
if os.Getenv("BD_DEBUG") != "" {
	// Detailed per-issue remapping output
} else {
	fmt.Fprintf(os.Stderr, "Auto-import: %d parsed, %d remapped due to collisions\n", len(allIssues), numRemapped)
}
```

### 7. **No Collision Metrics/Telemetry** (P2)

**Problem:** There is no way to track how often collisions occur or whether they are increasing.

**Fix:** Add metadata counters:

- `collision_count_total`
- `last_collision_timestamp`
- `auto_import_runs_total`
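
A sketch of what maintaining those counters could look like. The real `SetMetadata`/`GetMetadata` calls are assumed, so the metadata store is modeled here as a plain string map:

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// metaStore stands in for the store's key/value metadata table
// (hypothetical; the real store persists these via SetMetadata).
type metaStore map[string]string

// recordCollisions bumps the cumulative counter, records the last run's
// count, and timestamps the most recent nonzero collision.
func (m metaStore) recordCollisions(n int) {
	total, _ := strconv.Atoi(m["collision_count_total"]) // "" parses as 0
	m["collision_count_total"] = strconv.Itoa(total + n)
	m["last_collision_count"] = strconv.Itoa(n)
	if n > 0 {
		m["last_collision_timestamp"] = time.Now().UTC().Format(time.RFC3339)
	}
}

func main() {
	m := metaStore{}
	m.recordCollisions(3)
	m.recordCollisions(2)
	fmt.Println(m["collision_count_total"], m["last_collision_count"]) // 5 2
}
```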
### 8. **"All Collisions" Case Not Optimized** (P2)

**Problem:** If every issue collides (e.g. pulling unchanged state), we still process everything.

**Fix:** If `len(filteredIssues) == 0` and `len(collisionResult.NewIssues) == 0`, the run is a no-op: just update the hash and return.
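
The short-circuit can be a single guard at the top of the import path. The counts mirror the variables cited above, but the helper itself is hypothetical:

```go
package main

import "fmt"

// isNoOp reports whether an import pass changed nothing: no filtered
// (non-colliding) issues to write and no genuinely new issues created
// by remapping.
func isNoOp(filteredCount, newCount int) bool {
	return filteredCount == 0 && newCount == 0
}

func main() {
	// Pulling unchanged state: everything collided with identical content.
	if isNoOp(0, 0) {
		fmt.Println("no-op: update last_import_hash and return")
	}
}
```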
## Low Priority Issues (P3)
|
|
|
|
### 9. **No Configurable Collision Mode** (P3)
|
|
Some users may prefer auto-import to **fail** on collisions rather than auto-resolve.
|
|
|
|
**Suggestion:** Add `BD_AUTO_IMPORT_MODE=remap|fail` environment variable.
|
|
|
|
### 10. **No Collision Threshold** (P3)
|
|
If 90% of issues collide, something is probably wrong (bad merge).
|
|
|
|
**Suggestion:** Add `BD_AUTO_IMPORT_COLLISION_THRESHOLD` - if exceeded, fail with clear error.
|
|
|
|
## Testing Gaps

Missing test coverage for:

1. Metadata read failure → should treat as first import
2. Hash update failure → should warn but not crash
3. All issues collide → should be a no-op
4. Scanner buffer overflow (>2MB line) → should error gracefully
5. Concurrent auto-imports (race condition testing)
6. Transaction rollback on mid-import failure
7. 1000+ issue performance test
## Answers to Review Questions

### Q1: Should auto-import be more aggressive (auto-resolve) or conservative (fail)?

**Recommendation:** Keep auto-resolve as the default, but add:

- A collision threshold that switches to fail mode when exceeded
- A config option for users who prefer fail-fast behavior
- Clear messaging about what was remapped
### Q2: Should we add a counter for collision occurrences?

**Yes.** Add metadata:

- `collision_count_total` (cumulative)
- `last_collision_count` (last run)
- `last_collision_timestamp`
### Q3: Should there be a config option to disable collision detection?

**No.** Collision detection is a safety feature. Instead, provide:

- `BD_AUTO_IMPORT_MODE=remap|fail` to control the behavior
- The existing `--no-auto-import` flag to disable auto-import entirely

### Q4: Is the warning too verbose for typical workflows?

**Yes.** The 10-line summary on every auto-import is noisy. Gate it behind `BD_DEBUG`.
## Recommended Fixes Priority

**P0 (Critical - Fix ASAP):**

- [ ] bd-TBD: Fix metadata error handling (treat as first import)
- [ ] bd-TBD: Ensure the hash update happens and is error-checked
- [ ] bd-TBD: Fix the N+1 query pattern with a batch fetch

**P1 (High - Fix Before 1.0):**

- [ ] bd-TBD: Wrap the import in a transaction for atomicity
- [ ] bd-TBD: Add test coverage for edge cases
- [ ] bd-TBD: Optimize the "all collisions" case

**P2 (Medium - Nice to Have):**

- [ ] bd-TBD: Reduce output verbosity (gate behind `BD_DEBUG`)
- [ ] bd-TBD: Use `bytes.NewReader` instead of string conversion
- [ ] bd-TBD: Add collision metrics/telemetry

**P3 (Low - Future Enhancement):**

- [ ] bd-TBD: Add `BD_AUTO_IMPORT_MODE` config
- [ ] bd-TBD: Add a collision threshold safety rail
## Conclusion

The implementation **works for the happy path** but has **robustness issues** around error handling, performance, and edge cases. The auto-resolve approach is good, but it needs better error handling and performance optimization.

**Estimated effort to fix P0-P1 issues:** 2-3 days

**Risk level if not fixed:** Medium-High (data loss is possible in edge cases; poor performance at scale)

---

**Review completed:** 2025-10-16

**Reviewer:** Oracle (via Amp)

**Issue:** bd-379