# Code Review: Auto-Import Collision Detection (bd-379)

## Executive Summary

The auto-import collision detection implementation is **functionally working but has several correctness and robustness issues** that should be addressed. Rating: **3.5/5** - works in the happy path but is vulnerable to edge cases.

## Critical Issues (P0-P1)
### 1. **Metadata Error Handling is Too Conservative** (P0)

**Current behavior:** If `GetMetadata()` fails, auto-import is skipped entirely.

**Problem:** If metadata is corrupted or missing, auto-import stops working until someone intervenes manually.

**Fix:** Treat a metadata read failure as a first import:

```go
lastHash, err := store.GetMetadata(ctx, "last_import_hash")
if err != nil {
	if os.Getenv("BD_DEBUG") != "" {
		fmt.Fprintf(os.Stderr, "Debug: metadata read failed (%v), assuming first import\n", err)
	}
	lastHash = "" // Treat as first import
}
```

### 2. **Hash Not Updated on Partial Success** (P0)

**Problem:** If the import succeeds but the update to `last_import_hash` fails, auto-import will retry the same import forever.

**Current behavior:** The hash update happens at the end (line ~404) but is not error-checked.

**Fix:** Track import success/failure and only update the hash on full success:

```go
// After all imports complete successfully
if err := store.SetMetadata(ctx, "last_import_hash", currentHash); err != nil {
	fmt.Fprintf(os.Stderr, "Warning: failed to update import hash: %v\n", err)
	fmt.Fprintf(os.Stderr, "Next auto-import may re-import these issues.\n")
}
```

### 3. **No Transaction for Multi-Issue Import** (P1)

**Problem:** If the import fails midway, the database is left in an inconsistent state.

**Current behavior:** Each issue is imported separately (lines 346-401).

**Fix:** Wrap the entire import in a transaction or use batch operations.
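
One way to get the all-or-nothing behavior is to stage every write and commit only after the whole batch validates. The store's real API is not shown in this review, so the sketch below models the pattern with an in-memory map; all names are illustrative:

```go
package main

import (
	"errors"
	"fmt"
)

// memStore is a stand-in for the real store; hypothetical type.
type memStore struct {
	issues map[string]string
}

// importAll stages every issue first and commits only if all of them
// validate, so a mid-import failure leaves the store untouched.
func (s *memStore) importAll(batch map[string]string) error {
	staged := make(map[string]string, len(batch))
	for id, body := range batch {
		if body == "" { // stand-in for a per-issue validation/insert error
			return errors.New("import failed for " + id + "; no changes applied")
		}
		staged[id] = body
	}
	// Commit phase: every insert succeeded, apply them all.
	for id, body := range staged {
		s.issues[id] = body
	}
	return nil
}

func main() {
	s := &memStore{issues: map[string]string{}}
	// A failing batch leaves the store empty rather than half-imported.
	err := s.importAll(map[string]string{"bd-1": "ok", "bd-2": ""})
	fmt.Println(err != nil, len(s.issues))
}
```

With a SQL backend the same shape falls out of `BEGIN`/`COMMIT`/`ROLLBACK`; the point is that partial progress is never visible.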
### 4. **N+1 Query Pattern** (P1)

**Problem:** Line 347: `store.GetIssue(ctx, issue.ID)` is called for every issue, i.e. O(n) queries.

**Impact:** With 1000+ issues this is slow and hammers the database.

**Fix:** Batch-fetch all existing issues upfront and check membership with an O(1) map lookup (note the original snippet also dropped the `SearchIssues` error):

```go
existingIDs := make(map[string]*types.Issue)
allExisting, err := store.SearchIssues(ctx, "", types.IssueFilter{})
if err != nil {
	return fmt.Errorf("failed to list existing issues: %w", err)
}
for _, issue := range allExisting {
	existingIDs[issue.ID] = issue
}
```

## Medium Issues (P2)

### 5. **Scanner Uses String Conversion** (P2)

**Line 233:** `strings.NewReader(string(jsonlData))`

**Problem:** Unnecessarily converts the byte slice to a string, copying the whole payload.

**Fix:** `bytes.NewReader(jsonlData)`
### 6. **Verbose Output on Every Auto-Import** (P2)

**Current:** Prints a remapping summary to stderr on every collision (lines 309-329).

**Problem:** For frequent auto-imports with collisions, this gets noisy.

**Fix:** Gate detailed output behind `BD_DEBUG`; show a one-line summary by default:

```go
if os.Getenv("BD_DEBUG") != "" {
	// Detailed per-issue remapping output
} else {
	fmt.Fprintf(os.Stderr, "Auto-import: %d parsed, %d remapped due to collisions\n", len(allIssues), numRemapped)
}
```

### 7. **No Collision Metrics/Telemetry** (P2)

**Problem:** There is no way to track how often collisions occur or whether they are increasing.

**Fix:** Add metadata counters:

- `collision_count_total`
- `last_collision_timestamp`
- `auto_import_runs_total`
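
A sketch of what maintaining those counters could look like. The real `SetMetadata`/`GetMetadata` calls are assumed, so the metadata store is modeled here as a plain string map:

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// metaStore stands in for the store's key/value metadata table
// (hypothetical; the real store persists these via SetMetadata).
type metaStore map[string]string

// recordCollisions bumps the cumulative counter, records the last run's
// count, and timestamps the most recent nonzero collision.
func (m metaStore) recordCollisions(n int) {
	total, _ := strconv.Atoi(m["collision_count_total"]) // "" parses as 0
	m["collision_count_total"] = strconv.Itoa(total + n)
	m["last_collision_count"] = strconv.Itoa(n)
	if n > 0 {
		m["last_collision_timestamp"] = time.Now().UTC().Format(time.RFC3339)
	}
}

func main() {
	m := metaStore{}
	m.recordCollisions(3)
	m.recordCollisions(2)
	fmt.Println(m["collision_count_total"], m["last_collision_count"]) // 5 2
}
```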
### 8. **"All Collisions" Case Not Optimized** (P2)

**Problem:** If every issue collides (e.g. pulling unchanged state), we still process everything.

**Fix:** If `len(filteredIssues) == 0` and `len(collisionResult.NewIssues) == 0`, the run is a no-op: just update the hash and return.
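
The short-circuit can be a single guard at the top of the import path. The counts mirror the variables cited above, but the helper itself is hypothetical:

```go
package main

import "fmt"

// isNoOp reports whether an import pass changed nothing: no filtered
// (non-colliding) issues to write and no genuinely new issues created
// by remapping.
func isNoOp(filteredCount, newCount int) bool {
	return filteredCount == 0 && newCount == 0
}

func main() {
	// Pulling unchanged state: everything collided with identical content.
	if isNoOp(0, 0) {
		fmt.Println("no-op: update last_import_hash and return")
	}
}
```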
## Low Priority Issues (P3)
|
|
|
|
### 9. **No Configurable Collision Mode** (P3)
|
|
Some users may prefer auto-import to **fail** on collisions rather than auto-resolve.
|
|
|
|
**Suggestion:** Add `BD_AUTO_IMPORT_MODE=remap|fail` environment variable.
|
|
|
|
### 10. **No Collision Threshold** (P3)
|
|
If 90% of issues collide, something is probably wrong (bad merge).
|
|
|
|
**Suggestion:** Add `BD_AUTO_IMPORT_COLLISION_THRESHOLD` - if exceeded, fail with clear error.
|
|
|
|
## Testing Gaps

Missing test coverage for:

1. Metadata read failure → should treat as first import
2. Hash update failure → should warn but not crash
3. All issues collide → should be a no-op
4. Scanner buffer overflow (>2MB line) → should error gracefully
5. Concurrent auto-imports (race condition testing)
6. Transaction rollback on mid-import failure
7. 1000+ issue performance test
## Answers to Review Questions

### Q1: Should auto-import be more aggressive (auto-resolve) or conservative (fail)?

**Recommendation:** Keep auto-resolve as the default, but add:

- A collision threshold that switches to fail mode when exceeded
- A config option for users who prefer fail-fast behavior
- Clear messaging about what was remapped
### Q2: Should we add a counter for collision occurrences?

**Yes.** Add metadata:

- `collision_count_total` (cumulative)
- `last_collision_count` (last run)
- `last_collision_timestamp`
### Q3: Should there be a config option to disable collision detection?

**No.** Collision detection is a safety feature. Instead, provide:

- `BD_AUTO_IMPORT_MODE=remap|fail` to control the behavior
- The existing `--no-auto-import` flag to disable auto-import entirely

### Q4: Is the warning too verbose for typical workflows?

**Yes.** The 10-line summary on every auto-import is noisy. Gate it behind `BD_DEBUG`.
## Recommended Fixes Priority

**P0 (Critical - Fix ASAP):**

- [ ] bd-TBD: Fix metadata error handling (treat as first import)
- [ ] bd-TBD: Ensure the hash update happens and is error-checked
- [ ] bd-TBD: Fix the N+1 query pattern with a batch fetch

**P1 (High - Fix Before 1.0):**

- [ ] bd-TBD: Wrap the import in a transaction for atomicity
- [ ] bd-TBD: Add test coverage for edge cases
- [ ] bd-TBD: Optimize the "all collisions" case

**P2 (Medium - Nice to Have):**

- [ ] bd-TBD: Reduce output verbosity (gate behind `BD_DEBUG`)
- [ ] bd-TBD: Use `bytes.NewReader` instead of string conversion
- [ ] bd-TBD: Add collision metrics/telemetry

**P3 (Low - Future Enhancement):**

- [ ] bd-TBD: Add `BD_AUTO_IMPORT_MODE` config
- [ ] bd-TBD: Add a collision threshold safety rail
## Conclusion

The implementation **works for the happy path** but has **robustness issues** around error handling, performance, and edge cases. The auto-resolve approach is good, but it needs better error handling and performance optimization.

**Estimated effort to fix P0-P1 issues:** 2-3 days

**Risk level if not fixed:** Medium-High (data loss is possible in edge cases; poor performance at scale)

---

**Review completed:** 2025-10-16

**Reviewer:** Oracle (via Amp)

**Issue:** bd-379