Code Review: Auto-Import Collision Detection (bd-379)

Executive Summary

The auto-import collision detection implementation works, but it has several correctness and robustness issues that should be addressed. Rating: 3.5/5 - correct in the happy path but vulnerable to edge cases.

Critical Issues (P0-P1)

1. Metadata Error Handling is Too Conservative (P0)

Current behavior: If GetMetadata() fails, auto-import is skipped entirely.

Problem: If the metadata is corrupted or missing, auto-import is permanently disabled until someone repairs it by hand.

Fix:

lastHash, err := store.GetMetadata(ctx, "last_import_hash")
if err != nil {
    if os.Getenv("BD_DEBUG") != "" {
        fmt.Fprintf(os.Stderr, "Debug: metadata read failed (%v), assuming first import\n", err)
    }
    lastHash = "" // Treat as first import
}

2. Hash Not Updated on Partial Success (P0)

Problem: If the import succeeds but we fail to update last_import_hash, auto-import will retry the same import forever.

Current behavior: The hash update happens at the end (line ~404) but is not error-checked.

Fix: Track import success/failure state and only update hash on full success:

// After all imports complete successfully
if err := store.SetMetadata(ctx, "last_import_hash", currentHash); err != nil {
    fmt.Fprintf(os.Stderr, "Warning: failed to update import hash: %v\n", err)
    fmt.Fprintf(os.Stderr, "Next auto-import may re-import these issues.\n")
}

3. No Transaction for Multi-Issue Import (P1)

Problem: If the import fails midway, the database is left in an inconsistent state - some issues imported, others not.

Current behavior: Each issue is imported separately (lines 346-401).

Fix: Wrap the entire import in a transaction, or use batch operations, so a mid-import failure rolls back cleanly (sketch below).
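
A minimal sketch of the transactional approach, assuming the store exposes its underlying *sql.DB (call it db) and that importIssueTx is a hypothetical helper that writes one issue through the transaction:

tx, err := db.BeginTx(ctx, nil)
if err != nil {
    return fmt.Errorf("begin import transaction: %w", err)
}
defer tx.Rollback() // no-op after a successful Commit

for _, issue := range filteredIssues {
    // importIssueTx is hypothetical: one issue's write, bound to tx.
    if err := importIssueTx(ctx, tx, issue); err != nil {
        return fmt.Errorf("import %s: %w", issue.ID, err) // deferred Rollback undoes prior writes
    }
}
return tx.Commit()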

4. N+1 Query Pattern (P1)

Problem: At line 347, store.GetIssue(ctx, issue.ID) is called once per issue, which means O(n) individual queries.

Impact: With 1000+ issues, this is slow and hammers the database.

Fix: Batch fetch all existing IDs upfront:

allExisting, err := store.SearchIssues(ctx, "", types.IssueFilter{})
if err != nil {
    return fmt.Errorf("batch fetch for import failed: %w", err)
}
existingIDs := make(map[string]*types.Issue, len(allExisting))
for _, issue := range allExisting {
    existingIDs[issue.ID] = issue
}
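
The per-issue lookup then replaces the GetIssue call; handleCollision below is a hypothetical name for the existing collision path:

// Inside the per-issue loop, replacing store.GetIssue(ctx, issue.ID):
if existing, ok := existingIDs[issue.ID]; ok {
    // Collision: an issue with this ID already exists.
    handleCollision(existing, issue) // hypothetical handler name
}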

Medium Issues (P2)

5. Scanner Uses String Conversion (P2)

Line 233: strings.NewReader(string(jsonlData))

Problem: The conversion copies the entire byte slice into a new string, doubling peak memory for the payload.

Fix: bytes.NewReader(jsonlData)
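
Assuming the scanner at line 233 is a bufio.Scanner, the swap is one line plus an import change (bytes in place of strings):

// Before (line 233): string(jsonlData) copies the whole payload.
//   scanner := bufio.NewScanner(strings.NewReader(string(jsonlData)))
// After: read the byte slice in place.
scanner := bufio.NewScanner(bytes.NewReader(jsonlData))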

6. Verbose Output on Every Auto-Import (P2)

Current: Prints a remapping summary to stderr on every collision (lines 309-329).

Problem: For frequent auto-imports with collisions, this gets noisy.

Fix: Gate the detailed output behind BD_DEBUG and show a one-line summary by default:

if os.Getenv("BD_DEBUG") != "" {
    // Detailed output
} else {
    fmt.Fprintf(os.Stderr, "Auto-import: %d parsed, %d remapped due to collisions\n", len(allIssues), numRemapped)
}

7. No Collision Metrics/Telemetry (P2)

Problem: No way to track how often collisions occur or if they're increasing.

Fix: Add metadata counters (update sketch after the list):

  • collision_count_total
  • last_collision_timestamp
  • auto_import_runs_total
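
A hedged sketch of the counter update, reusing the GetMetadata/SetMetadata calls already in the codebase; the Store interface name and helper are illustrative only:

// bumpCollisionCount adds n to the cumulative collision counter.
func bumpCollisionCount(ctx context.Context, store Store, n int) {
    raw, err := store.GetMetadata(ctx, "collision_count_total")
    if err != nil || raw == "" {
        raw = "0" // missing or unreadable: start from zero
    }
    total, _ := strconv.Atoi(raw)
    if err := store.SetMetadata(ctx, "collision_count_total", strconv.Itoa(total+n)); err != nil {
        fmt.Fprintf(os.Stderr, "Warning: failed to update collision counter: %v\n", err)
    }
    // last_collision_timestamp is only meaningful when collisions occurred.
    if n > 0 {
        _ = store.SetMetadata(ctx, "last_collision_timestamp", time.Now().UTC().Format(time.RFC3339))
    }
}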

8. "All Collisions" Case Not Optimized (P2)

Problem: If every issue collides (e.g., pulling unchanged state), we still process everything.

Fix: If len(filteredIssues) == 0 and len(collisionResult.NewIssues) == 0, the run is a no-op: update the hash and return early (sketch below).
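
A minimal sketch of the early exit, using names already present in the import path:

// Nothing new to write: record the hash so we don't re-scan, then bail.
if len(filteredIssues) == 0 && len(collisionResult.NewIssues) == 0 {
    if err := store.SetMetadata(ctx, "last_import_hash", currentHash); err != nil {
        fmt.Fprintf(os.Stderr, "Warning: failed to update import hash: %v\n", err)
    }
    return nil
}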

Low Priority Issues (P3)

9. No Configurable Collision Mode (P3)

Some users may prefer auto-import to fail on collisions rather than auto-resolve.

Suggestion: Add a BD_AUTO_IMPORT_MODE=remap|fail environment variable (sketch below).
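
A sketch of how the proposed mode could gate behavior; numRemapped is the count already computed for the summary line, and the variable itself does not exist yet:

// BD_AUTO_IMPORT_MODE is proposed, not yet implemented.
switch mode := os.Getenv("BD_AUTO_IMPORT_MODE"); mode {
case "", "remap":
    // Default: auto-resolve collisions by remapping IDs.
case "fail":
    if numRemapped > 0 {
        return fmt.Errorf("auto-import aborted: %d ID collisions with BD_AUTO_IMPORT_MODE=fail", numRemapped)
    }
default:
    return fmt.Errorf("invalid BD_AUTO_IMPORT_MODE %q (want remap or fail)", mode)
}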

10. No Collision Threshold (P3)

If 90% of issues collide, something is probably wrong (e.g., a bad merge).

Suggestion: Add BD_AUTO_IMPORT_COLLISION_THRESHOLD - if exceeded, fail with a clear error (sketch below).
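
A sketch of the threshold check, treating the value as a percentage; the variable is proposed, not implemented:

// Fail fast when an unusually high share of issues collide.
if t := os.Getenv("BD_AUTO_IMPORT_COLLISION_THRESHOLD"); t != "" && len(allIssues) > 0 {
    threshold, err := strconv.Atoi(t)
    pct := numRemapped * 100 / len(allIssues)
    if err == nil && pct >= threshold {
        return fmt.Errorf("auto-import aborted: %d%% of issues collided (threshold %d%%); possible bad merge", pct, threshold)
    }
}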

Testing Gaps

Missing test coverage for (item 1 sketched after the list):

  1. Metadata read failure → should treat as first import
  2. Hash update failure → should warn but not crash
  3. All issues collide → should be no-op
  4. Scanner buffer overflow (>2MB line) → should error gracefully
  5. Concurrent auto-imports (race condition testing)
  6. Transaction rollback on mid-import failure
  7. 1000+ issue performance test
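
A hedged sketch of test 1; fakeStore, runAutoImport, and sampleJSONL are all hypothetical names for the purposes of illustration:

func TestAutoImportTreatsMetadataFailureAsFirstImport(t *testing.T) {
    // fakeStore is a hypothetical test double for the storage interface.
    store := &fakeStore{getMetadataErr: errors.New("corrupt metadata")}
    // Per issue #1: fall back to a first import rather than aborting.
    if err := runAutoImport(context.Background(), store, sampleJSONL); err != nil {
        t.Fatalf("expected first-import fallback, got: %v", err)
    }
}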

Answers to Review Questions

Q1: Should auto-import be more aggressive (auto-resolve) or conservative (fail)?

Recommendation: Keep auto-resolve as default but add:

  • Collision threshold that switches to fail mode if exceeded
  • Config option for users who prefer fail-fast behavior
  • Clear messaging about what was remapped

Q2: Should we add a counter for collision occurrences?

Yes. Add metadata:

  • collision_count_total (cumulative)
  • last_collision_count (last run)
  • last_collision_timestamp

Q3: Should there be a config option to disable collision detection?

No. Collision detection is a safety feature. Instead provide:

  • BD_AUTO_IMPORT_MODE=remap|fail to control behavior
  • --no-auto-import flag already exists to disable entirely

Q4: Is the warning too verbose for typical workflows?

Yes. The 10-line summary on every auto-import is noisy. Gate behind BD_DEBUG.

Recommended Follow-Up Issues

P0 (Critical - Fix ASAP):

  • bd-TBD: Fix metadata error handling (treat as first import)
  • bd-TBD: Ensure hash update happens and is error-checked
  • bd-TBD: Fix N+1 query pattern with batch fetch

P1 (High - Fix Before 1.0):

  • bd-TBD: Wrap import in transaction for atomicity
  • bd-TBD: Add test coverage for edge cases
  • bd-TBD: Optimize "all collisions" case

P2 (Medium - Nice to Have):

  • bd-TBD: Reduce output verbosity (gate behind BD_DEBUG)
  • bd-TBD: Use bytes.NewReader instead of string conversion
  • bd-TBD: Add collision metrics/telemetry

P3 (Low - Future Enhancement):

  • bd-TBD: Add BD_AUTO_IMPORT_MODE config
  • bd-TBD: Add collision threshold safety rail

Conclusion

The implementation works for the happy path but has robustness issues around error handling, atomicity, and performance at scale. The auto-resolve approach is sound; it just needs the hardening described above.

Estimated effort to fix P0-P1 issues: 2-3 days
Risk level if not fixed: Medium-High (data loss possible on edge cases, poor performance at scale)


Review completed: 2025-10-16
Reviewer: Oracle (via Amp)
Issue: bd-379