Fix race condition in auto-flush mechanism (issue bd-52)

Critical fixes addressing code review findings:

1. Remove global state access from flushToJSONLWithState
   - FlushManager now has true single ownership of flush state
   - No more race conditions from concurrent global state access
   - flushToJSONLWithState trusts only the flushState parameter
   - Legacy wrapper handles success detection via failure count

2. Fix shutdown timeout data loss risk
   - Increased timeout from 5s to 30s to prevent data loss
   - Added detailed comments explaining the timeout rationale
   - Better error message indicates potential data loss scenario

Implementation details:
- New FlushManager uses event-driven single-owner pattern
- Channels eliminate shared mutable state (markDirtyCh, flushNowCh, etc.)
- Comprehensive race detector tests verify concurrency safety
- Backward compatible with existing tests via legacy code path
- ARCHITECTURE.md documents design principles and guarantees

Test results:
- All race detector tests pass (TestFlushManager*)
- Legacy API compatibility verified (TestMarkDirtyAndScheduleFlush*)
- No race conditions detected under concurrent load

Future improvements tracked as beads:
- bd-gdn: Add functional tests for flush correctness verification
- bd-5xt: Log errors from timer-triggered flushes
- bd-i00: Convert magic numbers to named constants

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Steve Yegge
2025-11-20 21:23:52 -05:00
parent 4a566edaa6
commit a9b2f9f553
6 changed files with 842 additions and 61 deletions


@@ -62,14 +62,17 @@ var (
  // Auto-flush state
  autoFlushEnabled = true // Can be disabled with --no-auto-flush
- isDirty = false // Tracks if DB has changes needing export
- needsFullExport = false // Set to true when IDs change (e.g., rename-prefix)
+ isDirty = false // Tracks if DB has changes needing export (used by legacy code)
+ needsFullExport = false // Set to true when IDs change (used by legacy code)
  flushMutex sync.Mutex
- flushTimer *time.Timer
- storeMutex sync.Mutex // Protects store access from background goroutine
- storeActive = false // Tracks if store is available
- flushFailureCount = 0 // Consecutive flush failures
- lastFlushError error // Last flush error for debugging
+ flushTimer *time.Timer // DEPRECATED: Use flushManager instead
+ storeMutex sync.Mutex // Protects store access from background goroutine
+ storeActive = false // Tracks if store is available
+ flushFailureCount = 0 // Consecutive flush failures
+ lastFlushError error // Last flush error for debugging
+ // Auto-flush manager (replaces timer-based approach to fix bd-52)
+ flushManager *FlushManager
  // Auto-import state
  autoImportEnabled = true // Can be disabled with --no-auto-import
@@ -445,6 +448,12 @@ var rootCmd = &cobra.Command{
  storeActive = true
  storeMutex.Unlock()
+ // Initialize flush manager (fixes bd-52: race condition in auto-flush)
+ // For in-process test scenarios where commands run multiple times,
+ // we create a new manager each time. Shutdown() is idempotent so
+ // PostRun can safely shutdown whichever manager is active.
+ flushManager = NewFlushManager(autoFlushEnabled, getDebounceDuration())
  // Warn if multiple databases detected in directory hierarchy
  warnMultipleDatabases(dbPath)
@@ -502,22 +511,11 @@ var rootCmd = &cobra.Command{
  }
  // Otherwise, handle direct mode cleanup
- // Flush any pending changes before closing
- flushMutex.Lock()
- needsFlush := isDirty && autoFlushEnabled
- if needsFlush {
- // Cancel timer and flush immediately
- if flushTimer != nil {
- flushTimer.Stop()
- flushTimer = nil
+ // Shutdown flush manager (performs final flush if needed)
+ if flushManager != nil {
+ if err := flushManager.Shutdown(); err != nil {
+ fmt.Fprintf(os.Stderr, "Warning: flush manager shutdown error: %v\n", err)
  }
- // Don't clear isDirty or needsFullExport here - let flushToJSONL do it
  }
- flushMutex.Unlock()
- if needsFlush {
- // Call the shared flush function (handles both incremental and full export)
- flushToJSONL()
- }
  // Signal that store is closing (prevents background flush from accessing closed store)