bd sync: 2025-11-25 11:35:26

2025-11-25 11:35:26 -08:00
parent d1d7b0e34a
commit c3e4172be7
3 changed files with 586 additions and 308 deletions
@@ -1,357 +1,275 @@
 # Architecture

-This document describes the internal architecture of the `bd` issue tracker, with particular focus on concurrency guarantees and data consistency.
+This document describes bd's overall architecture: the data model, the sync mechanism, and how the components fit together. For internal implementation details (FlushManager, Blocked Cache), see [INTERNALS.md](INTERNALS.md).

-## Auto-Flush Architecture
+## The Three-Layer Data Model

-### Problem Statement (Issue bd-52)
-
-The original auto-flush implementation suffered from a critical race condition when multiple concurrent operations accessed shared state:
-
- **Concurrent access points:**
-  - Auto-flush timer goroutine (5s debounce)
-  - Daemon sync goroutine
-  - Concurrent CLI commands
-  - Git hook execution
-  - PersistentPostRun cleanup
-
- **Shared mutable state:**
-  - `isDirty` flag
-  - `needsFullExport` flag
-  - `flushTimer` instance
-  - `storeActive` flag
-
- **Impact:**
-  - Potential data loss under concurrent load
-  - Corruption when multiple agents/commands run simultaneously
-  - Race conditions during rapid commits
-  - Flush operations could access closed storage
-
-### Solution: Event-Driven FlushManager
-
-The race condition was eliminated by replacing timer-based shared state with an event-driven architecture using a single-owner pattern.
-
-#### Architecture
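+bd's core design enables a distributed, git-backed issue tracker that feels like a centralized database. The "magic" comes from three synchronized data layers (the local SQLite database, the git-tracked JSONL file, and the git remote), with the CLI sitting on top: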

 ```
-┌─────────────────────────────────────────────────────────┐
-│                     Command/Agent                        │
-│                                                          │
-│  markDirtyAndScheduleFlush() ─┐                         │
-│  markDirtyAndScheduleFullExport() ─┐                    │
-└────────────────────────────────────┼───┼────────────────┘
-                                     │   │
-                                     v   v
-                    ┌────────────────────────────────────┐
-                    │        FlushManager                │
-                    │  (Single-Owner Pattern)            │
-                    │                                    │
-                    │  Channels (buffered):              │
-                    │    - markDirtyCh                   │
-                    │    - timerFiredCh                  │
-                    │    - flushNowCh                    │
-                    │    - shutdownCh                    │
-                    │                                    │
-                    │  State (owned by run() goroutine): │
-                    │    - isDirty                       │
-                    │    - needsFullExport               │
-                    │    - debounceTimer                 │
-                    └────────────────────────────────────┘
-                                     │
-                                     v
-                    ┌────────────────────────────────────┐
-                    │      flushToJSONLWithState()       │
-                    │                                    │
-                    │  - Validates store is active       │
-                    │  - Checks JSONL integrity          │
-                    │  - Performs incremental/full export│
-                    │  - Updates export hashes           │
-                    └────────────────────────────────────┘
+┌─────────────────────────────────────────────────────────────────┐
+│                        CLI Layer                                 │
+│                                                                  │
+│  bd create, list, update, close, ready, show, dep, sync, ...    │
+│  - Cobra commands in cmd/bd/                                     │
+│  - All commands support --json for programmatic use              │
+│  - Tries daemon RPC first, falls back to direct DB access        │
+└──────────────────────────────┬──────────────────────────────────┘
+                               │
+                               v
+┌─────────────────────────────────────────────────────────────────┐
+│                     SQLite Database                              │
+│                     (.beads/beads.db)                            │
+│                                                                  │
+│  - Local working copy (gitignored)                               │
+│  - Fast queries, indexes, foreign keys                           │
+│  - Issues, dependencies, labels, comments, events                │
+│  - Each machine has its own copy                                 │
+└──────────────────────────────┬──────────────────────────────────┘
+                               │
+                         auto-sync
+                        (5s debounce)
+                               │
+                               v
+┌─────────────────────────────────────────────────────────────────┐
+│                       JSONL File                                 │
+│                   (.beads/beads.jsonl)                           │
+│                                                                  │
+│  - Git-tracked source of truth                                   │
+│  - One JSON line per entity (issue, dep, label, comment)         │
+│  - Merge-friendly: additions rarely conflict                     │
+│  - Shared across machines via git push/pull                      │
+└──────────────────────────────┬──────────────────────────────────┘
+                               │
+                          git push/pull
+                               │
+                               v
+┌─────────────────────────────────────────────────────────────────┐
+│                     Remote Repository                            │
+│                    (GitHub, GitLab, etc.)                        │
+│                                                                  │
+│  - Stores JSONL as part of normal repo history                   │
+│  - All collaborators share the same issue database               │
+│  - Protected branch support via separate sync branch             │
+└─────────────────────────────────────────────────────────────────┘
 ```

-#### Key Design Principles
+### Why This Design?

-**1. Single Owner Pattern**
+**SQLite for speed:** Local queries complete in milliseconds. Complex dependency graphs, full-text search, and joins are fast.

-All flush state (`isDirty`, `needsFullExport`, `debounceTimer`) is owned by a single background goroutine (`FlushManager.run()`). This eliminates the need for mutexes to protect this state.
+**JSONL for git:** One entity per line means git diffs are readable and merges usually succeed automatically. No binary database files in version control.

-**2. Channel-Based Communication**
+**Git for distribution:** No special sync server needed. Issues travel with your code. Offline work just works.

-External code communicates with FlushManager via buffered channels:
- `markDirtyCh`: Request to mark DB dirty (incremental or full export)
- `timerFiredCh`: Debounce timer expired notification
- `flushNowCh`: Synchronous flush request (returns error)
- `shutdownCh`: Graceful shutdown with final flush
+## Write Path

-**3. No Shared Mutable State**
-
-The only shared state is accessed via atomic operations (channel sends/receives). The `storeActive` flag and `store` pointer still use a mutex, but only to coordinate with store lifecycle, not flush logic.
-
-**4. Debouncing Without Locks**
-
-The timer callback sends to `timerFiredCh` instead of directly manipulating state. The run() goroutine processes timer events in its select loop, eliminating timer-related races.
-
-#### Concurrency Guarantees
-
-**Thread-Safety:**
- `MarkDirty(fullExport bool)` - Safe from any goroutine, non-blocking
- `FlushNow() error` - Safe from any goroutine, blocks until flush completes
- `Shutdown() error` - Idempotent, safe to call multiple times
-
-**Debouncing Guarantees:**
- Multiple `MarkDirty()` calls within the debounce window → single flush
- Timer resets on each mark, flush occurs after last modification
- FlushNow() bypasses debounce, forces immediate flush
-
-**Shutdown Guarantees:**
- Final flush performed if database is dirty
- Background goroutine cleanly exits
- Idempotent via `sync.Once` - safe for multiple calls
- Subsequent operations after shutdown are no-ops
-
-**Store Lifecycle:**
- FlushManager checks `storeActive` flag before every flush
- Store closure is coordinated via `storeMutex`
- Flush safely aborts if store closes mid-operation
-
-#### Migration Path
-
-The implementation maintains backward compatibility:
-
-1. **Legacy path (tests):** If `flushManager == nil`, falls back to old timer-based logic
-2. **New path (production):** Uses FlushManager event-driven architecture
-3. **Wrapper functions:** `markDirtyAndScheduleFlush()` and `markDirtyAndScheduleFullExport()` delegate to FlushManager when available
-
-This allows existing tests to pass without modification while fixing the race condition in production.
-
-## Testing
-
-### Race Detection
-
-Comprehensive race detector tests ensure concurrency safety:
-
- `TestFlushManagerConcurrentMarkDirty` - Many goroutines marking dirty
- `TestFlushManagerConcurrentFlushNow` - Concurrent immediate flushes
- `TestFlushManagerMarkDirtyDuringFlush` - Interleaved mark/flush operations
- `TestFlushManagerShutdownDuringOperation` - Shutdown while operations ongoing
- `TestMarkDirtyAndScheduleFlushConcurrency` - Integration test with legacy API
-
-Run with: `go test -race -run TestFlushManager ./cmd/bd`
-
-### In-Process Test Compatibility
-
-The FlushManager is designed to work correctly when commands run multiple times in the same process (common in tests):
-
- Each command execution in `PersistentPreRun` creates a new FlushManager
- `PersistentPostRun` shuts down the manager
- `Shutdown()` is idempotent via `sync.Once`
- Old managers are garbage collected when replaced
-
-## Related Subsystems
-
-### Daemon Mode
-
-When running with daemon mode (`--no-daemon=false`), the CLI delegates to an RPC server. The FlushManager is NOT used in daemon mode - the daemon process has its own flush coordination.
-
-The `daemonClient != nil` check in `PersistentPostRun` ensures FlushManager shutdown only occurs in direct mode.
-
-### Auto-Import
-
-Auto-import runs in `PersistentPreRun` before FlushManager is used. It may call `markDirtyAndScheduleFlush()` or `markDirtyAndScheduleFullExport()` if JSONL changes are detected.
-
-Hash-based comparison (not mtime) prevents git pull false positives (issue bd-84).
-
-### JSONL Integrity
-
-`flushToJSONLWithState()` validates JSONL file hash before flush:
- Compares stored hash with actual file hash
- If mismatch detected, clears export_hashes and forces full re-export (issue bd-160)
- Prevents staleness when JSONL is modified outside bd
-
-### Export Modes
-
-**Incremental export (default):**
- Exports only dirty issues (tracked in `dirty_issues` table)
- Merges with existing JSONL file
- Faster for small changesets
-
-**Full export (after ID changes):**
- Exports all issues from database
- Rebuilds JSONL from scratch
- Required after operations like `rename-prefix` that change issue IDs
- Triggered by `markDirtyAndScheduleFullExport()`
-
-## Performance Characteristics
-
- **Debounce window:** Configurable via `getDebounceDuration()` (default 5s)
- **Channel buffer sizes:**
-  - markDirtyCh: 10 events (prevents blocking during bursts)
-  - timerFiredCh: 1 event (timer notifications coalesce naturally)
-  - flushNowCh: 1 request (synchronous, one at a time)
-  - shutdownCh: 1 request (one-shot operation)
- **Memory overhead:** One goroutine + minimal channel buffers per command execution
- **Flush latency:** Debounce duration + JSONL write time (typically <100ms for incremental)
-
-## Blocked Issues Cache (bd-5qim)
-
-### Problem Statement
-
-The `bd ready` command originally computed blocked issues using a recursive CTE on every query. On a 10K issue database, each query took ~752ms, making the command feel sluggish and impractical for large projects.
-
-### Solution: Materialized Cache Table
-
-The `blocked_issues_cache` table materializes the blocking computation, storing issue IDs for all currently blocked issues. Queries now use a simple `NOT EXISTS` check against this cache, completing in ~29ms (25x speedup).
-
-### Architecture
+When you create or modify an issue:

 ```
-┌─────────────────────────────────────────────────────────┐
-│                   GetReadyWork Query                     │
-│                                                          │
-│  SELECT ... FROM issues WHERE status IN (...)            │
-│  AND NOT EXISTS (                                        │
-│    SELECT 1 FROM blocked_issues_cache                    │
-│    WHERE issue_id = issues.id                            │
-│  )                                                       │
-│                                                          │
-│  Performance: 29ms (was 752ms with recursive CTE)       │
-└─────────────────────────────────────────────────────────┘
-
-┌─────────────────────────────────────────────────────────┐
-│              Cache Invalidation Triggers                 │
-│                                                          │
-│  1. AddDependency (blocks/parent-child only)             │
-│  2. RemoveDependency (blocks/parent-child only)          │
-│  3. UpdateIssue (on any status change)                   │
-│  4. CloseIssue (changes status to closed)                │
-│                                                          │
-│  NOT triggered by: related, discovered-from deps         │
-└─────────────────────────────────────────────────────────┘
-
-┌─────────────────────────────────────────────────────────┐
-│               Cache Rebuild Process                      │
-│                                                          │
-│  1. DELETE FROM blocked_issues_cache                     │
-│  2. INSERT INTO blocked_issues_cache                     │
-│     WITH RECURSIVE CTE:                                  │
-│       - Find directly blocked issues (blocks deps)       │
-│       - Propagate to children (parent-child deps)        │
-│  3. Happens in same transaction as triggering change     │
-│                                                          │
-│  Performance: <50ms full rebuild on 10K database         │
-└─────────────────────────────────────────────────────────┘
+┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
+│   CLI Command   │───▶│  SQLite Write   │───▶│  Mark Dirty     │
+│   (bd create)   │    │  (immediate)    │    │  (trigger sync) │
+└─────────────────┘    └─────────────────┘    └────────┬────────┘
+                                                       │
+                                              5-second debounce
+                                                       │
+                                                       v
+┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
+│   Git Commit    │◀───│  JSONL Export   │◀───│  FlushManager   │
+│   (git hooks)   │    │  (incremental)  │    │  (background)   │
+└─────────────────┘    └─────────────────┘    └─────────────────┘
 ```

-### Blocking Semantics
+1. **Command executes:** `bd create "New feature"` writes to SQLite immediately
+2. **Mark dirty:** The operation marks the database as needing export
+3. **Debounce window:** Wait 5 seconds (configurable) so that a burst of operations is batched into a single export; each new change restarts the window
+4. **Export to JSONL:** Only changed entities are appended/updated
+5. **Git commit:** If git hooks are installed, changes auto-commit
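Steps 2 and 3 follow the single-owner pattern described for FlushManager: one goroutine owns the dirty flag and the timer, and callers reach it only through channels. The Go sketch below illustrates that pattern under simplifying assumptions. `Debouncer`, `MarkDirty`, `Shutdown` and `countFlushes` are hypothetical names, not bd's API, and the real FlushManager (see INTERNALS.md) also handles full exports, synchronous flushes and store lifecycle.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Debouncer is a hypothetical, simplified stand-in for FlushManager.
// Only the run goroutine touches the dirty flag and the timer; callers
// communicate with it over channels, so no mutex guards that state.
type Debouncer struct {
	markCh     chan struct{}
	shutdownCh chan chan struct{}
	once       sync.Once
}

// NewDebouncer starts the owner goroutine. flush is called once the store
// is dirty and no new MarkDirty call has arrived for delay.
func NewDebouncer(delay time.Duration, flush func()) *Debouncer {
	d := &Debouncer{
		markCh:     make(chan struct{}, 10), // buffered so short bursts don't block callers
		shutdownCh: make(chan chan struct{}),
	}
	go d.run(delay, flush)
	return d
}

func (d *Debouncer) run(delay time.Duration, flush func()) {
	dirty := false
	var timer *time.Timer
	var timerC <-chan time.Time // nil until armed, so that case cannot fire
	for {
		select {
		case <-d.markCh:
			dirty = true
			if timer != nil {
				timer.Stop()
			}
			timer = time.NewTimer(delay) // every mark restarts the window
			timerC = timer.C
		case <-timerC:
			timerC = nil
			if dirty {
				flush()
				dirty = false
			}
		case done := <-d.shutdownCh:
			for len(d.markCh) > 0 { // absorb marks queued before shutdown
				<-d.markCh
				dirty = true
			}
			if timer != nil {
				timer.Stop()
			}
			if dirty {
				flush() // final flush so no change is lost
			}
			close(done)
			return
		}
	}
}

// MarkDirty is safe to call from any goroutine, but (in this sketch)
// not after Shutdown.
func (d *Debouncer) MarkDirty() { d.markCh <- struct{}{} }

// Shutdown flushes pending changes and stops run; sync.Once makes
// repeated calls harmless.
func (d *Debouncer) Shutdown() {
	d.once.Do(func() {
		done := make(chan struct{})
		d.shutdownCh <- done
		<-done
	})
}

// countFlushes marks the store dirty n times in a burst, optionally waits
// for the debounce window to expire, shuts down, and returns the flush count.
func countFlushes(n int, waitForTimer bool) int {
	flushes := 0 // written only by run; read after Shutdown returns
	d := NewDebouncer(30*time.Millisecond, func() { flushes++ })
	for i := 0; i < n; i++ {
		d.MarkDirty()
	}
	if waitForTimer {
		time.Sleep(150 * time.Millisecond)
	}
	d.Shutdown()
	return flushes
}

func main() {
	fmt.Println(countFlushes(5, true))  // 1: the burst collapses into one flush
	fmt.Println(countFlushes(5, false)) // 1: Shutdown performs the final flush
}
```

Restarting the timer on every mark collapses a burst of writes into one export. The drain in the shutdown branch ensures that marks queued just before exit are still flushed.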

-An issue is blocked if:
+Key implementation:
+- Export: `cmd/bd/export.go`, `cmd/bd/autoflush.go`
+- FlushManager: `internal/flush/` (see [INTERNALS.md](INTERNALS.md))
+- Dirty tracking: `internal/storage/sqlite/dirty_issues.go`
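Step 4 (exporting only changed entities) amounts to merging the dirty records into the existing JSONL by ID. The following is a hedged sketch, assuming each line is a JSON object with an `"id"` field and that the caller has already serialized the dirty issues. `mergeDirty` is a hypothetical helper, not the code in `cmd/bd/export.go`, and it does not handle deletions.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// mergeDirty returns a new JSONL document in which each dirty record
// replaces the existing line with the same "id", and dirty records with
// unseen IDs are appended (sorted, for deterministic output). Untouched
// lines are copied verbatim. Deletions are not handled in this sketch.
func mergeDirty(existing string, dirty map[string]string) (string, error) {
	var out strings.Builder
	written := make(map[string]bool)
	sc := bufio.NewScanner(strings.NewReader(existing))
	sc.Buffer(make([]byte, 0, 64*1024), 16*1024*1024) // allow long issue lines
	for sc.Scan() {
		line := sc.Text()
		if strings.TrimSpace(line) == "" {
			continue
		}
		var rec struct {
			ID string `json:"id"`
		}
		if err := json.Unmarshal([]byte(line), &rec); err != nil {
			return "", fmt.Errorf("invalid JSONL line: %w", err)
		}
		if repl, ok := dirty[rec.ID]; ok {
			line = repl // updated in place: same position in the file
			written[rec.ID] = true
		}
		out.WriteString(line)
		out.WriteByte('\n')
	}
	if err := sc.Err(); err != nil {
		return "", err
	}
	var added []string
	for id := range dirty {
		if !written[id] {
			added = append(added, id)
		}
	}
	sort.Strings(added)
	for _, id := range added {
		out.WriteString(dirty[id])
		out.WriteByte('\n')
	}
	return out.String(), nil
}

func main() {
	existing := `{"id":"bd-a1b2","title":"Add OAuth"}
{"id":"bd-f14c","title":"Add Stripe"}
`
	merged, err := mergeDirty(existing, map[string]string{
		"bd-a1b2": `{"id":"bd-a1b2","title":"Add OAuth (PKCE)"}`,
		"bd-9e0d": `{"id":"bd-9e0d","title":"Fix login"}`,
	})
	if err != nil {
		fmt.Println("merge failed:", err)
		return
	}
	fmt.Print(merged)
}
```

Copying untouched lines verbatim keeps each export's git diff limited to the issues that actually changed.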

-1. **Direct blocking**: Has a `blocks` dependency on an open/in_progress/blocked issue
-2. **Transitive blocking**: Parent is blocked and issue is connected via `parent-child` dependency
+## Read Path

-Closed issues never block others. Related and discovered-from dependencies don't affect blocking.
+When you query issues after a `git pull`:

-### Cache Invalidation Strategy
+```
+┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
+│   git pull      │───▶│  Auto-Import    │───▶│  SQLite Update  │
+│   (new JSONL)   │    │  (on next cmd)  │    │  (merge logic)  │
+└─────────────────┘    └─────────────────┘    └────────┬────────┘
+                                                       │
+                                                       v
+                                               ┌─────────────────┐
+                                               │  CLI Query      │
+                                               │  (bd ready)     │
+                                               └─────────────────┘
+```

-**Full rebuild on every change**
+1. **Git pull:** Fetches updated JSONL from remote
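+2. **Auto-import detection:** The first bd command after the pull checks whether the JSONL content has changed since the last import (compared by hash, not mtime, so a pull that merely touches the file does not trigger an import)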
+3. **Import to SQLite:** Parse JSONL, merge with local state using content hashes
+4. **Query:** Commands read from fast local SQLite

-Instead of incremental updates, the cache is completely rebuilt (DELETE + INSERT) on any triggering change. This approach is chosen because:
+Key implementation:
+- Import: `cmd/bd/import.go`, `cmd/bd/autoimport.go`
+- Auto-import logic: `internal/autoimport/autoimport.go`
+- Collision detection: `internal/importer/importer.go`
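The detection step can be illustrated with a content hash of the JSONL file. In this sketch, SHA-256, the name `needsImport`, and the idea that the last imported hash is kept somewhere (for example in SQLite metadata) are assumptions. bd's actual logic lives in `internal/autoimport/autoimport.go`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// contentHash returns the hex-encoded SHA-256 of the JSONL bytes.
func contentHash(jsonl []byte) string {
	sum := sha256.Sum256(jsonl)
	return hex.EncodeToString(sum[:])
}

// needsImport reports whether the JSONL differs from what was last
// imported, and returns the hash to record after a successful import.
// Comparing content rather than mtime means a git pull that rewrites the
// file with identical bytes does not cause a spurious import.
func needsImport(jsonl []byte, lastImportedHash string) (bool, string) {
	h := contentHash(jsonl)
	return h != lastImportedHash, h
}

func main() {
	data := []byte(`{"id":"bd-a1b2","title":"Add OAuth"}` + "\n")
	changed, h := needsImport(data, "")
	fmt.Println(changed) // true: nothing has been imported yet
	changed, _ = needsImport(data, h)
	fmt.Println(changed) // false: same bytes, import is skipped
}
```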

- Rebuild is fast (<50ms even on 10K issues) due to optimized CTE
- Simpler implementation with no risk of partial/stale updates
- Dependency changes are rare compared to reads
- Guarantees consistency - cache matches database state exactly
+## Hash-Based Collision Prevention

-**Transaction safety**
+The key insight that enables distributed operation: **content-based hashing for deduplication**.

-All cache operations happen within the same transaction as the triggering change:
- Uses transaction if provided, otherwise direct db connection
- Cache can never be in an inconsistent state visible to queries
- Foreign key CASCADE ensures cache entries deleted when issues are deleted
+### The Problem

-**Selective invalidation**
+Sequential IDs (bd-1, bd-2, bd-3) cause collisions when multiple agents or branches create issues concurrently, because each one independently picks the next number:

-Only `blocks` and `parent-child` dependencies trigger rebuilds since they affect blocking semantics. Related and discovered-from dependencies don't trigger invalidation, avoiding unnecessary work.
+```
+Branch A: bd create "Add OAuth"   → bd-10
+Branch B: bd create "Add Stripe"  → bd-10 (collision!)
+```

-### Performance Characteristics
+### The Solution

-**Query performance (GetReadyWork):**
- Before cache: ~752ms (recursive CTE)
- With cache: ~29ms (NOT EXISTS)
- Speedup: 25x
+Hash-based IDs derived from random UUIDs make collisions extremely unlikely without any coordination between machines:

-**Write overhead:**
- Cache rebuild: <50ms
- Only triggered on dependency/status changes (rare operations)
- Trade-off: slower writes for much faster reads
+```
+Branch A: bd create "Add OAuth"   → bd-a1b2
+Branch B: bd create "Add Stripe"  → bd-f14c (no collision)
+```

-### Edge Cases
+### How It Works

-1. **Parent-child transitive blocking**
-   - Children of blocked parents are automatically marked as blocked
-   - Propagates through arbitrary depth hierarchies (limited to depth 50 for safety)
+1. **Issue creation:** Generate random UUID, derive short hash as ID
+2. **Progressive scaling:** IDs start at 4 characters and grow to 5-6 characters as the database grows
+3. **Content hashing:** Each issue has a content hash for change detection
+4. **Import merge:** Same ID + different content = update, same ID + same content = skip
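Steps 1 and 2 can be sketched in Go as follows. The text states only that IDs are short hashes derived from random UUIDs. The use of SHA-256, hex encoding, and a caller-supplied `length` (standing in for progressive scaling) are assumptions, and `newID` is a hypothetical name.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// newID derives a short issue ID from a random (version 4) UUID.
// Hypothetical: SHA-256 and hex encoding are assumptions; length is the
// number of hash characters kept after the prefix.
func newID(prefix string, length int) (string, error) {
	if length < 1 || length > 2*sha256.Size {
		return "", fmt.Errorf("length must be between 1 and %d", 2*sha256.Size)
	}
	var u [16]byte
	if _, err := rand.Read(u[:]); err != nil {
		return "", err
	}
	u[6] = (u[6] & 0x0f) | 0x40 // UUID version 4
	u[8] = (u[8] & 0x3f) | 0x80 // RFC 4122 variant
	sum := sha256.Sum256(u[:])
	return prefix + "-" + hex.EncodeToString(sum[:])[:length], nil
}

func main() {
	for _, n := range []int{4, 5, 6} {
		id, err := newID("bd", n)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(id) // random, so different on every run
	}
}
```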

-2. **Multiple blockers**
-   - Issue blocked by multiple open issues stays blocked until all are closed
-   - DISTINCT in CTE ensures issue appears once in cache
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                        Import Logic                              │
+│                                                                  │
+│  For each issue in JSONL:                                       │
+│    1. Compute content hash                                       │
+│    2. Look up existing issue by ID                               │
+│    3. Compare hashes:                                            │
+│       - Same hash → skip (already imported)                      │
+│       - Different hash → update (newer version)                  │
+│       - No match → create (new issue)                            │
+└─────────────────────────────────────────────────────────────────┘
+```
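The three-way rule in the diagram can be written as a small Go function. `Issue` here is a stripped-down hypothetical record. `contentHash` hashes only two fields with SHA-256, which is an assumption, since the text does not say which fields bd hashes. `importIssues` works on an in-memory map instead of SQLite.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Issue is a hypothetical minimal record; real bd issues have more fields.
type Issue struct {
	ID, Title, Status string
}

// contentHash hashes the fields that define an issue's content in this
// sketch; the field set and the NUL separator are assumptions.
func contentHash(is Issue) string {
	sum := sha256.Sum256([]byte(is.Title + "\x00" + is.Status))
	return hex.EncodeToString(sum[:])
}

type Action string

const (
	Skip   Action = "skip"
	Update Action = "update"
	Create Action = "create"
)

// importIssues applies the diagram's rule to a local store keyed by ID and
// reports what happened to each incoming issue.
func importIssues(local map[string]Issue, incoming []Issue) map[string]Action {
	result := make(map[string]Action, len(incoming))
	for _, in := range incoming {
		existing, ok := local[in.ID]
		switch {
		case !ok:
			result[in.ID] = Create // no issue with this ID yet
		case contentHash(existing) == contentHash(in):
			result[in.ID] = Skip // already imported
			continue
		default:
			result[in.ID] = Update // same ID, different content
		}
		local[in.ID] = in
	}
	return result
}

func main() {
	local := map[string]Issue{"bd-a1b2": {ID: "bd-a1b2", Title: "Add OAuth", Status: "open"}}
	actions := importIssues(local, []Issue{
		{ID: "bd-a1b2", Title: "Add OAuth", Status: "closed"},
		{ID: "bd-f14c", Title: "Add Stripe", Status: "open"},
	})
	fmt.Println(actions["bd-a1b2"], actions["bd-f14c"]) // update create
}
```

In this sketch the outcome depends only on IDs and content, so importing the same JSONL twice is a no-op. That property lets every machine converge without a coordinator.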

-3. **Status changes**
-   - Closing a blocker removes all blocked descendants from cache
-   - Reopening a blocker adds them back
+This eliminates the need for central coordination: once machines have pulled the same JSONL, they converge to the same state.

-4. **Dependency removal**
-   - Removing last blocker unblocks the issue
-   - Removing parent-child link unblocks orphaned subtree
+See [COLLISION_MATH.md](COLLISION_MATH.md) for birthday paradox calculations on hash length vs collision probability.
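The standard birthday approximation gives a rough sense of the trade-off. It estimates the probability that at least two of n issues share an ID of k characters. This assumes IDs are uniformly random lowercase hex, which the example IDs suggest but the text does not state. The figures for n = 100 are worked here for illustration and are not taken from COLLISION_MATH.md.

```math
\begin{aligned}
p(n,k) &\approx 1 - \exp\!\left(-\frac{n(n-1)}{2 \cdot 16^{k}}\right) \\
p(100,4) &\approx 1 - e^{-4950/65536} \approx 0.073 \\
p(100,5) &\approx 1 - e^{-4950/1048576} \approx 0.0047 \\
p(100,6) &\approx 1 - e^{-4950/16777216} \approx 0.0003
\end{aligned}
```

Once the probability is small, each extra hex character divides it by roughly 16. This is why growing IDs from 4 to 5-6 characters as the database grows keeps collisions rare.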

-5. **Foreign key cascades**
-   - Cache entries automatically deleted when issue is deleted
-   - No manual cleanup needed
+## Daemon Architecture

-### Testing
+Each workspace runs its own background daemon for auto-sync:

-Comprehensive test coverage in `blocked_cache_test.go`:
- Cache invalidation on dependency add/remove
- Cache updates on status changes
- Multiple blockers
- Deep hierarchies
- Transitive blocking via parent-child
- Related dependencies (should NOT affect cache)
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                     Per-Workspace Daemon                         │
+│                                                                  │
+│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐         │
+│  │ RPC Server  │    │  Auto-Sync  │    │  Background │         │
+│  │ (bd.sock)   │    │  Manager    │    │  Tasks      │         │
+│  └─────────────┘    └─────────────┘    └─────────────┘         │
+│         │                  │                  │                  │
+│         └──────────────────┴──────────────────┘                  │
+│                            │                                     │
+│                            v                                     │
+│                   ┌─────────────┐                                │
+│                   │   SQLite    │                                │
+│                   │   Database  │                                │
+│                   └─────────────┘                                │
+└─────────────────────────────────────────────────────────────────┘

-Run tests: `go test -v ./internal/storage/sqlite -run TestCache`
+     CLI commands ───RPC───▶ Daemon ───SQL───▶ Database
+                              or
+     CLI commands ───SQL───▶ Database (if daemon unavailable)
+```

-### Implementation Files
+**Why daemons?**
+- Batches multiple operations before export
+- Holds database connection open (faster queries)
+- Coordinates auto-sync timing
+- One daemon per workspace (LSP-like model)

- `internal/storage/sqlite/blocked_cache.go` - Cache rebuild and invalidation
- `internal/storage/sqlite/ready.go` - Uses cache in GetReadyWork queries
- `internal/storage/sqlite/dependencies.go` - Invalidates on dep changes
- `internal/storage/sqlite/queries.go` - Invalidates on status changes
- `internal/storage/sqlite/migrations/015_blocked_issues_cache.go` - Schema and initial population
+**Communication:**
+- Unix domain socket at `.beads/bd.sock` (Windows: named pipes)
+- Protocol defined in `internal/rpc/protocol.go`
+- CLI tries daemon first, falls back to direct DB access

-### Future Optimizations
+**Lifecycle:**
+- Auto-starts on first bd command (unless `BEADS_NO_DAEMON=1`)
+- Auto-restarts after version upgrades
+- Managed via `bd daemons` command
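The "try the daemon first, fall back to direct DB access" rule and the `BEADS_NO_DAEMON=1` opt-out can be sketched in Go as follows. This covers only the Unix-socket case. `connectMode`, `demo`, the 200 ms dial timeout and the `"daemon"`/`"direct"` return values are hypothetical. The real CLI also auto-starts the daemon and speaks the protocol in `internal/rpc/protocol.go`, and neither is shown here.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"path/filepath"
	"time"
)

// connectMode decides how a CLI invocation reaches the database: through
// the workspace daemon's socket, or directly via SQLite.
func connectMode(beadsDir string) (string, net.Conn) {
	if os.Getenv("BEADS_NO_DAEMON") == "1" {
		return "direct", nil // daemon explicitly disabled
	}
	sock := filepath.Join(beadsDir, "bd.sock")
	conn, err := net.DialTimeout("unix", sock, 200*time.Millisecond)
	if err != nil {
		return "direct", nil // no daemon listening: fall back to SQLite
	}
	return "daemon", conn
}

// demo exercises both branches with a throwaway directory standing in
// for .beads/.
func demo() (without, with string, err error) {
	dir, err := os.MkdirTemp("", "beads")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)
	without, _ = connectMode(dir) // nothing is listening yet
	ln, err := net.Listen("unix", filepath.Join(dir, "bd.sock"))
	if err != nil {
		return "", "", err
	}
	defer ln.Close()
	mode, conn := connectMode(dir)
	if conn != nil {
		conn.Close()
	}
	return without, mode, nil
}

func main() {
	without, with, err := demo()
	if err != nil {
		fmt.Println("demo failed:", err)
		return
	}
	fmt.Println("no socket:", without)
	fmt.Println("socket listening:", with)
}
```

With nothing listening, the dial fails and the CLI falls back to SQLite. Once a listener exists, the same call connects to it.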

-If rebuild becomes a bottleneck in very large databases (>100K issues):
- Consider incremental updates for specific dependency types
- Add indexes to dependencies table for CTE performance
- Implement dirty tracking to avoid rebuilds when cache is unchanged
+See [DAEMON.md](DAEMON.md) for operational details.

-However, current performance is excellent for realistic workloads.
+## Data Types

-## Future Improvements
+Core types in `internal/types/types.go`:

-Potential enhancements for multi-agent scenarios:
+| Type | Description | Key Fields |
+|------|-------------|------------|
+| **Issue** | Work item | ID, Title, Description, Status, Priority, Type |
+| **Dependency** | Relationship | FromID, ToID, Type (blocks/related/parent-child/discovered-from) |
+| **Label** | Tag | Name, Color, Description |
+| **Comment** | Discussion | IssueID, Author, Content, Timestamp |
+| **Event** | Audit trail | IssueID, Type, Data, Timestamp |
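The following Go sketch makes the "one JSON line per entity" property concrete. It covers two of the types in the table and shows how one might serialize them as JSONL. The field types and JSON tag names are assumptions, not the definitions in `internal/types/types.go`, and Label, Comment and Event are omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Issue mirrors part of the table above; tags and types are assumptions.
type Issue struct {
	ID          string `json:"id"`
	Title       string `json:"title"`
	Description string `json:"description,omitempty"`
	Status      string `json:"status"`
	Priority    int    `json:"priority"`
	Type        string `json:"type"`
}

// Dependency links two issues; Type is blocks, related, parent-child or
// discovered-from.
type Dependency struct {
	FromID string `json:"from_id"`
	ToID   string `json:"to_id"`
	Type   string `json:"type"`
}

// jsonlLine encodes one entity as a single line. json.Marshal emits compact
// output and escapes newlines inside strings, so each record stays on one
// line and git can diff and merge per entity.
func jsonlLine(v any) (string, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	return string(b) + "\n", nil
}

func main() {
	for _, v := range []any{
		Issue{ID: "bd-a1b2", Title: "Add OAuth", Status: "open", Priority: 1, Type: "feature"},
		Dependency{FromID: "bd-a1b2", ToID: "bd-f14c", Type: "blocks"},
	} {
		line, err := jsonlLine(v)
		if err != nil {
			fmt.Println("encode failed:", err)
			return
		}
		fmt.Print(line)
	}
}
```

Because embedded newlines are escaped, a multi-line description still occupies exactly one line of the file.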

-1. **Flush coordination across agents:**
-   - Shared lock file to prevent concurrent JSONL writes
-   - Detection of external JSONL modifications during flush
+### Dependency Types

-2. **Adaptive debounce window:**
-   - Shorter debounce during interactive sessions
-   - Longer debounce during batch operations
+| Type | Semantic | Affects `bd ready`? |
+|------|----------|---------------------|
+| `blocks` | The blocker must close before the blocked issue can start | Yes |
+| `parent-child` | Hierarchical (epic/subtask) | Yes (children blocked if parent blocked) |
+| `related` | Soft link for reference | No |
+| `discovered-from` | Found during work on parent | No |
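A minimal Go sketch of the readiness rule in this table works as follows. A `blocks` edge from a non-closed issue blocks its target. Blocking then propagates down `parent-child` edges to any depth, and `related`/`discovered-from` edges are ignored. `Dep` and `readyIssues` are hypothetical, and the edge direction used here is an assumption. bd itself computes this in SQLite with a materialized blocked-issues cache (see INTERNALS.md) rather than in Go maps.

```go
package main

import (
	"fmt"
	"sort"
)

// Dep is a hypothetical edge. For "blocks", From must close before To is
// ready; for "parent-child", From is the parent and To the child.
type Dep struct {
	From, To, Type string
}

// readyIssues returns the non-closed issues that nothing blocks, sorted by
// ID. Closed issues never block; related and discovered-from are ignored.
func readyIssues(status map[string]string, deps []Dep) []string {
	blocked := map[string]bool{}
	for _, d := range deps {
		if d.Type == "blocks" && status[d.From] != "closed" {
			blocked[d.To] = true
		}
	}
	// Propagate to children until nothing changes (handles any depth).
	for changed := true; changed; {
		changed = false
		for _, d := range deps {
			if d.Type == "parent-child" && blocked[d.From] && !blocked[d.To] {
				blocked[d.To] = true
				changed = true
			}
		}
	}
	var ready []string
	for id, st := range status {
		if st != "closed" && !blocked[id] {
			ready = append(ready, id)
		}
	}
	sort.Strings(ready)
	return ready
}

func main() {
	status := map[string]string{"bd-a": "open", "bd-b": "open", "bd-c": "open", "bd-d": "closed"}
	deps := []Dep{
		{From: "bd-a", To: "bd-b", Type: "blocks"},       // a blocks b
		{From: "bd-b", To: "bd-c", Type: "parent-child"}, // c inherits b's block
		{From: "bd-d", To: "bd-a", Type: "blocks"},       // closed blocker: no effect
	}
	fmt.Println(readyIssues(status, deps)) // [bd-a]
}
```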

-3. **Flush progress tracking:**
-   - Expose flush queue depth via status API
-   - Allow clients to wait for flush completion
+### Status Flow

-4. **Per-issue dirty tracking optimization:**
-   - Currently tracks full vs. incremental
-   - Could track specific issue IDs for surgical updates
+```
+open ──▶ in_progress ──▶ closed
+  ▲                        │
+  └────────────────────────┘
+         (reopen)
+```
+
+## Directory Structure
+
+```
+.beads/
+├── beads.db          # SQLite database (gitignored)
+├── beads.jsonl       # JSONL source of truth (git-tracked)
+├── bd.sock           # Daemon socket (gitignored)
+├── daemon.log        # Daemon logs (gitignored)
+├── config.yaml       # Project config (optional)
+└── export_hashes.db  # Export tracking (gitignored)
+```
+
+## Key Code Paths
+
+| Area | Files |
+|------|-------|
+| CLI entry | `cmd/bd/main.go` |
+| Storage interface | `internal/storage/storage.go` |
+| SQLite implementation | `internal/storage/sqlite/` |
+| RPC protocol | `internal/rpc/protocol.go`, `server_*.go` |
+| Export logic | `cmd/bd/export.go`, `autoflush.go` |
+| Import logic | `cmd/bd/import.go`, `internal/importer/` |
+| Auto-sync | `internal/autoimport/`, `internal/flush/` |
+
+## Related Documentation
+
+- [INTERNALS.md](INTERNALS.md) - FlushManager, Blocked Cache implementation details
+- [DAEMON.md](DAEMON.md) - Daemon management and configuration
+- [EXTENDING.md](EXTENDING.md) - Adding custom tables to SQLite
+- [TROUBLESHOOTING.md](TROUBLESHOOTING.md) - Recovery procedures and common issues
+- [FAQ.md](FAQ.md) - Common questions about the architecture
+- [COLLISION_MATH.md](COLLISION_MATH.md) - Hash collision probability analysis