beads/docs/ARCHITECTURE.md
2025-11-25 11:35:26 -08:00

# Architecture
This document describes bd's overall architecture: the data model, the sync mechanism, and how the components fit together. For internal implementation details (FlushManager, Blocked Cache), see [INTERNALS.md](INTERNALS.md).
## The Three-Layer Data Model
bd's core design enables a distributed, git-backed issue tracker that feels like a centralized database. It rests on three synchronized data layers (a local SQLite database, a git-tracked JSONL file, and the git remote), with the CLI on top:
```
┌─────────────────────────────────────────────────────────────────┐
│                            CLI Layer                            │
│                                                                 │
│   bd create, list, update, close, ready, show, dep, sync, ...   │
│   - Cobra commands in cmd/bd/                                   │
│   - All commands support --json for programmatic use            │
│   - Tries daemon RPC first, falls back to direct DB access      │
└──────────────────────────────┬──────────────────────────────────┘
                               v
┌─────────────────────────────────────────────────────────────────┐
│                         SQLite Database                         │
│                        (.beads/beads.db)                        │
│                                                                 │
│   - Local working copy (gitignored)                             │
│   - Fast queries, indexes, foreign keys                         │
│   - Issues, dependencies, labels, comments, events              │
│   - Each machine has its own copy                               │
└──────────────────────────────┬──────────────────────────────────┘
                           auto-sync
                         (5s debounce)
                               v
┌─────────────────────────────────────────────────────────────────┐
│                           JSONL File                            │
│                      (.beads/beads.jsonl)                       │
│                                                                 │
│   - Git-tracked source of truth                                 │
│   - One JSON line per entity (issue, dep, label, comment)       │
│   - Merge-friendly: additions rarely conflict                   │
│   - Shared across machines via git push/pull                    │
└──────────────────────────────┬──────────────────────────────────┘
                         git push/pull
                               v
┌─────────────────────────────────────────────────────────────────┐
│                        Remote Repository                        │
│                     (GitHub, GitLab, etc.)                      │
│                                                                 │
│   - Stores JSONL as part of normal repo history                 │
│   - All collaborators share the same issue database             │
│   - Protected branch support via separate sync branch           │
└─────────────────────────────────────────────────────────────────┘
```
### Why This Design?
**SQLite for speed:** Local queries complete in milliseconds. Complex dependency graphs, full-text search, and joins are fast.

**JSONL for git:** One entity per line means git diffs are readable and merges usually succeed automatically. No binary database files in version control.

**Git for distribution:** No special sync server needed. Issues travel with your code. Offline work just works.
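To make the JSONL layer concrete, here is a minimal Go sketch of writing one JSON object per line. The `Issue` struct and its JSON field names are hypothetical and far smaller than the real type in `internal/types/types.go`; the actual on-disk schema may differ.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Issue is a hypothetical, trimmed-down record used only for this sketch.
type Issue struct {
	ID     string `json:"id"`
	Title  string `json:"title"`
	Status string `json:"status"`
}

// encodeJSONL serializes each issue as one compact JSON object per line.
func encodeJSONL(issues []Issue) (string, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf) // Encode writes a trailing newline after each value
	for _, is := range issues {
		if err := enc.Encode(is); err != nil {
			return "", err
		}
	}
	return buf.String(), nil
}

func main() {
	out, err := encodeJSONL([]Issue{
		{ID: "bd-a1b2", Title: "Add OAuth", Status: "open"},
		{ID: "bd-f14c", Title: "Add Stripe", Status: "open"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Because each issue occupies exactly one line, changing one issue touches one line in `git diff`, which is what keeps merges small.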
## Write Path
When you create or modify an issue:
```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  CLI Command    │───▶│  SQLite Write   │───▶│  Mark Dirty     │
│  (bd create)    │    │  (immediate)    │    │  (trigger sync) │
└─────────────────┘    └─────────────────┘    └────────┬────────┘
                                               5-second debounce
                                                       v
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  Git Commit     │◀───│  JSONL Export   │◀───│  FlushManager   │
│  (git hooks)    │    │  (incremental)  │    │  (background)   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```
1. **Command executes:** `bd create "New feature"` writes to SQLite immediately
2. **Mark dirty:** The operation marks the database as needing export
3. **Debounce window:** Wait 5 seconds for batch operations (configurable)
4. **Export to JSONL:** Only changed entities are appended/updated
5. **Git commit:** If git hooks are installed, changes auto-commit

Key implementation:
- Export: `cmd/bd/export.go`, `cmd/bd/autoflush.go`
- FlushManager: `internal/flush/` (see [INTERNALS.md](INTERNALS.md))
- Dirty tracking: `internal/storage/sqlite/dirty_issues.go`
## Read Path
When you query issues after a `git pull`:
```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  git pull       │───▶│  Auto-Import    │───▶│  SQLite Update  │
│  (new JSONL)    │    │  (on next cmd)  │    │  (merge logic)  │
└─────────────────┘    └─────────────────┘    └────────┬────────┘
                                                       v
                                              ┌─────────────────┐
                                              │  CLI Query      │
                                              │  (bd ready)     │
                                              └─────────────────┘
```
1. **Git pull:** Fetches updated JSONL from remote
2. **Auto-import detection:** First bd command checks if JSONL is newer than DB
3. **Import to SQLite:** Parse JSONL, merge with local state using content hashes
4. **Query:** Commands read from fast local SQLite

Key implementation:
- Import: `cmd/bd/import.go`, `cmd/bd/autoimport.go`
- Auto-import logic: `internal/autoimport/autoimport.go`
- Collision detection: `internal/importer/importer.go`
## Hash-Based Collision Prevention
The key insight that enables distributed operation: **content-based hashing for deduplication**.
### The Problem
Sequential IDs (bd-1, bd-2, bd-3) cause collisions when multiple agents create issues concurrently:
```
Branch A: bd create "Add OAuth" → bd-10
Branch B: bd create "Add Stripe" → bd-10 (collision!)
```
### The Solution
Hash-based IDs derived from random UUIDs make collisions very unlikely without any coordination between branches:
```
Branch A: bd create "Add OAuth" → bd-a1b2
Branch B: bd create "Add Stripe" → bd-f14c (no collision)
```
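A minimal sketch of deriving such an ID in Go follows. The choice of SHA-256, hexadecimal output, and the `newID` helper are assumptions for illustration; the source only states that a short hash is derived from a random UUID.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// newID hashes 16 random bytes (the size of a UUID) and keeps the first
// `length` hex characters. Hash function and alphabet are assumptions.
func newID(prefix string, length int) (string, error) {
	if length < 1 || length > 2*sha256.Size {
		return "", fmt.Errorf("length %d out of range", length)
	}
	seed := make([]byte, 16)
	if _, err := rand.Read(seed); err != nil {
		return "", err
	}
	sum := sha256.Sum256(seed)
	return prefix + "-" + hex.EncodeToString(sum[:])[:length], nil
}

func main() {
	// Longer suffixes correspond to the "progressive scaling" idea.
	for _, n := range []int{4, 5, 6} {
		id, err := newID("bd", n)
		if err != nil {
			panic(err)
		}
		fmt.Println(id)
	}
}
```

Two branches calling `newID` independently draw from the same ID space without talking to each other, so a clash requires a random collision rather than being guaranteed as with sequential counters.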
### How It Works
1. **Issue creation:** Generate random UUID, derive short hash as ID
2. **Progressive scaling:** IDs start at 4 chars, grow to 5-6 chars as database grows
3. **Content hashing:** Each issue has a content hash for change detection
4. **Import merge:** Same ID + different content = update, same ID + same content = skip
```
┌─────────────────────────────────────────────────────────────────┐
│                          Import Logic                           │
│                                                                 │
│   For each issue in JSONL:                                      │
│     1. Compute content hash                                     │
│     2. Look up existing issue by ID                             │
│     3. Compare hashes:                                          │
│        - Same hash → skip (already imported)                    │
│        - Different hash → update (newer version)                │
│        - No match → create (new issue)                          │
└─────────────────────────────────────────────────────────────────┘
```
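The decision table above can be sketched in Go as follows. The `Issue` fields, the `contentHash` function, and the `Action` names are hypothetical; the real importer in `internal/importer/importer.go` hashes more fields and handles more cases.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Issue is a hypothetical, reduced record for this sketch.
type Issue struct {
	ID, Title, Status string
}

// contentHash covers the fields that define an issue's content (not its ID).
func contentHash(is Issue) string {
	sum := sha256.Sum256([]byte(is.Title + "\x00" + is.Status))
	return hex.EncodeToString(sum[:])
}

type Action string

const (
	Skip   Action = "skip"
	Update Action = "update"
	Create Action = "create"
)

// decide compares an incoming JSONL issue with local state keyed by ID.
func decide(local map[string]Issue, incoming Issue) Action {
	existing, found := local[incoming.ID]
	switch {
	case !found:
		return Create // no issue with this ID yet
	case contentHash(existing) == contentHash(incoming):
		return Skip // already imported
	default:
		return Update // same ID, different content
	}
}

func main() {
	local := map[string]Issue{
		"bd-a1b2": {ID: "bd-a1b2", Title: "Add OAuth", Status: "open"},
	}
	fmt.Println(decide(local, Issue{ID: "bd-a1b2", Title: "Add OAuth", Status: "open"}))
	fmt.Println(decide(local, Issue{ID: "bd-a1b2", Title: "Add OAuth", Status: "closed"}))
	fmt.Println(decide(local, Issue{ID: "bd-f14c", Title: "Add Stripe", Status: "open"}))
}
```

Running the sketch prints `skip`, `update`, and `create`, one per line, matching the three branches of the import logic.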
This eliminates the need for central coordination while ensuring all machines converge to the same state.
See [COLLISION_MATH.md](COLLISION_MATH.md) for birthday paradox calculations on hash length vs collision probability.
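As a rough guide to why ID length grows with the database, the standard birthday approximation gives the probability of at least one collision among $n$ random IDs drawn from $N$ possible values. The alphabet size $b = 16$ (hexadecimal) used in the example is an assumption; COLLISION_MATH.md has the authoritative figures.

```latex
p(n) \approx 1 - \exp\!\left(-\frac{n(n-1)}{2N}\right), \qquad N = b^{k}

% Example with b = 16 (assumed) and n = 100 issues:
% k = 4: N = 65{,}536        \Rightarrow p \approx 1 - e^{-0.0755} \approx 0.07
% k = 6: N = 16{,}777{,}216  \Rightarrow p \approx 1 - e^{-0.0003} \approx 0.0003
```

Under these assumptions, two extra characters reduce the collision probability at 100 issues from roughly 7% to roughly 0.03%.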
## Daemon Architecture
Each workspace runs its own background daemon for auto-sync:
```
┌─────────────────────────────────────────────────────────────────┐
│                      Per-Workspace Daemon                       │
│                                                                 │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐          │
│  │ RPC Server  │    │  Auto-Sync  │    │ Background  │          │
│  │  (bd.sock)  │    │   Manager   │    │    Tasks    │          │
│  └─────────────┘    └─────────────┘    └─────────────┘          │
│         │                  │                  │                 │
│         └──────────────────┴──────────────────┘                 │
│                            │                                    │
│                            v                                    │
│                     ┌─────────────┐                             │
│                     │   SQLite    │                             │
│                     │  Database   │                             │
│                     └─────────────┘                             │
└─────────────────────────────────────────────────────────────────┘
CLI commands ───RPC───▶ Daemon ───SQL───▶ Database
or
CLI commands ───SQL───▶ Database (if daemon unavailable)
```
**Why daemons?**
- Batches multiple operations before export
- Holds database connection open (faster queries)
- Coordinates auto-sync timing
- One daemon per workspace (LSP-like model)

**Communication:**
- Unix domain socket at `.beads/bd.sock` (Windows: named pipes)
- Protocol defined in `internal/rpc/protocol.go`
- CLI tries daemon first, falls back to direct DB access

**Lifecycle:**
- Auto-starts on first bd command (unless `BEADS_NO_DAEMON=1`)
- Auto-restarts after version upgrades
- Managed via `bd daemons` command

See [DAEMON.md](DAEMON.md) for operational details.
## Data Types
Core types in `internal/types/types.go`:
| Type | Description | Key Fields |
|------|-------------|------------|
| **Issue** | Work item | ID, Title, Description, Status, Priority, Type |
| **Dependency** | Relationship | FromID, ToID, Type (blocks/related/parent-child/discovered-from) |
| **Label** | Tag | Name, Color, Description |
| **Comment** | Discussion | IssueID, Author, Content, Timestamp |
| **Event** | Audit trail | IssueID, Type, Data, Timestamp |
### Dependency Types
| Type | Semantic | Affects `bd ready`? |
|------|----------|---------------------|
| `blocks` | Issue X must close before Y starts | Yes |
| `parent-child` | Hierarchical (epic/subtask) | Yes (children blocked if parent blocked) |
| `related` | Soft link for reference | No |
| `discovered-from` | Found during work on parent | No |
### Status Flow
```
open ──▶ in_progress ──▶ closed
  │                        │
  └────────────────────────┘
           (reopen)
```
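Read literally, the diagram allows three transitions, which the sketch below encodes as a lookup table. This is an interpretation of the diagram, not bd's validation code: the `Status` constants and `canTransition` are hypothetical, and the real tool may permit other moves (for example closing an open issue directly).

```go
package main

import "fmt"

type Status string

const (
	Open       Status = "open"
	InProgress Status = "in_progress"
	Closed     Status = "closed"
)

// allowed lists the transitions drawn in the status-flow diagram.
var allowed = map[Status][]Status{
	Open:       {InProgress},
	InProgress: {Closed},
	Closed:     {Open}, // reopen
}

// canTransition reports whether the diagram contains an edge from -> to.
func canTransition(from, to Status) bool {
	for _, next := range allowed[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition(Open, InProgress)) // true
	fmt.Println(canTransition(Closed, Open))     // true (reopen)
	fmt.Println(canTransition(Closed, InProgress))
}
```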
## Directory Structure
```
.beads/
├── beads.db # SQLite database (gitignored)
├── beads.jsonl # JSONL source of truth (git-tracked)
├── bd.sock # Daemon socket (gitignored)
├── daemon.log # Daemon logs (gitignored)
├── config.yaml # Project config (optional)
└── export_hashes.db # Export tracking (gitignored)
```
## Key Code Paths
| Area | Files |
|------|-------|
| CLI entry | `cmd/bd/main.go` |
| Storage interface | `internal/storage/storage.go` |
| SQLite implementation | `internal/storage/sqlite/` |
| RPC protocol | `internal/rpc/protocol.go`, `server_*.go` |
| Export logic | `cmd/bd/export.go`, `autoflush.go` |
| Import logic | `cmd/bd/import.go`, `internal/importer/` |
| Auto-sync | `internal/autoimport/`, `internal/flush/` |
## Related Documentation
- [INTERNALS.md](INTERNALS.md) - FlushManager, Blocked Cache implementation details
- [DAEMON.md](DAEMON.md) - Daemon management and configuration
- [EXTENDING.md](EXTENDING.md) - Adding custom tables to SQLite
- [TROUBLESHOOTING.md](TROUBLESHOOTING.md) - Recovery procedures and common issues
- [FAQ.md](FAQ.md) - Common questions about the architecture
- [COLLISION_MATH.md](COLLISION_MATH.md) - Hash collision probability analysis