Steve Yegge 5e971bd90a docs: add molecular chemistry documentation (bd-ul59)

Create docs/MOLECULES.md with comprehensive coverage of:
- Layer cake architecture (formulas → protos → molecules → epics → issues)
- Phase metaphor (solid/proto, liquid/mol, vapor/wisp)
- Phase transitions (pour, wisp create, squash, burn)
- Bonding patterns (proto+proto, proto+mol, mol+mol)
- Agent pitfalls (temporal language, forgetting to squash)
- Orphan vs stale matrix
- Progress tracking (computed, not stored)
- Parallelism model (default parallel, opt-in sequential)

Update CLI_REFERENCE.md with Molecular Chemistry section covering:
- Proto/template commands
- Pour command
- Wisp commands
- Bonding commands
- Squash and burn commands

Update ARCHITECTURE.md with cross-reference to new MOLECULES.md.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2025-12-24 19:31:14 -08:00

Architecture

This document describes bd's overall architecture - the data model, sync mechanism, and how components fit together. For internal implementation details (FlushManager, Blocked Cache), see INTERNALS.md.

The Three-Layer Data Model

bd's core design enables a distributed, git-backed issue tracker that feels like a centralized database. The "magic" comes from three synchronized layers (SQLite, JSONL, and git) that sit beneath the CLI:

┌─────────────────────────────────────────────────────────────────┐
│                        CLI Layer                                 │
│                                                                  │
│  bd create, list, update, close, ready, show, dep, sync, ...    │
│  - Cobra commands in cmd/bd/                                     │
│  - All commands support --json for programmatic use              │
│  - Tries daemon RPC first, falls back to direct DB access        │
└──────────────────────────────┬──────────────────────────────────┘
                               │
                               v
┌─────────────────────────────────────────────────────────────────┐
│                     SQLite Database                              │
│                     (.beads/beads.db)                            │
│                                                                  │
│  - Local working copy (gitignored)                               │
│  - Fast queries, indexes, foreign keys                           │
│  - Issues, dependencies, labels, comments, events                │
│  - Each machine has its own copy                                 │
└──────────────────────────────┬──────────────────────────────────┘
                               │
                         auto-sync
                        (5s debounce)
                               │
                               v
┌─────────────────────────────────────────────────────────────────┐
│                       JSONL File                                 │
│                   (.beads/issues.jsonl)                          │
│                                                                  │
│  - Git-tracked source of truth                                   │
│  - One JSON line per entity (issue, dep, label, comment)         │
│  - Merge-friendly: additions rarely conflict                     │
│  - Shared across machines via git push/pull                      │
└──────────────────────────────┬──────────────────────────────────┘
                               │
                          git push/pull
                               │
                               v
┌─────────────────────────────────────────────────────────────────┐
│                     Remote Repository                            │
│                    (GitHub, GitLab, etc.)                        │
│                                                                  │
│  - Stores JSONL as part of normal repo history                   │
│  - All collaborators share the same issue database               │
│  - Protected branch support via separate sync branch             │
└─────────────────────────────────────────────────────────────────┘

Why This Design?

SQLite for speed: Local queries complete in milliseconds. Complex dependency graphs, full-text search, and joins are fast.

JSONL for git: One entity per line means git diffs are readable and merges usually succeed automatically. No binary database files in version control.

Git for distribution: No special sync server needed. Issues travel with your code. Offline work just works.
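
To illustrate the one-entity-per-line property, here is a minimal sketch in Go. The `issue` struct and its JSON field names are assumptions made for illustration, not bd's actual schema.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// issue is a hypothetical, trimmed-down record; bd's real schema has more fields.
type issue struct {
	ID     string `json:"id"`
	Title  string `json:"title"`
	Status string `json:"status"`
}

// toJSONL encodes each issue as exactly one line of JSON.
func toJSONL(issues []issue) (string, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf) // Encode writes one value followed by a newline
	for _, is := range issues {
		if err := enc.Encode(is); err != nil {
			return "", err
		}
	}
	return buf.String(), nil
}

func main() {
	out, err := toJSONL([]issue{
		{ID: "bd-a1b2", Title: "Add OAuth", Status: "open"},
		{ID: "bd-f14c", Title: "Add Stripe", Status: "open"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Because each record occupies its own line, two branches that add different issues touch different lines, which is why git can usually merge them without help.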

Write Path

When you create or modify an issue:

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   CLI Command   │───▶│  SQLite Write   │───▶│  Mark Dirty     │
│   (bd create)   │    │  (immediate)    │    │  (trigger sync) │
└─────────────────┘    └─────────────────┘    └────────┬────────┘
                                                       │
                                              5-second debounce
                                                       │
                                                       v
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Git Commit    │◀───│  JSONL Export   │◀───│  FlushManager   │
│   (git hooks)   │    │  (incremental)  │    │  (background)   │
└─────────────────┘    └─────────────────┘    └─────────────────┘

Command executes: bd create "New feature" writes to SQLite immediately
Mark dirty: The operation marks the database as needing export
Debounce window: Wait 5 seconds for batch operations (configurable)
Export to JSONL: Only changed entities are appended/updated
Git commit: If git hooks are installed, changes auto-commit
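
The debounce step can be sketched as follows. This is a simplified stand-in, not the real FlushManager: the `debouncer` type, its method names, and the 50 ms window (shortened from 5 seconds so the demo finishes quickly) are all assumptions.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// debouncer runs flush once after a quiet period following the last markDirty call.
// It is a hypothetical illustration, not bd's FlushManager.
type debouncer struct {
	mu    sync.Mutex
	delay time.Duration
	timer *time.Timer
	flush func()
}

func newDebouncer(delay time.Duration, flush func()) *debouncer {
	return &debouncer{delay: delay, flush: flush}
}

// markDirty cancels any pending flush and restarts the debounce window.
func (d *debouncer) markDirty() {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.timer != nil {
		d.timer.Stop()
	}
	d.timer = time.AfterFunc(d.delay, d.flush)
}

// flushCountAfterBurst marks dirty n times in quick succession and returns
// how many flushes ran once the debounce window has passed.
func flushCountAfterBurst(n int) int32 {
	var flushes int32
	d := newDebouncer(50*time.Millisecond, func() { atomic.AddInt32(&flushes, 1) })
	for i := 0; i < n; i++ {
		d.markDirty()
	}
	time.Sleep(300 * time.Millisecond)
	return atomic.LoadInt32(&flushes)
}

func main() {
	fmt.Println("flushes after 3 writes:", flushCountAfterBurst(3))
}
```

The point of the pattern is that a burst of writes results in a single export rather than one export per write.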

Key implementation:

Export: cmd/bd/export.go, cmd/bd/autoflush.go
FlushManager: internal/flush/ (see INTERNALS.md)
Dirty tracking: internal/storage/sqlite/dirty_issues.go

Read Path

When you query issues after a git pull:

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   git pull      │───▶│  Auto-Import    │───▶│  SQLite Update  │
│   (new JSONL)   │    │  (on next cmd)  │    │  (merge logic)  │
└─────────────────┘    └─────────────────┘    └────────┬────────┘
                                                       │
                                                       v
                                               ┌─────────────────┐
                                               │  CLI Query      │
                                               │  (bd ready)     │
                                               └─────────────────┘

Git pull: Fetches updated JSONL from remote
Auto-import detection: The next bd command checks whether the JSONL is newer than the DB
Import to SQLite: Parse JSONL, merge with local state using content hashes
Query: Commands read from fast local SQLite
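
A minimal sketch of the staleness check in the second step, assuming it compares file modification times. That is an assumption for illustration only: the real logic in internal/autoimport/ may use other signals, and `needsImport` is a hypothetical name.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// needsImport reports whether the JSONL file was modified more recently than the DB.
// Comparing mtimes is an assumption made for this sketch.
func needsImport(jsonlPath, dbPath string) (bool, error) {
	jsonlInfo, err := os.Stat(jsonlPath)
	if errors.Is(err, fs.ErrNotExist) {
		return false, nil // no JSONL, so nothing to import
	}
	if err != nil {
		return false, err
	}
	dbInfo, err := os.Stat(dbPath)
	if errors.Is(err, fs.ErrNotExist) {
		return true, nil // fresh clone: no DB yet, so import everything
	}
	if err != nil {
		return false, err
	}
	return jsonlInfo.ModTime().After(dbInfo.ModTime()), nil
}

func main() {
	stale, err := needsImport(".beads/issues.jsonl", ".beads/beads.db")
	if err != nil {
		fmt.Println("stat failed:", err)
		return
	}
	fmt.Println("import needed:", stale)
}
```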

Key implementation:

Import: cmd/bd/import.go, cmd/bd/autoimport.go
Auto-import logic: internal/autoimport/autoimport.go
Collision detection: internal/importer/importer.go

Hash-Based Collision Prevention

The key insight that enables distributed operation: content-based hashing for deduplication.

The Problem

Sequential IDs (bd-1, bd-2, bd-3) cause collisions when multiple agents create issues concurrently:

Branch A: bd create "Add OAuth"   → bd-10
Branch B: bd create "Add Stripe"  → bd-10 (collision!)

The Solution

Hash-based IDs derived from random UUIDs make collisions between concurrent creators highly unlikely:

Branch A: bd create "Add OAuth"   → bd-a1b2
Branch B: bd create "Add Stripe"  → bd-f14c (no collision)

How It Works

Issue creation: Generate random UUID, derive short hash as ID
Progressive scaling: IDs start at 4 chars, grow to 5-6 chars as database grows
Content hashing: Each issue has a content hash for change detection
Import merge: Same ID + different content = update, same ID + same content = skip
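
The first two steps might look roughly like this. The sketch uses 16 random bytes as a stand-in for a UUID, SHA-256 as the hash, hex as the ID alphabet, and made-up thresholds for when the ID grows; none of these choices are confirmed by this document.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// idLength picks a hash length from the current issue count.
// The thresholds are illustrative assumptions, not bd's real values.
func idLength(issueCount int) int {
	switch {
	case issueCount < 500:
		return 4
	case issueCount < 5000:
		return 5
	default:
		return 6
	}
}

// newID hashes 16 random bytes (the size of a UUID) and keeps the
// first few hex characters of the digest as the short ID.
func newID(prefix string, issueCount int) (string, error) {
	seed := make([]byte, 16)
	if _, err := rand.Read(seed); err != nil {
		return "", err
	}
	sum := sha256.Sum256(seed)
	return prefix + "-" + hex.EncodeToString(sum[:])[:idLength(issueCount)], nil
}

func main() {
	for _, count := range []int{10, 1000, 100000} {
		id, err := newID("bd", count)
		if err != nil {
			panic(err)
		}
		fmt.Println(count, "issues ->", id)
	}
}
```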

┌─────────────────────────────────────────────────────────────────┐
│                        Import Logic                              │
│                                                                  │
│  For each issue in JSONL:                                       │
│    1. Compute content hash                                       │
│    2. Look up existing issue by ID                               │
│    3. Compare hashes:                                            │
│       - Same hash → skip (already imported)                      │
│       - Different hash → update (newer version)                  │
│       - No match → create (new issue)                            │
└─────────────────────────────────────────────────────────────────┘
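
The decision table in the box can be sketched in Go as below. The `issue` type, the choice of hashed fields, and the in-memory map standing in for SQLite are assumptions. The sketch also treats any differing hash as an update and does not try to decide which side is newer.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// issue is a hypothetical, trimmed-down record.
type issue struct {
	ID, Title, Status string
}

// contentHash hashes the fields that count as content (a hypothetical selection).
func contentHash(is issue) string {
	sum := sha256.Sum256([]byte(is.Title + "\x00" + is.Status))
	return hex.EncodeToString(sum[:])
}

// importIssue applies one incoming record to db and reports the action taken.
func importIssue(db map[string]issue, in issue) string {
	cur, ok := db[in.ID]
	switch {
	case !ok:
		db[in.ID] = in
		return "create"
	case contentHash(cur) == contentHash(in):
		return "skip"
	default:
		db[in.ID] = in
		return "update"
	}
}

func main() {
	db := map[string]issue{}
	incoming := []issue{
		{ID: "bd-a1b2", Title: "Add OAuth", Status: "open"},
		{ID: "bd-a1b2", Title: "Add OAuth", Status: "open"},
		{ID: "bd-a1b2", Title: "Add OAuth", Status: "closed"},
	}
	for _, in := range incoming {
		fmt.Println(in.ID, in.Status, "->", importIssue(db, in))
	}
}
```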

This eliminates the need for central coordination while ensuring all machines converge to the same state.

See COLLISION_MATH.md for birthday paradox calculations on hash length vs collision probability.
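
As a rough illustration of why IDs grow with the database, the standard birthday approximation p ≈ 1 - exp(-n(n-1) / 2N) can be evaluated directly. The sketch assumes a hex alphabet (16 values per character), based only on the example IDs above; the real alphabet and the figures in COLLISION_MATH.md may differ.

```go
package main

import (
	"fmt"
	"math"
)

// collisionProbability approximates the chance that at least two of n random IDs
// collide when each ID is drawn uniformly from space possible values.
func collisionProbability(n int, space float64) float64 {
	x := float64(n) * float64(n-1) / (2 * space)
	return -math.Expm1(-x) // equals 1 - exp(-x), computed accurately for small x
}

func main() {
	for _, chars := range []int{4, 5, 6} {
		space := math.Pow(16, float64(chars)) // assumes hex characters
		fmt.Printf("%d chars, 500 issues: p = %.4f\n", chars, collisionProbability(500, space))
	}
}
```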

Daemon Architecture

Each workspace runs its own background daemon for auto-sync:

┌─────────────────────────────────────────────────────────────────┐
│                     Per-Workspace Daemon                         │
│                                                                  │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐         │
│  │ RPC Server  │    │  Auto-Sync  │    │  Background │         │
│  │ (bd.sock)   │    │  Manager    │    │  Tasks      │         │
│  └─────────────┘    └─────────────┘    └─────────────┘         │
│         │                  │                  │                  │
│         └──────────────────┴──────────────────┘                  │
│                            │                                     │
│                            v                                     │
│                   ┌─────────────┐                                │
│                   │   SQLite    │                                │
│                   │   Database  │                                │
│                   └─────────────┘                                │
└─────────────────────────────────────────────────────────────────┘

     CLI commands ───RPC───▶ Daemon ───SQL───▶ Database
                              or
     CLI commands ───SQL───▶ Database (if daemon unavailable)

Why daemons?

Batches multiple operations before export
Holds database connection open (faster queries)
Coordinates auto-sync timing
One daemon per workspace (LSP-like model)

Communication:

Unix domain socket at .beads/bd.sock (Windows: named pipes)
Protocol defined in internal/rpc/protocol.go
CLI tries daemon first, falls back to direct DB access
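
The daemon-first, direct-fallback behavior can be sketched like this. It only probes the socket; the real client would go on to speak the protocol from internal/rpc/protocol.go. `chooseMode` and the 200 ms timeout are assumptions.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// chooseMode returns "daemon" when the Unix socket accepts a connection,
// and "direct" (open the database in-process) otherwise.
func chooseMode(sockPath string) string {
	conn, err := net.DialTimeout("unix", sockPath, 200*time.Millisecond)
	if err != nil {
		return "direct" // daemon unavailable: fall back to direct DB access
	}
	conn.Close()
	return "daemon"
}

func main() {
	fmt.Println("mode:", chooseMode(".beads/bd.sock"))
}
```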

Lifecycle:

Auto-starts on first bd command (unless BEADS_NO_DAEMON=1)
Auto-restarts after version upgrades
Managed via bd daemons command

See DAEMON.md for operational details.

Data Types

Core types in internal/types/types.go:

Type	Description	Key Fields
Issue	Work item	ID, Title, Description, Status, Priority, Type
Dependency	Relationship	FromID, ToID, Type (blocks/related/parent-child/discovered-from)
Label	Tag	Name, Color, Description
Comment	Discussion	IssueID, Author, Content, Timestamp
Event	Audit trail	IssueID, Type, Data, Timestamp

Dependency Types

Type	Semantic	Affects `bd ready`?
`blocks`	Issue X must close before Y starts	Yes
`parent-child`	Hierarchical (epic/subtask)	Yes (children blocked if parent blocked)
`related`	Soft link for reference	No
`discovered-from`	Found during work on parent	No

Status Flow

open ──▶ in_progress ──▶ closed
  │                        │
  └────────────────────────┘
         (reopen)
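
The diagram can be read as a small transition table. The sketch below encodes only the three edges drawn above; bd itself may permit other transitions, and `canTransition` is a hypothetical helper.

```go
package main

import "fmt"

// allowed lists only the transitions drawn in the diagram.
var allowed = map[string][]string{
	"open":        {"in_progress"},
	"in_progress": {"closed"},
	"closed":      {"open"}, // reopen
}

// canTransition reports whether the diagram contains an edge from -> to.
func canTransition(from, to string) bool {
	for _, s := range allowed[from] {
		if s == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println("open -> in_progress:", canTransition("open", "in_progress"))
	fmt.Println("closed -> open:", canTransition("closed", "open"))
	fmt.Println("closed -> in_progress:", canTransition("closed", "in_progress"))
}
```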

Directory Structure

.beads/
├── beads.db          # SQLite database (gitignored)
├── issues.jsonl      # JSONL source of truth (git-tracked)
├── bd.sock           # Daemon socket (gitignored)
├── daemon.log        # Daemon logs (gitignored)
├── config.yaml       # Project config (optional)
└── export_hashes.db  # Export tracking (gitignored)

Key Code Paths

Area	Files
CLI entry	`cmd/bd/main.go`
Storage interface	`internal/storage/storage.go`
SQLite implementation	`internal/storage/sqlite/`
RPC protocol	`internal/rpc/protocol.go`, `server_*.go`
Export logic	`cmd/bd/export.go`, `autoflush.go`
Import logic	`cmd/bd/import.go`, `internal/importer/`
Auto-sync	`internal/autoimport/`, `internal/flush/`

Wisps and Molecules

Molecules are template work items that define structured workflows. When spawned, they create wisps - ephemeral child issues that track execution steps.

For full documentation on the molecular chemistry metaphor (protos, pour, bond, squash, burn), see MOLECULES.md.

Wisp Lifecycle

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│ bd wisp create  │───▶│  Wisp Issues    │───▶│  bd mol squash  │
│ (from template) │    │  (local-only)   │    │  (→ digest)     │
└─────────────────┘    └─────────────────┘    └─────────────────┘

Create: Create wisps from a molecule template
Execute: Agent works through wisp steps (local SQLite only)
Squash: Compress wisps into a permanent digest issue

Why Wisps Don't Sync

Wisps are intentionally local-only:

They exist only in the spawning agent's SQLite database
They are never exported to JSONL
They cannot resurrect from other clones (they were never there)
They are hard-deleted when squashed (no tombstones needed)

This design enables:

Fast local iteration: No sync overhead during execution
Clean history: Only the digest (outcome) enters git
Agent isolation: Each agent's execution trace is private
Bounded storage: Wisps don't accumulate across clones

Wisp vs Regular Issue Deletion

Aspect	Regular Issues	Wisps
Exported to JSONL	Yes	No
Tombstone on delete	Yes	No
Can resurrect	Yes (without tombstone)	No (never synced)
Deletion method	`CreateTombstone()`	`DeleteIssue()` (hard delete)

The bd mol squash command uses hard delete intentionally - tombstones would be wasted overhead for data that never leaves the local database.
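
A minimal sketch of these semantics, using an in-memory map in place of SQLite. The `Wisp` flag, `exportableIDs`, and `squash` are hypothetical names for illustration and do not reflect bd's storage API.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// issue is a hypothetical record; Wisp marks local-only execution steps.
type issue struct {
	ID, Title string
	Wisp      bool
}

// exportableIDs returns the IDs that would be written to JSONL: everything except wisps.
func exportableIDs(db map[string]issue) []string {
	var ids []string
	for id, is := range db {
		if !is.Wisp {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids)
	return ids
}

// squash hard-deletes the given wisps and stores one permanent digest issue.
func squash(db map[string]issue, digestID string, wispIDs []string) issue {
	var titles []string
	for _, id := range wispIDs {
		w, ok := db[id]
		if !ok || !w.Wisp {
			continue // only wisps are squashed
		}
		titles = append(titles, w.Title)
		delete(db, id) // hard delete: no tombstone is written
	}
	digest := issue{ID: digestID, Title: "Digest: " + strings.Join(titles, "; ")}
	db[digestID] = digest
	return digest
}

func main() {
	db := map[string]issue{
		"bd-a1b2": {ID: "bd-a1b2", Title: "Add OAuth"},
		"w-1":     {ID: "w-1", Title: "step 1", Wisp: true},
		"w-2":     {ID: "w-2", Title: "step 2", Wisp: true},
	}
	fmt.Println("exportable before:", exportableIDs(db))
	digest := squash(db, "bd-f14c", []string{"w-1", "w-2"})
	fmt.Println("digest:", digest.Title)
	fmt.Println("exportable after:", exportableIDs(db))
}
```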

Future Directions

Separate wisp repo: Keep wisps in a dedicated ephemeral git repo
Digest migration: Explicit step to promote digests to main repo
Wisp retention: Option to preserve wisps in local git history

Related Documentation

MOLECULES.md - Molecular chemistry metaphor (protos, pour, bond, squash, burn)
INTERNALS.md - FlushManager, Blocked Cache implementation details
DAEMON.md - Daemon management and configuration
EXTENDING.md - Adding custom tables to SQLite
TROUBLESHOOTING.md - Recovery procedures and common issues
FAQ.md - Common questions about the architecture
COLLISION_MATH.md - Hash collision probability analysis
