Files
beads/docs/ARCHITECTURE.md
Steve Yegge 15cfef0247 Update documentation for bd mol pour/wisp command structure
Updated all documentation and help text to reflect:
- bd pour → bd mol pour
- bd ephemeral → bd mol wisp
- bd ephemeral list → bd mol wisp list
- bd ephemeral gc → bd mol wisp gc

Files updated:
- docs/MOLECULES.md
- docs/CLI_REFERENCE.md
- docs/ARCHITECTURE.md
- docs/DELETIONS.md
- skills/beads/references/MOLECULES.md
- cmd/bd/mol_catalog.go
- cmd/bd/mol_current.go
- cmd/bd/mol_distill.go
- cmd/bd/cook.go
2025-12-26 23:52:59 -08:00

17 KiB

Architecture

This document describes bd's overall architecture - the data model, sync mechanism, and how components fit together. For internal implementation details (FlushManager, Blocked Cache), see INTERNALS.md.

The Three-Layer Data Model

bd's core design enables a distributed, git-backed issue tracker that feels like a centralized database. The "magic" comes from three synchronized layers:

┌─────────────────────────────────────────────────────────────────┐
│                        CLI Layer                                 │
│                                                                  │
│  bd create, list, update, close, ready, show, dep, sync, ...    │
│  - Cobra commands in cmd/bd/                                     │
│  - All commands support --json for programmatic use              │
│  - Tries daemon RPC first, falls back to direct DB access        │
└──────────────────────────────┬──────────────────────────────────┘
                               │
                               v
┌─────────────────────────────────────────────────────────────────┐
│                     SQLite Database                              │
│                     (.beads/beads.db)                            │
│                                                                  │
│  - Local working copy (gitignored)                               │
│  - Fast queries, indexes, foreign keys                           │
│  - Issues, dependencies, labels, comments, events                │
│  - Each machine has its own copy                                 │
└──────────────────────────────┬──────────────────────────────────┘
                               │
                         auto-sync
                        (5s debounce)
                               │
                               v
┌─────────────────────────────────────────────────────────────────┐
│                       JSONL File                                 │
│                   (.beads/issues.jsonl)                          │
│                                                                  │
│  - Git-tracked source of truth                                   │
│  - One JSON line per entity (issue, dep, label, comment)         │
│  - Merge-friendly: additions rarely conflict                     │
│  - Shared across machines via git push/pull                      │
└──────────────────────────────┬──────────────────────────────────┘
                               │
                          git push/pull
                               │
                               v
┌─────────────────────────────────────────────────────────────────┐
│                     Remote Repository                            │
│                    (GitHub, GitLab, etc.)                        │
│                                                                  │
│  - Stores JSONL as part of normal repo history                   │
│  - All collaborators share the same issue database               │
│  - Protected branch support via separate sync branch             │
└─────────────────────────────────────────────────────────────────┘

Why This Design?

SQLite for speed: Local queries complete in milliseconds. Complex dependency graphs, full-text search, and joins are fast.

JSONL for git: One entity per line means git diffs are readable and merges usually succeed automatically. No binary database files in version control.

Git for distribution: No special sync server needed. Issues travel with your code. Offline work just works.

Write Path

When you create or modify an issue:

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   CLI Command   │───▶│  SQLite Write   │───▶│  Mark Dirty     │
│   (bd create)   │    │  (immediate)    │    │  (trigger sync) │
└─────────────────┘    └─────────────────┘    └────────┬────────┘
                                                       │
                                              5-second debounce
                                                       │
                                                       v
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Git Commit    │◀───│  JSONL Export   │◀───│  FlushManager   │
│   (git hooks)   │    │  (incremental)  │    │  (background)   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
  1. Command executes: bd create "New feature" writes to SQLite immediately
  2. Mark dirty: The operation marks the database as needing export
  3. Debounce window: Wait 5 seconds for batch operations (configurable)
  4. Export to JSONL: Only changed entities are appended/updated
  5. Git commit: If git hooks are installed, changes auto-commit

Key implementation:

  • Export: cmd/bd/export.go, cmd/bd/autoflush.go
  • FlushManager: internal/flush/ (see INTERNALS.md)
  • Dirty tracking: internal/storage/sqlite/dirty_issues.go

Read Path

When you query issues after a git pull:

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   git pull      │───▶│  Auto-Import    │───▶│  SQLite Update  │
│   (new JSONL)   │    │  (on next cmd)  │    │  (merge logic)  │
└─────────────────┘    └─────────────────┘    └────────┬────────┘
                                                       │
                                                       v
                                               ┌─────────────────┐
                                               │  CLI Query      │
                                               │  (bd ready)     │
                                               └─────────────────┘
  1. Git pull: Fetches updated JSONL from remote
  2. Auto-import detection: First bd command checks if JSONL is newer than DB
  3. Import to SQLite: Parse JSONL, merge with local state using content hashes
  4. Query: Commands read from fast local SQLite

Key implementation:

  • Import: cmd/bd/import.go, cmd/bd/autoimport.go
  • Auto-import logic: internal/autoimport/autoimport.go
  • Collision detection: internal/importer/importer.go

Hash-Based Collision Prevention

The key insight that enables distributed operation: content-based hashing for deduplication.

The Problem

Sequential IDs (bd-1, bd-2, bd-3) cause collisions when multiple agents create issues concurrently:

Branch A: bd create "Add OAuth"   → bd-10
Branch B: bd create "Add Stripe"  → bd-10 (collision!)

The Solution

Hash-based IDs derived from random UUIDs ensure uniqueness:

Branch A: bd create "Add OAuth"   → bd-a1b2
Branch B: bd create "Add Stripe"  → bd-f14c (no collision)

How It Works

  1. Issue creation: Generate random UUID, derive short hash as ID
  2. Progressive scaling: IDs start at 4 chars, grow to 5-6 chars as database grows
  3. Content hashing: Each issue has a content hash for change detection
  4. Import merge: Same ID + different content = update, same ID + same content = skip
┌─────────────────────────────────────────────────────────────────┐
│                        Import Logic                              │
│                                                                  │
│  For each issue in JSONL:                                       │
│    1. Compute content hash                                       │
│    2. Look up existing issue by ID                               │
│    3. Compare hashes:                                            │
│       - Same hash → skip (already imported)                      │
│       - Different hash → update (newer version)                  │
│       - No match → create (new issue)                            │
└─────────────────────────────────────────────────────────────────┘

This eliminates the need for central coordination while ensuring all machines converge to the same state.

See COLLISION_MATH.md for birthday paradox calculations on hash length vs collision probability.

Daemon Architecture

Each workspace runs its own background daemon for auto-sync:

┌─────────────────────────────────────────────────────────────────┐
│                     Per-Workspace Daemon                         │
│                                                                  │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐         │
│  │ RPC Server  │    │  Auto-Sync  │    │  Background │         │
│  │ (bd.sock)   │    │  Manager    │    │  Tasks      │         │
│  └─────────────┘    └─────────────┘    └─────────────┘         │
│         │                  │                  │                  │
│         └──────────────────┴──────────────────┘                  │
│                            │                                     │
│                            v                                     │
│                   ┌─────────────┐                                │
│                   │   SQLite    │                                │
│                   │   Database  │                                │
│                   └─────────────┘                                │
└─────────────────────────────────────────────────────────────────┘

     CLI commands ───RPC───▶ Daemon ───SQL───▶ Database
                              or
     CLI commands ───SQL───▶ Database (if daemon unavailable)

Why daemons?

  • Batches multiple operations before export
  • Holds database connection open (faster queries)
  • Coordinates auto-sync timing
  • One daemon per workspace (LSP-like model)

Communication:

  • Unix domain socket at .beads/bd.sock (Windows: named pipes)
  • Protocol defined in internal/rpc/protocol.go
  • CLI tries daemon first, falls back to direct DB access

Lifecycle:

  • Auto-starts on first bd command (unless BEADS_NO_DAEMON=1)
  • Auto-restarts after version upgrades
  • Managed via bd daemons command

See DAEMON.md for operational details.

Data Types

Core types in internal/types/types.go:

Type Description Key Fields
Issue Work item ID, Title, Description, Status, Priority, Type
Dependency Relationship FromID, ToID, Type (blocks/related/parent-child/discovered-from)
Label Tag Name, Color, Description
Comment Discussion IssueID, Author, Content, Timestamp
Event Audit trail IssueID, Type, Data, Timestamp

Dependency Types

Type Semantic Affects bd ready?
blocks Issue X must close before Y starts Yes
parent-child Hierarchical (epic/subtask) Yes (children blocked if parent blocked)
related Soft link for reference No
discovered-from Found during work on parent No

Status Flow

open ──▶ in_progress ──▶ closed
  │                        │
  └────────────────────────┘
         (reopen)

Directory Structure

.beads/
├── beads.db          # SQLite database (gitignored)
├── issues.jsonl      # JSONL source of truth (git-tracked)
├── bd.sock           # Daemon socket (gitignored)
├── daemon.log        # Daemon logs (gitignored)
├── config.yaml       # Project config (optional)
└── export_hashes.db  # Export tracking (gitignored)

Key Code Paths

Area Files
CLI entry cmd/bd/main.go
Storage interface internal/storage/storage.go
SQLite implementation internal/storage/sqlite/
RPC protocol internal/rpc/protocol.go, server_*.go
Export logic cmd/bd/export.go, autoflush.go
Import logic cmd/bd/import.go, internal/importer/
Auto-sync internal/autoimport/, internal/flush/

Wisps and Molecules

Molecules are template work items that define structured workflows. When spawned, they create wisps - ephemeral child issues that track execution steps.

For full documentation on the molecular chemistry metaphor (protos, pour, bond, squash, burn), see MOLECULES.md.

Wisp Lifecycle

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   bd mol wisp       │───▶│  Wisp Issues    │───▶│  bd mol squash  │
│ (from template) │    │  (local-only)   │    │  (→ digest)     │
└─────────────────┘    └─────────────────┘    └─────────────────┘
  1. Create: Create wisps from a molecule template
  2. Execute: Agent works through wisp steps (local SQLite only)
  3. Squash: Compress wisps into a permanent digest issue

Why Wisps Don't Sync

Wisps are intentionally local-only:

  • They exist only in the spawning agent's SQLite database
  • They are never exported to JSONL
  • They cannot resurrect from other clones (they were never there)
  • They are hard-deleted when squashed (no tombstones needed)

This design enables:

  • Fast local iteration: No sync overhead during execution
  • Clean history: Only the digest (outcome) enters git
  • Agent isolation: Each agent's execution trace is private
  • Bounded storage: Wisps don't accumulate across clones

Wisp vs Regular Issue Deletion

Aspect Regular Issues Wisps
Exported to JSONL Yes No
Tombstone on delete Yes No
Can resurrect Yes (without tombstone) No (never synced)
Deletion method CreateTombstone() DeleteIssue() (hard delete)

The bd mol squash command uses hard delete intentionally - tombstones would be wasted overhead for data that never leaves the local database.

Future Directions

  • Separate wisp repo: Keep wisps in a dedicated ephemeral git repo
  • Digest migration: Explicit step to promote digests to main repo
  • Wisp retention: Option to preserve wisps in local git history