chore: add design docs and ready command

- Add convoy-lifecycle.md design doc
- Add formula-resolution.md design doc
- Add mol-mall-design.md design doc
- Add ready.go command implementation
- Move dog-pool-architecture.md to docs/design/
- Update .gitignore for beads sync files

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Commit: 791b388a93 (parent e7b0af0295)
Author: mayor, 2026-01-13 03:18:09 -08:00
Committed by: beads/crew/emma
6 changed files with 1214 additions and 0 deletions

@@ -0,0 +1,197 @@
# Convoy Lifecycle Design
> Making convoys actively converge on completion.
## Problem Statement
Convoys are passive trackers. They group work but don't drive it. The completion
loop has a structural gap:
```
Create → Assign → Execute → Issues close → ??? → Convoy closes
```
The `???` is "Deacon patrol runs `gt convoy check`" - a poll-based single point of
failure. When Deacon is down, convoys don't close. Work completes but the loop
never lands.
## Current State
### What Works
- Convoy creation and issue tracking
- `gt convoy status` shows progress
- `gt convoy stranded` finds unassigned work
- `gt convoy check` auto-closes completed convoys
### What Breaks
1. **Poll-based completion**: Only Deacon runs `gt convoy check`
2. **No event-driven trigger**: Issue close doesn't propagate to convoy
3. **No manual close**: Can't force-close abandoned convoys
4. **Single observer**: No redundant completion detection
5. **Weak notification**: Convoy owner not always clear
## Design: Active Convoy Convergence
### Principle: Event-Driven, Redundantly Observed
Convoy completion should be:
1. **Event-driven**: Triggered by issue close, not polling
2. **Redundantly observed**: Multiple agents can detect and close
3. **Manually overridable**: Humans can force-close
### Event-Driven Completion
When an issue closes, check if it's tracked by a convoy:
```
Issue closes
     │
     ▼
Is issue tracked by a convoy? ──(no)──► done
     │
   (yes)
     ▼
Run gt convoy check <convoy-id>
     │
     ▼
All tracked issues closed? ──(no)──► done
     │
   (yes)
     ▼
Close convoy, send notifications
```
**Implementation options:**
1. Daemon hook on `bd update --status=closed`
2. Refinery step after successful merge
3. Witness step after verifying polecat completion
Option 1 is the most reliable: it catches all closes regardless of source.
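A minimal sketch of the option 1 hook, assuming hypothetical names (`OnIssueClosed`, `ConvoyTracking`, `runConvoyCheck`); the real hook would call into existing gt internals rather than shelling out:

```go
package main

import "fmt"

// ConvoyTracking maps issue IDs to the convoy (if any) tracking them.
// Hypothetical type; the real daemon would query beads state instead.
type ConvoyTracking map[string]string

// OnIssueClosed fires whenever `bd update --status=closed` lands. It
// returns the convoy ID that was checked, or "" for untracked issues.
func OnIssueClosed(tracking ConvoyTracking, issueID string) string {
	convoyID, tracked := tracking[issueID]
	if !tracked {
		return "" // untracked issue: nothing to do
	}
	// Closing is idempotent, so racing with other observers
	// (Witness, Deacon) is safe.
	runConvoyCheck(convoyID)
	return convoyID
}

// runConvoyCheck stands in for invoking `gt convoy check <convoy-id>`.
func runConvoyCheck(convoyID string) {
	fmt.Printf("gt convoy check %s\n", convoyID)
}
```

Because the check is idempotent, this hook can coexist with the Witness and Deacon observers without coordination.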
### Redundant Observers
Per PRIMING.md: "Redundant Monitoring Is Resilience."
Three places should check convoy completion:
| Observer | When | Scope |
|----------|------|-------|
| **Daemon** | On any issue close | All convoys |
| **Witness** | After verifying polecat work | Rig's convoy work |
| **Deacon** | Periodic patrol | All convoys (backup) |
Any observer noticing completion triggers close. Idempotent - closing
an already-closed convoy is a no-op.
### Manual Close Command
**Desire path**: `gt convoy close` is expected but missing.
```bash
# Close a completed convoy
gt convoy close hq-cv-abc
# Force-close an abandoned convoy
gt convoy close hq-cv-xyz --reason="work done differently"
# Close with explicit notification
gt convoy close hq-cv-abc --notify mayor/
```
Use cases:
- Abandoned convoys no longer relevant
- Work completed outside tracked path
- Force-closing stuck convoys
### Convoy Owner/Requester
Track who requested the convoy for targeted notifications:
```bash
gt convoy create "Feature X" gt-abc --owner mayor/ --notify overseer
```
| Field | Purpose |
|-------|---------|
| `owner` | Who requested (gets completion notification) |
| `notify` | Additional subscribers |
If `owner` is not specified, it defaults to the creator (from `created_by`).
### Convoy States
```
OPEN ──(all issues close)──► CLOSED
  ▲                             │
  │                             ▼
  │                        (add issues)
  │                             │
  └─────────────────────────────┘
           (auto-reopens)
```
Adding issues to a closed convoy reopens it automatically.
**New state for abandonment:**
```
OPEN ──► CLOSED (completed)
└────► ABANDONED (force-closed without completion)
```
### Timeout/SLA (Future)
Optional `due_at` field for convoy deadline:
```bash
gt convoy create "Sprint work" gt-abc --due="2026-01-15"
```
Overdue convoys surface in `gt convoy stranded --overdue`.
## Commands
### New: `gt convoy close`
```bash
gt convoy close <convoy-id> [--reason=<reason>] [--notify=<agent>]
```
- Closes convoy regardless of tracked issue status
- Sets `close_reason` field
- Sends notification to owner and subscribers
- Idempotent - closing a closed convoy is a no-op
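These semantics can be sketched as a small transition function; `Convoy` and its fields are simplified stand-ins for the real data model, and mapping a non-empty `--reason` to the ABANDONED state is an assumption, not settled design:

```go
package main

// Convoy is a simplified stand-in for the real convoy record.
type Convoy struct {
	ID          string
	State       string // "open" | "closed" | "abandoned"
	CloseReason string
}

// CloseConvoy force-closes a convoy regardless of tracked issue status.
// It returns false for an already-closed convoy: a no-op, so any
// observer may call it without coordination or duplicate notifications.
func CloseConvoy(c *Convoy, reason string) bool {
	if c.State != "open" {
		return false // already closed: idempotent no-op
	}
	if reason != "" {
		// Assumption: an explicit reason marks a force-close
		// without completion, i.e. the ABANDONED state.
		c.State = "abandoned"
		c.CloseReason = reason
	} else {
		c.State = "closed"
	}
	// Notification to owner and subscribers would fire here.
	return true
}
```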
### Enhanced: `gt convoy check`
```bash
# Check all convoys (current behavior)
gt convoy check
# Check specific convoy (new)
gt convoy check <convoy-id>
# Dry-run mode
gt convoy check --dry-run
```
### New: `gt convoy reopen`
```bash
gt convoy reopen <convoy-id>
```
Explicit reopen for clarity (currently reopening happens implicitly when issues are added).
## Implementation Priority
1. **P0: `gt convoy close`** - Desire path, escape hatch
2. **P0: Event-driven check** - Daemon hook on issue close
3. **P1: Redundant observers** - Witness/Refinery integration
4. **P2: Owner field** - Targeted notifications
5. **P3: Timeout/SLA** - Deadline tracking
## Related
- [convoy.md](../concepts/convoy.md) - Convoy concept and usage
- [watchdog-chain.md](watchdog-chain.md) - Deacon patrol system
- [mail-protocol.md](mail-protocol.md) - Notification delivery

@@ -0,0 +1,495 @@
# Dog Pool Architecture for Concurrent Shutdown Dances
> Design document for gt-fsld8
## Problem Statement
Boot needs to run multiple shutdown-dance molecules concurrently when multiple death
warrants are issued. The current hook design only allows one molecule per agent.
Example scenario:
- Warrant 1: Kill stuck polecat Toast (60s into interrogation)
- Warrant 2: Kill stuck polecat Shadow (just started)
- Warrant 3: Kill stuck witness (120s into interrogation)
All three need concurrent tracking, independent timeouts, and separate outcomes.
## Design Decision: Lightweight State Machines
After analyzing the options, the shutdown-dance does NOT need Claude sessions.
The dance is a deterministic state machine:
```
WARRANT -> INTERROGATE -> EVALUATE -> PARDON|EXECUTE
```
Each step is mechanical:
1. Send a tmux message (no LLM needed)
2. Wait for timeout or response (timer)
3. Check tmux output for ALIVE keyword (string match)
4. Repeat or terminate
**Decision**: Dogs are lightweight Go routines, not Claude sessions.
## Architecture Overview
```
┌────────────────────────────────────────────────────────────────────┐
│ BOOT │
│ (Claude session in tmux) │
│ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ Dog Manager │ │
│ │ │ │
│ │ Pool: [Dog1, Dog2, Dog3, ...] (goroutines + state files) │ │
│ │ │ │
│ │ allocate() → Dog │ │
│ │ release(Dog) │ │
│ │ status() → []DogStatus │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ │
│ Boot's job: │
│ - Watch for warrants (file or event) │
│ - Allocate dog from pool │
│ - Monitor dog progress │
│ - Handle dog completion/failure │
│ - Report results │
└────────────────────────────────────────────────────────────────────┘
```
## Dog Structure
```go
// Dog represents a shutdown-dance executor
type Dog struct {
	ID        string             // Unique ID (e.g., "dog-1704567890123")
	Warrant   *Warrant           // The death warrant being processed
	State     ShutdownDanceState
	Attempt   int                // Current interrogation attempt (1-3)
	StartedAt time.Time
	StateFile string             // Persistent state: ~/gt/deacon/dogs/active/<id>.json
}

type ShutdownDanceState string

const (
	StateIdle          ShutdownDanceState = "idle"
	StateInterrogating ShutdownDanceState = "interrogating" // Sent message, waiting
	StateEvaluating    ShutdownDanceState = "evaluating"    // Checking response
	StatePardoned      ShutdownDanceState = "pardoned"      // Session responded
	StateExecuting     ShutdownDanceState = "executing"     // Killing session
	StateComplete      ShutdownDanceState = "complete"      // Done, ready for cleanup
	StateFailed        ShutdownDanceState = "failed"        // Dog crashed/errored
)

type Warrant struct {
	ID        string // Bead ID for the warrant
	Target    string // Session to interrogate (e.g., "gt-gastown-Toast")
	Reason    string // Why warrant was issued
	Requester string // Who filed the warrant
	FiledAt   time.Time
}
```
## Pool Design
### Fixed Pool Size
**Decision**: Fixed pool of 5 dogs, configurable via environment.
Rationale:
- Dynamic sizing adds complexity without clear benefit
- 5 concurrent shutdown dances handles worst-case scenarios
- If pool exhausted, warrants queue (better than infinite dog spawning)
- Memory footprint is negligible (goroutines + small state files)
```go
const (
	DefaultPoolSize = 5
	MaxPoolSize     = 20
)

type DogPool struct {
	mu       sync.Mutex
	dogs     []*Dog          // All dogs in pool
	idle     chan *Dog       // Channel of available dogs
	active   map[string]*Dog // ID -> Dog for active dogs
	stateDir string          // ~/gt/deacon/dogs/active/
}

func (p *DogPool) Allocate(warrant *Warrant) (*Dog, error) {
	select {
	case dog := <-p.idle:
		p.mu.Lock() // p.active is shared with Release; guard it here too
		defer p.mu.Unlock()
		dog.Warrant = warrant
		dog.State = StateInterrogating
		dog.Attempt = 1
		dog.StartedAt = time.Now()
		p.active[dog.ID] = dog
		return dog, nil
	default:
		return nil, ErrPoolExhausted
	}
}

func (p *DogPool) Release(dog *Dog) {
	p.mu.Lock()
	defer p.mu.Unlock()
	delete(p.active, dog.ID)
	dog.Reset()
	p.idle <- dog // idle is buffered to pool size, so this never blocks
}
```
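Pool construction is not shown above; one way to pre-fill the `idle` channel (the `Dog` struct is abbreviated here, and `NewDogPool` is an assumed constructor name):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Dog is abbreviated; the full struct also carries Warrant, State,
// Attempt, StartedAt, and StateFile.
type Dog struct {
	ID string
}

type DogPool struct {
	mu     sync.Mutex
	dogs   []*Dog
	idle   chan *Dog
	active map[string]*Dog
}

// NewDogPool creates the fixed pool with every dog idle. Buffering the
// idle channel to the pool size means Release can never block.
func NewDogPool(size int) *DogPool {
	p := &DogPool{
		idle:   make(chan *Dog, size),
		active: make(map[string]*Dog),
	}
	for i := 0; i < size; i++ {
		dog := &Dog{ID: fmt.Sprintf("dog-%d-%d", i, time.Now().UnixNano())}
		p.dogs = append(p.dogs, dog)
		p.idle <- dog // all dogs start idle
	}
	return p
}
```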
### Why Not Dynamic Pool?
Considered but rejected:
- Adding dogs on demand increases complexity
- No clear benefit - warrants rarely exceed 5 concurrent
- If needed, raise DefaultPoolSize
- Simpler to reason about fixed resources
## Communication: State Files + Events
### State Persistence
Each active dog writes state to `~/gt/deacon/dogs/active/<id>.json`:
```json
{
  "id": "dog-1704567890123",
  "warrant": {
    "id": "gt-abc123",
    "target": "gt-gastown-Toast",
    "reason": "no_response_health_check",
    "requester": "deacon",
    "filed_at": "2026-01-07T20:15:00Z"
  },
  "state": "interrogating",
  "attempt": 2,
  "started_at": "2026-01-07T20:15:00Z",
  "last_message_at": "2026-01-07T20:16:00Z",
  "next_timeout": "2026-01-07T20:18:00Z"
}
```
### Boot Monitoring
Boot monitors dogs via:
1. **Polling**: `gt dog status --active` every tick
2. **Completion files**: Dogs write `<id>.done` when complete
```go
type DogResult struct {
	DogID    string
	Warrant  *Warrant
	Outcome  DogOutcome // pardoned | executed | failed
	Duration time.Duration
	Details  string
}

type DogOutcome string

const (
	OutcomePardoned DogOutcome = "pardoned" // Session responded
	OutcomeExecuted DogOutcome = "executed" // Session killed
	OutcomeFailed   DogOutcome = "failed"   // Dog crashed
)
```
### Why Not Mail?
Considered but rejected for dog<->boot communication:
- Mail is async, poll-based - adds latency
- State files are simpler for local coordination
- Dogs don't need complex inter-agent communication
- Keep mail for external coordination (Witness, Mayor)
## Shutdown Dance State Machine
Each dog executes this state machine:
```
┌─────────────────────────────────────────┐
│ │
▼ │
┌───────────────────────────┐ │
│ INTERROGATING │ │
│ │ │
│ 1. Send health check │ │
│ 2. Start timeout timer │ │
└───────────┬───────────────┘ │
│ │
│ timeout or response │
▼ │
┌───────────────────────────┐ │
│ EVALUATING │ │
│ │ │
│ Check tmux output for │ │
│ ALIVE keyword │ │
└───────────┬───────────────┘ │
│ │
┌───────┴───────┐ │
│ │ │
▼ ▼ │
[ALIVE found] [No ALIVE] │
│ │ │
│ │ attempt < 3? │
│ ├──────────────────────────────────→─┘
│ │ yes: attempt++, longer timeout
│ │
│ │ no: attempt == 3
▼ ▼
┌─────────┐ ┌─────────────┐
│ PARDONED│ │ EXECUTING │
│ │ │ │
│ Cancel │ │ Kill tmux │
│ warrant │ │ session │
└────┬────┘ └──────┬──────┘
│ │
└────────┬───────┘
┌────────────────┐
│ COMPLETE │
│ │
│ Write result │
│ Release dog │
└────────────────┘
```
### Timeout Gates
| Attempt | Timeout | Cumulative Wait |
|---------|---------|-----------------|
| 1 | 60s | 60s |
| 2 | 120s | 180s (3 min) |
| 3 | 240s | 420s (7 min) |
### Health Check Message
```
[DOG] HEALTH CHECK: Session {target}, respond ALIVE within {timeout}s or face termination.
Warrant reason: {reason}
Filed by: {requester}
Attempt: {attempt}/3
```
### Response Detection
```go
func (d *Dog) CheckForResponse() bool {
	tm := tmux.NewTmux()
	output, err := tm.CapturePane(d.Warrant.Target, 50) // Last 50 lines
	if err != nil {
		return false
	}
	// Require the explicit ALIVE keyword. Counting any new output as a
	// response would need a baseline capture to diff against.
	return strings.Contains(output, "ALIVE")
}
```
## Dog Implementation
### Not Reusing Polecat Infrastructure
**Decision**: Dogs do NOT reuse polecat infrastructure.
Rationale:
- Polecats are Claude sessions with molecules, hooks, sandboxes
- Dogs are simple state machine executors
- Polecats have 3-layer lifecycle (session/sandbox/slot)
- Dogs have single-layer lifecycle (just state)
- Different resource profiles, different management
What dogs DO share:
- tmux utilities for message sending/capture
- State file patterns
- Name slot allocation pattern (pool of names, not instances)
### Dog Execution Loop
```go
func (d *Dog) Run(ctx context.Context) DogResult {
	d.State = StateInterrogating
	d.saveState()
	for d.Attempt <= 3 {
		// Send interrogation message
		if err := d.sendHealthCheck(); err != nil {
			return d.fail(err)
		}
		// Wait for timeout or context cancellation
		timeout := d.timeoutForAttempt(d.Attempt)
		select {
		case <-ctx.Done():
			return d.fail(ctx.Err())
		case <-time.After(timeout):
			// Timeout reached
		}
		// Evaluate response
		d.State = StateEvaluating
		d.saveState()
		if d.CheckForResponse() {
			// Session is alive
			return d.pardon()
		}
		// No response - try again or execute
		d.Attempt++
		if d.Attempt <= 3 {
			d.State = StateInterrogating
			d.saveState()
		}
	}
	// All attempts exhausted - execute warrant
	return d.execute()
}
```
## Failure Handling
### Dog Crashes Mid-Dance
If a dog crashes (Boot process restarts, system crash):
1. State files persist in `~/gt/deacon/dogs/active/`
2. On Boot restart, scan for orphaned state files
3. Resume or restart based on state:
| State | Recovery Action |
|------------------|------------------------------------|
| interrogating | Restart from current attempt |
| evaluating | Check response, continue |
| executing | Verify kill, mark complete |
| pardoned/complete| Already done, clean up |
```go
func (p *DogPool) RecoverOrphans() error {
	files, _ := filepath.Glob(p.stateDir + "/*.json")
	for _, f := range files {
		state := loadDogState(f)
		if state.State != StateComplete && state.State != StatePardoned {
			dog := p.allocateForRecovery(state)
			go dog.Resume()
		}
	}
	return nil
}
```
### Handling Pool Exhaustion
If all dogs are busy when new warrant arrives:
```go
func (b *Boot) HandleWarrant(warrant *Warrant) error {
	dog, err := b.pool.Allocate(warrant)
	if err == ErrPoolExhausted {
		// Queue the warrant for later processing
		b.warrantQueue.Push(warrant)
		b.log("Warrant %s queued (pool exhausted)", warrant.ID)
		return nil
	}
	go func() {
		result := dog.Run(b.ctx)
		b.handleResult(result)
		b.pool.Release(dog)
		// Check queue for pending warrants
		if next := b.warrantQueue.Pop(); next != nil {
			b.HandleWarrant(next)
		}
	}()
	return nil
}
```
## Directory Structure
```
~/gt/deacon/dogs/
├── boot/ # Boot's working directory
│ ├── CLAUDE.md # Boot context
│ └── .boot-status.json # Boot execution status
├── active/ # Active dog state files
│ ├── dog-123.json # Dog 1 state
│ ├── dog-456.json # Dog 2 state
│ └── ...
├── completed/ # Completed dance records (for audit)
│ ├── dog-789.json # Historical record
│ └── ...
└── warrants/ # Pending warrant queue
├── warrant-abc.json
└── ...
```
## Command Interface
```bash
# Pool status
gt dog pool status
# Output:
# Dog Pool: 3/5 active
# dog-123: interrogating Toast (attempt 2, 45s remaining)
# dog-456: executing Shadow
# dog-789: idle
# Manual dog operations (for debugging)
gt dog pool allocate <warrant-id>
gt dog pool release <dog-id>
# View active dances
gt dog dances
# Output:
# Active Shutdown Dances:
# dog-123 → Toast: Interrogating (2/3), timeout in 45s
# dog-456 → Shadow: Executing warrant
# View warrant queue
gt dog warrants
# Output:
# Pending Warrants: 2
# 1. gt-abc: witness-gastown (stuck_no_progress)
# 2. gt-def: polecat-Copper (crash_loop)
```
## Integration with Existing Dogs
The existing `dog` package (`internal/dog/`) manages Deacon's multi-rig helper dogs.
Those are different from shutdown-dance dogs:
| Aspect | Helper Dogs (existing) | Dance Dogs (new) |
|-----------------|-----------------------------|-----------------------------|
| Purpose | Cross-rig infrastructure | Shutdown dance execution |
| Sessions | Claude sessions | Goroutines (no Claude) |
| Worktrees | One per rig | None |
| Lifecycle | Long-lived, reusable | Ephemeral per warrant |
| State | idle/working | Dance state machine |
**Recommendation**: Use different package to avoid confusion:
- `internal/dog/` - existing helper dogs
- `internal/shutdown/` - shutdown dance pool
## Summary: Answers to Design Questions
| Question | Answer |
|----------|--------|
| How many Dogs in pool? | Fixed: 5 (configurable via GT_DOG_POOL_SIZE) |
| How do Dogs communicate with Boot? | State files + completion markers |
| Are Dogs tmux sessions? | No - goroutines with state machine |
| Reuse polecat infrastructure? | No - too heavyweight, different model |
| What if Dog dies mid-dance? | State file recovery on Boot restart |
## Acceptance Criteria
- [x] Architecture document for Dog pool
- [x] Clear allocation/deallocation protocol
- [x] Failure handling for Dog crashes

docs/formula-resolution.md (new file, 248 lines)

@@ -0,0 +1,248 @@
# Formula Resolution Architecture
> Where formulas live, how they're found, and how they'll scale to Mol Mall
## The Problem
Formulas currently exist in multiple locations with no clear precedence:
- `.beads/formulas/` (source of truth for a project)
- `internal/formula/formulas/` (embedded copy for `go install`)
- Crew directories have their own `.beads/formulas/` (diverging copies)
When an agent runs `bd cook mol-polecat-work`, which version do they get?
## Design Goals
1. **Predictable resolution** - Clear precedence rules
2. **Local customization** - Override system defaults without forking
3. **Project-specific formulas** - Committed workflows for collaborators
4. **Mol Mall ready** - Architecture supports remote formula installation
5. **Federation ready** - Formulas are shareable across towns via HOP (Highway Operations Protocol)
## Three-Tier Resolution
```
┌─────────────────────────────────────────────────────────────────┐
│ FORMULA RESOLUTION ORDER │
│ (most specific wins) │
└─────────────────────────────────────────────────────────────────┘
TIER 1: PROJECT (rig-level)
Location: <project>/.beads/formulas/
Source: Committed to project repo
Use case: Project-specific workflows (deploy, test, release)
Example: ~/gt/gastown/.beads/formulas/mol-gastown-release.formula.toml
TIER 2: TOWN (user-level)
Location: ~/gt/.beads/formulas/
Source: Mol Mall installs, user customizations
Use case: Cross-project workflows, personal preferences
Example: ~/gt/.beads/formulas/mol-polecat-work.formula.toml (customized)
TIER 3: SYSTEM (embedded)
Location: Compiled into gt binary
Source: gastown/mayor/rig/.beads/formulas/ at build time
Use case: Defaults, blessed patterns, fallback
Example: mol-polecat-work.formula.toml (factory default)
```
### Resolution Algorithm
```go
func ResolveFormula(name string, cwd string) (Formula, Tier, error) {
	// Tier 1: Project-level (walk up from cwd to find .beads/formulas/)
	if projectDir := findProjectRoot(cwd); projectDir != "" {
		path := filepath.Join(projectDir, ".beads", "formulas", name+".formula.toml")
		if f, err := loadFormula(path); err == nil {
			return f, TierProject, nil
		}
	}
	// Tier 2: Town-level
	townDir := getTownRoot() // ~/gt or $GT_HOME
	path := filepath.Join(townDir, ".beads", "formulas", name+".formula.toml")
	if f, err := loadFormula(path); err == nil {
		return f, TierTown, nil
	}
	// Tier 3: Embedded (system)
	if f, err := loadEmbeddedFormula(name); err == nil {
		return f, TierSystem, nil
	}
	return nil, 0, ErrFormulaNotFound
}
```
### Why This Order
**Project wins** because:
- Project maintainers know their workflows best
- Collaborators get consistent behavior via git
- CI/CD uses the same formulas as developers
**Town is middle** because:
- User customizations override system defaults
- Mol Mall installs don't require project changes
- Cross-project consistency for the user
**System is fallback** because:
- Always available (compiled in)
- Factory reset target
- The "blessed" versions
## Formula Identity
### Current Format
```toml
formula = "mol-polecat-work"
version = 4
description = "..."
```
### Extended Format (Mol Mall Ready)
```toml
[formula]
name = "mol-polecat-work"
version = "4.0.0" # Semver
author = "steve@gastown.io" # Author identity
license = "MIT"
repository = "https://github.com/steveyegge/gastown"
[formula.registry]
uri = "hop://molmall.gastown.io/formulas/mol-polecat-work@4.0.0"
checksum = "sha256:abc123..." # Integrity verification
signed_by = "steve@gastown.io" # Optional signing
[formula.capabilities]
# What capabilities does this formula exercise? Used for agent routing.
primary = ["go", "testing", "code-review"]
secondary = ["git", "ci-cd"]
```
### Version Resolution
When multiple versions exist:
```bash
bd cook mol-polecat-work # Resolves per tier order
bd cook mol-polecat-work@4 # Specific major version
bd cook mol-polecat-work@4.0.0 # Exact version
bd cook mol-polecat-work@latest # Explicit latest
```
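Parsing these spec forms might look like the sketch below; `ParseSpec` and `FormulaSpec` are illustrative names, and handling of scoped `@publisher/name` forms (from the Mol Mall design) is included for completeness:

```go
package main

import "strings"

// FormulaSpec is a parsed install/cook target. An empty Version means
// "resolve per tier order" (same as @latest).
type FormulaSpec struct {
	Name    string
	Version string // "", "4", or "4.0.0"
}

// ParseSpec splits name@version, preserving a leading @ on scoped
// names like @acme/mol-deploy.
func ParseSpec(spec string) FormulaSpec {
	rest, prefix := spec, ""
	if strings.HasPrefix(spec, "@") {
		prefix, rest = "@", spec[1:]
	}
	name, version, found := strings.Cut(rest, "@")
	name = prefix + name
	if !found || version == "latest" {
		return FormulaSpec{Name: name}
	}
	return FormulaSpec{Name: name, Version: version}
}
```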
## Crew Directory Problem
### Current State
Crew directories (`gastown/crew/max/`) are sparse checkouts of gastown. They have:
- Their own `.beads/formulas/` (from the checkout)
- These can diverge from `mayor/rig/.beads/formulas/`
### The Fix
Crew should NOT have their own formula copies. Options:
**Option A: Symlink/Redirect**
```bash
# crew/max/.beads/formulas -> ../../mayor/rig/.beads/formulas
```
All crew share the rig's formulas.
**Option B: Provision on Demand**
Crew directories don't have `.beads/formulas/`. Resolution falls through to:
1. Town-level (~/gt/.beads/formulas/)
2. System (embedded)
**Option C: Sparse Checkout Exclusion**
Exclude `.beads/formulas/` from crew sparse checkouts entirely.
**Recommendation: Option B** - Crew shouldn't need project-level formulas. They work on the project; they don't define its workflows.
## Commands
### Existing
```bash
bd formula list # Available formulas (should show tier)
bd formula show <name> # Formula details
bd cook <formula> # Formula → Proto
```
### Enhanced
```bash
# List with tier information
bd formula list
mol-polecat-work v4 [project]
mol-polecat-code-review v1 [town]
mol-witness-patrol v2 [system]
# Show resolution path
bd formula show mol-polecat-work --resolve
Resolving: mol-polecat-work
✓ Found at: ~/gt/gastown/.beads/formulas/mol-polecat-work.formula.toml
Tier: project
Version: 4
Resolution path checked:
1. [project] ~/gt/gastown/.beads/formulas/ ← FOUND
2. [town] ~/gt/.beads/formulas/
3. [system] <embedded>
# Override tier for testing
bd cook mol-polecat-work --tier=system # Force embedded version
bd cook mol-polecat-work --tier=town # Force town version
```
### Future (Mol Mall)
```bash
# Install from Mol Mall
gt formula install mol-code-review-strict
gt formula install mol-code-review-strict@2.0.0
gt formula install hop://acme.corp/formulas/mol-deploy
# Manage installed formulas
gt formula list --installed # What's in town-level
gt formula upgrade mol-polecat-work # Update to latest
gt formula pin mol-polecat-work@4.0.0 # Lock version
gt formula uninstall mol-code-review-strict
```
## Migration Path
### Phase 1: Resolution Order (Now)
1. Implement three-tier resolution in `bd cook`
2. Add `--resolve` flag to show resolution path
3. Update `bd formula list` to show tiers
4. Fix crew directories (Option B)
### Phase 2: Town-Level Formulas
1. Establish `~/gt/.beads/formulas/` as town formula location
2. Add `gt formula` commands for managing town formulas
3. Support manual installation (copy file, track in `.installed.json`)
### Phase 3: Mol Mall Integration
1. Define registry API (see mol-mall-design.md)
2. Implement `gt formula install` from remote
3. Add version pinning and upgrade flows
4. Add integrity verification (checksums, optional signing)
### Phase 4: Federation (HOP)
1. Add capability tags to formula schema
2. Track formula execution for agent accountability
3. Enable federation (cross-town formula sharing via Highway Operations Protocol)
4. Author attribution and validation records
## Related Documents
- [Mol Mall Design](mol-mall-design.md) - Registry architecture
- [molecules.md](molecules.md) - Formula → Proto → Mol lifecycle
- [understanding-gas-town.md](../../../docs/understanding-gas-town.md) - Gas Town architecture

docs/mol-mall-design.md (new file, 476 lines)

@@ -0,0 +1,476 @@
# Mol Mall Design
> A marketplace for Gas Town formulas
## Vision
**Mol Mall** is a registry for sharing formulas across Gas Town installations. Think npm for molecules, or Terraform Registry for workflows.
```
"Cook a formula, sling it to a polecat, the witness watches, refinery merges."
What if you could browse a mall of formulas, install one, and immediately
have your polecats executing world-class workflows?
```
### The Network Effect
A well-designed formula for "code review" or "security audit" or "deploy to K8s" can spread across thousands of Gas Town installations. Each adoption means:
- More agents executing proven workflows
- More structured, trackable work output
- Better capability routing (agents with track records on a formula get similar work)
## Architecture
### Registry Types
```
┌─────────────────────────────────────────────────────────────────┐
│ MOL MALL REGISTRIES │
└─────────────────────────────────────────────────────────────────┘
PUBLIC REGISTRY (molmall.gastown.io)
├── Community formulas (MIT licensed)
├── Official Gas Town formulas (blessed)
├── Verified publisher formulas
└── Open contribution model
PRIVATE REGISTRY (self-hosted)
├── Organization-specific formulas
├── Proprietary workflows
├── Internal deployment patterns
└── Enterprise compliance formulas
FEDERATED REGISTRY (HOP future)
├── Cross-organization discovery
├── Capability-based search
├── Attribution chain tracking
└── hop:// URI resolution
```
### URI Scheme
```
hop://molmall.gastown.io/formulas/mol-polecat-work@4.0.0
      └────────────────┘          └──────────────┘ └───┘
         registry host              formula name  version
# Short forms
mol-polecat-work # Default registry, latest version
mol-polecat-work@4 # Major version
mol-polecat-work@4.0.0 # Exact version
@acme/mol-deploy # Scoped to publisher
hop://acme.corp/formulas/mol-deploy # Full HOP URI
```
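A sketch of parsing this scheme; `HopRef` and its fields are illustrative, not part of any HOP specification:

```go
package main

import (
	"fmt"
	"strings"
)

// HopRef is an illustrative parsed hop:// reference.
type HopRef struct {
	Host    string // e.g. molmall.gastown.io
	Path    string // e.g. formulas/mol-polecat-work
	Version string // e.g. 4.0.0 ("" if unversioned)
}

// ParseHopURI splits hop://<host>/<path>[@<version>]. The version is
// taken after the last @ only when it contains no slash, so scoped
// path segments like @acme/mol-deploy stay intact.
func ParseHopURI(uri string) (HopRef, error) {
	rest, ok := strings.CutPrefix(uri, "hop://")
	if !ok {
		return HopRef{}, fmt.Errorf("not a hop:// URI: %q", uri)
	}
	host, path, ok := strings.Cut(rest, "/")
	if !ok || host == "" || path == "" {
		return HopRef{}, fmt.Errorf("expected hop://<host>/<path>: %q", uri)
	}
	version := ""
	if i := strings.LastIndex(path, "@"); i > 0 && !strings.Contains(path[i+1:], "/") {
		path, version = path[:i], path[i+1:]
	}
	return HopRef{Host: host, Path: path, Version: version}, nil
}
```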
### Registry API
```yaml
# OpenAPI-style specification
GET /formulas
  # List all formulas
  Query:
    - q: string              # Search query
    - capabilities: string[] # Filter by capability tags
    - author: string         # Filter by author
    - limit: int
    - offset: int
  Response:
    formulas:
      - name: mol-polecat-work
        version: 4.0.0
        description: "Full polecat work lifecycle..."
        author: steve@gastown.io
        downloads: 12543
        capabilities: [go, testing, code-review]

GET /formulas/{name}
  # Get formula metadata
  Response:
    name: mol-polecat-work
    versions: [4.0.0, 3.2.1, 3.2.0, ...]
    latest: 4.0.0
    author: steve@gastown.io
    repository: https://github.com/steveyegge/gastown
    license: MIT
    capabilities:
      primary: [go, testing]
      secondary: [git, code-review]
    stats:
      downloads: 12543
      stars: 234
      used_by: 89 # towns using this formula

GET /formulas/{name}/{version}
  # Get specific version
  Response:
    name: mol-polecat-work
    version: 4.0.0
    checksum: sha256:abc123...
    signature: <optional PGP signature>
    content: <base64 or URL to .formula.toml>
    changelog: "Added self-cleaning model..."
    published_at: 2026-01-10T00:00:00Z

POST /formulas
  # Publish formula (authenticated)
  Body:
    name: mol-my-workflow
    version: 1.0.0
    content: <formula TOML>
    changelog: "Initial release"
  Auth: Bearer token (linked to HOP identity)

GET /formulas/{name}/{version}/download
  # Download formula content
  Response: raw .formula.toml content
```
## Formula Package Format
### Simple Case: Single File
Most formulas are single `.formula.toml` files:
```bash
gt formula install mol-polecat-code-review
# Downloads mol-polecat-code-review.formula.toml to ~/gt/.beads/formulas/
```
### Complex Case: Formula Bundle
Some formulas need supporting files (scripts, templates, configs):
```
mol-deploy-k8s.formula.bundle/
├── formula.toml # Main formula
├── templates/
│ ├── deployment.yaml.tmpl
│ └── service.yaml.tmpl
├── scripts/
│ └── healthcheck.sh
└── README.md
```
Bundle format:
```bash
# Bundles are tarballs
mol-deploy-k8s-1.0.0.bundle.tar.gz
```
Installation:
```bash
gt formula install mol-deploy-k8s
# Extracts to ~/gt/.beads/formulas/mol-deploy-k8s/
# formula.toml is at mol-deploy-k8s/formula.toml
```
## Installation Flow
### Basic Install
```bash
$ gt formula install mol-polecat-code-review
Resolving mol-polecat-code-review...
Registry: molmall.gastown.io
Version: 1.2.0 (latest)
Author: steve@gastown.io
Capabilities: code-review, security
Downloading... ████████████████████ 100%
Verifying checksum... ✓
Installed to: ~/gt/.beads/formulas/mol-polecat-code-review.formula.toml
```
### Version Pinning
```bash
$ gt formula install mol-polecat-work@4.0.0
Installing mol-polecat-work@4.0.0 (pinned)...
✓ Installed
$ gt formula list --installed
mol-polecat-work 4.0.0 [pinned]
mol-polecat-code-review 1.2.0 [latest]
```
### Upgrade Flow
```bash
$ gt formula upgrade mol-polecat-code-review
Checking for updates...
Current: 1.2.0
Latest: 1.3.0
Changelog for 1.3.0:
- Added security focus option
- Improved test coverage step
Upgrade? [y/N] y
Downloading... ✓
Installed: mol-polecat-code-review@1.3.0
```
### Lock File
```json
// ~/gt/.beads/formulas/.lock.json
{
  "version": 1,
  "formulas": {
    "mol-polecat-work": {
      "version": "4.0.0",
      "pinned": true,
      "checksum": "sha256:abc123...",
      "installed_at": "2026-01-10T00:00:00Z",
      "source": "hop://molmall.gastown.io/formulas/mol-polecat-work@4.0.0"
    },
    "mol-polecat-code-review": {
      "version": "1.3.0",
      "pinned": false,
      "checksum": "sha256:def456...",
      "installed_at": "2026-01-10T12:00:00Z",
      "source": "hop://molmall.gastown.io/formulas/mol-polecat-code-review@1.3.0"
    }
  }
}
```
## Publishing Flow
### First-Time Setup
```bash
$ gt formula publish --init
Setting up Mol Mall publishing...
1. Create account at https://molmall.gastown.io/signup
2. Generate API token at https://molmall.gastown.io/settings/tokens
3. Run: gt formula login
$ gt formula login
Token: ********
Logged in as: steve@gastown.io
```
### Publishing
```bash
$ gt formula publish mol-polecat-work
Publishing mol-polecat-work...
Pre-flight checks:
✓ formula.toml is valid
✓ Version 4.0.0 not yet published
✓ Required fields present (name, version, description)
✓ Capabilities declared
Publish to molmall.gastown.io? [y/N] y
Uploading... ✓
Published: hop://molmall.gastown.io/formulas/mol-polecat-work@4.0.0
View at: https://molmall.gastown.io/formulas/mol-polecat-work
```
### Verification Levels
```
┌─────────────────────────────────────────────────────────────────┐
│ FORMULA TRUST LEVELS │
└─────────────────────────────────────────────────────────────────┘
UNVERIFIED (default)
Anyone can publish
Basic validation only
Displayed with ⚠️ warning
VERIFIED PUBLISHER
Publisher identity confirmed
Displayed with ✓ checkmark
Higher search ranking
OFFICIAL
Maintained by Gas Town team
Displayed with 🏛️ badge
Included in embedded defaults
AUDITED
Security review completed
Displayed with 🔒 badge
Required for enterprise registries
```
## Capability Tagging
### Formula Capability Declaration
```toml
[formula.capabilities]
# What capabilities does this formula exercise? Used for agent routing.
primary = ["go", "testing", "code-review"]
secondary = ["git", "ci-cd"]
# Capability weights (optional, for fine-grained routing)
[formula.capabilities.weights]
go = 0.3 # 30% of formula work is Go
testing = 0.4 # 40% is testing
code-review = 0.3 # 30% is code review
```
### Capability-Based Search
```bash
$ gt formula search --capabilities="security,go"
Formulas matching capabilities: security, go
mol-security-audit v2.1.0 ⭐ 4.8 📥 8,234
Capabilities: security, go, code-review
"Comprehensive security audit workflow"
mol-dependency-scan v1.0.0 ⭐ 4.2 📥 3,102
Capabilities: security, go, supply-chain
"Scan Go dependencies for vulnerabilities"
```
### Agent Accountability
When a polecat completes a formula, the execution is tracked:
```
Polecat: beads/amber
Formula: mol-polecat-code-review@1.3.0
Completed: 2026-01-10T15:30:00Z
Capabilities exercised:
- code-review (primary)
- security (secondary)
- go (secondary)
```
This execution record enables:
1. **Routing** - Agents with successful track records get similar work
2. **Debugging** - Trace which agent did what, when
3. **Quality metrics** - Track success rates by agent and formula
## Private Registries
### Enterprise Deployment
```yaml
# ~/.gtconfig.yaml
registries:
  - name: acme
    url: https://molmall.acme.corp
    auth: token
    priority: 1   # Check first
  - name: public
    url: https://molmall.gastown.io
    auth: none
    priority: 2   # Fallback
```
### Self-Hosted Registry
```bash
# Docker deployment
docker run -d \
  -p 8080:8080 \
  -v /data/formulas:/formulas \
  -e AUTH_PROVIDER=oidc \
  gastown/molmall-registry:latest
# Configuration
MOLMALL_STORAGE=s3://bucket/formulas
MOLMALL_AUTH=oidc
MOLMALL_OIDC_ISSUER=https://auth.acme.corp
```
## Federation
Federation enables formula sharing across organizations using the Highway Operations Protocol (HOP).
### Cross-Registry Discovery
```bash
$ gt formula search "deploy kubernetes" --federated
Searching across federated registries...
molmall.gastown.io:
mol-deploy-k8s v3.0.0 🏛️ Official
molmall.acme.corp:
@acme/mol-deploy-k8s v2.1.0 ✓ Verified
molmall.bigco.io:
@bigco/k8s-workflow v1.0.0 ⚠️ Unverified
```
### HOP URI Resolution
The `hop://` URI scheme provides cross-registry entity references:
```bash
# Full HOP URI
gt formula install hop://molmall.acme.corp/formulas/@acme/mol-deploy@2.1.0
# Resolution via HOP (Highway Operations Protocol)
1. Parse hop:// URI
2. Resolve registry endpoint (DNS/HOP discovery)
3. Authenticate (if required)
4. Download formula
5. Verify checksum/signature
6. Install to town-level
```
## Implementation Phases
### Phase 1: Local Commands (Now)
- `gt formula list` with tier display
- `gt formula show --resolve`
- Formula resolution order (project → town → system)
### Phase 2: Manual Sharing
- Formula export/import
- `gt formula export mol-polecat-work > mol-polecat-work.formula.toml`
- `gt formula import < mol-polecat-work.formula.toml`
- Lock file format
### Phase 3: Public Registry
- molmall.gastown.io launch
- `gt formula install` from registry
- `gt formula publish` flow
- Basic search and browse
### Phase 4: Enterprise Features
- Private registry support
- Authentication integration
- Verification levels
- Audit logging
### Phase 5: Federation (HOP)
- Capability tags in schema
- Federation protocol (Highway Operations Protocol)
- Cross-registry search
- Agent execution tracking for accountability
## Related Documents
- [Formula Resolution](formula-resolution.md) - Local resolution order
- [molecules.md](molecules.md) - Formula lifecycle (cook, pour, squash)
- [understanding-gas-town.md](../../../docs/understanding-gas-town.md) - Gas Town architecture