docs: reorganize documentation into concepts, design, and examples

Move documentation files into a clearer structure:
- concepts/: core ideas (convoy, identity, molecules, polecat-lifecycle, propulsion)
- design/: architecture and protocols (architecture, escalation, federation, mail, etc.)
- examples/: demos and tutorials (hanoi-demo)
- overview.md: renamed from understanding-gas-town.md

Remove outdated/superseded docs and update reference.md.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
gastown/crew/gus
2026-01-11 21:21:25 -08:00
committed by Steve Yegge
parent 8ed31e9634
commit 88f784a9aa
22 changed files with 195 additions and 1356 deletions

docs/design/architecture.md Normal file

@@ -0,0 +1,130 @@
# Gas Town Architecture
Technical architecture for Gas Town multi-agent workspace management.
## Two-Level Beads Architecture
Gas Town uses a two-level beads architecture to separate organizational coordination
from project implementation work.
| Level | Location | Prefix | Purpose |
|-------|----------|--------|---------|
| **Town** | `~/gt/.beads/` | `hq-*` | Cross-rig coordination, Mayor mail, agent identity |
| **Rig** | `<rig>/mayor/rig/.beads/` | project prefix | Implementation work, MRs, project issues |
### Town-Level Beads (`~/gt/.beads/`)
Organizational chain for cross-rig coordination:
- Mayor mail and messages
- Convoy coordination (batch work across rigs)
- Strategic issues and decisions
- **Town-level agent beads** (Mayor, Deacon)
- **Role definition beads** (global templates)
### Rig-Level Beads (`<rig>/mayor/rig/.beads/`)
Project chain for implementation work:
- Bugs, features, tasks for the project
- Merge requests and code reviews
- Project-specific molecules
- **Rig-level agent beads** (Witness, Refinery, Polecats)
## Agent Bead Storage
Agent beads track lifecycle state for each agent. Storage location depends on
the agent's scope.
| Agent Type | Scope | Bead Location | Bead ID Format |
|------------|-------|---------------|----------------|
| Mayor | Town | `~/gt/.beads/` | `hq-mayor` |
| Deacon | Town | `~/gt/.beads/` | `hq-deacon` |
| Dogs | Town | `~/gt/.beads/` | `hq-dog-<name>` |
| Witness | Rig | `<rig>/mayor/rig/.beads/` | `<prefix>-<rig>-witness` |
| Refinery | Rig | `<rig>/mayor/rig/.beads/` | `<prefix>-<rig>-refinery` |
| Polecats | Rig | `<rig>/mayor/rig/.beads/` | `<prefix>-<rig>-polecat-<name>` |
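The ID formats in the table can be sketched as a small helper. This is illustrative only; the function name and signature are assumptions, not part of the actual Gas Town codebase:

```go
package main

import "fmt"

// beadID reconstructs the ID formats from the table above. Town-scoped
// agents use the fixed "hq" prefix; rig-scoped agents use the rig's
// project prefix. (Sketch only - not Gas Town's real API.)
func beadID(scope, prefix, rig, role, name string) string {
	if scope == "town" {
		if name != "" {
			return fmt.Sprintf("hq-%s-%s", role, name) // e.g. hq-dog-<name>
		}
		return "hq-" + role // e.g. hq-mayor
	}
	if name != "" {
		return fmt.Sprintf("%s-%s-%s-%s", prefix, rig, role, name)
	}
	return fmt.Sprintf("%s-%s-%s", prefix, rig, role)
}

func main() {
	fmt.Println(beadID("town", "", "", "mayor", ""))           // hq-mayor
	fmt.Println(beadID("rig", "gt", "gastown", "witness", "")) // gt-gastown-witness
}
```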
### Role Beads
Role beads are global templates stored in town beads with `hq-` prefix:
- `hq-mayor-role` - Mayor role definition
- `hq-deacon-role` - Deacon role definition
- `hq-witness-role` - Witness role definition
- `hq-refinery-role` - Refinery role definition
- `hq-polecat-role` - Polecat role definition
Each agent bead references its role bead via the `role_bead` field.
## Agent Taxonomy
### Town-Level Agents (Cross-Rig)
| Agent | Role | Persistence |
|-------|------|-------------|
| **Mayor** | Global coordinator, handles cross-rig communication and escalations | Persistent |
| **Deacon** | Daemon beacon - receives heartbeats, runs plugins and monitoring | Persistent |
| **Dogs** | Long-running workers for cross-rig batch work | Variable |
### Rig-Level Agents (Per-Project)
| Agent | Role | Persistence |
|-------|------|-------------|
| **Witness** | Monitors polecat health, handles nudging and cleanup | Persistent |
| **Refinery** | Processes merge queue, runs verification | Persistent |
| **Polecats** | Ephemeral workers assigned to specific issues | Ephemeral |
## Directory Structure
```
~/gt/                          Town root
├── .beads/                    Town-level beads (hq-* prefix)
│   ├── config.yaml            Beads configuration
│   ├── issues.jsonl           Town issues (mail, agents, convoys)
│   └── routes.jsonl           Prefix → rig routing table
├── mayor/                     Mayor config
│   └── town.json              Town configuration
└── <rig>/                     Project container (NOT a git clone)
    ├── config.json            Rig identity and beads prefix
    ├── mayor/rig/             Canonical clone (beads live here)
    │   └── .beads/            Rig-level beads database
    ├── refinery/rig/          Worktree from mayor/rig
    ├── witness/               No clone (monitors only)
    ├── crew/<name>/           Human workspaces (full clones)
    └── polecats/<name>/       Worker worktrees from mayor/rig
```
### Worktree Architecture
Polecats and the Refinery are git worktrees, not full clones. This enables fast spawning
and shared object storage. The worktree base is `mayor/rig`:
```bash
# From polecat/manager.go - worktrees are based on mayor/rig
git worktree add -b polecat/<name>-<timestamp> polecats/<name>
```
Crew workspaces (`crew/<name>/`) are full git clones for human developers who need
independent repos. Polecats are ephemeral and benefit from worktree efficiency.
## Beads Routing
The `routes.jsonl` file maps issue ID prefixes to rig locations (relative to town root):
```jsonl
{"prefix":"hq-","path":"."}
{"prefix":"gt-","path":"gastown/mayor/rig"}
{"prefix":"bd-","path":"beads/mayor/rig"}
```
Routes point to `mayor/rig` because that's where the canonical `.beads/` lives.
This enables transparent cross-rig beads operations:
```bash
bd show hq-mayor   # Routes to town beads (~/gt/.beads)
bd show gt-xyz     # Routes to gastown/mayor/rig/.beads
```
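The routing idea amounts to a prefix lookup; a toy sketch (the `route` type and `resolve` function are illustrative names, not the bd implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// route mirrors one line of routes.jsonl: an issue-ID prefix mapped to a
// path relative to the town root.
type route struct {
	Prefix string
	Path   string
}

// resolve returns the path for the first route whose prefix matches the
// issue ID.
func resolve(routes []route, issueID string) (string, bool) {
	for _, r := range routes {
		if strings.HasPrefix(issueID, r.Prefix) {
			return r.Path, true
		}
	}
	return "", false
}

func main() {
	routes := []route{
		{Prefix: "hq-", Path: "."},
		{Prefix: "gt-", Path: "gastown/mayor/rig"},
		{Prefix: "bd-", Path: "beads/mayor/rig"},
	}
	if path, ok := resolve(routes, "gt-xyz"); ok {
		fmt.Println(path) // gastown/mayor/rig
	}
}
```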
## See Also
- [reference.md](../reference.md) - Command reference
- [molecules.md](../concepts/molecules.md) - Workflow molecules
- [identity.md](../concepts/identity.md) - Agent identity and BD_ACTOR

docs/design/escalation.md Normal file

@@ -0,0 +1,312 @@
# Gas Town Escalation Protocol
> Reference for escalation paths in Gas Town
## Overview
Gas Town agents can escalate issues when automated resolution isn't possible.
This document covers:
- Severity levels and routing
- Escalation categories for structured communication
- Tiered escalation (Deacon -> Mayor -> Overseer)
- Decision patterns for async resolution
- Integration with gates and patrol lifecycles
## Severity Levels
| Level | Priority | Description | Examples |
|-------|----------|-------------|----------|
| **CRITICAL** | P0 (urgent) | System-threatening, immediate attention | Data corruption, security breach, system down |
| **HIGH** | P1 (high) | Important blocker, needs human soon | Unresolvable merge conflict, critical bug, ambiguous spec |
| **MEDIUM** | P2 (normal) | Standard escalation, human at convenience | Design decision needed, unclear requirements |
## Escalation Categories
Categories provide structured routing based on the nature of the escalation:
| Category | Description | Default Route |
|----------|-------------|---------------|
| `decision` | Multiple valid paths, need choice | Deacon -> Mayor |
| `help` | Need guidance or expertise | Deacon -> Mayor |
| `blocked` | Waiting on unresolvable dependency | Mayor |
| `failed` | Unexpected error, can't proceed | Deacon |
| `emergency` | Security or data integrity issue | Overseer (direct) |
| `gate_timeout` | Gate didn't resolve in time | Deacon |
| `lifecycle` | Worker stuck or needs recycle | Witness |
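The default routing in the table amounts to a simple category-to-tier mapping. A sketch (illustrative only; the real routing lives inside the `gt` CLI):

```go
package main

import "fmt"

// defaultRoute maps an escalation category to the tier that receives it
// first, per the table above.
func defaultRoute(category string) string {
	switch category {
	case "decision", "help", "failed", "gate_timeout":
		return "deacon"
	case "blocked":
		return "mayor"
	case "emergency":
		return "overseer" // direct, bypasses lower tiers
	case "lifecycle":
		return "witness"
	default:
		return "deacon"
	}
}

func main() {
	fmt.Println(defaultRoute("emergency")) // overseer
	fmt.Println(defaultRoute("blocked"))   // mayor
}
```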
## Escalation Command
### Basic Usage (unchanged)
```bash
# Basic escalation (default: MEDIUM severity)
gt escalate "Database migration failed"
# Critical escalation - immediate attention
gt escalate -s CRITICAL "Data corruption detected in user table"
# High priority escalation
gt escalate -s HIGH "Merge conflict cannot be resolved automatically"
# With additional details
gt escalate -s MEDIUM "Need clarification on API design" -m "Details..."
```
### Category-Based Escalation
```bash
# Decision needed - routes to Deacon first
gt escalate --type decision "Which auth approach?"
# Help request
gt escalate --type help "Need architecture guidance"
# Blocked on dependency
gt escalate --type blocked "Waiting on bd-xyz"
# Failure that can't be recovered
gt escalate --type failed "Tests failing unexpectedly"
# Emergency - direct to Overseer
gt escalate --type emergency "Security vulnerability found"
```
### Tiered Routing
```bash
# Explicit routing to specific tier
gt escalate --to deacon "Infra issue"
gt escalate --to mayor "Cross-rig coordination needed"
gt escalate --to overseer "Human judgment required"
# Forward from one tier to next
gt escalate --forward --to mayor "Deacon couldn't resolve"
```
### Structured Decisions
For decisions requiring explicit choices:
```bash
gt escalate --type decision \
--question "Which authentication approach?" \
--options "JWT tokens,Session cookies,OAuth2" \
--context "Admin panel needs login" \
--issue bd-xyz
```
This updates the issue with a structured decision format (see below).
## What Happens on Escalation
1. **Bead created/updated**: Escalation bead (tagged `escalation`) created or updated
2. **Mail sent**: Routed to appropriate tier (Deacon, Mayor, or Overseer)
3. **Activity logged**: Event logged to activity feed
4. **Issue updated**: For decision type, issue gets structured format
## Tiered Escalation Flow
```
Worker encounters issue
          |
          v
gt escalate --type <category> [--to <tier>]
          |
          v
[Deacon receives]  (default for most categories)
          |
          +-- Can resolve? --> Updates issue, re-slings work
          |
          +-- Cannot resolve? --> gt escalate --forward --to mayor
                  |
                  v
          [Mayor receives]
                  |
                  +-- Can resolve? --> Updates issue, re-slings
                  |
                  +-- Cannot resolve? --> gt escalate --forward --to overseer
                          |
                          v
                  [Overseer resolves]
```
Each tier can resolve OR forward. The escalation chain is tracked via comments.
## Decision Pattern
When `--type decision` is used, the issue is updated with structured format:
```markdown
## Decision Needed
**Question:** Which authentication approach?
| Option | Description |
|--------|-------------|
| A | JWT tokens |
| B | Session cookies |
| C | OAuth2 |
**Context:** Admin panel needs login
**Escalated by:** beads/polecats/obsidian
**Escalated at:** 2026-01-01T15:00:00Z
**To resolve:**
1. Comment with chosen option (e.g., "Decision: A")
2. Reassign to work queue or original worker
```
The issue becomes the async communication channel. Resolution updates the issue
and can trigger re-slinging to the original worker.
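A handler resolving a decision might scan comments for the `Decision: <option>` convention. A minimal sketch, assuming that convention (the function and regex are illustrative, not gt's actual parser):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// decisionRe matches resolution comments like "Decision: A".
var decisionRe = regexp.MustCompile(`(?i)^decision:\s*([A-Za-z])\b`)

// parseDecision scans comment lines for the chosen option and returns
// it in canonical uppercase form.
func parseDecision(comment string) (string, bool) {
	for _, line := range strings.Split(comment, "\n") {
		if m := decisionRe.FindStringSubmatch(strings.TrimSpace(line)); m != nil {
			return strings.ToUpper(m[1]), true
		}
	}
	return "", false
}

func main() {
	opt, ok := parseDecision("Considered the tradeoffs.\nDecision: A")
	fmt.Println(opt, ok) // A true
}
```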
## Integration Points
### Gate Timeouts
When timer gates expire (see bd-7zka.2), Witness escalates:
```go
if gate.Expired() {
	exec.Command("gt", "escalate",
		"--type", "gate_timeout",
		"--severity", "HIGH",
		"--issue", gate.BlockedIssueID,
		fmt.Sprintf("Gate %s timed out after %s", gate.ID, gate.Timeout)).Run()
}
```
### Witness Patrol
Witness formalizes stuck-polecat detection as escalation:
```go
exec.Command("gt", "escalate",
	"--type", "lifecycle",
	"--to", "mayor",
	"--issue", polecat.CurrentIssue,
	fmt.Sprintf("Polecat %s stuck: no progress for %d minutes", polecat.ID, minutes)).Run()
```
### Refinery
On merge failures that can't be auto-resolved:
```go
exec.Command("gt", "escalate",
	"--type", "failed",
	"--issue", mr.IssueID,
	"Merge failed: "+reason).Run()
```
## Polecat Exit with Escalation
When a polecat needs a decision to continue:
```bash
# 1. Update issue with decision structure
bd update $ISSUE --notes "$(cat <<EOF
## Decision Needed
**Question:** Which approach for caching?
| Option | Description |
|--------|-------------|
| A | Redis (external dependency) |
| B | In-memory (simpler, no persistence) |
| C | SQLite (local persistence) |
**Context:** API response times are slow, need caching layer.
EOF
)"
# 2. Escalate
gt escalate --type decision --issue $ISSUE "Caching approach needs decision"
# 3. Exit cleanly
gt done --status ESCALATED
```
## Mayor Startup Check
On `gt prime`, Mayor checks for pending escalations:
```
## PENDING ESCALATIONS
There are 3 escalation(s) awaiting attention:
CRITICAL: 1
HIGH: 1
MEDIUM: 1
[CRITICAL] Data corruption detected (gt-abc)
[HIGH] Merge conflict in auth module (gt-def)
[MEDIUM] API design clarification needed (gt-ghi)
**Action required:** Review escalations with `bd list --tag=escalation`
Close resolved ones with `bd close <id> --reason "resolution"`
```
## When to Escalate
### Agents SHOULD escalate when:
- **System errors**: Database corruption, disk full, network failures
- **Security issues**: Unauthorized access attempts, credential exposure
- **Unresolvable conflicts**: Merge conflicts that can't be auto-resolved
- **Ambiguous requirements**: Spec is unclear, multiple valid interpretations
- **Design decisions**: Architectural choices that need human judgment
- **Stuck loops**: Agent is stuck and can't make progress
- **Gate timeouts**: Async conditions didn't resolve in expected time
### Agents should NOT escalate for:
- **Normal workflow**: Regular work that can proceed without human input
- **Recoverable errors**: Transient failures that will auto-retry
- **Information queries**: Questions that can be answered from context
## Viewing Escalations
```bash
# List all open escalations
bd list --status=open --tag=escalation
# Filter by category
bd list --tag=escalation --tag=decision
# View specific escalation
bd show <escalation-id>
# Close resolved escalation
bd close <id> --reason "Resolved by fixing X"
```
## Implementation Phases
### Phase 1: Extend gt escalate
- Add `--type` flag for categories
- Add `--to` flag for routing (deacon, mayor, overseer)
- Add `--forward` flag for tier forwarding
- Backward compatible with existing usage
### Phase 2: Decision Pattern
- Add `--question`, `--options`, `--context` flags
- Auto-update issue with decision structure
- Parse decision from issue comments on resolution
### Phase 3: Gate Integration
- Add `gate_timeout` escalation type
- Witness checks timer gates, escalates on timeout
- Refinery checks GH gates, escalates on timeout/failure
### Phase 4: Patrol Integration
- Formalize Witness stuck-polecat as escalation
- Formalize Refinery merge-failure as escalation
- Unified escalation handling in Mayor
## References
- bd-7zka.2: Gate evaluation (uses escalation for timeouts)
- bd-0sgd: Design issue for this extended escalation system

docs/design/federation.md Normal file

@@ -0,0 +1,248 @@
# Federation Architecture
> **Status: Design spec - not yet implemented**
> Multi-workspace coordination for Gas Town and Beads
## Overview
Federation enables multiple Gas Town instances to reference each other's work,
coordinate across organizations, and track distributed projects.
## Why Federation?
Real enterprise projects don't live in a single repo:
- **Microservices:** 50 repos, tight dependencies, coordinated releases
- **Platform teams:** Shared libraries used by dozens of downstream projects
- **Contractors:** External teams working on components you need to track
- **Acquisitions:** New codebases that need to integrate with existing work
Traditional tools force you to choose: unified tracking (monorepo) or team
autonomy (multi-repo with fragmented visibility). Federation provides both:
each workspace is autonomous, but cross-workspace references are first-class.
## Entity Model
### Three Levels
```
Level 1: Entity - Person or organization (flat namespace)
Level 2: Chain - Workspace/town per entity
Level 3: Work Unit - Issues, tasks, molecules on chains
```
### URI Scheme
Full work unit reference (HOP protocol):
```
hop://entity/chain/rig/issue-id
hop://steve@example.com/main-town/greenplace/gp-xyz
```
Cross-repo reference (same platform):
```
beads://platform/org/repo/issue-id
beads://github/acme/backend/ac-123
```
Within a workspace, short forms are preferred:
```
gp-xyz # Local (prefix routes via routes.jsonl)
greenplace/gp-xyz # Different rig, same chain
./gp-xyz # Explicit current-rig ref
```
See `~/gt/docs/hop/GRAPH-ARCHITECTURE.md` for the full URI specification.
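Splitting a full HOP URI into its four components is straightforward. A minimal sketch of the scheme as described above (`hopRef` and `parseHop` are illustrative names; the authoritative grammar is in GRAPH-ARCHITECTURE.md):

```go
package main

import (
	"fmt"
	"strings"
)

// hopRef is a parsed hop:// work-unit reference:
// hop://entity/chain/rig/issue-id
type hopRef struct {
	Entity, Chain, Rig, IssueID string
}

// parseHop splits a full HOP URI into its four components.
func parseHop(uri string) (hopRef, error) {
	rest, ok := strings.CutPrefix(uri, "hop://")
	if !ok {
		return hopRef{}, fmt.Errorf("not a hop URI: %s", uri)
	}
	parts := strings.Split(rest, "/")
	if len(parts) != 4 {
		return hopRef{}, fmt.Errorf("want entity/chain/rig/issue-id, got %q", rest)
	}
	return hopRef{parts[0], parts[1], parts[2], parts[3]}, nil
}

func main() {
	ref, err := parseHop("hop://steve@example.com/main-town/greenplace/gp-xyz")
	if err != nil {
		panic(err)
	}
	fmt.Println(ref.Rig, ref.IssueID) // greenplace gp-xyz
}
```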
## Relationship Types
### Employment
Track which entities belong to organizations:
```json
{
  "type": "employment",
  "entity": "alice@example.com",
  "organization": "acme.com"
}
```
### Cross-Reference
Reference work in another workspace:
```json
{
  "references": [
    {
      "type": "depends_on",
      "target": "hop://other-entity/chain/rig/issue-id"
    }
  ]
}
```
### Delegation
Distribute work across workspaces:
```json
{
  "type": "delegation",
  "parent": "hop://acme.com/projects/proj-123",
  "child": "hop://alice@example.com/town/greenplace/gp-xyz",
  "terms": { "portion": "backend", "deadline": "2025-02-01" }
}
```
## Agent Provenance
Every agent operation is attributed. See [identity.md](../concepts/identity.md) for the
complete BD_ACTOR format convention.
### Git Commits
```bash
# Set per agent session
GIT_AUTHOR_NAME="greenplace/crew/joe"
GIT_AUTHOR_EMAIL="steve@example.com" # Workspace owner
```
Result: `abc123 Fix bug (greenplace/crew/joe <steve@example.com>)`
### Beads Operations
```bash
BD_ACTOR="greenplace/crew/joe" # Set in agent environment
bd create --title="Task" # Actor auto-populated
```
### Event Logging
All events include actor:
```json
{
  "ts": "2025-01-15T10:30:00Z",
  "type": "sling",
  "actor": "greenplace/crew/joe",
  "payload": { "bead": "gp-xyz", "target": "greenplace/polecats/Toast" }
}
```
## Discovery
### Workspace Metadata
Each workspace has identity metadata:
```json
// ~/gt/.town.json
{
  "owner": "steve@example.com",
  "name": "main-town",
  "public_name": "steve-greenplace"
}
```
### Remote Registration
```bash
gt remote add acme hop://acme.com/engineering
gt remote list
```
### Cross-Workspace Queries
```bash
bd show hop://acme.com/eng/ac-123 # Fetch remote issue
bd list --remote=acme # List remote issues
```
## Aggregation
Query across relationships without hierarchy:
```bash
# All work by org members
bd list --org=acme.com
# All work on a project (including delegated)
bd list --project=proj-123 --include-delegated
# Agent's full history
bd audit --actor=greenplace/crew/joe
```
## Implementation Status
- [x] Agent identity in git commits
- [x] BD_ACTOR default in beads create
- [x] Workspace metadata file (.town.json)
- [x] Cross-workspace URI scheme (hop://, beads://, local forms)
- [ ] Remote registration
- [ ] Cross-workspace queries
- [ ] Delegation primitives
## Use Cases
### Multi-Repo Projects
Track work spanning multiple repositories:
```
Project X
├── hop://team/frontend/fe-123
├── hop://team/backend/be-456
└── hop://team/infra/inf-789
```
### Distributed Teams
Team members in different workspaces:
```
Alice's Town → works on → Project X
Bob's Town → works on → Project X
```
Each maintains their own CV/audit trail.
### Contractor Coordination
Prime contractor delegates to subcontractors:
```
Acme/Project
└── delegates to → Vendor/SubProject
└── delegates to → Contractor/Task
```
Completion cascades up. Attribution preserved.
## Design Principles
1. **Flat namespace** - Entities not nested, relationships connect them
2. **Relationships over hierarchy** - Graph structure, not tree
3. **Git-native** - Federation uses git mechanics (remotes, refs)
4. **Incremental** - Works standalone, gains power with federation
5. **Privacy-preserving** - Each entity controls their chain visibility
## Enterprise Benefits
| Challenge | Without Federation | With Federation |
|-----------|-------------------|-----------------|
| Cross-repo dependencies | "Check with backend team" | Explicit dependency tracking |
| Contractor visibility | Email updates, status calls | Live status, same tooling |
| Release coordination | Spreadsheets, Slack threads | Unified timeline view |
| Agent attribution | Per-repo, fragmented | Cross-workspace CV |
| Compliance audit | Stitch together logs | Query across workspaces |
Federation isn't just about connecting repos - it's about treating distributed
engineering as a first-class concern, with the same visibility and tooling
you'd expect from a monorepo, while preserving team autonomy.


@@ -0,0 +1,361 @@
# Gas Town Mail Protocol
> Reference for inter-agent mail communication in Gas Town
## Overview
Gas Town agents coordinate via mail messages routed through the beads system.
Mail uses `type=message` beads with routing handled by `gt mail`.
## Message Types
### POLECAT_DONE
**Route**: Polecat → Witness
**Purpose**: Signal work completion, trigger cleanup flow.
**Subject format**: `POLECAT_DONE <polecat-name>`
**Body format**:
```
Exit: MERGED|ESCALATED|DEFERRED
Issue: <issue-id>
MR: <mr-id> # if exit=MERGED
Branch: <branch>
```
**Trigger**: `gt done` command generates this automatically.
**Handler**: Witness creates a cleanup wisp for the polecat.
### MERGE_READY
**Route**: Witness → Refinery
**Purpose**: Signal a branch is ready for merge queue processing.
**Subject format**: `MERGE_READY <polecat-name>`
**Body format**:
```
Branch: <branch>
Issue: <issue-id>
Polecat: <polecat-name>
Verified: clean git state, issue closed
```
**Trigger**: Witness sends after verifying polecat work is complete.
**Handler**: Refinery adds to merge queue, processes when ready.
### MERGED
**Route**: Refinery → Witness
**Purpose**: Confirm branch was merged successfully, safe to nuke polecat.
**Subject format**: `MERGED <polecat-name>`
**Body format**:
```
Branch: <branch>
Issue: <issue-id>
Polecat: <polecat-name>
Rig: <rig>
Target: <target-branch>
Merged-At: <timestamp>
Merge-Commit: <sha>
```
**Trigger**: Refinery sends after successful merge to main.
**Handler**: Witness completes cleanup wisp, nukes polecat worktree.
### MERGE_FAILED
**Route**: Refinery → Witness
**Purpose**: Notify that merge attempt failed (tests, build, or other non-conflict error).
**Subject format**: `MERGE_FAILED <polecat-name>`
**Body format**:
```
Branch: <branch>
Issue: <issue-id>
Polecat: <polecat-name>
Rig: <rig>
Target: <target-branch>
Failed-At: <timestamp>
Failure-Type: <tests|build|push|other>
Error: <error-message>
```
**Trigger**: Refinery sends when merge fails for non-conflict reasons.
**Handler**: Witness notifies polecat, assigns work back for rework.
### REWORK_REQUEST
**Route**: Refinery → Witness
**Purpose**: Request polecat to rebase branch due to merge conflicts.
**Subject format**: `REWORK_REQUEST <polecat-name>`
**Body format**:
```
Branch: <branch>
Issue: <issue-id>
Polecat: <polecat-name>
Rig: <rig>
Target: <target-branch>
Requested-At: <timestamp>
Conflict-Files: <file1>, <file2>, ...
Please rebase your changes onto <target-branch>:
git fetch origin
git rebase origin/<target-branch>
# Resolve any conflicts
git push -f
The Refinery will retry the merge after rebase is complete.
```
**Trigger**: Refinery sends when merge has conflicts with target branch.
**Handler**: Witness notifies polecat with rebase instructions.
### WITNESS_PING
**Route**: Witness → Deacon (all witnesses send)
**Purpose**: Second-order monitoring - ensure Deacon is alive.
**Subject format**: `WITNESS_PING <rig>`
**Body format**:
```
Rig: <rig>
Timestamp: <timestamp>
Patrol: <cycle-number>
```
**Trigger**: Each witness sends periodically (every N patrol cycles).
**Handler**: Deacon acknowledges. If no ack, witnesses escalate to Mayor.
### HELP
**Route**: Any → escalation target (usually Mayor)
**Purpose**: Request intervention for stuck/blocked work.
**Subject format**: `HELP: <brief-description>`
**Body format**:
```
Agent: <agent-id>
Issue: <issue-id> # if applicable
Problem: <description>
Tried: <what was attempted>
```
**Trigger**: Agent unable to proceed, needs external help.
**Handler**: Escalation target assesses and intervenes.
### HANDOFF
**Route**: Agent → self (or successor)
**Purpose**: Session continuity across context limits/restarts.
**Subject format**: `🤝 HANDOFF: <brief-context>`
**Body format**:
```
attached_molecule: <molecule-id> # if work in progress
attached_at: <timestamp>
## Context
<freeform notes for successor>
## Status
<where things stand>
## Next
<what successor should do>
```
**Trigger**: `gt handoff` command, or manual send before session end.
**Handler**: Next session reads handoff, continues from context.
## Format Conventions
### Subject Line
- **Type prefix**: Uppercase, identifies message type
- **Colon separator**: After type for structured info
- **Brief context**: Human-readable summary
Examples:
```
POLECAT_DONE nux
MERGE_READY greenplace/nux
HELP: Polecat stuck on test failures
🤝 HANDOFF: Schema work in progress
```
### Body Structure
- **Key-value pairs**: For structured data (one per line)
- **Blank line**: Separates structured data from freeform content
- **Markdown sections**: For freeform content (##, lists, code blocks)
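These conventions make bodies easy to parse mechanically. A minimal sketch, assuming the blank-line separator described above (the real handlers live in patrol formulas and `internal/mail/`):

```go
package main

import (
	"fmt"
	"strings"
)

// parseBody splits a mail body into its leading key-value pairs and the
// freeform remainder. The structured section ends at the first blank
// line or the first line that is not "Key: value".
func parseBody(body string) (map[string]string, string) {
	fields := map[string]string{}
	lines := strings.Split(body, "\n")
	i := 0
	for ; i < len(lines); i++ {
		line := strings.TrimSpace(lines[i])
		if line == "" {
			i++ // blank line ends the structured section
			break
		}
		key, val, ok := strings.Cut(line, ":")
		if !ok {
			break // not key-value; freeform starts here
		}
		fields[strings.TrimSpace(key)] = strings.TrimSpace(val)
	}
	return fields, strings.Join(lines[i:], "\n")
}

func main() {
	fields, _ := parseBody("Branch: feature-xyz\nIssue: gp-abc\n\n## Context\nnotes")
	fmt.Println(fields["Issue"]) // gp-abc
}
```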
### Addresses
Format: `<rig>/<role>` or `<rig>/<type>/<name>`
Examples:
```
greenplace/witness # Witness for greenplace rig
beads/refinery # Refinery for beads rig
greenplace/polecats/nux # Specific polecat
mayor/ # Town-level Mayor
deacon/ # Town-level Deacon
```
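Decomposing an address follows directly from the format. A sketch (the `addr` type is illustrative; town-level addresses like `mayor/` have an empty rig):

```go
package main

import (
	"fmt"
	"strings"
)

// addr decomposes a Gas Town mail address of the form <rig>/<role> or
// <rig>/<type>/<name>. Role holds the role or worker type.
type addr struct {
	Rig, Role, Name string
}

func parseAddr(s string) addr {
	parts := strings.Split(strings.TrimSuffix(s, "/"), "/")
	switch len(parts) {
	case 1: // "mayor/", "deacon/" - town-level role
		return addr{Role: parts[0]}
	case 2: // "greenplace/witness"
		return addr{Rig: parts[0], Role: parts[1]}
	default: // "greenplace/polecats/nux"
		return addr{Rig: parts[0], Role: parts[1], Name: parts[2]}
	}
}

func main() {
	fmt.Println(parseAddr("greenplace/polecats/nux").Name) // nux
}
```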
## Protocol Flows
### Polecat Completion Flow
```
Polecat                    Witness                    Refinery
   │                          │                          │
   │       POLECAT_DONE       │                          │
   │─────────────────────────>│                          │
   │                          │                          │
   │                   (verify clean)                    │
   │                          │                          │
   │                          │       MERGE_READY        │
   │                          │─────────────────────────>│
   │                          │                          │
   │                          │          (merge attempt)
   │                          │                          │
   │                          │     MERGED (success)     │
   │                          │<─────────────────────────│
   │                          │                          │
   │                   (nuke polecat)                    │
   │                          │                          │
```
### Merge Failure Flow
```
Polecat                    Witness                    Refinery
   │                          │                          │
   │                          │      (merge fails)       │
   │                          │                          │
   │                          │       MERGE_FAILED       │
   │                          │<─────────────────────────│
   │                          │                          │
   │  (failure notification)  │                          │
   │<─────────────────────────│                          │
   │                          │                          │
(rework needed)
```
### Rebase Required Flow
```
Polecat                    Witness                    Refinery
   │                          │                          │
   │                          │   (conflict detected)    │
   │                          │                          │
   │                          │      REWORK_REQUEST      │
   │                          │<─────────────────────────│
   │                          │                          │
   │  (rebase instructions)   │                          │
   │<─────────────────────────│                          │
   │                          │                          │
   │    (rebases, gt done)    │                          │
   │─────────────────────────>│       MERGE_READY        │
   │                          │─────────────────────────>│
   │                          │                          │
   │                          │            (retry merge)
```
### Second-Order Monitoring
```
Witness-1 ──┐
            │  WITNESS_PING
Witness-2 ──┼────────────────> Deacon
            │                    │
Witness-N ──┘                    │
                                 │
        (if no response)         │
   <─────────────────────────────┘
   Escalate to Mayor
```
## Implementation
### Sending Mail
```bash
# Basic send
gt mail send <addr> -s "Subject" -m "Body"
# With structured body
gt mail send greenplace/witness -s "MERGE_READY nux" -m "Branch: feature-xyz
Issue: gp-abc
Polecat: nux
Verified: clean"
```
### Receiving Mail
```bash
# Check inbox
gt mail inbox
# Read specific message
gt mail read <msg-id>
# Mark as read
gt mail ack <msg-id>
```
### In Patrol Formulas
Formulas should:
1. Check inbox at start of each cycle
2. Parse subject prefix to route handling
3. Extract structured data from body
4. Take appropriate action
5. Mark mail as read after processing
## Extensibility
New message types follow the pattern:
1. Define subject prefix (TYPE: or TYPE_SUBTYPE)
2. Document body format (key-value pairs + freeform)
3. Specify route (sender → receiver)
4. Implement handlers in relevant patrol formulas
The protocol is intentionally simple - structured enough for parsing,
flexible enough for human debugging.
## Related Documents
- `docs/agent-as-bead.md` - Agent identity and slots
- `.beads/formulas/mol-witness-patrol.formula.toml` - Witness handling
- `internal/mail/` - Mail routing implementation
- `internal/protocol/` - Protocol handlers for Witness-Refinery communication


@@ -0,0 +1,136 @@
# Operational State in Gas Town
> Managing runtime state through events and labels.
## Overview
Gas Town tracks operational state changes as structured data. This document covers:
- **Events**: State transitions as beads (immutable audit trail)
- **Labels-as-state**: Fast queries via role bead labels (current state cache)
For Boot triage and degraded mode details, see [Watchdog Chain](watchdog-chain.md).
## Events: State Transitions as Data
Operational state changes are recorded as event beads. Each event captures:
- **What** changed (`event_type`)
- **Who** caused it (`actor`)
- **What** was affected (`target`)
- **Context** (`payload`)
- **When** (`created_at`)
### Event Types
| Event Type | Description | Payload |
|------------|-------------|---------|
| `patrol.muted` | Patrol cycle disabled | `{reason, until?}` |
| `patrol.unmuted` | Patrol cycle re-enabled | `{reason?}` |
| `agent.started` | Agent session began | `{session_id?}` |
| `agent.stopped` | Agent session ended | `{reason, outcome?}` |
| `mode.degraded` | System entered degraded mode | `{reason}` |
| `mode.normal` | System returned to normal | `{}` |
### Creating Events
```bash
# Mute deacon patrol
bd create --type=event --event-type=patrol.muted \
--actor=human:overseer --target=agent:deacon \
--payload='{"reason":"fixing convoy deadlock","until":"gt-abc1"}'
# System entered degraded mode
bd create --type=event --event-type=mode.degraded \
--actor=system:daemon --target=rig:greenplace \
--payload='{"reason":"tmux unavailable"}'
```
### Querying Events
```bash
# Recent events for an agent
bd list --type=event --target=agent:deacon --limit=10
# All patrol state changes
bd list --type=event --event-type=patrol.muted
bd list --type=event --event-type=patrol.unmuted
# Events in the activity feed
bd activity --follow --type=event
```
## Labels-as-State Pattern
Events capture the full history. Labels cache the current state for fast queries.
### Convention
Labels use `<dimension>:<value>` format:
- `patrol:muted` / `patrol:active`
- `mode:degraded` / `mode:normal`
- `status:idle` / `status:working`
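Reading a dimension's current value from a role bead's labels is a prefix match. A sketch of the convention (`stateOf` is an illustrative helper, not part of bd):

```go
package main

import (
	"fmt"
	"strings"
)

// stateOf extracts the current value of a state dimension from a role
// bead's labels, per the <dimension>:<value> convention.
func stateOf(labels []string, dimension string) (string, bool) {
	for _, l := range labels {
		if v, ok := strings.CutPrefix(l, dimension+":"); ok {
			return v, true
		}
	}
	return "", false
}

func main() {
	labels := []string{"patrol:muted", "mode:normal"}
	v, _ := stateOf(labels, "patrol")
	fmt.Println(v) // muted
}
```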
### State Change Flow
1. Create event bead (full context, immutable)
2. Update role bead labels (current state cache)
```bash
# Mute patrol
bd create --type=event --event-type=patrol.muted ...
bd update role-deacon --add-label=patrol:muted --remove-label=patrol:active
# Unmute patrol
bd create --type=event --event-type=patrol.unmuted ...
bd update role-deacon --add-label=patrol:active --remove-label=patrol:muted
```
### Querying Current State
```bash
# Is deacon patrol muted?
bd show role-deacon | grep patrol:
# All agents with muted patrol
bd list --type=role --label=patrol:muted
# All agents in degraded mode
bd list --type=role --label=mode:degraded
```
## Configuration vs State
| Type | Storage | Example |
|------|---------|---------|
| **Static config** | TOML files | Daemon tick interval |
| **Operational state** | Beads (events + labels) | Patrol muted |
| **Runtime flags** | Marker files | `.deacon-disabled` |
Static config rarely changes and doesn't need history.
Operational state changes at runtime and benefits from audit trail.
Marker files are fast checks that can trigger deeper beads queries.
## Commands Summary
```bash
# Create operational event
bd create --type=event --event-type=<type> \
--actor=<entity> --target=<entity> --payload='<json>'
# Update state label
bd update <role-bead> --add-label=<dim>:<val> --remove-label=<dim>:<old>
# Query current state
bd list --type=role --label=<dim>:<val>
# Query state history
bd list --type=event --target=<entity>
# Boot management
gt dog status boot
gt dog call boot
gt dog prime boot
```
---
*Events are the source of truth. Labels are the cache.*


@@ -0,0 +1,300 @@
# Property Layers: Multi-Level Configuration
> Implementation guide for Gas Town's configuration system.
> Created: 2025-01-06
## Overview
Gas Town uses a layered property system for configuration. Properties are
looked up through multiple layers, with earlier layers overriding later ones.
This enables both local control and global coordination.
## The Four Layers
```
┌─────────────────────────────────────────────────────────────┐
│ 1. WISP LAYER (transient, town-local)                       │
│    Location: <rig>/.beads-wisp/config/                      │
│    Synced:   Never                                          │
│    Use:      Temporary local overrides                      │
└──────────────────────────────┬──────────────────────────────┘
                               │ if missing
┌──────────────────────────────┴──────────────────────────────┐
│ 2. RIG BEAD LAYER (persistent, synced globally)             │
│    Location: <rig>/.beads/ (rig identity bead labels)       │
│    Synced:   Via git (all clones see it)                    │
│    Use:      Project-wide operational state                 │
└──────────────────────────────┬──────────────────────────────┘
                               │ if missing
┌──────────────────────────────┴──────────────────────────────┐
│ 3. TOWN DEFAULTS                                            │
│    Location: ~/gt/config.json or ~/gt/.beads/               │
│    Synced:   N/A (per-town)                                 │
│    Use:      Town-wide policies                             │
└──────────────────────────────┬──────────────────────────────┘
                               │ if missing
┌──────────────────────────────┴──────────────────────────────┐
│ 4. SYSTEM DEFAULTS (compiled in)                            │
│    Use:      Fallback when nothing else specified           │
└─────────────────────────────────────────────────────────────┘
```
## Lookup Behavior
### Override Semantics (Default)
For most properties, the first non-nil value wins:
```go
func GetConfig(key string) interface{} {
	if val := wisp.Get(key); val != nil {
		if val == Blocked {
			return nil
		}
		return val
	}
	if val := rigBead.GetLabel(key); val != nil {
		return val
	}
	if val := townDefaults.Get(key); val != nil {
		return val
	}
	return systemDefaults[key]
}
```
### Stacking Semantics (Integers)
For integer properties, values from wisp and bead layers **add** to the base:
```go
func GetIntConfig(key string) int {
	base := getBaseDefault(key)    // Town or system default
	beadAdj := rigBead.GetInt(key) // 0 if missing
	wispAdj := wisp.GetInt(key)    // 0 if missing
	return base + beadAdj + wispAdj
}
```
This enables temporary adjustments without changing the base value.
### Blocking Inheritance
You can explicitly block a property from being inherited:
```bash
gt rig config set gastown auto_restart --block
```
This creates a "blocked" marker in the wisp layer. Even if the rig bead
or defaults say `auto_restart: true`, the lookup returns nil.
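A minimal sketch of how a blocked marker can short-circuit the lookup chain. The `value` struct and `lookup` function are illustrative, not the actual gt implementation; the point is that "explicitly blocked" must be distinguishable from "not set":

```go
package main

import "fmt"

// value distinguishes three states: not set, set to a value, or
// explicitly blocked (the --block marker).
type value struct {
	set     bool
	blocked bool
	val     string
}

// lookup walks the layers: wisp, then rig bead, then defaults.
// A blocked wisp entry short-circuits the chain and returns "".
func lookup(wisp, bead, def value) string {
	if wisp.set {
		if wisp.blocked {
			return "" // inheritance blocked: ignore lower layers
		}
		return wisp.val
	}
	if bead.set {
		return bead.val
	}
	if def.set {
		return def.val
	}
	return ""
}

func main() {
	// Rig bead says auto_restart=true, but the wisp layer blocks it.
	bead := value{set: true, val: "true"}
	blockedWisp := value{set: true, blocked: true}
	fmt.Println(lookup(blockedWisp, bead, value{})) // prints an empty line
	fmt.Println(lookup(value{}, bead, value{}))     // prints "true"
}
```

Without the sentinel, an absent wisp entry and a blocked one would both read as nil, and `--block` could not override a true value from the bead layer.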
## Rig Identity Beads
Each rig has an identity bead for operational state:
```yaml
id: gt-rig-gastown
type: rig
name: gastown
repo: git@github.com:steveyegge/gastown.git
prefix: gt
labels:
- status:operational
- priority:normal
```
These beads sync via git, so all clones of the rig see the same state.
## Two-Level Rig Control
### Level 1: Park (Local, Ephemeral)
```bash
gt rig park gastown # Stop services, daemon won't restart
gt rig unpark gastown # Allow services to run
```
- Stored in wisp layer (`.beads-wisp/config/`)
- Only affects this town
- Disappears on cleanup
- Use: Local maintenance, debugging
### Level 2: Dock (Global, Persistent)
```bash
gt rig dock gastown # Set status:docked label on rig bead
gt rig undock gastown # Remove label
```
- Stored on rig identity bead
- Syncs to all clones via git
- Permanent until explicitly changed
- Use: Project-wide maintenance, coordinated downtime
### Daemon Behavior
The daemon checks both levels before auto-restarting:
```go
func shouldAutoRestart(rig *Rig) bool {
	status := rig.GetConfig("status")
	if status == "parked" || status == "docked" {
		return false
	}
	return true
}
```
## Configuration Keys
| Key | Type | Behavior | Description |
|-----|------|----------|-------------|
| `status` | string | Override | operational/parked/docked |
| `auto_restart` | bool | Override | Daemon auto-restart behavior |
| `max_polecats` | int | Override | Maximum concurrent polecats |
| `priority_adjustment` | int | **Stack** | Scheduling priority modifier |
| `maintenance_window` | string | Override | When maintenance allowed |
| `dnd` | bool | Override | Do not disturb mode |
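The table above can be modeled as a per-key semantics dispatch. This is a sketch under assumptions: `effectiveInt` and the nil-pointer convention for "layer did not set this key" are illustrative, not the gt codebase's actual types.

```go
package main

import "fmt"

// stackKeys lists the integer properties that add across layers
// (per the table above); all other keys use override semantics.
var stackKeys = map[string]bool{"priority_adjustment": true}

// effectiveInt resolves an integer key: stacked keys sum
// base + bead + wisp; override keys take the first layer that
// sets a value (wisp, then bead, then base).
func effectiveInt(key string, base int, bead, wisp *int) int {
	if stackKeys[key] {
		total := base
		if bead != nil {
			total += *bead
		}
		if wisp != nil {
			total += *wisp
		}
		return total
	}
	if wisp != nil {
		return *wisp
	}
	if bead != nil {
		return *bead
	}
	return base
}

func main() {
	adj := 10
	fmt.Println(effectiveInt("priority_adjustment", 0, nil, &adj)) // 10 (stacked)
	fmt.Println(effectiveInt("max_polecats", 4, nil, &adj))        // 10 (overridden)
}
```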
## Commands
### View Configuration
```bash
gt rig config show gastown # Show effective config (all layers)
gt rig config show gastown --layer # Show which layer each value comes from
```
### Set Configuration
```bash
# Set in wisp layer (local, ephemeral)
gt rig config set gastown key value
# Set in bead layer (global, permanent)
gt rig config set gastown key value --global
# Block inheritance
gt rig config set gastown key --block
# Clear from wisp layer
gt rig config unset gastown key
```
### Rig Lifecycle
```bash
gt rig park gastown # Local: stop + prevent restart
gt rig unpark gastown # Local: allow restart
gt rig dock gastown # Global: mark as offline
gt rig undock gastown # Global: mark as operational
gt rig status gastown # Show current state
```
## Examples
### Temporary Priority Boost
```bash
# Base priority: 0 (from defaults)
# Give this rig temporary priority boost for urgent work
gt rig config set gastown priority_adjustment 10
# Effective priority: 0 + 10 = 10
# When done, clear it:
gt rig config unset gastown priority_adjustment
```
### Local Maintenance
```bash
# I'm upgrading the local clone, don't restart services
gt rig park gastown
# ... do maintenance ...
gt rig unpark gastown
```
### Project-Wide Maintenance
```bash
# Major refactor in progress, all clones should pause
gt rig dock gastown
# Syncs via git - other towns see the rig as docked
bd sync
# When done:
gt rig undock gastown
bd sync
```
### Block Auto-Restart Locally
```bash
# Rig bead says auto_restart: true
# But I'm debugging and don't want that here
gt rig config set gastown auto_restart --block
# Now auto_restart returns nil for this town only
```
## Implementation Notes
### Wisp Storage
Wisp config stored in `.beads-wisp/config/<rig>.json`:
```json
{
  "rig": "gastown",
  "values": {
    "status": "parked",
    "priority_adjustment": 10
  },
  "blocked": ["auto_restart"]
}
```
### Rig Bead Labels
Rig operational state stored as labels on the rig identity bead:
```bash
bd label add gt-rig-gastown status:docked
bd label remove gt-rig-gastown status:docked
```
### Daemon Integration
The daemon's lifecycle manager checks config before starting services:
```go
func (d *Daemon) maybeStartRigServices(rig string) {
	r := d.getRig(rig)
	status := r.GetConfig("status")
	if status == "parked" || status == "docked" {
		log.Info("Rig %s is offline, skipping auto-start", rig)
		return
	}
	d.ensureWitness(rig)
	d.ensureRefinery(rig)
}
```
## Related Documents
- `~/gt/docs/hop/PROPERTY-LAYERS.md` - Strategic architecture
- `wisp-architecture.md` - Wisp system design
- `agent-as-bead.md` - Agent identity beads (similar pattern)

# Daemon/Boot/Deacon Watchdog Chain
> Autonomous health monitoring and recovery in Gas Town.
## Overview
Gas Town uses a three-tier watchdog chain for autonomous health monitoring:
```
Daemon (Go process)            ← Dumb transport, 3-min heartbeat
└─► Boot (AI agent)            ← Intelligent triage, fresh each tick
    └─► Deacon (AI agent)      ← Continuous patrol, long-running
        └─► Witnesses & Refineries  ← Per-rig agents
```
**Key insight**: The daemon is mechanical (can't reason), but health decisions need
intelligence (is the agent stuck or just thinking?). Boot bridges this gap.
## Design Rationale: Why Two Agents?
### The Problem
The daemon needs to ensure the Deacon is healthy, but:
1. **Daemon can't reason** - It's Go code following the ZFC principle (don't reason
about other agents). It can check "is session alive?" but not "is agent stuck?"
2. **Waking costs context** - Each time you spawn an AI agent, you consume context
tokens. In idle towns, waking Deacon every 3 minutes wastes resources.
3. **Observation requires intelligence** - Distinguishing "agent composing large
artifact" from "agent hung on tool prompt" requires reasoning.
### The Solution: Boot as Triage
Boot is a narrow, ephemeral AI agent that:
- Runs fresh each daemon tick (no accumulated context debt)
- Makes a single decision: should Deacon wake?
- Exits immediately after deciding
This gives us intelligent triage without the cost of keeping a full AI running.
### Why Not Merge Boot into Deacon?
We could have Deacon handle its own "should I be awake?" logic, but:
1. **Deacon can't observe itself** - A hung Deacon can't detect it's hung
2. **Context accumulation** - Deacon runs continuously; Boot restarts fresh
3. **Cost in idle towns** - Boot only costs tokens when it runs; Deacon costs
tokens constantly if kept alive
### Why Not Replace with Go Code?
The daemon could directly monitor agents without AI, but:
1. **Can't observe panes** - Go code can't interpret tmux output semantically
2. **Can't distinguish stuck vs working** - No reasoning about agent state
3. **Escalation is complex** - When to notify? When to force-restart? AI handles
nuanced decisions better than hardcoded thresholds
## Session Ownership
| Agent | Session Name | Location | Lifecycle |
|-------|--------------|----------|-----------|
| Daemon | (Go process) | `~/gt/daemon/` | Persistent, auto-restart |
| Boot | `gt-boot` | `~/gt/deacon/dogs/boot/` | Ephemeral, fresh each tick |
| Deacon | `hq-deacon` | `~/gt/deacon/` | Long-running, handoff loop |
**Critical**: Boot runs in `gt-boot`, NOT `hq-deacon`. This prevents Boot
from conflicting with a running Deacon session.
## Heartbeat Mechanics
### Daemon Heartbeat (3 minutes)
The daemon runs a heartbeat tick every 3 minutes:
```go
func (d *Daemon) heartbeatTick() {
	d.ensureBootRunning()        // 1. Spawn Boot for triage
	d.checkDeaconHeartbeat()     // 2. Belt-and-suspenders fallback
	d.ensureWitnessesRunning()   // 3. Witness health (checks tmux directly)
	d.ensureRefineriesRunning()  // 4. Refinery health (checks tmux directly)
	d.triggerPendingSpawns()     // 5. Bootstrap polecats
	d.processLifecycleRequests() // 6. Cycle/restart requests
	// Agent state derived from tmux, not recorded in beads (gt-zecmc)
}
```
### Deacon Heartbeat (continuous)
The Deacon updates `~/gt/deacon/heartbeat.json` at the start of each patrol cycle:
```json
{
  "timestamp": "2026-01-02T18:30:00Z",
  "cycle": 42,
  "last_action": "health-scan",
  "healthy_agents": 3,
  "unhealthy_agents": 0
}
```
### Heartbeat Freshness
| Age | State | Boot Action |
|-----|-------|-------------|
| < 5 min | Fresh | Nothing (Deacon active) |
| 5-15 min | Stale | Nudge if pending mail |
| > 15 min | Very stale | Wake (Deacon may be stuck) |
## Boot Decision Matrix
When Boot runs, it observes:
- Is Deacon session alive?
- How old is Deacon's heartbeat?
- Is there pending mail for Deacon?
- What's in Deacon's tmux pane?
Then decides:
| Condition | Action | Command |
|-----------|--------|---------|
| Session dead | START | Exit; daemon calls `ensureDeaconRunning()` |
| Heartbeat > 15 min | WAKE | `gt nudge deacon "Boot wake: check your inbox"` |
| Heartbeat 5-15 min + mail | NUDGE | `gt nudge deacon "Boot check-in: pending work"` |
| Heartbeat fresh | NOTHING | Exit silently |
## Handoff Flow
### Deacon Handoff
The Deacon runs continuous patrol cycles. After N cycles or high context:
```
End of patrol cycle:
├─ Squash wisp to digest (ephemeral → permanent)
├─ Write summary to molecule state
└─ gt handoff -s "Routine cycle" -m "Details"
    └─ Creates mail for next session
```
Next daemon tick:
```
Daemon → ensureDeaconRunning()
└─ Spawns fresh Deacon in hq-deacon
    └─ SessionStart hook: gt mail check --inject
        └─ Previous handoff mail injected
            └─ Deacon reads and continues
```
### Boot Handoff (Rare)
Boot is ephemeral - it exits after each tick. No persistent handoff needed.
However, Boot uses a marker file to prevent double-spawning:
- Marker: `~/gt/deacon/dogs/boot/.boot-running` (TTL: 5 minutes)
- Status: `~/gt/deacon/dogs/boot/.boot-status.json` (last action/result)
If the marker exists and is recent, the daemon skips the Boot spawn for that tick.
## Degraded Mode
When tmux is unavailable, Gas Town enters degraded mode:
| Capability | Normal | Degraded |
|------------|--------|----------|
| Boot runs | As AI in tmux | As Go code (mechanical) |
| Observe panes | Yes | No |
| Nudge agents | Yes | No |
| Start agents | tmux sessions | Direct spawn |
Degraded Boot triage is purely mechanical:
- Session dead → start
- Heartbeat stale → restart
- No reasoning, just thresholds
## Fallback Chain
Multiple layers ensure recovery:
1. **Boot triage** - Intelligent observation, first line
2. **Daemon checkDeaconHeartbeat()** - Belt-and-suspenders if Boot fails
3. **Tmux-based discovery** - Daemon checks tmux sessions directly (no bead state)
4. **Human escalation** - Mail to overseer for unrecoverable states
## State Files
| File | Purpose | Updated By |
|------|---------|-----------|
| `deacon/heartbeat.json` | Deacon freshness | Deacon (each cycle) |
| `deacon/dogs/boot/.boot-running` | Boot in-progress marker | Boot spawn |
| `deacon/dogs/boot/.boot-status.json` | Boot last action | Boot triage |
| `deacon/health-check-state.json` | Agent health tracking | `gt deacon health-check` |
| `daemon/daemon.log` | Daemon activity | Daemon |
| `daemon/daemon.pid` | Daemon process ID | Daemon startup |
## Debugging
```bash
# Check Deacon heartbeat
cat ~/gt/deacon/heartbeat.json | jq .
# Check Boot status
cat ~/gt/deacon/dogs/boot/.boot-status.json | jq .
# View daemon log
tail -f ~/gt/daemon/daemon.log
# Manual Boot run
gt boot triage
# Manual Deacon health check
gt deacon health-check
```
## Common Issues
### Boot Spawns in Wrong Session
**Symptom**: Boot runs in `hq-deacon` instead of `gt-boot`
**Cause**: Session name confusion in spawn code
**Fix**: Ensure `gt boot triage` specifies `--session=gt-boot`
### Zombie Sessions Block Restart
**Symptom**: tmux session exists but Claude is dead
**Cause**: Daemon checks session existence, not process health
**Fix**: Kill zombie sessions before recreating: `gt session kill hq-deacon`
### Status Shows Wrong State
**Symptom**: `gt status` shows wrong state for agents
**Cause**: Previously, bead state and tmux state could diverge
**Fix**: As of gt-zecmc, status derives state from tmux directly (no bead state for
observable conditions like running/stopped). Non-observable states (stuck, awaiting-gate)
are still stored in beads.
## Design Decision: Keep Separation
Issue gt-1847v considered three options:
### Option A: Keep Boot/Deacon Separation (CHOSEN)
- Boot is ephemeral, spawns fresh each heartbeat
- Boot runs in `gt-boot`, exits after triage
- Deacon runs in `hq-deacon`, continuous patrol
- Clear session boundaries, clear lifecycle
**Verdict**: This is the correct design. The implementation needs fixing, not the architecture.
### Option B: Merge Boot into Deacon (Rejected)
- Single `hq-deacon` session handles everything
- Deacon checks "should I be awake?" internally
**Why rejected**:
- Deacon can't observe itself (hung Deacon can't detect hang)
- Context accumulates even when idle (cost in quiet towns)
- No external watchdog means no recovery from Deacon failure
### Option C: Replace with Go Watchdog (Rejected)
- Daemon directly monitors witness/refinery
- No Boot, no Deacon AI for health checks
- AI agents only for complex decisions
**Why rejected**:
- Go code can't interpret tmux pane output semantically
- Can't distinguish "stuck" from "thinking deeply"
- Loses the intelligent triage that makes the system resilient
- Escalation decisions are nuanced (when to notify? force-restart?)
### Implementation Fixes Needed
The separation is correct; these bugs need fixing:
1. **Session confusion** (gt-sgzsb): Boot spawns in wrong session
2. **Zombie blocking** (gt-j1i0r): Daemon can't kill zombie sessions
3. ~~**Status mismatch** (gt-doih4): Bead vs tmux state divergence~~ → FIXED in gt-zecmc
4. **Ensure semantics** (gt-ekc5u): Start should kill zombies first
## Summary
The watchdog chain provides autonomous recovery:
- **Daemon**: Mechanical heartbeat, spawns Boot
- **Boot**: Intelligent triage, decides Deacon fate
- **Deacon**: Continuous patrol, monitors workers
Boot exists because the daemon can't reason and Deacon can't observe itself.
The separation costs complexity but enables:
1. **Intelligent triage** without constant AI cost
2. **Fresh context** for each triage decision
3. **Graceful degradation** when tmux unavailable
4. **Multiple fallback** layers for reliability