docs: reorganize documentation into concepts, design, and examples

Move documentation files into a clearer structure:
- concepts/: core ideas (convoy, identity, molecules, polecat-lifecycle, propulsion)
- design/: architecture and protocols (architecture, escalation, federation, mail, etc.)
- examples/: demos and tutorials (hanoi-demo)
- overview.md: renamed from understanding-gas-town.md

Remove outdated/superseded docs and update reference.md.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
gastown/crew/gus
2026-01-11 21:21:25 -08:00
committed by Steve Yegge
parent 8ed31e9634
commit 88f784a9aa
22 changed files with 195 additions and 1356 deletions

docs/design/architecture.md Normal file

@@ -0,0 +1,130 @@
# Gas Town Architecture
Technical architecture for Gas Town multi-agent workspace management.
## Two-Level Beads Architecture
Gas Town uses a two-level beads architecture to separate organizational coordination
from project implementation work.
| Level | Location | Prefix | Purpose |
|-------|----------|--------|---------|
| **Town** | `~/gt/.beads/` | `hq-*` | Cross-rig coordination, Mayor mail, agent identity |
| **Rig** | `<rig>/mayor/rig/.beads/` | project prefix | Implementation work, MRs, project issues |
### Town-Level Beads (`~/gt/.beads/`)
Organizational chain for cross-rig coordination:
- Mayor mail and messages
- Convoy coordination (batch work across rigs)
- Strategic issues and decisions
- **Town-level agent beads** (Mayor, Deacon)
- **Role definition beads** (global templates)
### Rig-Level Beads (`<rig>/mayor/rig/.beads/`)
Project chain for implementation work:
- Bugs, features, tasks for the project
- Merge requests and code reviews
- Project-specific molecules
- **Rig-level agent beads** (Witness, Refinery, Polecats)
## Agent Bead Storage
Agent beads track lifecycle state for each agent. Storage location depends on
the agent's scope.
| Agent Type | Scope | Bead Location | Bead ID Format |
|------------|-------|---------------|----------------|
| Mayor | Town | `~/gt/.beads/` | `hq-mayor` |
| Deacon | Town | `~/gt/.beads/` | `hq-deacon` |
| Dogs | Town | `~/gt/.beads/` | `hq-dog-<name>` |
| Witness | Rig | `<rig>/mayor/rig/.beads/` | `<prefix>-<rig>-witness` |
| Refinery | Rig | `<rig>/mayor/rig/.beads/` | `<prefix>-<rig>-refinery` |
| Polecats | Rig | `<rig>/mayor/rig/.beads/` | `<prefix>-<rig>-polecat-<name>` |
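The ID formats in the table can be sketched as a small helper. This is illustrative only; the function name and signature are assumptions, not part of the actual Gas Town codebase:

```go
package main

import "fmt"

// beadID reconstructs the ID formats from the table above. Town-scoped
// agents use the fixed "hq" prefix; rig-scoped agents use the rig's
// project prefix. (Sketch only - not Gas Town's real API.)
func beadID(scope, prefix, rig, role, name string) string {
	if scope == "town" {
		if name != "" {
			return fmt.Sprintf("hq-%s-%s", role, name) // e.g. hq-dog-<name>
		}
		return "hq-" + role // e.g. hq-mayor
	}
	if name != "" {
		return fmt.Sprintf("%s-%s-%s-%s", prefix, rig, role, name)
	}
	return fmt.Sprintf("%s-%s-%s", prefix, rig, role)
}

func main() {
	fmt.Println(beadID("town", "", "", "mayor", ""))           // hq-mayor
	fmt.Println(beadID("rig", "gt", "gastown", "witness", "")) // gt-gastown-witness
}
```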
### Role Beads
Role beads are global templates stored in town beads with `hq-` prefix:
- `hq-mayor-role` - Mayor role definition
- `hq-deacon-role` - Deacon role definition
- `hq-witness-role` - Witness role definition
- `hq-refinery-role` - Refinery role definition
- `hq-polecat-role` - Polecat role definition
Each agent bead references its role bead via the `role_bead` field.
## Agent Taxonomy
### Town-Level Agents (Cross-Rig)
| Agent | Role | Persistence |
|-------|------|-------------|
| **Mayor** | Global coordinator, handles cross-rig communication and escalations | Persistent |
| **Deacon** | Daemon beacon - receives heartbeats, runs plugins and monitoring | Persistent |
| **Dogs** | Long-running workers for cross-rig batch work | Variable |
### Rig-Level Agents (Per-Project)
| Agent | Role | Persistence |
|-------|------|-------------|
| **Witness** | Monitors polecat health, handles nudging and cleanup | Persistent |
| **Refinery** | Processes merge queue, runs verification | Persistent |
| **Polecats** | Ephemeral workers assigned to specific issues | Ephemeral |
## Directory Structure
```
~/gt/                          Town root
├── .beads/                    Town-level beads (hq-* prefix)
│   ├── config.yaml            Beads configuration
│   ├── issues.jsonl           Town issues (mail, agents, convoys)
│   └── routes.jsonl           Prefix → rig routing table
├── mayor/                     Mayor config
│   └── town.json              Town configuration
└── <rig>/                     Project container (NOT a git clone)
    ├── config.json            Rig identity and beads prefix
    ├── mayor/rig/             Canonical clone (beads live here)
    │   └── .beads/            Rig-level beads database
    ├── refinery/rig/          Worktree from mayor/rig
    ├── witness/               No clone (monitors only)
    ├── crew/<name>/           Human workspaces (full clones)
    └── polecats/<name>/       Worker worktrees from mayor/rig
```
### Worktree Architecture
Polecats and the Refinery are git worktrees, not full clones. This enables fast spawning
and shared object storage. The worktree base is `mayor/rig`:
```bash
# From polecat/manager.go - worktrees are based on mayor/rig
git worktree add -b polecat/<name>-<timestamp> polecats/<name>
```
Crew workspaces (`crew/<name>/`) are full git clones for human developers who need
independent repos. Polecats are ephemeral and benefit from worktree efficiency.
## Beads Routing
The `routes.jsonl` file maps issue ID prefixes to rig locations (relative to town root):
```jsonl
{"prefix":"hq-","path":"."}
{"prefix":"gt-","path":"gastown/mayor/rig"}
{"prefix":"bd-","path":"beads/mayor/rig"}
```
Routes point to `mayor/rig` because that's where the canonical `.beads/` lives.
This enables transparent cross-rig beads operations:
```bash
bd show hq-mayor   # Routes to town beads (~/gt/.beads)
bd show gt-xyz     # Routes to gastown/mayor/rig/.beads
```
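The routing idea amounts to a prefix lookup; a toy sketch (the `route` type and `resolve` function are illustrative names, not the bd implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// route mirrors one line of routes.jsonl: an issue-ID prefix mapped to a
// path relative to the town root.
type route struct {
	Prefix string
	Path   string
}

// resolve returns the path for the first route whose prefix matches the
// issue ID.
func resolve(routes []route, issueID string) (string, bool) {
	for _, r := range routes {
		if strings.HasPrefix(issueID, r.Prefix) {
			return r.Path, true
		}
	}
	return "", false
}

func main() {
	routes := []route{
		{Prefix: "hq-", Path: "."},
		{Prefix: "gt-", Path: "gastown/mayor/rig"},
		{Prefix: "bd-", Path: "beads/mayor/rig"},
	}
	if path, ok := resolve(routes, "gt-xyz"); ok {
		fmt.Println(path) // gastown/mayor/rig
	}
}
```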
## See Also
- [reference.md](../reference.md) - Command reference
- [molecules.md](../concepts/molecules.md) - Workflow molecules
- [identity.md](../concepts/identity.md) - Agent identity and BD_ACTOR

docs/design/escalation.md Normal file

@@ -0,0 +1,312 @@
# Gas Town Escalation Protocol
> Reference for escalation paths in Gas Town
## Overview
Gas Town agents can escalate issues when automated resolution isn't possible.
This document covers:
- Severity levels and routing
- Escalation categories for structured communication
- Tiered escalation (Deacon -> Mayor -> Overseer)
- Decision patterns for async resolution
- Integration with gates and patrol lifecycles
## Severity Levels
| Level | Priority | Description | Examples |
|-------|----------|-------------|----------|
| **CRITICAL** | P0 (urgent) | System-threatening, immediate attention | Data corruption, security breach, system down |
| **HIGH** | P1 (high) | Important blocker, needs human soon | Unresolvable merge conflict, critical bug, ambiguous spec |
| **MEDIUM** | P2 (normal) | Standard escalation, human at convenience | Design decision needed, unclear requirements |
## Escalation Categories
Categories provide structured routing based on the nature of the escalation:
| Category | Description | Default Route |
|----------|-------------|---------------|
| `decision` | Multiple valid paths, need choice | Deacon -> Mayor |
| `help` | Need guidance or expertise | Deacon -> Mayor |
| `blocked` | Waiting on unresolvable dependency | Mayor |
| `failed` | Unexpected error, can't proceed | Deacon |
| `emergency` | Security or data integrity issue | Overseer (direct) |
| `gate_timeout` | Gate didn't resolve in time | Deacon |
| `lifecycle` | Worker stuck or needs recycle | Witness |
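The default routing in the table amounts to a simple category-to-tier mapping. A sketch (illustrative only; the real routing lives inside the `gt` CLI):

```go
package main

import "fmt"

// defaultRoute maps an escalation category to the tier that receives it
// first, per the table above.
func defaultRoute(category string) string {
	switch category {
	case "decision", "help", "failed", "gate_timeout":
		return "deacon"
	case "blocked":
		return "mayor"
	case "emergency":
		return "overseer" // direct, bypasses lower tiers
	case "lifecycle":
		return "witness"
	default:
		return "deacon"
	}
}

func main() {
	fmt.Println(defaultRoute("emergency")) // overseer
	fmt.Println(defaultRoute("blocked"))   // mayor
}
```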
## Escalation Command
### Basic Usage (unchanged)
```bash
# Basic escalation (default: MEDIUM severity)
gt escalate "Database migration failed"
# Critical escalation - immediate attention
gt escalate -s CRITICAL "Data corruption detected in user table"
# High priority escalation
gt escalate -s HIGH "Merge conflict cannot be resolved automatically"
# With additional details
gt escalate -s MEDIUM "Need clarification on API design" -m "Details..."
```
### Category-Based Escalation
```bash
# Decision needed - routes to Deacon first
gt escalate --type decision "Which auth approach?"
# Help request
gt escalate --type help "Need architecture guidance"
# Blocked on dependency
gt escalate --type blocked "Waiting on bd-xyz"
# Failure that can't be recovered
gt escalate --type failed "Tests failing unexpectedly"
# Emergency - direct to Overseer
gt escalate --type emergency "Security vulnerability found"
```
### Tiered Routing
```bash
# Explicit routing to specific tier
gt escalate --to deacon "Infra issue"
gt escalate --to mayor "Cross-rig coordination needed"
gt escalate --to overseer "Human judgment required"
# Forward from one tier to next
gt escalate --forward --to mayor "Deacon couldn't resolve"
```
### Structured Decisions
For decisions requiring explicit choices:
```bash
gt escalate --type decision \
--question "Which authentication approach?" \
--options "JWT tokens,Session cookies,OAuth2" \
--context "Admin panel needs login" \
--issue bd-xyz
```
This updates the issue with a structured decision format (see below).
## What Happens on Escalation
1. **Bead created/updated**: Escalation bead (tagged `escalation`) created or updated
2. **Mail sent**: Routed to appropriate tier (Deacon, Mayor, or Overseer)
3. **Activity logged**: Event logged to activity feed
4. **Issue updated**: For decision type, issue gets structured format
## Tiered Escalation Flow
```
Worker encounters issue
          |
          v
gt escalate --type <category> [--to <tier>]
          |
          v
[Deacon receives]  (default for most categories)
          |
          +-- Can resolve? --> Updates issue, re-slings work
          |
          +-- Cannot resolve? --> gt escalate --forward --to mayor
                  |
                  v
          [Mayor receives]
                  |
                  +-- Can resolve? --> Updates issue, re-slings
                  |
                  +-- Cannot resolve? --> gt escalate --forward --to overseer
                          |
                          v
                  [Overseer resolves]
```
Each tier can resolve OR forward. The escalation chain is tracked via comments.
## Decision Pattern
When `--type decision` is used, the issue is updated with structured format:
```markdown
## Decision Needed
**Question:** Which authentication approach?
| Option | Description |
|--------|-------------|
| A | JWT tokens |
| B | Session cookies |
| C | OAuth2 |
**Context:** Admin panel needs login
**Escalated by:** beads/polecats/obsidian
**Escalated at:** 2026-01-01T15:00:00Z
**To resolve:**
1. Comment with chosen option (e.g., "Decision: A")
2. Reassign to work queue or original worker
```
The issue becomes the async communication channel. Resolution updates the issue
and can trigger re-slinging to the original worker.
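A handler resolving a decision might scan comments for the `Decision: <option>` convention. A minimal sketch, assuming that convention (the function and regex are illustrative, not gt's actual parser):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// decisionRe matches resolution comments like "Decision: A".
var decisionRe = regexp.MustCompile(`(?i)^decision:\s*([A-Za-z])\b`)

// parseDecision scans comment lines for the chosen option and returns
// it in canonical uppercase form.
func parseDecision(comment string) (string, bool) {
	for _, line := range strings.Split(comment, "\n") {
		if m := decisionRe.FindStringSubmatch(strings.TrimSpace(line)); m != nil {
			return strings.ToUpper(m[1]), true
		}
	}
	return "", false
}

func main() {
	opt, ok := parseDecision("Considered the tradeoffs.\nDecision: A")
	fmt.Println(opt, ok) // A true
}
```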
## Integration Points
### Gate Timeouts
When timer gates expire (see bd-7zka.2), Witness escalates:
```go
if gate.Expired() {
	exec.Command("gt", "escalate",
		"--type", "gate_timeout",
		"--severity", "HIGH",
		"--issue", gate.BlockedIssueID,
		fmt.Sprintf("Gate %s timed out after %s", gate.ID, gate.Timeout)).Run()
}
```
### Witness Patrol
Witness formalizes stuck-polecat detection as escalation:
```go
exec.Command("gt", "escalate",
	"--type", "lifecycle",
	"--to", "mayor",
	"--issue", polecat.CurrentIssue,
	fmt.Sprintf("Polecat %s stuck: no progress for %d minutes", polecat.ID, minutes)).Run()
```
### Refinery
On merge failures that can't be auto-resolved:
```go
exec.Command("gt", "escalate",
	"--type", "failed",
	"--issue", mr.IssueID,
	"Merge failed: "+reason).Run()
```
## Polecat Exit with Escalation
When a polecat needs a decision to continue:
```bash
# 1. Update issue with decision structure
bd update $ISSUE --notes "$(cat <<EOF
## Decision Needed
**Question:** Which approach for caching?
| Option | Description |
|--------|-------------|
| A | Redis (external dependency) |
| B | In-memory (simpler, no persistence) |
| C | SQLite (local persistence) |
**Context:** API response times are slow, need caching layer.
EOF
)"
# 2. Escalate
gt escalate --type decision --issue $ISSUE "Caching approach needs decision"
# 3. Exit cleanly
gt done --status ESCALATED
```
## Mayor Startup Check
On `gt prime`, Mayor checks for pending escalations:
```
## PENDING ESCALATIONS
There are 3 escalation(s) awaiting attention:
CRITICAL: 1
HIGH: 1
MEDIUM: 1
[CRITICAL] Data corruption detected (gt-abc)
[HIGH] Merge conflict in auth module (gt-def)
[MEDIUM] API design clarification needed (gt-ghi)
**Action required:** Review escalations with `bd list --tag=escalation`
Close resolved ones with `bd close <id> --reason "resolution"`
```
## When to Escalate
### Agents SHOULD escalate when:
- **System errors**: Database corruption, disk full, network failures
- **Security issues**: Unauthorized access attempts, credential exposure
- **Unresolvable conflicts**: Merge conflicts that can't be auto-resolved
- **Ambiguous requirements**: Spec is unclear, multiple valid interpretations
- **Design decisions**: Architectural choices that need human judgment
- **Stuck loops**: Agent is stuck and can't make progress
- **Gate timeouts**: Async conditions didn't resolve in expected time
### Agents should NOT escalate for:
- **Normal workflow**: Regular work that can proceed without human input
- **Recoverable errors**: Transient failures that will auto-retry
- **Information queries**: Questions that can be answered from context
## Viewing Escalations
```bash
# List all open escalations
bd list --status=open --tag=escalation
# Filter by category
bd list --tag=escalation --tag=decision
# View specific escalation
bd show <escalation-id>
# Close resolved escalation
bd close <id> --reason "Resolved by fixing X"
```
## Implementation Phases
### Phase 1: Extend gt escalate
- Add `--type` flag for categories
- Add `--to` flag for routing (deacon, mayor, overseer)
- Add `--forward` flag for tier forwarding
- Backward compatible with existing usage
### Phase 2: Decision Pattern
- Add `--question`, `--options`, `--context` flags
- Auto-update issue with decision structure
- Parse decision from issue comments on resolution
### Phase 3: Gate Integration
- Add `gate_timeout` escalation type
- Witness checks timer gates, escalates on timeout
- Refinery checks GH gates, escalates on timeout/failure
### Phase 4: Patrol Integration
- Formalize Witness stuck-polecat as escalation
- Formalize Refinery merge-failure as escalation
- Unified escalation handling in Mayor
## References
- bd-7zka.2: Gate evaluation (uses escalation for timeouts)
- bd-0sgd: Design issue for this extended escalation system

docs/design/federation.md Normal file

@@ -0,0 +1,248 @@
# Federation Architecture
> **Status: Design spec - not yet implemented**
> Multi-workspace coordination for Gas Town and Beads
## Overview
Federation enables multiple Gas Town instances to reference each other's work,
coordinate across organizations, and track distributed projects.
## Why Federation?
Real enterprise projects don't live in a single repo:
- **Microservices:** 50 repos, tight dependencies, coordinated releases
- **Platform teams:** Shared libraries used by dozens of downstream projects
- **Contractors:** External teams working on components you need to track
- **Acquisitions:** New codebases that need to integrate with existing work
Traditional tools force you to choose: unified tracking (monorepo) or team
autonomy (multi-repo with fragmented visibility). Federation provides both:
each workspace is autonomous, but cross-workspace references are first-class.
## Entity Model
### Three Levels
```
Level 1: Entity - Person or organization (flat namespace)
Level 2: Chain - Workspace/town per entity
Level 3: Work Unit - Issues, tasks, molecules on chains
```
### URI Scheme
Full work unit reference (HOP protocol):
```
hop://entity/chain/rig/issue-id
hop://steve@example.com/main-town/greenplace/gp-xyz
```
Cross-repo reference (same platform):
```
beads://platform/org/repo/issue-id
beads://github/acme/backend/ac-123
```
Within a workspace, short forms are preferred:
```
gp-xyz # Local (prefix routes via routes.jsonl)
greenplace/gp-xyz # Different rig, same chain
./gp-xyz # Explicit current-rig ref
```
See `~/gt/docs/hop/GRAPH-ARCHITECTURE.md` for the full URI specification.
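Splitting a full HOP URI into its four components is straightforward. A minimal sketch of the scheme as described above (`hopRef` and `parseHop` are illustrative names; the authoritative grammar is in GRAPH-ARCHITECTURE.md):

```go
package main

import (
	"fmt"
	"strings"
)

// hopRef is a parsed hop:// work-unit reference:
// hop://entity/chain/rig/issue-id
type hopRef struct {
	Entity, Chain, Rig, IssueID string
}

// parseHop splits a full HOP URI into its four components.
func parseHop(uri string) (hopRef, error) {
	rest, ok := strings.CutPrefix(uri, "hop://")
	if !ok {
		return hopRef{}, fmt.Errorf("not a hop URI: %s", uri)
	}
	parts := strings.Split(rest, "/")
	if len(parts) != 4 {
		return hopRef{}, fmt.Errorf("want entity/chain/rig/issue-id, got %q", rest)
	}
	return hopRef{parts[0], parts[1], parts[2], parts[3]}, nil
}

func main() {
	ref, err := parseHop("hop://steve@example.com/main-town/greenplace/gp-xyz")
	if err != nil {
		panic(err)
	}
	fmt.Println(ref.Rig, ref.IssueID) // greenplace gp-xyz
}
```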
## Relationship Types
### Employment
Track which entities belong to organizations:
```json
{
  "type": "employment",
  "entity": "alice@example.com",
  "organization": "acme.com"
}
```
### Cross-Reference
Reference work in another workspace:
```json
{
  "references": [
    {
      "type": "depends_on",
      "target": "hop://other-entity/chain/rig/issue-id"
    }
  ]
}
```
### Delegation
Distribute work across workspaces:
```json
{
  "type": "delegation",
  "parent": "hop://acme.com/projects/proj-123",
  "child": "hop://alice@example.com/town/greenplace/gp-xyz",
  "terms": { "portion": "backend", "deadline": "2025-02-01" }
}
```
## Agent Provenance
Every agent operation is attributed. See [identity.md](../concepts/identity.md) for the
complete BD_ACTOR format convention.
### Git Commits
```bash
# Set per agent session
GIT_AUTHOR_NAME="greenplace/crew/joe"
GIT_AUTHOR_EMAIL="steve@example.com" # Workspace owner
```
Result: `abc123 Fix bug (greenplace/crew/joe <steve@example.com>)`
### Beads Operations
```bash
BD_ACTOR="greenplace/crew/joe" # Set in agent environment
bd create --title="Task" # Actor auto-populated
```
### Event Logging
All events include actor:
```json
{
  "ts": "2025-01-15T10:30:00Z",
  "type": "sling",
  "actor": "greenplace/crew/joe",
  "payload": { "bead": "gp-xyz", "target": "greenplace/polecats/Toast" }
}
```
## Discovery
### Workspace Metadata
Each workspace has identity metadata:
```json
// ~/gt/.town.json
{
  "owner": "steve@example.com",
  "name": "main-town",
  "public_name": "steve-greenplace"
}
```
### Remote Registration
```bash
gt remote add acme hop://acme.com/engineering
gt remote list
```
### Cross-Workspace Queries
```bash
bd show hop://acme.com/eng/ac-123 # Fetch remote issue
bd list --remote=acme # List remote issues
```
## Aggregation
Query across relationships without hierarchy:
```bash
# All work by org members
bd list --org=acme.com
# All work on a project (including delegated)
bd list --project=proj-123 --include-delegated
# Agent's full history
bd audit --actor=greenplace/crew/joe
```
## Implementation Status
- [x] Agent identity in git commits
- [x] BD_ACTOR default in beads create
- [x] Workspace metadata file (.town.json)
- [x] Cross-workspace URI scheme (hop://, beads://, local forms)
- [ ] Remote registration
- [ ] Cross-workspace queries
- [ ] Delegation primitives
## Use Cases
### Multi-Repo Projects
Track work spanning multiple repositories:
```
Project X
├── hop://team/frontend/fe-123
├── hop://team/backend/be-456
└── hop://team/infra/inf-789
```
### Distributed Teams
Team members in different workspaces:
```
Alice's Town → works on → Project X
Bob's Town → works on → Project X
```
Each maintains their own CV/audit trail.
### Contractor Coordination
Prime contractor delegates to subcontractors:
```
Acme/Project
└── delegates to → Vendor/SubProject
└── delegates to → Contractor/Task
```
Completion cascades up. Attribution preserved.
## Design Principles
1. **Flat namespace** - Entities not nested, relationships connect them
2. **Relationships over hierarchy** - Graph structure, not tree
3. **Git-native** - Federation uses git mechanics (remotes, refs)
4. **Incremental** - Works standalone, gains power with federation
5. **Privacy-preserving** - Each entity controls their chain visibility
## Enterprise Benefits
| Challenge | Without Federation | With Federation |
|-----------|-------------------|-----------------|
| Cross-repo dependencies | "Check with backend team" | Explicit dependency tracking |
| Contractor visibility | Email updates, status calls | Live status, same tooling |
| Release coordination | Spreadsheets, Slack threads | Unified timeline view |
| Agent attribution | Per-repo, fragmented | Cross-workspace CV |
| Compliance audit | Stitch together logs | Query across workspaces |
Federation isn't just about connecting repos - it's about treating distributed
engineering as a first-class concern, with the same visibility and tooling
you'd expect from a monorepo, while preserving team autonomy.


@@ -0,0 +1,361 @@
# Gas Town Mail Protocol
> Reference for inter-agent mail communication in Gas Town
## Overview
Gas Town agents coordinate via mail messages routed through the beads system.
Mail uses `type=message` beads with routing handled by `gt mail`.
## Message Types
### POLECAT_DONE
**Route**: Polecat → Witness
**Purpose**: Signal work completion, trigger cleanup flow.
**Subject format**: `POLECAT_DONE <polecat-name>`
**Body format**:
```
Exit: MERGED|ESCALATED|DEFERRED
Issue: <issue-id>
MR: <mr-id> # if exit=MERGED
Branch: <branch>
```
**Trigger**: `gt done` command generates this automatically.
**Handler**: Witness creates a cleanup wisp for the polecat.
### MERGE_READY
**Route**: Witness → Refinery
**Purpose**: Signal a branch is ready for merge queue processing.
**Subject format**: `MERGE_READY <polecat-name>`
**Body format**:
```
Branch: <branch>
Issue: <issue-id>
Polecat: <polecat-name>
Verified: clean git state, issue closed
```
**Trigger**: Witness sends after verifying polecat work is complete.
**Handler**: Refinery adds to merge queue, processes when ready.
### MERGED
**Route**: Refinery → Witness
**Purpose**: Confirm branch was merged successfully, safe to nuke polecat.
**Subject format**: `MERGED <polecat-name>`
**Body format**:
```
Branch: <branch>
Issue: <issue-id>
Polecat: <polecat-name>
Rig: <rig>
Target: <target-branch>
Merged-At: <timestamp>
Merge-Commit: <sha>
```
**Trigger**: Refinery sends after successful merge to main.
**Handler**: Witness completes cleanup wisp, nukes polecat worktree.
### MERGE_FAILED
**Route**: Refinery → Witness
**Purpose**: Notify that merge attempt failed (tests, build, or other non-conflict error).
**Subject format**: `MERGE_FAILED <polecat-name>`
**Body format**:
```
Branch: <branch>
Issue: <issue-id>
Polecat: <polecat-name>
Rig: <rig>
Target: <target-branch>
Failed-At: <timestamp>
Failure-Type: <tests|build|push|other>
Error: <error-message>
```
**Trigger**: Refinery sends when merge fails for non-conflict reasons.
**Handler**: Witness notifies polecat, assigns work back for rework.
### REWORK_REQUEST
**Route**: Refinery → Witness
**Purpose**: Request polecat to rebase branch due to merge conflicts.
**Subject format**: `REWORK_REQUEST <polecat-name>`
**Body format**:
```
Branch: <branch>
Issue: <issue-id>
Polecat: <polecat-name>
Rig: <rig>
Target: <target-branch>
Requested-At: <timestamp>
Conflict-Files: <file1>, <file2>, ...
Please rebase your changes onto <target-branch>:
git fetch origin
git rebase origin/<target-branch>
# Resolve any conflicts
git push -f
The Refinery will retry the merge after rebase is complete.
```
**Trigger**: Refinery sends when merge has conflicts with target branch.
**Handler**: Witness notifies polecat with rebase instructions.
### WITNESS_PING
**Route**: Witness → Deacon (all witnesses send)
**Purpose**: Second-order monitoring - ensure Deacon is alive.
**Subject format**: `WITNESS_PING <rig>`
**Body format**:
```
Rig: <rig>
Timestamp: <timestamp>
Patrol: <cycle-number>
```
**Trigger**: Each witness sends periodically (every N patrol cycles).
**Handler**: Deacon acknowledges. If no ack, witnesses escalate to Mayor.
### HELP
**Route**: Any → escalation target (usually Mayor)
**Purpose**: Request intervention for stuck/blocked work.
**Subject format**: `HELP: <brief-description>`
**Body format**:
```
Agent: <agent-id>
Issue: <issue-id> # if applicable
Problem: <description>
Tried: <what was attempted>
```
**Trigger**: Agent unable to proceed, needs external help.
**Handler**: Escalation target assesses and intervenes.
### HANDOFF
**Route**: Agent → self (or successor)
**Purpose**: Session continuity across context limits/restarts.
**Subject format**: `🤝 HANDOFF: <brief-context>`
**Body format**:
```
attached_molecule: <molecule-id> # if work in progress
attached_at: <timestamp>
## Context
<freeform notes for successor>
## Status
<where things stand>
## Next
<what successor should do>
```
**Trigger**: `gt handoff` command, or manual send before session end.
**Handler**: Next session reads handoff, continues from context.
## Format Conventions
### Subject Line
- **Type prefix**: Uppercase, identifies message type
- **Colon separator**: After type for structured info
- **Brief context**: Human-readable summary
Examples:
```
POLECAT_DONE nux
MERGE_READY greenplace/nux
HELP: Polecat stuck on test failures
🤝 HANDOFF: Schema work in progress
```
### Body Structure
- **Key-value pairs**: For structured data (one per line)
- **Blank line**: Separates structured data from freeform content
- **Markdown sections**: For freeform content (##, lists, code blocks)
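These conventions make bodies easy to parse mechanically. A minimal sketch, assuming the blank-line separator described above (the real handlers live in patrol formulas and `internal/mail/`):

```go
package main

import (
	"fmt"
	"strings"
)

// parseBody splits a mail body into its leading key-value pairs and the
// freeform remainder. The structured section ends at the first blank
// line or the first line that is not "Key: value".
func parseBody(body string) (map[string]string, string) {
	fields := map[string]string{}
	lines := strings.Split(body, "\n")
	i := 0
	for ; i < len(lines); i++ {
		line := strings.TrimSpace(lines[i])
		if line == "" {
			i++ // blank line ends the structured section
			break
		}
		key, val, ok := strings.Cut(line, ":")
		if !ok {
			break // not key-value; freeform starts here
		}
		fields[strings.TrimSpace(key)] = strings.TrimSpace(val)
	}
	return fields, strings.Join(lines[i:], "\n")
}

func main() {
	fields, _ := parseBody("Branch: feature-xyz\nIssue: gp-abc\n\n## Context\nnotes")
	fmt.Println(fields["Issue"]) // gp-abc
}
```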
### Addresses
Format: `<rig>/<role>` or `<rig>/<type>/<name>`
Examples:
```
greenplace/witness # Witness for greenplace rig
beads/refinery # Refinery for beads rig
greenplace/polecats/nux # Specific polecat
mayor/ # Town-level Mayor
deacon/ # Town-level Deacon
```
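Decomposing an address follows directly from the format. A sketch (the `addr` type is illustrative; town-level addresses like `mayor/` have an empty rig):

```go
package main

import (
	"fmt"
	"strings"
)

// addr decomposes a Gas Town mail address of the form <rig>/<role> or
// <rig>/<type>/<name>. Role holds the role or worker type.
type addr struct {
	Rig, Role, Name string
}

func parseAddr(s string) addr {
	parts := strings.Split(strings.TrimSuffix(s, "/"), "/")
	switch len(parts) {
	case 1: // "mayor/", "deacon/" - town-level role
		return addr{Role: parts[0]}
	case 2: // "greenplace/witness"
		return addr{Rig: parts[0], Role: parts[1]}
	default: // "greenplace/polecats/nux"
		return addr{Rig: parts[0], Role: parts[1], Name: parts[2]}
	}
}

func main() {
	fmt.Println(parseAddr("greenplace/polecats/nux").Name) // nux
}
```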
## Protocol Flows
### Polecat Completion Flow
```
Polecat                    Witness                    Refinery
   │                          │                          │
   │       POLECAT_DONE       │                          │
   │─────────────────────────>│                          │
   │                          │                          │
   │                   (verify clean)                    │
   │                          │                          │
   │                          │       MERGE_READY        │
   │                          │─────────────────────────>│
   │                          │                          │
   │                          │          (merge attempt)
   │                          │                          │
   │                          │     MERGED (success)     │
   │                          │<─────────────────────────│
   │                          │                          │
   │                   (nuke polecat)                    │
   │                          │                          │
```
### Merge Failure Flow
```
Polecat                    Witness                    Refinery
   │                          │                          │
   │                          │      (merge fails)       │
   │                          │                          │
   │                          │       MERGE_FAILED       │
   │                          │<─────────────────────────│
   │                          │                          │
   │  (failure notification)  │                          │
   │<─────────────────────────│                          │
   │                          │                          │
(rework needed)
```
### Rebase Required Flow
```
Polecat                    Witness                    Refinery
   │                          │                          │
   │                          │   (conflict detected)    │
   │                          │                          │
   │                          │      REWORK_REQUEST      │
   │                          │<─────────────────────────│
   │                          │                          │
   │  (rebase instructions)   │                          │
   │<─────────────────────────│                          │
   │                          │                          │
   │    (rebases, gt done)    │                          │
   │─────────────────────────>│       MERGE_READY        │
   │                          │─────────────────────────>│
   │                          │                          │
   │                          │            (retry merge)
```
### Second-Order Monitoring
```
Witness-1 ──┐
            │  WITNESS_PING
Witness-2 ──┼────────────────> Deacon
            │                    │
Witness-N ──┘                    │
                                 │
        (if no response)         │
   <─────────────────────────────┘
   Escalate to Mayor
```
## Implementation
### Sending Mail
```bash
# Basic send
gt mail send <addr> -s "Subject" -m "Body"
# With structured body
gt mail send greenplace/witness -s "MERGE_READY nux" -m "Branch: feature-xyz
Issue: gp-abc
Polecat: nux
Verified: clean"
```
### Receiving Mail
```bash
# Check inbox
gt mail inbox
# Read specific message
gt mail read <msg-id>
# Mark as read
gt mail ack <msg-id>
```
### In Patrol Formulas
Formulas should:
1. Check inbox at start of each cycle
2. Parse subject prefix to route handling
3. Extract structured data from body
4. Take appropriate action
5. Mark mail as read after processing
## Extensibility
New message types follow the pattern:
1. Define subject prefix (TYPE: or TYPE_SUBTYPE)
2. Document body format (key-value pairs + freeform)
3. Specify route (sender → receiver)
4. Implement handlers in relevant patrol formulas
The protocol is intentionally simple - structured enough for parsing,
flexible enough for human debugging.
## Related Documents
- `docs/agent-as-bead.md` - Agent identity and slots
- `.beads/formulas/mol-witness-patrol.formula.toml` - Witness handling
- `internal/mail/` - Mail routing implementation
- `internal/protocol/` - Protocol handlers for Witness-Refinery communication


@@ -0,0 +1,136 @@
# Operational State in Gas Town
> Managing runtime state through events and labels.
## Overview
Gas Town tracks operational state changes as structured data. This document covers:
- **Events**: State transitions as beads (immutable audit trail)
- **Labels-as-state**: Fast queries via role bead labels (current state cache)
For Boot triage and degraded mode details, see [Watchdog Chain](watchdog-chain.md).
## Events: State Transitions as Data
Operational state changes are recorded as event beads. Each event captures:
- **What** changed (`event_type`)
- **Who** caused it (`actor`)
- **What** was affected (`target`)
- **Context** (`payload`)
- **When** (`created_at`)
### Event Types
| Event Type | Description | Payload |
|------------|-------------|---------|
| `patrol.muted` | Patrol cycle disabled | `{reason, until?}` |
| `patrol.unmuted` | Patrol cycle re-enabled | `{reason?}` |
| `agent.started` | Agent session began | `{session_id?}` |
| `agent.stopped` | Agent session ended | `{reason, outcome?}` |
| `mode.degraded` | System entered degraded mode | `{reason}` |
| `mode.normal` | System returned to normal | `{}` |
### Creating Events
```bash
# Mute deacon patrol
bd create --type=event --event-type=patrol.muted \
--actor=human:overseer --target=agent:deacon \
--payload='{"reason":"fixing convoy deadlock","until":"gt-abc1"}'
# System entered degraded mode
bd create --type=event --event-type=mode.degraded \
--actor=system:daemon --target=rig:greenplace \
--payload='{"reason":"tmux unavailable"}'
```
### Querying Events
```bash
# Recent events for an agent
bd list --type=event --target=agent:deacon --limit=10
# All patrol state changes
bd list --type=event --event-type=patrol.muted
bd list --type=event --event-type=patrol.unmuted
# Events in the activity feed
bd activity --follow --type=event
```
## Labels-as-State Pattern
Events capture the full history. Labels cache the current state for fast queries.
### Convention
Labels use `<dimension>:<value>` format:
- `patrol:muted` / `patrol:active`
- `mode:degraded` / `mode:normal`
- `status:idle` / `status:working`
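Reading a dimension's current value from a role bead's labels is a prefix match. A sketch of the convention (`stateOf` is an illustrative helper, not part of bd):

```go
package main

import (
	"fmt"
	"strings"
)

// stateOf extracts the current value of a state dimension from a role
// bead's labels, per the <dimension>:<value> convention.
func stateOf(labels []string, dimension string) (string, bool) {
	for _, l := range labels {
		if v, ok := strings.CutPrefix(l, dimension+":"); ok {
			return v, true
		}
	}
	return "", false
}

func main() {
	labels := []string{"patrol:muted", "mode:normal"}
	v, _ := stateOf(labels, "patrol")
	fmt.Println(v) // muted
}
```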
### State Change Flow
1. Create event bead (full context, immutable)
2. Update role bead labels (current state cache)
```bash
# Mute patrol
bd create --type=event --event-type=patrol.muted ...
bd update role-deacon --add-label=patrol:muted --remove-label=patrol:active
# Unmute patrol
bd create --type=event --event-type=patrol.unmuted ...
bd update role-deacon --add-label=patrol:active --remove-label=patrol:muted
```
### Querying Current State
```bash
# Is deacon patrol muted?
bd show role-deacon | grep patrol:
# All agents with muted patrol
bd list --type=role --label=patrol:muted
# All agents in degraded mode
bd list --type=role --label=mode:degraded
```
## Configuration vs State
| Type | Storage | Example |
|------|---------|---------|
| **Static config** | TOML files | Daemon tick interval |
| **Operational state** | Beads (events + labels) | Patrol muted |
| **Runtime flags** | Marker files | `.deacon-disabled` |
Static config rarely changes and doesn't need history.
Operational state changes at runtime and benefits from audit trail.
Marker files are fast checks that can trigger deeper beads queries.
## Commands Summary
```bash
# Create operational event
bd create --type=event --event-type=<type> \
--actor=<entity> --target=<entity> --payload='<json>'
# Update state label
bd update <role-bead> --add-label=<dim>:<val> --remove-label=<dim>:<old>
# Query current state
bd list --type=role --label=<dim>:<val>
# Query state history
bd list --type=event --target=<entity>
# Boot management
gt dog status boot
gt dog call boot
gt dog prime boot
```
---
*Events are the source of truth. Labels are the cache.*


@@ -0,0 +1,300 @@
# Property Layers: Multi-Level Configuration
> Implementation guide for Gas Town's configuration system.
> Created: 2025-01-06
## Overview
Gas Town uses a layered property system for configuration. Properties are
looked up through multiple layers, with earlier layers overriding later ones.
This enables both local control and global coordination.
## The Four Layers
```
┌─────────────────────────────────────────────────────────────┐
│ 1. WISP LAYER (transient, town-local)                       │
│    Location: <rig>/.beads-wisp/config/                      │
│    Synced:   Never                                          │
│    Use:      Temporary local overrides                      │
└──────────────────────────────┬──────────────────────────────┘
                               │ if missing
┌──────────────────────────────┴──────────────────────────────┐
│ 2. RIG BEAD LAYER (persistent, synced globally)             │
│    Location: <rig>/.beads/ (rig identity bead labels)       │
│    Synced:   Via git (all clones see it)                    │
│    Use:      Project-wide operational state                 │
└──────────────────────────────┬──────────────────────────────┘
                               │ if missing
┌──────────────────────────────┴──────────────────────────────┐
│ 3. TOWN DEFAULTS                                            │
│    Location: ~/gt/config.json or ~/gt/.beads/               │
│    Synced:   N/A (per-town)                                 │
│    Use:      Town-wide policies                             │
└──────────────────────────────┬──────────────────────────────┘
                               │ if missing
┌──────────────────────────────┴──────────────────────────────┐
│ 4. SYSTEM DEFAULTS (compiled in)                            │
│    Use:      Fallback when nothing else specified           │
└─────────────────────────────────────────────────────────────┘
```
## Lookup Behavior
### Override Semantics (Default)
For most properties, the first non-nil value wins:
```go
func GetConfig(key string) interface{} {
	if val := wisp.Get(key); val != nil {
		if val == Blocked {
			return nil
		}
		return val
	}
	if val := rigBead.GetLabel(key); val != nil {
		return val
	}
	if val := townDefaults.Get(key); val != nil {
		return val
	}
	return systemDefaults[key]
}
```
### Stacking Semantics (Integers)
For integer properties, values from wisp and bead layers **add** to the base:
```go
func GetIntConfig(key string) int {
	base := getBaseDefault(key)    // Town or system default
	beadAdj := rigBead.GetInt(key) // 0 if missing
	wispAdj := wisp.GetInt(key)    // 0 if missing
	return base + beadAdj + wispAdj
}
```
This enables temporary adjustments without changing the base value.
### Blocking Inheritance
You can explicitly block a property from being inherited:
```bash
gt rig config set gastown auto_restart --block
```
This creates a "blocked" marker in the wisp layer. Even if the rig bead
or defaults say `auto_restart: true`, the lookup returns nil.
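A minimal sketch of how a blocked marker can short-circuit the lookup chain. The `value` struct and `lookup` function are illustrative, not the actual gt implementation; the point is that "explicitly blocked" must be distinguishable from "not set":

```go
package main

import "fmt"

// value distinguishes three states: not set, set to a value, or
// explicitly blocked (the --block marker).
type value struct {
	set     bool
	blocked bool
	val     string
}

// lookup walks the layers: wisp, then rig bead, then defaults.
// A blocked wisp entry short-circuits the chain and returns "".
func lookup(wisp, bead, def value) string {
	if wisp.set {
		if wisp.blocked {
			return "" // inheritance blocked: ignore lower layers
		}
		return wisp.val
	}
	if bead.set {
		return bead.val
	}
	if def.set {
		return def.val
	}
	return ""
}

func main() {
	// Rig bead says auto_restart=true, but the wisp layer blocks it.
	bead := value{set: true, val: "true"}
	blockedWisp := value{set: true, blocked: true}
	fmt.Println(lookup(blockedWisp, bead, value{})) // prints an empty line
	fmt.Println(lookup(value{}, bead, value{}))     // prints "true"
}
```

Without the sentinel, an absent wisp entry and a blocked one would both read as nil, and `--block` could not override a true value from the bead layer.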
## Rig Identity Beads
Each rig has an identity bead for operational state:
```yaml
id: gt-rig-gastown
type: rig
name: gastown
repo: git@github.com:steveyegge/gastown.git
prefix: gt
labels:
- status:operational
- priority:normal
```
These beads sync via git, so all clones of the rig see the same state.
## Two-Level Rig Control
### Level 1: Park (Local, Ephemeral)
```bash
gt rig park gastown # Stop services, daemon won't restart
gt rig unpark gastown # Allow services to run
```
- Stored in wisp layer (`.beads-wisp/config/`)
- Only affects this town
- Disappears on cleanup
- Use: Local maintenance, debugging
### Level 2: Dock (Global, Persistent)
```bash
gt rig dock gastown # Set status:docked label on rig bead
gt rig undock gastown # Remove label
```
- Stored on rig identity bead
- Syncs to all clones via git
- Permanent until explicitly changed
- Use: Project-wide maintenance, coordinated downtime
### Daemon Behavior
The daemon checks both levels before auto-restarting:
```go
func shouldAutoRestart(rig *Rig) bool {
	status := rig.GetConfig("status")
	if status == "parked" || status == "docked" {
		return false
	}
	return true
}
```
## Configuration Keys
| Key | Type | Behavior | Description |
|-----|------|----------|-------------|
| `status` | string | Override | operational/parked/docked |
| `auto_restart` | bool | Override | Daemon auto-restart behavior |
| `max_polecats` | int | Override | Maximum concurrent polecats |
| `priority_adjustment` | int | **Stack** | Scheduling priority modifier |
| `maintenance_window` | string | Override | When maintenance allowed |
| `dnd` | bool | Override | Do not disturb mode |
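The table above can be modeled as a per-key semantics dispatch. This is a sketch under assumptions: `effectiveInt` and the nil-pointer convention for "layer did not set this key" are illustrative, not the gt codebase's actual types.

```go
package main

import "fmt"

// stackKeys lists the integer properties that add across layers
// (per the table above); all other keys use override semantics.
var stackKeys = map[string]bool{"priority_adjustment": true}

// effectiveInt resolves an integer key: stacked keys sum
// base + bead + wisp; override keys take the first layer that
// sets a value (wisp, then bead, then base).
func effectiveInt(key string, base int, bead, wisp *int) int {
	if stackKeys[key] {
		total := base
		if bead != nil {
			total += *bead
		}
		if wisp != nil {
			total += *wisp
		}
		return total
	}
	if wisp != nil {
		return *wisp
	}
	if bead != nil {
		return *bead
	}
	return base
}

func main() {
	adj := 10
	fmt.Println(effectiveInt("priority_adjustment", 0, nil, &adj)) // 10 (stacked)
	fmt.Println(effectiveInt("max_polecats", 4, nil, &adj))        // 10 (overridden)
}
```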
## Commands
### View Configuration
```bash
gt rig config show gastown # Show effective config (all layers)
gt rig config show gastown --layer # Show which layer each value comes from
```
### Set Configuration
```bash
# Set in wisp layer (local, ephemeral)
gt rig config set gastown key value
# Set in bead layer (global, permanent)
gt rig config set gastown key value --global
# Block inheritance
gt rig config set gastown key --block
# Clear from wisp layer
gt rig config unset gastown key
```
### Rig Lifecycle
```bash
gt rig park gastown # Local: stop + prevent restart
gt rig unpark gastown # Local: allow restart
gt rig dock gastown # Global: mark as offline
gt rig undock gastown # Global: mark as operational
gt rig status gastown # Show current state
```
## Examples
### Temporary Priority Boost
```bash
# Base priority: 0 (from defaults)
# Give this rig temporary priority boost for urgent work
gt rig config set gastown priority_adjustment 10
# Effective priority: 0 + 10 = 10
# When done, clear it:
gt rig config unset gastown priority_adjustment
```
### Local Maintenance
```bash
# I'm upgrading the local clone, don't restart services
gt rig park gastown
# ... do maintenance ...
gt rig unpark gastown
```
### Project-Wide Maintenance
```bash
# Major refactor in progress, all clones should pause
gt rig dock gastown
# Syncs via git - other towns see the rig as docked
bd sync
# When done:
gt rig undock gastown
bd sync
```
### Block Auto-Restart Locally
```bash
# Rig bead says auto_restart: true
# But I'm debugging and don't want that here
gt rig config set gastown auto_restart --block
# Now auto_restart returns nil for this town only
```
## Implementation Notes
### Wisp Storage
Wisp config stored in `.beads-wisp/config/<rig>.json`:
```json
{
  "rig": "gastown",
  "values": {
    "status": "parked",
    "priority_adjustment": 10
  },
  "blocked": ["auto_restart"]
}
```
### Rig Bead Labels
Rig operational state stored as labels on the rig identity bead:
```bash
bd label add gt-rig-gastown status:docked
bd label remove gt-rig-gastown status:docked
```
### Daemon Integration
The daemon's lifecycle manager checks config before starting services:
```go
func (d *Daemon) maybeStartRigServices(rig string) {
	r := d.getRig(rig)
	status := r.GetConfig("status")
	if status == "parked" || status == "docked" {
		log.Info("Rig %s is offline, skipping auto-start", rig)
		return
	}
	d.ensureWitness(rig)
	d.ensureRefinery(rig)
}
```
## Related Documents
- `~/gt/docs/hop/PROPERTY-LAYERS.md` - Strategic architecture
- `wisp-architecture.md` - Wisp system design
- `agent-as-bead.md` - Agent identity beads (similar pattern)

# Daemon/Boot/Deacon Watchdog Chain
> Autonomous health monitoring and recovery in Gas Town.
## Overview
Gas Town uses a three-tier watchdog chain for autonomous health monitoring:
```
Daemon (Go process)            ← Dumb transport, 3-min heartbeat
└─► Boot (AI agent)            ← Intelligent triage, fresh each tick
    └─► Deacon (AI agent)      ← Continuous patrol, long-running
        └─► Witnesses & Refineries  ← Per-rig agents
```
**Key insight**: The daemon is mechanical (can't reason), but health decisions need
intelligence (is the agent stuck or just thinking?). Boot bridges this gap.
## Design Rationale: Why Two Agents?
### The Problem
The daemon needs to ensure the Deacon is healthy, but:
1. **Daemon can't reason** - It's Go code following the ZFC principle (don't reason
about other agents). It can check "is session alive?" but not "is agent stuck?"
2. **Waking costs context** - Each time you spawn an AI agent, you consume context
tokens. In idle towns, waking Deacon every 3 minutes wastes resources.
3. **Observation requires intelligence** - Distinguishing "agent composing large
artifact" from "agent hung on tool prompt" requires reasoning.
### The Solution: Boot as Triage
Boot is a narrow, ephemeral AI agent that:
- Runs fresh each daemon tick (no accumulated context debt)
- Makes a single decision: should Deacon wake?
- Exits immediately after deciding
This gives us intelligent triage without the cost of keeping a full AI running.
### Why Not Merge Boot into Deacon?
We could have Deacon handle its own "should I be awake?" logic, but:
1. **Deacon can't observe itself** - A hung Deacon can't detect it's hung
2. **Context accumulation** - Deacon runs continuously; Boot restarts fresh
3. **Cost in idle towns** - Boot only costs tokens when it runs; Deacon costs
tokens constantly if kept alive
### Why Not Replace with Go Code?
The daemon could directly monitor agents without AI, but:
1. **Can't observe panes** - Go code can't interpret tmux output semantically
2. **Can't distinguish stuck vs working** - No reasoning about agent state
3. **Escalation is complex** - When to notify? When to force-restart? AI handles
nuanced decisions better than hardcoded thresholds
## Session Ownership
| Agent | Session Name | Location | Lifecycle |
|-------|--------------|----------|-----------|
| Daemon | (Go process) | `~/gt/daemon/` | Persistent, auto-restart |
| Boot | `gt-boot` | `~/gt/deacon/dogs/boot/` | Ephemeral, fresh each tick |
| Deacon | `hq-deacon` | `~/gt/deacon/` | Long-running, handoff loop |
**Critical**: Boot runs in `gt-boot`, NOT `hq-deacon`. This prevents Boot
from conflicting with a running Deacon session.
## Heartbeat Mechanics
### Daemon Heartbeat (3 minutes)
The daemon runs a heartbeat tick every 3 minutes:
```go
func (d *Daemon) heartbeatTick() {
	d.ensureBootRunning()        // 1. Spawn Boot for triage
	d.checkDeaconHeartbeat()     // 2. Belt-and-suspenders fallback
	d.ensureWitnessesRunning()   // 3. Witness health (checks tmux directly)
	d.ensureRefineriesRunning()  // 4. Refinery health (checks tmux directly)
	d.triggerPendingSpawns()     // 5. Bootstrap polecats
	d.processLifecycleRequests() // 6. Cycle/restart requests
	// Agent state derived from tmux, not recorded in beads (gt-zecmc)
}
```
### Deacon Heartbeat (continuous)
The Deacon updates `~/gt/deacon/heartbeat.json` at the start of each patrol cycle:
```json
{
  "timestamp": "2026-01-02T18:30:00Z",
  "cycle": 42,
  "last_action": "health-scan",
  "healthy_agents": 3,
  "unhealthy_agents": 0
}
```
### Heartbeat Freshness
| Age | State | Boot Action |
|-----|-------|-------------|
| < 5 min | Fresh | Nothing (Deacon active) |
| 5-15 min | Stale | Nudge if pending mail |
| > 15 min | Very stale | Wake (Deacon may be stuck) |
## Boot Decision Matrix
When Boot runs, it observes:
- Is Deacon session alive?
- How old is Deacon's heartbeat?
- Is there pending mail for Deacon?
- What's in Deacon's tmux pane?
Then decides:
| Condition | Action | Command |
|-----------|--------|---------|
| Session dead | START | Exit; daemon calls `ensureDeaconRunning()` |
| Heartbeat > 15 min | WAKE | `gt nudge deacon "Boot wake: check your inbox"` |
| Heartbeat 5-15 min + mail | NUDGE | `gt nudge deacon "Boot check-in: pending work"` |
| Heartbeat fresh | NOTHING | Exit silently |
## Handoff Flow
### Deacon Handoff
The Deacon runs continuous patrol cycles. After N cycles or high context:
```
End of patrol cycle:
├─ Squash wisp to digest (ephemeral → permanent)
├─ Write summary to molecule state
└─ gt handoff -s "Routine cycle" -m "Details"
    └─ Creates mail for next session
```
Next daemon tick:
```
Daemon → ensureDeaconRunning()
└─ Spawns fresh Deacon in hq-deacon
    └─ SessionStart hook: gt mail check --inject
        └─ Previous handoff mail injected
            └─ Deacon reads and continues
```
### Boot Handoff (Rare)
Boot is ephemeral - it exits after each tick. No persistent handoff needed.
However, Boot uses a marker file to prevent double-spawning:
- Marker: `~/gt/deacon/dogs/boot/.boot-running` (TTL: 5 minutes)
- Status: `~/gt/deacon/dogs/boot/.boot-status.json` (last action/result)
If the marker exists and is recent, the daemon skips the Boot spawn for that tick.
## Degraded Mode
When tmux is unavailable, Gas Town enters degraded mode:
| Capability | Normal | Degraded |
|------------|--------|----------|
| Boot runs | As AI in tmux | As Go code (mechanical) |
| Observe panes | Yes | No |
| Nudge agents | Yes | No |
| Start agents | tmux sessions | Direct spawn |
Degraded Boot triage is purely mechanical:
- Session dead → start
- Heartbeat stale → restart
- No reasoning, just thresholds
## Fallback Chain
Multiple layers ensure recovery:
1. **Boot triage** - Intelligent observation, first line
2. **Daemon checkDeaconHeartbeat()** - Belt-and-suspenders if Boot fails
3. **Tmux-based discovery** - Daemon checks tmux sessions directly (no bead state)
4. **Human escalation** - Mail to overseer for unrecoverable states
## State Files
| File | Purpose | Updated By |
|------|---------|-----------|
| `deacon/heartbeat.json` | Deacon freshness | Deacon (each cycle) |
| `deacon/dogs/boot/.boot-running` | Boot in-progress marker | Boot spawn |
| `deacon/dogs/boot/.boot-status.json` | Boot last action | Boot triage |
| `deacon/health-check-state.json` | Agent health tracking | `gt deacon health-check` |
| `daemon/daemon.log` | Daemon activity | Daemon |
| `daemon/daemon.pid` | Daemon process ID | Daemon startup |
## Debugging
```bash
# Check Deacon heartbeat
cat ~/gt/deacon/heartbeat.json | jq .
# Check Boot status
cat ~/gt/deacon/dogs/boot/.boot-status.json | jq .
# View daemon log
tail -f ~/gt/daemon/daemon.log
# Manual Boot run
gt boot triage
# Manual Deacon health check
gt deacon health-check
```
## Common Issues
### Boot Spawns in Wrong Session
**Symptom**: Boot runs in `hq-deacon` instead of `gt-boot`
**Cause**: Session name confusion in spawn code
**Fix**: Ensure `gt boot triage` specifies `--session=gt-boot`
### Zombie Sessions Block Restart
**Symptom**: tmux session exists but Claude is dead
**Cause**: Daemon checks session existence, not process health
**Fix**: Kill zombie sessions before recreating: `gt session kill hq-deacon`
### Status Shows Wrong State
**Symptom**: `gt status` shows wrong state for agents
**Cause**: Previously, bead state and tmux state could diverge
**Fix**: As of gt-zecmc, status derives state from tmux directly (no bead state for
observable conditions like running/stopped). Non-observable states (stuck, awaiting-gate)
are still stored in beads.
## Design Decision: Keep Separation
Issue gt-1847v considered three options:
### Option A: Keep Boot/Deacon Separation (CHOSEN)
- Boot is ephemeral, spawns fresh each heartbeat
- Boot runs in `gt-boot`, exits after triage
- Deacon runs in `hq-deacon`, continuous patrol
- Clear session boundaries, clear lifecycle
**Verdict**: This is the correct design. The implementation needs fixing, not the architecture.
### Option B: Merge Boot into Deacon (Rejected)
- Single `hq-deacon` session handles everything
- Deacon checks "should I be awake?" internally
**Why rejected**:
- Deacon can't observe itself (hung Deacon can't detect hang)
- Context accumulates even when idle (cost in quiet towns)
- No external watchdog means no recovery from Deacon failure
### Option C: Replace with Go Watchdog (Rejected)
- Daemon directly monitors witness/refinery
- No Boot, no Deacon AI for health checks
- AI agents only for complex decisions
**Why rejected**:
- Go code can't interpret tmux pane output semantically
- Can't distinguish "stuck" from "thinking deeply"
- Loses the intelligent triage that makes the system resilient
- Escalation decisions are nuanced (when to notify? force-restart?)
### Implementation Fixes Needed
The separation is correct; these bugs need fixing:
1. **Session confusion** (gt-sgzsb): Boot spawns in wrong session
2. **Zombie blocking** (gt-j1i0r): Daemon can't kill zombie sessions
3. ~~**Status mismatch** (gt-doih4): Bead vs tmux state divergence~~ → FIXED in gt-zecmc
4. **Ensure semantics** (gt-ekc5u): Start should kill zombies first
## Summary
The watchdog chain provides autonomous recovery:
- **Daemon**: Mechanical heartbeat, spawns Boot
- **Boot**: Intelligent triage, decides Deacon fate
- **Deacon**: Continuous patrol, monitors workers
Boot exists because the daemon can't reason and Deacon can't observe itself.
The separation costs complexity but enables:
1. **Intelligent triage** without constant AI cost
2. **Fresh context** for each triage decision
3. **Graceful degradation** when tmux unavailable
4. **Multiple fallback** layers for reliability