# Gas Town Architecture

Gas Town is a multi-agent workspace manager that coordinates AI coding agents working on software projects. It provides the infrastructure for spawning workers, processing work through a priority queue, and coordinating agents through mail and issue tracking.

**Key insight**: Work is a stream, not discrete batches. The Refinery's merge queue is the coordination mechanism. Beads (issues) are the data plane. There are no "swarm IDs" - just epics with children, processed by workers, merged through the queue.

## System Overview

```mermaid
graph TB
    subgraph "Gas Town"
        Overseer["πŸ‘€ Overseer<br/>(Human Operator)"]
        subgraph Town["Town (~/ai/)"]
            Mayor["🎩 Mayor<br/>(Global Coordinator)"]
            subgraph Rig1["Rig: wyvern"]
                W1["πŸ‘ Witness"]
                R1["πŸ”§ Refinery"]
                P1["🐱 Polecat"]
                P2["🐱 Polecat"]
                P3["🐱 Polecat"]
            end
            subgraph Rig2["Rig: beads"]
                W2["πŸ‘ Witness"]
                R2["πŸ”§ Refinery"]
                P4["🐱 Polecat"]
            end
        end
    end

    Overseer --> Mayor
    Mayor --> W1
    Mayor --> W2
    W1 --> P1
    W1 --> P2
    W1 --> P3
    W2 --> P4
    P1 -.-> R1
    P2 -.-> R1
    P3 -.-> R1
    P4 -.-> R2
```

## Core Concepts

### Town

A **Town** is a complete Gas Town installation - the workspace where everything lives. A town contains:

- Town configuration (`config/` directory)
- Mayor's home (`mayor/` directory at town level)
- One or more **Rigs** (managed project repositories)

### Rig

A **Rig** is a container directory for managing a project and its agents. Importantly, the rig itself is NOT a git clone - it's a pure container that holds:

- Rig configuration (`config.json`)
- Rig-level beads database (`.beads/`) for coordinating work
- Agent directories, each with their own git clone

This design prevents agent confusion: each agent has exactly one place to work (their own clone), with no ambiguous "rig root" that could tempt a lost agent.

### Overseer (Human Operator)

The **Overseer** is the human operator of Gas Town - not an AI agent, but the person who runs the system. The Overseer:

- **Sets strategy**: Defines project goals and priorities
- **Provisions resources**: Adds machines, polecats, and rigs
- **Reviews output**: Approves merged code and completed work
- **Handles escalations**: Makes final decisions on stuck or ambiguous work
- **Operates the system**: Runs `gt` commands, monitors dashboards

The Mayor reports to the Overseer. When agents can't resolve issues, they escalate up through the chain: Polecat β†’ Witness β†’ Mayor β†’ Overseer.

### Agents

Gas Town has four AI agent roles:

| Agent | Scope | Responsibility |
|-------|-------|----------------|
| **Mayor** | Town-wide | Global coordination, work dispatch, cross-rig decisions |
| **Witness** | Per-rig | Worker lifecycle, nudging, pre-kill verification, session cycling |
| **Refinery** | Per-rig | Merge queue processing, PR review, integration |
| **Polecat** | Per-rig | Implementation work on assigned issues |

### Mail

Agents communicate via **mail** - messages stored as beads issues with `type=message`. Mail enables:

- Work assignment (Mayor β†’ Refinery β†’ Polecat)
- Status reporting (Polecat β†’ Witness β†’ Mayor)
- Session handoff (Agent β†’ Self for context cycling)
- Escalation (Witness β†’ Mayor for stuck workers)

**Two-tier mail architecture:**

- **Town beads** (prefix: `gm-`): Mayor inbox, cross-rig coordination, handoffs
- **Rig beads** (prefix: varies): Rig-local agent communication

Mail commands use `bd mail` under the hood:

```bash
gt mail send mayor/ -s "Subject" -m "Body"   # Uses bd mail send
gt mail inbox                                # Uses bd mail inbox
gt mail read gm-abc                          # Uses bd mail read
```

```mermaid
flowchart LR
    subgraph "Communication Flows"
        direction LR
        Mayor -->|"dispatch work"| Refinery
        Refinery -->|"assign issue"| Polecat
        Polecat -->|"done signal"| Witness
        Witness -->|"work complete"| Mayor
        Witness -->|"escalation"| Mayor
        Mayor -->|"escalation"| Overseer["πŸ‘€ Overseer"]
    end
```

### Beads

**Beads** is the issue tracking system. Gas Town agents use beads to:

- Track work items (`bd ready`, `bd list`)
- Create issues for discovered work (`bd create`)
- Claim and complete work (`bd update`, `bd close`)
- Sync state to git (`bd sync`)

Polecats have direct beads write access and file their own issues.

#### Beads Configuration for Multi-Agent

Gas Town uses beads in a **shared database** configuration where all agents in a rig share one `.beads/` directory. This requires careful configuration:

| Agent Type | BEADS_DIR | BEADS_NO_DAEMON | sync-branch | Notes |
|------------|-----------|-----------------|-------------|-------|
| Polecat (worktree) | rig/.beads | **YES (required)** | recommended | Daemon can't handle worktrees |
| Polecat (full clone) | rig/.beads | Optional | recommended | Daemon safe but sync-branch helps |
| Refinery | rig/.beads | No | optional | Owns main, daemon is fine |
| Witness | rig/.beads | No | optional | Read-mostly access |
| Mayor | rig/.beads | No | optional | Infrequent access |

**Critical: Worktrees require no-daemon mode.** The beads daemon doesn't know which branch each worktree has checked out, and can commit/push to the wrong branch.

**Environment setup when spawning agents:**

```bash
# For worktree polecats (REQUIRED)
export BEADS_DIR=/path/to/rig/.beads
export BEADS_NO_DAEMON=1

# For full-clone polecats (recommended)
export BEADS_DIR=/path/to/rig/.beads
# Daemon is safe, but consider sync-branch for coordination

# Rig beads config.yaml should include:
#   sync-branch: beads-sync   # Separate branch for beads commits
```

**Why sync-branch?** When multiple agents share a beads database, using a dedicated sync branch prevents beads commits from interleaving with code commits on feature branches.

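Because the table reduces to two environment variables, spawn code can derive them mechanically. A minimal Go sketch of that derivation - the `agentEnv` helper is illustrative, not a function in the current `gt` codebase:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// agentEnv builds the beads-related environment for a spawned agent,
// following the table above: every agent points BEADS_DIR at the rig's
// shared database, and worktree polecats additionally disable the daemon.
func agentEnv(rigBeadsDir string, isWorktree bool) []string {
	env := append(os.Environ(), "BEADS_DIR="+rigBeadsDir)
	if isWorktree {
		// REQUIRED: the daemon can't track per-worktree branches.
		env = append(env, "BEADS_NO_DAEMON=1")
	}
	return env
}

func main() {
	// Any bd/gt invocation in the agent's session gets the same env.
	cmd := exec.Command("bd", "ready")
	cmd.Env = agentEnv("/path/to/rig/.beads", true)
	out, err := cmd.CombinedOutput()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
	fmt.Print(string(out))
}
```
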
## Directory Structure

### Town Level

```
~/gt/                     # Town root (Gas Town harness)
β”œβ”€β”€ CLAUDE.md             # Mayor role prompting (at town root)
β”œβ”€β”€ .beads/               # Town-level beads (prefix: gm-)
β”‚   β”œβ”€β”€ beads.db          # Mayor mail, coordination, handoffs
β”‚   └── config.yaml
β”‚
β”œβ”€β”€ mayor/                # Mayor's HOME at town level
β”‚   β”œβ”€β”€ town.json         # {"type": "town", "name": "..."}
β”‚   β”œβ”€β”€ rigs.json         # Registry of managed rigs
β”‚   └── state.json        # Mayor state (NO mail/ directory)
β”‚
β”œβ”€β”€ gastown/              # A rig (project container)
└── wyvern/               # Another rig
```

**Note**: Mayor's mail is now in town beads (`gm-*` issues), not JSONL files.

### Rig Level

Created by `gt rig add <name>`:

```
gastown/                          # Rig = container (NOT a git clone)
β”œβ”€β”€ config.json                   # Rig configuration (git_url, beads prefix)
β”œβ”€β”€ .beads/ β†’ mayor/rig/.beads    # Symlink to canonical beads in Mayor
β”‚
β”œβ”€β”€ mayor/                        # Mayor's per-rig presence
β”‚   β”œβ”€β”€ rig/                      # CANONICAL clone (beads authority)
β”‚   β”‚   └── .beads/               # Canonical rig beads (prefix: gt-, etc.)
β”‚   └── state.json
β”‚
β”œβ”€β”€ refinery/                     # Refinery agent (merge queue processor)
β”‚   β”œβ”€β”€ rig/                      # Refinery's clone (for merge operations)
β”‚   └── state.json
β”‚
β”œβ”€β”€ witness/                      # Witness agent (per-rig pit boss)
β”‚   └── state.json                # No clone needed (monitors polecats)
β”‚
β”œβ”€β”€ crew/                         # Overseer's personal workspaces
β”‚   └── <name>/                   # Workspace (full git clone)
β”‚
└── polecats/                     # Worker directories (git worktrees)
    β”œβ”€β”€ Nux/                      # Worktree from Mayor's clone
    └── Toast/                    # Worktree from Mayor's clone
```

**Beads architecture:**

- Mayor's clone holds the canonical `.beads/` for the rig
- Rig root symlinks `.beads/` β†’ `mayor/rig/.beads`
- All agents (crew, polecats, refinery) inherit beads via parent lookup
- Polecats are git worktrees from Mayor's clone (much faster than full clones)

**Key points:**

- The rig root has no `.git/` - it's not a repository
- All agents use `BEADS_DIR` to point to the rig's `.beads/`
- Refinery's clone is the authoritative "main branch" view
- Witness may not need its own clone (it just monitors polecat state)

```mermaid
graph TB
    subgraph Rig["Rig: gastown (container, NOT a git clone)"]
        Config["config.json"]
        Beads[".beads/"]
        subgraph Polecats["polecats/"]
            Nux["Nux/<br/>(git worktree)"]
            Toast["Toast/<br/>(git worktree)"]
        end
        subgraph Refinery["refinery/"]
            RefRig["rig/<br/>(canonical main)"]
            RefState["state.json"]
        end
        subgraph Witness["witness/"]
            WitState["state.json"]
        end
        subgraph MayorRig["mayor/"]
            MayRig["rig/<br/>(git clone)"]
            MayState["state.json"]
        end
        subgraph Crew["crew/"]
            CrewMain["main/<br/>(git clone)"]
        end
    end

    Beads -.->|BEADS_DIR| Nux
    Beads -.->|BEADS_DIR| Toast
    Beads -.->|BEADS_DIR| RefRig
    Beads -.->|BEADS_DIR| MayRig
    Beads -.->|BEADS_DIR| CrewMain
```

(git clone)"] end end Beads -.->|BEADS_DIR| Nux Beads -.->|BEADS_DIR| Toast Beads -.->|BEADS_DIR| RefRig Beads -.->|BEADS_DIR| MayRig Beads -.->|BEADS_DIR| CrewMain ``` ### ASCII Directory Layout For reference without mermaid rendering: ``` ~/gt/ # TOWN ROOT (Gas Town harness) β”œβ”€β”€ CLAUDE.md # Mayor role prompting β”œβ”€β”€ .beads/ # Town-level beads (gm-* prefix) β”‚ β”œβ”€β”€ beads.db # Mayor mail, coordination β”‚ └── config.yaml β”‚ β”œβ”€β”€ mayor/ # Mayor's home (at town level) β”‚ β”œβ”€β”€ town.json # {"type": "town", "name": "..."} β”‚ β”œβ”€β”€ rigs.json # Registry of managed rigs β”‚ └── state.json # Mayor state (no mail/ dir) β”‚ β”œβ”€β”€ gastown/ # RIG (container, NOT a git clone) β”‚ β”œβ”€β”€ config.json # Rig configuration β”‚ β”œβ”€β”€ .beads/ β†’ mayor/rig/.beads # Symlink to Mayor's canonical beads β”‚ β”‚ β”‚ β”œβ”€β”€ mayor/ # Mayor's per-rig presence β”‚ β”‚ β”œβ”€β”€ rig/ # CANONICAL clone (beads + worktree base) β”‚ β”‚ β”‚ β”œβ”€β”€ .git/ β”‚ β”‚ β”‚ β”œβ”€β”€ .beads/ # CANONICAL rig beads (gt-* prefix) β”‚ β”‚ β”‚ └── β”‚ β”‚ └── state.json β”‚ β”‚ β”‚ β”œβ”€β”€ refinery/ # Refinery agent (merge queue) β”‚ β”‚ β”œβ”€β”€ rig/ # Refinery's clone (for merges) β”‚ β”‚ β”‚ β”œβ”€β”€ .git/ β”‚ β”‚ β”‚ └── β”‚ β”‚ └── state.json β”‚ β”‚ β”‚ β”œβ”€β”€ witness/ # Witness agent (pit boss) β”‚ β”‚ └── state.json # No clone needed β”‚ β”‚ β”‚ β”œβ”€β”€ crew/ # Overseer's personal workspaces β”‚ β”‚ └── / # Full clone (inherits beads from rig) β”‚ β”‚ β”œβ”€β”€ .git/ β”‚ β”‚ └── β”‚ β”‚ β”‚ β”œβ”€β”€ polecats/ # Worker directories (worktrees) β”‚ β”‚ β”œβ”€β”€ Nux/ # Git worktree from Mayor's clone β”‚ β”‚ β”‚ └── # (inherits beads from rig) β”‚ β”‚ └── Toast/ # Git worktree from Mayor's clone β”‚ β”‚ β”‚ └── plugins/ # Optional plugins β”‚ └── merge-oracle/ β”‚ β”œβ”€β”€ CLAUDE.md β”‚ └── state.json β”‚ └── wyvern/ # Another rig (same structure) β”œβ”€β”€ config.json β”œβ”€β”€ .beads/ β†’ mayor/rig/.beads β”œβ”€β”€ mayor/ β”œβ”€β”€ refinery/ β”œβ”€β”€ witness/ β”œβ”€β”€ crew/ └── polecats/ ``` **Key changes from earlier design:** - Town beads (`gm-*`) hold Mayor mail instead of JSONL files - Mayor has per-rig clone that's canonical for beads and worktrees - Rig `.beads/` symlinks to Mayor's canonical beads - Polecats are git worktrees from Mayor's clone (fast) ### Why Decentralized? 
Agents live IN rigs rather than in a central location:

- **Locality**: Each agent works in the context of its rig
- **Independence**: Rigs can be added/removed without restructuring
- **Parallelism**: Multiple rigs can have active workers simultaneously
- **Simplicity**: An agent finds its context by looking at its own directory

## Agent Responsibilities

### Mayor

The Mayor is the global coordinator:

- **Work dispatch**: Spawns workers for issues, coordinates batch work on epics
- **Cross-rig coordination**: Routes work between rigs when needed
- **Escalation handling**: Resolves issues Witnesses can't handle
- **Strategic decisions**: Architecture, priorities, integration planning

**NOT Mayor's job**: Per-worker cleanup, session killing, nudging workers

### Witness

The Witness is the per-rig "pit boss":

- **Worker monitoring**: Track polecat health and progress
- **Nudging**: Prompt workers toward completion
- **Pre-kill verification**: Ensure git state is clean before killing sessions (see the sketch below)
- **Session lifecycle**: Kill sessions, update worker state
- **Self-cycling**: Hand off to a fresh session when context fills
- **Escalation**: Report stuck workers to Mayor

**Key principle**: Witness owns ALL per-worker cleanup. Mayor is never involved in routine worker management.

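The pre-kill verification is mechanical: a polecat is safe to kill only when `git status --porcelain` prints nothing. A sketch of that check, assuming the Witness shells out to git (`worktreeClean` is an illustrative name, not actual gastown code):

```go
package witness

import (
	"os/exec"
	"strings"
)

// worktreeClean reports whether a polecat's working tree has no
// uncommitted changes - the condition the Witness verifies before
// killing a session. `git status --porcelain` prints one line per
// dirty path, so clean means empty output.
func worktreeClean(polecatDir string) (bool, error) {
	cmd := exec.Command("git", "status", "--porcelain")
	cmd.Dir = polecatDir
	out, err := cmd.Output()
	if err != nil {
		return false, err
	}
	return strings.TrimSpace(string(out)) == "", nil
}
```
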
### Refinery

The Refinery manages the merge queue:

- **PR review**: Check polecat work before merging
- **Integration**: Merge completed work to main
- **Conflict resolution**: Handle merge conflicts
- **Quality gate**: Ensure tests pass, code quality maintained

```mermaid
flowchart LR
    subgraph "Merge Queue Flow"
        P1[Polecat 1<br/>branch] --> Q[Merge Queue]
        P2[Polecat 2<br/>branch] --> Q
        P3[Polecat 3<br/>branch] --> Q
        Q --> R{Refinery}
        R -->|merge| M[main]
        R -->|conflict| P1
    end
```

#### Direct Landing (Bypass Merge Queue)

Sometimes the Mayor needs to land a polecat's work directly, skipping the Refinery:

| Scenario | Use Direct Landing? |
|----------|---------------------|
| Single polecat, simple change | Yes |
| Urgent hotfix | Yes |
| Refinery unavailable | Yes |
| Multiple polecats, potential conflicts | No - use Refinery |
| Complex changes needing review | No - use Refinery |

**Commands:**

```bash
# Normal flow (through Refinery)
gt merge-queue add                              # Polecat signals PR ready
gt refinery process                             # Refinery processes queue

# Direct landing (Mayor bypasses Refinery)
gt land --direct <rig>/<polecat>                # Land directly to main
gt land --direct --force <rig>/<polecat>        # Skip safety checks
gt land --direct --skip-tests <rig>/<polecat>   # Skip test run
gt land --direct --dry-run <rig>/<polecat>      # Preview only
```

**Direct landing workflow:**

```mermaid
sequenceDiagram
    participant M as 🎩 Mayor
    participant R as Refinery Clone
    participant P as Polecat Branch
    participant B as πŸ“¦ Beads

    M->>M: Verify polecat session terminated
    M->>P: Check git state clean
    M->>R: Fetch polecat branch
    M->>R: Merge to main (fast-forward or merge commit)
    M->>R: Run tests (optional)
    M->>R: Push to origin
    M->>B: Close associated issue
    M->>P: Delete polecat branch (cleanup)
```

**Safety checks (skippable with --force; see the sketch below):**

1. Polecat session must be terminated
2. Git working tree must be clean
3. No merge conflicts with main
4. Tests pass (skippable with --skip-tests)

**When direct landing makes sense:**

- Mayor is doing sequential, non-swarming work (like GGT scaffolding)
- A single worker completed an isolated task
- A hotfix needs to land immediately
- The Refinery agent is down or unavailable

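The safety checks order naturally into a single gate that runs before any merge. A sketch of that gate; `sessionTerminated`, `worktreeClean`, `mergesCleanly`, and `testsPass` are assumed helpers, not functions from the real `gt` source:

```go
package mayor

import "fmt"

// LandOptions mirrors the gt land flags shown above.
type LandOptions struct {
	Force     bool // --force: skip safety checks
	SkipTests bool // --skip-tests: skip the test run only
	DryRun    bool // --dry-run: report what would happen, change nothing
}

// checkDirectLand runs the four safety checks in order and returns the
// first failure, matching the numbered list above.
func checkDirectLand(rig, polecat string, opts LandOptions) error {
	if !opts.Force {
		if !sessionTerminated(rig, polecat) {
			return fmt.Errorf("%s/%s: session still running", rig, polecat)
		}
		if !worktreeClean(rig, polecat) {
			return fmt.Errorf("%s/%s: working tree dirty", rig, polecat)
		}
		if !mergesCleanly(rig, polecat) {
			return fmt.Errorf("%s/%s: conflicts with main", rig, polecat)
		}
	}
	if !opts.SkipTests && !testsPass(rig) {
		return fmt.Errorf("%s: tests failed", rig)
	}
	return nil
}

// Stubbed for illustration; real checks would consult tmux, git, and the
// project's test command.
func sessionTerminated(rig, polecat string) bool { return true }
func worktreeClean(rig, polecat string) bool     { return true }
func mergesCleanly(rig, polecat string) bool     { return true }
func testsPass(rig string) bool                  { return true }
```
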
### Polecat

Polecats are the workers that do actual implementation:

- **Issue completion**: Work on assigned beads issues
- **Self-verification**: Run the decommission checklist before signaling done
- **Beads access**: Create issues for discovered work, close completed work
- **Clean handoff**: Ensure git state is clean for Witness verification

## Key Workflows

### Work Dispatch

Work flows through the system as a stream. The Overseer spawns workers, they process issues, and completed work enters the merge queue.

```mermaid
sequenceDiagram
    participant O as πŸ‘€ Overseer
    participant M as 🎩 Mayor
    participant W as πŸ‘ Witness
    participant P as 🐱 Polecats
    participant R as πŸ”§ Refinery

    O->>M: Spawn workers for epic
    M->>W: Assign issues to workers
    W->>P: Start work
    loop For each worker
        P->>P: Work on issue
        P->>R: Submit to merge queue
        R->>R: Review & merge
    end
    R->>M: All work merged
    M->>O: Report results
```

**Note**: There is no "swarm ID" or batch boundary. Workers process issues independently. The merge queue handles coordination. "Swarming an epic" is just spawning multiple workers for the epic's child issues.

### Worker Cleanup (Witness-Owned)

```mermaid
sequenceDiagram
    participant P as 🐱 Polecat
    participant W as πŸ‘ Witness
    participant M as 🎩 Mayor
    participant O as πŸ‘€ Overseer

    P->>P: Complete work
    P->>W: Done signal
    W->>W: Capture git state
    W->>W: Assess cleanliness
    alt Git state dirty
        W->>P: Nudge (fix issues)
        P->>P: Fix issues
        P->>W: Done signal (retry)
    end
    alt Clean after ≀3 tries
        W->>W: Verify clean
        W->>P: Kill session
    else Stuck after 3 tries
        W->>M: Escalate
        alt Mayor can fix
            M->>W: Resolution
        else Mayor can't fix
            M->>O: Escalate to human
            O->>M: Decision
        end
    end
```

### Session Cycling (Mail-to-Self)

When an agent's context fills, it hands off to its next session:

1. **Recognize**: Notice context filling (slow responses, losing track of state)
2. **Capture**: Gather current state (active work, pending decisions, warnings)
3. **Compose**: Write a structured handoff note
4. **Send**: Mail the handoff to own inbox (sketched below)
5. **Exit**: End the session cleanly
6. **Resume**: New session reads the handoff and picks up where the old session left off

```mermaid
sequenceDiagram
    participant S1 as Agent Session 1
    participant MB as πŸ“¬ Mailbox
    participant S2 as Agent Session 2

    S1->>S1: Context filling up
    S1->>S1: Capture current state
    S1->>MB: Send handoff note
    S1->>S1: Exit cleanly
    Note over S1,S2: Session boundary
    S2->>MB: Check inbox
    MB->>S2: Handoff note
    S2->>S2: Resume from handoff state
```

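A handoff note is ordinary mail addressed to the agent's own inbox, so steps 2-4 need nothing beyond `gt mail send`. A sketch, assuming a hypothetical `HandoffNote` shape (the real note format is whatever the next session can parse):

```go
package agent

import (
	"encoding/json"
	"os/exec"
)

// HandoffNote is an illustrative shape for the state captured in step 2.
type HandoffNote struct {
	ActiveWork       []string `json:"active_work"`
	PendingDecisions []string `json:"pending_decisions"`
	Warnings         []string `json:"warnings"`
}

// sendHandoff mails the note to the agent's own address (mail-to-self),
// so the next session finds it in the inbox on startup.
func sendHandoff(selfAddr string, note HandoffNote) error {
	body, err := json.MarshalIndent(note, "", "  ")
	if err != nil {
		return err
	}
	return exec.Command("gt", "mail", "send", selfAddr,
		"-s", "Session handoff",
		"-m", string(body)).Run()
}
```
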
## Key Design Decisions

### 1. Witness Owns Worker Cleanup

**Decision**: Witness handles all per-worker cleanup. Mayor is never involved.

**Rationale**:

- Separation of concerns (Mayor strategic, Witness operational)
- Reduced coordination overhead
- Faster shutdown
- Cleaner escalation path

### 2. Polecats Have Direct Beads Access

**Decision**: Polecats can create, update, and close beads issues directly.

**Rationale**:

- Simplifies architecture (no proxy through Witness)
- Empowers workers to file discovered work
- Faster feedback loop
- Beads v0.30.0+ handles multi-agent conflicts

### 3. Session Cycling via Mail-to-Self

**Decision**: Agents mail handoff notes to themselves when cycling sessions.

**Rationale**:

- Consistent pattern across all agent types
- Timestamped and logged
- Works with existing inbox infrastructure
- Clean separation between sessions

### 4. Decentralized Agent Architecture

**Decision**: Agents live in rigs (`<rig>/witness/rig/`), not centralized (`mayor/rigs/<rig>/`).

**Rationale**:

- Agents work in the context of their rig
- Rigs are independent units
- Simpler role detection
- Cleaner directory structure

### 5. Visible Config Directory

**Decision**: Use `config/`, not `.gastown/`, for town configuration.

**Rationale**: AI models often miss hidden directories. Visible is better.

### 6. Rig as Container, Not Clone

**Decision**: The rig directory is a pure container, not a git clone of the project.

**Rationale**:

- **Prevents confusion**: Agents historically get lost (polecats in refinery, mayor in polecat dirs). If the rig root were a clone, it would be another tempting target for confused agents. Two confused agents at once = collision disaster.
- **Single work location**: Each agent has exactly one place to work (their own `<agent>/rig/` clone)
- **Clear role detection**: "Am I in a `rig/` directory?" = I'm in an agent clone
- **Refinery is canonical main**: Refinery's clone serves as the authoritative "main branch" - it pulls, merges PRs, and pushes. No need for a separate rig-root clone.

### 7. Plugins as Agents

**Decision**: Plugins are just additional agents with identities, mailboxes, and access to beads. No special plugin infrastructure.

**Rationale**:

- Fits Gas Town's intentionally rough aesthetic
- Zero new infrastructure needed (uses existing mail, beads, identities)
- Composable - plugins can invoke other plugins via mail
- Debuggable - just look at mail logs and bead history
- Extensible - anyone can add a plugin by creating a directory

**Structure**: `<rig>/plugins/<plugin>/` with optional `rig/`, `CLAUDE.md`, `mail/`, `state.json`.

### 8. Rig-Level Beads via BEADS_DIR

**Decision**: Each rig has its own `.beads/` directory. Agents use the `BEADS_DIR` environment variable to point to it.

**Rationale**:

- **Centralized issue tracking**: All polecats in a rig share the same beads database
- **Project separation**: Even if the project repo has its own `.beads/`, Gas Town agents use the rig's beads instead
- **OSS-friendly**: For contributing to projects you don't own, rig beads stay separate from upstream
- **Already supported**: Beads supports the `BEADS_DIR` env var (see beads `internal/beads/beads.go`)

**Configuration**: Gas Town sets `BEADS_DIR` when spawning agents:

```bash
export BEADS_DIR=/path/to/rig/.beads
```

**See also**: beads issue `bd-411u` for documentation of this pattern.

### 9. Direct Landing Option

**Decision**: Mayor can land polecat work directly, bypassing the Refinery merge queue.

**Rationale**:

- **Flexibility**: Not all work needs merge queue overhead
- **Sequential work**: Mayor doing non-swarming work (like GGT scaffolding) shouldn't need Refinery
- **Emergency path**: Hotfixes can land immediately
- **Resilience**: System works even if Refinery is down

**Constraints**:

- Direct landing still uses Refinery's clone as the canonical main
- Safety checks prevent landing dirty or conflicting work
- Mayor takes responsibility for quality (no Refinery review)

**Commands**:

```bash
gt land --direct <rig>/<polecat>           # Standard direct land
gt land --direct --force <rig>/<polecat>   # Skip safety checks
```

### 10. Beads Daemon Awareness

**Decision**: Gas Town must disable the beads daemon for worktree-based polecats.

**Rationale**:

- The beads daemon doesn't track which branch each worktree has checked out
- The daemon can commit beads changes to the wrong branch
- This is a beads limitation, not a Gas Town bug
- Full clones don't have this problem

**Configuration**:

```bash
# For worktree polecats (REQUIRED)
export BEADS_NO_DAEMON=1

# For full-clone polecats (optional)
# Daemon is safe, no special config needed
```

**See also**: beads docs/WORKTREES.md and docs/DAEMON.md for details.

### 11. Work is a Stream (No Swarm IDs)

**Decision**: Work state is encoded in beads epics and issues. There are no "swarm IDs" or separate swarm infrastructure - the epic IS the grouping, the merge queue IS the coordination.

**Rationale**:

- **No new infrastructure**: Beads already provides hierarchy, dependencies, status, priority
- **Shared state**: All rig agents share the same `.beads/` via BEADS_DIR
- **Queryable**: `bd ready` finds work with no blockers, enabling multi-wave orchestration
- **Auditable**: Beads history shows work progression
- **Resilient**: Beads sync handles multi-agent conflicts
- **No boundary problem**: When does a swarm start or end? Who's in it? These questions dissolve - work is a stream

**How it works**:

- Create an epic with child issues for batch work
- Dependencies encode ordering (task B depends on task A)
- Status transitions track progress (open β†’ in_progress β†’ closed)
- Witness queries `bd ready` to find the next available work
- Spawn workers as needed - add more anytime
- Batch complete = all child issues closed (or just keep going)

**Example**: Batch work on authentication bugs:

```
gt-auth-epic          # Epic: "Fix authentication bugs"
β”œβ”€β”€ gt-auth-epic.1    # "Fix login timeout" (ready, no deps)
β”œβ”€β”€ gt-auth-epic.2    # "Fix session expiry" (ready, no deps)
└── gt-auth-epic.3    # "Update auth tests" (blocked by .1 and .2)
```

Workers process issues independently. Work flows through the merge queue. No "swarm ID" needed - the epic provides grouping, labels provide ad-hoc queries, dependencies provide sequencing.

### 12. Agent Session Lifecycle (Daemon Protection)

**Decision**: A background daemon manages agent session lifecycles, including cycling sessions when agents request handoff.

**Rationale**:

- Agents can't restart themselves after exiting
- Handoff mail is useless without someone to start the new session
- The daemon provides reliable session management outside agent context
- Enables autonomous long-running operation (hours/days)

**Session cycling protocol** (the daemon's half is sketched below):

1. Agent detects context exhaustion or requests a cycle
2. Agent sends handoff mail to its own inbox
3. Agent sets `requesting_cycle: true` in state.json
4. Agent exits (or sends an explicit signal to the daemon)
5. Daemon detects exit + cycle request flag
6. Daemon starts a new session
7. New session reads handoff mail, resumes work

**Daemon responsibilities**:

- Monitor agent session health (heartbeat)
- Detect session exit
- Check the cycle request flag in state.json
- Start a replacement session if a cycle was requested
- Clear the cycle flag after a successful restart
- Report failures to Mayor (escalation)

**Applies to**: Witness, Refinery (both long-running agents that may exhaust context)

```mermaid
sequenceDiagram
    participant A1 as Agent Session 1
    participant S as State.json
    participant D as Daemon
    participant A2 as Agent Session 2
    participant MB as Mailbox

    A1->>MB: Send handoff mail
    A1->>S: Set requesting_cycle: true
    A1->>A1: Exit cleanly
    D->>D: Detect session exit
    D->>S: Check requesting_cycle
    S->>D: true
    D->>D: Start new session
    D->>S: Clear requesting_cycle
    A2->>MB: Read handoff mail
    A2->>A2: Resume from handoff
```

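Steps 5-7 are the daemon's side of the protocol. A minimal sketch of that watch loop, assuming `state.json` carries the `requesting_cycle` boolean and that `sessionAlive`/`startSession` are hypothetical wrappers over tmux, not real gastown functions:

```go
package daemon

import (
	"encoding/json"
	"os"
	"time"
)

type agentState struct {
	RequestingCycle bool `json:"requesting_cycle"`
}

// watchAgent polls one agent: when its session is gone and the cycle
// flag is set, start a replacement and clear the flag (steps 5-7).
func watchAgent(statePath string) error {
	for {
		time.Sleep(10 * time.Second)
		if sessionAlive(statePath) {
			continue // heartbeat OK, nothing to do
		}
		raw, err := os.ReadFile(statePath)
		if err != nil {
			return err
		}
		var st agentState
		if err := json.Unmarshal(raw, &st); err != nil {
			return err
		}
		if !st.RequestingCycle {
			continue // crash, not a requested cycle - escalate elsewhere
		}
		if err := startSession(statePath); err != nil {
			return err // the real daemon would report this to Mayor
		}
		st.RequestingCycle = false
		cleared, err := json.Marshal(st)
		if err != nil {
			return err
		}
		if err := os.WriteFile(statePath, cleared, 0o644); err != nil {
			return err
		}
	}
}

// Stubbed for illustration; real versions would query and drive tmux.
func sessionAlive(statePath string) bool  { return false }
func startSession(statePath string) error { return nil }
```
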
### 13. Resource-Constrained Worker Pool

**Decision**: Each rig has a configurable `max_workers` limit for concurrent polecats.

**Rationale**:

- Claude Code can use 500MB+ RAM per session
- Prevents resource exhaustion on smaller machines
- Enables autonomous operation without human oversight
- Witness respects the limit when spawning new workers

**Configuration** (in rig config.json):

```json
{
  "type": "rig",
  "max_workers": 8,
  "worker_spawn_delay": "5s"
}
```

**Witness behavior** (see the sketch below):

- Query the active worker count before spawning
- If at the limit, wait for workers to complete
- Prioritize higher-priority ready issues

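The gating logic the Witness applies is small enough to sketch. Assuming the config fields above and hypothetical `activeWorkers`/`spawnPolecat` helpers (not part of the real gt codebase):

```go
package pool

import "time"

type rigConfig struct {
	MaxWorkers       int           // "max_workers" in config.json
	WorkerSpawnDelay time.Duration // parsed from "worker_spawn_delay"
}

// spawnForIssues starts workers for ready issues (assumed already sorted
// by priority), never exceeding max_workers, with a pause between spawns.
func spawnForIssues(cfg rigConfig, readyIssues []string) {
	for _, issue := range readyIssues {
		if activeWorkers() >= cfg.MaxWorkers {
			return // at capacity - wait for completions before spawning more
		}
		spawnPolecat(issue)
		time.Sleep(cfg.WorkerSpawnDelay)
	}
}

// Stubbed for illustration; real versions would consult tmux / gt state.
func activeWorkers() int        { return 0 }
func spawnPolecat(issue string) {}
```
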
### 14. Outpost Abstraction for Federation

**Decision**: Federation uses an "Outpost" abstraction to support multiple compute backends (local, SSH/VM, Cloud Run, etc.) through a unified interface.

**Rationale**:

- Different workloads need different compute: burst vs long-running, cheap vs fast
- Cloud Run's pay-per-use model is ideal for elastic burst capacity
- VMs are better for autonomous long-running work
- Local is always the default for development
- Platform flexibility lets users choose based on their needs and budget

**Key insight**: Cloud Run's persistent HTTP/2 connections solve the "zero to one" cold start problem, making container workers viable for interactive-ish work at ~$0.017 per 5-minute session.

**Design principles**:

1. **Local-first** - Remote outposts are overflow, not primary
2. **Git remains source of truth** - All outposts sync via git
3. **HTTP for Cloud Run** - Don't force filesystem mail onto containers
4. **Graceful degradation** - System works with any subset of outposts

**See**: `docs/federation-design.md` for the full architectural analysis.

## Multi-Wave Work Processing

For large task trees (like implementing GGT itself), workers can process multiple "waves" of work automatically based on the dependency graph.

### Wave Orchestration

A wave is not explicitly managed - it emerges from dependencies:

1. **Wave 1**: All issues with no dependencies (`bd ready`)
2. **Wave 2**: Issues whose dependencies are now closed
3. **Wave N**: Continue until all work is done

```mermaid
graph TD
    subgraph "Wave 1 (no dependencies)"
        A[Task A]
        B[Task B]
        C[Task C]
    end
    subgraph "Wave 2 (depends on Wave 1)"
        D[Task D]
        E[Task E]
    end
    subgraph "Wave 3 (depends on Wave 2)"
        F[Task F]
    end

    A --> D
    B --> D
    C --> E
    D --> F
    E --> F
```

### Witness Work Loop

```
while epic has open issues:
    ready_issues = bd ready --parent <epic>

    if ready_issues is empty and workers_active:
        wait for worker completion
        continue

    for issue in ready_issues:
        if active_workers < max_workers:
            spawn worker for issue
        else:
            break  # wait for capacity

    monitor workers, handle completions

all work complete - report to Mayor
```

### Long-Running Autonomy

With daemon session cycling, the system can run autonomously for extended periods:

- **Witness cycles**: Every few hours as context fills
- **Refinery cycles**: As the merge queue grows complex
- **Workers cycle**: If individual tasks are very large
- **Daemon persistence**: Survives all agent restarts

The daemon is the only truly persistent component. All agents are ephemeral sessions that hand off state via mail. Work is a continuous stream - you can add new issues, spawn new workers, and reprioritize the queue, all without "starting a new swarm" or managing batch boundaries.

## Configuration

### town.json

```json
{
  "type": "town",
  "version": 1,
  "name": "stevey-gastown",
  "created_at": "2024-01-15T10:30:00Z"
}
```

### rigs.json

```json
{
  "version": 1,
  "rigs": {
    "wyvern": {
      "git_url": "https://github.com/steveyegge/wyvern",
      "added_at": "2024-01-15T10:30:00Z"
    }
  }
}
```

### config.json (Per-Rig Config)

Each rig has a `config.json` at its root:

```json
{
  "type": "rig",
  "version": 1,
  "name": "wyvern",
  "git_url": "https://github.com/steveyegge/wyvern",
  "beads": {
    "prefix": "wyv",
    "sync_remote": "origin"   // Optional: git remote for bd sync
  }
}
```

The rig's `.beads/` directory is always at the rig root. Gas Town:

1. Creates `.beads/` when adding a rig (`gt rig add`)
2. Runs `bd init --prefix <prefix>` to initialize it
3. Sets the `BEADS_DIR` environment variable when spawning agents

This ensures all agents in the rig share a single beads database, separate from any beads the project itself might use.

## CLI Commands

### Town Management

```bash
gt install [path]   # Install Gas Town at path
gt doctor           # Check workspace health
gt doctor --fix     # Auto-fix issues
```

### Agent Operations

```bash
gt status           # Overall town status
gt rigs             # List all rigs
gt polecats <rig>   # List polecats in a rig
```

### Communication

```bash
gt inbox                          # Check inbox
gt send <address> -s "Subject" -m "Message"
gt inject <polecat> "Message"     # Direct injection to session
gt capture <polecat> "<command>"  # Run command in polecat session
```

### Session Management

```bash
gt spawn <polecat> --issue <id>   # Start polecat on issue
gt kill <polecat>                 # Kill polecat session
gt wake <polecat>                 # Mark polecat as active
gt sleep <polecat>                # Mark polecat as inactive
```

### Landing & Merge Queue

```bash
gt merge-queue add    # Add to merge queue (normal flow)
gt merge-queue list   # Show pending merges
gt refinery process   # Trigger Refinery to process queue

gt land --direct <rig>/<polecat>                # Direct landing (bypass Refinery)
gt land --direct --force <rig>/<polecat>        # Skip safety checks
gt land --direct --skip-tests <rig>/<polecat>   # Skip test verification
gt land --direct --dry-run <rig>/<polecat>      # Preview only
```

### Emergency Operations

```bash
gt stop --all         # Kill ALL sessions (emergency halt)
gt stop --rig <rig>   # Kill all sessions in one rig
gt doctor --fix       # Auto-repair common issues
```

## Plugins

Gas Town supports **plugins** - but in the simplest possible way: plugins are just more agents.

### Philosophy

Gas Town is intentionally rough and lightweight. A "credible plugin system" with manifests, schemas, and invocation frameworks would be pretentious for a project named after a Mad Max wasteland. Instead, plugins follow the same patterns as all Gas Town agents:

- **Identity**: Plugins have persistent identities like polecats and witnesses
- **Communication**: Plugins use mail for input/output
- **Artifacts**: Plugins produce beads, files, or other handoff artifacts
- **Lifecycle**: Plugins can be invoked on-demand or at specific workflow points

### Plugin Structure

Plugins live in a rig's `plugins/` directory:

```
wyvern/                        # Rig
β”œβ”€β”€ plugins/
β”‚   └── merge-oracle/          # A plugin
β”‚       β”œβ”€β”€ rig/               # Plugin's git clone (if needed)
β”‚       β”œβ”€β”€ CLAUDE.md          # Plugin's instructions/prompts
β”‚       β”œβ”€β”€ mail/inbox.jsonl   # Plugin's mailbox
β”‚       └── state.json         # Plugin state (optional)
```

That's it. No plugin.yaml, no special registration. If the directory exists, the plugin exists.

### Invoking Plugins

Plugins are invoked like any other agent - via mail:

```bash
# Refinery asks merge-oracle to analyze pending changesets
gt send wyvern/plugins/merge-oracle -s "Analyze merge queue" -m "..."

# Mayor asks plan-oracle for a work breakdown
gt send beads/plugins/plan-oracle -s "Plan for bd-xyz" -m "..."
```

Plugins do their work (potentially spawning Claude sessions) and respond via mail, creating any necessary artifacts (beads, files, branches).

### Hook Points

Existing agents can be configured to notify plugins at specific points. This is just convention - agents check whether a plugin exists and mail it:

| Workflow Point | Agent | Example Plugin |
|----------------|-------|----------------|
| Before merge processing | Refinery | merge-oracle |
| Before work dispatch | Mayor | plan-oracle |
| On worker stuck | Witness | debug-oracle |
| On PR ready | Refinery | review-oracle |

Configuration is minimal - perhaps a line in the agent's CLAUDE.md or state.json noting which plugins to consult.

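Because "the plugin exists" just means "the directory exists", a hook point is a stat plus a mail send. A sketch using the `gt send` addressing shown above; the `notifyPlugin` helper is illustrative, not part of gt:

```go
package hooks

import (
	"os"
	"os/exec"
	"path/filepath"
)

// notifyPlugin mails a plugin if its directory exists, and silently
// does nothing otherwise - the whole hook-point convention.
func notifyPlugin(townRoot, rig, plugin, subject, body string) error {
	dir := filepath.Join(townRoot, rig, "plugins", plugin)
	if _, err := os.Stat(dir); os.IsNotExist(err) {
		return nil // no plugin, no hook
	}
	addr := rig + "/plugins/" + plugin
	return exec.Command("gt", "send", addr, "-s", subject, "-m", body).Run()
}
```

With this shape, the Refinery's pre-merge hook is a single call such as `notifyPlugin(root, "wyvern", "merge-oracle", "Analyze merge queue", summary)`.
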
### Example: Merge Oracle

The **merge-oracle** plugin analyzes changesets before the Refinery processes them:

**Input** (via mail from Refinery):

- List of pending changesets
- Current merge queue state

**Processing**:

1. Build an overlap graph (which changesets touch the same files/regions)
2. Classify disjointness (fully disjoint β†’ parallel safe, overlapping β†’ needs sequencing)
3. Use an LLM to assess the semantic complexity of overlapping components
4. Identify high-risk patterns (deletions vs modifications, conflicting business logic)

**Output**:

- Bead with the merge plan (parallel groups, sequential chains)
- Mail to Refinery with a recommendation (proceed / escalate to Mayor)
- If escalation is needed: mail to Mayor with an explanation

The merge-oracle's `CLAUDE.md` contains the prompts and classification criteria. Gas Town doesn't need to know the internals.

### Example: Plan Oracle

The **plan-oracle** plugin helps decompose work:

**Input**: An issue/epic that needs breakdown

**Processing**:

1. Analyze the scope and requirements
2. Identify dependencies and blockers
3. Estimate complexity (for parallelization decisions)
4. Suggest a task breakdown

**Output**:

- Beads for the sub-tasks (created via `bd create`)
- Dependency links (via `bd dep add`)
- Mail back with a summary and recommendations

### Why This Design

1. **Fits Gas Town's aesthetic**: Rough, text-based, agent-shaped
2. **Zero new infrastructure**: Uses existing mail, beads, identities
3. **Composable**: Plugins can invoke other plugins
4. **Debuggable**: Just look at mail logs and bead history
5. **Extensible**: Anyone can add a plugin by creating a directory

### Plugin Discovery

```bash
gt plugins <rig>          # List plugins in a rig
gt plugin status <name>   # Check plugin state
```

Or just `ls <rig>/plugins/`.

## Failure Modes and Recovery

Gas Town is designed for resilience. Common failure modes and their recovery:

| Failure | Detection | Recovery |
|---------|-----------|----------|
| Agent crash | Session gone, state shows 'working' | `gt doctor` detects, reset state to idle |
| Git dirty state | Witness pre-kill check fails | Nudge worker, or manual commit/discard |
| Beads sync conflict | `bd sync` fails | Beads tombstones handle most cases |
| Tmux crash | All sessions inaccessible | `gt doctor --fix` cleans up |
| Stuck work | No progress for 30+ minutes | Witness escalates, Overseer intervenes |
| Disk full | Write operations fail | Clean logs, remove old clones |

### Recovery Principles

1. **Fail safe**: Prefer stopping over corrupting data
2. **State is recoverable**: Git and beads have built-in recovery
3. **Doctor heals**: `gt doctor --fix` handles common issues
4. **Emergency stop**: `gt stop --all` as a last resort
5. **Human escalation**: Some failures need Overseer intervention

### Doctor Checks

`gt doctor` performs health checks at both workspace and rig levels:

- **Workspace checks**: Config validity, Mayor mailbox, rig registry
- **Rig checks**: Git state, clone health, Witness/Refinery presence
- **Work checks**: Stuck detection, zombie sessions, heartbeat health

Run `gt doctor` regularly. Run `gt doctor --fix` to auto-repair issues.

## Federation: Outposts

Federation enables Gas Town to scale across machines via **Outposts** - remote compute environments that can run workers.

**Full design**: See `docs/federation-design.md`

### Outpost Types

| Type | Description | Cost Model | Best For |
|------|-------------|------------|----------|
| Local | Current tmux model | Free | Development, primary work |
| SSH/VM | Full Gas Town clone on VM | Always-on | Long-running, autonomous |
| CloudRun | Container workers on GCP | Pay-per-use | Burst, elastic, background |

### Core Abstraction

```go
type Outpost interface {
    Name() string
    Type() OutpostType // local, ssh, cloudrun
    MaxWorkers() int
    ActiveWorkers() int
    Spawn(issue string, config WorkerConfig) (Worker, error)
    Workers() []Worker
    Ping() error
}

type Worker interface {
    ID() string
    Outpost() string
    Status() WorkerStatus // idle, working, done, failed
    Issue() string
    Attach() error // for interactive outposts
    Logs() (io.Reader, error)
    Stop() error
}
```

### Configuration

```yaml
# ~/ai/config/outposts.yaml
outposts:
  - name: local
    type: local
    max_workers: 4

  - name: gce-burst
    type: ssh
    host: 10.0.0.5
    user: steve
    town_path: /home/steve/ai
    max_workers: 8

  - name: cloudrun-burst
    type: cloudrun
    project: my-gcp-project
    region: us-central1
    service: gastown-worker
    max_workers: 20
    cost_cap_hourly: 5.00

policy:
  default_preference: [local, gce-burst, cloudrun-burst]
```

### Cloud Run Workers

Cloud Run enables elastic, pay-per-use workers:

- **Persistent HTTP/2 connections** solve the cold start (zero-to-one) problem
- **Cost**: ~$0.017 per 5-minute worker session
- **Scaling**: 0β†’N automatically based on demand
- **When idle**: Scales to zero, costs nothing

Workers receive work via HTTP, clone code from git, run Claude, push results. No filesystem mail is needed - HTTP is the control plane.

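On the container side this amounts to one HTTP endpoint. A sketch of such a worker, assuming a hypothetical `/work` endpoint and request shape rather than the actual gastown-worker service:

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// workRequest is an assumed payload shape: which issue to work on
// and where to clone the code from.
type workRequest struct {
	Issue   string `json:"issue"`
	RepoURL string `json:"repo_url"`
}

func main() {
	http.HandleFunc("/work", func(w http.ResponseWriter, r *http.Request) {
		var req workRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		// A real worker would clone req.RepoURL, run the agent on
		// req.Issue, push the result branch, and report status back.
		log.Printf("working on %s from %s", req.Issue, req.RepoURL)
		w.WriteHeader(http.StatusAccepted)
	})
	// Cloud Run injects the port via $PORT in practice; fixed here for brevity.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```
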
### SSH/VM Outposts

Full Gas Town clone on remote machines:

- **Model**: Complete town installation via SSH
- **Workers**: Remote tmux sessions
- **Sync**: Git for code and beads
- **Good for**: Long-running work, full autonomy if disconnected

### Design Principles

1. **Outpost abstraction** - Support multiple backends via a unified interface
2. **Local-first** - Remote outposts are for overflow/burst, not primary
3. **Git as source of truth** - Code and beads sync everywhere
4. **HTTP for Cloud Run** - Don't force mail onto stateless containers
5. **Graceful degradation** - System works with any subset of outposts

### Architecture Diagram

```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                          MAYOR                           β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚ Outpost Manager                                    β”‚  β”‚
β”‚  β”‚  - Tracks all registered outposts                  β”‚  β”‚
β”‚  β”‚  - Routes work to appropriate outpost              β”‚  β”‚
β”‚  β”‚  - Monitors worker status across outposts          β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚       β”‚                β”‚                  β”‚              β”‚
β”‚       β–Ό                β–Ό                  β–Ό              β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”        β”‚
β”‚  β”‚  Local   β”‚   β”‚   SSH    β”‚   β”‚   CloudRun   β”‚        β”‚
β”‚  β”‚ Outpost  β”‚   β”‚ Outpost  β”‚   β”‚   Outpost    β”‚        β”‚
β”‚  β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
        β”‚              β”‚                  β”‚
        β–Ό              β–Ό                  β–Ό
   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   β”‚  tmux   β”‚    β”‚   SSH   β”‚    β”‚   HTTP/2    β”‚
   β”‚  panes  β”‚    β”‚sessions β”‚    β”‚ connections β”‚
   β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
        β”‚              β”‚                β”‚
        └──────────────┴────────────────┘
                       β”‚
                       β–Ό
             β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
             β”‚    Git Repos    β”‚
             β”‚ (code + beads)  β”‚
             β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```

### CLI Commands

```bash
gt outpost list           # List configured outposts
gt outpost status [name]  # Detailed status
gt outpost add ...        # Add new outpost
gt outpost ping <name>    # Test connectivity
```

### Implementation Status

Federation is tracked in **gt-9a2** (P3 epic). Key tasks:

- `gt-9a2.1`: Outpost/Worker interfaces
- `gt-9a2.2`: LocalOutpost (refactor current spawning)
- `gt-9a2.5`: SSHOutpost
- `gt-9a2.8`: CloudRunOutpost

## Implementation Status

Gas Town is being ported from Python (gastown-py) to Go (gastown). The Go port (GGT) is in development:

- **Epic**: gt-u1j (Port Gas Town to Go)
- **Scaffolding**: gt-u1j.1 (Go scaffolding - blocker for implementation)
- **Management**: gt-f9x (Town & Rig Management: install, doctor, federation)

See beads issues with `bd list --status=open` for current work items.