docs: reorganize documentation into concepts, design, and examples
Move documentation files into a clearer structure:

- concepts/: core ideas (convoy, identity, molecules, polecat-lifecycle, propulsion)
- design/: architecture and protocols (architecture, escalation, federation, mail, etc.)
- examples/: demos and tutorials (hanoi-demo)
- overview.md: renamed from understanding-gas-town.md

Remove outdated/superseded docs and update reference.md.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Committed by Steve Yegge
parent 8ed31e9634
commit 88f784a9aa
130
docs/design/architecture.md
Normal file
@@ -0,0 +1,130 @@
# Gas Town Architecture

Technical architecture for Gas Town multi-agent workspace management.

## Two-Level Beads Architecture

Gas Town uses a two-level beads architecture to separate organizational coordination
from project implementation work.

| Level | Location | Prefix | Purpose |
|-------|----------|--------|---------|
| **Town** | `~/gt/.beads/` | `hq-*` | Cross-rig coordination, Mayor mail, agent identity |
| **Rig** | `<rig>/mayor/rig/.beads/` | project prefix | Implementation work, MRs, project issues |

### Town-Level Beads (`~/gt/.beads/`)

Organizational chain for cross-rig coordination:

- Mayor mail and messages
- Convoy coordination (batch work across rigs)
- Strategic issues and decisions
- **Town-level agent beads** (Mayor, Deacon)
- **Role definition beads** (global templates)

### Rig-Level Beads (`<rig>/mayor/rig/.beads/`)

Project chain for implementation work:

- Bugs, features, and tasks for the project
- Merge requests and code reviews
- Project-specific molecules
- **Rig-level agent beads** (Witness, Refinery, Polecats)

## Agent Bead Storage

Agent beads track lifecycle state for each agent. Storage location depends on
the agent's scope.

| Agent Type | Scope | Bead Location | Bead ID Format |
|------------|-------|---------------|----------------|
| Mayor | Town | `~/gt/.beads/` | `hq-mayor` |
| Deacon | Town | `~/gt/.beads/` | `hq-deacon` |
| Dogs | Town | `~/gt/.beads/` | `hq-dog-<name>` |
| Witness | Rig | `<rig>/mayor/rig/.beads/` | `<prefix>-<rig>-witness` |
| Refinery | Rig | `<rig>/mayor/rig/.beads/` | `<prefix>-<rig>-refinery` |
| Polecats | Rig | `<rig>/mayor/rig/.beads/` | `<prefix>-<rig>-polecat-<name>` |

### Role Beads

Role beads are global templates stored in town beads with the `hq-` prefix:

- `hq-mayor-role` - Mayor role definition
- `hq-deacon-role` - Deacon role definition
- `hq-witness-role` - Witness role definition
- `hq-refinery-role` - Refinery role definition
- `hq-polecat-role` - Polecat role definition

Each agent bead references its role bead via the `role_bead` field.

## Agent Taxonomy

### Town-Level Agents (Cross-Rig)

| Agent | Role | Persistence |
|-------|------|-------------|
| **Mayor** | Global coordinator, handles cross-rig communication and escalations | Persistent |
| **Deacon** | Daemon beacon - receives heartbeats, runs plugins and monitoring | Persistent |
| **Dogs** | Long-running workers for cross-rig batch work | Variable |

### Rig-Level Agents (Per-Project)

| Agent | Role | Persistence |
|-------|------|-------------|
| **Witness** | Monitors polecat health, handles nudging and cleanup | Persistent |
| **Refinery** | Processes merge queue, runs verification | Persistent |
| **Polecats** | Ephemeral workers assigned to specific issues | Ephemeral |

## Directory Structure

```
~/gt/                        Town root
├── .beads/                  Town-level beads (hq-* prefix)
│   ├── config.yaml          Beads configuration
│   ├── issues.jsonl         Town issues (mail, agents, convoys)
│   └── routes.jsonl         Prefix → rig routing table
├── mayor/                   Mayor config
│   └── town.json            Town configuration
└── <rig>/                   Project container (NOT a git clone)
    ├── config.json          Rig identity and beads prefix
    ├── mayor/rig/           Canonical clone (beads live here)
    │   └── .beads/          Rig-level beads database
    ├── refinery/rig/        Worktree from mayor/rig
    ├── witness/             No clone (monitors only)
    ├── crew/<name>/         Human workspaces (full clones)
    └── polecats/<name>/     Worker worktrees from mayor/rig
```

### Worktree Architecture

Polecats and the refinery are git worktrees, not full clones. This enables fast
spawning and shared object storage. The worktree base is `mayor/rig`:

```bash
# From polecat/manager.go - worktrees are based on mayor/rig
git worktree add -b polecat/<name>-<timestamp> polecats/<name>
```

Crew workspaces (`crew/<name>/`) are full git clones for human developers who need
independent repos. Polecats are ephemeral and benefit from worktree efficiency.

## Beads Routing

The `routes.jsonl` file maps issue ID prefixes to rig locations (relative to the town root):

```jsonl
{"prefix":"hq-","path":"."}
{"prefix":"gt-","path":"gastown/mayor/rig"}
{"prefix":"bd-","path":"beads/mayor/rig"}
```

Routes point to `mayor/rig` because that is where the canonical `.beads/` lives.
This enables transparent cross-rig beads operations:

```bash
bd show hq-mayor   # Routes to town beads (~/gt/.beads)
bd show gt-xyz     # Routes to gastown/mayor/rig/.beads
```
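
The lookup described above amounts to a prefix match over the routing table. A minimal sketch of that resolution, assuming hypothetical names (`route`, `resolveBeadsPath`) rather than the actual bd implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// route mirrors one line of routes.jsonl.
type route struct {
	Prefix string
	Path   string
}

// resolveBeadsPath returns the rig path for an issue ID by longest-prefix
// match against the routing table, or "" if no route matches.
func resolveBeadsPath(routes []route, issueID string) string {
	best, path := "", ""
	for _, r := range routes {
		if strings.HasPrefix(issueID, r.Prefix) && len(r.Prefix) > len(best) {
			best, path = r.Prefix, r.Path
		}
	}
	return path
}

func main() {
	routes := []route{
		{"hq-", "."},
		{"gt-", "gastown/mayor/rig"},
		{"bd-", "beads/mayor/rig"},
	}
	fmt.Println(resolveBeadsPath(routes, "gt-xyz"))   // gastown/mayor/rig
	fmt.Println(resolveBeadsPath(routes, "hq-mayor")) // .
}
```

Longest-prefix matching keeps the scheme safe even if one prefix is a prefix of another.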

## See Also

- [reference.md](../reference.md) - Command reference
- [molecules.md](../concepts/molecules.md) - Workflow molecules
- [identity.md](../concepts/identity.md) - Agent identity and BD_ACTOR

312
docs/design/escalation.md
Normal file
@@ -0,0 +1,312 @@
# Gas Town Escalation Protocol

> Reference for escalation paths in Gas Town

## Overview

Gas Town agents can escalate issues when automated resolution isn't possible.
This document covers:

- Severity levels and routing
- Escalation categories for structured communication
- Tiered escalation (Deacon -> Mayor -> Overseer)
- Decision patterns for async resolution
- Integration with gates and patrol lifecycles

## Severity Levels

| Level | Priority | Description | Examples |
|-------|----------|-------------|----------|
| **CRITICAL** | P0 (urgent) | System-threatening, immediate attention | Data corruption, security breach, system down |
| **HIGH** | P1 (high) | Important blocker, needs human soon | Unresolvable merge conflict, critical bug, ambiguous spec |
| **MEDIUM** | P2 (normal) | Standard escalation, human at convenience | Design decision needed, unclear requirements |
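
The severity-to-priority mapping above can be expressed as a small helper. A sketch only; `priorityFor` is a hypothetical name, not part of the gt codebase:

```go
package main

import "fmt"

// priorityFor maps an escalation severity to its priority per the table
// above. Unknown severities fall back to the MEDIUM default, P2.
func priorityFor(severity string) string {
	switch severity {
	case "CRITICAL":
		return "P0"
	case "HIGH":
		return "P1"
	default:
		return "P2"
	}
}

func main() {
	fmt.Println(priorityFor("CRITICAL")) // P0
	fmt.Println(priorityFor("HIGH"))     // P1
	fmt.Println(priorityFor("MEDIUM"))   // P2
}
```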

## Escalation Categories

Categories provide structured routing based on the nature of the escalation:

| Category | Description | Default Route |
|----------|-------------|---------------|
| `decision` | Multiple valid paths, need choice | Deacon -> Mayor |
| `help` | Need guidance or expertise | Deacon -> Mayor |
| `blocked` | Waiting on unresolvable dependency | Mayor |
| `failed` | Unexpected error, can't proceed | Deacon |
| `emergency` | Security or data integrity issue | Overseer (direct) |
| `gate_timeout` | Gate didn't resolve in time | Deacon |
| `lifecycle` | Worker stuck or needs recycle | Witness |
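
The default first-hop routing in the table above is effectively a lookup. A sketch, assuming a hypothetical `defaultRoute` helper (the real routing lives inside `gt escalate`):

```go
package main

import "fmt"

// defaultRoute returns the first-hop tier for an escalation category,
// following the table above. Unknown categories go to the Deacon.
func defaultRoute(category string) string {
	routes := map[string]string{
		"decision":     "deacon",
		"help":         "deacon",
		"blocked":      "mayor",
		"failed":       "deacon",
		"emergency":    "overseer",
		"gate_timeout": "deacon",
		"lifecycle":    "witness",
	}
	if tier, ok := routes[category]; ok {
		return tier
	}
	return "deacon"
}

func main() {
	fmt.Println(defaultRoute("emergency")) // overseer
	fmt.Println(defaultRoute("lifecycle")) // witness
}
```

Note that `decision` and `help` start at the Deacon; the Mayor is reached by forwarding, as described below.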

## Escalation Command

### Basic Usage

```bash
# Basic escalation (default: MEDIUM severity)
gt escalate "Database migration failed"

# Critical escalation - immediate attention
gt escalate -s CRITICAL "Data corruption detected in user table"

# High priority escalation
gt escalate -s HIGH "Merge conflict cannot be resolved automatically"

# With additional details
gt escalate -s MEDIUM "Need clarification on API design" -m "Details..."
```

### Category-Based Escalation

```bash
# Decision needed - routes to Deacon first
gt escalate --type decision "Which auth approach?"

# Help request
gt escalate --type help "Need architecture guidance"

# Blocked on dependency
gt escalate --type blocked "Waiting on bd-xyz"

# Failure that can't be recovered
gt escalate --type failed "Tests failing unexpectedly"

# Emergency - direct to Overseer
gt escalate --type emergency "Security vulnerability found"
```

### Tiered Routing

```bash
# Explicit routing to specific tier
gt escalate --to deacon "Infra issue"
gt escalate --to mayor "Cross-rig coordination needed"
gt escalate --to overseer "Human judgment required"

# Forward from one tier to the next
gt escalate --forward --to mayor "Deacon couldn't resolve"
```

### Structured Decisions

For decisions requiring explicit choices:

```bash
gt escalate --type decision \
  --question "Which authentication approach?" \
  --options "JWT tokens,Session cookies,OAuth2" \
  --context "Admin panel needs login" \
  --issue bd-xyz
```

This updates the issue with a structured decision format (see below).

## What Happens on Escalation

1. **Bead created/updated**: Escalation bead (tagged `escalation`) is created or updated
2. **Mail sent**: Routed to the appropriate tier (Deacon, Mayor, or Overseer)
3. **Activity logged**: Event logged to the activity feed
4. **Issue updated**: For the decision type, the issue gets the structured format

## Tiered Escalation Flow

```
Worker encounters issue
        |
        v
gt escalate --type <category> [--to <tier>]
        |
        v
[Deacon receives]  (default for most categories)
        |
        +-- Can resolve? -----> Updates issue, re-slings work
        |
        +-- Cannot resolve? --> gt escalate --forward --to mayor
                |
                v
        [Mayor receives]
                |
                +-- Can resolve? -----> Updates issue, re-slings
                |
                +-- Cannot resolve? --> gt escalate --forward --to overseer
                        |
                        v
                [Overseer resolves]
```

Each tier can resolve OR forward. The escalation chain is tracked via comments.

## Decision Pattern

When `--type decision` is used, the issue is updated with a structured format:

```markdown
## Decision Needed

**Question:** Which authentication approach?

| Option | Description |
|--------|-------------|
| A | JWT tokens |
| B | Session cookies |
| C | OAuth2 |

**Context:** Admin panel needs login

**Escalated by:** beads/polecats/obsidian
**Escalated at:** 2026-01-01T15:00:00Z

**To resolve:**
1. Comment with chosen option (e.g., "Decision: A")
2. Reassign to work queue or original worker
```

The issue becomes the async communication channel. Resolution updates the issue
and can trigger re-slinging to the original worker.

## Integration Points

### Gate Timeouts

When timer gates expire (see bd-7zka.2), the Witness escalates:

```go
if gate.Expired() {
    exec.Command("gt", "escalate",
        "--type", "gate_timeout",
        "--severity", "HIGH",
        "--issue", gate.BlockedIssueID,
        fmt.Sprintf("Gate %s timed out after %s", gate.ID, gate.Timeout)).Run()
}
```

### Witness Patrol

The Witness formalizes stuck-polecat detection as an escalation:

```go
exec.Command("gt", "escalate",
    "--type", "lifecycle",
    "--to", "mayor",
    "--issue", polecat.CurrentIssue,
    fmt.Sprintf("Polecat %s stuck: no progress for %d minutes", polecat.ID, minutes)).Run()
```

### Refinery

On merge failures that can't be auto-resolved:

```go
exec.Command("gt", "escalate",
    "--type", "failed",
    "--issue", mr.IssueID,
    "Merge failed: "+reason).Run()
```

## Polecat Exit with Escalation

When a polecat needs a decision to continue:

```bash
# 1. Update issue with decision structure
bd update $ISSUE --notes "$(cat <<EOF
## Decision Needed

**Question:** Which approach for caching?

| Option | Description |
|--------|-------------|
| A | Redis (external dependency) |
| B | In-memory (simpler, no persistence) |
| C | SQLite (local persistence) |

**Context:** API response times are slow, need caching layer.
EOF
)"

# 2. Escalate
gt escalate --type decision --issue $ISSUE "Caching approach needs decision"

# 3. Exit cleanly
gt done --status ESCALATED
```

## Mayor Startup Check

On `gt prime`, the Mayor checks for pending escalations:

```
## PENDING ESCALATIONS

There are 3 escalation(s) awaiting attention:

  CRITICAL: 1
  HIGH:     1
  MEDIUM:   1

[CRITICAL] Data corruption detected (gt-abc)
[HIGH]     Merge conflict in auth module (gt-def)
[MEDIUM]   API design clarification needed (gt-ghi)

**Action required:** Review escalations with `bd list --tag=escalation`
Close resolved ones with `bd close <id> --reason "resolution"`
```

## When to Escalate

### Agents SHOULD escalate when:

- **System errors**: Database corruption, disk full, network failures
- **Security issues**: Unauthorized access attempts, credential exposure
- **Unresolvable conflicts**: Merge conflicts that can't be auto-resolved
- **Ambiguous requirements**: Spec is unclear, multiple valid interpretations
- **Design decisions**: Architectural choices that need human judgment
- **Stuck loops**: Agent is stuck and can't make progress
- **Gate timeouts**: Async conditions didn't resolve in the expected time

### Agents should NOT escalate for:

- **Normal workflow**: Regular work that can proceed without human input
- **Recoverable errors**: Transient failures that will auto-retry
- **Information queries**: Questions that can be answered from context

## Viewing Escalations

```bash
# List all open escalations
bd list --status=open --tag=escalation

# Filter by category
bd list --tag=escalation --tag=decision

# View specific escalation
bd show <escalation-id>

# Close resolved escalation
bd close <id> --reason "Resolved by fixing X"
```

## Implementation Phases

### Phase 1: Extend gt escalate

- Add `--type` flag for categories
- Add `--to` flag for routing (deacon, mayor, overseer)
- Add `--forward` flag for tier forwarding
- Backward compatible with existing usage

### Phase 2: Decision Pattern

- Add `--question`, `--options`, `--context` flags
- Auto-update issue with decision structure
- Parse decision from issue comments on resolution
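
Phase 2 calls for parsing the chosen option out of resolution comments. A minimal sketch of that parser, keyed to the "Decision: A" comment convention in the decision template (`parseDecision` is a hypothetical name):

```go
package main

import (
	"fmt"
	"regexp"
)

// decisionRe matches a resolution comment line such as "Decision: A".
var decisionRe = regexp.MustCompile(`(?m)^Decision:\s*([A-Z])\b`)

// parseDecision extracts the chosen option letter from a comment,
// returning "" when the comment contains no decision line.
func parseDecision(comment string) string {
	m := decisionRe.FindStringSubmatch(comment)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	fmt.Println(parseDecision("Looks good.\nDecision: B")) // B
	fmt.Println(parseDecision("still thinking") == "")     // true
}
```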

### Phase 3: Gate Integration

- Add `gate_timeout` escalation type
- Witness checks timer gates, escalates on timeout
- Refinery checks GH gates, escalates on timeout/failure

### Phase 4: Patrol Integration

- Formalize Witness stuck-polecat handling as escalation
- Formalize Refinery merge-failure handling as escalation
- Unified escalation handling in Mayor

## References

- bd-7zka.2: Gate evaluation (uses escalation for timeouts)
- bd-0sgd: Design issue for this extended escalation system

248
docs/design/federation.md
Normal file
@@ -0,0 +1,248 @@
# Federation Architecture

> **Status: Design spec - not yet implemented**

> Multi-workspace coordination for Gas Town and Beads

## Overview

Federation enables multiple Gas Town instances to reference each other's work,
coordinate across organizations, and track distributed projects.

## Why Federation?

Real enterprise projects don't live in a single repo:

- **Microservices:** 50 repos, tight dependencies, coordinated releases
- **Platform teams:** Shared libraries used by dozens of downstream projects
- **Contractors:** External teams working on components you need to track
- **Acquisitions:** New codebases that need to integrate with existing work

Traditional tools force you to choose: unified tracking (monorepo) or team
autonomy (multi-repo with fragmented visibility). Federation provides both:
each workspace is autonomous, but cross-workspace references are first-class.

## Entity Model

### Three Levels

```
Level 1: Entity    - Person or organization (flat namespace)
Level 2: Chain     - Workspace/town per entity
Level 3: Work Unit - Issues, tasks, molecules on chains
```

### URI Scheme

Full work unit reference (HOP protocol):

```
hop://entity/chain/rig/issue-id
hop://steve@example.com/main-town/greenplace/gp-xyz
```

Cross-repo reference (same platform):

```
beads://platform/org/repo/issue-id
beads://github/acme/backend/ac-123
```

Within a workspace, short forms are preferred:

```
gp-xyz               # Local (prefix routes via routes.jsonl)
greenplace/gp-xyz    # Different rig, same chain
./gp-xyz             # Explicit current-rig ref
```
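
A full `hop://` reference splits cleanly into its four components. A sketch of that parsing, assuming hypothetical names (`hopRef`, `parseHop`); the real parser would live in the HOP implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// hopRef holds the components of a full work unit reference:
// hop://entity/chain/rig/issue-id
type hopRef struct {
	Entity, Chain, Rig, IssueID string
}

// parseHop splits a hop:// URI into its four components.
func parseHop(uri string) (hopRef, error) {
	rest, ok := strings.CutPrefix(uri, "hop://")
	if !ok {
		return hopRef{}, fmt.Errorf("not a hop URI: %s", uri)
	}
	parts := strings.SplitN(rest, "/", 4)
	if len(parts) != 4 {
		return hopRef{}, fmt.Errorf("expected entity/chain/rig/issue-id: %s", uri)
	}
	return hopRef{parts[0], parts[1], parts[2], parts[3]}, nil
}

func main() {
	ref, err := parseHop("hop://steve@example.com/main-town/greenplace/gp-xyz")
	if err != nil {
		panic(err)
	}
	fmt.Println(ref.Entity, ref.Rig, ref.IssueID) // steve@example.com greenplace gp-xyz
}
```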

See `~/gt/docs/hop/GRAPH-ARCHITECTURE.md` for the full URI specification.

## Relationship Types

### Employment

Track which entities belong to organizations:

```json
{
  "type": "employment",
  "entity": "alice@example.com",
  "organization": "acme.com"
}
```

### Cross-Reference

Reference work in another workspace:

```json
{
  "references": [
    {
      "type": "depends_on",
      "target": "hop://other-entity/chain/rig/issue-id"
    }
  ]
}
```

### Delegation

Distribute work across workspaces:

```json
{
  "type": "delegation",
  "parent": "hop://acme.com/projects/proj-123",
  "child": "hop://alice@example.com/town/greenplace/gp-xyz",
  "terms": { "portion": "backend", "deadline": "2025-02-01" }
}
```

## Agent Provenance

Every agent operation is attributed. See [identity.md](../concepts/identity.md) for the
complete BD_ACTOR format convention.

### Git Commits

```bash
# Set per agent session
GIT_AUTHOR_NAME="greenplace/crew/joe"
GIT_AUTHOR_EMAIL="steve@example.com"   # Workspace owner
```

Result: `abc123 Fix bug (greenplace/crew/joe <steve@example.com>)`

### Beads Operations

```bash
BD_ACTOR="greenplace/crew/joe"   # Set in agent environment
bd create --title="Task"         # Actor auto-populated
```

### Event Logging

All events include the actor:

```json
{
  "ts": "2025-01-15T10:30:00Z",
  "type": "sling",
  "actor": "greenplace/crew/joe",
  "payload": { "bead": "gp-xyz", "target": "greenplace/polecats/Toast" }
}
```

## Discovery

### Workspace Metadata

Each workspace has identity metadata:

```json
// ~/gt/.town.json
{
  "owner": "steve@example.com",
  "name": "main-town",
  "public_name": "steve-greenplace"
}
```

### Remote Registration

```bash
gt remote add acme hop://acme.com/engineering
gt remote list
```

### Cross-Workspace Queries

```bash
bd show hop://acme.com/eng/ac-123   # Fetch remote issue
bd list --remote=acme               # List remote issues
```

## Aggregation

Query across relationships without hierarchy:

```bash
# All work by org members
bd list --org=acme.com

# All work on a project (including delegated)
bd list --project=proj-123 --include-delegated

# Agent's full history
bd audit --actor=greenplace/crew/joe
```

## Implementation Status

- [x] Agent identity in git commits
- [x] BD_ACTOR default in beads create
- [x] Workspace metadata file (.town.json)
- [x] Cross-workspace URI scheme (hop://, beads://, local forms)
- [ ] Remote registration
- [ ] Cross-workspace queries
- [ ] Delegation primitives

## Use Cases

### Multi-Repo Projects

Track work spanning multiple repositories:

```
Project X
├── hop://team/frontend/fe-123
├── hop://team/backend/be-456
└── hop://team/infra/inf-789
```

### Distributed Teams

Team members in different workspaces:

```
Alice's Town → works on → Project X
Bob's Town   → works on → Project X
```

Each maintains their own CV/audit trail.

### Contractor Coordination

Prime contractor delegates to subcontractors:

```
Acme/Project
└── delegates to → Vendor/SubProject
    └── delegates to → Contractor/Task
```

Completion cascades up. Attribution preserved.

## Design Principles

1. **Flat namespace** - Entities not nested, relationships connect them
2. **Relationships over hierarchy** - Graph structure, not tree
3. **Git-native** - Federation uses git mechanics (remotes, refs)
4. **Incremental** - Works standalone, gains power with federation
5. **Privacy-preserving** - Each entity controls their chain visibility

## Enterprise Benefits

| Challenge | Without Federation | With Federation |
|-----------|-------------------|-----------------|
| Cross-repo dependencies | "Check with backend team" | Explicit dependency tracking |
| Contractor visibility | Email updates, status calls | Live status, same tooling |
| Release coordination | Spreadsheets, Slack threads | Unified timeline view |
| Agent attribution | Per-repo, fragmented | Cross-workspace CV |
| Compliance audit | Stitch together logs | Query across workspaces |

Federation isn't just about connecting repos - it's about treating distributed
engineering as a first-class concern, with the same visibility and tooling
you'd expect from a monorepo, while preserving team autonomy.

361
docs/design/mail-protocol.md
Normal file
@@ -0,0 +1,361 @@
# Gas Town Mail Protocol

> Reference for inter-agent mail communication in Gas Town

## Overview

Gas Town agents coordinate via mail messages routed through the beads system.
Mail uses `type=message` beads with routing handled by `gt mail`.

## Message Types

### POLECAT_DONE

**Route**: Polecat → Witness

**Purpose**: Signal work completion, trigger cleanup flow.

**Subject format**: `POLECAT_DONE <polecat-name>`

**Body format**:
```
Exit: MERGED|ESCALATED|DEFERRED
Issue: <issue-id>
MR: <mr-id>        # if exit=MERGED
Branch: <branch>
```

**Trigger**: The `gt done` command generates this automatically.

**Handler**: Witness creates a cleanup wisp for the polecat.

### MERGE_READY

**Route**: Witness → Refinery

**Purpose**: Signal that a branch is ready for merge queue processing.

**Subject format**: `MERGE_READY <polecat-name>`

**Body format**:
```
Branch: <branch>
Issue: <issue-id>
Polecat: <polecat-name>
Verified: clean git state, issue closed
```

**Trigger**: Witness sends after verifying polecat work is complete.

**Handler**: Refinery adds the branch to the merge queue and processes it when ready.

### MERGED

**Route**: Refinery → Witness

**Purpose**: Confirm the branch merged successfully; safe to nuke the polecat.

**Subject format**: `MERGED <polecat-name>`

**Body format**:
```
Branch: <branch>
Issue: <issue-id>
Polecat: <polecat-name>
Rig: <rig>
Target: <target-branch>
Merged-At: <timestamp>
Merge-Commit: <sha>
```

**Trigger**: Refinery sends after a successful merge to main.

**Handler**: Witness completes the cleanup wisp and nukes the polecat worktree.

### MERGE_FAILED

**Route**: Refinery → Witness

**Purpose**: Notify that a merge attempt failed (tests, build, or other non-conflict error).

**Subject format**: `MERGE_FAILED <polecat-name>`

**Body format**:
```
Branch: <branch>
Issue: <issue-id>
Polecat: <polecat-name>
Rig: <rig>
Target: <target-branch>
Failed-At: <timestamp>
Failure-Type: <tests|build|push|other>
Error: <error-message>
```

**Trigger**: Refinery sends when a merge fails for non-conflict reasons.

**Handler**: Witness notifies the polecat and assigns the work back for rework.

### REWORK_REQUEST

**Route**: Refinery → Witness

**Purpose**: Request that a polecat rebase its branch due to merge conflicts.

**Subject format**: `REWORK_REQUEST <polecat-name>`

**Body format**:
```
Branch: <branch>
Issue: <issue-id>
Polecat: <polecat-name>
Rig: <rig>
Target: <target-branch>
Requested-At: <timestamp>
Conflict-Files: <file1>, <file2>, ...

Please rebase your changes onto <target-branch>:

    git fetch origin
    git rebase origin/<target-branch>
    # Resolve any conflicts
    git push -f

The Refinery will retry the merge after the rebase is complete.
```

**Trigger**: Refinery sends when the merge conflicts with the target branch.

**Handler**: Witness notifies the polecat with rebase instructions.

### WITNESS_PING

**Route**: Witness → Deacon (all witnesses send)

**Purpose**: Second-order monitoring - ensure the Deacon is alive.

**Subject format**: `WITNESS_PING <rig>`

**Body format**:
```
Rig: <rig>
Timestamp: <timestamp>
Patrol: <cycle-number>
```

**Trigger**: Each witness sends periodically (every N patrol cycles).

**Handler**: Deacon acknowledges. If there is no ack, witnesses escalate to the Mayor.

### HELP

**Route**: Any → escalation target (usually Mayor)

**Purpose**: Request intervention for stuck/blocked work.

**Subject format**: `HELP: <brief-description>`

**Body format**:
```
Agent: <agent-id>
Issue: <issue-id>        # if applicable
Problem: <description>
Tried: <what was attempted>
```

**Trigger**: Agent unable to proceed, needs external help.

**Handler**: Escalation target assesses and intervenes.

### HANDOFF

**Route**: Agent → self (or successor)

**Purpose**: Session continuity across context limits/restarts.

**Subject format**: `🤝 HANDOFF: <brief-context>`

**Body format**:
```
attached_molecule: <molecule-id>   # if work in progress
attached_at: <timestamp>

## Context
<freeform notes for successor>

## Status
<where things stand>

## Next
<what successor should do>
```

**Trigger**: The `gt handoff` command, or a manual send before session end.

**Handler**: The next session reads the handoff and continues from its context.

## Format Conventions

### Subject Line

- **Type prefix**: Uppercase, identifies message type
- **Colon separator**: After the type for structured info
- **Brief context**: Human-readable summary

Examples:
```
POLECAT_DONE nux
MERGE_READY greenplace/nux
HELP: Polecat stuck on test failures
🤝 HANDOFF: Schema work in progress
```

### Body Structure

- **Key-value pairs**: For structured data (one per line)
- **Blank line**: Separates structured data from freeform content
- **Markdown sections**: For freeform content (##, lists, code blocks)
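
The body convention above (key-value lines, then a blank line, then freeform markdown) can be parsed in a few lines. A sketch only; `parseBody` is a hypothetical helper, and gt's real parser may differ:

```go
package main

import (
	"fmt"
	"strings"
)

// parseBody splits a mail body into its structured key-value header
// section and the freeform remainder, using the blank-line convention.
func parseBody(body string) (map[string]string, string) {
	headers := map[string]string{}
	lines := strings.Split(body, "\n")
	for i, line := range lines {
		if strings.TrimSpace(line) == "" {
			// Blank line ends the structured section.
			return headers, strings.Join(lines[i+1:], "\n")
		}
		if k, v, ok := strings.Cut(line, ":"); ok {
			headers[strings.TrimSpace(k)] = strings.TrimSpace(v)
		}
	}
	return headers, ""
}

func main() {
	body := "Branch: feature-xyz\nIssue: gp-abc\n\n## Context\nNotes here"
	h, free := parseBody(body)
	fmt.Println(h["Branch"], h["Issue"]) // feature-xyz gp-abc
	fmt.Println(free)
}
```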
|
||||
### Addresses
|
||||
|
||||
Format: `<rig>/<role>` or `<rig>/<type>/<name>`
|
||||
|
||||
Examples:
|
||||
```
|
||||
greenplace/witness # Witness for greenplace rig
|
||||
beads/refinery # Refinery for beads rig
|
||||
greenplace/polecats/nux # Specific polecat
|
||||
mayor/ # Town-level Mayor
|
||||
deacon/ # Town-level Deacon
|
||||
```
|
||||
|
||||
## Protocol Flows
|
||||
|
||||
### Polecat Completion Flow
|
||||
|
||||
```
|
||||
Polecat Witness Refinery
|
||||
│ │ │
|
||||
│ POLECAT_DONE │ │
|
||||
│─────────────────────────>│ │
|
||||
│ │ │
|
||||
│ (verify clean) │
|
||||
│ │ │
|
||||
│ │ MERGE_READY │
|
||||
│ │─────────────────────────>│
|
||||
│ │ │
|
||||
│ │ (merge attempt)
|
||||
│ │ │
|
||||
│ │ MERGED (success) │
|
||||
│ │<─────────────────────────│
|
||||
│ │ │
|
||||
│ (nuke polecat) │
|
||||
│ │ │
|
||||
```
|
||||
|
||||
### Merge Failure Flow
|
||||
|
||||
```
|
||||
Witness Refinery
|
||||
│ │
|
||||
│ (merge fails)
|
||||
│ │
|
||||
│ MERGE_FAILED │
|
||||
┌──────────────────────────│<─────────────────────────│
|
||||
│ │ │
|
||||
│ (failure notification) │ │
|
||||
│<─────────────────────────│ │
|
||||
│ │ │
|
||||
Polecat (rework needed)
|
||||
```
|
||||
|
||||
### Rebase Required Flow

```
                            Witness                     Refinery
                               │                           │
                               │       (conflict detected) │
                               │                           │
                               │ REWORK_REQUEST            │
   ┌───────────────────────────│<──────────────────────────│
   │                           │                           │
   │ (rebase instructions)     │                           │
   │<──────────────────────────│                           │
   │                           │                           │
Polecat                        │                           │
   │                           │                           │
   │ (rebases, gt done)        │                           │
   │──────────────────────────>│ MERGE_READY               │
   │                           │──────────────────────────>│
   │                           │             (retry merge) │
```

### Second-Order Monitoring

```
Witness-1 ──┐
            │ WITNESS_PING
Witness-2 ──┼────────────────> Deacon
            │                    │
Witness-N ──┘                    │
                         (if no response)
                                 │
            <────────────────────┘
            Escalate to Mayor
```

## Implementation

### Sending Mail

```bash
# Basic send
gt mail send <addr> -s "Subject" -m "Body"

# With structured body
gt mail send greenplace/witness -s "MERGE_READY nux" -m "Branch: feature-xyz
Issue: gp-abc
Polecat: nux
Verified: clean"
```

### Receiving Mail

```bash
# Check inbox
gt mail inbox

# Read specific message
gt mail read <msg-id>

# Mark as read
gt mail ack <msg-id>
```

### In Patrol Formulas

Formulas should:

1. Check inbox at start of each cycle
2. Parse subject prefix to route handling
3. Extract structured data from body
4. Take appropriate action
5. Mark mail as read after processing

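Step 2 above, routing by subject prefix, might look like the following sketch. The handler names are illustrative stand-ins for whatever a given patrol formula actually does:

```go
package main

import (
	"fmt"
	"strings"
)

// routeSubject maps a mail subject to a handler name by its prefix.
// Handler names here are hypothetical examples.
func routeSubject(subject string) string {
	switch {
	case strings.HasPrefix(subject, "MERGE_READY"):
		return "handleMergeReady"
	case strings.HasPrefix(subject, "MERGE_FAILED"):
		return "handleMergeFailed"
	case strings.HasPrefix(subject, "POLECAT_DONE"):
		return "handlePolecatDone"
	case strings.HasPrefix(subject, "HELP:"):
		return "handleHelp"
	default:
		return "handleUnknown" // log and ack; don't crash the patrol
	}
}

func main() {
	for _, s := range []string{"MERGE_READY greenplace/nux", "HELP: Polecat stuck"} {
		fmt.Println(s, "->", routeSubject(s))
	}
}
```

Unknown prefixes fall through to a catch-all so a malformed message can't stall the cycle.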
## Extensibility

New message types follow the pattern:

1. Define subject prefix (TYPE: or TYPE_SUBTYPE)
2. Document body format (key-value pairs + freeform)
3. Specify route (sender → receiver)
4. Implement handlers in relevant patrol formulas

The protocol is intentionally simple: structured enough for parsing,
flexible enough for human debugging.

## Related Documents

- `docs/agent-as-bead.md` - Agent identity and slots
- `.beads/formulas/mol-witness-patrol.formula.toml` - Witness handling
- `internal/mail/` - Mail routing implementation
- `internal/protocol/` - Protocol handlers for Witness-Refinery communication
136
docs/design/operational-state.md
Normal file
@@ -0,0 +1,136 @@

# Operational State in Gas Town

> Managing runtime state through events and labels.

## Overview

Gas Town tracks operational state changes as structured data. This document covers:

- **Events**: State transitions as beads (immutable audit trail)
- **Labels-as-state**: Fast queries via role bead labels (current state cache)

For Boot triage and degraded mode details, see [Watchdog Chain](watchdog-chain.md).

## Events: State Transitions as Data

Operational state changes are recorded as event beads. Each event captures:

- **What** changed (`event_type`)
- **Who** caused it (`actor`)
- **What** was affected (`target`)
- **Context** (`payload`)
- **When** (`created_at`)

### Event Types

| Event Type | Description | Payload |
|------------|-------------|---------|
| `patrol.muted` | Patrol cycle disabled | `{reason, until?}` |
| `patrol.unmuted` | Patrol cycle re-enabled | `{reason?}` |
| `agent.started` | Agent session began | `{session_id?}` |
| `agent.stopped` | Agent session ended | `{reason, outcome?}` |
| `mode.degraded` | System entered degraded mode | `{reason}` |
| `mode.normal` | System returned to normal | `{}` |

### Creating Events

```bash
# Mute deacon patrol
bd create --type=event --event-type=patrol.muted \
  --actor=human:overseer --target=agent:deacon \
  --payload='{"reason":"fixing convoy deadlock","until":"gt-abc1"}'

# System entered degraded mode
bd create --type=event --event-type=mode.degraded \
  --actor=system:daemon --target=rig:greenplace \
  --payload='{"reason":"tmux unavailable"}'
```

### Querying Events

```bash
# Recent events for an agent
bd list --type=event --target=agent:deacon --limit=10

# All patrol state changes
bd list --type=event --event-type=patrol.muted
bd list --type=event --event-type=patrol.unmuted

# Events in the activity feed
bd activity --follow --type=event
```

## Labels-as-State Pattern

Events capture the full history. Labels cache the current state for fast queries.

### Convention

Labels use `<dimension>:<value>` format:

- `patrol:muted` / `patrol:active`
- `mode:degraded` / `mode:normal`
- `status:idle` / `status:working`

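Because each dimension should hold exactly one value at a time, updating a dimension means dropping any existing `dim:*` label before adding the new one. A small sketch of that invariant (pure illustration; the real system does this through `bd update`):

```go
package main

import (
	"fmt"
	"strings"
)

// setDimension replaces the value of one state dimension in a label
// set, preserving the <dimension>:<value> convention: at most one
// label per dimension.
func setDimension(labels []string, dim, value string) []string {
	out := labels[:0:0] // fresh slice, no aliasing of the input
	for _, l := range labels {
		if !strings.HasPrefix(l, dim+":") {
			out = append(out, l) // keep other dimensions untouched
		}
	}
	return append(out, dim+":"+value)
}

func main() {
	labels := []string{"patrol:active", "mode:normal"}
	labels = setDimension(labels, "patrol", "muted")
	fmt.Println(labels)
}
```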
### State Change Flow

1. Create event bead (full context, immutable)
2. Update role bead labels (current state cache)

```bash
# Mute patrol
bd create --type=event --event-type=patrol.muted ...
bd update role-deacon --add-label=patrol:muted --remove-label=patrol:active

# Unmute patrol
bd create --type=event --event-type=patrol.unmuted ...
bd update role-deacon --add-label=patrol:active --remove-label=patrol:muted
```

### Querying Current State

```bash
# Is deacon patrol muted?
bd show role-deacon | grep patrol:

# All agents with muted patrol
bd list --type=role --label=patrol:muted

# All agents in degraded mode
bd list --type=role --label=mode:degraded
```

## Configuration vs State

| Type | Storage | Example |
|------|---------|---------|
| **Static config** | TOML files | Daemon tick interval |
| **Operational state** | Beads (events + labels) | Patrol muted |
| **Runtime flags** | Marker files | `.deacon-disabled` |

Static config rarely changes and doesn't need history.
Operational state changes at runtime and benefits from an audit trail.
Marker files are fast checks that can trigger deeper beads queries.

## Commands Summary

```bash
# Create operational event
bd create --type=event --event-type=<type> \
  --actor=<entity> --target=<entity> --payload='<json>'

# Update state label
bd update <role-bead> --add-label=<dim>:<val> --remove-label=<dim>:<old>

# Query current state
bd list --type=role --label=<dim>:<val>

# Query state history
bd list --type=event --target=<entity>

# Boot management
gt dog status boot
gt dog call boot
gt dog prime boot
```

---

*Events are the source of truth. Labels are the cache.*
300
docs/design/property-layers.md
Normal file
@@ -0,0 +1,300 @@

# Property Layers: Multi-Level Configuration

> Implementation guide for Gas Town's configuration system.
> Created: 2025-01-06

## Overview

Gas Town uses a layered property system for configuration. Properties are
looked up through multiple layers, with earlier layers overriding later ones.
This enables both local control and global coordination.

## The Four Layers

```
┌─────────────────────────────────────────────────────────────┐
│ 1. WISP LAYER (transient, town-local)                       │
│    Location: <rig>/.beads-wisp/config/                      │
│    Synced: Never                                            │
│    Use: Temporary local overrides                           │
└─────────────────────────────┬───────────────────────────────┘
                              │ if missing
                              ▼
┌─────────────────────────────────────────────────────────────┐
│ 2. RIG BEAD LAYER (persistent, synced globally)             │
│    Location: <rig>/.beads/ (rig identity bead labels)       │
│    Synced: Via git (all clones see it)                      │
│    Use: Project-wide operational state                      │
└─────────────────────────────┬───────────────────────────────┘
                              │ if missing
                              ▼
┌─────────────────────────────────────────────────────────────┐
│ 3. TOWN DEFAULTS                                            │
│    Location: ~/gt/config.json or ~/gt/.beads/               │
│    Synced: N/A (per-town)                                   │
│    Use: Town-wide policies                                  │
└─────────────────────────────┬───────────────────────────────┘
                              │ if missing
                              ▼
┌─────────────────────────────────────────────────────────────┐
│ 4. SYSTEM DEFAULTS (compiled in)                            │
│    Use: Fallback when nothing else specified                │
└─────────────────────────────────────────────────────────────┘
```

## Lookup Behavior

### Override Semantics (Default)

For most properties, the first non-nil value wins:

```go
func GetConfig(key string) interface{} {
    if val := wisp.Get(key); val != nil {
        if val == Blocked { return nil }
        return val
    }
    if val := rigBead.GetLabel(key); val != nil {
        return val
    }
    if val := townDefaults.Get(key); val != nil {
        return val
    }
    return systemDefaults[key]
}
```

### Stacking Semantics (Integers)

For integer properties, values from wisp and bead layers **add** to the base:

```go
func GetIntConfig(key string) int {
    base := getBaseDefault(key)    // Town or system default
    beadAdj := rigBead.GetInt(key) // 0 if missing
    wispAdj := wisp.GetInt(key)    // 0 if missing
    return base + beadAdj + wispAdj
}
```

This enables temporary adjustments without changing the base value.

### Blocking Inheritance

You can explicitly block a property from being inherited:

```bash
gt rig config set gastown auto_restart --block
```

This creates a "blocked" marker in the wisp layer. Even if the rig bead
or defaults say `auto_restart: true`, the lookup returns nil.

## Rig Identity Beads

Each rig has an identity bead for operational state:

```yaml
id: gt-rig-gastown
type: rig
name: gastown
repo: git@github.com:steveyegge/gastown.git
prefix: gt

labels:
  - status:operational
  - priority:normal
```

These beads sync via git, so all clones of the rig see the same state.

## Two-Level Rig Control

### Level 1: Park (Local, Ephemeral)

```bash
gt rig park gastown     # Stop services, daemon won't restart
gt rig unpark gastown   # Allow services to run
```

- Stored in wisp layer (`.beads-wisp/config/`)
- Only affects this town
- Disappears on cleanup
- Use: Local maintenance, debugging

### Level 2: Dock (Global, Persistent)

```bash
gt rig dock gastown     # Set status:docked label on rig bead
gt rig undock gastown   # Remove label
```

- Stored on rig identity bead
- Syncs to all clones via git
- Permanent until explicitly changed
- Use: Project-wide maintenance, coordinated downtime

### Daemon Behavior

The daemon checks both levels before auto-restarting:

```go
func shouldAutoRestart(rig *Rig) bool {
    status := rig.GetConfig("status")
    if status == "parked" || status == "docked" {
        return false
    }
    return true
}
```

## Configuration Keys

| Key | Type | Behavior | Description |
|-----|------|----------|-------------|
| `status` | string | Override | operational/parked/docked |
| `auto_restart` | bool | Override | Daemon auto-restart behavior |
| `max_polecats` | int | Override | Maximum concurrent polecats |
| `priority_adjustment` | int | **Stack** | Scheduling priority modifier |
| `maintenance_window` | string | Override | When maintenance is allowed |
| `dnd` | bool | Override | Do-not-disturb mode |

## Commands

### View Configuration

```bash
gt rig config show gastown           # Show effective config (all layers)
gt rig config show gastown --layer   # Show which layer each value comes from
```

### Set Configuration

```bash
# Set in wisp layer (local, ephemeral)
gt rig config set gastown key value

# Set in bead layer (global, permanent)
gt rig config set gastown key value --global

# Block inheritance
gt rig config set gastown key --block

# Clear from wisp layer
gt rig config unset gastown key
```

### Rig Lifecycle

```bash
gt rig park gastown      # Local: stop + prevent restart
gt rig unpark gastown    # Local: allow restart

gt rig dock gastown      # Global: mark as offline
gt rig undock gastown    # Global: mark as operational

gt rig status gastown    # Show current state
```

## Examples

### Temporary Priority Boost

```bash
# Base priority: 0 (from defaults)
# Give this rig a temporary priority boost for urgent work

gt rig config set gastown priority_adjustment 10

# Effective priority: 0 + 10 = 10
# When done, clear it:

gt rig config unset gastown priority_adjustment
```

### Local Maintenance

```bash
# I'm upgrading the local clone, don't restart services
gt rig park gastown

# ... do maintenance ...

gt rig unpark gastown
```

### Project-Wide Maintenance

```bash
# Major refactor in progress, all clones should pause
gt rig dock gastown

# Syncs via git - other towns see the rig as docked
bd sync

# When done:
gt rig undock gastown
bd sync
```

### Block Auto-Restart Locally

```bash
# Rig bead says auto_restart: true
# But I'm debugging and don't want that here

gt rig config set gastown auto_restart --block

# Now auto_restart returns nil for this town only
```

## Implementation Notes

### Wisp Storage

Wisp config is stored in `.beads-wisp/config/<rig>.json`:

```json
{
  "rig": "gastown",
  "values": {
    "status": "parked",
    "priority_adjustment": 10
  },
  "blocked": ["auto_restart"]
}
```

### Rig Bead Labels

Rig operational state is stored as labels on the rig identity bead:

```bash
bd label add gt-rig-gastown status:docked
bd label remove gt-rig-gastown status:docked
```

### Daemon Integration

The daemon's lifecycle manager checks config before starting services:

```go
func (d *Daemon) maybeStartRigServices(rig string) {
    r := d.getRig(rig)

    status := r.GetConfig("status")
    if status == "parked" || status == "docked" {
        log.Info("Rig %s is offline, skipping auto-start", rig)
        return
    }

    d.ensureWitness(rig)
    d.ensureRefinery(rig)
}
```

## Related Documents

- `~/gt/docs/hop/PROPERTY-LAYERS.md` - Strategic architecture
- `wisp-architecture.md` - Wisp system design
- `agent-as-bead.md` - Agent identity beads (similar pattern)
306
docs/design/watchdog-chain.md
Normal file
@@ -0,0 +1,306 @@

# Daemon/Boot/Deacon Watchdog Chain

> Autonomous health monitoring and recovery in Gas Town.

## Overview

Gas Town uses a three-tier watchdog chain for autonomous health monitoring:

```
Daemon (Go process)            ← Dumb transport, 3-min heartbeat
  │
  └─► Boot (AI agent)          ← Intelligent triage, fresh each tick
        │
        └─► Deacon (AI agent)  ← Continuous patrol, long-running
              │
              └─► Witnesses & Refineries  ← Per-rig agents
```

**Key insight**: The daemon is mechanical (it can't reason), but health decisions need
intelligence (is the agent stuck, or just thinking?). Boot bridges this gap.

## Design Rationale: Why Two Agents?

### The Problem

The daemon needs to ensure the Deacon is healthy, but:

1. **Daemon can't reason** - It's Go code following the ZFC principle (don't reason
   about other agents). It can check "is the session alive?" but not "is the agent stuck?"

2. **Waking costs context** - Each time you spawn an AI agent, you consume context
   tokens. In idle towns, waking Deacon every 3 minutes wastes resources.

3. **Observation requires intelligence** - Distinguishing "agent composing a large
   artifact" from "agent hung on a tool prompt" requires reasoning.

### The Solution: Boot as Triage

Boot is a narrow, ephemeral AI agent that:

- Runs fresh each daemon tick (no accumulated context debt)
- Makes a single decision: should Deacon wake?
- Exits immediately after deciding

This gives us intelligent triage without the cost of keeping a full AI running.

### Why Not Merge Boot into Deacon?

We could have Deacon handle its own "should I be awake?" logic, but:

1. **Deacon can't observe itself** - A hung Deacon can't detect that it's hung
2. **Context accumulation** - Deacon runs continuously; Boot restarts fresh
3. **Cost in idle towns** - Boot only costs tokens when it runs; Deacon costs
   tokens constantly if kept alive

### Why Not Replace with Go Code?

The daemon could directly monitor agents without AI, but:

1. **Can't observe panes** - Go code can't interpret tmux output semantically
2. **Can't distinguish stuck vs working** - No reasoning about agent state
3. **Escalation is complex** - When to notify? When to force-restart? AI handles
   nuanced decisions better than hardcoded thresholds

## Session Ownership

| Agent | Session Name | Location | Lifecycle |
|-------|--------------|----------|-----------|
| Daemon | (Go process) | `~/gt/daemon/` | Persistent, auto-restart |
| Boot | `gt-boot` | `~/gt/deacon/dogs/boot/` | Ephemeral, fresh each tick |
| Deacon | `hq-deacon` | `~/gt/deacon/` | Long-running, handoff loop |

**Critical**: Boot runs in `gt-boot`, NOT `hq-deacon`. This prevents Boot
from conflicting with a running Deacon session.

## Heartbeat Mechanics

### Daemon Heartbeat (3 minutes)

The daemon runs a heartbeat tick every 3 minutes:

```go
func (d *Daemon) heartbeatTick() {
    d.ensureBootRunning()        // 1. Spawn Boot for triage
    d.checkDeaconHeartbeat()     // 2. Belt-and-suspenders fallback
    d.ensureWitnessesRunning()   // 3. Witness health (checks tmux directly)
    d.ensureRefineriesRunning()  // 4. Refinery health (checks tmux directly)
    d.triggerPendingSpawns()     // 5. Bootstrap polecats
    d.processLifecycleRequests() // 6. Cycle/restart requests
    // Agent state derived from tmux, not recorded in beads (gt-zecmc)
}
```

### Deacon Heartbeat (continuous)

The Deacon updates `~/gt/deacon/heartbeat.json` at the start of each patrol cycle:

```json
{
  "timestamp": "2026-01-02T18:30:00Z",
  "cycle": 42,
  "last_action": "health-scan",
  "healthy_agents": 3,
  "unhealthy_agents": 0
}
```

### Heartbeat Freshness

| Age | State | Boot Action |
|-----|-------|-------------|
| < 5 min | Fresh | Nothing (Deacon active) |
| 5-15 min | Stale | Nudge if pending mail |
| > 15 min | Very stale | Wake (Deacon may be stuck) |

## Boot Decision Matrix

When Boot runs, it observes:

- Is the Deacon session alive?
- How old is Deacon's heartbeat?
- Is there pending mail for Deacon?
- What's in Deacon's tmux pane?

Then it decides:

| Condition | Action | Command |
|-----------|--------|---------|
| Session dead | START | Exit; daemon calls `ensureDeaconRunning()` |
| Heartbeat > 15 min | WAKE | `gt nudge deacon "Boot wake: check your inbox"` |
| Heartbeat 5-15 min + mail | NUDGE | `gt nudge deacon "Boot check-in: pending work"` |
| Heartbeat fresh | NOTHING | Exit silently |

## Handoff Flow

### Deacon Handoff

The Deacon runs continuous patrol cycles. After N cycles or at high context usage:

```
End of patrol cycle:
  │
  ├─ Squash wisp to digest (ephemeral → permanent)
  ├─ Write summary to molecule state
  └─ gt handoff -s "Routine cycle" -m "Details"
       │
       └─ Creates mail for next session
```

Next daemon tick:

```
Daemon → ensureDeaconRunning()
  │
  └─ Spawns fresh Deacon in hq-deacon
       │
       └─ SessionStart hook: gt mail check --inject
            │
            └─ Previous handoff mail injected
                 │
                 └─ Deacon reads and continues
```

### Boot Handoff (Rare)

Boot is ephemeral - it exits after each tick. No persistent handoff is needed.

However, Boot uses a marker file to prevent double-spawning:

- Marker: `~/gt/deacon/dogs/boot/.boot-running` (TTL: 5 minutes)
- Status: `~/gt/deacon/dogs/boot/.boot-status.json` (last action/result)

If the marker exists and is recent, the daemon skips the Boot spawn for that tick.

## Degraded Mode

When tmux is unavailable, Gas Town enters degraded mode:

| Capability | Normal | Degraded |
|------------|--------|----------|
| Boot runs | As AI in tmux | As Go code (mechanical) |
| Observe panes | Yes | No |
| Nudge agents | Yes | No |
| Start agents | tmux sessions | Direct spawn |

Degraded Boot triage is purely mechanical:

- Session dead → start
- Heartbeat stale → restart
- No reasoning, just thresholds

## Fallback Chain

Multiple layers ensure recovery:

1. **Boot triage** - Intelligent observation, first line of defense
2. **Daemon `checkDeaconHeartbeat()`** - Belt-and-suspenders if Boot fails
3. **Tmux-based discovery** - Daemon checks tmux sessions directly (no bead state)
4. **Human escalation** - Mail to overseer for unrecoverable states

## State Files

| File | Purpose | Updated By |
|------|---------|-----------|
| `deacon/heartbeat.json` | Deacon freshness | Deacon (each cycle) |
| `deacon/dogs/boot/.boot-running` | Boot in-progress marker | Boot spawn |
| `deacon/dogs/boot/.boot-status.json` | Boot last action | Boot triage |
| `deacon/health-check-state.json` | Agent health tracking | `gt deacon health-check` |
| `daemon/daemon.log` | Daemon activity | Daemon |
| `daemon/daemon.pid` | Daemon process ID | Daemon startup |

## Debugging

```bash
# Check Deacon heartbeat
cat ~/gt/deacon/heartbeat.json | jq .

# Check Boot status
cat ~/gt/deacon/dogs/boot/.boot-status.json | jq .

# View daemon log
tail -f ~/gt/daemon/daemon.log

# Manual Boot run
gt boot triage

# Manual Deacon health check
gt deacon health-check
```

## Common Issues

### Boot Spawns in Wrong Session

**Symptom**: Boot runs in `hq-deacon` instead of `gt-boot`
**Cause**: Session name confusion in the spawn code
**Fix**: Ensure `gt boot triage` specifies `--session=gt-boot`

### Zombie Sessions Block Restart

**Symptom**: The tmux session exists but Claude is dead
**Cause**: The daemon checks session existence, not process health
**Fix**: Kill zombie sessions before recreating: `gt session kill hq-deacon`

### Status Shows Wrong State

**Symptom**: `gt status` shows the wrong state for agents
**Cause**: Previously, bead state and tmux state could diverge
**Fix**: As of gt-zecmc, status derives state from tmux directly (no bead state for
observable conditions like running/stopped). Non-observable states (stuck, awaiting-gate)
are still stored in beads.

## Design Decision: Keep Separation

The issue [gt-1847v] considered three options:

### Option A: Keep Boot/Deacon Separation (CHOSEN)

- Boot is ephemeral, spawns fresh each heartbeat
- Boot runs in `gt-boot`, exits after triage
- Deacon runs in `hq-deacon`, continuous patrol
- Clear session boundaries, clear lifecycle

**Verdict**: This is the correct design. The implementation needs fixing, not the architecture.

### Option B: Merge Boot into Deacon (Rejected)

- A single `hq-deacon` session handles everything
- Deacon checks "should I be awake?" internally

**Why rejected**:

- Deacon can't observe itself (a hung Deacon can't detect the hang)
- Context accumulates even when idle (costly in quiet towns)
- No external watchdog means no recovery from Deacon failure

### Option C: Replace with Go Watchdog (Rejected)

- Daemon directly monitors witness/refinery
- No Boot, no Deacon AI for health checks
- AI agents only for complex decisions

**Why rejected**:

- Go code can't interpret tmux pane output semantically
- Can't distinguish "stuck" from "thinking deeply"
- Loses the intelligent triage that makes the system resilient
- Escalation decisions are nuanced (when to notify? force-restart?)

### Implementation Fixes Needed

The separation is correct; these bugs need fixing:

1. **Session confusion** (gt-sgzsb): Boot spawns in the wrong session
2. **Zombie blocking** (gt-j1i0r): Daemon can't kill zombie sessions
3. ~~**Status mismatch** (gt-doih4): Bead vs tmux state divergence~~ → FIXED in gt-zecmc
4. **Ensure semantics** (gt-ekc5u): Start should kill zombies first

## Summary

The watchdog chain provides autonomous recovery:

- **Daemon**: Mechanical heartbeat, spawns Boot
- **Boot**: Intelligent triage, decides Deacon's fate
- **Deacon**: Continuous patrol, monitors workers

Boot exists because the daemon can't reason and Deacon can't observe itself.
The separation costs complexity but enables:

1. **Intelligent triage** without constant AI cost
2. **Fresh context** for each triage decision
3. **Graceful degradation** when tmux is unavailable
4. **Multiple fallback layers** for reliability