gastown/.beads/formulas/mol-witness-patrol.formula.toml
2025-12-29 21:58:13 -08:00


description = """
Per-rig worker monitor patrol loop.
The Witness is the Pit Boss for your rig. You watch polecats, nudge them toward
completion, verify clean git state before kills, and escalate stuck workers.
**You do NOT do implementation work.** Your job is oversight, not coding.
## Design Philosophy
This patrol follows Gas Town principles:
- **Discovery over tracking**: Observe reality each cycle, don't maintain state
- **Events over state**: POLECAT_DONE mail triggers cleanup wisps
- **Cleanup wisps as finalizers**: Pending cleanups are wisps, not queue entries
- **Task tool for parallelism**: Subagents inspect polecats, not molecule arms
## Patrol Shape (Linear, Deacon-style)
```
inbox-check ─► process-cleanups ─► check-refinery ─► survey-workers
                                                           │
    ┌──────────────────────────────────────────────────────┘
    ▼
check-swarm-completion ─► ping-deacon ─► context-check ─► loop-or-exit
```
No dynamic arms. No fanout gates. No persistent nudge counters.
State is discovered each cycle from reality (tmux, beads, mail)."""
formula = "mol-witness-patrol"
version = 1
[[steps]]
id = "inbox-check"
title = "Process witness mail"
description = """
Check inbox and handle messages.
```bash
gt mail inbox
```
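A minimal sketch of routing by message type, covering the message types listed in this step. It assumes each subject begins with its message type, in the same style as the `MERGE_READY <polecat>` subject this patrol sends; that prefix convention, the function name `classify_mail`, and the handler names it prints are illustrative and not part of `gt`.

```bash
# Hypothetical helper: map a mail subject to the handler described in this step.
# Assumes the subject starts with the message type; handler names are illustrative.
classify_mail() {
  case "$1" in
    POLECAT_DONE*|LIFECYCLE:Shutdown*) echo "create-cleanup-wisp" ;;
    MERGED*)                           echo "complete-cleanup" ;;
    HELP*|Blocked*)                    echo "assess-or-escalate" ;;
    HANDOFF*)                          echo "resume-from-handoff" ;;
    SWARM_START*)                      echo "init-swarm-tracking" ;;
    *)                                 echo "unknown" ;;
  esac
}
```

Anything classified as `unknown` is best left unread and looked at by hand rather than guessed at.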
For each message:
**POLECAT_DONE / LIFECYCLE:Shutdown**:
Create a cleanup wisp for this polecat:
```bash
bd create --wisp --title "cleanup:<polecat>" \
--description "Verify and cleanup polecat <name>" \
--labels cleanup,polecat:<name>,state:pending
```
The wisp's existence IS the pending cleanup. Process in next step.
Mark mail as read.
**MERGED**:
A branch was merged successfully. Complete the cleanup.
```bash
# Find the cleanup wisp for this polecat
bd list --wisp --labels=polecat:<name>,state:merge-requested --status=open
# If found, proceed with full polecat nuke:
# - Kill Claude session
# - Delete worktree
# - Delete branch
# - Remove agent bead
gt polecat nuke <name>
# Burn the cleanup wisp
bd close <wisp-id>
```
Mark mail as read.
**HELP / Blocked**:
Assess the request. Can you help? If not, escalate to Mayor:
```bash
gt mail send mayor/ -s "Escalation: <polecat> needs help" -m "<details>"
```
**HANDOFF**:
Read predecessor context. Continue from where they left off.
**SWARM_START**:
Mayor initiating batch polecat work. Initialize swarm tracking.
```bash
# Parse swarm info from mail body: {"swarm_id": "batch-123", "beads": ["bd-a", "bd-b"]}
bd create --wisp --title "swarm:<swarm_id>" \
--description "Tracking batch: <swarm_id>" \
--labels swarm,swarm_id:<swarm_id>,total:<N>,completed:0,start:<timestamp>
```
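The label string for the swarm wisp can be assembled as sketched here. This assumes `total` is the number of bead IDs in the mail body and that the start timestamp is passed in by the caller; `swarm_labels` is a hypothetical helper, not a `bd` feature.

```bash
# Hypothetical helper: build the label string for a swarm tracking wisp.
# Usage: swarm_labels <swarm_id> <start-timestamp> <bead-id>...
swarm_labels() {
  local swarm_id="$1" start="$2"
  shift 2
  # Assumption: total is the count of bead IDs listed in the SWARM_START mail.
  echo "swarm,swarm_id:${swarm_id},total:$#,completed:0,start:${start}"
}
```

For the example mail body above, `swarm_labels batch-123 <timestamp> bd-a bd-b` yields a label set with `total:2`.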
Mark mail as read."""
[[steps]]
id = "process-cleanups"
title = "Process pending cleanup wisps"
needs = ["inbox-check"]
description = """
Find and process cleanup wisps (the finalizer pattern).
```bash
# Find all cleanup wisps
bd list --wisp --labels=cleanup --status=open
```
For each cleanup wisp, check its state label:
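Reading the state out of a label string can be sketched as below. It assumes you already hold the wisp's labels as one comma-separated string in the `key:value` form used by this formula; `label_value` is a hypothetical helper, and how you obtain the string from `bd` is left open.

```bash
# Hypothetical helper: print the value of <key> from a comma-separated label list.
# Usage: label_value "cleanup,polecat:nux,state:pending" state
label_value() {
  local rest="$1," key="$2" item
  while [ -n "$rest" ]; do
    item="${rest%%,*}"   # first remaining label
    rest="${rest#*,}"    # drop it from the list
    case "$item" in
      "$key":*) echo "${item#*:}"; return 0 ;;
    esac
  done
  return 1               # key not present
}
```

The same helper works for `polecat` on cleanup wisps and for `swarm_id`, `total`, and `start` on swarm wisps.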
## State: pending (needs verification → MERGE_READY)
1. **Extract polecat name** from wisp title/labels
2. **Pre-kill verification**:
```bash
cd polecats/<name>
git status # Must be clean
git log origin/main..HEAD # Commits not yet on main; each must already be pushed
bd show <assigned-issue> # Issue closed or deferred
```
3. **Get branch and issue info**:
```bash
# Get current branch
git rev-parse --abbrev-ref HEAD
# Get the hook_bead from agent bead
bd show <agent-bead> # Look for hook_bead field
```
4. **Verify productive work** (ZFC - you make the call):
- Check git log for commits mentioning the issue
- Legitimate exceptions: already fixed, duplicate, deferred
- If closing as 'done' with no commits, flag for review
5. **If clean**: Send MERGE_READY to refinery
```bash
gt mail send <rig>/refinery -s "MERGE_READY <polecat>" -m "Branch: <branch>
Issue: <issue-id>
Polecat: <polecat>
Verified: clean git state, issue closed"
```
Then update the wisp to merge-requested state:
```bash
bd update <wisp-id> --labels cleanup,polecat:<name>,state:merge-requested
```
**Do NOT kill the polecat yet** - wait for MERGED confirmation from refinery.
6. **If dirty**: Leave wisp open, log the issue, retry next cycle.
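The clean-versus-dirty decision in steps 2, 5, and 6 can be sketched as a pure function over three observations. The function name, its argument order, the printed action names, and the issue status strings `closed` and `deferred` are assumptions for illustration.

```bash
# Hypothetical helper: decide what to do with a pending cleanup wisp.
# Args: <dirty-file-count> <unpushed-commit-count> <issue-status>
cleanup_decision() {
  local dirty="$1" unpushed="$2" issue="$3"
  if [ "$dirty" -gt 0 ] || [ "$unpushed" -gt 0 ]; then
    echo "retry-next-cycle"      # dirty: leave the wisp open
    return 0
  fi
  case "$issue" in
    closed|deferred) echo "send-merge-ready" ;;
    *)               echo "retry-next-cycle" ;;   # issue still open
  esac
}
```

One way to get the counts is `git status --porcelain | wc -l` for dirty files and `git rev-list --count @{u}..HEAD` for unpushed commits; the second only works if the branch has an upstream configured.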
## State: merge-requested (waiting for refinery)
Skip - waiting for MERGED mail from refinery. The inbox-check step handles
MERGED messages and completes these cleanup wisps.
**Parallelism**: Use Task tool subagents to process multiple cleanups concurrently.
Each cleanup is independent - perfect for parallel execution."""
[[steps]]
id = "check-refinery"
title = "Ensure refinery is alive"
needs = ["process-cleanups"]
description = """
Ensure the refinery is alive and processing merge requests.
```bash
# Check if refinery session exists
gt session status <rig>/refinery
# Check for pending merge requests
bd list --type=merge-request --status=open
```
If MRs waiting AND refinery not running:
```bash
gt session start <rig>/refinery
gt mail send <rig>/refinery -s "PATROL: Wake up" \
-m "Merge requests in queue. Please process."
```
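This step's two rules (start the refinery when merge requests are waiting and it is down; nudge it when it is running but the queue is older than 30 minutes) can be sketched as one decision function. It assumes staleness is measured as the age of the oldest open merge request, which this formula does not specify; `refinery_action` and the action names are hypothetical.

```bash
# Hypothetical helper: pick the refinery action for this patrol cycle.
# Args: <running: yes|no> <open-mr-count> <oldest-mr-age-minutes>
refinery_action() {
  local running="$1" open_mrs="$2" oldest_min="$3"
  if [ "$open_mrs" -eq 0 ]; then
    echo "none"                  # nothing queued
  elif [ "$running" != "yes" ]; then
    echo "start-and-wake"        # MRs waiting, refinery down
  elif [ "$oldest_min" -gt 30 ]; then
    echo "nudge"                 # running, but queue is stale
  else
    echo "none"
  fi
}
```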
If the refinery is running but the queue is stale (>30 min), send it a nudge."""
[[steps]]
id = "survey-workers"
title = "Inspect all active polecats"
needs = ["check-refinery"]
description = """
Survey all polecats using agent beads (ZFC: trust what agents report).
**Step 1: List polecat agent beads**
```bash
bd list --type=agent --json
```
Filter the JSON output for entries where description contains `role_type: polecat`.
Each polecat agent bead has fields in its description:
- `role_type: polecat`
- `rig: <rig-name>`
- `agent_state: running|idle|stuck|done`
- `hook_bead: <current-work-id>`
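Pulling one of these fields out of a description can be sketched as follows. It assumes each field sits on its own line as `key: value` with no leading whitespace, which is inferred from the list above and not verified against real agent beads; `desc_field` is a hypothetical helper.

```bash
# Hypothetical helper: print the value of <key> from a bead description.
# Usage: desc_field "$description" agent_state
# Prints nothing when the key is absent.
desc_field() {
  local key="$2" line
  echo "$1" | while IFS= read -r line; do
    case "$line" in
      "$key: "*) echo "${line#*: }"; break ;;
    esac
  done
}
```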
**Step 2: For each polecat, check agent_state**
| agent_state | Meaning | Action |
|-------------|---------|--------|
| running | Actively working | Check progress (Step 3) |
| idle | No work assigned | Skip (no action needed) |
| stuck | Self-reported stuck | Handle stuck protocol |
| done | Work complete | Verify cleanup triggered (see Step 4a) |
**Step 3: For running polecats, assess progress**
Check the hook_bead field to see what they're working on:
```bash
bd show <hook_bead> # See current step/issue
```
You can also verify they're responsive:
```bash
tmux capture-pane -t gt-<rig>-<name> -p | tail -20
```
Look for:
- Recent tool activity → making progress
- Idle at prompt → may need nudge
- Error messages → may need help
**Step 4: Decide action**
| Observation | Action |
|-------------|--------|
| agent_state=running, recent activity | None |
| agent_state=running, idle 5-15 min | Gentle nudge |
| agent_state=running, idle >15 min | Direct nudge with deadline |
| agent_state=stuck | Assess and help or escalate |
| agent_state=done | Verify cleanup triggered (see Step 4a) |
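The table above can be sketched as a small decision function. How idle minutes are measured is an assumption left to you (for example, judged from the tmux capture in Step 3); `nudge_action` and the action names it prints are hypothetical.

```bash
# Hypothetical helper: map agent_state plus idle minutes to a patrol action.
# Args: <agent_state> [idle-minutes]
nudge_action() {
  local state="$1" idle_min="${2:-0}"
  case "$state" in
    idle)   echo "skip" ;;
    stuck)  echo "assess-or-escalate" ;;
    "done") echo "verify-cleanup" ;;
    running)
      if [ "$idle_min" -gt 15 ]; then
        echo "direct-nudge"      # idle beyond 15 minutes
      elif [ "$idle_min" -ge 5 ]; then
        echo "gentle-nudge"      # idle 5 to 15 minutes
      else
        echo "none"              # recent activity
      fi ;;
    *) echo "unknown" ;;
  esac
}
```

Exactly 15 idle minutes is treated as the gentle case here; that boundary is a choice, not something the table fixes.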
**Step 4a: Handle agent_state=done**
Check if a cleanup wisp exists for this polecat:
```bash
bd list --wisp --labels=polecat:<name> --status=open
```
If cleanup wisp exists:
- state:pending → Will be processed in process-cleanups
- state:merge-requested → Waiting for refinery MERGED response
If NO cleanup wisp exists (POLECAT_DONE mail missed):
Create one to trigger the cleanup flow:
```bash
bd create --wisp --title "cleanup:<polecat>" \
--description "Discovered done polecat without cleanup wisp" \
--labels cleanup,polecat:<name>,state:pending
```
This ensures done polecats eventually get cleaned up even if mail was lost.
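The check-then-create flow in Step 4a can be sketched as one function built from the two `bd` commands shown above. It assumes `bd list` prints nothing when no wisp matches, which is not verified here; `ensure_cleanup_wisp` is a hypothetical wrapper.

```bash
# Sketch: create a cleanup wisp only when none is open for this polecat.
# Assumption (unverified): `bd list` prints nothing when no wisp matches.
ensure_cleanup_wisp() {
  local name="$1" existing
  existing="$(bd list --wisp --labels=polecat:"$name" --status=open)"
  if [ -z "$existing" ]; then
    bd create --wisp --title "cleanup:$name" --description "Discovered done polecat without cleanup wisp" --labels "cleanup,polecat:$name,state:pending"
  fi
}
```

Because the function checks before creating, it should be safe to call on every cycle for every done polecat.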
**Step 5: Execute nudges**
```bash
gt nudge <rig>/polecats/<name> "How's progress? Need help?"
```
**Step 6: Escalate if needed**
```bash
gt mail send mayor/ -s "Escalation: <polecat> stuck" \\
-m "Polecat <name> reports stuck. Please intervene."
```
**Parallelism**: Use Task tool subagents to inspect multiple polecats concurrently.
**ZFC Principle**: Trust agent_state from beads. Don't infer state from PID/tmux."""
[[steps]]
id = "check-swarm-completion"
title = "Check if active swarm is complete"
needs = ["survey-workers"]
description = """
If Mayor started a batch (SWARM_START), check if all polecats have completed.
**Step 1: Find active swarm tracking wisps**
```bash
bd list --wisp --labels=swarm --status=open
```
If no active swarm, skip this step.
**Step 2: Count completed polecats for this swarm**
Extract from wisp labels: swarm_id, total, completed, start timestamp.
Check how many cleanup wisps have been closed for this swarm's polecats.
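The completion test and the duration reported in Step 3 can be sketched with two small helpers. They assume you have already counted the closed cleanup wisps and that timestamps are Unix epoch seconds; the formula does not fix a timestamp format, and both function names are hypothetical.

```bash
# Hypothetical helper: succeed when every polecat in the swarm is done.
# Args: <closed-cleanup-count> <total>
swarm_is_complete() {
  [ "$1" -ge "$2" ]
}

# Hypothetical helper: whole minutes between two epoch timestamps.
# Args: <start-epoch> <now-epoch>
elapsed_minutes() {
  echo $(( ($2 - $1) / 60 ))
}
```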
**Step 3: If all complete, notify Mayor**
```bash
gt mail send mayor/ -s "SWARM_COMPLETE: <swarm_id>" -m "All <total> polecats merged.
Duration: <minutes> minutes
Swarm: <swarm_id>"
# Close the swarm tracking wisp
bd close <swarm-wisp-id> --reason "All polecats merged"
```
Note: This step runs every patrol cycle. Closing the swarm wisp ensures the notification is sent only once."""
[[steps]]
id = "ping-deacon"
title = "Ping Deacon for health check"
needs = ["check-swarm-completion"]
description = """
Send WITNESS_PING to Deacon for second-order monitoring.
The Witness fleet collectively monitors Deacon health - this prevents the
"who watches the watchers" problem. If Deacon dies, Witnesses detect it.
**Step 1: Send ping**
```bash
gt mail send deacon/ -s "WITNESS_PING <rig>" -m "Rig: <rig>
Timestamp: $(date -u +%Y-%m-%dT%H:%M:%SZ)
Patrol: <cycle-number>"
```
**Step 2: Check Deacon health**
```bash
# Check Deacon agent bead for last_activity
bd list --type=agent --json | jq '.[] | select(.description | contains("deacon"))'
```
Look at the `last_activity` timestamp. If stale (>5 minutes since last update):
- Deacon may be dead or stuck
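The staleness test can be sketched as below, assuming `last_activity` has been converted to Unix epoch seconds. `deacon_is_stale` is a hypothetical helper; the 300-second limit mirrors the 5-minute threshold above.

```bash
# Hypothetical helper: succeed when the Deacon's last activity is stale.
# Args: <last-activity-epoch> <now-epoch>
deacon_is_stale() {
  local last="$1" now="$2" limit=300   # 5 minutes, per this step
  [ $(( $now - $last )) -gt "$limit" ]
}
```

On systems with GNU `date`, an ISO timestamp can be converted with `date -u -d "<timestamp>" +%s`; other platforms need a different conversion.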
**Step 3: Escalate if needed**
```bash
# If Deacon appears down
gt mail send mayor/ -s "ALERT: Deacon appears unresponsive" \
-m "No Deacon activity for >5 minutes.
Last seen: <timestamp>
Witness: <rig>/witness"
```
Note: Multiple Witnesses may send this alert. Mayor should handle deduplication."""
[[steps]]
id = "context-check"
title = "Check own context limit"
needs = ["ping-deacon"]
description = """
Check own context usage.
If context is HIGH (>80%):
- Ensure any notes are written to handoff mail
- Prepare for session restart
If context is LOW (80% or less):
- Can continue patrolling"""
[[steps]]
id = "loop-or-exit"
title = "Loop or exit for respawn"
needs = ["context-check"]
description = """
End of patrol cycle decision.
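The decision can be sketched as a threshold check. It assumes context usage is available as a whole-number percentage, which this formula does not say how to obtain; `end_of_cycle_action` and the action names are hypothetical.

```bash
# Hypothetical helper: choose the end-of-cycle action from context usage.
# Arg: <context-percent>; HIGH means above 80, per the context-check step.
end_of_cycle_action() {
  if [ "$1" -gt 80 ]; then
    echo "handoff-and-exit"
  else
    echo "sleep-and-loop"
  fi
}
```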
**If context LOW**:
- Sleep briefly to avoid tight loop (30-60 seconds)
- Return to inbox-check step
- Continue patrolling
**If context HIGH**:
- Write handoff mail to self with any notable observations:
```bash
gt handoff -s "Witness patrol handoff" -m "<observations>"
```
- Exit cleanly (daemon respawns fresh Witness)
The daemon ensures Witness is always running."""