gastown/.beads/formulas/mol-witness-patrol.formula.toml

description = """
Per-rig worker monitor patrol loop.

The Witness is the Pit Boss for your rig. You watch polecats, nudge them toward
completion, verify clean git state before kills, and escalate stuck workers.

**You do NOT do implementation work.** Your job is oversight, not coding.

## Design Philosophy

This patrol follows Gas Town principles:
- **Discovery over tracking**: Observe reality each cycle, don't maintain state
- **Events over state**: POLECAT_DONE mail triggers cleanup wisps
- **Cleanup wisps as finalizers**: Pending cleanups are wisps, not queue entries
- **Task tool for parallelism**: Subagents inspect polecats, not molecule arms

## Patrol Shape (Linear, Deacon-style)

```
inbox-check ─► process-cleanups ─► check-refinery ─► survey-workers
                                                            │
         ┌──────────────────────────────────────────────────┘
         ▼
  dispatch-swarm-work ─► check-swarm-completion ─► ping-deacon
                                                        │
         ┌──────────────────────────────────────────────┘
         ▼
  context-check ─► loop-or-exit
```

No dynamic arms. No fanout gates. No persistent nudge counters.
State is discovered each cycle from reality (tmux, beads, mail)."""
formula = "mol-witness-patrol"
version = 1

[[steps]]
id = "inbox-check"
title = "Process witness mail"
description = """
Check inbox and handle messages.

```bash
gt mail inbox
```

For each message:

**POLECAT_DONE / LIFECYCLE:Shutdown**:
Create a cleanup wisp for this polecat:
```bash
bd create --wisp --title "cleanup:<polecat>" \
  --description "Verify and cleanup polecat <name>" \
  --labels cleanup,polecat:<name>,state:pending
```
The wisp's existence IS the pending cleanup. Process in next step.
Mark mail as read.

**MERGED**:
A branch was merged successfully. Complete the cleanup.
```bash
# Find the cleanup wisp for this polecat
bd list --wisp --labels=polecat:<name>,state:merge-requested --status=open

# If found, proceed with full polecat nuke:
# - Kill Claude session
# - Delete worktree
# - Delete branch
# - Remove agent bead
gt polecat nuke <name>

# Burn the cleanup wisp
bd close <wisp-id>
```
Mark mail as read.

**HELP / Blocked**:
Assess the request. Can you help? If not, escalate to Mayor:
```bash
gt mail send mayor/ -s "Escalation: <polecat> needs help" -m "<details>"
```

**HANDOFF**:
Read predecessor context. Continue from where they left off.

**SWARM: <epic_id>** (subject pattern):
Mayor assigning swarm coordination. The epic IS the swarm - beads tracks everything.
```bash
# Parse epic ID from subject: "SWARM: gt-epic-123"
# Create tracking wisp with epic_id in labels for dispatch step
bd create --wisp --title "swarm:<epic_id>" \
  --description "Coordinating swarm for epic: <epic_id>" \
  --labels swarm,epic_id:<epic_id>,start:<timestamp>
```
The dispatch-swarm-work step will use epic_id to query `bd ready --parent=<epic_id>`.
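
The subject parsing mentioned in the comment above can be sketched as follows. `parse_swarm_epic_id` is a hypothetical helper, not part of `gt` or `bd`, and it assumes the subject is exactly `SWARM: <epic_id>` with a single space after the colon.

```bash
# Hypothetical helper (not a gt/bd command): print the epic ID from a
# mail subject of the form "SWARM: <epic_id>", or fail for other subjects.
parse_swarm_epic_id() {
  case "$1" in
    "SWARM: "*) echo "${1#SWARM: }" ;;   # Strip the fixed prefix
    *) return 1 ;;                       # Not a swarm assignment
  esac
}

# Example with the subject shown above; prints the epic ID on success.
parse_swarm_epic_id "SWARM: gt-epic-123"
```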
Mark mail as read."""

[[steps]]
id = "process-cleanups"
title = "Process pending cleanup wisps"
needs = ["inbox-check"]
description = """
Find and process cleanup wisps (the finalizer pattern).

```bash
# Find all cleanup wisps
bd list --wisp --labels=cleanup --status=open
```

For each cleanup wisp, check its state label:

## State: pending (needs verification → MERGE_READY)

1. **Extract polecat name** from wisp title/labels

2. **Pre-kill verification**:
```bash
cd polecats/<name>
git status                    # Must be clean
git log origin/main..HEAD     # Branch commits not yet on main; confirm they are pushed
bd show <assigned-issue>      # Issue closed or deferred
```

3. **Get branch and issue info**:
```bash
# Get current branch
git rev-parse --abbrev-ref HEAD

# Get the hook_bead from agent bead
bd show <agent-bead>   # Look for hook_bead field
```

4. **Verify productive work** (ZFC - you make the call):
   - Check git log for commits mentioning the issue
   - Legitimate exceptions: already fixed, duplicate, deferred
   - If the issue was closed as 'done' with no commits, flag it for review

5. **If clean**: Send MERGE_READY to refinery
```bash
gt mail send <rig>/refinery -s "MERGE_READY <polecat>" -m "Branch: <branch>
Issue: <issue-id>
Polecat: <polecat>
Verified: clean git state, issue closed"
```
Then update the wisp to merge-requested state:
```bash
bd update <wisp-id> --labels cleanup,polecat:<name>,state:merge-requested
```
**Do NOT kill the polecat yet** - wait for MERGED confirmation from refinery.

6. **If dirty**: Leave wisp open, log the issue, retry next cycle.
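
Reading the state and polecat name out of a wisp's labels can be sketched as below. `label_value` is a hypothetical helper, and the sketch assumes you already have the labels as one comma-separated string in the same form passed to `--labels`. How you obtain that string from `bd` output is not shown here.

```bash
# Hypothetical helper (not a bd command): print the value of the first
# "<key>:<value>" label in a comma-separated list, or fail if absent.
label_value() (
  key="$1"
  set -f        # Disable globbing while the label list is split
  IFS=','       # Split on commas only (local to this subshell)
  for label in $2; do
    case "$label" in
      "$key":*) echo "${label#"$key":}"; exit 0 ;;
    esac
  done
  exit 1
)

# Route one cleanup wisp by its state label.
labels="cleanup,polecat:toast,state:pending"
case "$(label_value state "$labels")" in
  pending)         echo "verify $(label_value polecat "$labels")" ;;
  merge-requested) echo "wait for MERGED" ;;
esac
```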

## State: merge-requested (waiting for refinery)

Skip - waiting for MERGED mail from refinery. The inbox-check step handles
MERGED messages and completes these cleanup wisps.

**Parallelism**: Use Task tool subagents to process multiple cleanups concurrently.
Each cleanup is independent - perfect for parallel execution."""

[[steps]]
id = "check-refinery"
title = "Ensure refinery is alive"
needs = ["process-cleanups"]
description = """
Ensure the refinery is alive and processing merge requests.

```bash
# Check if refinery session exists
gt session status <rig>/refinery

# Check for pending merge requests
bd list --type=merge-request --status=open
```

If merge requests are waiting AND the refinery is not running:
```bash
gt session start <rig>/refinery
gt mail send <rig>/refinery -s "PATROL: Wake up" \
  -m "Merge requests in queue. Please process."
```

If the refinery is running but the queue has been stale for more than 30 minutes, send it a nudge."""

[[steps]]
id = "survey-workers"
title = "Inspect all active polecats"
needs = ["check-refinery"]
description = """
Survey all polecats using agent beads (ZFC: trust what agents report).

**Step 1: List polecat agent beads**

```bash
bd list --type=agent --json
```

Filter the JSON output for entries where description contains `role_type: polecat`.
Each polecat agent bead has fields in its description:
- `role_type: polecat`
- `rig: <rig-name>`
- `agent_state: running|idle|stuck|done`
- `hook_bead: <current-work-id>`

**Step 2: For each polecat, check agent_state**

| agent_state | Meaning | Action |
|-------------|---------|--------|
| running | Actively working | Check progress (Step 3) |
| idle | No work assigned | Skip (no action needed) |
| stuck | Self-reported stuck | Handle stuck protocol |
| done | Work complete | Verify cleanup triggered (see Step 4a) |

**Step 3: For running polecats, assess progress**

Check the hook_bead field to see what they're working on:
```bash
bd show <hook_bead>  # See current step/issue
```

You can also verify they're responsive:
```bash
tmux capture-pane -t gt-<rig>-<name> -p | tail -20
```

Look for:
- Recent tool activity → making progress
- Idle at prompt → may need nudge
- Error messages → may need help

**Step 4: Decide action**

| Observation | Action |
|-------------|--------|
| agent_state=running, recent activity | None |
| agent_state=running, idle 5-15 min | Gentle nudge |
| agent_state=running, idle 15+ min | Direct nudge with deadline |
| agent_state=stuck | Assess and help or escalate |
| agent_state=done | Verify cleanup triggered (see Step 4a) |
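
For running polecats, the idle thresholds in the table can be sketched as a small function. `nudge_action` is a hypothetical helper. It assumes you have already estimated idle time in whole minutes from what you observed this cycle, and it treats anything under 5 minutes as recent activity.

```bash
# Hypothetical helper: map observed idle minutes for a running polecat
# to the action from the table above.
nudge_action() {
  if [ "$1" -lt 5 ]; then
    echo "none"            # Recent activity
  elif [ "$1" -lt 15 ]; then
    echo "gentle-nudge"    # Idle 5-15 min
  else
    echo "direct-nudge"    # Idle 15+ min: nudge with a deadline
  fi
}

nudge_action 7
```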

**Step 4a: Handle agent_state=done**

Check if a cleanup wisp exists for this polecat:
```bash
bd list --wisp --labels=polecat:<name> --status=open
```

If cleanup wisp exists:
- state:pending → Will be processed in process-cleanups
- state:merge-requested → Waiting for refinery MERGED response

If NO cleanup wisp exists (POLECAT_DONE mail missed):
Create one to trigger the cleanup flow:
```bash
bd create --wisp --title "cleanup:<polecat>" \
  --description "Discovered done polecat without cleanup wisp" \
  --labels cleanup,polecat:<name>,state:pending
```
This ensures done polecats eventually get cleaned up even if mail was lost.

**Step 5: Execute nudges**
```bash
gt nudge <rig>/polecats/<name> "How's progress? Need help?"
```

**Step 6: Escalate if needed**
```bash
gt mail send mayor/ -s "Escalation: <polecat> stuck" \\
  -m "Polecat <name> reports stuck. Please intervene."
```

**Parallelism**: Use Task tool subagents to inspect multiple polecats concurrently.

**ZFC Principle**: Trust agent_state from beads. Don't infer state from PID/tmux."""

[[steps]]
id = "dispatch-swarm-work"
title = "Dispatch ready swarm tasks to idle polecats"
needs = ["survey-workers"]
description = """
If an active swarm exists, dispatch ready tasks to available polecats.

This is the core swarm coordination logic - the Witness keeps the swarm moving
by matching ready issues with idle workers.

**Step 1: Find active swarm tracking wisps**
```bash
bd list --wisp --labels=swarm --status=open
```
If no active swarm, skip this step entirely.

**Step 2: For each swarm, get ready tasks**

Extract the epic_id from the swarm wisp (stored in labels or description).
Then query the ready front:
```bash
bd ready --parent=<epic_id>
```
This returns issues that have no blockers and are ready to work.

**Step 3: Find idle polecats**

From the survey-workers step, you should have agent beads data.
Filter for polecats where:
- agent_state = idle (no work assigned)
- OR hook_bead is empty/none

```bash
bd list --type=agent --json | jq '[.[] | select(.description | contains("role_type: polecat") and contains("agent_state: idle"))]'
```

**Step 4: Dispatch ready tasks to idle polecats**

If ready tasks exist but NO idle polecats available:
- Log: "Swarm <epic_id>: N ready tasks waiting, 0 idle polecats"
- Skip dispatch this cycle (polecats will become idle after completing current work)

For each ready task (up to number of idle polecats):
```bash
gt sling <task-id> <rig>/<polecat-name>
```

This will:
- Attach the task to the polecat's hook
- Spawn the polecat session if not running
- Inject work context

Example:
```bash
# Ready task: gt-abc.3, Idle polecat: toast
gt sling gt-abc.3 gastown/toast
```
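
Pairing ready tasks with idle polecats can be sketched as below. `plan_dispatch` is a hypothetical helper that only prints the `gt sling` commands it would run. The `slots` argument is an assumption: it stands for how many more polecats may be activated this cycle under the rate limit, and computing it is left to you.

```bash
# Hypothetical helper: pair ready tasks with idle polecats one-to-one and
# print the gt sling commands, stopping when tasks, polecats, or slots run out.
# Arguments: <rig> <slots> "<space-separated task IDs>" "<space-separated polecats>"
plan_dispatch() (
  rig="$1"
  slots="$2"
  tasks="$3"
  set -- $4                        # Positional parameters now hold the idle polecats
  for task in $tasks; do
    [ "$#" -gt 0 ] || break        # No idle polecats left
    [ "$slots" -gt 0 ] || break    # No free slots left
    echo "gt sling $task $rig/$1"  # Print instead of executing
    shift
    slots=$((slots - 1))
  done
)

# Three ready tasks, two idle polecats: only two commands are printed.
plan_dispatch gastown 3 "gt-abc.3 gt-abc.4 gt-abc.5" "toast nux"
```

Printing the commands first lets you review the plan before running anything.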

**Step 5: Log dispatch activity**

For observability, record what was dispatched:
```bash
# Update swarm wisp with dispatch info (optional)
echo "Dispatched gt-abc.3 -> toast at $(date)"
```

**Parallelism**: Dispatch calls can be parallelized if multiple idle polecats
and multiple ready tasks exist.

**Rate limiting**: Don't dispatch more than max_active_polecats (typically 3-5)
to avoid overwhelming the rig/refinery."""

[[steps]]
id = "check-swarm-completion"
title = "Check if active swarm is complete"
needs = ["dispatch-swarm-work"]
description = """
Check if any active swarm has completed (all tasks closed).

**Discovery over tracking**: Don't count - QUERY. The beads state IS the source of truth.

**Step 1: Find active swarm tracking wisps**
```bash
bd list --wisp --labels=swarm --status=open
```
If no active swarm, skip this step.

**Step 2: For each swarm, check completion via beads**

Extract epic_id from wisp labels, then check if all children are closed:
```bash
# Get swarm status from beads (shows ready/active/blocked/completed)
bd swarm status <epic_id> --json
```

Swarm is complete when:
- ready = [] (empty)
- active = [] (empty)
- blocked = [] (empty)
- All children are closed

Alternative direct check:
```bash
# If bd ready --parent returns nothing AND no active issues, swarm is done
bd ready --parent=<epic_id>   # Should return nothing
bd list --parent=<epic_id> --status=in_progress  # Should return nothing
```

**Step 3: If complete, close the epic and notify Mayor**

When all children are closed, the swarm is complete. Close everything:
```bash
# 1. Close the epic itself (swarm molecule)
bd close <epic_id> --reason "All swarm tasks completed"

# 2. Close the swarm tracking wisp
bd close <swarm-wisp-id> --reason "All tasks completed"

# 3. Notify Mayor of completion
gt mail send mayor/ -s "SWARM_COMPLETE: <epic_id>" -m "Epic: <epic_id>
All tasks completed and merged.
Duration: <minutes> minutes
Closed by: <rig>/witness"
```

**Activity Feed**: The `bd close` commands create activity events automatically.
The SWARM_COMPLETE mail provides a human-readable summary for the Mayor.

**Key insight**: The notification is sent exactly once because the wisp gets closed.
Next patrol cycle finds no open swarm wisps, so this step is skipped."""

[[steps]]
id = "ping-deacon"
title = "Ping Deacon for health check"
needs = ["check-swarm-completion"]
description = """
Send WITNESS_PING to Deacon for second-order monitoring.

The Witness fleet collectively monitors Deacon health - this prevents the
"who watches the watchers" problem. If Deacon dies, Witnesses detect it.

**Step 1: Send ping**
```bash
gt mail send deacon/ -s "WITNESS_PING <rig>" -m "Rig: <rig>
Timestamp: $(date -u +%Y-%m-%dT%H:%M:%SZ)
Patrol: <cycle-number>"
```

**Step 2: Check Deacon health**
```bash
# Check Deacon agent bead for last_activity
bd list --type=agent --json | jq '.[] | select(.description | contains("deacon"))'
```

Look at the `last_activity` timestamp. If stale (>5 minutes since last update):
- Deacon may be dead or stuck
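
The staleness test can be sketched as below. `is_stale` is a hypothetical helper. It assumes you have already converted `last_activity` to Unix epoch seconds; the timestamp format is not specified here, so that conversion is left out.

```bash
# Hypothetical helper: succeed when more than <limit> seconds separate
# <last-activity> from <now>. All three arguments are whole seconds.
is_stale() {
  [ $(( $2 - $1 )) -gt "$3" ]
}

# Example: last activity 10 minutes ago, checked against the 5-minute limit.
now="$(date -u +%s)"
if is_stale "$((now - 600))" "$now" 300; then
  echo "deacon stale"
fi
```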

**Step 3: Escalate if needed**
```bash
# If Deacon appears down
gt mail send mayor/ -s "ALERT: Deacon appears unresponsive" \
  -m "No Deacon activity for >5 minutes.
Last seen: <timestamp>
Witness: <rig>/witness"
```

Note: Multiple Witnesses may send this alert. Mayor should handle deduplication."""

[[steps]]
id = "context-check"
title = "Check own context limit"
needs = ["ping-deacon"]
description = """
Check own context usage.

If context is HIGH (>80%):
- Ensure any notes are written to handoff mail
- Prepare for session restart

If context is LOW:
- Can continue patrolling"""

[[steps]]
id = "loop-or-exit"
title = "Loop or exit for respawn"
needs = ["context-check"]
description = """
End of patrol cycle decision.

**If context LOW**:
- Sleep briefly to avoid tight loop (30-60 seconds)
- Return to inbox-check step
- Continue patrolling

**If context HIGH**:
- Write handoff mail to self with any notable observations:
```bash
gt handoff -s "Witness patrol handoff" -m "<observations>"
```
- Exit cleanly (daemon respawns fresh Witness)
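
The end-of-cycle decision can be sketched as below. `patrol_cycle_end` is a hypothetical helper. It assumes context usage is available as a whole-number percentage and treats anything at or below the 80% threshold from context-check as LOW.

```bash
# Hypothetical helper: decide what to do at the end of a patrol cycle
# from the context usage percentage.
patrol_cycle_end() {
  if [ "$1" -gt 80 ]; then
    echo "handoff"          # Context HIGH: write handoff mail, then exit
  else
    echo "sleep-and-loop"   # Context LOW: sleep 30-60 seconds, return to inbox-check
  fi
}

patrol_cycle_end 42
```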

The daemon ensures Witness is always running."""