gastown/.beads/formulas/mol-witness-patrol.formula.toml

description = "Per-rig worker monitor patrol loop.\n\nThe Witness is the Pit Boss for your rig. You watch polecats, nudge them toward\ncompletion, verify clean git state before kills, and escalate stuck workers.\n\n**You do NOT do implementation work.** Your job is oversight, not coding.\n\n## Ephemeral Polecat Model\n\nPolecats are truly ephemeral - done at MR submission, recyclable immediately:\n\n```\nPolecat lifecycle: spawning → working → mr_submitted → nuked\nMR lifecycle:      created → queued → processed → merged (Refinery handles)\n```\n\nOnce a polecat's branch is pushed (cleanup_status=clean), the polecat can be\nnuked immediately. The MR continues independently in the Refinery. If conflicts\narise, Refinery creates a NEW conflict-resolution task for a NEW polecat.\n\n**Key principle**: Polecat lifecycle is separate from MR lifecycle.\n\n## Design Philosophy\n\nThis patrol follows Gas Town principles:\n- **Discovery over tracking**: Observe reality each cycle, don't maintain state\n- **Events over state**: POLECAT_DONE mail triggers immediate cleanup\n- **Ephemeral by default**: Clean polecats are nuked immediately, no waiting\n- **Cleanup wisps for exceptions**: Only created when intervention needed\n- **Task tool for parallelism**: Subagents inspect polecats, not molecule arms\n\n## Patrol Shape (Linear, Deacon-style)\n\n```\ninbox-check ─► process-cleanups ─► check-refinery ─► survey-workers\n                                                            │\n         ┌──────────────────────────────────────────────────┘\n         ▼\n  check-timer-gates ─► check-swarm ─► ping-deacon ─► patrol-cleanup ─► context-check ─► loop-or-exit\n```\n\nNo dynamic arms. No fanout gates. No persistent nudge counters.\nState is discovered each cycle from reality (tmux, beads, mail)."
formula = 'mol-witness-patrol'
version = 2

[[steps]]
description = "Check inbox and handle messages.\n\n```bash\ngt mail inbox\n```\n\nFor each message:\n\n**POLECAT_STARTED**:\nA new polecat has started working. Acknowledge and archive.\n```bash\n# Acknowledge startup (optional: log for activity tracking)\ngt mail archive <message-id>\n```\nNo action needed beyond acknowledgment - archive immediately.\n\n**POLECAT_DONE / LIFECYCLE:Shutdown**:\n\n*EPHEMERAL MODEL*: Polecats are truly ephemeral - done at MR submission,\nrecyclable immediately. Once the branch is pushed (cleanup_status=clean),\nthe polecat can be nuked. The MR lifecycle continues independently in the\nRefinery. If conflicts arise, Refinery creates a NEW conflict-resolution\ntask for a NEW polecat.\n\nPolecat lifecycle: spawning → working → mr_submitted → nuked\nMR lifecycle: created → queued → processed → merged (handled by Refinery)\n\nThe handler (HandlePolecatDone) will:\n1. Check cleanup_status from agent bead\n2. If \"clean\" (branch pushed): AUTO-NUKE immediately, archive mail\n3. If dirty: Create cleanup wisp for manual intervention\n\n```bash\n# The handler does this automatically:\n# - For clean state: gt polecat nuke <name> → archive mail\n# - For dirty state: create wisp → process in next step\n```\n\nCleanup wisps are only created when something is wrong (uncommitted changes,\nunpushed commits). Most POLECAT_DONE messages result in immediate nuke.\n\n**MERGED**:\nA branch was merged successfully. This is informational in the ephemeral model\nsince the polecat was already nuked after MR submission.\n\nIf a cleanup wisp exists (dirty state), complete the cleanup:\n```bash\n# Find the cleanup wisp for this polecat\nbd list --wisp --labels=polecat:<name>,state:merge-requested --status=open\n\n# If found, proceed with full polecat nuke:\ngt polecat nuke <name>\n\n# Burn the cleanup wisp\nbd close <wisp-id>\n```\nArchive after cleanup is complete.\n\n**HELP / Blocked**:\nAssess the request. Can you help? If not, escalate to Mayor:\n```bash\ngt mail send mayor/ -s \"Escalation: <polecat> needs help\" -m \"<details>\"\n```\nArchive after handling (escalated or resolved):\n```bash\ngt mail archive <message-id>\n```\n\n**HANDOFF**:\nRead predecessor context. Continue from where they left off.\nArchive after absorbing context:\n```bash\ngt mail archive <message-id>\n```\n\n**SWARM_START**:\nMayor initiating batch polecat work. Initialize swarm tracking.\n```bash\n# Parse swarm info from mail body: {\"swarm_id\": \"batch-123\", \"beads\": [\"bd-a\", \"bd-b\"]}\nbd create --wisp --title \"swarm:<swarm_id>\" --description \"Tracking batch: <swarm_id>\" --labels swarm,swarm_id:<swarm_id>,total:<N>,completed:0,start:<timestamp>\n```\nArchive after creating swarm tracking wisp:\n```bash\ngt mail archive <message-id>\n```\n\n**Hygiene principle**: Archive messages after they're fully processed.\nKeep only: active work, unprocessed requests. Inbox should be near-empty."
id = 'inbox-check'
title = 'Process witness mail'

[[steps]]
description = "Process cleanup wisps (exception handling for dirty polecats).\n\nIn the ephemeral model, cleanup wisps are only created when a polecat has\ndirty state (uncommitted changes, unpushed commits) that prevented immediate\nnuke. Most polecats are nuked immediately on POLECAT_DONE and never create wisps.\n\n```bash\n# Find all cleanup wisps\nbd list --wisp --labels=cleanup --status=open\n```\n\nIf no wisps, skip this step (most common case in ephemeral model).\n\nFor each cleanup wisp, investigate and resolve the dirty state:\n\n## State: pending (needs investigation)\n\n1. **Extract polecat name** from wisp title/labels\n\n2. **Diagnose the problem**:\n```bash\ncd polecats/<name>\ngit status                    # What's uncommitted?\ngit stash list                # Any stashed work?\ngit log origin/main..HEAD     # Any unpushed commits?\n```\n\n3. **Resolution options**:\n   - **Uncommitted changes**: Commit and push, then nuke\n   - **Stashed work**: Pop and commit, or discard if not valuable\n   - **Unpushed commits**: Push to origin, then nuke\n   - **All valuable work lost**: Escalate to Mayor for recovery\n\n4. **If resolvable locally**: Fix and nuke\n```bash\n# Example: push unpushed commits\ngit push origin HEAD\n\n# Then nuke\ngt polecat nuke <name>\n\n# Close the wisp\nbd close <wisp-id> --reason \"Resolved: pushed commits, nuked\"\n```\n\n5. **If needs escalation**: Send RECOVERY_NEEDED to Mayor\n```bash\ngt mail send mayor/ -s \"RECOVERY_NEEDED <rig>/<polecat>\" \\\n  -m \"Cleanup Status: <status>\nBranch: <branch>\nIssue: <issue-id>\n\nCannot auto-resolve. Please advise.\"\n```\nLeave wisp open until Mayor resolves.\n\n## State: merge-requested (legacy, rare)\n\nThis state was used before the ephemeral model. If found, the polecat is\nwaiting for a MERGED signal. The inbox-check step handles these.\n\n**Parallelism**: Use Task tool subagents to process multiple cleanups concurrently.\nEach cleanup is independent - perfect for parallel execution."
id = 'process-cleanups'
needs = ['inbox-check']
title = 'Process pending cleanup wisps'

[[steps]]
description = "Ensure the refinery is alive and processing merge requests.\n\n```bash\n# Check if refinery session exists\ngt session status <rig>/refinery\n\n# Check for pending merge requests\nbd list --type=merge-request --status=open\n```\n\nIf MRs waiting AND refinery not running:\n```bash\ngt session start <rig>/refinery\ngt mail send <rig>/refinery -s \"PATROL: Wake up\" -m \"Merge requests in queue. Please process.\"\n```\n\nIf refinery running but queue stale (>30 min), send nudge."
id = 'check-refinery'
needs = ['process-cleanups']
title = 'Ensure refinery is alive'

[[steps]]
description = "Survey all polecats using agent beads (ZFC: trust what agents report).\n\n**Step 1: List polecat agent beads**\n\n```bash\nbd list --type=agent --json\n```\n\nFilter the JSON output for entries where description contains `role_type: polecat`.\nEach polecat agent bead has fields in its description:\n- `role_type: polecat`\n- `rig: <rig-name>`\n- `agent_state: running|idle|stuck|done`\n- `hook_bead: <current-work-id>`\n\n**Step 2: For each polecat, check agent_state**\n\n| agent_state | Meaning | Action |\n|-------------|---------|--------|\n| running | Actively working | Check progress (Step 3) |\n| idle | No work assigned | Auto-nuke if clean (Step 3a) |\n| stuck | Self-reported stuck | Handle stuck protocol |\n| done | Work complete | Verify cleanup triggered (see Step 4a) |\n\n**Step 3: For running polecats, assess progress**\n\nCheck the hook_bead field to see what they're working on:\n```bash\nbd show <hook_bead>  # See current step/issue\n```\n\nYou can also verify they're responsive:\n```bash\ntmux capture-pane -t gt-<rig>-<name> -p | tail -20\n```\n\nLook for:\n- Recent tool activity → making progress\n- Idle at prompt → may need nudge\n- Error messages → may need help\n\n**Step 3a: For idle polecats, auto-nuke if clean**\n\nWhen agent_state=idle, the polecat has no work assigned. Check if it's safe to nuke:\n\n```bash\n# Check git status in the polecat's worktree\ncd polecats/<name>\ngit status --porcelain         # Should be empty (clean)\ngit log origin/main..HEAD      # Should have no unpushed commits\n```\n\n**If clean** (no uncommitted changes, no unpushed commits):\n```bash\n# Safe to nuke - no work to lose\ngt polecat nuke <name>\n```\nLog the auto-nuke for audit purposes. No escalation needed.\n\n**If dirty** (uncommitted or unpushed work):\n```bash\n# Escalate to Mayor - polecat has work that might be valuable\ngt mail send mayor/ -s \\\"IDLE_DIRTY: <polecat> has uncommitted work\\\" \\\n  -m \\\"Polecat: <name>\nState: idle (no hook_bead)\nGit status: <uncommitted-files>\nUnpushed commits: <count>\n\nPlease advise: recover work or discard?\\\"\n```\n\n**Rationale**: Idle polecats with clean git state are pure overhead. They have\nno work and no state worth preserving. Nuking them immediately frees resources\nand reduces noise. Only escalate when there's actual work at risk.\n\n**Step 4: Decide action**\n\n| Observation | Action |\n|-------------|--------|\n| agent_state=running, recent activity | None |\n| agent_state=running, idle 5-15 min | Gentle nudge |\n| agent_state=running, idle 15+ min | Direct nudge with deadline |\n| agent_state=stuck | Assess and help or escalate |\n| agent_state=done | Verify cleanup triggered (see Step 4a) |\n\n**Step 4a: Handle agent_state=done**\n\nIn the ephemeral model, polecats with agent_state=done and cleanup_status=clean\nshould already be nuked by HandlePolecatDone. Finding one here indicates:\n\n1. **Stale agent bead** - polecat was nuked but bead remains\n   ```bash\n   # Verify polecat doesn't exist anymore\n   ls polecats/<name> 2>/dev/null || echo \"Already nuked\"\n   ```\n   If nuked, the agent bead is stale. Clean it up or ignore.\n\n2. **Cleanup wisp exists** - polecat has dirty state needing intervention\n   ```bash\n   bd list --wisp --labels=polecat:<name> --status=open\n   ```\n   Process in process-cleanups step.\n\n3. **No wisp, polecat exists** - POLECAT_DONE mail was missed\n   Try auto-nuke directly (ephemeral model):\n   ```bash\n   # Check cleanup_status and nuke if clean\n   gt polecat nuke <name>  # Will fail if dirty\n   ```\n   If nuke fails (dirty state), create cleanup wisp for investigation.\n\n**Step 5: Execute nudges**\n```bash\ngt nudge <rig>/polecats/<name> \"How's progress? Need help?\"\n```\n\n**Step 6: Escalate if needed**\n```bash\ngt mail send mayor/ -s \"Escalation: <polecat> stuck\" \\\n  -m \"Polecat <name> reports stuck. Please intervene.\"\n```\n\n**Parallelism**: Use Task tool subagents to inspect multiple polecats concurrently.\n\n**ZFC Principle**: Trust agent_state from beads. Don't infer state from PID/tmux."
id = 'survey-workers'
needs = ['check-refinery']
title = 'Inspect all active polecats'

[[steps]]
description = "Check for expired timer gates and escalate as needed.\n\nTimer gates are async wait conditions with a timeout. When the timeout expires,\nthe gate should be escalated to the overseer for human intervention.\n\n**Step 1: Run timer gate check**\n```bash\nbd gate check --type=timer --escalate\n```\n\nThis command:\n1. Finds all open gate issues with await_type=timer\n2. Checks if `now > created_at + timeout`\n3. Escalates expired gates via `gt escalate` (HIGH severity)\n4. Reports summary of gate status\n\n**Step 2: Review output**\n\nIf expired gates were found and escalated:\n- The escalation creates an audit trail bead\n- Overseer will be notified via mail\n- Gate remains open until manually resolved\n\nIf no expired gates:\n- Continue patrol normally\n\n**Note**: Timer gates do NOT auto-close on expiration. They escalate.\nThis ensures human oversight of timeout conditions.\n\n**Parallelism**: This is a single command, no parallel execution needed."
id = 'check-timer-gates'
needs = ['survey-workers']
title = 'Check timer gates for expiration'

[[steps]]
description = "If Mayor started a batch (SWARM_START), check if all polecats have completed.\n\n**Step 1: Find active swarm tracking wisps**\n```bash\nbd list --wisp --labels=swarm --status=open\n```\nIf no active swarm, skip this step.\n\n**Step 2: Count completed polecats for this swarm**\n\nExtract from wisp labels: swarm_id, total, completed, start timestamp.\nCheck how many cleanup wisps have been closed for this swarm's polecats.\n\n**Step 3: If all complete, notify Mayor**\n```bash\ngt mail send mayor/ -s \"SWARM_COMPLETE: <swarm_id>\" -m \"All <total> polecats merged.\nDuration: <minutes> minutes\nSwarm: <swarm_id>\"\n\n# Close the swarm tracking wisp\nbd close <swarm-wisp-id> --reason \"All polecats merged\"\n```\n\nNote: Runs every patrol cycle. Notification sent exactly once when all complete."
id = 'check-swarm-completion'
needs = ['check-timer-gates']
title = 'Check if active swarm is complete'

[[steps]]
description = "Send WITNESS_PING to Deacon for second-order monitoring.\n\nThe Witness fleet collectively monitors Deacon health - this prevents the\n\"who watches the watchers\" problem. If Deacon dies, Witnesses detect it.\n\n**Step 1: Send ping**\n```bash\ngt mail send deacon/ -s \"WITNESS_PING <rig>\" -m \"Rig: <rig>\nTimestamp: $(date -u +%Y-%m-%dT%H:%M:%SZ)\nPatrol: <cycle-number>\"\n```\n\n**Step 2: Check Deacon health**\n```bash\n# Check Deacon agent bead for last_activity\nbd list --type=agent --json | jq '.[] | select(.description | contains(\"deacon\"))'\n```\n\nLook at the `last_activity` timestamp. If stale (>5 minutes since last update):\n- Deacon may be dead or stuck\n\n**Step 3: Escalate if needed**\n```bash\n# If Deacon appears down\ngt mail send mayor/ -s \"ALERT: Deacon appears unresponsive\" -m \"No Deacon activity for >5 minutes.\nLast seen: <timestamp>\nWitness: <rig>/witness\"\n```\n\nNote: Multiple Witnesses may send this alert. Mayor should handle deduplication."
id = 'ping-deacon'
needs = ['check-swarm-completion']
title = 'Ping Deacon for health check'

[[steps]]
description = "Verify inbox hygiene before ending patrol cycle.\n\n**Step 1: Check inbox state**\n```bash\ngt mail inbox\n```\n\nIn the ephemeral model, most POLECAT_DONE messages are handled immediately\n(auto-nuke) and archived. Inbox should contain ONLY:\n- Unprocessed messages (just arrived, will handle next cycle)\n- MERGED notifications (informational, archive after reading)\n\n**Step 2: Archive any stale messages**\n\nLook for messages that were processed but not archived:\n- POLECAT_STARTED older than this cycle → archive\n- POLECAT_DONE that was auto-nuked → should be archived already\n- MERGED notifications → archive after acknowledging\n- HELP/Blocked that was escalated → archive\n- SWARM_START that created tracking wisp → archive\n\n```bash\n# For each stale message found:\ngt mail archive <message-id>\n```\n\n**Step 3: Verify cleanup wisp hygiene**\n\nIn the ephemeral model, cleanup wisps should be rare (only for dirty polecats):\n```bash\nbd list --wisp --labels=cleanup --status=open\n```\n\n- state:pending → Needs investigation in process-cleanups\n- state:merge-requested → Legacy state, handle in inbox-check\n\nIf cleanup wisps are accumulating, investigate why polecats aren't clean.\n\n**Goal**: Inbox should be nearly empty. Cleanup wisps should be rare."
id = 'patrol-cleanup'
needs = ['ping-deacon']
title = 'End-of-cycle inbox hygiene'

[[steps]]
description = "Check own context usage.\n\nIf context is HIGH (>80%):\n- Ensure any notes are written to handoff mail\n- Prepare for session restart\n\nIf context is LOW:\n- Can continue patrolling"
id = 'context-check'
needs = ['patrol-cleanup']
title = 'Check own context limit'

[[steps]]
description = "End of patrol cycle decision.\n\n**If context LOW** (can continue patrolling):\n1. Generate a brief summary of this patrol cycle\n2. Squash the current wisp:\n```bash\nbd mol squash <mol-id> --summary \"<patrol-summary>\"\n```\n3. Create a new patrol wisp:\n```bash\nbd mol wisp mol-witness-patrol\n```\n4. Continue executing from the inbox-check step of the new wisp\n\n**If context HIGH** (approaching limit):\n1. Write handoff mail with notable observations:\n```bash\ngt handoff -s \"Witness patrol handoff\" -m \"<observations>\"\n```\n2. Exit cleanly - the daemon will respawn a fresh Witness session\n\n**IMPORTANT**: You must either create a new wisp (context LOW) or exit (context HIGH).\nNever leave the session idle without work on your hook."
id = 'loop-or-exit'
needs = ['context-check']
title = 'Loop or exit for respawn'