Added WITNESS_PING protocol for monitoring Deacon health:

Witness patrol (mol-witness-patrol):
- Added ping-deacon step after survey-workers
- Sends WITNESS_PING mail to Deacon each patrol cycle
- Checks Deacon agent bead last_activity timestamp
- Escalates to Mayor if Deacon appears unresponsive

Deacon patrol (mol-deacon-patrol):
- Added WITNESS_PING handling in inbox-check
- Added second-order monitoring section to description
- Bumped formula version to 2

This prevents the "who watches the watchers" problem - if Deacon dies,
the collective Witness fleet detects it and escalates.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
326 lines
9.1 KiB
TOML
description = """
Per-rig worker monitor patrol loop.

The Witness is the Pit Boss for your rig. You watch polecats, nudge them toward
completion, verify clean git state before kills, and escalate stuck workers.

**You do NOT do implementation work.** Your job is oversight, not coding.

## Design Philosophy

This patrol follows Gas Town principles:
- **Discovery over tracking**: Observe reality each cycle, don't maintain state
- **Events over state**: POLECAT_DONE mail triggers cleanup wisps
- **Cleanup wisps as finalizers**: Pending cleanups are wisps, not queue entries
- **Task tool for parallelism**: Subagents inspect polecats, not molecule arms

## Patrol Shape (Linear, Deacon-style)

```
inbox-check ─► process-cleanups ─► check-refinery ─► survey-workers
                                                           │
        ┌──────────────────────────────────────────────────┘
        ▼
ping-deacon ─► context-check ─► loop-or-exit
```

No dynamic arms. No fanout gates. No persistent nudge counters.
State is discovered each cycle from reality (tmux, beads, mail)."""
formula = "mol-witness-patrol"
version = 1

[[steps]]
id = "inbox-check"
title = "Process witness mail"
description = """
Check inbox and handle messages.

```bash
gt mail inbox
```

For each message:

**POLECAT_DONE / LIFECYCLE:Shutdown**:
Create a cleanup wisp for this polecat:
```bash
bd create --wisp --title "cleanup:<polecat>" \
  --description "Verify and cleanup polecat <name>" \
  --labels cleanup,polecat:<name>,state:pending
```
The wisp's existence IS the pending cleanup. Process in next step.
Mark mail as read.

**MERGED**:
A branch was merged successfully. Complete the cleanup.
```bash
# Find the cleanup wisp for this polecat
bd list --wisp --labels=polecat:<name>,state:merge-requested --status=open

# If found, proceed with polecat removal
gt session kill <rig>/polecats/<name>

# Burn the cleanup wisp
bd close <wisp-id>
```
Mark mail as read.

**HELP / Blocked**:
Assess the request. Can you help? If not, escalate to Mayor:
```bash
gt mail send mayor/ -s "Escalation: <polecat> needs help" -m "<details>"
```

**HANDOFF**:
Read predecessor context. Continue from where they left off."""

[[steps]]
id = "process-cleanups"
title = "Process pending cleanup wisps"
needs = ["inbox-check"]
description = """
Find and process cleanup wisps (the finalizer pattern).

```bash
# Find all cleanup wisps
bd list --wisp --labels=cleanup --status=open
```

For each cleanup wisp, check its state label:

## State: pending (needs verification → MERGE_READY)

1. **Extract polecat name** from wisp title/labels

2. **Pre-kill verification**:
```bash
cd polecats/<name>
git status  # Must be clean
git log origin/main..HEAD  # Commits should be pushed
bd show <assigned-issue>  # Issue closed or deferred
```

3. **Get branch and issue info**:
```bash
# Get current branch
git rev-parse --abbrev-ref HEAD

# Get the hook_bead from agent bead
bd show <agent-bead>  # Look for hook_bead field
```

4. **Verify productive work** (ZFC - you make the call):
- Check git log for commits mentioning the issue
- Legitimate exceptions: already fixed, duplicate, deferred
- If closing as 'done' with no commits, flag for review

5. **If clean**: Send MERGE_READY to refinery
```bash
gt mail send <rig>/refinery -s "MERGE_READY <polecat>" -m "Branch: <branch>
Issue: <issue-id>
Polecat: <polecat>
Verified: clean git state, issue closed"
```
Then update the wisp to merge-requested state:
```bash
bd update <wisp-id> --labels cleanup,polecat:<name>,state:merge-requested
```
**Do NOT kill the polecat yet** - wait for MERGED confirmation from refinery.

6. **If dirty**: Leave wisp open, log the issue, retry next cycle.
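
For item 1, pulling the polecat name and state out of a wisp's labels could be sketched as below. This assumes the labels arrive as the comma-separated string used in the `bd create` call (`cleanup,polecat:<name>,state:<state>`); `label_value` is a hypothetical helper, not a `bd` command, and the sample label string is made up.

```bash
# Hypothetical helper: print the value of a "prefix:value" label taken from
# a comma-separated label string. Assumes the label format shown above.
label_value() {
  local prefix="$1" labels="$2" label
  local old_ifs="$IFS"
  IFS=','
  for label in $labels; do
    case "$label" in
      "$prefix:"*)
        IFS="$old_ifs"
        echo "${label#"$prefix:"}"
        return 0 ;;
    esac
  done
  IFS="$old_ifs"
  return 1  # no label with that prefix
}

labels="cleanup,polecat:nux,state:pending"  # sample value for illustration
label_value polecat "$labels"  # prints: nux
label_value state "$labels"    # prints: pending
```

The state value decides which of the two sections in this step applies to the wisp.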

## State: merge-requested (waiting for refinery)

Skip - waiting for MERGED mail from refinery. The inbox-check step handles
MERGED messages and completes these cleanup wisps.

**Parallelism**: Use Task tool subagents to process multiple cleanups concurrently.
Each cleanup is independent - perfect for parallel execution."""

[[steps]]
id = "check-refinery"
title = "Ensure refinery is alive"
needs = ["process-cleanups"]
description = """
Ensure the refinery is alive and processing merge requests.

```bash
# Check if refinery session exists
gt session status <rig>/refinery

# Check for pending merge requests
bd list --type=merge-request --status=open
```

If MRs waiting AND refinery not running:
```bash
gt session start <rig>/refinery
gt mail send <rig>/refinery -s "PATROL: Wake up" \
  -m "Merge requests in queue. Please process."
```

If refinery running but queue stale (>30 min), send nudge."""

[[steps]]
id = "survey-workers"
title = "Inspect all active polecats"
needs = ["check-refinery"]
description = """
Survey all polecats using agent beads (ZFC: trust what agents report).

**Step 1: List polecat agent beads**

```bash
bd list --type=agent --json
```

Filter the JSON output for entries where description contains `role_type: polecat`.
Each polecat agent bead has fields in its description:
- `role_type: polecat`
- `rig: <rig-name>`
- `agent_state: running|idle|stuck|done`
- `hook_bead: <current-work-id>`
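
Reading one of these fields out of a bead description could look like the sketch below. It assumes the description holds one `key: value` pair per line, as the list above suggests; `bead_field` is a hypothetical helper (not part of `bd`) and the sample description is made up.

```bash
# Hypothetical helper: print the value of a "key: value" line found in a
# bead description. Assumes one field per line; not a bd feature.
bead_field() {
  local key="$1" description="$2" line
  while IFS= read -r line; do
    case "$line" in
      "$key: "*)
        echo "${line#"$key: "}"
        return 0 ;;
    esac
  done <<EOF
$description
EOF
  return 1  # field not present
}

# Sample description text, for illustration only.
sample="role_type: polecat
rig: demo-rig
agent_state: stuck
hook_bead: gt-123"

bead_field agent_state "$sample"  # prints: stuck
```

The value returned for `agent_state` is what Step 2 keys on.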

**Step 2: For each polecat, check agent_state**

| agent_state | Meaning | Action |
|-------------|---------|--------|
| running | Actively working | Check progress (Step 3) |
| idle | No work assigned | Skip (no action needed) |
| stuck | Self-reported stuck | Handle stuck protocol |
| done | Work complete | Verify cleanup triggered (see Step 4a) |

**Step 3: For running polecats, assess progress**

Check the hook_bead field to see what they're working on:
```bash
bd show <hook_bead>  # See current step/issue
```

You can also verify they're responsive:
```bash
tmux capture-pane -t gt-<rig>-<name> -p | tail -20
```

Look for:
- Recent tool activity → making progress
- Idle at prompt → may need nudge
- Error messages → may need help

**Step 4: Decide action**

| Observation | Action |
|-------------|--------|
| agent_state=running, recent activity | None |
| agent_state=running, idle 5-15 min | Gentle nudge |
| agent_state=running, idle 15+ min | Direct nudge with deadline |
| agent_state=stuck | Assess and help or escalate |
| agent_state=done | Verify cleanup triggered (see Step 4a) |
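
The decision table can be written as a small function. This is only a sketch: `nudge_action` and its action names are hypothetical, idle time is passed in as whole minutes, and two boundary choices are assumptions the table does not spell out (under 5 minutes counts as "recent activity", and exactly 15 minutes falls in the "15+" row).

```bash
# Hypothetical helper mapping (agent_state, idle minutes) to an action.
# Assumptions: idle < 5 min is "recent activity"; 15 min counts as "15+".
nudge_action() {
  local state="$1" idle_min="${2:-0}"
  case "$state" in
    running)
      if [ "$idle_min" -lt 5 ]; then
        echo "none"
      elif [ "$idle_min" -lt 15 ]; then
        echo "gentle-nudge"
      else
        echo "direct-nudge"
      fi ;;
    stuck)  echo "assess-or-escalate" ;;
    "done") echo "verify-cleanup" ;;
    idle)   echo "skip" ;;
    *)      echo "unknown" ;;
  esac
}

nudge_action running 12  # prints: gentle-nudge
```

The idle duration itself still has to come from observation each cycle, since the patrol keeps no persistent nudge counters.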

**Step 4a: Handle agent_state=done**

Check if a cleanup wisp exists for this polecat:
```bash
bd list --wisp --labels=polecat:<name> --status=open
```

If cleanup wisp exists:
- state:pending → Will be processed in process-cleanups
- state:merge-requested → Waiting for refinery MERGED response

If NO cleanup wisp exists (POLECAT_DONE mail missed):
Create one to trigger the cleanup flow:
```bash
bd create --wisp --title "cleanup:<polecat>" \
  --description "Discovered done polecat without cleanup wisp" \
  --labels cleanup,polecat:<name>,state:pending
```
This ensures done polecats eventually get cleaned up even if mail was lost.

**Step 5: Execute nudges**
```bash
gt nudge <rig>/polecats/<name> "How's progress? Need help?"
```

**Step 6: Escalate if needed**
```bash
gt mail send mayor/ -s "Escalation: <polecat> stuck" \\
  -m "Polecat <name> reports stuck. Please intervene."
```

**Parallelism**: Use Task tool subagents to inspect multiple polecats concurrently.

**ZFC Principle**: Trust agent_state from beads. Don't infer state from PID/tmux."""

[[steps]]
id = "ping-deacon"
title = "Ping Deacon for health check"
needs = ["survey-workers"]
description = """
Send WITNESS_PING to Deacon for second-order monitoring.

The Witness fleet collectively monitors Deacon health - this prevents the
"who watches the watchers" problem. If Deacon dies, Witnesses detect it.

**Step 1: Send ping**
```bash
gt mail send deacon/ -s "WITNESS_PING <rig>" -m "Rig: <rig>
Timestamp: $(date -u +%Y-%m-%dT%H:%M:%SZ)
Patrol: <cycle-number>"
```

**Step 2: Check Deacon health**
```bash
# Check Deacon agent bead for last_activity
bd list --type=agent --json | jq '.[] | select(.description | contains("deacon"))'
```

Look at the `last_activity` timestamp. If it is stale (more than 5 minutes since
the last update), the Deacon may be dead or stuck.
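
The staleness test amounts to comparing `last_activity` against the current time. A minimal sketch, assuming the timestamp uses the same ISO-8601 UTC form as the ping above and that GNU `date` (with its `-d` option) is available; `is_stale` and its arguments are hypothetical:

```bash
# Hypothetical helper: succeed (exit 0) when last_activity lies more than
# max_age seconds before now_epoch. Requires GNU date for the -d option.
is_stale() {
  local last_activity="$1" now_epoch="$2" max_age="${3:-300}"
  local last_epoch
  last_epoch=$(date -u -d "$last_activity" +%s) || return 2  # unparseable
  [ $((now_epoch - last_epoch)) -gt "$max_age" ]
}

# Fixed "now" so the example is reproducible; in a patrol use: date -u +%s
now=$(date -u -d "2025-01-01T12:10:00Z" +%s)
if is_stale "2025-01-01T12:00:00Z" "$now"; then
  echo "Deacon stale: escalate"
else
  echo "Deacon healthy"
fi
```

With the sample values the gap is 600 seconds, which exceeds the 300-second default, so the stale branch is taken.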

**Step 3: Escalate if needed**
```bash
# If Deacon appears down
gt mail send mayor/ -s "ALERT: Deacon appears unresponsive" \
  -m "No Deacon activity for >5 minutes.
Last seen: <timestamp>
Witness: <rig>/witness"
```

Note: Multiple Witnesses may send this alert. Mayor should handle deduplication."""

[[steps]]
id = "context-check"
title = "Check own context limit"
needs = ["ping-deacon"]
description = """
Check own context usage.

If context is HIGH (>80%):
- Ensure any notes are written to handoff mail
- Prepare for session restart

If context is LOW:
- Can continue patrolling"""

[[steps]]
id = "loop-or-exit"
title = "Loop or exit for respawn"
needs = ["context-check"]
description = """
End of patrol cycle decision.

**If context LOW**:
- Sleep briefly to avoid tight loop (30-60 seconds)
- Return to inbox-check step
- Continue patrolling

**If context HIGH**:
- Write handoff mail to self with any notable observations:
```bash
gt handoff -s "Witness patrol handoff" -m "<observations>"
```
- Exit cleanly (daemon respawns fresh Witness)
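
The end-of-cycle decision can be sketched as a single branch on context usage. This assumes usage is available as an integer percentage and that anything at or below the 80% threshold from context-check counts as LOW; `next_patrol_step` and the values it prints are hypothetical names for illustration.

```bash
# Hypothetical helper: decide what follows a patrol cycle.
# Assumption: context usage is an integer percentage and >80 means HIGH.
next_patrol_step() {
  local context_pct="$1"
  if [ "$context_pct" -gt 80 ]; then
    echo "handoff-and-exit"  # write handoff mail, then exit for respawn
  else
    echo "inbox-check"       # sleep 30-60 seconds, then start a new cycle
  fi
}

next_patrol_step 45  # prints: inbox-check
next_patrol_step 92  # prints: handoff-and-exit
```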

The daemon ensures Witness is always running."""