gastown/.beads/formulas/mol-deacon-patrol.formula.toml
Steve Yegge 5838d4cd1b Witness pings Deacon for second-order monitoring (gt-5v8ls)
Added WITNESS_PING protocol for monitoring Deacon health:

Witness patrol (mol-witness-patrol):
- Added ping-deacon step after survey-workers
- Sends WITNESS_PING mail to Deacon each patrol cycle
- Checks Deacon agent bead last_activity timestamp
- Escalates to Mayor if Deacon appears unresponsive

Deacon patrol (mol-deacon-patrol):
- Added WITNESS_PING handling in inbox-check
- Added second-order monitoring section to description
- Bumped formula version to 2

This prevents the "who watches the watchers" problem - if Deacon dies,
the collective Witness fleet detects it and escalates.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 10:02:55 -08:00

291 lines
8.3 KiB
TOML

description = """
Mayor's daemon patrol loop.
The Deacon is the Mayor's background process that runs continuously, handling callbacks, monitoring rig health, and performing cleanup. Each patrol cycle runs these steps in sequence, then loops or exits.
## Second-Order Monitoring
Witnesses send WITNESS_PING messages to verify the Deacon is alive. This
prevents the "who watches the watchers" problem - if the Deacon dies,
Witnesses detect it and escalate to the Mayor.
The Deacon's agent bead last_activity timestamp is updated during each patrol
cycle. Witnesses check this timestamp to verify health."""
formula = "mol-deacon-patrol"
version = 2
[[steps]]
id = "inbox-check"
title = "Handle callbacks from agents"
description = """
Handle callbacks from agents.
Check the Mayor's inbox for messages from:
- Witnesses reporting polecat status
- Refineries reporting merge results
- Polecats requesting help or escalation
- External triggers (webhooks, timers)
```bash
gt mail inbox
# For each message:
gt mail read <id>
# Handle based on message type
```
**WITNESS_PING**:
Witnesses periodically ping to verify that the Deacon is alive. Simply acknowledge
and mark as read: the fact that you're processing mail proves you're running.
Your agent bead last_activity is updated automatically during patrol.
**HELP / Escalation**:
Assess and handle or forward to Mayor.
**LIFECYCLE messages**:
Polecats reporting completion, refineries reporting merge results.
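
The per-type handling above can be sketched as a small dispatcher. This is an illustration only: the function name `handle_message_type` is hypothetical, and every type string other than WITNESS_PING is an assumed label, not a confirmed protocol value.

```bash
# Hypothetical dispatcher, not part of gt: map a message type to the
# handling described above. Type strings other than WITNESS_PING are assumed.
handle_message_type() {
  case "$1" in
    WITNESS_PING)    echo "acknowledge and mark as read" ;;
    HELP|ESCALATION) echo "assess, then handle or forward to Mayor" ;;
    LIFECYCLE*)      echo "process completion or merge result" ;;
    *)               echo "unknown type: read and judge manually" ;;
  esac
}

handle_message_type WITNESS_PING   # prints: acknowledge and mark as read
```

Unknown types fall through to a manual judgment call rather than being dropped.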
Callbacks may spawn new polecats, update issue state, or trigger other actions."""
[[steps]]
id = "trigger-pending-spawns"
title = "Nudge newly spawned polecats"
needs = ["inbox-check"]
description = """
Nudge newly spawned polecats that are ready for input.
When a polecat is spawned, its Claude session takes 10-20 seconds to initialize. The spawn command returns immediately without waiting. This step finds spawned polecats that are now ready and sends them a trigger to start working.
**ZFC-Compliant Observation** (AI observes AI):
```bash
# View pending spawns with captured terminal output
gt deacon pending
```
For each pending session, analyze the captured output:
- Look for Claude's prompt indicator "> " at the start of a line
- If prompt is visible, Claude is ready for input
- Make the judgment call yourself - you're the AI observer
For each ready polecat:
```bash
# 1. Trigger the polecat
gt nudge <session> "Begin."
# 2. Clear from pending list
gt deacon pending <session>
```
This triggers the UserPromptSubmit hook, which injects mail so the polecat sees its assignment.
**Bootstrap mode** (daemon-only, no AI available):
The daemon uses `gt deacon trigger-pending` with regex detection. This ZFC violation is acceptable during cold startup when no AI agent is running yet."""
[[steps]]
id = "gate-evaluation"
title = "Evaluate pending async gates"
needs = ["inbox-check"]
description = """
Evaluate pending async gates.
Gates are async coordination primitives that block until conditions are met.
The Deacon is responsible for monitoring gates and closing them when ready.
**Timer gates** (await_type: timer):
Check if elapsed time since creation exceeds the timeout duration.
```bash
# List all open gates
bd gate list --json
# For each timer gate, check if elapsed:
# - CreatedAt + Timeout < Now → gate is ready to close
# - Close with: bd gate close <id> --reason "Timer elapsed"
```
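
The elapsed check for timer gates can be sketched as follows. This is a minimal illustration, assuming CreatedAt has already been converted to epoch seconds and Timeout to a number of seconds. The function name `timer_gate_ready` is hypothetical and not part of `bd`.

```bash
# Hypothetical helper, not part of bd: decide whether a timer gate has elapsed.
# Arguments: CreatedAt (epoch seconds), Timeout (seconds), Now (epoch seconds).
timer_gate_ready() {
  # The gate is ready to close when CreatedAt + Timeout < Now
  if [ $(( $1 + $2 )) -lt "$3" ]; then
    echo "ready"
  else
    echo "waiting"
  fi
}

# Gate created at t=1000 with a 300s timeout, checked at t=1400
timer_gate_ready 1000 300 1400   # prints: ready
```

The comparison is strict, matching the rule in the comments above: a gate checked at exactly CreatedAt + Timeout is still reported as waiting.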
**GitHub gates** (await_type: gh:run, gh:pr) - handled in a separate step.
**Human/Mail gates** - require external input, skip here.
Each gate's Waiters field lists the mail addresses to notify.
After closing a gate, send a brief notification to each waiter that the gate has cleared."""
[[steps]]
id = "health-scan"
title = "Check Witness and Refinery health"
needs = ["trigger-pending-spawns", "gate-evaluation"]
description = """
Check Witness and Refinery health for each rig.
**ZFC Principle**: You (Claude) make the judgment call about what is "stuck" or "unresponsive" - there are no hardcoded thresholds in Go. Read the signals, consider context, and decide.
For each rig, run:
```bash
gt witness status <rig>
gt refinery status <rig>
```
**Signals to assess:**
| Component | Healthy Signals | Concerning Signals |
|-----------|-----------------|-------------------|
| Witness | State: running, recent activity | State: not running, no heartbeat |
| Refinery | State: running, queue processing | Queue stuck, merge failures |
**Tracking unresponsive cycles:**
Maintain in your patrol state (persisted across cycles):
```
health_state:
  <rig>:
    witness:
      unresponsive_cycles: 0
      last_seen_healthy: <timestamp>
    refinery:
      unresponsive_cycles: 0
      last_seen_healthy: <timestamp>
```
**Decision matrix** (you decide the thresholds based on context):
| Cycles Unresponsive | Suggested Action |
|---------------------|------------------|
| 1-2 | Note it, check again next cycle |
| 3-4 | Attempt restart: gt witness restart <rig> or gt refinery restart <rig> |
| 5+ | Escalate to Mayor with context |
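
As a rough sketch, the suggested defaults in the table map onto a cycle count like this. The function name `suggested_action` and its output labels are hypothetical. The cutoffs only mirror the table, so under the ZFC principle they remain a starting point for your own judgment rather than fixed thresholds.

```bash
# Hypothetical helper: map an unresponsive-cycle count to the suggested action.
# Cutoffs mirror the table above and are defaults only; adjust them to context.
suggested_action() {
  if [ "$1" -le 0 ]; then
    echo "healthy"      # component responded normally
  elif [ "$1" -le 2 ]; then
    echo "note"         # 1-2 cycles: check again next cycle
  elif [ "$1" -le 4 ]; then
    echo "restart"      # 3-4 cycles: attempt a restart
  else
    echo "escalate"     # 5+ cycles: escalate to Mayor with context
  fi
}

suggested_action 3   # prints: restart
```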
**Restart commands:**
```bash
gt witness restart <rig>
gt refinery restart <rig>
```
**Escalation:**
```bash
gt mail send mayor/ -s "Health: <rig> <component> unresponsive" \\
-m "Component has been unresponsive for N cycles. Restart attempts failed.
Last healthy: <timestamp>
Error signals: <details>"
```
Reset unresponsive_cycles to 0 when component responds normally."""
[[steps]]
id = "plugin-run"
title = "Execute registered plugins"
needs = ["health-scan"]
description = """
Execute registered plugins.
Scan ~/gt/plugins/ for plugin directories. Each plugin has a plugin.md with YAML frontmatter defining its gate (when to run) and instructions (what to do).
See docs/deacon-plugins.md for full documentation.
Gate types:
- cooldown: Time since last run (e.g., 24h)
- cron: Schedule-based (e.g., "0 9 * * *")
- condition: Metric threshold (e.g., wisp count > 50)
- event: Trigger-based (e.g., startup, heartbeat)
For each plugin:
1. Read plugin.md frontmatter to check gate
2. Compare against state.json (last run, etc.)
3. If gate is open, execute the plugin
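
For a cooldown gate, the comparison in step 2 might look like the sketch below. It assumes the last run time from state.json is available as epoch seconds and that cooldowns use a single s, m, h, or d suffix. Both helper names are hypothetical, and the real state.json layout is not specified here.

```bash
# Hypothetical helpers for a cooldown gate; the names and the supported
# duration suffixes (s, m, h, d) are assumptions for illustration.
duration_to_seconds() {
  value="${1%%[!0-9]*}"   # leading digits, e.g. 24
  unit="${1##*[0-9]}"     # trailing unit, e.g. h
  [ -n "$value" ] || return 1
  case "$unit" in
    s) echo "$value" ;;
    m) echo $((value * 60)) ;;
    h) echo $((value * 3600)) ;;
    d) echo $((value * 86400)) ;;
    *) return 1 ;;
  esac
}

# Arguments: last run (epoch seconds), cooldown (e.g. 24h), now (epoch seconds).
cooldown_gate_open() {
  cooldown_secs="$(duration_to_seconds "$2")" || return 2
  if [ $(( $3 - $1 )) -ge "$cooldown_secs" ]; then
    echo "open"
  else
    echo "closed"
  fi
}

# Last run at t=0, cooldown 24h, checked 25 hours later
cooldown_gate_open 0 24h 90000   # prints: open
```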
Plugins marked parallel: true can run concurrently using Task tool subagents. Sequential plugins run one at a time in directory order.
Skip this step if ~/gt/plugins/ does not exist or is empty."""
[[steps]]
id = "orphan-check"
title = "Find abandoned work"
needs = ["health-scan"]
description = """
Find abandoned work.
Scan for orphaned state:
- Issues marked in_progress with no active polecat
- Polecats that stopped responding mid-work
- Merge queue entries with no polecat owner
- Wisp sessions that outlived their spawner
```bash
bd list --status=in_progress
gt polecats --all --orphan
```
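
The first kind of orphan, an in_progress issue with no active polecat, amounts to a set difference. The sketch below assumes both lists have already been reduced to space-separated issue IDs. The function name `find_orphans` and the sample IDs are hypothetical, and the actual output formats of the commands above are not assumed.

```bash
# Hypothetical helper: print each in_progress issue ID that no active
# polecat owns. Both arguments are space-separated lists of issue IDs.
find_orphans() {
  for id in $1; do
    case " $2 " in
      *" $id "*) ;;        # an active polecat owns this issue
      *) echo "$id" ;;     # no owner: candidate orphan
    esac
  done
}

# Three issues in progress, two of them owned by active polecats
find_orphans "gt-a1 gt-b2 gt-c3" "gt-a1 gt-c3"   # prints: gt-b2
```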
For each orphan:
- Check if polecat session still exists
- If not, mark issue for reassignment or retry
- File incident beads if data loss occurred"""
[[steps]]
id = "session-gc"
title = "Clean dead sessions"
needs = ["orphan-check"]
description = """
Clean dead sessions and orphaned state.
Run `gt doctor --fix` to handle all cleanup:
```bash
# Preview what needs cleaning
gt doctor -v
# Fix everything
gt doctor --fix
```
This handles:
- **orphan-sessions**: Kill orphaned tmux sessions (gt-* not matching valid patterns)
- **orphan-processes**: Kill orphaned Claude processes (no tmux parent)
- **wisp-gc**: Garbage collect abandoned wisps (>1h old)
All cleanup is handled by doctor checks - no need to run separate commands."""
[[steps]]
id = "context-check"
title = "Check own context limit"
needs = ["session-gc"]
description = """
Check own context limit.
The Deacon runs in a Claude session with finite context. Check if approaching the limit:
```bash
gt context --usage
```
If context is high (>80%), prepare for handoff:
- Summarize current state
- Note any pending work
- Write handoff to molecule state
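
The threshold check itself is simple. The sketch below assumes the usage figure has already been extracted as an integer percentage, since the output format of the usage command is not specified here. The function name `context_action` is hypothetical.

```bash
# Hypothetical helper: decide whether to prepare a handoff.
# Argument: context usage as an integer percentage (0-100).
context_action() {
  if [ "$1" -gt 80 ]; then
    echo "handoff"     # above 80%: summarize state and write handoff
  else
    echo "continue"    # enough headroom for another cycle
  fi
}

context_action 85   # prints: handoff
```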
The handoff enables the Deacon to burn and respawn cleanly."""
[[steps]]
id = "loop-or-exit"
title = "Burn and respawn or loop"
needs = ["context-check"]
description = """
Loop for another patrol cycle, or exit and let the daemon respawn a fresh Deacon if context is high.
Decision point at end of patrol cycle:
If context is LOW:
- Sleep briefly (avoid tight loop)
- Return to inbox-check step
If context is HIGH:
- Write state to persistent storage
- Exit cleanly
- Let the daemon orchestrator respawn a fresh Deacon
The daemon ensures Deacon is always running:
```bash
# Daemon respawns on exit
gt daemon status
```
This enables infinite patrol duration via context-aware respawning."""