docs: improve help text and add nudge documentation

Polish help text across all agent commands to clarify roles:
- crew: persistent workspaces vs ephemeral polecats
- deacon: town-level watchdog receiving heartbeats
- dog: cross-rig infrastructure workers (cats vs dogs)
- mayor: Chief of Staff for cross-rig coordination
- nudge: universal synchronous messaging API
- polecat: ephemeral one-task workers, self-cleaning
- refinery: merge queue serializer per rig
- witness: per-rig polecat health monitor

Add comprehensive gt nudge documentation to crew template explaining
when to use nudge vs mail, common patterns, and target shortcuts.

Add orphan-process-cleanup step to deacon patrol formula to clean up
claude subagent processes that fail to exit (TTY = "?").

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 02:55:24 -08:00
parent bd655f58f9
commit 7ff87ff012
11 changed files with 191 additions and 52 deletions
--- a/internal/formula/formulas/mol-deacon-patrol.formula.toml
+++ b/internal/formula/formulas/mol-deacon-patrol.formula.toml
@@ -84,10 +84,46 @@ Callbacks may spawn new polecats, update issue state, or trigger other actions.
 **Hygiene principle**: Archive messages after they're fully processed.
 Keep inbox near-empty - only unprocessed items should remain."""

+[[steps]]
+id = "orphan-process-cleanup"
+title = "Clean up orphaned claude subagent processes"
+needs = ["inbox-check"]
+description = """
+Clean up orphaned claude subagent processes.
+
+Claude Code's Task tool spawns subagent processes that sometimes fail to exit
+after their task completes. These orphans accumulate and consume significant memory.
+
+**Detection method:**
+Orphaned processes have no controlling terminal (TTY = "?"). Legitimate claude
+instances in terminals have a TTY like "pts/0".
+
+**Run cleanup:**
+```bash
+gt deacon cleanup-orphans
+```
+
+This command:
+1. Lists all claude/codex processes with `ps -eo pid,tty,comm`
+2. Filters for TTY = "?" (no controlling terminal)
+3. Sends SIGTERM to each orphaned process
+4. Reports how many were killed
+
+**Why this is safe:**
+- Processes in terminals (your personal sessions) have a TTY - they won't be touched
+- Only kills processes that have no controlling terminal
+- These orphans are children of the tmux server with no TTY, indicating they're
+  detached subagents that failed to exit
+
+**If cleanup fails:**
+Log the error but continue patrol - this is best-effort cleanup.
+
+**Exit criteria:** Orphan cleanup attempted (success or logged failure)."""
+
 [[steps]]
 id = "trigger-pending-spawns"
 title = "Nudge newly spawned polecats"
-needs = ["inbox-check"]
+needs = ["orphan-process-cleanup"]
 description = """
 Nudge newly spawned polecats that are ready for input.

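The four steps listed for `gt deacon cleanup-orphans` (list, filter on TTY = "?", SIGTERM, report) can be approximated in plain shell. This is an illustrative sketch only, not the command's actual implementation: the function names `find_orphans` and `cleanup_orphans` are hypothetical, matching on the exact command names `claude` and `codex` is an assumption taken from the step description, and the `"?"` TTY marker assumes Linux procps `ps` (BSD/macOS `ps` prints `??` instead).

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the orphan-cleanup steps; NOT the real
# `gt deacon cleanup-orphans` implementation. Assumes Linux procps `ps`.

# find_orphans: read "pid tty comm" lines on stdin and print the PID of every
# claude/codex process whose TTY column is "?" (no controlling terminal).
find_orphans() {
  awk '$2 == "?" && ($3 == "claude" || $3 == "codex") { print $1 }'
}

# cleanup_orphans: list processes, keep only orphans, send SIGTERM to each,
# and report how many were signalled. A failed kill is logged to stderr but
# never aborts the caller (best-effort, like the patrol step).
cleanup_orphans() {
  local pid killed=0
  # An empty header ("pid=" etc.) on every column suppresses the header line.
  for pid in $(ps -e -o pid= -o tty= -o comm= | find_orphans); do
    if kill -TERM "$pid" 2>/dev/null; then
      killed=$((killed + 1))
    else
      echo "warning: could not signal PID $pid" >&2
    fi
  done
  echo "killed $killed orphaned process(es)"
}
```

Keeping the filter in its own function lets it be checked against canned `ps` output without signalling anything, and because `cleanup_orphans` only warns on a failed `kill`, a patrol can call it without aborting.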
--- a/internal/formula/formulas/mol-witness-patrol.formula.toml
+++ b/internal/formula/formulas/mol-witness-patrol.formula.toml
@@ -38,7 +38,7 @@ needs = ['check-timer-gates']
 title = 'Check if active swarm is complete'

 [[steps]]
-description = "Send WITNESS_PING to Deacon for second-order monitoring.\n\nThe Witness fleet collectively monitors Deacon health - this prevents the\n\"who watches the watchers\" problem. If Deacon dies, Witnesses detect it.\n\n**Step 1: Send ping**\n```bash\ngt mail send deacon/ -s \"WITNESS_PING <rig>\" -m \"Rig: <rig>\nTimestamp: $(date -u +%Y-%m-%dT%H:%M:%SZ)\nPatrol: <cycle-number>\"\n```\n\n**Step 2: Check Deacon health**\n```bash\n# Check Deacon agent bead for last_activity\nbd list --type=agent --json | jq '.[] | select(.description | contains(\"role_type: deacon\"))'\n```\n\nLook at the `last_activity` timestamp. If stale (>5 minutes since last update):\n- Deacon may be dead or stuck\n\n**Step 3: Escalate if needed**\n```bash\n# If Deacon appears down\ngt mail send mayor/ -s \"ALERT: Town-level Deacon appears unresponsive\" -m \"Town Deacon (hq-deacon) has no activity for >5 minutes.\nLast seen: <timestamp>\nWitness: <rig>/witness\"\n```\n\nNote: Multiple Witnesses may send this alert. Mayor should handle deduplication."
+description = "Send WITNESS_PING to Deacon for second-order monitoring.\n\nThe Witness fleet collectively monitors Deacon health - this prevents the\n\"who watches the watchers\" problem. If Deacon dies, Witnesses detect it.\n\n**Step 1: Send ping**\n```bash\ngt mail send deacon/ -s \"WITNESS_PING <rig>\" -m \"Rig: <rig>\nTimestamp: $(date -u +%Y-%m-%dT%H:%M:%SZ)\nPatrol: <cycle-number>\"\n```\n\n**Step 2: Check Deacon health**\n```bash\n# Check Deacon agent bead for last_activity\nbd list --type=agent --json | jq '.[] | select(.description | contains(\"deacon\"))'\n```\n\nLook at the `last_activity` timestamp. If stale (>5 minutes since last update):\n- Deacon may be dead or stuck\n\n**Step 3: Escalate if needed**\n```bash\n# If Deacon appears down\ngt mail send mayor/ -s \"ALERT: Deacon appears unresponsive\" -m \"No Deacon activity for >5 minutes.\nLast seen: <timestamp>\nWitness: <rig>/witness\"\n```\n\nNote: Multiple Witnesses may send this alert. Mayor should handle deduplication."
 id = 'ping-deacon'
 needs = ['check-swarm-completion']
 title = 'Ping Deacon for health check'
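The Witness's staleness rule (no Deacon activity for more than 5 minutes) reduces to comparing the age of `last_activity` against 300 seconds. A minimal sketch, assuming the timestamp has already been extracted from the agent bead as an ISO-8601 UTC string such as `2026-01-17T10:55:24Z` and that GNU `date` (with `-d`) is available; the helper name `deacon_is_stale` and the variable `STALE_AFTER_SECONDS` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper; assumes GNU date (`date -d`), as on most Linux systems.

STALE_AFTER_SECONDS=300  # 5 minutes, per the ping-deacon step

# deacon_is_stale TIMESTAMP [NOW_EPOCH]
# Returns 0 (stale) if TIMESTAMP is more than STALE_AFTER_SECONDS before
# NOW_EPOCH (default: current time), 1 if it is recent enough, and 2 if the
# timestamp cannot be parsed.
deacon_is_stale() {
  local last now
  last=$(date -u -d "$1" +%s 2>/dev/null) || return 2
  now=${2:-$(date -u +%s)}
  (( now - last > STALE_AFTER_SECONDS ))
}
```

Accepting an optional "now" argument keeps the check deterministic for testing; in a patrol, `deacon_is_stale "$last_activity"` would gate the escalation mail to the Mayor.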