chore: sync formula and recovery work

- Update mol-deacon-patrol formula
- Fix sling helpers, doctor branch check
- Update startup session and tests
- Remove obsolete research doc

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 14:23:10 -08:00
parent 0e19529186
commit e2e43b8bf5
6 changed files with 65 additions and 418 deletions
@@ -1,45 +1,29 @@
 description = """
-Mayor's daemon patrol loop - CONTINUOUS EXECUTION.
+Mayor's daemon patrol loop.
-The Deacon is the Mayor's background process that runs CONTINUOUSLY in a loop:
+The Deacon is the Mayor's background process that runs continuously, handling callbacks, monitoring rig health, and performing cleanup. Each patrol cycle runs these steps in sequence, then loops or exits.
 1. Execute all patrol steps (inbox-check through context-check)
 2. Wait for activity OR timeout (15-minute max)
 3. Create new patrol wisp and repeat from step 1
 **This is a continuous loop, not a one-shot execution.**
 ## Patrol Loop Flow
 ```
 START → inbox-check → [all patrol steps] → loop-or-exit
                                               ↓
                                     await-signal (wait for activity)
                                               ↓
                                     create new wisp → START
 ```
 ## Plugin Dispatch
 The plugin-run step scans $GT_ROOT/plugins/ for plugins with open gates and
 dispatches them to dogs. With a 15-minute max backoff, plugins with 15m
 cooldown gates will be checked at least once per interval.
 ## Idle Town Principle
 **The Deacon should be silent/invisible when the town is healthy and idle.**
 - Skip HEALTH_CHECK nudges when no active work exists
- Sleep via await-signal (exponential backoff up to 15 min)
+- Sleep 60+ seconds between patrol cycles (longer when idle)
- Let the feed subscription wake on actual events
+- Let the feed subscription wake agents on actual events
- The daemon is the safety net for dead sessions
+- The daemon (10-minute heartbeat) is the safety net for dead sessions
 This prevents flooding idle agents with health checks every few seconds.
 ## Second-Order Monitoring
 Witnesses send WITNESS_PING messages to verify the Deacon is alive. This
 prevents the "who watches the watchers" problem - if the Deacon dies,
-Witnesses detect it and escalate to the Mayor."""
+Witnesses detect it and escalate to the Mayor.
 The Deacon's agent bead last_activity timestamp is updated during each patrol
 cycle. Witnesses check this timestamp to verify health."""
 formula = "mol-deacon-patrol"
-version = 9
+version = 8
 [[steps]]
 id = "inbox-check"
@@ -504,48 +488,29 @@ investigate why the Witness isn't cleaning up properly."""
 [[steps]]
 id = "plugin-run"
-title = "Scan and dispatch plugins"
+title = "Execute registered plugins"
 needs = ["zombie-scan"]
 description = """
-Scan plugins and dispatch any with open gates to dogs.
+Execute registered plugins.
-**Step 1: List plugins and check gates**
+Scan $GT_ROOT/plugins/ for plugin directories. Each plugin has a plugin.md with TOML frontmatter defining its gate (when to run) and instructions (what to do).
 ```bash
 gt plugin list
 ```
-For each plugin, check if its gate is open:
+See docs/deacon-plugins.md for full documentation.
 - **cooldown**: Time since last run (e.g., 15m) - check state.json
 - **cron**: Schedule-based (e.g., "0 9 * * *")
 - **condition**: Metric threshold (e.g., wisp count > 50)
 - **event**: Trigger-based (e.g., startup, heartbeat)
-**Step 2: Dispatch plugins with open gates**
+Gate types:
-```bash
+- cooldown: Time since last run (e.g., 24h)
-# For each plugin with an open gate:
+- cron: Schedule-based (e.g., "0 9 * * *")
-gt dog dispatch --plugin <plugin-name>
+- condition: Metric threshold (e.g., wisp count > 50)
-```
+- event: Trigger-based (e.g., startup, heartbeat)
-This sends the plugin to an idle dog for execution. The dog will:
+For each plugin:
-1. Execute the plugin instructions from plugin.md
+1. Read plugin.md frontmatter to check gate
-2. Send DOG_DONE mail when complete (processed in next patrol's inbox-check)
+2. Compare against state.json (last run, etc.)
 3. If gate is open, execute the plugin
-**Step 3: Track dispatched plugins**
+Plugins marked parallel: true can run concurrently using Task tool subagents. Sequential plugins run one at a time in directory order.
 Record in state.json which plugins were dispatched this cycle:
 ```json
 {
  "plugins_dispatched": ["scout-patrol"],
  "last_plugin_run": "2026-01-23T13:45:00Z"
 }
 ```
-**If no plugins have open gates:**
+Skip this step if $GT_ROOT/plugins/ does not exist or is empty."""
 Skip dispatch - all plugins are within their cooldown/schedule.
 **If no dogs available:**
 Log warning and skip dispatch this cycle. Dog pool maintenance step will spawn dogs.
 See docs/deacon-plugins.md for full documentation."""
 [[steps]]
 id = "dog-pool-maintenance"
@@ -872,89 +837,57 @@ This enables the Deacon to burn and respawn cleanly."""
 [[steps]]
 id = "loop-or-exit"
-title = "Continuous patrol loop"
+title = "Burn and respawn or loop"
 needs = ["context-check"]
 description = """
-Continue the patrol loop or exit for context refresh.
+Burn and let daemon respawn, or exit if context high.
-**CRITICAL**: This is where the continuous patrol loop happens. The Deacon MUST
+Decision point at end of patrol cycle:
 loop back and start a new patrol cycle. Do NOT wait for external triggers.
-## Decision Matrix
+If context is LOW:
-
+Use await-signal with exponential backoff to wait for activity:
 1. **Check context usage**: `gt context --usage`
 2. **If context HIGH (>80%)**: Exit cleanly, daemon respawns fresh session
 3. **If context LOW**: Continue to patrol loop below
 ## The Continuous Patrol Loop
 When context is low, execute this loop:
 ```bash
 # Step 1: Squash current patrol wisp (clean up)
 gt mol squash
 # Step 2: Wait for activity OR timeout (15-minute default)
 gt mol step await-signal --agent-bead hq-deacon \
-  --backoff-base 60s --backoff-mult 2 --backoff-max 15m
+  --backoff-base 60s --backoff-mult 2 --backoff-max 10m
 # Step 3: Reset idle counter if activity was detected
 gt agents state hq-deacon --set idle=0
 # Step 4: Create new patrol wisp
 WISP_ID=$(bd mol wisp mol-deacon-patrol 2>&1 | grep -o 'hq-[a-z0-9]*')
 # Step 5: Hook it and start executing
 gt hook $WISP_ID
 ```
-After hooking, immediately begin executing the new wisp from its first step
+This command:
 (inbox-check). The wisp is now on your hook, so just continue with patrol.
 **IMPORTANT**: After await-signal returns (either signal or timeout), you MUST:
 1. Squash the current wisp
 2. Create a new patrol wisp
 3. Hook it
 4. Start executing from inbox-check
 This IS the loop. There is no "return to inbox-check" command - you create a new
 wisp and that wisp starts fresh from inbox-check.
 ## await-signal Behavior
 The await-signal command:
 1. Subscribes to `bd activity --follow` (beads activity feed)
 2. Returns IMMEDIATELY when any beads activity occurs
 3. If no activity, times out with exponential backoff:
   - First timeout: 60s
   - Second timeout: 120s
-   - Third timeout: 240s (4 min)
+   - Third timeout: 240s
-   - ...capped at 15 minutes max
+   - ...capped at 10 minutes max
 4. Tracks `idle:N` label on hq-deacon bead for backoff state
 **On signal received** (activity detected):
 Reset the idle counter and start next patrol cycle:
 ```bash
 gt agent state hq-deacon --set idle=0
 ```
 Then return to inbox-check step.
 **On timeout** (no activity):
 The idle counter was auto-incremented. Continue to next patrol cycle
 (the longer backoff will apply next time). Return to inbox-check step.
 **Why this approach?**
 - Any `gt` or `bd` command triggers beads activity, waking the Deacon
- Idle towns let the Deacon sleep longer (up to 15 min between patrols)
+- Idle towns let the Deacon sleep longer (up to 10 min between patrols)
 - Active work wakes the Deacon immediately via the feed
- No fixed polling intervals - event-driven wake
+- No polling or fixed sleep intervals
-## Plugin Dispatch Timing
+If context is HIGH:
 - Write state to persistent storage
 - Exit cleanly
 - Let the daemon orchestrator respawn a fresh Deacon
-The plugin-run step (earlier in patrol) handles plugin dispatch:
+The daemon ensures Deacon is always running:
 - Scans $GT_ROOT/plugins/ for plugins with open gates
 - Dispatches to dogs via `gt dog dispatch --plugin <name>`
 - Dogs send DOG_DONE when complete (processed in next patrol's inbox-check)
 With a 15-minute max backoff, plugins with 15m cooldown gates will be checked
 at least once per interval when idle.
 ## Exit Path (High Context)
 If context is HIGH (>80%):
 ```bash
-# Exit cleanly - daemon will respawn with fresh context
+# Daemon respawns on exit
-exit 0
+gt daemon status
 ```
-The daemon ensures Deacon is always running. Exiting is safe - you'll be
+This enables infinite patrol duration via context-aware respawning."""
 respawned with fresh context and the patrol loop continues."""
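The await-signal timeouts described in the formula above (60s, 120s, 240s, then capped) follow a plain exponential backoff. The sketch below is only an illustration of that arithmetic, assuming the new-side values from this diff (`--backoff-base 60s --backoff-mult 2 --backoff-max 10m`); the `Backoff` type, its `Delay` method, and `deaconBackoff` are hypothetical names, not part of `gt`.

```go
package main

import (
	"fmt"
	"time"
)

// Backoff describes an exponential backoff schedule.
// Hypothetical type for illustration; not taken from the gt source.
type Backoff struct {
	Base time.Duration // first timeout
	Mult int           // multiplier applied per consecutive idle cycle
	Max  time.Duration // upper bound on any single timeout
}

// Delay returns the timeout for a given idle count (0 = first timeout).
func (b Backoff) Delay(idle int) time.Duration {
	d := b.Base
	for i := 0; i < idle; i++ {
		d *= time.Duration(b.Mult)
		if d >= b.Max {
			// Stop early so large idle counts cannot overflow.
			return b.Max
		}
	}
	if d > b.Max {
		return b.Max
	}
	return d
}

// deaconBackoff mirrors the flag values on the "+" side of this diff (assumed).
var deaconBackoff = Backoff{Base: 60 * time.Second, Mult: 2, Max: 10 * time.Minute}

func main() {
	for idle := 0; idle < 6; idle++ {
		fmt.Printf("idle=%d timeout=%v\n", idle, deaconBackoff.Delay(idle))
	}
}
```

With a 10-minute cap the schedule reaches its ceiling on the fifth consecutive idle cycle (60s, 120s, 240s, 480s, then 10m); with the previous 15-minute cap the fifth timeout would also be clamped, at 15m.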
@@ -319,7 +319,7 @@ func injectStartPrompt(pane, beadID, subject, args string) error {
 	} else if subject != "" {
 		prompt = fmt.Sprintf("Work slung: %s (%s). Start working on it now - no questions, just begin.", beadID, subject)
 	} else {
-		prompt = fmt.Sprintf("Work slung: %s. Start working on it now - run `gt prime --hook` to load context, then begin.", beadID)
+		prompt = fmt.Sprintf("Work slung: %s. Start working on it now - run `gt hook` to see the hook, then begin.", beadID)
 	}
 	// Use the reliable nudge pattern (same as gt nudge / tmux.NudgeSession)
@@ -1,13 +1,11 @@
 package doctor
 import (
 	"context"
 	"fmt"
 	"os"
 	"os/exec"
 	"path/filepath"
 	"strings"
 	"time"
 )
 // BranchCheck detects persistent roles (crew, witness, refinery) that are
@@ -89,18 +87,15 @@ func (c *BranchCheck) Run(ctx *CheckContext) *CheckResult {
 	}
 }
 // gitNetworkTimeout is the timeout for git network operations (pull, fetch).
 const gitNetworkTimeout = 30 * time.Second
 // Fix switches all off-main directories to main branch.
-func (c *BranchCheck) Fix(checkCtx *CheckContext) error {
+func (c *BranchCheck) Fix(ctx *CheckContext) error {
 	if len(c.offMainDirs) == 0 {
 		return nil
 	}
 	var lastErr error
 	for _, dir := range c.offMainDirs {
-		// git checkout main (local operation, short timeout)
+		// git checkout main
 		cmd := exec.Command("git", "checkout", "main")
 		cmd.Dir = dir
 		if err := cmd.Run(); err != nil {
@@ -108,16 +103,10 @@ func (c *BranchCheck) Fix(checkCtx *CheckContext) error {
 			continue
 		}
-		// git pull --rebase (network operation, needs timeout)
+		// git pull --rebase
-		ctx, cancel := context.WithTimeout(context.Background(), gitNetworkTimeout)
+		cmd = exec.Command("git", "pull", "--rebase")
 		cmd = exec.CommandContext(ctx, "git", "pull", "--rebase")
 		cmd.Dir = dir
-		err := cmd.Run()
+		if err := cmd.Run(); err != nil {
 		cancel()
 		if err != nil {
 			if ctx.Err() == context.DeadlineExceeded {
 				lastErr = fmt.Errorf("%s: git pull timed out after %v", dir, gitNetworkTimeout)
 			}
 			// Pull failure is not fatal, just warn
 			continue
 		}
@@ -66,10 +66,8 @@ func FormatStartupBeacon(cfg BeaconConfig) string {
 	// For assigned, work is already on the hook - just tell them to run it
 	// This prevents the "helpful assistant" exploration pattern (see PRIMING.md)
 	// Use `gt prime --hook` instead of `gt hook` so polecats get full role context
 	// including THE IDLE POLECAT HERESY guidance about running `gt done`.
 	if cfg.Topic == "assigned" {
-		beacon += "\n\nWork is on your hook. Run `gt prime --hook` now and begin immediately."
+		beacon += "\n\nWork is on your hook. Run `gt hook` now and begin immediately."
 	}
 	// For start/restart, add fallback instructions in case SessionStart hook fails
@@ -26,7 +26,7 @@ func TestFormatStartupBeacon(t *testing.T) {
 				"<- deacon",
 				"assigned:gt-abc12",
 				"Work is on your hook", // assigned includes actionable instructions
-				"gt prime --hook",      // full context including IDLE POLECAT HERESY
+				"gt hook",
 			},
 		},
 		{
@@ -1,273 +0,0 @@
 # Role Template Management Strategy
 **Research Date:** 2026-01-26
 **Researcher:** kerosene (gastown/crew)
 **Status:** Analysis complete, recommendation provided
 ## Executive Summary
 Gas Town currently has **two competing mechanisms** for managing role context, leading to divergent content and maintenance complexity:
 1. **Embedded templates** (`internal/templates/roles/*.md.tmpl`) - source of truth in binary
 2. **Local-fork edits** - direct modifications to runtime `CLAUDE.md` files
 Additionally, there's a **third mechanism** for operational config that works well:
 3. **Role config overrides** (`internal/config/roles.go`) - TOML-based config override chain
 **Recommendation:** Extend the TOML override pattern to support template content sections, unifying all customization under one mechanism.
 ---
 ## Inventory: Current Mechanisms
 ### 1. Embedded Templates (internal/templates/roles/*.md.tmpl)
 **Location:** `internal/templates/roles/`
 **Files:**
 - `mayor.md.tmpl` (337 lines)
 - `crew.md.tmpl` (17,607 bytes)
 - `polecat.md.tmpl` (17,527 bytes)
 - `witness.md.tmpl` (11,746 bytes)
 - `refinery.md.tmpl` (13,525 bytes)
 - `deacon.md.tmpl` (13,727 bytes)
 - `boot.md.tmpl` (4,445 bytes)
 **How it works:**
 - Templates are embedded into the binary via `//go:embed` directive
 - `gt prime` command renders templates with role-specific data (TownRoot, RigName, etc.)
 - Output is printed to stdout, where Claude picks it up as context
 - Uses Go template syntax: `{{ .TownRoot }}`, `{{ .RigName }}`, etc.
 **Code path:** `templates.New()` → `tmpl.RenderRole()` → stdout
 ### 2. Local-Fork Edits (Runtime CLAUDE.md)
 **Location:** Various agent directories (e.g., `mayor/CLAUDE.md`, `<rig>/crew/<name>/CLAUDE.md`)
 **How it works:**
 - `gt install` creates minimal bootstrap CLAUDE.md (~15 lines) via `createMayorCLAUDEmd()`
 - Bootstrap content just says "Run `gt prime` for full context"
 - THEN humans/agents directly edit these files with custom content
 - These edits are committed to the town's git repo
 **Example:** Mayor's CLAUDE.md grew from bootstrap to 532 lines
 **Key local-fork commit:**
 ```
 1cdbc27 docs: Enhance Mayor role template with coordination system knowledge (sc-n2oiz)
 ```
 This commit added ~500 lines to `mayor/CLAUDE.md` including:
 - Colony Model (why Gas Town uses coordinated specialists)
 - Escalation Patterns (Witness vs Mayor responsibilities)
 - Decision Flow (when to use polecats vs crew)
 - Multi-phase Orchestration
 - Monitoring without Micromanaging
 - Teaching GUPP patterns
 - Communication Patterns
 - Speed Asymmetry
 **None of this content exists in the embedded template** - it's purely local-fork.
 ### 3. Role Config Overrides (TOML files)
 **Location:**
 - Built-in: `internal/config/roles/*.toml` (embedded in binary)
 - Town-level: `<town>/roles/<role>.toml` (optional override)
 - Rig-level: `<rig>/roles/<role>.toml` (optional override)
 **Resolution order (later wins):**
 1. Built-in defaults (embedded)
 2. Town-level overrides
 3. Rig-level overrides
 **What it handles:**
 ```toml
 # Example: mayor.toml
 role = "mayor"
 scope = "town"
 nudge = "Check mail and hook status, then act accordingly."
 prompt_template = "mayor.md.tmpl"
 [session]
 pattern = "hq-mayor"
 work_dir = "{town}"
 needs_pre_sync = false
 start_command = "exec claude --dangerously-skip-permissions"
 [env]
 GT_ROLE = "mayor"
 GT_SCOPE = "town"
 [health]
 ping_timeout = "30s"
 consecutive_failures = 3
 kill_cooldown = "5m"
 stuck_threshold = "1h"
 ```
 **What it DOES NOT handle:**
 - Template content (the actual markdown context)
 - The `prompt_template` field just names which .md.tmpl to use
 **Implementation:** `LoadRoleDefinition()` in `roles.go` handles the override chain with `mergeRoleDefinition()`.
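
The resolution order above (built-in, then town, then rig, later wins) amounts to a field-by-field overlay. The following is a minimal sketch of that idea only; `HealthConfig`, `mergeHealth`, and `resolve` are hypothetical names and do not reproduce the actual `LoadRoleDefinition()` or `mergeRoleDefinition()` code.

```go
package main

import "fmt"

// HealthConfig is a hypothetical, trimmed-down stand-in for the [health]
// table in a role TOML file. An empty string means "not set at this level".
type HealthConfig struct {
	PingTimeout    string
	StuckThreshold string
}

// mergeHealth overlays the fields that override sets onto base,
// leaving unset fields untouched (merge, not replace).
func mergeHealth(base, override HealthConfig) HealthConfig {
	if override.PingTimeout != "" {
		base.PingTimeout = override.PingTimeout
	}
	if override.StuckThreshold != "" {
		base.StuckThreshold = override.StuckThreshold
	}
	return base
}

// resolve applies layers in order, so later layers win.
func resolve(layers ...HealthConfig) HealthConfig {
	var out HealthConfig
	for _, layer := range layers {
		out = mergeHealth(out, layer)
	}
	return out
}

func main() {
	builtin := HealthConfig{PingTimeout: "30s", StuckThreshold: "1h"}
	town := HealthConfig{StuckThreshold: "2h"} // town overrides one field
	rig := HealthConfig{}                      // rig overrides nothing

	fmt.Printf("%+v\n", resolve(builtin, town, rig))
}
```

Here the town-level layer changes only `StuckThreshold`, so `PingTimeout` keeps its built-in value.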
 ---
 ## Analysis: Trade-offs
 ### Embedded Templates
 | Pros | Cons |
 |------|------|
 | Single source of truth in binary | Requires recompile for changes |
 | Consistent across all installations | No per-town customization |
 | Supports placeholder substitution | Can't add town-specific sections |
 | Version-controlled in gastown repo | Changes don't propagate to existing installs |
 ### Local-Fork Edits
 | Pros | Cons |
 |------|------|
 | Per-installation customization | Diverges from template source |
 | No recompile needed | Manual sync to keep up with template changes |
 | Town-specific content | Each install is a unique snowflake |
 | Immediate effect | Template improvements don't propagate |
 ### Role Config Overrides
 | Pros | Cons |
 |------|------|
 | Clean override chain | Only handles operational config |
 | Town/rig level customization | Doesn't handle template content |
 | Merge semantics (not replace) | - |
 | No recompile needed | - |
 ---
 ## Problem Statement
 The current situation creates **three-way divergence**:
 ```
                    ┌──────────────────────────────────────────┐
                    │  Embedded Template (mayor.md.tmpl)       │
                    │  337 lines - "official" content          │
                    └──────────────────────────────────────────┘
                                        │
                                        │ gt prime renders
                                        │ BUT doesn't include
                                        │ local-fork additions
                                        v
 ┌──────────────────────────────────────────────────────────────────┐
 │  Runtime CLAUDE.md (mayor/CLAUDE.md)                            │
 │  532 lines - has ~200 lines of local-fork content               │
 │  INCLUDING: Colony Model, Escalation Patterns, etc.             │
 └──────────────────────────────────────────────────────────────────┘
 ```
 **Issues:**
 1. When `gt prime` runs, it outputs the embedded template (337 lines)
 2. The local-fork content (Colony Model, etc.) is in `mayor/CLAUDE.md`
 3. Claude Code reads BOTH via `CLAUDE.md` + startup hooks
 4. But the embedded template and local CLAUDE.md overlap/conflict
 5. Template improvements in new gt versions don't include local-fork content
 6. Local-fork improvements aren't shared with other installations
 ---
 ## Recommendation: Unified Override System
 **Extend the existing TOML override mechanism to support template content sections.**
 ### Proposed Design
 ```toml
 # <town>/roles/mayor.toml (town-level override)
 # Existing operational overrides work as-is
 [health]
 stuck_threshold = "2h"  # Town needs longer threshold
 # NEW: Template content sections
 [content]
 # Append sections after the embedded template
 append = """
 ## The Colony Model: Why Gas Town Works
 Gas Town rejects the "super-ant" model... [rest of content]
 """
 # OR reference a file
 append_file = "mayor-additions.md"
 # OR override specific sections by ID
 [content.sections.escalation]
 replace = """
 ## Escalation Patterns: What to Handle vs Delegate
 ...[custom content]...
 """
 ```
 ### Why This Works
 1. **Single source of truth**: Embedded templates remain canonical
 2. **Clean override semantics**: Town/rig can append or replace sections
 3. **Existing infrastructure**: Uses the same TOML loading + merge pattern
 4. **No recompile**: Content overrides are runtime files
 5. **Shareable**: Town-level overrides can be committed to town repo
 6. **Migratable**: Existing local-fork content can move to `[content]` sections
 ### Implementation Path
 1. **Phase 1**: Add `[content]` support to role config
   - Parse `append`, `append_file`, `replace_sections` fields
   - Apply after template rendering in `outputPrimeContext()`
 2. **Phase 2**: Migrate local-fork content
   - Extract custom sections from `mayor/CLAUDE.md`
   - Move to `<town>/roles/mayor.toml` `[content]` section
   - Reduce `mayor/CLAUDE.md` back to bootstrap pointer
 3. **Phase 3**: Document the pattern
   - How to add town-specific guidance
   - How to share improvements back to embedded templates
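
Phase 1's "apply after template rendering" step could look roughly like the sketch below for the `append` case. It is an assumption-laden illustration: `ContentOverride` and `applyAppend` are hypothetical names, TOML parsing and `append_file` loading are left out, and section replacement is not covered because the section-marker format is still undecided.

```go
package main

import (
	"fmt"
	"strings"
)

// ContentOverride is a hypothetical in-memory form of a [content] table
// after the TOML has been parsed.
type ContentOverride struct {
	Append string // markdown to add after the rendered template
}

// applyAppend adds each layer's Append text after the rendered template,
// in layer order (for example town first, then rig). Empty layers are skipped.
func applyAppend(rendered string, layers ...ContentOverride) string {
	var b strings.Builder
	b.WriteString(strings.TrimRight(rendered, "\n"))
	for _, layer := range layers {
		extra := strings.TrimSpace(layer.Append)
		if extra == "" {
			continue
		}
		b.WriteString("\n\n") // blank line between sections
		b.WriteString(extra)
	}
	b.WriteString("\n")
	return b.String()
}

func main() {
	rendered := "# Mayor\n\nEmbedded template content.\n"
	town := ContentOverride{Append: "## The Colony Model: Why Gas Town Works\n"}
	fmt.Print(applyAppend(rendered, town))
}
```

Keeping the embedded template untouched and layering additions on top is what lets template improvements in new releases reach installations that also carry local content.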
 ---
 ## Alternative Considered: Pure Template Approach
 **Idea:** Move all content into embedded templates, remove local CLAUDE.md entirely.
 **Rejected because:**
 - Can't support per-town customization (e.g., different escalation policies)
 - Requires recompile for any content change
 - Forces all installations to be identical
 - Doesn't leverage existing override infrastructure
 ---
 ## Files Involved
 For implementation, these files would need modification:
 | File | Change |
 |------|--------|
 | `internal/config/roles.go` | Add `[content]` parsing to `RoleDefinition` |
 | `internal/cmd/prime_output.go` | Apply content overrides after template render |
 | `internal/templates/templates.go` | Potentially add section markers for replace |
 | `internal/cmd/install.go` | Update bootstrap to not create full CLAUDE.md |
 ---
 ## Summary
 | Approach | Verdict |
 |----------|---------|
 | **Embedded templates only** | Insufficient - no customization |
 | **Local-fork edits** | Current state - creates divergence |
 | **TOML content overrides** | **Recommended** - unifies all customization |
 The TOML content override approach leverages existing infrastructure, provides clean semantics, and allows both standardization (embedded templates) and customization (override sections).