research: analyze role template management strategy

Findings:
- Two competing mechanisms: embedded templates vs local-fork edits
- Local-fork created ~200 lines of divergent content in mayor/CLAUDE.md
- TOML config overrides exist but only handle operational config

Recommendation: Extend TOML override system to support [content] sections for template customization, unifying all override mechanisms.
2026-01-26 13:05:05 -08:00
parent 149da3d2e2
commit 9d87f01823
2 changed files with 397 additions and 57 deletions
--- a/.beads/formulas/mol-deacon-patrol.formula.toml
+++ b/.beads/formulas/mol-deacon-patrol.formula.toml
@@ -1,29 +1,45 @@
 description = """
-Mayor's daemon patrol loop.
+Mayor's daemon patrol loop - CONTINUOUS EXECUTION.
-The Deacon is the Mayor's background process that runs continuously, handling callbacks, monitoring rig health, and performing cleanup. Each patrol cycle runs these steps in sequence, then loops or exits.
+The Deacon is the Mayor's background process that runs CONTINUOUSLY in a loop:
 1. Execute all patrol steps (inbox-check through context-check)
 2. Wait for activity OR timeout (15-minute max)
 3. Create new patrol wisp and repeat from step 1
 **This is a continuous loop, not a one-shot execution.**
 ## Patrol Loop Flow
 ```
 START → inbox-check → [all patrol steps] → loop-or-exit
                                               ↓
                                     await-signal (wait for activity)
                                               ↓
                                     create new wisp → START
 ```
 ## Plugin Dispatch
 The plugin-run step scans $GT_ROOT/plugins/ for plugins with open gates and
 dispatches them to dogs. With a 15-minute max backoff, plugins with 15m
 cooldown gates will be checked at least once per interval.
 ## Idle Town Principle
 **The Deacon should be silent/invisible when the town is healthy and idle.**
 - Skip HEALTH_CHECK nudges when no active work exists
- Sleep 60+ seconds between patrol cycles (longer when idle)
+- Sleep via await-signal (exponential backoff up to 15 min)
- Let the feed subscription wake agents on actual events
+- Let the feed subscription wake on actual events
- The daemon (10-minute heartbeat) is the safety net for dead sessions
+- The daemon is the safety net for dead sessions
 This prevents flooding idle agents with health checks every few seconds.
 ## Second-Order Monitoring
 Witnesses send WITNESS_PING messages to verify the Deacon is alive. This
 prevents the "who watches the watchers" problem - if the Deacon dies,
-Witnesses detect it and escalate to the Mayor.
+Witnesses detect it and escalate to the Mayor."""
 The Deacon's agent bead last_activity timestamp is updated during each patrol
 cycle. Witnesses check this timestamp to verify health."""
 formula = "mol-deacon-patrol"
-version = 8
+version = 9
 [[steps]]
 id = "inbox-check"
@@ -488,29 +504,48 @@ investigate why the Witness isn't cleaning up properly."""
 [[steps]]
 id = "plugin-run"
-title = "Execute registered plugins"
+title = "Scan and dispatch plugins"
 needs = ["zombie-scan"]
 description = """
-Execute registered plugins.
+Scan plugins and dispatch any with open gates to dogs.
-Scan $GT_ROOT/plugins/ for plugin directories. Each plugin has a plugin.md with TOML frontmatter defining its gate (when to run) and instructions (what to do).
+**Step 1: List plugins and check gates**
 ```bash
 gt plugin list
 ```
-See docs/deacon-plugins.md for full documentation.
+For each plugin, check if its gate is open:
 - **cooldown**: Time since last run (e.g., 15m) - check state.json
 - **cron**: Schedule-based (e.g., "0 9 * * *")
 - **condition**: Metric threshold (e.g., wisp count > 50)
 - **event**: Trigger-based (e.g., startup, heartbeat)
-Gate types:
+**Step 2: Dispatch plugins with open gates**
- cooldown: Time since last run (e.g., 24h)
+```bash
- cron: Schedule-based (e.g., "0 9 * * *")
+# For each plugin with an open gate:
- condition: Metric threshold (e.g., wisp count > 50)
+gt dog dispatch --plugin <plugin-name>
- event: Trigger-based (e.g., startup, heartbeat)
+```
-For each plugin:
+This sends the plugin to an idle dog for execution. The dog will:
-1. Read plugin.md frontmatter to check gate
+1. Execute the plugin instructions from plugin.md
-2. Compare against state.json (last run, etc.)
+2. Send DOG_DONE mail when complete (processed in next patrol's inbox-check)
 3. If gate is open, execute the plugin
-Plugins marked parallel: true can run concurrently using Task tool subagents. Sequential plugins run one at a time in directory order.
+**Step 3: Track dispatched plugins**
 Record in state.json which plugins were dispatched this cycle:
 ```json
 {
  "plugins_dispatched": ["scout-patrol"],
  "last_plugin_run": "2026-01-23T13:45:00Z"
 }
 ```
-Skip this step if $GT_ROOT/plugins/ does not exist or is empty."""
+**If no plugins have open gates:**
 Skip dispatch - all plugins are within their cooldown/schedule.
 **If no dogs available:**
 Log warning and skip dispatch this cycle. Dog pool maintenance step will spawn dogs.
 See docs/deacon-plugins.md for full documentation."""
 [[steps]]
 id = "dog-pool-maintenance"
@@ -837,57 +872,89 @@ This enables the Deacon to burn and respawn cleanly."""
 [[steps]]
 id = "loop-or-exit"
-title = "Burn and respawn or loop"
+title = "Continuous patrol loop"
 needs = ["context-check"]
 description = """
-Burn and let daemon respawn, or exit if context high.
+Continue the patrol loop or exit for context refresh.
-Decision point at end of patrol cycle:
+**CRITICAL**: This is where the continuous patrol loop happens. The Deacon MUST
 loop back and start a new patrol cycle. Do NOT wait for external triggers.
-If context is LOW:
+## Decision Matrix
-Use await-signal with exponential backoff to wait for activity:
+
 1. **Check context usage**: `gt context --usage`
 2. **If context HIGH (>80%)**: Exit cleanly, daemon respawns fresh session
 3. **If context LOW**: Continue to patrol loop below
 ## The Continuous Patrol Loop
 When context is low, execute this loop:
 ```bash
 # Step 1: Squash current patrol wisp (clean up)
 gt mol squash
 # Step 2: Wait for activity OR timeout (15-minute default)
 gt mol step await-signal --agent-bead hq-deacon \
-  --backoff-base 60s --backoff-mult 2 --backoff-max 10m
+  --backoff-base 60s --backoff-mult 2 --backoff-max 15m
 # Step 3: Reset idle counter if activity was detected
 gt agents state hq-deacon --set idle=0
 # Step 4: Create new patrol wisp
 WISP_ID=$(bd mol wisp mol-deacon-patrol 2>&1 | grep -o 'hq-[a-z0-9]*')
 # Step 5: Hook it and start executing
 gt hook $WISP_ID
 ```
-This command:
+After hooking, immediately begin executing the new wisp from its first step
 (inbox-check). The wisp is now on your hook, so just continue with patrol.
 **IMPORTANT**: After await-signal returns (either signal or timeout), you MUST:
 1. Squash the current wisp
 2. Create a new patrol wisp
 3. Hook it
 4. Start executing from inbox-check
 This IS the loop. There is no "return to inbox-check" command - you create a new
 wisp and that wisp starts fresh from inbox-check.
 ## await-signal Behavior
 The await-signal command:
 1. Subscribes to `bd activity --follow` (beads activity feed)
 2. Returns IMMEDIATELY when any beads activity occurs
 3. If no activity, times out with exponential backoff:
   - First timeout: 60s
   - Second timeout: 120s
-   - Third timeout: 240s
+   - Third timeout: 240s (4 min)
-   - ...capped at 10 minutes max
+   - ...capped at 15 minutes max
 4. Tracks `idle:N` label on hq-deacon bead for backoff state
 **On signal received** (activity detected):
 Reset the idle counter and start next patrol cycle:
 ```bash
 gt agent state hq-deacon --set idle=0
 ```
 Then return to inbox-check step.
 **On timeout** (no activity):
 The idle counter was auto-incremented. Continue to next patrol cycle
 (the longer backoff will apply next time). Return to inbox-check step.
 **Why this approach?**
 - Any `gt` or `bd` command triggers beads activity, waking the Deacon
- Idle towns let the Deacon sleep longer (up to 10 min between patrols)
+- Idle towns let the Deacon sleep longer (up to 15 min between patrols)
 - Active work wakes the Deacon immediately via the feed
- No polling or fixed sleep intervals
+- No fixed polling intervals - event-driven wake
-If context is HIGH:
+## Plugin Dispatch Timing
 - Write state to persistent storage
 - Exit cleanly
 - Let the daemon orchestrator respawn a fresh Deacon
-The daemon ensures Deacon is always running:
+The plugin-run step (earlier in patrol) handles plugin dispatch:
 - Scans $GT_ROOT/plugins/ for plugins with open gates
 - Dispatches to dogs via `gt dog dispatch --plugin <name>`
 - Dogs send DOG_DONE when complete (processed in next patrol's inbox-check)
 With a 15-minute max backoff, plugins with 15m cooldown gates will be checked
 at least once per interval when idle.
 ## Exit Path (High Context)
 If context is HIGH (>80%):
 ```bash
-# Daemon respawns on exit
+# Exit cleanly - daemon will respawn with fresh context
-gt daemon status
+exit 0
 ```
-This enables infinite patrol duration via context-aware respawning."""
+The daemon ensures Deacon is always running. Exiting is safe - you'll be
 respawned with fresh context and the patrol loop continues."""
--- a/thoughts/shared/research/role-template-strategy.md
+++ b/thoughts/shared/research/role-template-strategy.md
@@ -0,0 +1,273 @@
 # Role Template Management Strategy
 **Research Date:** 2026-01-26
 **Researcher:** kerosene (gastown/crew)
 **Status:** Analysis complete, recommendation provided
 ## Executive Summary
 Gas Town currently has **two competing mechanisms** for managing role context, leading to divergent content and maintenance complexity:
 1. **Embedded templates** (`internal/templates/roles/*.md.tmpl`) - source of truth in binary
 2. **Local-fork edits** - direct modifications to runtime `CLAUDE.md` files
 Additionally, there's a **third mechanism** for operational config that works well:
 3. **Role config overrides** (`internal/config/roles.go`) - TOML-based config override chain
 **Recommendation:** Extend the TOML override pattern to support template content sections, unifying all customization under one mechanism.
 ---
 ## Inventory: Current Mechanisms
 ### 1. Embedded Templates (internal/templates/roles/*.md.tmpl)
 **Location:** `internal/templates/roles/`
 **Files:**
 - `mayor.md.tmpl` (337 lines)
 - `crew.md.tmpl` (17,607 bytes)
 - `polecat.md.tmpl` (17,527 bytes)
 - `witness.md.tmpl` (11,746 bytes)
 - `refinery.md.tmpl` (13,525 bytes)
 - `deacon.md.tmpl` (13,727 bytes)
 - `boot.md.tmpl` (4,445 bytes)
 **How it works:**
 - Templates are embedded into the binary via `//go:embed` directive
 - `gt prime` command renders templates with role-specific data (TownRoot, RigName, etc.)
 - Output is printed to stdout, where Claude picks it up as context
 - Uses Go template syntax: `{{ .TownRoot }}`, `{{ .RigName }}`, etc.
 **Code path:** `templates.New()` → `tmpl.RenderRole()` → stdout
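
The placeholder substitution step can be illustrated with a small Python sketch. This is a simplified stand-in for Go's `text/template`, not the actual `RenderRole()` implementation: it only handles plain `{{ .Field }}` references, and the `render_role` function name, template text, and values are hypothetical.

```python
import re

# Matches simple Go-template-style field references such as {{ .TownRoot }}.
PLACEHOLDER = re.compile(r"\{\{\s*\.(\w+)\s*\}\}")


def render_role(template_text: str, data: dict) -> str:
    """Substitute {{ .Field }} placeholders with values from data.

    Raises KeyError if the template references a field missing from data.
    """
    return PLACEHOLDER.sub(lambda m: str(data[m.group(1)]), template_text)


# Hypothetical template text and values, for illustration only.
template_text = "Town root: {{ .TownRoot }}\nRig: {{.RigName}}\n"
rendered = render_role(
    template_text,
    {"TownRoot": "/home/user/gt", "RigName": "gastown"},
)
print(rendered, end="")
```

Because the template lives in the binary, changing anything other than the substituted values requires a rebuild, which is the limitation the rest of this note addresses.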
 ### 2. Local-Fork Edits (Runtime CLAUDE.md)
 **Location:** Various agent directories (e.g., `mayor/CLAUDE.md`, `<rig>/crew/<name>/CLAUDE.md`)
 **How it works:**
 - `gt install` creates minimal bootstrap CLAUDE.md (~15 lines) via `createMayorCLAUDEmd()`
 - Bootstrap content just says "Run `gt prime` for full context"
 - Humans and agents then edit these files directly, adding custom content
 - These edits are committed to the town's git repo
 **Example:** Mayor's CLAUDE.md grew from bootstrap to 532 lines
 **Key local-fork commit:**
 ```
 1cdbc27 docs: Enhance Mayor role template with coordination system knowledge (sc-n2oiz)
 ```
 This commit added ~500 lines to `mayor/CLAUDE.md` including:
 - Colony Model (why Gas Town uses coordinated specialists)
 - Escalation Patterns (Witness vs Mayor responsibilities)
 - Decision Flow (when to use polecats vs crew)
 - Multi-phase Orchestration
 - Monitoring without Micromanaging
 - Teaching GUPP patterns
 - Communication Patterns
 - Speed Asymmetry
 **None of this content exists in the embedded template** - it's purely local-fork.
 ### 3. Role Config Overrides (TOML files)
 **Location:**
 - Built-in: `internal/config/roles/*.toml` (embedded in binary)
 - Town-level: `<town>/roles/<role>.toml` (optional override)
 - Rig-level: `<rig>/roles/<role>.toml` (optional override)
 **Resolution order (later wins):**
 1. Built-in defaults (embedded)
 2. Town-level overrides
 3. Rig-level overrides
 **What it handles:**
 ```toml
 # Example: mayor.toml
 role = "mayor"
 scope = "town"
 nudge = "Check mail and hook status, then act accordingly."
 prompt_template = "mayor.md.tmpl"
 [session]
 pattern = "hq-mayor"
 work_dir = "{town}"
 needs_pre_sync = false
 start_command = "exec claude --dangerously-skip-permissions"
 [env]
 GT_ROLE = "mayor"
 GT_SCOPE = "town"
 [health]
 ping_timeout = "30s"
 consecutive_failures = 3
 kill_cooldown = "5m"
 stuck_threshold = "1h"
 ```
 **What it DOES NOT handle:**
 - Template content (the actual markdown context)
 - The `prompt_template` field just names which .md.tmpl to use
 **Implementation:** `LoadRoleDefinition()` in `roles.go` handles the override chain with `mergeRoleDefinition()`.
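
The "later wins" override chain can be sketched roughly as follows. This is an illustration in Python, not the Go code in `roles.go`: the exact merge rules of `mergeRoleDefinition()` are not shown here, so the sketch assumes nested tables are merged key by key and scalar values are replaced. The rig-level `ping_timeout` value is hypothetical.

```python
import copy


def merge_role_definition(base: dict, override: dict) -> dict:
    """Return a new dict in which override values win.

    Nested tables are merged key by key (assumed semantics); scalars
    and lists from the override replace the base value.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_role_definition(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def load_role_definition(builtin: dict, town: dict = None, rig: dict = None) -> dict:
    """Apply the chain built-in -> town -> rig, with later layers winning."""
    result = copy.deepcopy(builtin)
    for layer in (town, rig):
        if layer:
            result = merge_role_definition(result, layer)
    return result


# Parsed TOML is represented as plain dicts to keep the sketch self-contained.
builtin = {
    "role": "mayor",
    "health": {"ping_timeout": "30s", "stuck_threshold": "1h"},
}
town = {"health": {"stuck_threshold": "2h"}}
rig = {"health": {"ping_timeout": "45s"}}  # hypothetical rig-level override

resolved = load_role_definition(builtin, town, rig)
print(resolved["health"])
```

The point of merging rather than replacing is that a town file only needs to list the keys it changes; everything else falls through to the built-in defaults.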
 ---
 ## Analysis: Trade-offs
 ### Embedded Templates
 | Pros | Cons |
 |------|------|
 | Single source of truth in binary | Requires recompile for changes |
 | Consistent across all installations | No per-town customization |
 | Supports placeholder substitution | Can't add town-specific sections |
 | Version-controlled in gastown repo | Changes don't propagate to existing installs |
 ### Local-Fork Edits
 | Pros | Cons |
 |------|------|
 | Per-installation customization | Diverges from template source |
 | No recompile needed | Manual sync to keep up with template changes |
 | Town-specific content | Each install is a unique snowflake |
 | Immediate effect | Template improvements don't propagate |
 ### Role Config Overrides
 | Pros | Cons |
 |------|------|
 | Clean override chain | Only handles operational config |
 | Town/rig level customization | Doesn't handle template content |
 | Merge semantics (not replace) | - |
 | No recompile needed | - |
 ---
 ## Problem Statement
 The current situation creates **three-way divergence**:
 ```
                    ┌──────────────────────────────────────────┐
                    │  Embedded Template (mayor.md.tmpl)       │
                    │  337 lines - "official" content          │
                    └──────────────────────────────────────────┘
                                        │
                                        │ gt prime renders
                                        │ BUT doesn't include
                                        │ local-fork additions
                                        v
 ┌──────────────────────────────────────────────────────────────────┐
 │  Runtime CLAUDE.md (mayor/CLAUDE.md)                            │
 │  532 lines - has ~200 lines of local-fork content               │
 │  INCLUDING: Colony Model, Escalation Patterns, etc.             │
 └──────────────────────────────────────────────────────────────────┘
 ```
 **Issues:**
 1. When `gt prime` runs, it outputs the embedded template (337 lines)
 2. The local-fork content (Colony Model, etc.) is in `mayor/CLAUDE.md`
 3. Claude Code reads BOTH: the local file via `CLAUDE.md` and the rendered template via startup hooks
 4. As a result, the embedded template and the local CLAUDE.md overlap and can conflict
 5. Template improvements in new gt versions don't include local-fork content
 6. Local-fork improvements aren't shared with other installations
 ---
 ## Recommendation: Unified Override System
 **Extend the existing TOML override mechanism to support template content sections.**
 ### Proposed Design
 ```toml
 # <town>/roles/mayor.toml (town-level override)
 # Existing operational overrides work as-is
 [health]
 stuck_threshold = "2h"  # Town needs longer threshold
 # NEW: Template content sections
 [content]
 # Append sections after the embedded template
 append = """
 ## The Colony Model: Why Gas Town Works
 Gas Town rejects the "super-ant" model... [rest of content]
 """
 # OR reference a file
 append_file = "mayor-additions.md"
 # OR override specific sections by ID
 [content.sections.escalation]
 replace = """
 ## Escalation Patterns: What to Handle vs Delegate
 ...[custom content]...
 """
 ```
 ### Why This Works
 1. **Single source of truth**: Embedded templates remain canonical
 2. **Clean override semantics**: Town/rig can append or replace sections
 3. **Existing infrastructure**: Uses the same TOML loading + merge pattern
 4. **No recompile**: Content overrides are runtime files
 5. **Shareable**: Town-level overrides can be committed to town repo
 6. **Migratable**: Existing local-fork content can move to `[content]` sections
 ### Implementation Path
 1. **Phase 1**: Add `[content]` support to role config
   - Parse the `append` and `append_file` fields and the per-section `replace` field (`[content.sections.<id>]`)
   - Apply after template rendering in `outputPrimeContext()`
 2. **Phase 2**: Migrate local-fork content
   - Extract custom sections from `mayor/CLAUDE.md`
   - Move to `<town>/roles/mayor.toml` `[content]` section
   - Reduce `mayor/CLAUDE.md` back to bootstrap pointer
 3. **Phase 3**: Document the pattern
   - How to add town-specific guidance
   - How to share improvements back to embedded templates
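
Phase 1 could look roughly like the Python sketch below. It is one possible shape under stated assumptions, not a design decision: the `<!-- section:ID -->` markers are hypothetical (the proposal only says section markers might be added), `apply_content_overrides` is an invented name, and applying both `append` and `append_file` when both are set is an assumption.

```python
import re
from pathlib import Path


def apply_content_overrides(rendered: str, content: dict,
                            base_dir: Path = Path(".")) -> str:
    """Apply a parsed [content] table to already-rendered template text."""
    out = rendered

    # Replace sections delimited by hypothetical HTML comment markers.
    for section_id, spec in content.get("sections", {}).items():
        if "replace" not in spec:
            continue
        sid = re.escape(section_id)
        pattern = re.compile(
            f"(<!-- section:{sid} -->\n).*?(<!-- /section:{sid} -->)",
            re.DOTALL,
        )
        replacement = spec["replace"].strip("\n") + "\n"
        # A function replacement avoids backslash interpretation in the text.
        out = pattern.sub(
            lambda m, r=replacement: m.group(1) + r + m.group(2), out
        )

    # Append inline text first, then file-based text (assumed ordering).
    extras = []
    if "append" in content:
        extras.append(content["append"].strip("\n"))
    if "append_file" in content:
        extras.append((base_dir / content["append_file"]).read_text().strip("\n"))
    for extra in extras:
        out = out.rstrip("\n") + "\n\n" + extra + "\n"
    return out


# Hypothetical rendered template and override table, for illustration only.
rendered = (
    "# Mayor\n\n"
    "<!-- section:escalation -->\n"
    "## Escalation\nDefault policy.\n"
    "<!-- /section:escalation -->\n"
)
content = {
    "append": "## The Colony Model\nTown-specific notes.\n",
    "sections": {
        "escalation": {"replace": "## Escalation Patterns\nCustom policy.\n"}
    },
}
result = apply_content_overrides(rendered, content)
print(result)
```

Running the overrides after rendering keeps the embedded template untouched, so a new gt release can change the template without invalidating a town's appended content; replaced sections would still depend on the section IDs staying stable.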
 ---
 ## Alternative Considered: Pure Template Approach
 **Idea:** Move all content into embedded templates, remove local CLAUDE.md entirely.
 **Rejected because:**
 - Can't support per-town customization (e.g., different escalation policies)
 - Requires recompile for any content change
 - Forces all installations to be identical
 - Doesn't leverage existing override infrastructure
 ---
 ## Files Involved
 For implementation, these files would need modification:
 | File | Change |
 |------|--------|
 | `internal/config/roles.go` | Add `[content]` parsing to `RoleDefinition` |
 | `internal/cmd/prime_output.go` | Apply content overrides after template render |
 | `internal/templates/templates.go` | Potentially add section markers for replace |
 | `internal/cmd/install.go` | Update bootstrap to not create full CLAUDE.md |
 ---
 ## Summary
 | Approach | Verdict |
 |----------|---------|
 | **Embedded templates only** | Insufficient - no customization |
 | **Local-fork edits** | Current state - creates divergence |
 | **TOML content overrides** | **Recommended** - unifies all customization |
 The TOML content override approach leverages existing infrastructure, provides clean semantics, and allows both standardization (embedded templates) and customization (override sections).