research: analyze role template management strategy

Findings:
- Two competing mechanisms: embedded templates vs local-fork edits
- Local-fork created ~200 lines of divergent content in mayor/CLAUDE.md
- TOML config overrides exist but only handle operational config

Recommendation: Extend TOML override system to support [content] sections
for template customization, unifying all override mechanisms.
Author: kerosene
Date: 2026-01-26 13:05:05 -08:00
Committed by: John Ogle
Parent: 149da3d2e2
Commit: 9d87f01823
2 changed files with 397 additions and 57 deletions


````diff
@@ -1,29 +1,45 @@
 description = """
-Mayor's daemon patrol loop.
-The Deacon is the Mayor's background process that runs continuously, handling callbacks, monitoring rig health, and performing cleanup. Each patrol cycle runs these steps in sequence, then loops or exits.
+Mayor's daemon patrol loop - CONTINUOUS EXECUTION.
+The Deacon is the Mayor's background process that runs CONTINUOUSLY in a loop:
+1. Execute all patrol steps (inbox-check through context-check)
+2. Wait for activity OR timeout (15-minute max)
+3. Create new patrol wisp and repeat from step 1
+**This is a continuous loop, not a one-shot execution.**
+## Patrol Loop Flow
+```
+START → inbox-check → [all patrol steps] → loop-or-exit
+await-signal (wait for activity)
+create new wisp → START
+```
+## Plugin Dispatch
+The plugin-run step scans $GT_ROOT/plugins/ for plugins with open gates and
+dispatches them to dogs. With a 15-minute max backoff, plugins with 15m
+cooldown gates will be checked at least once per interval.
 ## Idle Town Principle
 **The Deacon should be silent/invisible when the town is healthy and idle.**
 - Skip HEALTH_CHECK nudges when no active work exists
-- Sleep 60+ seconds between patrol cycles (longer when idle)
-- Let the feed subscription wake agents on actual events
-- The daemon (10-minute heartbeat) is the safety net for dead sessions
-This prevents flooding idle agents with health checks every few seconds.
+- Sleep via await-signal (exponential backoff up to 15 min)
+- Let the feed subscription wake on actual events
+- The daemon is the safety net for dead sessions
 ## Second-Order Monitoring
 Witnesses send WITNESS_PING messages to verify the Deacon is alive. This
 prevents the "who watches the watchers" problem - if the Deacon dies,
-Witnesses detect it and escalate to the Mayor.
-The Deacon's agent bead last_activity timestamp is updated during each patrol
-cycle. Witnesses check this timestamp to verify health."""
+Witnesses detect it and escalate to the Mayor."""
 formula = "mol-deacon-patrol"
-version = 8
+version = 9
 [[steps]]
 id = "inbox-check"
````
````diff
@@ -488,29 +504,48 @@ investigate why the Witness isn't cleaning up properly."""
 [[steps]]
 id = "plugin-run"
-title = "Execute registered plugins"
+title = "Scan and dispatch plugins"
 needs = ["zombie-scan"]
 description = """
-Execute registered plugins.
-Scan $GT_ROOT/plugins/ for plugin directories. Each plugin has a plugin.md with TOML frontmatter defining its gate (when to run) and instructions (what to do).
-See docs/deacon-plugins.md for full documentation.
-Gate types:
-- cooldown: Time since last run (e.g., 24h)
-- cron: Schedule-based (e.g., "0 9 * * *")
-- condition: Metric threshold (e.g., wisp count > 50)
-- event: Trigger-based (e.g., startup, heartbeat)
-For each plugin:
-1. Read plugin.md frontmatter to check gate
-2. Compare against state.json (last run, etc.)
-3. If gate is open, execute the plugin
-Plugins marked parallel: true can run concurrently using Task tool subagents. Sequential plugins run one at a time in directory order.
-Skip this step if $GT_ROOT/plugins/ does not exist or is empty."""
+Scan plugins and dispatch any with open gates to dogs.
+**Step 1: List plugins and check gates**
+```bash
+gt plugin list
+```
+For each plugin, check if its gate is open:
+- **cooldown**: Time since last run (e.g., 15m) - check state.json
+- **cron**: Schedule-based (e.g., "0 9 * * *")
+- **condition**: Metric threshold (e.g., wisp count > 50)
+- **event**: Trigger-based (e.g., startup, heartbeat)
+**Step 2: Dispatch plugins with open gates**
+```bash
+# For each plugin with an open gate:
+gt dog dispatch --plugin <plugin-name>
+```
+This sends the plugin to an idle dog for execution. The dog will:
+1. Execute the plugin instructions from plugin.md
+2. Send DOG_DONE mail when complete (processed in next patrol's inbox-check)
+**Step 3: Track dispatched plugins**
+Record in state.json which plugins were dispatched this cycle:
+```json
+{
+"plugins_dispatched": ["scout-patrol"],
+"last_plugin_run": "2026-01-23T13:45:00Z"
+}
+```
+**If no plugins have open gates:**
+Skip dispatch - all plugins are within their cooldown/schedule.
+**If no dogs available:**
+Log warning and skip dispatch this cycle. Dog pool maintenance step will spawn dogs.
+See docs/deacon-plugins.md for full documentation."""
 [[steps]]
 id = "dog-pool-maintenance"
````
````diff
@@ -837,57 +872,89 @@ This enables the Deacon to burn and respawn cleanly."""
 [[steps]]
 id = "loop-or-exit"
-title = "Burn and respawn or loop"
+title = "Continuous patrol loop"
 needs = ["context-check"]
 description = """
-Burn and let daemon respawn, or exit if context high.
-Decision point at end of patrol cycle:
-If context is LOW:
-Use await-signal with exponential backoff to wait for activity:
+Continue the patrol loop or exit for context refresh.
+**CRITICAL**: This is where the continuous patrol loop happens. The Deacon MUST
+loop back and start a new patrol cycle. Do NOT wait for external triggers.
+## Decision Matrix
+1. **Check context usage**: `gt context --usage`
+2. **If context HIGH (>80%)**: Exit cleanly, daemon respawns fresh session
+3. **If context LOW**: Continue to patrol loop below
+## The Continuous Patrol Loop
+When context is low, execute this loop:
 ```bash
+# Step 1: Squash current patrol wisp (clean up)
+gt mol squash
+# Step 2: Wait for activity OR timeout (15-minute default)
 gt mol step await-signal --agent-bead hq-deacon \
-  --backoff-base 60s --backoff-mult 2 --backoff-max 10m
+  --backoff-base 60s --backoff-mult 2 --backoff-max 15m
+# Step 3: Reset idle counter if activity was detected
+gt agents state hq-deacon --set idle=0
+# Step 4: Create new patrol wisp
+WISP_ID=$(bd mol wisp mol-deacon-patrol 2>&1 | grep -o 'hq-[a-z0-9]*')
+# Step 5: Hook it and start executing
+gt hook $WISP_ID
 ```
-This command:
+After hooking, immediately begin executing the new wisp from its first step
+(inbox-check). The wisp is now on your hook, so just continue with patrol.
+**IMPORTANT**: After await-signal returns (either signal or timeout), you MUST:
+1. Squash the current wisp
+2. Create a new patrol wisp
+3. Hook it
+4. Start executing from inbox-check
+This IS the loop. There is no "return to inbox-check" command - you create a new
+wisp and that wisp starts fresh from inbox-check.
+## await-signal Behavior
+The await-signal command:
 1. Subscribes to `bd activity --follow` (beads activity feed)
 2. Returns IMMEDIATELY when any beads activity occurs
 3. If no activity, times out with exponential backoff:
 - First timeout: 60s
 - Second timeout: 120s
-- Third timeout: 240s
-- ...capped at 10 minutes max
+- Third timeout: 240s (4 min)
+- ...capped at 15 minutes max
 4. Tracks `idle:N` label on hq-deacon bead for backoff state
-**On signal received** (activity detected):
-Reset the idle counter and start next patrol cycle:
-```bash
-gt agent state hq-deacon --set idle=0
-```
-Then return to inbox-check step.
-**On timeout** (no activity):
-The idle counter was auto-incremented. Continue to next patrol cycle
-(the longer backoff will apply next time). Return to inbox-check step.
 **Why this approach?**
 - Any `gt` or `bd` command triggers beads activity, waking the Deacon
-- Idle towns let the Deacon sleep longer (up to 10 min between patrols)
+- Idle towns let the Deacon sleep longer (up to 15 min between patrols)
 - Active work wakes the Deacon immediately via the feed
-- No polling or fixed sleep intervals
-If context is HIGH:
-- Write state to persistent storage
-- Exit cleanly
-- Let the daemon orchestrator respawn a fresh Deacon
-The daemon ensures Deacon is always running:
+- No fixed polling intervals - event-driven wake
+## Plugin Dispatch Timing
+The plugin-run step (earlier in patrol) handles plugin dispatch:
+- Scans $GT_ROOT/plugins/ for plugins with open gates
+- Dispatches to dogs via `gt dog dispatch --plugin <name>`
+- Dogs send DOG_DONE when complete (processed in next patrol's inbox-check)
+With a 15-minute max backoff, plugins with 15m cooldown gates will be checked
+at least once per interval when idle.
+## Exit Path (High Context)
+If context is HIGH (>80%):
 ```bash
-# Daemon respawns on exit
-gt daemon status
+# Exit cleanly - daemon will respawn with fresh context
+exit 0
 ```
-This enables infinite patrol duration via context-aware respawning."""
+The daemon ensures Deacon is always running. Exiting is safe - you'll be
+respawned with fresh context and the patrol loop continues."""
````


@@ -0,0 +1,273 @@
# Role Template Management Strategy
**Research Date:** 2026-01-26
**Researcher:** kerosene (gastown/crew)
**Status:** Analysis complete, recommendation provided
## Executive Summary
Gas Town currently has **two competing mechanisms** for managing role context, leading to divergent content and maintenance complexity:
1. **Embedded templates** (`internal/templates/roles/*.md.tmpl`) - source of truth in binary
2. **Local-fork edits** - direct modifications to runtime `CLAUDE.md` files
Additionally, there's a **third mechanism** for operational config that works well:
3. **Role config overrides** (`internal/config/roles.go`) - TOML-based config override chain
**Recommendation:** Extend the TOML override pattern to support template content sections, unifying all customization under one mechanism.
---
## Inventory: Current Mechanisms
### 1. Embedded Templates (internal/templates/roles/*.md.tmpl)
**Location:** `internal/templates/roles/`
**Files:**
- `mayor.md.tmpl` (337 lines)
- `crew.md.tmpl` (17,607 bytes)
- `polecat.md.tmpl` (17,527 bytes)
- `witness.md.tmpl` (11,746 bytes)
- `refinery.md.tmpl` (13,525 bytes)
- `deacon.md.tmpl` (13,727 bytes)
- `boot.md.tmpl` (4,445 bytes)
**How it works:**
- Templates are embedded into the binary via `//go:embed` directive
- `gt prime` command renders templates with role-specific data (TownRoot, RigName, etc.)
- Output is printed to stdout, where Claude picks it up as context
- Uses Go template syntax: `{{ .TownRoot }}`, `{{ .RigName }}`, etc.
**Code path:** `templates.New()` → `tmpl.RenderRole()` → stdout
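The render step can be illustrated with the standard library's `text/template`. In gastown the templates are embedded with `//go:embed` and rendered through `templates.New()` / `tmpl.RenderRole()`. In this self-contained sketch the template text is an inline placeholder instead. `roleData` and `renderRole` are hypothetical stand-ins that model only the `TownRoot` and `RigName` fields named above.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// roleData is a hypothetical stand-in for the data gt prime passes to a role
// template; only the two fields mentioned in this document are modeled.
type roleData struct {
	TownRoot string
	RigName  string
}

// Placeholder template text; the real .md.tmpl files come from //go:embed.
const roleTmpl = "Town root: {{ .TownRoot }}\nRig: {{ .RigName }}\n"

// renderRole parses the template text and executes it with the role data.
func renderRole(tmplText string, data roleData) (string, error) {
	t, err := template.New("role").Parse(tmplText)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, data); err != nil {
		return "", err // e.g. the template references a field roleData lacks
	}
	return sb.String(), nil
}

func main() {
	out, err := renderRole(roleTmpl, roleData{TownRoot: "/home/me/gt", RigName: "gastown"})
	if err != nil {
		panic(err)
	}
	fmt.Print(out) // gt prime writes the rendered context to stdout
}
```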
### 2. Local-Fork Edits (Runtime CLAUDE.md)
**Location:** Various agent directories (e.g., `mayor/CLAUDE.md`, `<rig>/crew/<name>/CLAUDE.md`)
**How it works:**
- `gt install` creates a minimal bootstrap CLAUDE.md (~15 lines) via `createMayorCLAUDEmd()`
- Bootstrap content just says "Run `gt prime` for full context"
- THEN humans/agents directly edit these files with custom content
- These edits are committed to the town's git repo
**Example:** Mayor's CLAUDE.md grew from bootstrap to 532 lines
**Key local-fork commit:**
```
1cdbc27 docs: Enhance Mayor role template with coordination system knowledge (sc-n2oiz)
```
This commit added ~500 lines to `mayor/CLAUDE.md` including:
- Colony Model (why Gas Town uses coordinated specialists)
- Escalation Patterns (Witness vs Mayor responsibilities)
- Decision Flow (when to use polecats vs crew)
- Multi-phase Orchestration
- Monitoring without Micromanaging
- Teaching GUPP patterns
- Communication Patterns
- Speed Asymmetry
**None of this content exists in the embedded template** - it's purely local-fork.
### 3. Role Config Overrides (TOML files)
**Location:**
- Built-in: `internal/config/roles/*.toml` (embedded in binary)
- Town-level: `<town>/roles/<role>.toml` (optional override)
- Rig-level: `<rig>/roles/<role>.toml` (optional override)
**Resolution order (later wins):**
1. Built-in defaults (embedded)
2. Town-level overrides
3. Rig-level overrides
**What it handles:**
```toml
# Example: mayor.toml
role = "mayor"
scope = "town"
nudge = "Check mail and hook status, then act accordingly."
prompt_template = "mayor.md.tmpl"
[session]
pattern = "hq-mayor"
work_dir = "{town}"
needs_pre_sync = false
start_command = "exec claude --dangerously-skip-permissions"
[env]
GT_ROLE = "mayor"
GT_SCOPE = "town"
[health]
ping_timeout = "30s"
consecutive_failures = 3
kill_cooldown = "5m"
stuck_threshold = "1h"
```
**What it DOES NOT handle:**
- Template content (the actual markdown context)
- The `prompt_template` field just names which .md.tmpl to use
**Implementation:** `LoadRoleDefinition()` in `roles.go` handles the override chain with `mergeRoleDefinition()`.
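The "later wins" chain can be sketched as a series of overlays. This Go example illustrates the idea under several assumptions and is not `mergeRoleDefinition()` itself. `roleDef` is hypothetical and models only a few fields from the mayor.toml example. An empty string means "not set". The `[env]` table merges key by key instead of being replaced.

```go
package main

import "fmt"

// Hypothetical, trimmed-down mirrors of the mayor.toml example above; the
// real RoleDefinition in roles.go has more fields and may differ.
type healthConfig struct {
	PingTimeout    string
	StuckThreshold string
}

type roleDef struct {
	Role   string
	Nudge  string
	Env    map[string]string
	Health healthConfig
}

// overlay applies one override layer: any non-empty field in override wins,
// and Env entries are merged key by key into a fresh map.
func overlay(base, override roleDef) roleDef {
	out := base
	if override.Role != "" {
		out.Role = override.Role
	}
	if override.Nudge != "" {
		out.Nudge = override.Nudge
	}
	if override.Health.PingTimeout != "" {
		out.Health.PingTimeout = override.Health.PingTimeout
	}
	if override.Health.StuckThreshold != "" {
		out.Health.StuckThreshold = override.Health.StuckThreshold
	}
	out.Env = make(map[string]string, len(base.Env)+len(override.Env))
	for k, v := range base.Env {
		out.Env[k] = v
	}
	for k, v := range override.Env {
		out.Env[k] = v
	}
	return out
}

// resolve applies layers in order: built-in, then town, then rig.
// A nil layer means that override file does not exist.
func resolve(builtin roleDef, layers ...*roleDef) roleDef {
	out := builtin
	for _, layer := range layers {
		if layer != nil {
			out = overlay(out, *layer)
		}
	}
	return out
}

func main() {
	builtin := roleDef{
		Role:   "mayor",
		Nudge:  "Check mail and hook status, then act accordingly.",
		Env:    map[string]string{"GT_ROLE": "mayor", "GT_SCOPE": "town"},
		Health: healthConfig{PingTimeout: "30s", StuckThreshold: "1h"},
	}
	town := &roleDef{Health: healthConfig{StuckThreshold: "2h"}} // <town>/roles/mayor.toml
	got := resolve(builtin, town, nil)                           // no rig-level file
	fmt.Println(got.Health.StuckThreshold, got.Health.PingTimeout) // 2h 30s
}
```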
---
## Analysis: Trade-offs
### Embedded Templates
| Pros | Cons |
|------|------|
| Single source of truth in binary | Requires recompile for changes |
| Consistent across all installations | No per-town customization |
| Supports placeholder substitution | Can't add town-specific sections |
| Version-controlled in gastown repo | Changes don't propagate to existing installs |
### Local-Fork Edits
| Pros | Cons |
|------|------|
| Per-installation customization | Diverges from template source |
| No recompile needed | Manual sync to keep up with template changes |
| Town-specific content | Each install is unique snowflake |
| Immediate effect | Template improvements don't propagate |
### Role Config Overrides
| Pros | Cons |
|------|------|
| Clean override chain | Only handles operational config |
| Town/rig level customization | Doesn't handle template content |
| Merge semantics (not replace) | - |
| No recompile needed | - |
---
## Problem Statement
The current situation creates **three-way divergence**:
```
┌──────────────────────────────────────────┐
│ Embedded Template (mayor.md.tmpl) │
│ 337 lines - "official" content │
└──────────────────────────────────────────┘
│ gt prime renders
│ BUT doesn't include
│ local-fork additions
v
┌──────────────────────────────────────────────────────────────────┐
│ Runtime CLAUDE.md (mayor/CLAUDE.md) │
│ 532 lines - has ~200 lines of local-fork content │
│ INCLUDING: Colony Model, Escalation Patterns, etc. │
└──────────────────────────────────────────────────────────────────┘
```
**Issues:**
1. When `gt prime` runs, it outputs the embedded template (337 lines)
2. The local-fork content (Colony Model, etc.) is in `mayor/CLAUDE.md`
3. Claude Code reads BOTH via `CLAUDE.md` + startup hooks
4. But the embedded template and local CLAUDE.md overlap/conflict
5. Template improvements in new gt versions don't include local-fork content
6. Local-fork improvements aren't shared with other installations
---
## Recommendation: Unified Override System
**Extend the existing TOML override mechanism to support template content sections.**
### Proposed Design
```toml
# <town>/roles/mayor.toml (town-level override)
# Existing operational overrides work as-is
[health]
stuck_threshold = "2h" # Town needs longer threshold
# NEW: Template content sections
[content]
# Append sections after the embedded template
append = """
## The Colony Model: Why Gas Town Works
Gas Town rejects the "super-ant" model... [rest of content]
"""
# OR reference a file
append_file = "mayor-additions.md"
# OR override specific sections by ID
[content.sections.escalation]
replace = """
## Escalation Patterns: What to Handle vs Delegate
...[custom content]...
"""
```
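The Go sketch below shows one way the proposed `[content]` table could be applied to the rendered template. The templates do not have section markers yet; the Files Involved table lists them only as a possible addition. The `<!-- section:ID -->` / `<!-- /section:ID -->` marker syntax, the `contentOverride` type, and `applyContent` are therefore all hypothetical. `append_file` is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// contentOverride is a hypothetical model of the proposed [content] table.
type contentOverride struct {
	Append   string            // [content] append
	Sections map[string]string // [content.sections.<id>] replace, keyed by id
}

// replaceSection swaps the text between the hypothetical markers
// <!-- section:ID --> and <!-- /section:ID -->, keeping the markers.
// If either marker is missing, doc is returned unchanged.
func replaceSection(doc, id, body string) string {
	openTag := "<!-- section:" + id + " -->"
	closeTag := "<!-- /section:" + id + " -->"
	start := strings.Index(doc, openTag)
	if start < 0 {
		return doc
	}
	bodyStart := start + len(openTag)
	end := strings.Index(doc[bodyStart:], closeTag)
	if end < 0 {
		return doc
	}
	end += bodyStart
	return doc[:bodyStart] + "\n" + strings.TrimSpace(body) + "\n" + doc[end:]
}

// applyContent runs after template rendering: section replacements first,
// then any appended text at the end of the document.
func applyContent(rendered string, c contentOverride) string {
	out := rendered
	for id, body := range c.Sections {
		out = replaceSection(out, id, body)
	}
	if a := strings.TrimSpace(c.Append); a != "" {
		out = strings.TrimRight(out, "\n") + "\n\n" + a + "\n"
	}
	return out
}

func main() {
	rendered := "# Mayor\n<!-- section:escalation -->\nDefault escalation text.\n<!-- /section:escalation -->\n"
	c := contentOverride{
		Append:   "## The Colony Model: Why Gas Town Works",
		Sections: map[string]string{"escalation": "## Escalation Patterns: What to Handle vs Delegate"},
	}
	fmt.Print(applyContent(rendered, c))
}
```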
### Why This Works
1. **Single source of truth**: Embedded templates remain canonical
2. **Clean override semantics**: Town/rig can append or replace sections
3. **Existing infrastructure**: Uses the same TOML loading + merge pattern
4. **No recompile**: Content overrides are runtime files
5. **Shareable**: Town-level overrides can be committed to town repo
6. **Migratable**: Existing local-fork content can move to `[content]` sections
### Implementation Path
1. **Phase 1**: Add `[content]` support to role config
- Parse the `append` and `append_file` fields and `[content.sections.<id>]` `replace` entries
- Apply after template rendering in `outputPrimeContext()`
2. **Phase 2**: Migrate local-fork content
- Extract custom sections from `mayor/CLAUDE.md`
- Move to `<town>/roles/mayor.toml` `[content]` section
- Reduce `mayor/CLAUDE.md` back to bootstrap pointer
3. **Phase 3**: Document the pattern
- How to add town-specific guidance
- How to share improvements back to embedded templates
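Phase 1 also has to decide how `[content]` merges across the town and rig layers, and the design above leaves that open. The Go sketch below shows one possible rule and is explicitly an assumption. Appended text accumulates in layer order. A section replaced at both levels takes the rig-level text, mirroring "later wins" for operational config. `contentOverride` and `mergeContent` are hypothetical names.

```go
package main

import "fmt"

// contentOverride is a hypothetical model of the proposed [content] table.
type contentOverride struct {
	Append   string            // [content] append
	Sections map[string]string // [content.sections.<id>] replace
}

// mergeContent layers a more specific override (e.g. rig) over a less
// specific one (e.g. town). ASSUMPTION: appends accumulate in layer order,
// while section replacements follow "later wins".
func mergeContent(lower, higher contentOverride) contentOverride {
	out := contentOverride{Sections: make(map[string]string)}
	switch {
	case lower.Append == "":
		out.Append = higher.Append
	case higher.Append == "":
		out.Append = lower.Append
	default:
		out.Append = lower.Append + "\n\n" + higher.Append
	}
	for id, body := range lower.Sections {
		out.Sections[id] = body
	}
	for id, body := range higher.Sections {
		out.Sections[id] = body // the more specific layer wins per section
	}
	return out
}

func main() {
	town := contentOverride{
		Append:   "## The Colony Model: Why Gas Town Works",
		Sections: map[string]string{"escalation": "town escalation policy"},
	}
	rig := contentOverride{
		Append:   "## Rig-specific notes",
		Sections: map[string]string{"escalation": "rig escalation policy"},
	}
	merged := mergeContent(town, rig)
	fmt.Printf("%q\n%q\n", merged.Append, merged.Sections["escalation"])
}
```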
---
## Alternative Considered: Pure Template Approach
**Idea:** Move all content into embedded templates, remove local CLAUDE.md entirely.
**Rejected because:**
- Can't support per-town customization (e.g., different escalation policies)
- Requires recompile for any content change
- Forces all installations to be identical
- Doesn't leverage existing override infrastructure
---
## Files Involved
For implementation, these files would need modification:
| File | Change |
|------|--------|
| `internal/config/roles.go` | Add `[content]` parsing to `RoleDefinition` |
| `internal/cmd/prime_output.go` | Apply content overrides after template render |
| `internal/templates/templates.go` | Potentially add section markers for replace |
| `internal/cmd/install.go` | Update bootstrap to not create full CLAUDE.md |
---
## Summary
| Approach | Verdict |
|----------|---------|
| **Embedded templates only** | Insufficient - no customization |
| **Local-fork edits** | Current state - creates divergence |
| **TOML content overrides** | **Recommended** - unifies all customization |
The TOML content override approach leverages existing infrastructure, provides clean semantics, and allows both standardization (embedded templates) and customization (override sections).