The daemon was failing to restart agents when zombie tmux sessions existed
(session alive but Claude dead). Added EnsureSessionFresh() helper to
tmux package that:
- Checks if session exists
- If exists but Claude not running (zombie), kills the session
- Creates fresh session
Updated all daemon session creation points to use EnsureSessionFresh:
- ensureDeaconRunning()
- ensureWitnessRunning()
- restartPolecatSession()
- restartSession() in lifecycle.go
Added tests for the new helper function. (gt-j1i0r)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When Claude starts with --dangerously-skip-permissions, it shows a warning
dialog requiring Down+Enter to accept. This blocked automated polecat and
agent startup.
Added AcceptBypassPermissionsWarning() to tmux package that:
- Checks if the warning dialog is present by capturing pane content
- Only sends Down+Enter if "Bypass Permissions mode" text is found
- Avoids interfering with sessions that don't show the warning
Updated all Claude startup locations:
- session/manager.go (polecat sessions)
- cmd/up.go (mayor, witness, crew, polecat cold starts)
- daemon/daemon.go (crashed polecat restarts)
- daemon/lifecycle.go (role session starts)
The 10-minute recovery heartbeat was too slow to detect stuck agents promptly.
Reduced to 3 minutes for better balance between detection speed and overhead.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add heartbeat checking to Boot degraded triage: detects stale Deacon
heartbeat (>15min nudges, >30min restarts)
- Add checkDeaconHeartbeat to daemon heartbeat cycle as fallback
- This ensures the Deacon is monitored continuously, not just at startup
The mol-deacon-patrol formula was also updated separately to use
gt deacon health-check instead of ephemeral context memory tracking.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove TouchTownActivity() calls from root.go PersistentPreRun hook
- Remove ReadTownActivity() and activity-based backoff from daemon.go
- Delete TouchTownActivity/ReadTownActivity functions from keepalive.go
- Replace dynamic backoff with fixed 10-min recovery heartbeat
Normal wake is now handled by feed subscription (bd activity --follow).
The daemon is a safety net for dead sessions, GUPP violations, and orphaned work.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Boot is a watchdog that the daemon pokes instead of Deacon directly,
centralizing the 'when to wake Deacon' decision in an agent that can
reason about context.
Key changes:
- Add internal/boot package with marker file and status tracking
- Add gt boot commands: status, spawn, triage
- Add mol-boot-triage formula for Boot's triage cycle
- Modify daemon to call ensureBootRunning instead of ensureDeaconRunning
- Add tmux.IsAvailable() for degraded mode detection
- Add boot.md.tmpl role template
Boot lifecycle:
1. Daemon tick spawns Boot (fresh each time)
2. Boot runs triage: observe, decide, act
3. Boot cleans stale handoffs from Deacon inbox
4. Boot exits (or handoffs in non-degraded mode)
In degraded mode (no tmux), Boot runs mechanical triage directly.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements the feed daemon goroutine that:
- Tails ~/gt/.events.jsonl for raw events
- Filters by visibility tag (keeps feed/both, drops audit)
- Deduplicates repeated done events from same actor
- Aggregates sling events when multiple occur in window
- Writes curated events to ~/gt/.feed.jsonl with summaries
The curator runs as a goroutine in the gt daemon, starting when
the daemon starts and stopping gracefully on shutdown.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This fix addresses the issue where polecat sessions terminate unexpectedly
during work without recovery:
Changes:
- Add `checkPolecatSessionHealth()` to daemon heartbeat loop
- Proactively validates tmux sessions are alive for polecats
- Detects crashed polecats that have work-on-hook
- Auto-restarts crashed polecats with proper environment setup
- Notifies Witness if restart fails as fallback
- Add polecat support to lifecycle identity mapping
- `identityToSession()` now handles polecat identities
- `restartSession()` can restart crashed polecat sessions
- `identityToStateFile()` handles polecat state files
- `identityToAgentBeadID()` handles polecat agent beads
- `identityToBDActor()` handles polecat BD_ACTOR conversion
- Add `gt session check` command for manual health checking
- Validates tmux sessions exist for all polecats
- Shows summary of healthy vs not-running sessions
- Useful for debugging session issues
This provides faster recovery (within heartbeat interval) compared to
waiting for GUPP violation timeout (30 min) or Witness detection.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Change daemon from wake-focused to recovery-focused:
Before: Daemon pokes agents every 5-60min as primary wake
After: Daemon only checks for edge cases that feed-wake cannot handle
Recovery checks:
- Dead sessions that need restart (ensureDeaconRunning, ensureWitnessesRunning)
- Stale agents that crashed without updating state (checkStaleAgents)
- GUPP violations: agents with work-on-hook not progressing (checkGUPPViolations)
- Orphaned work: work assigned to dead agents (checkOrphanedWork)
Removed:
- pokeDeacon() - no longer sending HEARTBEAT messages
- pokeWitness()/pokeWitnesses() - no longer sending HEARTBEAT messages
- MOTD message arrays - only used by poke functions
Normal agent wake is now handled by feed subscription (bd activity --follow).
The daemon is the safety net for edge cases, not the primary propulsion.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements canonical naming convention for agent bead IDs:
- Town-level: gt-mayor, gt-deacon (unchanged)
- Rig-level: gt-<rig>-witness, gt-<rig>-refinery (was gt-witness-<rig>)
- Named: gt-<rig>-crew-<name>, gt-<rig>-polecat-<name> (was gt-crew-<rig>-<name>)
Changes:
- Added AgentBeadID helper functions to internal/beads/beads.go
- Updated all ID generation call sites to use helpers
- Fixed session parsing in theme.go, statusline.go, agents.go
- Updated doctor check and fix to use canonical format
- Updated tests for new format
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Strip (gt-xxx), (bd-xxx) suffixes from code comments. The descriptions
remain meaningful without the ephemeral issue IDs.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds witness patrol to daemon heartbeat cycle:
- ensureWitnessesRunning(): Start witness for each rig if not running
- pokeWitnesses(): Send heartbeat messages to all witnesses
- getKnownRigs(): Read rig list from mayor/rigs.json
The daemon now ensures both Deacon and Witnesses are running
and sends periodic heartbeats to trigger their patrol molecules.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add checkStaleAgents() to detect agents reporting "running" but not updating
- Add markAgentDead() to update agent bead state to "dead"
- Integrate stale agent check into heartbeat cycle
- DeadAgentTimeout set to 15 minutes
This is a safety mechanism for agents that crash without updating their state.
The daemon now marks them as dead so they can be restarted.
Also fixes duplicate AgentFields declaration - now uses beads.go version with
ParseAgentFieldsFromDescription alias in fields.go.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add getAgentBeadState() and getAgentBeadInfo() to read agent state from beads
- Add identityToAgentBeadID() to map daemon identities to agent bead IDs
- Update ensureDeaconRunning() to check agent bead state first (ZFC compliant)
- Add agent bead state logging in executeLifecycleAction()
This is the first step toward ZFC-compliant state detection. Dependent tasks:
- gt-psuw7: Remove PID/tmux state inference
- gt-2hzl4: Add timeout fallback for dead agents
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix crew next/prev: Pass session name via key binding to avoid run-shell context issue
- Add TouchTownActivity() for town-level activity signaling
- Implement daemon exponential backoff based on activity.json:
- 0-5 min idle → 5 min heartbeat
- 5-15 min idle → 10 min heartbeat
- 15-45 min idle → 30 min heartbeat
- 45+ min idle → 60 min heartbeat (max)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
All 156 instances of _ = error suppression in non-test code now have
explanatory comments documenting why the error is intentionally ignored.
Categories of intentional suppressions:
- non-fatal: session works without these - tmux environment setup
- non-fatal: theming failure does not affect operation - visual styling
- best-effort cleanup - defer cleanup on failure paths
- best-effort notification - mail/notifications that should not block
- best-effort interrupt - graceful shutdown attempts
- crypto/rand.Read only fails on broken system - random ID generation
- output errors non-actionable - fmt.Fprint to io.Writer
This addresses the silent failure and debugging concerns raised in the
issue by making the intentionality explicit in the code.
Generated with Claude Code https://claude.com/claude-code
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The daemon now calls triggerPendingSpawns() on each heartbeat, which:
- Checks for POLECAT_STARTED messages in Deacon inbox
- Polls pending spawns with WaitForClaudeReady (2s timeout)
- Nudges ready polecats with Begin message
- Prunes stale pending spawns (>5 min old)
This ensures polecats get triggered even when Deacon is not in an
active patrol cycle. Uses bootstrap mode regex detection which
is acceptable for daemon operations.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When gt spawns agents (polecats, crew, patrol roles), it now sets the
BD_ACTOR env var so that bd commands (like `bd hook`) know the agent
identity without coupling to gt.
Updated spawn points:
- gt up (mayor, deacon, witness via ensureSession/ensureWitness)
- gt deacon start
- gt witness start
- gt start refinery
- gt mayor start
- Daemon deacon restart
- Daemon lifecycle restart
- Handoff respawn
- Refinery manager start
BD_ACTOR uses slash format (e.g., gastown/witness, gastown/crew/max)
while GT_ROLE may use dash format internally.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove Go code that makes workflow decisions. All health checking,
staleness detection, nudging, and escalation belongs in the Deacon
molecule where Claude executes it.
Removed:
- internal/daemon/backoff.go (190 lines) - exponential backoff decisions
- internal/doctor/stale_check.go (284 lines) - staleness detection
- IsFresh/IsStale/IsVeryStale from keepalive.go
- pokeMayor, pokeWitnesses, pokeWitness from daemon.go
- Heartbeat staleness classification from pokeDeacon
Changed:
- Lifecycle parsing now uses structured body (JSON or simple text)
instead of keyword matching on subject line
- Daemon now only ensures Deacon is running and sends simple heartbeats
- No backoff, no staleness classification, no decision-making
Total: ~800 lines removed from Go code
The Deacon molecule will handle all health checking, nudging, and
escalation. Go is now just a message router.
See gt-gaxo epic for full rationale.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The daemon was creating the deacon session in ~/gt (town root) which loaded
the Mayor CLAUDE.md instead of the Deacon CLAUDE.md. Also, tmux SetEnvironment
does not export variables to spawned processes.
Changes:
- Create deacon session in ~/gt/deacon (correct CLAUDE.md)
- Export GT_ROLE=deacon in launch command (process inherits env)
- Apply to both initial launch and restart paths
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Shell loops bypassed the proper lifecycle architecture. Now:
- Witness: Launches Claude directly, daemon/deacon health-scan handles restart
- Deacon: Launches Claude directly, daemon detects exit and restarts
- Daemon: ensureDeaconRunning() now checks if Claude is running (pane cmd)
and restarts it if session exists but Claude has exited
- gt deacon restart: Now does stop+start instead of Ctrl-C
This enforces proper lifecycle flow through LIFECYCLE mail and daemon
heartbeat rather than bypassing it with shell loops.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- .beads-ephemeral/ -> .beads-wisp/
- Rename doctor checks: EphemeralCheck -> WispCheck
- Update all docs to use 'transient' for polecats, 'wisp' for molecules
- Preserve 'ephemeral' only as descriptive adjective for wisps
- Steam engine metaphor: wisps are steam vapors that dissipate
Part of Christmas launch wisp terminology unification.
The deacon patrol role is thankless - it gets a nudge every minute.
This adds rotating motivational and educational tips to make it more fun.
Messages include gratitude, encouragement, and tips about Gas Town.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- daemon.go: Add SIGUSR1 handler to process lifecycle requests immediately
- handoff.go: Signal daemon after sending lifecycle mail to deacon/
- mail.go: Show message IDs in hook output for direct reading
Previously, gt handoff would wait up to 5 min for daemon heartbeat poll.
Now lifecycle requests are processed within milliseconds.
Closes gt-zut3
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implement replaceable notifications to prevent heartbeat stacking when
agents are busy. Only the latest notification per slot is delivered.
Changes:
- Add NotificationManager for tracking pending notifications
- Add SendKeysReplace() that clears input line before sending
- Integrate slot tracking into daemon heartbeat pokes
- Mark notifications consumed when agent shows activity
The system tracks pending notifications in state files and skips
sending if a notification for the same slot is still pending.
When agent activity is detected (keepalive), slots are marked
consumed allowing new notifications to be sent.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Robustness improvements for the Deacon:
- Add DeaconTheme (purple/silver ecclesiastical theme)
- Apply theme to deacon sessions like mayor/witness
- Daemon now auto-starts deacon if not running
- Daemon pokes deacon instead of directly poking mayor/witnesses
- Deacon is responsible for monitoring mayor and witnesses
The daemon is a "dumb scheduler" that keeps the deacon alive.
The deacon (Claude agent) has the intelligence to understand
context and take remedial action when agents are unhealthy.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace tmux session scanning with proper rig discovery:
- Primary: Load rigs from mayor/rigs.json via rig manager
- Fallback: Scan town directory for .beads or config.json
- Handle rigs without running witnesses gracefully
Closes gt-oc2
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fix ~50 errcheck warnings across the codebase:
- Add explicit `_ =` for intentionally ignored error returns (cleanup,
best-effort operations, etc.)
- Use `defer func() { _ = ... }()` pattern for defer statements
- Handle tmux SetEnvironment, KillSession, SendKeysRaw returns
- Handle mail router.Send returns
- Handle os.RemoveAll, os.Rename in cleanup paths
- Handle rand.Read returns for ID generation
- Handle fmt.Fprint* returns when writing to io.Writer
- Fix for-select with single case to use for-range
- Handle cobra MarkFlagRequired returns
All tests pass. Code compiles without errors.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements per-agent backoff tracking to reduce noise for busy agents:
- AgentBackoff type tracks interval, miss count, and last activity
- BackoffManager manages state across all agents
- Geometric backoff strategy (1.5x factor, 10min cap)
- Integrates with keepalive to skip pokes when agents are fresh
- Resets backoff immediately when activity detected
Closes gt-8bx
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements the town daemon (gt-99m) that handles:
- Periodic heartbeat to poke Mayor and Witnesses
- Lifecycle request processing (cycle, restart, shutdown)
- Session management for agent restarts
Commands:
- gt daemon start: Start daemon in background
- gt daemon stop: Stop running daemon
- gt daemon status: Show daemon status and stats
- gt daemon logs: View daemon log file
The daemon is a "dumb scheduler" - all intelligence remains in agents.
It simply pokes them on schedule and executes lifecycle requests.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>