Per docs/architecture.md, Witness and Refinery are rig-level agents that
should use the rig's configured prefix (e.g., pi- for pixelforge) instead
of hardcoded "gt-".
This extends PR #183's creation fix to also fix all lookup paths:
- internal/rig/manager.go: Create agent beads in rig beads with rig prefix
- internal/daemon/daemon.go: Use rig prefix when looking up agent state
- internal/daemon/lifecycle.go: Use rig prefix for identity-to-bead mapping
- internal/cmd/sling.go: Pass townRoot for prefix lookup
- internal/cmd/unsling.go: Pass townRoot for prefix lookup
- internal/cmd/molecule_status.go: Use rig prefix for agent bead lookups
- internal/cmd/molecule_attach.go: Use rig prefix for agent bead lookups
- internal/config/loader.go: Add GetRigPrefix helper
Without this fix, the daemon would:
- Create pi-gastown-witness but look for gt-gastown-witness
- Report agents as missing/dead when they are running
- Fail to manage agent lifecycle correctly
Based on work by Johann Taberlet in PR #183.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Johann Taberlet <johann.taberlet@gmail.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace Unix-only syscall.Flock with gofrs/flock library for
cross-platform file locking. This enables the daemon to run on
Windows in addition to Unix-like systems.
- Add github.com/gofrs/flock v0.13.0 dependency
- Replace syscall.Flock calls with flock.TryLock/Unlock
- Maintain same non-blocking exclusive lock semantics
(gt-5354h)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add syscall.Flock() exclusive lock in daemon.Run() to prevent TOCTOU
race condition where concurrent 'gt daemon start' commands could spawn
multiple daemons. Only the first to acquire the lock succeeds; others
exit cleanly. Lock is per-town (in townRoot/daemon/daemon.lock) so
multiple GT instances from different directories work independently.
Also detect race losers in runDaemonStart() by comparing spawned PID
with PID file, reporting 'already running' instead of false success.
Reverts the session naming changes from PR #70. Multi-town support
on a single machine is not a real use case - rigs provide project
isolation, and true isolation should use containers/VMs.
Changes:
- MayorSessionName() and DeaconSessionName() no longer take townName parameter
- ParseSessionName() handles simple gt-mayor and gt-deacon formats
- Removed Town field from AgentIdentity and AgentSession structs
- Updated all callers and tests
Generated with Claude Code
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add Refinery monitoring to daemon heartbeat (ensureRefineriesRunning)
- Add circuit breaker: agents marked dead by checkStaleAgents() are
now force-killed and restarted, even if Claude process is alive
- Fixes zombie Claude sessions that werent updating their bead state
The circuit breaker flow:
1. Agent gets stuck (stops updating bead state)
2. After 15 minutes: checkStaleAgents() marks bead as dead
3. On next heartbeat: ensure*Running() sees state=dead
4. Force-kill session and restart fresh
Generated with Claude Code
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Session names `gt-mayor` and `gt-deacon` were hardcoded, causing tmux
session name collisions when running multiple towns simultaneously.
Changed to `gt-{town}-mayor` and `gt-{town}-deacon` format (e.g.,
`gt-ai-mayor`) to allow concurrent multi-town operation.
Key changes:
- session.MayorSessionName() and DeaconSessionName() now take townName param
- Added workspace.GetTownName() helper to load town name from config
- Updated all callers in cmd/, daemon/, doctor/, mail/, rig/, templates/
- Updated tests with new session name format
- Bead IDs remain unchanged (already scoped by .beads/ directory)
Fixes#60🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When the daemon checks if Deacon/Witness is running, it first checks the
agent bead state. If this check fails (bead not found, JSON parse error,
or stale state), it would previously attempt to restart the session -
even if the tmux session was perfectly healthy.
This caused "session already exists" errors when:
1. Agent bead state couldn't be read (prefix mismatch, missing bead)
2. But the tmux session was actually running with Claude active
Fix: Add a tmux session health check as fallback before attempting restart.
If the session exists AND Claude is running in it, skip the restart and
log that we're preserving the healthy session despite stale bead state.
This maintains ZFC compliance (still trusts agent bead as primary source)
while adding a defensive check to prevent unnecessary session kills.
Fixes#63🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
The daemon was failing to restart agents when zombie tmux sessions existed
(session alive but Claude dead). Added EnsureSessionFresh() helper to
tmux package that:
- Checks if session exists
- If exists but Claude not running (zombie), kills the session
- Creates fresh session
Updated all daemon session creation points to use EnsureSessionFresh:
- ensureDeaconRunning()
- ensureWitnessRunning()
- restartPolecatSession()
- restartSession() in lifecycle.go
Added tests for the new helper function. (gt-j1i0r)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When Claude starts with --dangerously-skip-permissions, it shows a warning
dialog requiring Down+Enter to accept. This blocked automated polecat and
agent startup.
Added AcceptBypassPermissionsWarning() to tmux package that:
- Checks if the warning dialog is present by capturing pane content
- Only sends Down+Enter if "Bypass Permissions mode" text is found
- Avoids interfering with sessions that don't show the warning
Updated all Claude startup locations:
- session/manager.go (polecat sessions)
- cmd/up.go (mayor, witness, crew, polecat cold starts)
- daemon/daemon.go (crashed polecat restarts)
- daemon/lifecycle.go (role session starts)
The 10-minute recovery heartbeat was too slow to detect stuck agents promptly.
Reduced to 3 minutes for better balance between detection speed and overhead.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add heartbeat checking to Boot degraded triage: detects stale Deacon
heartbeat (>15min nudges, >30min restarts)
- Add checkDeaconHeartbeat to daemon heartbeat cycle as fallback
- This ensures the Deacon is monitored continuously, not just at startup
The mol-deacon-patrol formula was also updated separately to use
gt deacon health-check instead of ephemeral context memory tracking.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove TouchTownActivity() calls from root.go PersistentPreRun hook
- Remove ReadTownActivity() and activity-based backoff from daemon.go
- Delete TouchTownActivity/ReadTownActivity functions from keepalive.go
- Replace dynamic backoff with fixed 10-min recovery heartbeat
Normal wake is now handled by feed subscription (bd activity --follow).
The daemon is a safety net for dead sessions, GUPP violations, and orphaned work.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Boot is a watchdog that the daemon pokes instead of Deacon directly,
centralizing the 'when to wake Deacon' decision in an agent that can
reason about context.
Key changes:
- Add internal/boot package with marker file and status tracking
- Add gt boot commands: status, spawn, triage
- Add mol-boot-triage formula for Boot's triage cycle
- Modify daemon to call ensureBootRunning instead of ensureDeaconRunning
- Add tmux.IsAvailable() for degraded mode detection
- Add boot.md.tmpl role template
Boot lifecycle:
1. Daemon tick spawns Boot (fresh each time)
2. Boot runs triage: observe, decide, act
3. Boot cleans stale handoffs from Deacon inbox
4. Boot exits (or handoffs in non-degraded mode)
In degraded mode (no tmux), Boot runs mechanical triage directly.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements the feed daemon goroutine that:
- Tails ~/gt/.events.jsonl for raw events
- Filters by visibility tag (keeps feed/both, drops audit)
- Deduplicates repeated done events from same actor
- Aggregates sling events when multiple occur in window
- Writes curated events to ~/gt/.feed.jsonl with summaries
The curator runs as a goroutine in the gt daemon, starting when
the daemon starts and stopping gracefully on shutdown.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This fix addresses the issue where polecat sessions terminate unexpectedly
during work without recovery:
Changes:
- Add `checkPolecatSessionHealth()` to daemon heartbeat loop
- Proactively validates tmux sessions are alive for polecats
- Detects crashed polecats that have work-on-hook
- Auto-restarts crashed polecats with proper environment setup
- Notifies Witness if restart fails as fallback
- Add polecat support to lifecycle identity mapping
- `identityToSession()` now handles polecat identities
- `restartSession()` can restart crashed polecat sessions
- `identityToStateFile()` handles polecat state files
- `identityToAgentBeadID()` handles polecat agent beads
- `identityToBDActor()` handles polecat BD_ACTOR conversion
- Add `gt session check` command for manual health checking
- Validates tmux sessions exist for all polecats
- Shows summary of healthy vs not-running sessions
- Useful for debugging session issues
This provides faster recovery (within heartbeat interval) compared to
waiting for GUPP violation timeout (30 min) or Witness detection.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Change daemon from wake-focused to recovery-focused:
Before: Daemon pokes agents every 5-60min as primary wake
After: Daemon only checks for edge cases that feed-wake cannot handle
Recovery checks:
- Dead sessions that need restart (ensureDeaconRunning, ensureWitnessesRunning)
- Stale agents that crashed without updating state (checkStaleAgents)
- GUPP violations: agents with work-on-hook not progressing (checkGUPPViolations)
- Orphaned work: work assigned to dead agents (checkOrphanedWork)
Removed:
- pokeDeacon() - no longer sending HEARTBEAT messages
- pokeWitness()/pokeWitnesses() - no longer sending HEARTBEAT messages
- MOTD message arrays - only used by poke functions
Normal agent wake is now handled by feed subscription (bd activity --follow).
The daemon is the safety net for edge cases, not the primary propulsion.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements canonical naming convention for agent bead IDs:
- Town-level: gt-mayor, gt-deacon (unchanged)
- Rig-level: gt-<rig>-witness, gt-<rig>-refinery (was gt-witness-<rig>)
- Named: gt-<rig>-crew-<name>, gt-<rig>-polecat-<name> (was gt-crew-<rig>-<name>)
Changes:
- Added AgentBeadID helper functions to internal/beads/beads.go
- Updated all ID generation call sites to use helpers
- Fixed session parsing in theme.go, statusline.go, agents.go
- Updated doctor check and fix to use canonical format
- Updated tests for new format
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Strip (gt-xxx), (bd-xxx) suffixes from code comments. The descriptions
remain meaningful without the ephemeral issue IDs.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds witness patrol to daemon heartbeat cycle:
- ensureWitnessesRunning(): Start witness for each rig if not running
- pokeWitnesses(): Send heartbeat messages to all witnesses
- getKnownRigs(): Read rig list from mayor/rigs.json
The daemon now ensures both Deacon and Witnesses are running
and sends periodic heartbeats to trigger their patrol molecules.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add checkStaleAgents() to detect agents reporting "running" but not updating
- Add markAgentDead() to update agent bead state to "dead"
- Integrate stale agent check into heartbeat cycle
- DeadAgentTimeout set to 15 minutes
This is a safety mechanism for agents that crash without updating their state.
The daemon now marks them as dead so they can be restarted.
Also fixes duplicate AgentFields declaration - now uses beads.go version with
ParseAgentFieldsFromDescription alias in fields.go.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add getAgentBeadState() and getAgentBeadInfo() to read agent state from beads
- Add identityToAgentBeadID() to map daemon identities to agent bead IDs
- Update ensureDeaconRunning() to check agent bead state first (ZFC compliant)
- Add agent bead state logging in executeLifecycleAction()
This is the first step toward ZFC-compliant state detection. Dependent tasks:
- gt-psuw7: Remove PID/tmux state inference
- gt-2hzl4: Add timeout fallback for dead agents
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix crew next/prev: Pass session name via key binding to avoid run-shell context issue
- Add TouchTownActivity() for town-level activity signaling
- Implement daemon exponential backoff based on activity.json:
- 0-5 min idle → 5 min heartbeat
- 5-15 min idle → 10 min heartbeat
- 15-45 min idle → 30 min heartbeat
- 45+ min idle → 60 min heartbeat (max)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
All 156 instances of _ = error suppression in non-test code now have
explanatory comments documenting why the error is intentionally ignored.
Categories of intentional suppressions:
- non-fatal: session works without these - tmux environment setup
- non-fatal: theming failure does not affect operation - visual styling
- best-effort cleanup - defer cleanup on failure paths
- best-effort notification - mail/notifications that should not block
- best-effort interrupt - graceful shutdown attempts
- crypto/rand.Read only fails on broken system - random ID generation
- output errors non-actionable - fmt.Fprint to io.Writer
This addresses the silent failure and debugging concerns raised in the
issue by making the intentionality explicit in the code.
Generated with Claude Code https://claude.com/claude-code
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The daemon now calls triggerPendingSpawns() on each heartbeat, which:
- Checks for POLECAT_STARTED messages in Deacon inbox
- Polls pending spawns with WaitForClaudeReady (2s timeout)
- Nudges ready polecats with Begin message
- Prunes stale pending spawns (>5 min old)
This ensures polecats get triggered even when Deacon is not in an
active patrol cycle. Uses bootstrap mode regex detection which
is acceptable for daemon operations.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When gt spawns agents (polecats, crew, patrol roles), it now sets the
BD_ACTOR env var so that bd commands (like `bd hook`) know the agent
identity without coupling to gt.
Updated spawn points:
- gt up (mayor, deacon, witness via ensureSession/ensureWitness)
- gt deacon start
- gt witness start
- gt start refinery
- gt mayor start
- Daemon deacon restart
- Daemon lifecycle restart
- Handoff respawn
- Refinery manager start
BD_ACTOR uses slash format (e.g., gastown/witness, gastown/crew/max)
while GT_ROLE may use dash format internally.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove Go code that makes workflow decisions. All health checking,
staleness detection, nudging, and escalation belongs in the Deacon
molecule where Claude executes it.
Removed:
- internal/daemon/backoff.go (190 lines) - exponential backoff decisions
- internal/doctor/stale_check.go (284 lines) - staleness detection
- IsFresh/IsStale/IsVeryStale from keepalive.go
- pokeMayor, pokeWitnesses, pokeWitness from daemon.go
- Heartbeat staleness classification from pokeDeacon
Changed:
- Lifecycle parsing now uses structured body (JSON or simple text)
instead of keyword matching on subject line
- Daemon now only ensures Deacon is running and sends simple heartbeats
- No backoff, no staleness classification, no decision-making
Total: ~800 lines removed from Go code
The Deacon molecule will handle all health checking, nudging, and
escalation. Go is now just a message router.
See gt-gaxo epic for full rationale.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The daemon was creating the deacon session in ~/gt (town root) which loaded
the Mayor CLAUDE.md instead of the Deacon CLAUDE.md. Also, tmux SetEnvironment
does not export variables to spawned processes.
Changes:
- Create deacon session in ~/gt/deacon (correct CLAUDE.md)
- Export GT_ROLE=deacon in launch command (process inherits env)
- Apply to both initial launch and restart paths
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Shell loops bypassed the proper lifecycle architecture. Now:
- Witness: Launches Claude directly, daemon/deacon health-scan handles restart
- Deacon: Launches Claude directly, daemon detects exit and restarts
- Daemon: ensureDeaconRunning() now checks if Claude is running (pane cmd)
and restarts it if session exists but Claude has exited
- gt deacon restart: Now does stop+start instead of Ctrl-C
This enforces proper lifecycle flow through LIFECYCLE mail and daemon
heartbeat rather than bypassing it with shell loops.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- .beads-ephemeral/ -> .beads-wisp/
- Rename doctor checks: EphemeralCheck -> WispCheck
- Update all docs to use 'transient' for polecats, 'wisp' for molecules
- Preserve 'ephemeral' only as descriptive adjective for wisps
- Steam engine metaphor: wisps are steam vapors that dissipate
Part of Christmas launch wisp terminology unification.
The deacon patrol role is thankless - it gets a nudge every minute.
This adds rotating motivational and educational tips to make it more fun.
Messages include gratitude, encouragement, and tips about Gas Town.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- daemon.go: Add SIGUSR1 handler to process lifecycle requests immediately
- handoff.go: Signal daemon after sending lifecycle mail to deacon/
- mail.go: Show message IDs in hook output for direct reading
Previously, gt handoff would wait up to 5 min for daemon heartbeat poll.
Now lifecycle requests are processed within milliseconds.
Closes gt-zut3
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implement replaceable notifications to prevent heartbeat stacking when
agents are busy. Only the latest notification per slot is delivered.
Changes:
- Add NotificationManager for tracking pending notifications
- Add SendKeysReplace() that clears input line before sending
- Integrate slot tracking into daemon heartbeat pokes
- Mark notifications consumed when agent shows activity
The system tracks pending notifications in state files and skips
sending if a notification for the same slot is still pending.
When agent activity is detected (keepalive), slots are marked
consumed allowing new notifications to be sent.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Robustness improvements for the Deacon:
- Add DeaconTheme (purple/silver ecclesiastical theme)
- Apply theme to deacon sessions like mayor/witness
- Daemon now auto-starts deacon if not running
- Daemon pokes deacon instead of directly poking mayor/witnesses
- Deacon is responsible for monitoring mayor and witnesses
The daemon is a "dumb scheduler" that keeps the deacon alive.
The deacon (Claude agent) has the intelligence to understand
context and take remedial action when agents are unhealthy.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace tmux session scanning with proper rig discovery:
- Primary: Load rigs from mayor/rigs.json via rig manager
- Fallback: Scan town directory for .beads or config.json
- Handle rigs without running witnesses gracefully
Closes gt-oc2
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fix ~50 errcheck warnings across the codebase:
- Add explicit `_ =` for intentionally ignored error returns (cleanup,
best-effort operations, etc.)
- Use `defer func() { _ = ... }()` pattern for defer statements
- Handle tmux SetEnvironment, KillSession, SendKeysRaw returns
- Handle mail router.Send returns
- Handle os.RemoveAll, os.Rename in cleanup paths
- Handle rand.Read returns for ID generation
- Handle fmt.Fprint* returns when writing to io.Writer
- Fix for-select with single case to use for-range
- Handle cobra MarkFlagRequired returns
All tests pass. Code compiles without errors.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements per-agent backoff tracking to reduce noise for busy agents:
- AgentBackoff type tracks interval, miss count, and last activity
- BackoffManager manages state across all agents
- Geometric backoff strategy (1.5x factor, 10min cap)
- Integrates with keepalive to skip pokes when agents are fresh
- Resets backoff immediately when activity detected
Closes gt-8bx
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements the town daemon (gt-99m) that handles:
- Periodic heartbeat to poke Mayor and Witnesses
- Lifecycle request processing (cycle, restart, shutdown)
- Session management for agent restarts
Commands:
- gt daemon start: Start daemon in background
- gt daemon stop: Stop running daemon
- gt daemon status: Show daemon status and stats
- gt daemon logs: View daemon log file
The daemon is a "dumb scheduler" - all intelligence remains in agents.
It simply pokes them on schedule and executes lifecycle requests.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>