When Claude sessions were terminated using KillSession(), bash subprocesses
spawned by Claude's Bash tool could survive because they ignore SIGHUP.
This caused zombie processes to accumulate over time.
Changed all critical session termination paths to use KillSessionWithProcesses()
which explicitly kills all descendant processes before terminating the session.
Fixes: gt-ew3tk
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Unlike cleanup-orphans (which uses TTY="?" detection), zombie-scan uses
tmux verification: it checks if each Claude process is in an active
tmux session by comparing against actual pane PIDs.
A process is a zombie if:
- It's a Claude/codex process
- It's NOT the pane PID of any active tmux session
- It's NOT a child of any pane PID
- It's older than 60 seconds
Also refactors:
- getChildPIDs() with ps fallback when pgrep unavailable
- State file handling with file locking for concurrent access
Usage:
gt deacon zombie-scan # Find and kill zombies
gt deacon zombie-scan --dry-run # Just list zombies
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Polish help text across all agent commands to clarify roles:
- crew: persistent workspaces vs ephemeral polecats
- deacon: town-level watchdog receiving heartbeats
- dog: cross-rig infrastructure workers (cats vs dogs)
- mayor: Chief of Staff for cross-rig coordination
- nudge: universal synchronous messaging API
- polecat: ephemeral one-task workers, self-cleaning
- refinery: merge queue serializer per rig
- witness: per-rig polecat health monitor
Add comprehensive gt nudge documentation to crew template explaining
when to use nudge vs mail, common patterns, and target shortcuts.
Add orphan-process-cleanup step to deacon patrol formula to clean up
claude subagent processes that fail to exit (TTY = "?").
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* feat: Add automatic orphaned claude process cleanup
Claude Code's Task tool spawns subagent processes that sometimes don't clean up
properly after completion. These accumulate and consume significant memory
(observed: 17 processes using ~6GB RAM).
This change adds automatic cleanup in two places:
1. **Deacon patrol** (primary): New patrol step "orphan-process-cleanup" runs
`gt deacon cleanup-orphans` early in each cycle. More responsive (~30s).
2. **Daemon heartbeat** (fallback): Runs cleanup every 3 minutes as safety net
when deacon is down.
Detection uses TTY column - processes with TTY "?" have no controlling terminal.
This is safe because:
- Processes in terminals (user sessions) have a TTY like "pts/0" - untouched
- Only kills processes with no controlling terminal
- Orphaned subagents are children of tmux server with no TTY
New files:
- internal/util/orphan.go: FindOrphanedClaudeProcesses, CleanupOrphanedClaudeProcesses
- internal/util/orphan_test.go: Tests for orphan detection
New command:
- `gt deacon cleanup-orphans`: Manual/patrol-triggered cleanup
Fixes#587
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(orphan): add Windows build tag and minimum age check
Addresses review feedback on PR #588:
1. Add //go:build !windows to orphan.go and orphan_test.go
- The code uses Unix-specific syscalls (SIGTERM, ESRCH) and
ps command options that don't exist on Windows
2. Add minimum age check (60 seconds) to prevent false positives
- Prevents race conditions with newly spawned subagents
- Addresses reviewer concern about cron/systemd processes
- Uses portable etime format instead of Linux-only etimes
3. Add parseEtime helper with comprehensive tests
- Parses [[DD-]HH:]MM:SS format (works on both Linux and macOS)
- etimes (seconds) is Linux-specific, etime is portable
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(orphan): add proper SIGTERM→SIGKILL escalation with state tracking
Previous approach used process age which doesn't work: a Task subagent
runs without TTY from birth, so a long-running legitimate subagent that
later fails to exit would be immediately SIGKILLed without trying SIGTERM.
New approach uses a state file to track signal history:
1. First encounter → SIGTERM, record PID + timestamp in state file
2. Next cycle (after 60s grace period) → if still alive, SIGKILL
3. Next cycle → if survived SIGKILL, log as unkillable and remove
State file: $XDG_RUNTIME_DIR/gastown-orphan-state (or /tmp/)
Format: "<pid> <signal> <unix_timestamp>" per line
The state file is automatically cleaned up:
- Dead processes removed on load
- Unkillable processes removed after logging
Also updates callers to use new CleanupResult type which includes
the signal sent (SIGTERM, SIGKILL, or UNKILLABLE).
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* fix(sling_test): update test for cook dir change
The cook command no longer needs database context and runs from cwd,
not the target rig directory. Update test to match this behavior
change from bd2a5ab5.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(tests): skip tests requiring missing binaries, handle --allow-stale
- Add skipIfAgentBinaryMissing helper to skip tests when codex/gemini
binaries aren't available in the test environment
- Update rig manager test stub to handle --allow-stale flag
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* refactor(config): remove BEADS_DIR from agent environment
Stop exporting BEADS_DIR in AgentEnv - agents should use beads redirect
mechanism instead of relying on environment variable. This prevents
prefix mismatches when agents operate across different beads databases.
Changes:
- Remove BeadsDir field from AgentEnvConfig
- Remove BEADS_DIR from env vars set on agent sessions
- Update doctor env_check to not expect BEADS_DIR
- Update all manager Start() calls to not pass BeadsDir
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(doctor): detect BEADS_DIR in tmux session environment
Add a doctor check that warns when BEADS_DIR is set in any Gas Town
tmux session. BEADS_DIR in the environment overrides prefix-based
routing and breaks multi-rig lookups - agents should use the beads
redirect mechanism instead.
The check:
- Iterates over all Gas Town tmux sessions (gt-* and hq-*)
- Checks if BEADS_DIR is set in the session environment
- Returns a warning with fix hint to restart sessions
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: julianknutsen <julianknutsen@users.noreply.github>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Migrate witness, boot, and deacon spawns to use NewSessionWithCommand
instead of NewSession+SendKeys to ensure BD_ACTOR is visible in the
process tree for orphan detection via ps.
Refs: gt-emi5b
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Create centralized AgentEnv function as single source of truth for all
agent environment variables. All agents now consistently receive:
- GT_ROLE, BD_ACTOR, GIT_AUTHOR_NAME (role identity)
- GT_ROOT, BEADS_DIR (workspace paths)
- GT_RIG, GT_POLECAT/GT_CREW (rig-specific identity)
- BEADS_AGENT_NAME, BEADS_NO_DAEMON (beads config)
- CLAUDE_CONFIG_DIR (optional account selection)
Remove RoleEnvVars in favor of AgentEnvSimple wrapper.
Remove IncludeBeadsEnv flag - beads env vars always included.
Update all manager and cmd call sites to use AgentEnv.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Resolved conflict in internal/witness/manager.go:
- Kept session import (used by PR code)
- Kept PR's more accurate comment for PID check
- Removed duplicate sessionName method introduced by merge
Add `gt deacon stale-hooks` command to find and unhook stale beads.
Problem: Beads can get stuck in 'hooked' status when agents die or
abandon work without properly unhooking.
Solution:
- New command scans for hooked beads older than threshold (default 1h)
- Checks if assignee agent is still alive (tmux session exists)
- Unhooks beads with dead agents (sets status back to 'open')
- Supports --dry-run to preview without making changes
Also adds "stale-hook-check" step to Deacon patrol formula.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Settings creation was scattered across multiple places (createPatrolHooks,
ensurePatrolHooks, inline code). Now unified via claude.EnsureSettingsForRole().
Changes:
- Add "deacon" to autonomous roles in claude/settings.go
- Remove ensurePatrolHooks() from cmd/deacon.go, use EnsureSettingsForRole
- Remove createPatrolHooks() from rig/manager.go (no longer needed at rig add)
- Add EnsureSettingsForRole call in crew_lifecycle.go
- Add doctor check for stale/missing Claude settings files
- Wire up claude-settings check in cmd/doctor.go
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The transient polecat model says: "Polecats exist only while working."
This removes the deprecated StateIdle and updates the codebase:
- Remove StateIdle from polecat/types.go (keep StateActive for legacy data)
- Update manager.go: Get() returns StateDone (not StateIdle) when no work
- Update manager.go: Add/Recreate return StateWorking (not StateIdle)
- Remove zombie scan logic from deacon.go (no idle polecats to scan for)
- Update tests to reflect new behavior
The correct lifecycle is now:
- Spawn: polecat created with work (StateWorking)
- Work: sessions cycle, sandbox persists
- Done: polecat signals completion (StateDone)
- Nuke: Witness destroys sandbox
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implement two-level beads architecture for agent lookups:
- Town-level agents (Mayor, Deacon) now use hq- prefix and are
looked up in town beads (~/.beads/)
- Rig-level agents continue using rig prefix (e.g., gt-) and are
looked up in rig beads
Changes:
- Add MayorBeadIDTown(), DeaconBeadIDTown(), DogBeadIDTown() helpers
- Add GetTownBeadsPath() for town beads path resolution
- Update gt status to pre-fetch town-level agent beads
- Update agentIDToBeadID() to use town-level IDs
- Update agent_beads_check.go to check/fix in correct tier
- Update agentAddressToIDs() in deacon.go
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Reverts the session naming changes from PR #70. Multi-town support
on a single machine is not a real use case - rigs provide project
isolation, and true isolation should use containers/VMs.
Changes:
- MayorSessionName() and DeaconSessionName() no longer take townName parameter
- ParseSessionName() handles simple gt-mayor and gt-deacon formats
- Removed Town field from AgentIdentity and AgentSession structs
- Updated all callers and tests
Generated with Claude Code
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Session names `gt-mayor` and `gt-deacon` were hardcoded, causing tmux
session name collisions when running multiple towns simultaneously.
Changed to `gt-{town}-mayor` and `gt-{town}-deacon` format (e.g.,
`gt-ai-mayor`) to allow concurrent multi-town operation.
Key changes:
- session.MayorSessionName() and DeaconSessionName() now take townName param
- Added workspace.GetTownName() helper to load town name from config
- Updated all callers in cmd/, daemon/, doctor/, mail/, rig/, templates/
- Updated tests with new session name format
- Bead IDs remain unchanged (already scoped by .beads/ directory)
Fixes#60🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
PropulsionNudgeForRole now accepts a workDir parameter and reads
session ID from .runtime/session_id to append [session:xxx] to the
nudge message. This enables Claude Code's /resume picker to discover
Gas Town sessions.
(gt-u49zh)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added 'spawn' as an alias for 'start' in the following commands:
- gt witness start → also gt witness spawn
- gt refinery start → also gt refinery spawn
- gt deacon start → also gt deacon spawn
- gt crew start → also gt crew spawn
This improves discoverability since agents are guessing 'spawn'
when trying to start roles.
Note: gt polecat does not have a 'start' command - polecat spawning
is handled via 'gt sling'.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
GUPP (Gas Town Universal Propulsion Principle) is the propulsion nudge sent
after beacon to trigger autonomous work execution. Previously only polecats
received this nudge.
Now all roles get role-specific propulsion nudges on startup:
- Polecat/Crew: "Run `gt hook` to check your hook and begin work."
- Witness: "Run `gt prime` to check patrol status and begin work."
- Refinery: "Run `gt prime` to check MQ status and begin patrol."
- Deacon: "Run `gt prime` to check patrol status and begin heartbeat cycle."
- Mayor: "Run `gt prime` to check mail and begin coordination."
Changes:
- internal/session/names.go: Add PropulsionNudgeForRole() function
- internal/cmd/witness.go: Add GUPP nudge to ensureWitnessSession
- internal/cmd/start.go: Add GUPP nudge to ensureRefinerySession (also
converted from respawn loop to direct Claude launch like other roles)
- internal/cmd/deacon.go: Add GUPP nudge to startDeaconSession
- internal/cmd/mayor.go: Add GUPP nudge to startMayorSession
Fixes: gt-zzpmt
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add gt deacon zombie-scan command that provides defense-in-depth
against Witness cleanup failures. The command:
- Scans all rigs for polecats that are idle, have no session running,
no hooked work, and are stale (>10 min inactive)
- Reports found zombies with details
- Optionally nukes them and notifies the mayor
Flags:
--dry-run Preview only
--threshold Custom staleness threshold (default 10m)
--nuke=false Report only, do not clean up
Also adds zombie-scan step to mol-deacon-patrol formula.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Parent commands (mol, mail, crew, polecat, etc.) previously showed help
and exited 0 for unknown subcommands like "gt mol foobar". This masked
errors in scripts and confused users.
Added requireSubcommand() helper to root.go and applied it to all parent
commands. Now unknown subcommands properly error with exit code 1.
Example before: gt mol unhook → shows help, exits 0
Example after: gt mol unhook → "Error: unknown command "unhook"", exits 1
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Inject an identity beacon as the first message when Gas Town starts Claude
sessions. This beacon becomes the session title in Claude Code '/resume
picker, enabling workers to find their predecessor sessions for debugging.
Beacon format: [GAS TOWN] <address> • <mol-id or "ready"> • <timestamp>
Examples:
- [GAS TOWN] gastown/crew/max • gt-abc12 • 2025-12-30T14:32
- [GAS TOWN] gastown/polecats/Toast • ready • 2025-12-30T09:15
- [GAS TOWN] deacon • patrol • 2025-12-30T08:00
Workers can now press / in /resume picker and search for their address
(e.g., "gastown/crew/max") to see all predecessor sessions.
Note: Respawn-loop agents (deacon/refinery via up.go) skip beacon injection
since Claude restarts multiple times - would need post-restart injection.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add special handling for 'deacon' target in gt nudge command
- Maps 'deacon' to gt-deacon session, gracefully handles if not running
- Add gt nudge deacon session-started to SessionStart hooks
- Updated settings-autonomous.json, settings-interactive.json, and
ensurePatrolHooks() template
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add PrintWarning helper in internal/style/style.go
- Update 35 warning message outputs across 16 files to use consistent format
- All warnings now display as "⚠ Warning: <message>" in yellow/bold
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Extended the unified cycle system to include rig infrastructure sessions:
- Witness ↔ Refinery (per rig) now cycle with C-b n/p
Also moved SetCycleBindings into ConfigureGasTownSession so ALL Gas Town
sessions automatically get the unified cycle bindings. Removed redundant
individual calls from crew, mayor, and deacon startup code.
Cycle groups are now:
- Town: Mayor ↔ Deacon
- Crew (per rig): All crew members in same rig
- Infra (per rig): Witness ↔ Refinery
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Create gt town next/prev commands for cycling between town-level sessions
- Add SetTownCycleBindings() to tmux package
- Wire up bindings when starting mayor and deacon sessions
Now mayor and deacon have the same C-b n/p cycling behavior as crew workers.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
gt rig add now creates .claude/settings.json with patrol hooks for:
- witness/ directory (SessionStart, PreCompact, UserPromptSubmit hooks)
- refinery/ directory (same hooks)
gt deacon start also creates hooks if not present (idempotent).
These hooks run `gt prime && gt mail check --inject` on session start,
enabling autonomous patrol execution when daemon sends heartbeats.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
All 156 instances of _ = error suppression in non-test code now have
explanatory comments documenting why the error is intentionally ignored.
Categories of intentional suppressions:
- non-fatal: session works without these - tmux environment setup
- non-fatal: theming failure does not affect operation - visual styling
- best-effort cleanup - defer cleanup on failure paths
- best-effort notification - mail/notifications that should not block
- best-effort interrupt - graceful shutdown attempts
- crypto/rand.Read only fails on broken system - random ID generation
- output errors non-actionable - fmt.Fprint to io.Writer
This addresses the silent failure and debugging concerns raised in the
issue by making the intentionality explicit in the code.
Generated with Claude Code https://claude.com/claude-code
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Moved pending.go from deacon/ to polecat/ package
- Changed storage location from deacon/pending.json to spawn/pending.json
- Added 'gt spawn pending' subcommand for listing/clearing pending spawns
- Deprecated 'gt deacon pending' (still works, shows deprecation notice)
This decouples pending spawn observation from the Deacon role - anyone
can now observe pending polecats (Mayor, humans, debugging, etc.).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When gt spawns agents (polecats, crew, patrol roles), it now sets the
BD_ACTOR env var so that bd commands (like `bd hook`) know the agent
identity without coupling to gt.
Updated spawn points:
- gt up (mayor, deacon, witness via ensureSession/ensureWitness)
- gt deacon start
- gt witness start
- gt start refinery
- gt mayor start
- Daemon deacon restart
- Daemon lifecycle restart
- Handoff respawn
- Refinery manager start
BD_ACTOR uses slash format (e.g., gastown/witness, gastown/crew/max)
while GT_ROLE may use dash format internally.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements auto-triggering of polecats after spawn:
- New pending.go in deacon package tracks pending spawns
- CheckInboxForSpawns reads POLECAT_STARTED messages
- TriggerPendingSpawns polls WaitForClaudeReady and nudges when ready
- PruneStalePending removes spawns older than 5 minutes
The Deacon can now call `gt deacon trigger-pending` during patrol to
automatically send "Begin." to newly spawned polecats once Claude is ready.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Organize 43 commands into 7 logical groups using cobra's built-in
AddGroup/GroupID feature:
- Work Management: spawn, sling, hook, handoff, done, mol, mq, etc.
- Agent Management: mayor, witness, refinery, deacon, polecat, etc.
- Communication: mail, nudge, broadcast, peek
- Services: daemon, start, stop, up, down, shutdown
- Workspace: rig, crew, init, install, git-init, namepool
- Configuration: account, theme, hooks, issue, completion
- Diagnostics: status, doctor, prime, version, help
Also renamed molecule to mol as the primary command name
(molecule is now an alias).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Shell loops bypassed the proper lifecycle architecture. Now:
- Witness: Launches Claude directly, daemon/deacon health-scan handles restart
- Deacon: Launches Claude directly, daemon detects exit and restarts
- Daemon: ensureDeaconRunning() now checks if Claude is running (pane cmd)
and restarts it if session exists but Claude has exited
- gt deacon restart: Now does stop+start instead of Ctrl-C
This enforces proper lifecycle flow through LIFECYCLE mail and daemon
heartbeat rather than bypassing it with shell loops.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The deacon was being detected as Mayor because it started in ~/gt (town root).
Now starts in ~/gt/deacon/ so gt prime correctly detects RoleDeacon.
Also ensures deacon directory exists on start.
When 'gt X attach' is run from inside a tmux session, link the target
session's window as a new tab instead of switching sessions entirely.
Use C-b n/p to navigate between tabs.
Outside tmux: unchanged behavior (full attach)
Inside tmux: links window as tab for convenient multi-session viewing
- Add tmux.LinkWindow() and tmux.IsInsideTmux()
- Update attachToTmuxSession() with smart detection
- Update mayor, deacon, crew, refinery attach commands
Robustness improvements for the Deacon:
- Add DeaconTheme (purple/silver ecclesiastical theme)
- Apply theme to deacon sessions like mayor/witness
- Daemon now auto-starts deacon if not running
- Daemon pokes deacon instead of directly poking mayor/witnesses
- Deacon is responsible for monitoring mayor and witnesses
The daemon is a "dumb scheduler" that keeps the deacon alive.
The deacon (Claude agent) has the intelligence to understand
context and take remedial action when agents are unhealthy.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add CLI commands for managing the Deacon session, following the same
pattern as the Mayor commands:
- gt deacon start: Start the Deacon tmux session
- gt deacon stop: Stop the session with graceful shutdown
- gt deacon status: Check if session is running
- gt deacon attach: Attach to session (auto-starts if needed)
- gt deacon restart: Restart Claude within the session
The Deacon is the hierarchical health-check orchestrator that monitors
Mayor and Witnesses, handles lifecycle requests, and keeps Gas Town running.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>