KillPaneProcesses was being called on new sessions before respawn,
which killed the fresh shell and destroyed the pane. This caused
"can't find pane" errors on session creation.
Now KillPaneProcesses is only called when restarting in an existing
session where Claude/Node processes might be running and ignoring
SIGHUP. For new sessions, we just use respawn-pane directly.
Also added retry limit and error checking for the stale session
recovery path.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add 'bd' alias for 'gt bead' command
- Add 'work' alias for 'gt hook' command
- Show deacon icon in mayor status line when running
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When a session exists but its pane is gone (e.g., after account switch
or town reboot), 'gt crew at' now detects the "can't find pane" error
and automatically recreates the session instead of failing.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Allow reading messages by their inbox position (e.g., 'gt mail read 3')
in addition to message ID. The inbox display now shows 1-based index
numbers for easy reference.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds gt mail hook <mail-id> command that attaches a mail message to
the agents hook. This provides a more intuitive command path when
working with mail-based workflows.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Users naturally try --body for the message body content (same semantic
field as --message but more precise - distinguishes body from subject).
Added as an alias following the same pattern as --address/--identity.
Closes: gt-bn9mt
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Allow `gt mail delete` to accept multiple message IDs at once,
matching the existing behavior of archive, mark-read, and mark-unread.
Also adds --body as an alias for --message in mail reply.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
## Problem
Claude processes were accumulating as orphans, with 100+ processes piling up
daily. Every `gt handoff` (used dozens of times/hour by crew) left orphaned
processes because `tmux respawn-pane -k` only sends SIGHUP, which Node/Claude
ignores.
## Root Cause
Previous fixes (1043f00d, f89ac47f, 2feefd17, 1b036aad) were laser-focused on
specific symptoms (shutdown, setsid, done.go, molecule_step.go) but never did
a comprehensive audit of ALL RespawnPane call sites. handoff.go was never
fixed despite being the main source of orphans.
## Solution
Added KillPaneProcesses() call before every RespawnPane() in:
- handoff.go (self handoff and remote handoff)
- mayor.go (mayor restart)
- crew_at.go (new session and restart)
KillPaneProcesses explicitly kills all descendant processes with SIGTERM/SIGKILL
before respawning, preventing orphans regardless of SIGHUP handling.
molecule_step.go already had this fix from commit 1b036aad.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
fix(sling): auto-apply mol-polecat-work (#288) and fix wisp orphan lifecycle bug (#842)
Fixes the formula-on-bead pattern to hook the base bead instead of the wisp:
- Auto-apply mol-polecat-work when slinging bare beads to polecats
- Hook BASE bead with attached_molecule pointing to wisp
- gt done now closes attached molecule before closing hooked bead
- Convoys complete properly when work finishes
Fixes#288, #842, #858
resolveSelfTarget returns "mayor/" with trailing slash per addressToIdentity
normalization, but agentIDToBeadID only checked for "mayor" without slash.
This caused `gt hook --clear` to fail with:
Error: could not convert agent ID mayor/ to bead ID
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Problem:
- Gas Town sets GT_TOWN_ROOT environment variable
- Beads searches for formulas using GT_ROOT environment variable
- This naming inconsistency prevents beads from finding town-level formulas
- Result: `bd mol seed --patrol` fails in rigs, causing false doctor warnings
Solution:
Export both GT_TOWN_ROOT and GT_ROOT from `gt rig detect` command:
- Modified stdout output to export both variables (lines 66, 70)
- Updated cache storage format (lines 134, 136, 138)
- Updated unset statement for both variables (line 110)
- Updated command documentation (lines 33, 37)
Both variables point to the same town root path. This maintains backward
compatibility with Gas Town (GT_TOWN_ROOT) while enabling beads formula
search (GT_ROOT).
Testing:
- `gt rig detect .` now outputs both GT_TOWN_ROOT and GT_ROOT
- `bd mol seed --patrol` works correctly when GT_ROOT is set
- Formula search paths work as expected: town/.beads/formulas/ accessible
Related:
- Complements bd mol seed --patrol implementation (beads PR #1149)
- Complements patrol formula doctor check fix (gastown PR #715)
Co-authored-by: Roland Tritsch <roland@ailtir.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
When slinging work to an agent, updateAgentHookBead() was running
bd slot set from townRoot. But agent beads with rig-level prefixes
(e.g., go-) live in rig databases, not the town database. This caused
"issue not found" errors when trying to update the hook_bead slot.
Fix: Use beads.ResolveHookDir() to resolve the correct working directory
based on the agent bead's prefix before calling SetHookBead().
Co-authored-by: furiosa <spencer@atmosphere-aviation.com>
When the repo is in a broken state (wrong branch, detached HEAD, deleted
worktree), gt handoff would fail with "cannot detect town root" error.
This is exactly when handoff is most needed - to recover and hand off
to a fresh session.
Changes:
- detectTownRootFromCwd() now falls back to GT_TOWN_ROOT and GT_ROOT
environment variables when cwd-based detection fails
- buildRestartCommand() now propagates GT_ROOT to ensure subsequent
handoffs can also use the fallback
- Added tests for the fallback behavior
Fixes gt-x2q81.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add support for --comment flag as an alias for --reason in the
gt close command. This provides a more intuitive option name for
users who think of close messages as comments rather than reasons.
Handles both --comment value and --comment=value forms.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The rename operation was only copying AgentState and CleanupStatus,
missing HookBead (the primary fix), ActiveMR, and NotificationLevel.
This ensures all agent state is preserved when renaming an identity.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add hooks_registry.go: LoadRegistry(), HookRegistry/HookDefinition types
- Add hooks_install.go: gt hooks install command with --role and --all-rigs flags
- gt hooks list now reads from ~/gt/hooks/registry.toml
- Supports dry-run, deduplication, and creates .claude dirs as needed
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Allow `gt mail reply <id> "message"` in addition to `-m` flag.
This is a desire-path fix - agents naturally try positional syntax.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add ability to access sessions from other accounts when using gt seance --talk.
After gt account switch, sessions from previous accounts are now accessible
via temporary symlinks.
Changes:
- Search all account config directories in accounts.json for session
- Create temporary symlink from source account to current account project dir
- Update sessions-index.json with session entry (using json.RawMessage to preserve fields)
- Cleanup removes symlink and index entry when seance exits
- Add startup cleanup for orphaned symlinks from interrupted sessions
Based on PR #797 by joshuavial, with added orphan cleanup to handle ungraceful exits.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Previously, `gt done` would fail with "0 commits ahead; nothing to merge"
if work was pushed directly to main instead of via PR. This blocked
polecats from completing even when their work was done, causing them to
become zombies.
Now, if the branch has no commits ahead of main, `gt done` skips MR
creation but still completes successfully - notifying the witness,
cleaning up the worktree, and terminating the session.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The quick-add command (used by shell hook's "Add to Gas Town?" prompt)
previously only checked hardcoded paths ~/gt and ~/gastown, ignoring
GT_TOWN_ROOT and any other Gas Town installations.
This caused rigs to be added to the wrong town when users had multiple
Gas Town installations (e.g., ~/gt and ~/Documents/code/gt).
Fix the town discovery order:
1. GT_TOWN_ROOT env var (explicit user preference)
2. workspace.FindFromCwd() (supports multiple installations)
3. Fall back to ~/gt and ~/gastown
PR #759 introduced cleanupOrphanedClaude() using syscall.Kill directly,
which breaks Windows builds. This extracts the function to:
- start_orphan_unix.go: Full implementation with SIGTERM/SIGKILL
- start_orphan_windows.go: Stub (orphan signals not supported)
Follows existing pattern: process_unix.go / process_windows.go
## Problem
The deacon patrol was leaking claude processes. Every patrol cycle (1-3 minutes),
a new claude process was spawned under the hq-deacon tmux session, but old processes
were never terminated. This resulted in 12+ accumulated claude processes consuming
resources.
## Root Cause
In molecule_step.go:331, handleStepContinue() used tmux respawn-pane -k to restart
the pane between patrol steps. The -k flag sends SIGHUP to the shell but does not
kill all descendant processes (claude and its node children).
## Solution
Added KillPaneProcesses() function in tmux.go that explicitly kills all descendant
processes before respawning the pane. This function:
- Gets all descendant PIDs recursively
- Sends SIGTERM to all (deepest first)
- Waits 100ms for graceful shutdown
- Sends SIGKILL to survivors
Updated handleStepContinue() to call KillPaneProcesses() before RespawnPane().
Co-authored-by: Roland Tritsch <roland@ailtir.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Add support for checking a specific convoy by ID instead of all convoys:
- `gt convoy check <convoy-id>` - check specific convoy
- `gt convoy check` - check all (existing behavior)
- `gt convoy check --dry-run` - preview mode
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds --all/-a flag as a semantic complement to --unread. While the
default behavior already shows all messages, --all makes the intent
explicit when viewing the complete inbox.
The flags are mutually exclusive - using both --all and --unread
returns an error.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds `gt bead read <id>` as an alias for `gt bead show <id>` to provide
an alternative verb that may feel more natural for viewing bead details.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When --owner flag is not provided on gt convoy create, the owner now
defaults to the creator's identity (via detectSender()) rather than
being left empty. This ensures completion notifications always go to
the right place - the agent who requested the convoy.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
TestQuerySessionEvents_FindsEventsFromAllLocations was failing because
events created via bd create were not being found. This is caused by
bd CLI 0.47.2 having a bug where database writes do not commit.
Skip the test until the upstream bd CLI bug is fixed, consistent with
how other affected tests were skipped in commit 7714295a.
The original stack overflow issue (gt-obx) was caused by subprocess
interactions with the parent workspace daemon and was already fixed
by the existing skip logic that triggers when GT_TOWN_ROOT or BD_ACTOR
is set.
Fixes: gt-obx
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When Claude sessions were terminated using KillSession(), bash subprocesses
spawned by Claude's Bash tool could survive because they ignore SIGHUP.
This caused zombie processes to accumulate over time.
Changed all critical session termination paths to use KillSessionWithProcesses()
which explicitly kills all descendant processes before terminating the session.
Fixes: gt-ew3tk
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The existing PPID=1 detection misses orphaned Claude processes that get
reparented to something other than init/launchd. The new --aggressive
flag cross-references Claude processes against active tmux sessions to
find ALL orphans not in any gt-* or hq-* session.
Testing shows this catches ~3x more orphans (117 vs 39 in one sample).
Usage:
gt orphans procs --aggressive # List ALL orphans
gt orphans procs kill --aggressive # Kill ALL orphans
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
## Problem
gt shutdown failed to stop orphaned daemon processes because the
detection mechanism ignored errors and had no fallback.
## Root Cause
stopDaemonIfRunning() ignored errors from daemon.IsRunning(), causing:
1. Stale PID files to hide running daemons
2. Corrupted PID files to return silent false
3. No fallback detection for orphaned processes
4. Early return when no sessions running prevented daemon check
## Solution
1. Enhanced IsRunning() to return detailed errors
2. Added process name verification (prevents PID reuse false positives)
3. Added fallback orphan detection using pgrep
4. Fixed stopDaemonIfRunning() to handle errors and use fallback
5. Added daemon check even when no sessions are running
## Testing
Verified shutdown now:
- Detects and reports stale/corrupted PID files
- Finds orphaned daemon processes
- Kills all daemon processes reliably
- Reports detailed status during shutdown
- Works even when no other sessions are running
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
## Problems Fixed
1. **False reporting**: `gt shutdown` reported "0 sessions stopped" even when
all 5 sessions were successfully terminated
2. **Orphaned processes**: No way to clean up Claude processes left behind by
crashed/interrupted sessions
## Root Causes
1. **Counter bug**: `killSessionsInOrder()` only incremented the counter when
`KillSessionWithProcesses()` returned no error. However, this function can
return an error even after successfully killing all processes (e.g., when
the session auto-closes after its processes die, the final `kill-session`
command fails with "session not found").
2. **No orphan cleanup**: While `internal/util/orphan.go` provides orphan
detection infrastructure, it wasn't integrated into the shutdown workflow.
## Solutions
1. **Fix counter logic**: Modified `killSessionsInOrder()` to verify session
termination by checking if the session still exists after the kill attempt,
rather than relying solely on the error return value. This correctly counts
sessions that were terminated even if the kill command returned an error.
2. **Add `--cleanup-orphans` flag**: Integrated orphan cleanup with a simple
synchronous approach:
- Finds Claude/codex processes without a controlling terminal (TTY)
- Filters out processes younger than 60 seconds (avoids race conditions)
- Excludes processes belonging to active Gas Town tmux sessions
- Sends SIGTERM to all orphans
- Waits for configurable grace period (default 60s)
- Sends SIGKILL to any that survived SIGTERM
3. **Add `--cleanup-orphans-grace-secs` flag**: Allows users to configure the
grace period between SIGTERM and SIGKILL (default 60 seconds).
## Design Choice: Synchronous vs. Persistent State
The orphan cleanup uses a **synchronous wait approach** rather than the
persistent state machine approach in `util.CleanupOrphanedClaudeProcesses()`:
**Synchronous approach (this PR):**
- Send SIGTERM → Wait N seconds → Send SIGKILL (all in one invocation)
- Simpler to understand and debug
- User sees immediate results
- No persistent state file to manage
**Persistent state approach (util.CleanupOrphanedClaudeProcesses):**
- First run: SIGTERM → save state
- Second run (60s later): Check state → SIGKILL
- Requires multiple invocations
- Persists state in `/tmp/gastown-orphan-state`
The synchronous approach is more appropriate for `gt shutdown` where users
expect immediate cleanup, while the persistent approach is better suited for
periodic cleanup daemons.
## Testing
Before fix:
```
Sessions to stop: gt-boot, gt-pgqueue-refinery, gt-pgqueue-witness, hq-deacon, hq-mayor
✓ Gas Town shutdown complete (0 sessions stopped) ← Bug
```
After fix:
```
Sessions to stop: gt-boot, gt-pgqueue-refinery, gt-pgqueue-witness, hq-deacon, hq-mayor
✓ hq-deacon stopped
✓ gt-boot stopped
✓ gt-pgqueue-refinery stopped
✓ gt-pgqueue-witness stopped
✓ hq-mayor stopped
Cleaning up orphaned Claude processes...
→ PID 267916: sent SIGTERM (waiting 60s before SIGKILL)
⏳ Waiting 60 seconds for processes to terminate gracefully...
✓ 1 process(es) terminated gracefully from SIGTERM
✓ All processes cleaned up successfully
✓ Gas Town shutdown complete (5 sessions stopped) ← Fixed
```
All sessions verified terminated via `tmux ls`.
Co-authored-by: Roland Tritsch <roland@ailtir.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Add CheckMisclassifiedWisps doctor check to detect issues that should be
marked as wisps but aren't. This catches merge-requests, patrol molecules,
and operational work that lacks the wisp:true flag.
Add defense-in-depth wisp filtering to gt ready command. While bd ready
should already filter wisps, this provides an additional layer to ensure
ephemeral operational work doesn't leak into the ready work display.
Changes:
- New doctor check: misclassified-wisps (fixable, CategoryCleanup)
- gt ready now filters wisps from issues.jsonl in addition to scaffolds
- Detects wisp patterns: merge-request type, patrol labels, mol-* IDs
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
- Remove unused workDir field from witness manager
- Use witMgr.IsRunning() consistently instead of direct tmux call
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove state files from witness and refinery managers, following
the "Discover, Don't Track" principle. Tmux session existence is
now the source of truth for running state (like deacon).
Changes:
- Add IsRunning() that checks tmux HasSession
- Change Status() to return *tmux.SessionInfo
- Remove loadState/saveState/stateManager
- Simplify Start()/Stop() to not use state files
- Update CLI commands (witness/refinery/rig) for new API
- Update tests to be ZFC-compliant
This fixes state file divergence issues where witness/refinery
could show "running" when the actual tmux session was dead.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(sling): check hooked status and send LIFECYCLE:Shutdown on --force
- Change sling validation to check both pinned and hooked status (was only
checking pinned, likely a bug)
- Add --force handling that sends LIFECYCLE:Shutdown message to witness when
forcibly reassigning work from an already-hooked bead
- Use existing LIFECYCLE:Shutdown protocol instead of new KILL_POLECAT -
witness will auto-nuke if clean, or create cleanup wisp if dirty
- Use agent.Self() to identify the requester (falls back to "unknown"
for CLI users without GT_ROLE env vars)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix: use env vars instead of undefined agent.Self()
The agent.Self() function does not exist in the agent package.
Replace with direct env var lookups for GT_POLECAT (when running
as a polecat) or USER as fallback.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: julianknutsen <julianknutsen@users.noreply.github>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: beads/crew/lizzy <steve.yegge@gmail.com>
Add BD_ACTOR check at start of runDone() to prevent non-polecat roles
(crew, deacon, witness, etc.) from calling gt done. Only polecats are
ephemeral workers that self-destruct after completing work.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When gt done runs inside a tmux session (e.g., after polecat task
completion), calling KillSessionWithProcesses would kill the gt done
process itself before it could complete cleanup operations like writing
handoff state.
Add KillSessionWithProcessesExcluding() function that accepts a list of
PIDs to exclude from the kill sequence. Update selfKillSession to pass
its own PID, ensuring gt done completes before the session is destroyed.
Also fix both Kill*WithProcesses functions to ignore "session not found"
errors from KillSession - when we kill all processes in a session, tmux
may automatically destroy it before we explicitly call KillSession.
Co-authored-by: julianknutsen <julianknutsen@users.noreply.github>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
The mq list --ready command was filtering by issue.Type == "merge-request",
but beads created by `gt done` have issue_type='task' (the default) with
a gt:merge-request label. This caused ready MRs to be filtered out.
Changed to use beads.HasLabel() which checks the label, completing the
migration from the deprecated issue_type field to labels.
Added TestMRFilteringByLabel to verify the fix handles the bug scenario.
Fixes#816
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
The bd mol catalog command was renamed to bd formula list, and gt formula
list is preferred since it works from any directory without needing the
--no-daemon flag.
Co-authored-by: julianknutsen <julianknutsen@users.noreply.github>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Address review feedback: revert KillSession back to KillSessionWithProcesses
in stopSession() to properly terminate all child processes.
KillSessionWithProcesses recursively finds and terminates descendant processes
with SIGTERM/SIGKILL, preventing orphaned Claude/node processes that can
survive tmux session kills.
The orphan detection in verifyShutdown() remains as a helpful warning but
shouldn't replace proper process termination.
Fixes#291 - gastown is very hard to kill/shutdown/stop
Changes:
- Add shutdown coordination: daemon checks shutdown.lock and skips
heartbeat auto-restarts during shutdown to prevent fighting shutdown
- Add orphaned Claude/node process detection in shutdown verification
The daemon's heartbeat now checks for shutdown.lock (created by gt down)
and skips auto-restart logic when shutdown is in progress. This prevents
the daemon from restarting agents that were intentionally killed during
shutdown.
Shutdown verification now includes detection of orphaned Claude/node
processes that may be left behind when tmux sessions are killed but
child processes don't terminate.