## Problems Fixed

1. **False reporting**: `gt shutdown` reported "0 sessions stopped" even when all 5 sessions were successfully terminated
2. **Orphaned processes**: No way to clean up Claude processes left behind by crashed/interrupted sessions

## Root Causes

1. **Counter bug**: `killSessionsInOrder()` only incremented the counter when `KillSessionWithProcesses()` returned no error. However, this function can return an error even after successfully killing all processes (e.g., when the session auto-closes after its processes die, the final `kill-session` command fails with "session not found").
2. **No orphan cleanup**: While `internal/util/orphan.go` provides orphan detection infrastructure, it wasn't integrated into the shutdown workflow.

## Solutions

1. **Fix counter logic**: Modified `killSessionsInOrder()` to verify session termination by checking if the session still exists after the kill attempt, rather than relying solely on the error return value. This correctly counts sessions that were terminated even if the kill command returned an error.
2. **Add `--cleanup-orphans` flag**: Integrated orphan cleanup with a simple synchronous approach:
   - Finds Claude/codex processes without a controlling terminal (TTY)
   - Filters out processes younger than 60 seconds (avoids race conditions)
   - Excludes processes belonging to active Gas Town tmux sessions
   - Sends SIGTERM to all orphans
   - Waits for configurable grace period (default 60s)
   - Sends SIGKILL to any that survived SIGTERM
3. **Add `--cleanup-orphans-grace-secs` flag**: Allows users to configure the grace period between SIGTERM and SIGKILL (default 60 seconds).

## Design Choice: Synchronous vs. Persistent State

The orphan cleanup uses a **synchronous wait approach** rather than the persistent state machine approach in `util.CleanupOrphanedClaudeProcesses()`:

**Synchronous approach (this PR):**

- Send SIGTERM → Wait N seconds → Send SIGKILL (all in one invocation)
- Simpler to understand and debug
- User sees immediate results
- No persistent state file to manage

**Persistent state approach (util.CleanupOrphanedClaudeProcesses):**

- First run: SIGTERM → save state
- Second run (60s later): Check state → SIGKILL
- Requires multiple invocations
- Persists state in `/tmp/gastown-orphan-state`

The synchronous approach is more appropriate for `gt shutdown` where users expect immediate cleanup, while the persistent approach is better suited for periodic cleanup daemons.

## Testing

Before fix:

```
Sessions to stop: gt-boot, gt-pgqueue-refinery, gt-pgqueue-witness, hq-deacon, hq-mayor
✓ Gas Town shutdown complete (0 sessions stopped)  ← Bug
```

After fix:

```
Sessions to stop: gt-boot, gt-pgqueue-refinery, gt-pgqueue-witness, hq-deacon, hq-mayor
✓ hq-deacon stopped
✓ gt-boot stopped
✓ gt-pgqueue-refinery stopped
✓ gt-pgqueue-witness stopped
✓ hq-mayor stopped

Cleaning up orphaned Claude processes...
→ PID 267916: sent SIGTERM (waiting 60s before SIGKILL)
⏳ Waiting 60 seconds for processes to terminate gracefully...
✓ 1 process(es) terminated gracefully from SIGTERM
✓ All processes cleaned up successfully
✓ Gas Town shutdown complete (5 sessions stopped)  ← Fixed
```

All sessions verified terminated via `tmux ls`.

Co-authored-by: Roland Tritsch <roland@ailtir.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
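
For reviewers, the counter fix described under Solutions can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `sessionKiller` interface, the `HasSession` method, `countStopped`, and the `fakeTmux` test double are all hypothetical names introduced here. Only `KillSessionWithProcesses` is named in the description above, and its real signature may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// sessionKiller is a hypothetical stand-in for the tmux wrapper.
// HasSession is an assumed method; the real API may differ.
type sessionKiller interface {
	KillSessionWithProcesses(name string) error
	HasSession(name string) (bool, error)
}

// countStopped attempts to kill each session and counts a session as
// stopped when it no longer exists afterwards, regardless of whether the
// kill call itself returned an error.
func countStopped(k sessionKiller, names []string) int {
	stopped := 0
	for _, name := range names {
		killErr := k.KillSessionWithProcesses(name)
		exists, checkErr := k.HasSession(name)
		if checkErr != nil {
			// Existence could not be verified: fall back to the kill result.
			if killErr == nil {
				stopped++
			}
			continue
		}
		if !exists {
			stopped++
		}
	}
	return stopped
}

// fakeTmux simulates sessions that auto-close once their processes die,
// so the final kill-session step fails with "session not found".
type fakeTmux struct {
	alive map[string]bool // sessions that currently exist
	stuck map[string]bool // sessions that refuse to die
}

func newFakeTmux(names []string, stuck ...string) *fakeTmux {
	f := &fakeTmux{alive: map[string]bool{}, stuck: map[string]bool{}}
	for _, n := range names {
		f.alive[n] = true
	}
	for _, n := range stuck {
		f.stuck[n] = true
	}
	return f
}

func (f *fakeTmux) KillSessionWithProcesses(name string) error {
	if f.stuck[name] {
		return errors.New("kill failed")
	}
	delete(f.alive, name) // the processes died and the session closed itself
	return errors.New("session not found")
}

func (f *fakeTmux) HasSession(name string) (bool, error) {
	return f.alive[name], nil
}

func main() {
	names := []string{"hq-deacon", "gt-boot", "hq-mayor"}
	f := newFakeTmux(names, "hq-mayor")
	// Every kill call returns an error here, yet two sessions are gone.
	fmt.Printf("%d of %d sessions stopped\n", countStopped(f, names), len(names))
	// prints "2 of 3 sessions stopped"
}
```

In this sketch every kill call returns an error, so counting on the error value alone would report 0 stopped sessions, which is the behaviour described under Root Causes. Checking existence afterwards reports the two sessions that actually went away.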