## Problems Fixed
1. **False reporting**: `gt shutdown` reported "0 sessions stopped" even when
all 5 sessions had been terminated successfully.
2. **Orphaned processes**: There was no way to clean up Claude processes left
behind by crashed or interrupted sessions.
## Root Causes
1. **Counter bug**: `killSessionsInOrder()` only incremented the counter when
`KillSessionWithProcesses()` returned no error. However, this function can
return an error even after successfully killing all processes (e.g., when
the session auto-closes after its processes die, the final `kill-session`
command fails with "session not found").
2. **No orphan cleanup**: While `internal/util/orphan.go` provides orphan
detection infrastructure, it wasn't integrated into the shutdown workflow.
## Solutions
1. **Fix counter logic**: Modified `killSessionsInOrder()` to verify session
termination by checking if the session still exists after the kill attempt,
rather than relying solely on the error return value. This correctly counts
sessions that were terminated even if the kill command returned an error.
2. **Add `--cleanup-orphans` flag**: Integrated orphan cleanup with a simple
synchronous approach:
- Finds Claude/codex processes without a controlling terminal (TTY)
- Filters out processes younger than 60 seconds (avoids race conditions)
- Excludes processes belonging to active Gas Town tmux sessions
- Sends SIGTERM to all orphans
- Waits for a configurable grace period (default 60s)
- Sends SIGKILL to any that survived SIGTERM
3. **Add `--cleanup-orphans-grace-secs` flag**: Allows users to configure the
grace period between SIGTERM and SIGKILL (default 60 seconds).
## Design Choice: Synchronous vs. Persistent State
The orphan cleanup uses a **synchronous wait approach** rather than the
persistent state machine approach in `util.CleanupOrphanedClaudeProcesses()`:
**Synchronous approach (this PR):**
- Send SIGTERM → Wait N seconds → Send SIGKILL (all in one invocation)
- Simpler to understand and debug
- User sees the final result within the same command
- No persistent state file to manage
**Persistent state approach (util.CleanupOrphanedClaudeProcesses):**
- First run: SIGTERM → save state
- Second run (60s later): Check state → SIGKILL
- Requires multiple invocations
- Persists state in `/tmp/gastown-orphan-state`
The synchronous approach suits `gt shutdown`, where users expect cleanup to
finish before the command returns, even though it blocks for up to the grace
period. The persistent approach is better suited to periodic cleanup daemons.
## Testing
Before fix:
```
Sessions to stop: gt-boot, gt-pgqueue-refinery, gt-pgqueue-witness, hq-deacon, hq-mayor
✓ Gas Town shutdown complete (0 sessions stopped) ← Bug
```
After fix:
```
Sessions to stop: gt-boot, gt-pgqueue-refinery, gt-pgqueue-witness, hq-deacon, hq-mayor
✓ hq-deacon stopped
✓ gt-boot stopped
✓ gt-pgqueue-refinery stopped
✓ gt-pgqueue-witness stopped
✓ hq-mayor stopped
Cleaning up orphaned Claude processes...
→ PID 267916: sent SIGTERM (waiting 60s before SIGKILL)
⏳ Waiting 60 seconds for processes to terminate gracefully...
✓ 1 process(es) terminated gracefully from SIGTERM
✓ All processes cleaned up successfully
✓ Gas Town shutdown complete (5 sessions stopped) ← Fixed
```
Verified with `tmux ls` that all sessions were terminated.
Co-authored-by: Roland Tritsch <roland@ailtir.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>