feat: Add automatic orphaned claude process cleanup (#588)
* feat: Add automatic orphaned claude process cleanup

Claude Code's Task tool spawns subagent processes that sometimes don't clean up properly after completion. These accumulate and consume significant memory (observed: 17 processes using ~6GB RAM).

This change adds automatic cleanup in two places:

1. **Deacon patrol** (primary): New patrol step "orphan-process-cleanup" runs `gt deacon cleanup-orphans` early in each cycle. More responsive (~30s).
2. **Daemon heartbeat** (fallback): Runs cleanup every 3 minutes as safety net when deacon is down.

Detection uses TTY column - processes with TTY "?" have no controlling terminal. This is safe because:
- Processes in terminals (user sessions) have a TTY like "pts/0" - untouched
- Only kills processes with no controlling terminal
- Orphaned subagents are children of tmux server with no TTY

New files:
- internal/util/orphan.go: FindOrphanedClaudeProcesses, CleanupOrphanedClaudeProcesses
- internal/util/orphan_test.go: Tests for orphan detection

New command:
- `gt deacon cleanup-orphans`: Manual/patrol-triggered cleanup

Fixes #587

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(orphan): add Windows build tag and minimum age check

Addresses review feedback on PR #588:

1. Add //go:build !windows to orphan.go and orphan_test.go
   - The code uses Unix-specific syscalls (SIGTERM, ESRCH) and ps command options that don't exist on Windows
2. Add minimum age check (60 seconds) to prevent false positives
   - Prevents race conditions with newly spawned subagents
   - Addresses reviewer concern about cron/systemd processes
   - Uses portable etime format instead of Linux-only etimes
3. Add parseEtime helper with comprehensive tests
   - Parses [[DD-]HH:]MM:SS format (works on both Linux and macOS)
   - etimes (seconds) is Linux-specific, etime is portable

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(orphan): add proper SIGTERM→SIGKILL escalation with state tracking

Previous approach used process age which doesn't work: a Task subagent runs without TTY from birth, so a long-running legitimate subagent that later fails to exit would be immediately SIGKILLed without trying SIGTERM.

New approach uses a state file to track signal history:

1. First encounter → SIGTERM, record PID + timestamp in state file
2. Next cycle (after 60s grace period) → if still alive, SIGKILL
3. Next cycle → if survived SIGKILL, log as unkillable and remove

State file: $XDG_RUNTIME_DIR/gastown-orphan-state (or /tmp/)
Format: "<pid> <signal> <unix_timestamp>" per line

The state file is automatically cleaned up:
- Dead processes removed on load
- Unkillable processes removed after logging

Also updates callers to use new CleanupResult type which includes the signal sent (SIGTERM, SIGKILL, or UNKILLABLE).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
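The `[[DD-]HH:]MM:SS` format mentioned for parseEtime can be illustrated with a minimal sketch. This is not the code from internal/util/orphan.go, which is not shown on this page. The signature, error messages, and the decision to return whole seconds as an `int` are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseEtime converts a ps "etime" value of the form [[DD-]HH:]MM:SS
// into a number of seconds. Hypothetical sketch: the real helper's
// signature and error handling may differ.
func parseEtime(etime string) (int, error) {
	etime = strings.TrimSpace(etime)

	// Optional "DD-" prefix carries the day count.
	days := 0
	if i := strings.Index(etime, "-"); i >= 0 {
		d, err := strconv.Atoi(etime[:i])
		if err != nil {
			return 0, fmt.Errorf("invalid day count in %q: %w", etime, err)
		}
		days = d
		etime = etime[i+1:]
	}

	// Remaining part is MM:SS or HH:MM:SS.
	parts := strings.Split(etime, ":")
	if len(parts) < 2 || len(parts) > 3 {
		return 0, fmt.Errorf("invalid etime %q", etime)
	}

	// Fold the fields left to right in base 60.
	seconds := 0
	for _, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return 0, fmt.Errorf("invalid field %q in etime: %w", p, err)
		}
		seconds = seconds*60 + n
	}
	return days*86400 + seconds, nil
}

func main() {
	for _, s := range []string{"01:30", "1:02:03", "2-03:04:05"} {
		secs, err := parseEtime(s)
		fmt.Println(s, secs, err)
	}
}
```

A caller applying the 60-second minimum age check described above would compare the returned value against 60 before treating a TTY-less process as a candidate.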
@@ -28,6 +28,7 @@ import (
 	"github.com/steveyegge/gastown/internal/rig"
 	"github.com/steveyegge/gastown/internal/session"
 	"github.com/steveyegge/gastown/internal/tmux"
+	"github.com/steveyegge/gastown/internal/util"
 	"github.com/steveyegge/gastown/internal/wisp"
 	"github.com/steveyegge/gastown/internal/witness"
 )
@@ -268,6 +269,11 @@ func (d *Daemon) heartbeat(state *State) {
 	// This validates tmux sessions are still alive for polecats with work-on-hook
 	d.checkPolecatSessionHealth()
 
+	// 12. Clean up orphaned claude subagent processes (memory leak prevention)
+	// These are Task tool subagents that didn't clean up after completion.
+	// This is a safety net - Deacon patrol also does this more frequently.
+	d.cleanupOrphanedProcesses()
+
 	// Update state
 	state.LastHeartbeat = time.Now()
 	state.HeartbeatCount++
@@ -980,3 +986,26 @@ Manual intervention may be required.`,
 		d.logger.Printf("Warning: failed to notify witness of crashed polecat: %v", err)
 	}
 }
+
+// cleanupOrphanedProcesses kills orphaned claude subagent processes.
+// These are Task tool subagents that didn't clean up after completion.
+// Detection uses TTY column: processes with TTY "?" have no controlling terminal.
+// This is a safety net fallback - Deacon patrol also runs this more frequently.
+func (d *Daemon) cleanupOrphanedProcesses() {
+	results, err := util.CleanupOrphanedClaudeProcesses()
+	if err != nil {
+		d.logger.Printf("Warning: orphan process cleanup failed: %v", err)
+		return
+	}
+
+	if len(results) > 0 {
+		d.logger.Printf("Orphan cleanup: processed %d process(es)", len(results))
+		for _, r := range results {
+			if r.Signal == "UNKILLABLE" {
+				d.logger.Printf("  WARNING: PID %d (%s) survived SIGKILL", r.Process.PID, r.Process.Cmd)
+			} else {
+				d.logger.Printf("  Sent %s to PID %d (%s)", r.Signal, r.Process.PID, r.Process.Cmd)
+			}
+		}
+	}
+}