Feat/gt down tests (#15) (#18) (#330)

* fix(down): add refinery shutdown to gt down

Refineries were not being stopped by gt down, causing them to continue
running after shutdown. This adds a refinery shutdown loop before
witnesses, fixing problem P3 from the v2.4 proposal.

Changes:
- Add Phase 1: Stop refineries (gt-<rig>-refinery sessions)
- Renumber existing phases (witnesses now Phase 2, etc.)
- Include refineries in halt event logging

* feat(beads): add StopAllBdProcesses for shutdown

Add functions to stop bd daemon and bd activity processes:
- StopAllBdProcesses(dryRun, force) - main entry point
- CountBdDaemons() - count running bd daemons
- CountBdActivityProcesses() - count running bd activity processes
- stopBdDaemons() - uses bd daemon killall
- stopBdActivityProcesses() - SIGTERM->wait->SIGKILL pattern

This solves problems P1 (bd daemon respawns sessions) and P2 (bd activity
causes instant wakeups) from the v2.4 proposal.

* feat(down): rename --all to --nuke, add new --all and --dry-run flags

BREAKING CHANGE: --all now stops bd processes instead of killing tmux server.
Use --nuke for the old --all behavior (killing the entire tmux server).

New flags:
- --all: Stop bd daemons/activity processes and verify shutdown
- --nuke: Kill entire tmux server (DESTRUCTIVE, with warning)
- --dry-run: Preview what would be stopped without taking action

This solves problem P4 (old --all was too destructive) from the v2.4 proposal.

The --nuke flag now requires GT_NUKE_ACKNOWLEDGED=1 environment variable
to suppress the warning about destroying all tmux sessions.

* feat(down): add shutdown lock to prevent concurrent runs

Add Phase 0 that acquires a file lock before shutdown to prevent race
conditions when multiple gt down commands are run concurrently.

- Uses gofrs/flock for cross-platform file locking
- Lock file stored at ~/gt/daemon/shutdown.lock
- 5 second timeout with 100ms retry interval
- Lock released via defer on successful acquisition
- Dry-run mode skips lock acquisition

This solves problem P6 (concurrent shutdown race) from the v2.4 proposal.

* feat(down): add verification phase for respawn detection

Add Phase 5 that verifies shutdown was complete after stopping all services:
- Waits 500ms for processes to fully terminate
- Checks for respawned bd daemons
- Checks for respawned bd activity processes
- Checks for remaining gt-*/hq-* tmux sessions
- Checks if daemon PID is still running

If anything respawned, warns user and suggests checking systemd/launchd.

This solves problem P5 (no verification) from the v2.4 proposal.

* test(down): add unit tests for shutdown functionality

Add tests for:
- parseBdDaemonCount() - array, object with count, object with daemons, empty, invalid
- CountBdActivityProcesses() - integration test
- CountBdDaemons() - integration test (skipped if bd not installed)
- StopAllBdProcesses() - dry-run mode test
- isProcessRunning() - current process, invalid PID, max PID

These tests cover the core parsing and process detection logic added
in the v2.4 shutdown enhancement.

* fix(review): add tmux check and pkill fallback for bd shutdown

Address review gaps against proposal v2.4 AC:

- AC1: Add tmux availability check BEFORE acquiring shutdown lock
- AC2: Add pkill fallback for bd daemon when killall incomplete
- AC2: Return remaining count from stop functions for error reporting
- Style: interface{} → any (Go 1.18+)



* fix(prime): add validation for --state flag combination

The --state flag should be standalone and not combined with other flags.
Add validation at start of runPrime to enforce this.

Fixes TestPrimeFlagCombinations test failures.

* fix(review): address bot review critical issues

- isProcessRunning: handle pid<=0 as invalid (return false)
- isProcessRunning: handle EPERM as process exists (return true)
- stopBdDaemons: prevent negative killed count from race conditions
- stopBdActivityProcesses: prevent negative killed count from race conditions



* fix(review): critical fixes from deep review

Platform fixes:
- CountBdActivityProcesses: use sh -c "pgrep | wc -l" for macOS compatibility
  (pgrep -c flag not available on BSD/macOS)

Correctness fixes:
- stopSession: return (wasRunning, error) to distinguish "stopped" vs "not running"
- daemon.IsRunning: handle error instead of ignoring with blank identifier
- stopBdDaemons/stopBdActivityProcesses: guard against negative killed counts

Safety fixes:
- --nuke: require GT_NUKE_ACKNOWLEDGED=1, don't just warn and proceed
- pkill patterns: document limitation about broad matching

Code cleanup:
- EnsureBdDaemonHealth: remove unused issues variable



---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
Subhrajit Makur
2026-01-11 12:26:33 +05:30
committed by GitHub
parent 62d5e4b550
commit 6a705f6210
4 changed files with 516 additions and 56 deletions

View File

@@ -5,9 +5,15 @@ import (
"encoding/json"
"fmt"
"os/exec"
"strconv"
"strings"
"time"
)
const (
gracefulTimeout = 2 * time.Second
)
// BdDaemonInfo represents the status of a single bd daemon instance.
type BdDaemonInfo struct {
Workspace string `json:"workspace"`
@@ -69,21 +75,12 @@ func EnsureBdDaemonHealth(workDir string) string {
// Check if any daemons need attention
needsRestart := false
var issues []string
for _, d := range health.Daemons {
switch d.Status {
case "healthy":
// Good
case "version_mismatch":
case "version_mismatch", "stale", "unresponsive":
needsRestart = true
issues = append(issues, fmt.Sprintf("%s: version mismatch", d.Workspace))
case "stale":
needsRestart = true
issues = append(issues, fmt.Sprintf("%s: stale", d.Workspace))
case "unresponsive":
needsRestart = true
issues = append(issues, fmt.Sprintf("%s: unresponsive", d.Workspace))
}
}
@@ -132,3 +129,144 @@ func StartBdDaemonIfNeeded(workDir string) error {
cmd.Dir = workDir
return cmd.Run()
}
// StopAllBdProcesses stops all bd daemon and activity processes.
// Returns (daemonsKilled, activityKilled, error).
// If dryRun is true, returns counts without stopping anything.
func StopAllBdProcesses(dryRun, force bool) (int, int, error) {
if _, err := exec.LookPath("bd"); err != nil {
return 0, 0, nil
}
daemonsBefore := CountBdDaemons()
activityBefore := CountBdActivityProcesses()
if dryRun {
return daemonsBefore, activityBefore, nil
}
daemonsKilled, daemonsRemaining := stopBdDaemons(force)
activityKilled, activityRemaining := stopBdActivityProcesses(force)
if daemonsRemaining > 0 {
return daemonsKilled, activityKilled, fmt.Errorf("bd daemon shutdown incomplete: %d still running", daemonsRemaining)
}
if activityRemaining > 0 {
return daemonsKilled, activityKilled, fmt.Errorf("bd activity shutdown incomplete: %d still running", activityRemaining)
}
return daemonsKilled, activityKilled, nil
}
// CountBdDaemons returns count of running bd daemons.
func CountBdDaemons() int {
listCmd := exec.Command("bd", "daemon", "list", "--json")
output, err := listCmd.Output()
if err != nil {
return 0
}
return parseBdDaemonCount(output)
}
// parseBdDaemonCount parses bd daemon list --json output.
func parseBdDaemonCount(output []byte) int {
if len(output) == 0 {
return 0
}
var daemons []any
if err := json.Unmarshal(output, &daemons); err == nil {
return len(daemons)
}
var wrapper struct {
Daemons []any `json:"daemons"`
Count int `json:"count"`
}
if err := json.Unmarshal(output, &wrapper); err == nil {
if wrapper.Count > 0 {
return wrapper.Count
}
return len(wrapper.Daemons)
}
return 0
}
func stopBdDaemons(force bool) (int, int) {
before := CountBdDaemons()
if before == 0 {
return 0, 0
}
killCmd := exec.Command("bd", "daemon", "killall")
_ = killCmd.Run()
time.Sleep(100 * time.Millisecond)
after := CountBdDaemons()
if after == 0 {
return before, 0
}
// Note: pkill -f pattern may match unintended processes in rare cases
// (e.g., editors with "bd daemon" in file content). This is acceptable
// as a fallback when bd daemon killall fails.
if force {
_ = exec.Command("pkill", "-9", "-f", "bd daemon").Run()
} else {
_ = exec.Command("pkill", "-TERM", "-f", "bd daemon").Run()
time.Sleep(gracefulTimeout)
if remaining := CountBdDaemons(); remaining > 0 {
_ = exec.Command("pkill", "-9", "-f", "bd daemon").Run()
}
}
time.Sleep(100 * time.Millisecond)
final := CountBdDaemons()
killed := before - final
if killed < 0 {
killed = 0 // Race condition: more processes spawned than we killed
}
return killed, final
}
// CountBdActivityProcesses returns count of running `bd activity` processes.
func CountBdActivityProcesses() int {
// Use pgrep -f with wc -l for cross-platform compatibility
// (macOS pgrep doesn't support -c flag)
cmd := exec.Command("sh", "-c", "pgrep -f 'bd activity' 2>/dev/null | wc -l")
output, err := cmd.Output()
if err != nil {
return 0
}
count, _ := strconv.Atoi(strings.TrimSpace(string(output)))
return count
}
func stopBdActivityProcesses(force bool) (int, int) {
before := CountBdActivityProcesses()
if before == 0 {
return 0, 0
}
if force {
_ = exec.Command("pkill", "-9", "-f", "bd activity").Run()
} else {
_ = exec.Command("pkill", "-TERM", "-f", "bd activity").Run()
time.Sleep(gracefulTimeout)
if remaining := CountBdActivityProcesses(); remaining > 0 {
_ = exec.Command("pkill", "-9", "-f", "bd activity").Run()
}
}
time.Sleep(100 * time.Millisecond)
after := CountBdActivityProcesses()
killed := before - after
if killed < 0 {
killed = 0 // Race condition: more processes spawned than we killed
}
return killed, after
}

View File

@@ -0,0 +1,73 @@
package beads
import (
"os/exec"
"testing"
)
func TestParseBdDaemonCount_Array(t *testing.T) {
input := []byte(`[{"pid":1234},{"pid":5678}]`)
count := parseBdDaemonCount(input)
if count != 2 {
t.Errorf("expected 2, got %d", count)
}
}
func TestParseBdDaemonCount_ObjectWithCount(t *testing.T) {
input := []byte(`{"count":3,"daemons":[{},{},{}]}`)
count := parseBdDaemonCount(input)
if count != 3 {
t.Errorf("expected 3, got %d", count)
}
}
func TestParseBdDaemonCount_ObjectWithDaemons(t *testing.T) {
input := []byte(`{"daemons":[{},{}]}`)
count := parseBdDaemonCount(input)
if count != 2 {
t.Errorf("expected 2, got %d", count)
}
}
func TestParseBdDaemonCount_Empty(t *testing.T) {
input := []byte(``)
count := parseBdDaemonCount(input)
if count != 0 {
t.Errorf("expected 0, got %d", count)
}
}
func TestParseBdDaemonCount_Invalid(t *testing.T) {
input := []byte(`not json`)
count := parseBdDaemonCount(input)
if count != 0 {
t.Errorf("expected 0 for invalid JSON, got %d", count)
}
}
func TestCountBdActivityProcesses(t *testing.T) {
count := CountBdActivityProcesses()
if count < 0 {
t.Errorf("count should be non-negative, got %d", count)
}
}
func TestCountBdDaemons(t *testing.T) {
if _, err := exec.LookPath("bd"); err != nil {
t.Skip("bd not installed")
}
count := CountBdDaemons()
if count < 0 {
t.Errorf("count should be non-negative, got %d", count)
}
}
func TestStopAllBdProcesses_DryRun(t *testing.T) {
daemonsKilled, activityKilled, err := StopAllBdProcesses(true, false)
if err != nil {
t.Errorf("unexpected error: %v", err)
}
if daemonsKilled < 0 || activityKilled < 0 {
t.Errorf("counts should be non-negative: daemons=%d, activity=%d", daemonsKilled, activityKilled)
}
}