Auto-detect and restart daemon on version mismatch (bd-89)

Implements automatic daemon version detection and restart when client and daemon versions are incompatible. Eliminates need for manual 'bd daemon --stop' after upgrades.

Changes:
- Check daemon version during health check in PersistentPreRun
- Auto-restart mismatched daemon or fall back to direct mode
- Check version when starting daemon, auto-stop old daemon if incompatible
- Robust restart logic: sets working dir, cleans stale sockets, reaps processes
- Uses waitForSocketReadiness helper for reliable startup detection
- Updated AGENTS.md with version management documentation

Closes bd-89

Amp-Thread-ID: https://ampcode.com/threads/T-231a3701-c9c8-49e4-a1b0-e67c94e5c365
Co-authored-by: Amp <amp@ampcode.com>
@@ -85,4 +85,6 @@
 {"id":"bd-86","title":"Add transaction support for atomicity in merge operations","description":"The merge operation in cmd/bd/merge.go should use transactions to ensure atomicity. Currently marked as TODO at line 143.\n\nThis would prevent partial merges if an error occurs partway through the operation.","status":"closed","priority":2,"issue_type":"feature","created_at":"2025-10-23T19:33:34.549858-07:00","updated_at":"2025-10-23T19:35:40.620329-07:00","closed_at":"2025-10-23T19:35:40.620329-07:00"}
 {"id":"bd-87","title":"Add RPC support for epic commands in daemon mode","description":"Epic status and close-eligible commands currently error out in daemon mode with a message to use --no-daemon. These commands should work with daemon RPC like other commands.\n\nLocations:\n- cmd/bd/epic.go:26 (epic status command)\n- cmd/bd/epic.go:106 (epic close-eligible command)","status":"closed","priority":2,"issue_type":"feature","created_at":"2025-10-23T19:33:34.552261-07:00","updated_at":"2025-10-23T21:56:33.732039-07:00","closed_at":"2025-10-23T21:56:33.732039-07:00"}
 {"id":"bd-88","title":"bd import reports \"0 created, 0 updated\" when successfully importing issues","description":"The `bd import` command successfully imported 125 issues but reported \"0 created, 0 updated\" in the output. The import actually worked, but the success message is incorrect/misleading.\n\nThis appears to be a bug in the reporting logic that counts and displays the number of issues created/updated during import.","status":"closed","priority":2,"issue_type":"bug","created_at":"2025-10-23T22:28:40.391453-07:00","updated_at":"2025-10-23T23:05:57.413177-07:00","closed_at":"2025-10-23T23:05:57.413177-07:00"}
+{"id":"bd-89","title":"Auto-detect and kill old daemon versions","description":"When the client version doesn't match the daemon version, we get confusing behavior (auto-flush race conditions, stale data, etc.). The client should automatically detect version mismatches and handle them gracefully.\n\n**Current behavior:**\n- `bd version --daemon` shows mismatch but requires manual intervention\n- Old daemons keep running after binary upgrades\n- MCP server may connect to old daemon\n- Results in dirty working tree after commits, stale data\n\n**Proposed solution:**\n\nKey lifecycle points to check/restart daemon:\n1. **On first command after version mismatch**: Check daemon version, auto-restart if incompatible\n2. **On daemon start**: Check for existing daemons, kill old ones before starting\n3. **After brew upgrade/install**: Add post-install hook to kill old daemons\n4. **On `bd init`**: Ensure fresh daemon\n\n**Detection logic:**\n```go\n// PersistentPreRun: check daemon version\nif daemonVersion != clientVersion {\n log.Warn(\"Daemon version mismatch, restarting...\")\n killDaemon()\n startDaemon()\n}\n```\n\n**Considerations:**\n- Should we be aggressive (always kill mismatched) or conservative (warn first)?\n- What about multiple workspaces with different bd versions?\n- Should this be opt-in via config flag?\n- How to handle graceful shutdown vs force kill?\n\n**Related issues:**\n- Race condition with auto-flush (see bd-89)\n- Version mismatch confusion for users\n- Stale daemon after upgrades","notes":"## Implementation Summary\n\nImplemented automatic daemon version detection and restart in v0.16.0.\n\n### Changes Made\n\n**1. Auto-restart on version mismatch (main.go PersistentPreRun)**\n- Check daemon version during health check\n- If incompatible, automatically stop old daemon and start new one\n- Falls back to direct mode if restart fails\n- Transparent to users - no manual intervention needed\n\n**2. Auto-stop old daemon on startup (daemon.go)**\n- When starting daemon, check if existing daemon has compatible version\n- If versions are incompatible, auto-stop old daemon before starting new one\n- Prevents \"daemon already running\" errors after upgrades\n\n**3. Robust restart implementation**\n- Sets correct working directory so daemon finds right database\n- Cleans up stale socket files after force kill\n- Properly reaps child process to avoid zombies\n- Uses waitForSocketReadiness helper for reliable startup detection\n- 5-second readiness timeout\n\n### Key Features\n\n- **Automatic**: No user action required after upgrading bd\n- **Transparent**: Works with both MCP server and CLI\n- **Safe**: Falls back to direct mode if restart fails\n- **Tested**: All existing tests pass\n\n### Related\n- Addresses race conditions mentioned in bd-90\n- Uses semver compatibility checking from internal/rpc/server.go","status":"closed","priority":1,"issue_type":"feature","created_at":"2025-10-23T23:15:59.764705-07:00","updated_at":"2025-10-23T23:28:06.611221-07:00","closed_at":"2025-10-23T23:28:06.611221-07:00"}
 {"id":"bd-9","title":"Test issue 2","description":"","status":"closed","priority":1,"issue_type":"task","created_at":"2025-10-21T23:53:44.31362-07:00","updated_at":"2025-10-23T19:33:21.099891-07:00","closed_at":"2025-10-21T22:06:41.257019-07:00","labels":["test-label"]}
+{"id":"bd-90","title":"Race condition between git commit and auto-flush debounce","description":"When using MCP/daemon mode, operations trigger a 5-second debounced auto-flush to JSONL. This creates a race condition with git commits, leaving the working tree dirty.\n\n**Example scenario:**\n1. User closes issue via MCP → daemon schedules flush (5 sec delay)\n2. User commits code changes → JSONL appears clean\n3. Daemon flush fires → JSONL modified after commit\n4. Result: dirty working tree showing JSONL changes\n\n**Root cause:**\n- Auto-flush uses 5-second debounce to batch changes\n- Git commits happen immediately\n- No coordination between flush schedule and git operations\n\n**Possible solutions:**\n\n1. **Immediate flush before git operations**\n - Detect git commands (commit, status, push)\n - Force immediate flush if pending\n - Pros: Clean working tree guaranteed\n - Cons: Requires hooking git, may be slow\n\n2. **Commit includes pending flushes**\n - Add `bd sync` to commit workflow\n - Wait for flush to complete before committing\n - Pros: Simple, explicit\n - Cons: Requires user discipline\n\n3. **Git hooks integration**\n - pre-commit hook: `bd sync --wait`\n - Ensures JSONL is up-to-date before commit\n - Pros: Automatic, reliable\n - Cons: Requires hook installation\n\n4. **Reduce debounce delay**\n - Lower from 5s to 1s or 500ms\n - Pros: Faster sync, less likely to race\n - Cons: More frequent I/O, doesn't eliminate race\n\n5. **Lock-based coordination**\n - Daemon holds lock while flush pending\n - Git operations wait for lock\n - Pros: Guarantees ordering\n - Cons: Complex, may block operations\n\n**Recommended approach:**\nCombine #2 and #3:\n- Add `bd sync` command to explicitly flush\n- Provide git hooks in `examples/git-hooks/`\n- Document workflow in AGENTS.md\n- Keep 5s debounce for normal operations\n\n**Related:**\n- bd-89 (daemon version detection)","status":"open","priority":1,"issue_type":"bug","created_at":"2025-10-23T23:16:29.502191-07:00","updated_at":"2025-10-23T23:16:29.502191-07:00"}
@@ -70,6 +70,12 @@ The single MCP server instance automatically:
 
 **Note:** The daemon **auto-starts automatically** when you run any `bd` command (v0.9.11+). To disable auto-start, set `BEADS_AUTO_START_DAEMON=false`.
 
+**Version Management:** bd automatically handles daemon version mismatches (v0.16.0+):
+- When you upgrade bd, old daemons are automatically detected and restarted
+- Version compatibility is checked on every connection
+- No manual `bd daemon --stop` required after upgrades
+- Works transparently with MCP server and CLI
+
 **Alternative (legacy): Multiple MCP Server Instances**
 If you must use separate MCP servers (not recommended):
 ```json
@@ -90,9 +90,32 @@ Use --health to check daemon health and metrics.`,
 
 		// Check if daemon is already running
 		if isRunning, pid := isDaemonRunning(pidFile); isRunning {
-			fmt.Fprintf(os.Stderr, "Error: daemon already running (PID %d)\n", pid)
-			fmt.Fprintf(os.Stderr, "Use 'bd daemon --stop%s' to stop it first\n", boolToFlag(global, " --global"))
-			os.Exit(1)
+			// Check if running daemon has compatible version
+			socketPath := getSocketPathForPID(pidFile, global)
+			if client, err := rpc.TryConnectWithTimeout(socketPath, 1*time.Second); err == nil && client != nil {
+				health, healthErr := client.Health()
+				client.Close()
+
+				// If we can check version and it's compatible, exit
+				if healthErr == nil && health.Compatible {
+					fmt.Fprintf(os.Stderr, "Error: daemon already running (PID %d, version %s)\n", pid, health.Version)
+					fmt.Fprintf(os.Stderr, "Use 'bd daemon --stop%s' to stop it first\n", boolToFlag(global, " --global"))
+					os.Exit(1)
+				}
+
+				// Version mismatch - auto-stop old daemon
+				if healthErr == nil && !health.Compatible {
+					fmt.Fprintf(os.Stderr, "Warning: daemon version mismatch (daemon: %s, client: %s)\n", health.Version, Version)
+					fmt.Fprintf(os.Stderr, "Stopping old daemon and starting new one...\n")
+					stopDaemon(pidFile)
+					// Continue with daemon startup
+				}
+			} else {
+				// Can't check version - assume incompatible
+				fmt.Fprintf(os.Stderr, "Error: daemon already running (PID %d)\n", pid)
+				fmt.Fprintf(os.Stderr, "Use 'bd daemon --stop%s' to stop it first\n", boolToFlag(global, " --global"))
+				os.Exit(1)
+			}
 		}
 
 		// Global daemon doesn't support auto-commit/auto-push (no sync loop)
@@ -220,6 +243,16 @@ func getEnvBool(key string, defaultValue bool) bool {
 	return defaultValue
 }
 
+// getSocketPathForPID determines the socket path for a given PID file
+func getSocketPathForPID(pidFile string, global bool) string {
+	if global {
+		home, _ := os.UserHomeDir()
+		return filepath.Join(home, ".beads", "bd.sock")
+	}
+	// Local daemon: socket is in same directory as PID file
+	return filepath.Join(filepath.Dir(pidFile), "bd.sock")
+}
+
 func getPIDFilePath(global bool) (string, error) {
 	var beadsDir string
 	var err error
182
cmd/bd/main.go
@@ -212,18 +212,56 @@ var rootCmd = &cobra.Command{
 				// Perform health check
 				health, healthErr := client.Health()
 				if healthErr == nil && health.Status == "healthy" {
-					// Daemon is healthy - use it
-					daemonClient = client
-					daemonStatus.Mode = "daemon"
-					daemonStatus.Connected = true
-					daemonStatus.Degraded = false
-					daemonStatus.Health = health.Status
-					if os.Getenv("BD_DEBUG") != "" {
-						fmt.Fprintf(os.Stderr, "Debug: connected to daemon at %s (health: %s)\n", socketPath, health.Status)
+					// Check version compatibility
+					if !health.Compatible {
+						if os.Getenv("BD_DEBUG") != "" {
+							fmt.Fprintf(os.Stderr, "Debug: daemon version mismatch (daemon: %s, client: %s), restarting daemon\n",
+								health.Version, Version)
+						}
+						client.Close()
+
+						// Kill old daemon and restart with new version
+						if restartDaemonForVersionMismatch() {
+							// Retry connection after restart
+							client, err = rpc.TryConnect(socketPath)
+							if err == nil && client != nil {
+								if dbPath != "" {
+									absDBPath, _ := filepath.Abs(dbPath)
+									client.SetDatabasePath(absDBPath)
+								}
+								health, healthErr = client.Health()
+								if healthErr == nil && health.Status == "healthy" {
+									daemonClient = client
+									daemonStatus.Mode = "daemon"
+									daemonStatus.Connected = true
+									daemonStatus.Degraded = false
+									daemonStatus.Health = health.Status
+									if os.Getenv("BD_DEBUG") != "" {
+										fmt.Fprintf(os.Stderr, "Debug: connected to restarted daemon (version: %s)\n", health.Version)
+									}
+									warnWorktreeDaemon(dbPath)
+									return
+								}
+							}
+						}
+						// If restart failed, fall through to direct mode
+						daemonStatus.FallbackReason = FallbackHealthFailed
+						daemonStatus.Detail = fmt.Sprintf("version mismatch (daemon: %s, client: %s) and restart failed",
+							health.Version, Version)
+					} else {
+						// Daemon is healthy and compatible - use it
+						daemonClient = client
+						daemonStatus.Mode = "daemon"
+						daemonStatus.Connected = true
+						daemonStatus.Degraded = false
+						daemonStatus.Health = health.Status
+						if os.Getenv("BD_DEBUG") != "" {
+							fmt.Fprintf(os.Stderr, "Debug: connected to daemon at %s (health: %s)\n", socketPath, health.Status)
+						}
+						// Warn if using daemon with git worktrees
+						warnWorktreeDaemon(dbPath)
+						return // Skip direct storage initialization
 					}
-					// Warn if using daemon with git worktrees
-					warnWorktreeDaemon(dbPath)
-					return // Skip direct storage initialization
 				} else {
 					// Health check failed or daemon unhealthy
 					client.Close()
@@ -541,6 +579,128 @@ func shouldUseGlobalDaemon() bool {
 	return repoCount > 1
 }
 
+// restartDaemonForVersionMismatch stops the old daemon and starts a new one
+// Returns true if restart was successful
+func restartDaemonForVersionMismatch() bool {
+	// Use local daemon (global is deprecated)
+	pidFile, err := getPIDFilePath(false)
+	if err != nil {
+		if os.Getenv("BD_DEBUG") != "" {
+			fmt.Fprintf(os.Stderr, "Debug: failed to get PID file path: %v\n", err)
+		}
+		return false
+	}
+
+	socketPath := getSocketPath()
+
+	// Check if daemon is running and stop it
+	forcedKill := false
+	if isRunning, pid := isDaemonRunning(pidFile); isRunning {
+		if os.Getenv("BD_DEBUG") != "" {
+			fmt.Fprintf(os.Stderr, "Debug: stopping old daemon (PID %d)\n", pid)
+		}
+
+		process, err := os.FindProcess(pid)
+		if err != nil {
+			if os.Getenv("BD_DEBUG") != "" {
+				fmt.Fprintf(os.Stderr, "Debug: failed to find process: %v\n", err)
+			}
+			return false
+		}
+
+		// Send stop signal
+		if err := sendStopSignal(process); err != nil {
+			if os.Getenv("BD_DEBUG") != "" {
+				fmt.Fprintf(os.Stderr, "Debug: failed to signal daemon: %v\n", err)
+			}
+			return false
+		}
+
+		// Wait for daemon to stop (up to 5 seconds)
+		for i := 0; i < 50; i++ {
+			time.Sleep(100 * time.Millisecond)
+			if isRunning, _ := isDaemonRunning(pidFile); !isRunning {
+				if os.Getenv("BD_DEBUG") != "" {
+					fmt.Fprintf(os.Stderr, "Debug: old daemon stopped successfully\n")
+				}
+				break
+			}
+		}
+
+		// Force kill if still running
+		if isRunning, _ := isDaemonRunning(pidFile); isRunning {
+			if os.Getenv("BD_DEBUG") != "" {
+				fmt.Fprintf(os.Stderr, "Debug: force killing old daemon\n")
+			}
+			process.Kill()
+			forcedKill = true
+		}
+	}
+
+	// Clean up stale socket and PID file after force kill or if not running
+	if forcedKill || !isDaemonRunningQuiet(pidFile) {
+		os.Remove(socketPath)
+		os.Remove(pidFile)
+	}
+
+	// Start new daemon with current binary version
+	exe, err := os.Executable()
+	if err != nil {
+		if os.Getenv("BD_DEBUG") != "" {
+			fmt.Fprintf(os.Stderr, "Debug: failed to get executable path: %v\n", err)
+		}
+		return false
+	}
+
+	args := []string{"daemon"}
+	cmd := exec.Command(exe, args...)
+	cmd.Env = append(os.Environ(), "BD_DAEMON_FOREGROUND=1")
+
+	// Set working directory to database directory so daemon finds correct DB
+	if dbPath != "" {
+		cmd.Dir = filepath.Dir(dbPath)
+	}
+
+	configureDaemonProcess(cmd)
+
+	devNull, err := os.OpenFile(os.DevNull, os.O_RDWR, 0)
+	if err == nil {
+		cmd.Stdin = devNull
+		cmd.Stdout = devNull
+		cmd.Stderr = devNull
+		defer devNull.Close()
+	}
+
+	if err := cmd.Start(); err != nil {
+		if os.Getenv("BD_DEBUG") != "" {
+			fmt.Fprintf(os.Stderr, "Debug: failed to start new daemon: %v\n", err)
+		}
+		return false
+	}
+
+	// Reap the process to avoid zombies
+	go cmd.Wait()
+
+	// Wait for daemon to be ready using shared helper
+	if waitForSocketReadiness(socketPath, 5*time.Second) {
+		if os.Getenv("BD_DEBUG") != "" {
+			fmt.Fprintf(os.Stderr, "Debug: new daemon started successfully\n")
+		}
+		return true
+	}
+
+	if os.Getenv("BD_DEBUG") != "" {
+		fmt.Fprintf(os.Stderr, "Debug: new daemon failed to become ready\n")
+	}
+	return false
+}
+
+// isDaemonRunningQuiet checks if daemon is running without output
+func isDaemonRunningQuiet(pidFile string) bool {
+	isRunning, _ := isDaemonRunning(pidFile)
+	return isRunning
+}
+
 // tryAutoStartDaemon attempts to start the daemon in the background
 // Returns true if daemon was started successfully and socket is ready
 func tryAutoStartDaemon(socketPath string) bool {