Commit Graph

128 Commits

Author SHA1 Message Date
Jackson Cantrell
0fb3e8d5fe fix: inherit environment in daemon subprocess calls (#876)
The daemon's exec.Command calls were not explicitly setting cmd.Env,
causing subprocesses to fail when the daemon process doesn't have
the expected PATH environment variable. This manifests as:

  Warning: failed to fetch deacon inbox: exec: "gt": executable file not found in $PATH

When the daemon is started by mechanisms with minimal environments
(launchd, systemd, or shells without full PATH), executables like
gt, bd, git, and sqlite3 couldn't be found.

The fix adds cmd.Env = os.Environ() to all 15 subprocess calls across
three files, ensuring they inherit the daemon's full environment.

Affected commands:
- gt mail inbox/delete/send (lifecycle requests, notifications)
- bd sync/show/list/activity (beads operations)
- git fetch/pull (workspace pre-sync)
- sqlite3 (convoy completion queries)
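
A minimal sketch of the pattern this commit describes, assuming a hypothetical helper named `newCommand` (the daemon's real call sites are not shown here). In Go's `os/exec`, a nil `Cmd.Env` already falls back to the parent's environment, so the explicit assignment mostly makes the intent visible and gives one place to append overrides.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// newCommand is a hypothetical helper (not from the daemon source). It builds
// a subprocess command whose environment is the parent's environment plus any
// extra KEY=VALUE pairs supplied by the caller.
func newCommand(name string, extraEnv []string, args ...string) *exec.Cmd {
	cmd := exec.Command(name, args...)
	// os.Environ returns a fresh copy, so appending does not mutate shared state.
	cmd.Env = append(os.Environ(), extraEnv...)
	return cmd
}

func main() {
	cmd := newCommand("git", []string{"GT_EXAMPLE=1"}, "fetch", "--dry-run")
	fmt.Println("env entries passed to subprocess:", len(cmd.Env))
}
```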

Fixes #875

Co-authored-by: Jackson Cantrell <cantrelljax@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 16:41:51 -08:00
furiosa
35abe21c50 fix(convoy): pass specific convoy ID in ConvoyWatcher check
When an issue closes, the daemon ConvoyWatcher now passes the specific
convoy ID to gt convoy check instead of running check on all open convoys.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21 21:48:32 -08:00
Steve Yegge
1feb48dd11 Merge pull request #460 from sauerdaniel/pr/shutdown-reliability
fix(shutdown): Improve gastown shutdown reliability
2026-01-21 20:33:54 -08:00
aleiby
0cdcd0a20b fix(daemon): spawn Deacon immediately after killing stuck session (#729)
When checkDeaconHeartbeat detects a stuck Deacon and kills it, the code
relied on ensureDeaconRunning being called on the next heartbeat. However,
on the next heartbeat, checkDeaconHeartbeat exits early when it finds no
session (assuming ensureDeaconRunning already ran), creating a deadlock
where the Deacon is never restarted.

This fix calls ensureDeaconRunning immediately after the kill attempt,
regardless of success or failure, ensuring the Deacon is restarted
promptly.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Executed-By: mayor
Role: mayor
2026-01-21 19:31:38 -08:00
slit
9caf5302d4 fix(tmux): use KillSessionWithProcesses to prevent zombie bash processes
When Claude sessions were terminated using KillSession(), bash subprocesses
spawned by Claude's Bash tool could survive because they ignore SIGHUP.
This caused zombie processes to accumulate over time.

Changed all critical session termination paths to use KillSessionWithProcesses()
which explicitly kills all descendant processes before terminating the session.

Fixes: gt-ew3tk

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 20:45:58 -08:00
gastown/crew/gus
b71188d0b4 fix: use ps for cross-platform daemon detection
Replace Linux-specific /proc/<pid>/cmdline with ps command
for isGasTownDaemon() to work on macOS and Linux.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 20:25:25 -08:00
Roland Tritsch
6bfe61f796 Fix daemon shutdown detection bug
## Problem
gt shutdown failed to stop orphaned daemon processes because the
detection mechanism ignored errors and had no fallback.

## Root Cause
stopDaemonIfRunning() ignored errors from daemon.IsRunning(), causing:
1. Stale PID files to hide running daemons
2. Corrupted PID files to return silent false
3. No fallback detection for orphaned processes
4. Early return when no sessions were running prevented the daemon check

## Solution
1. Enhanced IsRunning() to return detailed errors
2. Added process name verification (prevents PID reuse false positives)
3. Added fallback orphan detection using pgrep
4. Fixed stopDaemonIfRunning() to handle errors and use fallback
5. Added daemon check even when no sessions are running

## Testing
Verified shutdown now:
- Detects and reports stale/corrupted PID files
- Finds orphaned daemon processes
- Kills all daemon processes reliably
- Reports detailed status during shutdown
- Works even when no other sessions are running

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-20 20:25:25 -08:00
mayor
65c1fad8ce fix(shutdown): Improve gastown shutdown reliability
Fixes #291 - gastown is very hard to kill/shutdown/stop

Changes:
- Add shutdown coordination: daemon checks shutdown.lock and skips
  heartbeat auto-restarts during shutdown to prevent fighting shutdown
- Add orphaned Claude/node process detection in shutdown verification

The daemon's heartbeat now checks for shutdown.lock (created by gt down)
and skips auto-restart logic when shutdown is in progress. This prevents
the daemon from restarting agents that were intentionally killed during
shutdown.

Shutdown verification now includes detection of orphaned Claude/node
processes that may be left behind when tmux sessions are killed but
child processes don't terminate.
2026-01-20 23:20:50 +01:00
gastown/crew/max
a610283078 feat(roles): switch daemon to config-based roles, remove role beads (Phase 2+3)
Phase 2: Daemon now uses config.LoadRoleDefinition() instead of role beads
- lifecycle.go: getRoleConfigForIdentity() reads from TOML configs
- Layered override resolution: builtin → town → rig

Phase 3: Remove role bead creation and references
- Remove RoleBead field from AgentFields struct
- gt install no longer creates role beads
- Remove 'role' from custom types list
- Delete migrate_agents.go (no longer needed)
- Deprecate beads_role.go (kept for reading existing beads)
- Rewrite role_beads_check.go to validate TOML configs

Existing role beads are orphaned but harmless.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 12:58:01 -08:00
aleiby
22064b0730 feat: Add automatic orphaned claude process cleanup (#588)
* feat: Add automatic orphaned claude process cleanup

Claude Code's Task tool spawns subagent processes that sometimes don't clean up
properly after completion. These accumulate and consume significant memory
(observed: 17 processes using ~6GB RAM).

This change adds automatic cleanup in two places:

1. **Deacon patrol** (primary): New patrol step "orphan-process-cleanup" runs
   `gt deacon cleanup-orphans` early in each cycle. More responsive (~30s).

2. **Daemon heartbeat** (fallback): Runs cleanup every 3 minutes as safety net
   when deacon is down.

Detection uses the TTY column: processes with TTY "?" have no controlling terminal.
This is safe because:
- Processes in terminals (user sessions) have a TTY like "pts/0" - untouched
- Only kills processes with no controlling terminal
- Orphaned subagents are children of tmux server with no TTY

New files:
- internal/util/orphan.go: FindOrphanedClaudeProcesses, CleanupOrphanedClaudeProcesses
- internal/util/orphan_test.go: Tests for orphan detection

New command:
- `gt deacon cleanup-orphans`: Manual/patrol-triggered cleanup

Fixes #587

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(orphan): add Windows build tag and minimum age check

Addresses review feedback on PR #588:

1. Add //go:build !windows to orphan.go and orphan_test.go
   - The code uses Unix-specific syscalls (SIGTERM, ESRCH) and
     ps command options that don't exist on Windows

2. Add minimum age check (60 seconds) to prevent false positives
   - Prevents race conditions with newly spawned subagents
   - Addresses reviewer concern about cron/systemd processes
   - Uses portable etime format instead of Linux-only etimes

3. Add parseEtime helper with comprehensive tests
   - Parses [[DD-]HH:]MM:SS format (works on both Linux and macOS)
   - etimes (seconds) is Linux-specific, etime is portable
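
A parser for the `[[DD-]HH:]MM:SS` format could look like the following. It is a sketch written from the format description alone; the signature and error handling of the real `parseEtime` helper may differ.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseEtime converts ps "etime" output ([[DD-]HH:]MM:SS) into seconds.
func parseEtime(s string) (int, error) {
	s = strings.TrimSpace(s)
	days := 0
	if i := strings.Index(s, "-"); i >= 0 {
		d, err := strconv.Atoi(s[:i])
		if err != nil {
			return 0, fmt.Errorf("bad day field in %q: %w", s, err)
		}
		days = d
		s = s[i+1:]
	}
	parts := strings.Split(s, ":")
	if len(parts) < 2 || len(parts) > 3 {
		return 0, fmt.Errorf("unexpected etime format %q", s)
	}
	total := 0
	for _, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return 0, fmt.Errorf("bad field %q: %w", p, err)
		}
		total = total*60 + n // shift previous fields up one unit
	}
	return days*86400 + total, nil
}

func main() {
	for _, in := range []string{"01:02", "01:02:03", "2-03:04:05"} {
		secs, err := parseEtime(in)
		fmt.Println(in, secs, err)
	}
}
```

With a 60-second minimum age, a caller would skip any process whose parsed value is below 60.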

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(orphan): add proper SIGTERM→SIGKILL escalation with state tracking

Previous approach used process age which doesn't work: a Task subagent
runs without TTY from birth, so a long-running legitimate subagent that
later fails to exit would be immediately SIGKILLed without trying SIGTERM.

New approach uses a state file to track signal history:

1. First encounter → SIGTERM, record PID + timestamp in state file
2. Next cycle (after 60s grace period) → if still alive, SIGKILL
3. Next cycle → if survived SIGKILL, log as unkillable and remove

State file: $XDG_RUNTIME_DIR/gastown-orphan-state (or /tmp/)
Format: "<pid> <signal> <unix_timestamp>" per line

The state file is automatically cleaned up:
- Dead processes removed on load
- Unkillable processes removed after logging

Also updates callers to use new CleanupResult type which includes
the signal sent (SIGTERM, SIGKILL, or UNKILLABLE).
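
The escalation steps above reduce to a small decision function over the last recorded signal. In this sketch the record type, the `WAIT` result, and the assumption that the state file stores signal names as text (rather than numbers) are all illustrative; only the line layout and the 60-second grace period come from the commit message.

```go
package main

import "fmt"

// signalRecord mirrors one "<pid> <signal> <unix_timestamp>" state-file line.
type signalRecord struct {
	PID    int
	Signal string
	SentAt int64
}

// parseRecord parses a single state-file line.
func parseRecord(line string) (signalRecord, error) {
	var r signalRecord
	if _, err := fmt.Sscanf(line, "%d %s %d", &r.PID, &r.Signal, &r.SentAt); err != nil {
		return signalRecord{}, fmt.Errorf("bad state line %q: %w", line, err)
	}
	return r, nil
}

const gracePeriodSeconds = 60

// nextAction decides what to do with a still-alive orphan. prev is nil when
// the process has not been signalled before.
func nextAction(prev *signalRecord, now int64) string {
	switch {
	case prev == nil:
		return "SIGTERM"
	case prev.Signal == "SIGTERM" && now-prev.SentAt >= gracePeriodSeconds:
		return "SIGKILL"
	case prev.Signal == "SIGKILL":
		return "UNKILLABLE"
	default:
		return "WAIT" // SIGTERM sent, grace period not yet over
	}
}

func main() {
	rec, err := parseRecord("4242 SIGTERM 1000")
	if err != nil {
		panic(err)
	}
	fmt.Println(nextAction(nil, 1000))
	fmt.Println(nextAction(&rec, 1030))
	fmt.Println(nextAction(&rec, 1060))
}
```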

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-16 15:35:48 -08:00
Walter McGivney
29f8dd67e2 fix: Add grace period to prevent Deacon restart loop (#590)
* fix(daemon): prevent runaway refinery session spawning

Fixes #566

The daemon spawned 812 refinery sessions over 4 days because:

1. Zombie detection was too strict - used IsAgentRunning(session, "node")
   but Claude reports pane command as version number (e.g., "2.1.7"),
   causing healthy sessions to be killed and recreated every heartbeat.

2. daemon.json patrol config was completely ignored - the daemon never
   loaded or checked the enabled flags.

Changes:
- refinery/manager.go: Use IsClaudeRunning() instead of IsAgentRunning()
  for robust Claude detection (handles "node", "claude", version patterns)
- daemon/types.go: Add PatrolConfig types and LoadPatrolConfig() to read
  mayor/daemon.json
- daemon/daemon.go: Load patrol config at startup, check enabled flags
  before calling ensureRefineriesRunning/ensureWitnessesRunning, add
  diagnostic logging for "already running" cases

Tested: Verified over multiple heartbeats that refinery shows "already
running, skipping spawn" instead of spawning new sessions.
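
A detection function that tolerates all three pane-command shapes might look like this. It is a guess at the idea behind `IsClaudeRunning()`, not its implementation; the function name and the exact version pattern are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
)

// versionPattern matches pane commands that look like a semantic version,
// such as "2.1.7".
var versionPattern = regexp.MustCompile(`^\d+\.\d+\.\d+$`)

// looksLikeClaude reports whether a tmux pane command plausibly belongs to a
// running Claude session.
func looksLikeClaude(paneCommand string) bool {
	switch paneCommand {
	case "node", "claude":
		return true
	}
	return versionPattern.MatchString(paneCommand)
}

func main() {
	for _, cmd := range []string{"node", "claude", "2.1.7", "bash"} {
		fmt.Println(cmd, looksLikeClaude(cmd))
	}
}
```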

* fix: Add grace period to prevent Deacon restart loop

The daemon had a race condition where:
1. ensureDeaconRunning() starts a new Deacon session
2. checkDeaconHeartbeat() runs in same heartbeat cycle
3. Heartbeat file is stale (from before crash)
4. Session is immediately killed
5. Infinite restart loop every 3 minutes

Fix:
- Track when Deacon was last started (deaconLastStarted field)
- Skip heartbeat check during 5-minute grace period
- Add config support for Deacon (consistency with refinery/witness)

After grace period, normal heartbeat checking resumes. Genuinely
stuck sessions (no heartbeat update after 5+ min) are still detected.

Fixes #589

---------

Co-authored-by: mayor <your-github-email@example.com>
2026-01-16 15:27:41 -08:00
JJ
b1a5241430 fix(beads): align agent bead prefixes and force multi-hyphen IDs (#482)
* fix(beads): align agent bead prefixes and force multi-hyphen IDs

* fix(checkpoint): treat threshold as stale at boundary
2026-01-16 12:33:51 -08:00
Julian Knutsen
e7ca4908dc refactor(config): remove BEADS_DIR from agent environment and add doctor check (#455)
* fix(sling_test): update test for cook dir change

The cook command no longer needs database context and runs from cwd,
not the target rig directory. Update test to match this behavior
change from bd2a5ab5.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(tests): skip tests requiring missing binaries, handle --allow-stale

- Add skipIfAgentBinaryMissing helper to skip tests when codex/gemini
  binaries aren't available in the test environment
- Update rig manager test stub to handle --allow-stale flag

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor(config): remove BEADS_DIR from agent environment

Stop exporting BEADS_DIR in AgentEnv - agents should use beads redirect
mechanism instead of relying on environment variable. This prevents
prefix mismatches when agents operate across different beads databases.

Changes:
- Remove BeadsDir field from AgentEnvConfig
- Remove BEADS_DIR from env vars set on agent sessions
- Update doctor env_check to not expect BEADS_DIR
- Update all manager Start() calls to not pass BeadsDir

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(doctor): detect BEADS_DIR in tmux session environment

Add a doctor check that warns when BEADS_DIR is set in any Gas Town
tmux session. BEADS_DIR in the environment overrides prefix-based
routing and breaks multi-rig lookups - agents should use the beads
redirect mechanism instead.

The check:
- Iterates over all Gas Town tmux sessions (gt-* and hq-*)
- Checks if BEADS_DIR is set in the session environment
- Returns a warning with fix hint to restart sessions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: julianknutsen <julianknutsen@users.noreply.github>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-13 22:13:57 -08:00
sigfawn
3cf77b2e8b fix(daemon): improve error handling and security (#445)
* fix(beads): cache version check and add timeout to prevent cli lag

* fix(mail_queue): add nil check for queue config

Prevents potential nil pointer panic when queue config exists
in map but has nil value. Added || queueCfg == nil check to
the queue lookup condition in runMailClaim function.

Fixes potential panic that could occur if a queue entry exists
in config but with a nil value.
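
The guard described here is the usual Go idiom for a map of pointers, where a key can be present with a nil value. The types below are hypothetical; only the `!ok || queueCfg == nil` condition reflects the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// queueConfig is a hypothetical stand-in for the real queue configuration.
type queueConfig struct {
	Workers int
}

// lookupQueue treats a missing key and a nil entry the same way, so callers
// never dereference a nil *queueConfig.
func lookupQueue(queues map[string]*queueConfig, name string) (*queueConfig, error) {
	queueCfg, ok := queues[name]
	if !ok || queueCfg == nil {
		return nil, errors.New("unknown or unconfigured queue: " + name)
	}
	return queueCfg, nil
}

func main() {
	queues := map[string]*queueConfig{"work": {Workers: 2}, "broken": nil}
	for _, name := range []string{"work", "broken", "missing"} {
		_, err := lookupQueue(queues, name)
		fmt.Println(name, err)
	}
}
```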

* fix(migrate_agents_test): fix icon expectations to match actual output

The printMigrationResult function uses icons with two leading spaces
("  ✓", "  ⊘", "  ✗") but the test expected icons without spaces.
This fixes the test expectations to match the actual output format.

* fix(hook): handle error from events.LogFeed

Previously the error from LogFeed was silently ignored with _.
Now we log the error to stderr at warning level but don't fail
the operation since the primary hook action succeeded.

* fix(tmux): security and error handling improvements

- Fix unchecked regexp error in IsClaudeRunning (CVE-like)
- Add input sanitization to SetPaneDiedHook to prevent shell injection
- Add session name validation to SetDynamicStatus
- Sanitize mail from/subject in SendNotificationBanner
- Return error on parse failure in GetEnvironment
- Track skipped lines in ListSessionIDs for debuggability

See: tmux.fix for full analysis

* fix(daemon): improve error handling and security

- Capture stderr in syncWorkspace for better debuggability
- Fail fast on git fetch failures to prevent stale code
- Add logging to previously silent bd list errors
- Change notification state file permissions to 0600
- Improve error messages with actual stderr content

This prevents agents from starting with stale code and provides
better visibility into daemon operations.
2026-01-13 22:13:54 -08:00
Johann Dirry
5d96243414 fix: Windows build support with platform-specific process/signal handling
Separate platform-dependent code into build-tagged files:
- process_unix.go / process_windows.go: isProcessRunning() implementation
- signals_unix.go / signals_windows.go: daemon signal handling (Windows lacks SIGUSR1)

Windows implementation uses windows.OpenProcess with PROCESS_QUERY_LIMITED_INFORMATION
and checks exit code against STILL_ACTIVE (259).

Original-PR: #447
Co-Authored-By: Johann Dirry <johann.dirry@microsea.at>
2026-01-13 20:59:15 -08:00
Will Saults
bda248fb9a feat(refinery,boot): add --agent flag for model selection (#469)
* feat(refinery,boot): add --agent flag for model selection (hq-7d5m)

Add --agent flag to gt refinery start/attach/restart and gt boot spawn
commands for consistent model selection across all agent launch points.

Implementation follows the existing pattern from gt deacon start:
- Add StringVar flag for agent alias
- Pass override to Manager/Boot via SetAgentOverride()
- Use BuildAgentStartupCommandWithAgentOverride when override is set

Files affected:
- cmd/gt/refinery.go: add flags to start/attach/restart commands
- internal/refinery/manager.go: add SetAgentOverride and use in Start()
- cmd/gt/boot.go: add flag to spawn command
- internal/boot/boot.go: add SetAgentOverride and use in spawnTmux()

Closes #438

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor(refinery,boot): use parameter-passing pattern for --agent flag

Address PR review feedback:

1. ADD TESTS: Add tests for --agent flag existence following witness_test.go pattern
   - internal/cmd/refinery_test.go: tests for start/attach/restart
   - internal/cmd/boot_test.go: test for spawn

2. ALIGN PATTERN: Change from setter pattern to parameter-passing pattern
   - Manager.Start(foreground, agentOverride) instead of SetAgentOverride + Start
   - Boot.Spawn(agentOverride) instead of SetAgentOverride + Spawn
   - Matches witness.go style: Start(foreground bool, agentOverride string, ...)

Updated all callers to pass empty string for default agent:
- internal/daemon/daemon.go
- internal/cmd/rig.go
- internal/cmd/start.go
- internal/cmd/up.go

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: furiosa <will@saults.io>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-13 13:14:47 -08:00
gastown/crew/george
f79614d764 feat(daemon): event-driven convoy completion check (hq-5kmkl)
Add ConvoyWatcher that monitors bd activity for issue closes and
triggers convoy completion checks immediately rather than waiting
for patrol.

- Watch bd activity --follow --town --json for status=closed events
- Query SQLite for convoys tracking the closed issue
- Trigger gt convoy check when tracked issue closes
- Convoys close within seconds of last issue closing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-12 18:39:11 -08:00
Julian Knutsen
3caf32f9f7 fix(config): don't export empty GT_ROOT/BEADS_DIR in AgentEnv (#385)
* fix(config): don't export empty GT_ROOT/BEADS_DIR in AgentEnv

Fix polecats not having GT_ROOT environment variable set. The symptom was
polecat sessions showing GT_ROOT="" instead of the expected town root.

Root cause: AgentEnvSimple doesn't set TownRoot, but AgentEnv was always
setting env["GT_ROOT"] = cfg.TownRoot even when empty. This empty value
in export commands would override the tmux session environment.

Changes:
- Only set GT_ROOT and BEADS_DIR in env map if non-empty
- Refactor daemon.go to use AgentEnv with full AgentEnvConfig instead
  of AgentEnvSimple + manual additions
- Update test to verify keys are absent rather than empty
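
The first change can be illustrated with a simplified builder that leaves keys out entirely when their values are empty. The two-argument signature is an assumption for brevity; the real `AgentEnv` takes an `AgentEnvConfig`.

```go
package main

import "fmt"

// agentEnv builds the environment map, omitting keys with empty values so
// that an exported empty string cannot override the tmux session environment.
func agentEnv(townRoot, beadsDir string) map[string]string {
	env := map[string]string{}
	if townRoot != "" {
		env["GT_ROOT"] = townRoot
	}
	if beadsDir != "" {
		env["BEADS_DIR"] = beadsDir
	}
	return env
}

func main() {
	env := agentEnv("", "")
	_, hasRoot := env["GT_ROOT"]
	fmt.Println("GT_ROOT present when empty:", hasRoot)
	fmt.Println(agentEnv("/home/user/town", ""))
}
```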

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(lint): silence unparam for unused executeExternalActions args

The external action params (beadID, severity, description) are reserved
for future email/SMS/slack implementations but currently unused.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: julianknutsen <julianknutsen@users.noreply.github>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: max <steve.yegge@gmail.com>
2026-01-12 02:45:03 -08:00
abhijit
833724a7ed new changes
2026-01-11 19:03:06 -08:00
mayor
0f6759e4a2 docs(daemon): update comment to reflect self-cleaning model
The comment incorrectly referred to polecats without hooked work as "idle".
With the self-cleaning model, polecats self-nuke on completion - there are
no idle polecats. A polecat without work is orphaned (needs cleanup).

Closes: gt-0jn0k

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 22:46:51 -08:00
nux
c4fcdd88c8 fix(daemon,beads): use correct agent bead ID format and bd create flags
Two fixes in this commit:

1. daemon/lifecycle.go: Fix agent bead ID pattern for GUPP/orphaned work checks
   - Wrong: gt-polecat-<rig>-<name> (e.g., gt-polecat-gastown-nux)
   - Correct: <prefix>-<rig>-polecat-<name> (e.g., gt-gastown-polecat-nux)
   - Use config.GetRigPrefix() instead of hardcoding gt prefix
   - Use beads.ParseAgentBeadID() in extractRigFromAgentID

2. beads/beads.go: Fix invalid --add-label flag in bd create calls
   - bd create uses --labels, not --add-label
   - bd update uses --add-label (unchanged, was correct)
   - Fixed Create, CreateWithID, CreateAgentBead, CreateRigBead
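
The corrected ID layout from the first fix is easy to show as a formatter. `polecatBeadID` is a hypothetical helper name; the real code is said to use `config.GetRigPrefix()` and `beads.ParseAgentBeadID()`, whose signatures are not shown in this log.

```go
package main

import "fmt"

// polecatBeadID formats an agent bead ID as <prefix>-<rig>-polecat-<name>.
func polecatBeadID(prefix, rig, name string) string {
	return fmt.Sprintf("%s-%s-polecat-%s", prefix, rig, name)
}

func main() {
	fmt.Println(polecatBeadID("gt", "gastown", "nux")) // gt-gastown-polecat-nux
}
```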

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 22:08:55 -08:00
gastown/crew/gus
86751e1ea5 feat(witness): add --env flag for environment variable overrides
Extends the --agent flag with a more general --env flag that allows
setting arbitrary environment variables when starting a witness.

Precedence (highest to lowest):
1. CLI --env overrides
2. Role bead env_vars
3. config.AgentEnv() defaults

Examples:
  gt witness start greenplace --env ANTHROPIC_MODEL=claude-3-haiku
  gt witness restart greenplace --env DEBUG=1 --env VERBOSE=true
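
The precedence order above can be sketched as successive map overwrites, lowest priority first. `resolveEnv` and its parameters are hypothetical names; only the ordering and the `KEY=VALUE` shape of `--env` values come from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveEnv merges three layers, lowest priority first: defaults, role
// env_vars, then CLI --env pairs given as KEY=VALUE strings.
func resolveEnv(defaults, roleVars map[string]string, cliPairs []string) (map[string]string, error) {
	env := make(map[string]string)
	for k, v := range defaults {
		env[k] = v
	}
	for k, v := range roleVars {
		env[k] = v
	}
	for _, pair := range cliPairs {
		key, value, ok := strings.Cut(pair, "=")
		if !ok || key == "" {
			return nil, fmt.Errorf("invalid --env value %q, want KEY=VALUE", pair)
		}
		env[key] = value
	}
	return env, nil
}

func main() {
	env, err := resolveEnv(
		map[string]string{"DEBUG": "0", "GT_ROLE": "witness"},
		map[string]string{"DEBUG": "role"},
		[]string{"DEBUG=1", "VERBOSE=true"},
	)
	fmt.Println(env["DEBUG"], env["GT_ROLE"], env["VERBOSE"], err)
}
```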

Co-authored-by: joshuavial <git@codewithjv.com>
2026-01-09 22:00:43 -08:00
joshuavial
0d3f6c9654 feat: allow witness restart agent override
2026-01-09 21:56:53 -08:00
gastown/crew/gus
7a1ed80068 fix: remove unused identity parameter from setSessionEnvironment
2026-01-09 21:54:54 -08:00
julianknutsen
e999ceb1c1 refactor: consolidate agent env vars into config.AgentEnv
Create centralized AgentEnv function as single source of truth for all
agent environment variables. All agents now consistently receive:
- GT_ROLE, BD_ACTOR, GIT_AUTHOR_NAME (role identity)
- GT_ROOT, BEADS_DIR (workspace paths)
- GT_RIG, GT_POLECAT/GT_CREW (rig-specific identity)
- BEADS_AGENT_NAME, BEADS_NO_DAEMON (beads config)
- CLAUDE_CONFIG_DIR (optional account selection)

Remove RoleEnvVars in favor of AgentEnvSimple wrapper.
Remove IncludeBeadsEnv flag - beads env vars always included.
Update all manager and cmd call sites to use AgentEnv.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 21:52:30 -08:00
julianknutsen
1d88a73eaa fix: use ResolveBeadsDir for polecat BEADS_DIR
Previously, polecat startup used hardcoded paths for BEADS_DIR that
didn't follow redirects for repos with tracked beads. This meant
polecats working in worktrees (where .beads/redirect points to the
actual beads location) would use the wrong beads directory.

Fixed locations:
- daemon.go: polecat startup now uses ResolveBeadsDir
- polecat/session_manager.go: session startup now uses ResolveBeadsDir

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 21:52:09 -08:00
julianknutsen
7150ce2624 refactor: update managers to use RoleEnvVars
Consolidates all role startup code to use the shared RoleEnvVars()
function, ensuring consistent env vars across tmux SetEnvironment
and Claude startup command exports.

Updated:
- Mayor manager
- Deacon startup (daemon.go)
- Witness manager
- Refinery manager
- Polecat startup (daemon.go)
- BuildPolecatStartupCommand, BuildCrewStartupCommand helpers

This ensures all agents receive the same identity env vars regardless
of startup path.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 21:52:09 -08:00
nux
692d6819f2 feat(crash): improve crash logging and mass death detection
Add comprehensive crash logging improvements to help diagnose mass session death events:

- Add TypeSessionDeath and TypeMassDeath event types for feed visibility
- Log pre-death events before killing sessions (who killed, why)
- Add mass death detection in daemon (3+ deaths in 30s triggers alert)
- Add macOS crash report check in gt doctor
- Support session death events in townlog and feed curator

Closes hq-kt1o6

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 14:11:09 -08:00
gastown/crew/max
9b2f4a7652 feat(polecat): add repo path to worktrees for LLM ergonomics (GH#283)
Changes polecat worktree structure from:
  polecats/<name>/
to:
  polecats/<name>/<rigname>/

This gives Claude Code agents a recognizable directory name (e.g., tidepool/)
in their cwd instead of just the polecat name, preventing confusion about
which repo they are working in.

Key changes:
- Add clonePath() method to manager.go and session_manager.go for the actual
  git worktree path, keeping polecatDir() for existence checks
- Update Add(), RepairWorktree(), Remove() to use new structure
- Update daemon lifecycle and restart code for new paths
- Update witness handlers to detect both structures
- Update doctor checks (rig_check, branch_check, config_check,
  claude_settings_check) for backward compatibility
- All code includes fallback to old structure for existing polecats

Fixes #283

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 23:16:10 -08:00
george
65c3e90374 feat: Add Codex and OpenCode runtime backend support (#281)
Adds support for alternative AI runtime backends (Codex, OpenCode) alongside
the default Claude backend through a runtime abstraction layer.

- internal/runtime/runtime.go - Runtime-agnostic helper functions
- Extended RuntimeConfig with provider-specific settings
- internal/opencode/ for OpenCode plugin support
- Updated session managers to use runtime abstraction
- Removed unused ensureXxxSession functions
- Fixed daemon.go indentation, updated terminology to runtime

Backward compatible: Claude remains default runtime.

Co-Authored-By: Ben Kraus <ben@cinematicsoftware.com>
Co-Authored-By: Cameron Palmer <cameronmpalmer@users.noreply.github.com>
2026-01-08 22:56:37 -08:00
Cameron Palmer
9fe9323b9c fix: clean up dead code and fix indentation in runtime PR
- Remove unused ensureRefinerySession function from start.go
- Remove unused ensureSession and ensureWitness functions from up.go
- Remove unused ensureWitnessSession function from witness.go
- Remove orphaned imports (runtime, session, constants, config, rig, filepath, time)
- Fix indentation error in daemon.go triggerPendingSpawns comment

These functions were added as part of the Codex/OpenCode runtime support
but were never wired up. The existing managers (refinery.Manager.Start,
witness.Manager.Start) already handle session creation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 22:51:51 -08:00
Subhrajit Makur
c8c765a239 fix: improve integration test reliability (#13)
- Add custom types config after bd init in daemon tests
- Replace fixed sleeps with poll-based waiting in tmux tests
- Skip beads integration test for JSONL-only repos

Fixes flaky test failures in parallel execution.
2026-01-08 17:58:16 -08:00
Steve Yegge
da906847dd Merge pull request #279 from joshuavial/fix/polecat-dotdir-scan
fix: extend polecat dot-dir filtering beyond #258
2026-01-08 17:23:26 -08:00
Ben Kraus
38adfa4d8b codex
2026-01-08 12:36:54 -05:00
joshuavial
c699e3e2ed Stabilize bd role config tests
2026-01-08 22:43:31 +13:00
julianknutsen
65ecb6cafd fix(daemon): restore ensureDeaconRunning to heartbeat and use Manager
The heartbeat now explicitly calls ensureDeaconRunning() for basic
"is Deacon alive" checks, while Boot handles intelligent triage
(stuck/nudge/interrupt decisions).

Changed ensureDeaconRunning to use deacon.Manager.Start() instead of
duplicating startup logic. This gives daemon the same benefits:
- WaitForShellReady (fixes race condition)
- Claude settings setup
- Theming
- StartupNudge and PropulsionNudge (GUPP)

Heartbeat order:
1. ensureDeaconRunning - restart if dead (via Manager)
2. ensureBootRunning - intelligent triage for stuck states
3. checkDeaconHeartbeat - belt-and-suspenders fallback
4-11. Other checks (witnesses, refineries, polecats, etc.)

This was inadvertently removed when Boot was introduced, which
delegated all Deacon checks to Boot. But Boot's mol doesn't actually
restart Deacon - it just reports. Now responsibilities are clear:
daemon ensures alive, Boot ensures responsive.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 01:23:51 -08:00
Joshua Vial
a9ed342be6 fix: ignore hidden directories when enumerating polecats (#258)
* fix(sling): route bd mol commands to target rig directory

* Fix daemon polecat enumeration to ignore hidden dirs

* Ignore hidden dirs when discovering rig polecats

* Fix CI: enable beads custom types during install

---------

Co-authored-by: joshuavial <git@codewithjv.com>
2026-01-07 20:48:09 -08:00
gus
06d40925d1 feat(daemon): warn when rig wisp config missing
Logs a warning when checking rig operational state if the wisp
config file doesn't exist. This helps diagnose cases where a
parked rig unexpectedly restarts because its parked state was lost.
2026-01-07 01:34:49 -08:00
julianknutsen
29e2c6ed9c Fix daemon polecat respawn to pass rigPath for rig agent settings
Daemon's restartPolecatSession was calling BuildPolecatStartupCommand
with empty rigPath, causing polecats to fall back to town-level defaults
instead of honoring rig-specific agent settings.

Now passes rigPath so rig agent settings are honored.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 22:44:16 -08:00
julianknutsen
72544cc06d Unify agent startup with Manager pattern
Refactors all agent startup paths (witness, refinery, crew, polecat) to use
a consistent Manager interface with Start(), Stop(), IsRunning(), and
SessionName() methods.

Includes:
- Witness manager with GUPP propulsion nudge for startup
- Refinery manager for engineer sessions
- Crew manager for worker agents
- Session/polecat manager updates
- claude_settings_check doctor check for settings validation
- Settings management consolidated from rig/manager.go
- Settings location moved outside source repos to prevent conflicts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 21:44:04 -08:00
gastown/crew/joe
fc4b9de02c fix: use tmux for agent liveness in daemon checks (gt-zecmc)
Complete the "discover, don't track" refactoring:

- checkGUPPViolations: use tmux.IsClaudeRunning() instead of agent_state
- checkOrphanedWork: derive dead agents from tmux, not agent_state=dead
- assessStaleness: rely on HasActiveSession (tmux), not agent_state

Non-observable states (stuck, awaiting-gate) are still respected.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 20:50:01 -08:00
rictus
d00e73f110 feat(daemon): respect rig operational state for auto-start
Update daemon to check rig config before auto-starting agents:
- Check wisp config "status" - skip if parked or docked
- Check "auto_restart" config - skip if blocked or false
- Log skip reason for visibility

Affects ensureWitnessRunning, ensureRefineryRunning,
restartPolecatSession, and lifecycle restartSession.

(gt-68c46)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 20:43:00 -08:00
gastown/crew/joe
87169a3fc7 fix: complete removal of agent_state observable tracking (gt-zecmc)
Additional cleanup from the agent_state refactoring:

- Remove dead code: checkStaleAgents(), markAgentDead() in lifecycle.go
- Remove dead code: reportAgentState(), getAgentFields() in prime.go
- Update getAgentBeadState() comment to clarify non-observable states only
- Update mol-witness-patrol.formula.toml to use tmux discovery
- Update mol-polecat-lease.formula.toml to use POLECAT_DONE mail
- Update docs/watchdog-chain.md to reflect new architecture

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 20:42:23 -08:00
gastown/crew/joe
1f44482ad0 fix: remove observable states from agent_state (discover, don't track)
The agent_state field was recording observable state like "running",
"dead", "idle" which violated the "Discover, Don't Track" principle.
This caused stale state bugs where agents were marked "dead" in beads
but actually running in tmux.

Changes:
- Remove daemon's checkStaleAgents() which marked agents "dead"
- Simplify ensureXxxRunning() to use tmux.IsClaudeRunning() directly
- Remove reportAgentState() calls from gt prime and gt handoff
- Add SetHookBead/ClearHookBead helpers that don't update agent_state
- Use ClearHookBead in gt done and gt unsling
- Simplify gt status to derive state from tmux, not bead

Non-observable states (stuck, awaiting-gate, muted, paused) are still
set because they represent intentional agent decisions that can't be
discovered from tmux state.
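
As a rough sketch of the "Discover, Don't Track" split described above: liveness is probed on every call, while the bead carries only intentionally set states. The `prober` interface, `fakeTmux`, `agentBead`, and `status` are hypothetical names for illustration; the real code uses tmux.IsClaudeRunning(), whose signature is not shown in this log.

```go
package main

import "fmt"

// prober is a hypothetical interface standing in for a live tmux check.
type prober interface {
	IsRunning(session string) bool
}

// fakeTmux is a test double: the set of sessions currently alive.
type fakeTmux map[string]bool

func (f fakeTmux) IsRunning(session string) bool { return f[session] }

// agentBead holds only non-observable state that an agent sets on purpose
// (for example "stuck", "awaiting-gate", "muted", "paused").
type agentBead struct {
	State string
}

// status derives the display state. Liveness is discovered from the probe
// on every call and is never read from the bead.
func status(p prober, session string, bead agentBead) string {
	live := "stopped"
	if p.IsRunning(session) {
		live = "running"
	}
	if bead.State != "" {
		return live + " (" + bead.State + ")"
	}
	return live
}

func main() {
	tmux := fakeTmux{"witness": true}
	fmt.Println(status(tmux, "witness", agentBead{}))
	fmt.Println(status(tmux, "refinery", agentBead{State: "paused"}))
}
```

Because nothing observable is stored, a bead cannot claim an agent is "dead" while its session is still alive.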

Fixes: gt-zecmc

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 20:32:02 -08:00
gastown/crew/joe
6dbb841e22 fix(daemon): nudge agents on state divergence instead of silent accept
When the daemon detects that an agent bead state doesn't match tmux
(e.g., bead says stopped but Claude is running), it now:

1. Logs the divergence clearly with STATE DIVERGENCE prefix
2. Nudges the agent with an actionable command to fix its state
3. Still skips the restart (safety - don't kill healthy sessions)

This prevents silent state drift where bead state diverges from reality.
Applied to: Deacon, Witness, Refinery ensure functions.
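
The three steps above might look roughly like the following. This is an illustrative sketch only: the `decision` type, the `reconcile` function, the "running" bead value, and the log wording after the STATE DIVERGENCE prefix are assumptions, not taken from the daemon source.

```go
package main

import "fmt"

// decision is a hypothetical result type describing what the daemon does next.
type decision struct {
	LogLine string
	Nudge   bool
	Restart bool
}

// reconcile compares the recorded bead state with what tmux reports.
// A live session is never restarted, even when the bead disagrees.
func reconcile(agent, beadState string, runningInTmux bool) decision {
	switch {
	case runningInTmux && beadState != "running":
		return decision{
			LogLine: fmt.Sprintf("STATE DIVERGENCE: %s bead says %q but session is running", agent, beadState),
			Nudge:   true, // ask the agent to correct its own state
		}
	case !runningInTmux:
		return decision{LogLine: agent + " not running, restarting", Restart: true}
	default:
		return decision{} // bead and tmux agree; nothing to do
	}
}

func main() {
	fmt.Printf("%+v\n", reconcile("witness", "stopped", true))
	fmt.Printf("%+v\n", reconcile("refinery", "running", true))
	fmt.Printf("%+v\n", reconcile("deacon", "running", false))
}
```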

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 19:23:52 -08:00
jv
99aae0bf02 fix: use canonical hq role bead IDs
2026-01-06 19:13:14 -08:00
jv
02ca9e43fa fix: honor rig agent when starting witness/refinery
2026-01-06 19:12:55 -08:00
gastown/crew/joe
c2451b85e7 Merge origin/main into fix/205-address-claude-startup-issues
Resolved conflict in internal/witness/manager.go:
- Kept session import (used by PR code)
- Kept PR's more accurate comment for PID check
- Removed duplicate sessionName method introduced by merge
2026-01-06 19:04:29 -08:00
Julian Knutsen
9d7dcde1e2 feat: Unified beads redirect for tracked and local beads (#222)
* feat: Beads redirect architecture for tracked and local beads

This change implements proper redirect handling so that all rig agents
(Witness, Refinery, Crew, Polecats) can work with both:
- Tracked beads: .beads/ checked into git at mayor/rig/.beads
- Local beads: .beads/ created at rig root during gt rig add

Key changes:

1. SetupRedirect now handles tracked beads by skipping redirect chains.
   The bd CLI doesn't support chains (A→B→C), so worktrees redirect
   directly to the final destination (mayor/rig/.beads for tracked).

2. ResolveBeadsDir is now used consistently in polecat and refinery
   managers instead of hardcoded mayor/rig paths.

3. Rig-level agents (witness, refinery) now use rig beads with rig
   prefix instead of town beads. This follows the architecture where
   town beads are only for Mayor/Deacon.

4. prime.go simplified to always use ../../.beads for crew redirects,
   letting rig-level redirect handle tracked vs local routing.
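
To illustrate item 1, a redirect chain can be flattened by following it to its end and writing only the final destination. The sketch below uses an in-memory map in place of redirect files; `resolveFinal` and the example paths other than mayor/rig/.beads are hypothetical and do not reflect how SetupRedirect is actually implemented.

```go
package main

import "fmt"

// resolveFinal follows redirects (directory -> target) until it reaches a
// directory with no further redirect, so callers can point straight at it.
// A visited set guards against cycles.
func resolveFinal(redirects map[string]string, start string) (string, error) {
	seen := map[string]bool{}
	cur := start
	for {
		if seen[cur] {
			return "", fmt.Errorf("redirect cycle at %q", cur)
		}
		seen[cur] = true
		next, ok := redirects[cur]
		if !ok {
			return cur, nil
		}
		cur = next
	}
}

func main() {
	// Hypothetical layout: a worktree points at the rig root, which in turn
	// points at the tracked beads directory.
	redirects := map[string]string{
		"polecats/one/.beads": ".beads",
		".beads":              "mayor/rig/.beads",
	}
	dest, err := resolveFinal(redirects, "polecats/one/.beads")
	fmt.Println(dest, err)
}
```

Writing the resolved destination into the worktree's redirect avoids the A→B→C chain that the bd CLI does not follow.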

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(doctor): Add beads-redirect check for tracked beads

When a repo has .beads/ tracked in git (at mayor/rig/.beads), the rig root
needs a redirect file pointing to that location. This check:

- Detects missing rig-level redirect for tracked beads
- Verifies redirect points to correct location (mayor/rig/.beads)
- Auto-fixes with 'gt doctor --fix'

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: Handle fileLock.Unlock error in daemon

Wrap fileLock.Unlock() return value to satisfy errcheck linter.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 12:59:37 -08:00
julianknutsen
09bbb0f430 Fix lint issues and sparse checkout for empty repos
- Handle empty repos in ConfigureSparseCheckout (skip read-tree when no HEAD)
- Fix errcheck: wrap fileLock.Unlock() error in defer
- Fix unparam: remove unused *rig.Rig return from getWitnessManager
- Fix unparam: mark unused agentType parameter with blank identifier

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 02:16:15 -08:00