Commit Graph

79 Commits

Author SHA1 Message Date
Roland Tritsch
6bfe61f796 Fix daemon shutdown detection bug
## Problem
gt shutdown failed to stop orphaned daemon processes because the
detection mechanism ignored errors and had no fallback.

## Root Cause
stopDaemonIfRunning() ignored errors from daemon.IsRunning(), causing:
1. Stale PID files to hide running daemons
2. Corrupted PID files to return silent false
3. No fallback detection for orphaned processes
4. Early return when no sessions running prevented daemon check

## Solution
1. Enhanced IsRunning() to return detailed errors
2. Added process name verification (prevents PID reuse false positives)
3. Added fallback orphan detection using pgrep
4. Fixed stopDaemonIfRunning() to handle errors and use fallback
5. Added daemon check even when no sessions are running

## Testing
Verified shutdown now:
- Detects and reports stale/corrupted PID files
- Finds orphaned daemon processes
- Kills all daemon processes reliably
- Reports detailed status during shutdown
- Works even when no other sessions are running

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-20 20:25:25 -08:00
aleiby
22064b0730 feat: Add automatic orphaned claude process cleanup (#588)
* feat: Add automatic orphaned claude process cleanup

Claude Code's Task tool spawns subagent processes that sometimes don't clean up
properly after completion. These accumulate and consume significant memory
(observed: 17 processes using ~6GB RAM).

This change adds automatic cleanup in two places:

1. **Deacon patrol** (primary): New patrol step "orphan-process-cleanup" runs
   `gt deacon cleanup-orphans` early in each cycle. More responsive (~30s).

2. **Daemon heartbeat** (fallback): Runs cleanup every 3 minutes as safety net
   when deacon is down.

Detection uses TTY column - processes with TTY "?" have no controlling terminal.
This is safe because:
- Processes in terminals (user sessions) have a TTY like "pts/0" - untouched
- Only kills processes with no controlling terminal
- Orphaned subagents are children of tmux server with no TTY

New files:
- internal/util/orphan.go: FindOrphanedClaudeProcesses, CleanupOrphanedClaudeProcesses
- internal/util/orphan_test.go: Tests for orphan detection

New command:
- `gt deacon cleanup-orphans`: Manual/patrol-triggered cleanup

Fixes #587

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(orphan): add Windows build tag and minimum age check

Addresses review feedback on PR #588:

1. Add //go:build !windows to orphan.go and orphan_test.go
   - The code uses Unix-specific syscalls (SIGTERM, ESRCH) and
     ps command options that don't exist on Windows

2. Add minimum age check (60 seconds) to prevent false positives
   - Prevents race conditions with newly spawned subagents
   - Addresses reviewer concern about cron/systemd processes
   - Uses portable etime format instead of Linux-only etimes

3. Add parseEtime helper with comprehensive tests
   - Parses [[DD-]HH:]MM:SS format (works on both Linux and macOS)
   - etimes (seconds) is Linux-specific, etime is portable

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(orphan): add proper SIGTERM→SIGKILL escalation with state tracking

Previous approach used process age which doesn't work: a Task subagent
runs without TTY from birth, so a long-running legitimate subagent that
later fails to exit would be immediately SIGKILLed without trying SIGTERM.

New approach uses a state file to track signal history:

1. First encounter → SIGTERM, record PID + timestamp in state file
2. Next cycle (after 60s grace period) → if still alive, SIGKILL
3. Next cycle → if survived SIGKILL, log as unkillable and remove

State file: $XDG_RUNTIME_DIR/gastown-orphan-state (or /tmp/)
Format: "<pid> <signal> <unix_timestamp>" per line

The state file is automatically cleaned up:
- Dead processes removed on load
- Unkillable processes removed after logging

Also updates callers to use new CleanupResult type which includes
the signal sent (SIGTERM, SIGKILL, or UNKILLABLE).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-16 15:35:48 -08:00
Walter McGivney
29f8dd67e2 fix: Add grace period to prevent Deacon restart loop (#590)
* fix(daemon): prevent runaway refinery session spawning

Fixes #566

The daemon spawned 812 refinery sessions over 4 days because:

1. Zombie detection was too strict - used IsAgentRunning(session, "node")
   but Claude reports pane command as version number (e.g., "2.1.7"),
   causing healthy sessions to be killed and recreated every heartbeat.

2. daemon.json patrol config was completely ignored - the daemon never
   loaded or checked the enabled flags.

Changes:
- refinery/manager.go: Use IsClaudeRunning() instead of IsAgentRunning()
  for robust Claude detection (handles "node", "claude", version patterns)
- daemon/types.go: Add PatrolConfig types and LoadPatrolConfig() to read
  mayor/daemon.json
- daemon/daemon.go: Load patrol config at startup, check enabled flags
  before calling ensureRefineriesRunning/ensureWitnessesRunning, add
  diagnostic logging for "already running" cases

Tested: Verified over multiple heartbeats that refinery shows "already
running, skipping spawn" instead of spawning new sessions.

* fix: Add grace period to prevent Deacon restart loop

The daemon had a race condition where:
1. ensureDeaconRunning() starts a new Deacon session
2. checkDeaconHeartbeat() runs in same heartbeat cycle
3. Heartbeat file is stale (from before crash)
4. Session is immediately killed
5. Infinite restart loop every 3 minutes

Fix:
- Track when Deacon was last started (deaconLastStarted field)
- Skip heartbeat check during 5-minute grace period
- Add config support for Deacon (consistency with refinery/witness)

After grace period, normal heartbeat checking resumes. Genuinely
stuck sessions (no heartbeat update after 5+ min) are still detected.

Fixes #589

---------

Co-authored-by: mayor <your-github-email@example.com>
2026-01-16 15:27:41 -08:00
JJ
b1a5241430 fix(beads): align agent bead prefixes and force multi-hyphen IDs (#482)
* fix(beads): align agent bead prefixes and force multi-hyphen IDs

* fix(checkpoint): treat threshold as stale at boundary
2026-01-16 12:33:51 -08:00
Julian Knutsen
e7ca4908dc refactor(config): remove BEADS_DIR from agent environment and add doctor check (#455)
* fix(sling_test): update test for cook dir change

The cook command no longer needs database context and runs from cwd,
not the target rig directory. Update test to match this behavior
change from bd2a5ab5.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(tests): skip tests requiring missing binaries, handle --allow-stale

- Add skipIfAgentBinaryMissing helper to skip tests when codex/gemini
  binaries aren't available in the test environment
- Update rig manager test stub to handle --allow-stale flag

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor(config): remove BEADS_DIR from agent environment

Stop exporting BEADS_DIR in AgentEnv - agents should use beads redirect
mechanism instead of relying on environment variable. This prevents
prefix mismatches when agents operate across different beads databases.

Changes:
- Remove BeadsDir field from AgentEnvConfig
- Remove BEADS_DIR from env vars set on agent sessions
- Update doctor env_check to not expect BEADS_DIR
- Update all manager Start() calls to not pass BeadsDir

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(doctor): detect BEADS_DIR in tmux session environment

Add a doctor check that warns when BEADS_DIR is set in any Gas Town
tmux session. BEADS_DIR in the environment overrides prefix-based
routing and breaks multi-rig lookups - agents should use the beads
redirect mechanism instead.

The check:
- Iterates over all Gas Town tmux sessions (gt-* and hq-*)
- Checks if BEADS_DIR is set in the session environment
- Returns a warning with fix hint to restart sessions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: julianknutsen <julianknutsen@users.noreply.github>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-13 22:13:57 -08:00
Johann Dirry
5d96243414 fix: Windows build support with platform-specific process/signal handling
Separate platform-dependent code into build-tagged files:
- process_unix.go / process_windows.go: isProcessRunning() implementation
- signals_unix.go / signals_windows.go: daemon signal handling (Windows lacks SIGUSR1)

Windows implementation uses windows.OpenProcess with PROCESS_QUERY_LIMITED_INFORMATION
and checks exit code against STILL_ACTIVE (259).

Original-PR: #447
Co-Authored-By: Johann Dirry <johann.dirry@microsea.at>
2026-01-13 20:59:15 -08:00
Will Saults
bda248fb9a feat(refinery,boot): add --agent flag for model selection (#469)
* feat(refinery,boot): add --agent flag for model selection (hq-7d5m)

Add --agent flag to gt refinery start/attach/restart and gt boot spawn
commands for consistent model selection across all agent launch points.

Implementation follows the existing pattern from gt deacon start:
- Add StringVar flag for agent alias
- Pass override to Manager/Boot via SetAgentOverride()
- Use BuildAgentStartupCommandWithAgentOverride when override is set

Files affected:
- cmd/gt/refinery.go: add flags to start/attach/restart commands
- internal/refinery/manager.go: add SetAgentOverride and use in Start()
- cmd/gt/boot.go: add flag to spawn command
- internal/boot/boot.go: add SetAgentOverride and use in spawnTmux()

Closes #438

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor(refinery,boot): use parameter-passing pattern for --agent flag

Address PR review feedback:

1. ADD TESTS: Add tests for --agent flag existence following witness_test.go pattern
   - internal/cmd/refinery_test.go: tests for start/attach/restart
   - internal/cmd/boot_test.go: test for spawn

2. ALIGN PATTERN: Change from setter pattern to parameter-passing pattern
   - Manager.Start(foreground, agentOverride) instead of SetAgentOverride + Start
   - Boot.Spawn(agentOverride) instead of SetAgentOverride + Spawn
   - Matches witness.go style: Start(foreground bool, agentOverride string, ...)

Updated all callers to pass empty string for default agent:
- internal/daemon/daemon.go
- internal/cmd/rig.go
- internal/cmd/start.go
- internal/cmd/up.go

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: furiosa <will@saults.io>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-13 13:14:47 -08:00
gastown/crew/george
f79614d764 feat(daemon): event-driven convoy completion check (hq-5kmkl)
Add ConvoyWatcher that monitors bd activity for issue closes and
triggers convoy completion checks immediately rather than waiting
for patrol.

- Watch bd activity --follow --town --json for status=closed events
- Query SQLite for convoys tracking the closed issue
- Trigger gt convoy check when tracked issue closes
- Convoys close within seconds of last issue closing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-12 18:39:11 -08:00
Julian Knutsen
3caf32f9f7 fix(config): don't export empty GT_ROOT/BEADS_DIR in AgentEnv (#385)
* fix(config): don't export empty GT_ROOT/BEADS_DIR in AgentEnv

Fix polecats not having GT_ROOT environment variable set. The symptom was
polecat sessions showing GT_ROOT="" instead of the expected town root.

Root cause: AgentEnvSimple doesn't set TownRoot, but AgentEnv was always
setting env["GT_ROOT"] = cfg.TownRoot even when empty. This empty value
in export commands would override the tmux session environment.

Changes:
- Only set GT_ROOT and BEADS_DIR in env map if non-empty
- Refactor daemon.go to use AgentEnv with full AgentEnvConfig instead
  of AgentEnvSimple + manual additions
- Update test to verify keys are absent rather than empty

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(lint): silence unparam for unused executeExternalActions args

The external action params (beadID, severity, description) are reserved
for future email/SMS/slack implementations but currently unused.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: julianknutsen <julianknutsen@users.noreply.github>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: max <steve.yegge@gmail.com>
2026-01-12 02:45:03 -08:00
mayor
0f6759e4a2 docs(daemon): update comment to reflect self-cleaning model
The comment incorrectly referred to polecats without hooked work as "idle".
With the self-cleaning model, polecats self-nuke on completion - there are
no idle polecats. A polecat without work is orphaned (needs cleanup).

Closes: gt-0jn0k

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 22:46:51 -08:00
gastown/crew/gus
86751e1ea5 feat(witness): add --env flag for environment variable overrides
Extends the --agent flag with a more general --env flag that allows
setting arbitrary environment variables when starting a witness.

Precedence (highest to lowest):
1. CLI --env overrides
2. Role bead env_vars
3. config.AgentEnv() defaults

Examples:
  gt witness start greenplace --env ANTHROPIC_MODEL=claude-3-haiku
  gt witness restart greenplace --env DEBUG=1 --env VERBOSE=true

Co-authored-by: joshuavial <git@codewithjv.com>
2026-01-09 22:00:43 -08:00
joshuavial
0d3f6c9654 feat: allow witness restart agent override 2026-01-09 21:56:53 -08:00
julianknutsen
e999ceb1c1 refactor: consolidate agent env vars into config.AgentEnv
Create centralized AgentEnv function as single source of truth for all
agent environment variables. All agents now consistently receive:
- GT_ROLE, BD_ACTOR, GIT_AUTHOR_NAME (role identity)
- GT_ROOT, BEADS_DIR (workspace paths)
- GT_RIG, GT_POLECAT/GT_CREW (rig-specific identity)
- BEADS_AGENT_NAME, BEADS_NO_DAEMON (beads config)
- CLAUDE_CONFIG_DIR (optional account selection)

Remove RoleEnvVars in favor of AgentEnvSimple wrapper.
Remove IncludeBeadsEnv flag - beads env vars always included.
Update all manager and cmd call sites to use AgentEnv.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 21:52:30 -08:00
julianknutsen
1d88a73eaa fix: use ResolveBeadsDir for polecat BEADS_DIR
Previously, polecat startup used hardcoded paths for BEADS_DIR that
didn't follow redirects for repos with tracked beads. This meant
polecats working in worktrees (where .beads/redirect points to the
actual beads location) would use the wrong beads directory.

Fixed locations:
- daemon.go: polecat startup now uses ResolveBeadsDir
- polecat/session_manager.go: session startup now uses ResolveBeadsDir

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 21:52:09 -08:00
julianknutsen
7150ce2624 refactor: update managers to use RoleEnvVars
Consolidates all role startup code to use the shared RoleEnvVars()
function, ensuring consistent env vars across tmux SetEnvironment
and Claude startup command exports.

Updated:
- Mayor manager
- Deacon startup (daemon.go)
- Witness manager
- Refinery manager
- Polecat startup (daemon.go)
- BuildPolecatStartupCommand, BuildCrewStartupCommand helpers

This ensures all agents receive the same identity env vars regardless
of startup path.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 21:52:09 -08:00
nux
692d6819f2 feat(crash): improve crash logging and mass death detection
Add comprehensive crash logging improvements to help diagnose mass session death events:

- Add TypeSessionDeath and TypeMassDeath event types for feed visibility
- Log pre-death events before killing sessions (who killed, why)
- Add mass death detection in daemon (3+ deaths in 30s triggers alert)
- Add macOS crash report check in gt doctor
- Support session death events in townlog and feed curator

Closes hq-kt1o6

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 14:11:09 -08:00
gastown/crew/max
9b2f4a7652 feat(polecat): add repo path to worktrees for LLM ergonomics (GH#283)
Changes polecat worktree structure from:
  polecats/<name>/
to:
  polecats/<name>/<rigname>/

This gives Claude Code agents a recognizable directory name (e.g., tidepool/)
in their cwd instead of just the polecat name, preventing confusion about
which repo they are working in.

Key changes:
- Add clonePath() method to manager.go and session_manager.go for the actual
  git worktree path, keeping polecatDir() for existence checks
- Update Add(), RepairWorktree(), Remove() to use new structure
- Update daemon lifecycle and restart code for new paths
- Update witness handlers to detect both structures
- Update doctor checks (rig_check, branch_check, config_check,
  claude_settings_check) for backward compatibility
- All code includes fallback to old structure for existing polecats

Fixes #283

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 23:16:10 -08:00
george
65c3e90374 feat: Add Codex and OpenCode runtime backend support (#281)
Adds support for alternative AI runtime backends (Codex, OpenCode) alongside
the default Claude backend through a runtime abstraction layer.

- internal/runtime/runtime.go - Runtime-agnostic helper functions
- Extended RuntimeConfig with provider-specific settings
- internal/opencode/ for OpenCode plugin support
- Updated session managers to use runtime abstraction
- Removed unused ensureXxxSession functions
- Fixed daemon.go indentation, updated terminology to runtime

Backward compatible: Claude remains default runtime.

Co-Authored-By: Ben Kraus <ben@cinematicsoftware.com>
Co-Authored-By: Cameron Palmer <cameronmpalmer@users.noreply.github.com>
2026-01-08 22:56:37 -08:00
Cameron Palmer
9fe9323b9c fix: clean up dead code and fix indentation in runtime PR
- Remove unused ensureRefinerySession function from start.go
- Remove unused ensureSession and ensureWitness functions from up.go
- Remove unused ensureWitnessSession function from witness.go
- Remove orphaned imports (runtime, session, constants, config, rig, filepath, time)
- Fix indentation error in daemon.go triggerPendingSpawns comment

These functions were added as part of the Codex/OpenCode runtime support
but were never wired up. The existing managers (refinery.Manager.Start,
witness.Manager.Start) already handle session creation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 22:51:51 -08:00
Ben Kraus
38adfa4d8b codex 2026-01-08 12:36:54 -05:00
julianknutsen
65ecb6cafd fix(daemon): restore ensureDeaconRunning to heartbeat and use Manager
The heartbeat now explicitly calls ensureDeaconRunning() for basic
"is Deacon alive" checks, while Boot handles intelligent triage
(stuck/nudge/interrupt decisions).

Changed ensureDeaconRunning to use deacon.Manager.Start() instead of
duplicating startup logic. This gives daemon the same benefits:
- WaitForShellReady (fixes race condition)
- Claude settings setup
- Theming
- StartupNudge and PropulsionNudge (GUPP)

Heartbeat order:
1. ensureDeaconRunning - restart if dead (via Manager)
2. ensureBootRunning - intelligent triage for stuck states
3. checkDeaconHeartbeat - belt-and-suspenders fallback
4-11. Other checks (witnesses, refineries, polecats, etc.)

This was inadvertently removed when Boot was introduced, which
delegated all Deacon checks to Boot. But Boot's mol doesn't actually
restart Deacon - it just reports. Now responsibilities are clear:
daemon ensures alive, Boot ensures responsive.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 01:23:51 -08:00
Joshua Vial
a9ed342be6 fix: ignore hidden directories when enumerating polecats (#258)
* fix(sling): route bd mol commands to target rig directory

* Fix daemon polecat enumeration to ignore hidden dirs

* Ignore hidden dirs when discovering rig polecats

* Fix CI: enable beads custom types during install

---------

Co-authored-by: joshuavial <git@codewithjv.com>
2026-01-07 20:48:09 -08:00
gus
06d40925d1 feat(daemon): warn when rig wisp config missing
Logs a warning when checking rig operational state if the wisp
config file doesn't exist. This helps diagnose cases where a
parked rig unexpectedly restarts because its parked state was lost.
2026-01-07 01:34:49 -08:00
julianknutsen
29e2c6ed9c Fix daemon polecat respawn to pass rigPath for rig agent settings
Daemon's restartPolecatSession was calling BuildPolecatStartupCommand
with empty rigPath, causing polecats to fall back to town-level defaults
instead of honoring rig-specific agent settings.

Now passes rigPath so rig agent settings are honored.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 22:44:16 -08:00
julianknutsen
72544cc06d Unify agent startup with Manager pattern
Refactors all agent startup paths (witness, refinery, crew, polecat) to use
a consistent Manager interface with Start(), Stop(), IsRunning(), and
SessionName() methods.

Includes:
- Witness manager with GUPP propulsion nudge for startup
- Refinery manager for engineer sessions
- Crew manager for worker agents
- Session/polecat manager updates
- claude_settings_check doctor check for settings validation
- Settings management consolidated from rig/manager.go
- Settings location moved outside source repos to prevent conflicts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 21:44:04 -08:00
rictus
d00e73f110 feat(daemon): respect rig operational state for auto-start
Update daemon to check rig config before auto-starting agents:
- Check wisp config "status" - skip if parked or docked
- Check "auto_restart" config - skip if blocked or false
- Log skip reason for visibility

Affects ensureWitnessRunning, ensureRefineryRunning,
restartPolecatSession, and lifecycle restartSession.

(gt-68c46)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 20:43:00 -08:00
gastown/crew/joe
1f44482ad0 fix: remove observable states from agent_state (discover, don't track)
The agent_state field was recording observable state like "running",
"dead", "idle" which violated the "Discover, Don't Track" principle.
This caused stale state bugs where agents were marked "dead" in beads
but actually running in tmux.

Changes:
- Remove daemon's checkStaleAgents() which marked agents "dead"
- Simplify ensureXxxRunning() to use tmux.IsClaudeRunning() directly
- Remove reportAgentState() calls from gt prime and gt handoff
- Add SetHookBead/ClearHookBead helpers that don't update agent_state
- Use ClearHookBead in gt done and gt unsling
- Simplify gt status to derive state from tmux, not bead

Non-observable states (stuck, awaiting-gate, muted, paused) are still
set because they represent intentional agent decisions that can't be
discovered from tmux state.

Fixes: gt-zecmc

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 20:32:02 -08:00
gastown/crew/joe
6dbb841e22 fix(daemon): nudge agents on state divergence instead of silent accept
When the daemon detects that an agent bead state doesn't match tmux
(e.g., bead says stopped but Claude is running), it now:

1. Logs the divergence clearly with STATE DIVERGENCE prefix
2. Nudges the agent with an actionable command to fix its state
3. Still skips the restart (safety - don't kill healthy sessions)

This prevents silent state drift where bead state diverges from reality.
Applied to: Deacon, Witness, Refinery ensure functions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 19:23:52 -08:00
gastown/crew/joe
c2451b85e7 Merge origin/main into fix/205-address-claude-startup-issues
Resolved conflict in internal/witness/manager.go:
- Kept session import (used by PR code)
- Kept PR's more accurate comment for PID check
- Removed duplicate sessionName method introduced by merge
2026-01-06 19:04:29 -08:00
Julian Knutsen
9d7dcde1e2 feat: Unified beads redirect for tracked and local beads (#222)
* feat: Beads redirect architecture for tracked and local beads

This change implements proper redirect handling so that all rig agents
(Witness, Refinery, Crew, Polecats) can work with both:
- Tracked beads: .beads/ checked into git at mayor/rig/.beads
- Local beads: .beads/ created at rig root during gt rig add

Key changes:

1. SetupRedirect now handles tracked beads by skipping redirect chains.
   The bd CLI doesn't support chains (A→B→C), so worktrees redirect
   directly to the final destination (mayor/rig/.beads for tracked).

2. ResolveBeadsDir is now used consistently in polecat and refinery
   managers instead of hardcoded mayor/rig paths.

3. Rig-level agents (witness, refinery) now use rig beads with rig
   prefix instead of town beads. This follows the architecture where
   town beads are only for Mayor/Deacon.

4. prime.go simplified to always use ../../.beads for crew redirects,
   letting rig-level redirect handle tracked vs local routing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(doctor): Add beads-redirect check for tracked beads

When a repo has .beads/ tracked in git (at mayor/rig/.beads), the rig root
needs a redirect file pointing to that location. This check:

- Detects missing rig-level redirect for tracked beads
- Verifies redirect points to correct location (mayor/rig/.beads)
- Auto-fixes with 'gt doctor --fix'

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: Handle fileLock.Unlock error in daemon

Wrap fileLock.Unlock() return value to satisfy errcheck linter.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 12:59:37 -08:00
julianknutsen
09bbb0f430 Fix lint issues and sparse checkout for empty repos
- Handle empty repos in ConfigureSparseCheckout (skip read-tree when no HEAD)
- Fix errcheck: wrap fileLock.Unlock() error in defer
- Fix unparam: remove unused *rig.Rig return from getWitnessManager
- Fix unparam: mark unused agentType parameter with blank identifier

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 02:16:15 -08:00
mayor
31a32c084b Unify agent startup with GUPP propulsion nudge
Witness and Refinery startup was duplicated across cmd/witness.go, cmd/up.go,
cmd/rig.go, and daemon.go. Worse, not all code paths sent the propulsion nudge
(GUPP - Gas Town Universal Propulsion Principle). Now unified in Manager.Start()
which handles everything including nudges.

Changes:
- witness/manager.go: Full rewrite with session creation, env vars, theming,
  WaitForClaudeReady, startup nudge, and propulsion nudge (GUPP)
- refinery/manager.go: Add propulsion nudge sequence after Claude startup
- cmd/witness.go: Simplify to just call mgr.Start(), remove ensureWitnessSession
- cmd/rig.go: Use witness.Manager.Start() instead of inline session creation
- cmd/start.go: Use witness.Manager.Start()
- cmd/up.go: Use witness.Manager.Start(), remove ensureWitness(),
  add EnsureSettingsForRole in ensureSession()
- daemon.go: Use witness.Manager.Start() and refinery.Manager.Start() for
  unified startup with proper nudges

This ensures all agent startup paths (gt witness start, gt rig boot, gt up,
daemon restarts) consistently apply GUPP propulsion nudges.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 01:51:25 -08:00
gastown/crew/jack
e8d27e7212 fix: Create and lookup rig agent beads with correct prefix
Per docs/architecture.md, Witness and Refinery are rig-level agents that
should use the rig's configured prefix (e.g., pi- for pixelforge) instead
of hardcoded "gt-".

This extends PR #183's creation fix to also fix all lookup paths:
- internal/rig/manager.go: Create agent beads in rig beads with rig prefix
- internal/daemon/daemon.go: Use rig prefix when looking up agent state
- internal/daemon/lifecycle.go: Use rig prefix for identity-to-bead mapping
- internal/cmd/sling.go: Pass townRoot for prefix lookup
- internal/cmd/unsling.go: Pass townRoot for prefix lookup
- internal/cmd/molecule_status.go: Use rig prefix for agent bead lookups
- internal/cmd/molecule_attach.go: Use rig prefix for agent bead lookups
- internal/config/loader.go: Add GetRigPrefix helper

Without this fix, the daemon would:
- Create pi-gastown-witness but look for gt-gastown-witness
- Report agents as missing/dead when they are running
- Fail to manage agent lifecycle correctly

Based on work by Johann Taberlet in PR #183.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Johann Taberlet <johann.taberlet@gmail.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 19:39:57 -08:00
gastown
43cca06460 feat: Add Windows-compatible file locking for daemon
Replace Unix-only syscall.Flock with gofrs/flock library for
cross-platform file locking. This enables the daemon to run on
Windows in addition to Unix-like systems.

- Add github.com/gofrs/flock v0.13.0 dependency
- Replace syscall.Flock calls with flock.TryLock/Unlock
- Maintain same non-blocking exclusive lock semantics

(gt-5354h)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 19:21:09 -08:00
jakehemmerle
c25ff34b5b fix(daemon): prevent orphan daemons via file locking
Add syscall.Flock() exclusive lock in daemon.Run() to prevent TOCTOU
race condition where concurrent 'gt daemon start' commands could spawn
multiple daemons. Only the first to acquire the lock succeeds; others
exit cleanly. Lock is per-town (in townRoot/daemon/daemon.lock) so
multiple GT instances from different directories work independently.

Also detect race losers in runDaemonStart() by comparing spawned PID
with PID file, reporting 'already running' instead of false success.
2026-01-04 15:30:57 -05:00
max
1b69576573 fix: Address golangci-lint errors (errcheck, gosec) (#76)
Apply PR #76 from dannomayernotabot:

- Add golangci exclusions for internal package false positives
- Tighten file permissions (0644 -> 0600) for sensitive files
- Add ReadHeaderTimeout to HTTP server (slowloris prevention)
- Explicit error ignoring with _ = for intentional cases
- Add //nolint comments with justifications
- Spelling: cancelled -> canceled (US locale)

Co-Authored-By: dannomayernotabot <noreply@github.com>

🤖 Generated with Claude Code
2026-01-03 16:11:55 -08:00
joe
4bcf50bf1c Revert to simple gt-mayor/gt-deacon session names
Reverts the session naming changes from PR #70. Multi-town support
on a single machine is not a real use case - rigs provide project
isolation, and true isolation should use containers/VMs.

Changes:
- MayorSessionName() and DeaconSessionName() no longer take townName parameter
- ParseSessionName() handles simple gt-mayor and gt-deacon formats
- Removed Town field from AgentIdentity and AgentSession structs
- Updated all callers and tests

Generated with Claude Code

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 14:33:24 -08:00
rictus
40b8dec2a0 feat(daemon): Add circuit breaker for stuck agents (gt-72cqu)
- Add Refinery monitoring to daemon heartbeat (ensureRefineriesRunning)
- Add circuit breaker: agents marked dead by checkStaleAgents() are
  now force-killed and restarted, even if Claude process is alive
- Fixes zombie Claude sessions that werent updating their bead state

The circuit breaker flow:
1. Agent gets stuck (stops updating bead state)
2. After 15 minutes: checkStaleAgents() marks bead as dead
3. On next heartbeat: ensure*Running() sees state=dead
4. Force-kill session and restart fresh

Generated with Claude Code

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 13:12:39 -08:00
markov-kernel
e7145cfd77 fix: Make Mayor/Deacon session names include town name
Session names `gt-mayor` and `gt-deacon` were hardcoded, causing tmux
session name collisions when running multiple towns simultaneously.

Changed to `gt-{town}-mayor` and `gt-{town}-deacon` format (e.g.,
`gt-ai-mayor`) to allow concurrent multi-town operation.

Key changes:
- session.MayorSessionName() and DeaconSessionName() now take townName param
- Added workspace.GetTownName() helper to load town name from config
- Updated all callers in cmd/, daemon/, doctor/, mail/, rig/, templates/
- Updated tests with new session name format
- Bead IDs remain unchanged (already scoped by .beads/ directory)

Fixes #60

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 21:37:05 +01:00
mayor
a83d5c6e0b fix: Add tmux health check fallback to prevent killing healthy sessions
When the daemon checks if Deacon/Witness is running, it first checks the
agent bead state. If this check fails (bead not found, JSON parse error,
or stale state), it would previously attempt to restart the session -
even if the tmux session was perfectly healthy.

This caused "session already exists" errors when:
1. Agent bead state couldn't be read (prefix mismatch, missing bead)
2. But the tmux session was actually running with Claude active

Fix: Add a tmux session health check as fallback before attempting restart.
If the session exists AND Claude is running in it, skip the restart and
log that we're preserving the healthy session despite stale bead state.

This maintains ZFC compliance (still trusts agent bead as primary source)
while adding a defensive check to prevent unnecessary session kills.

Fixes #63

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-03 09:01:34 -08:00
dag
9ad826cd8c fix(daemon): Kill zombie tmux sessions before recreating
The daemon was failing to restart agents when zombie tmux sessions existed
(session alive but Claude dead). Added EnsureSessionFresh() helper to
tmux package that:
- Checks if session exists
- If exists but Claude not running (zombie), kills the session
- Creates fresh session

Updated all daemon session creation points to use EnsureSessionFresh:
- ensureDeaconRunning()
- ensureWitnessRunning()
- restartPolecatSession()
- restartSession() in lifecycle.go

Added tests for the new helper function. (gt-j1i0r)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-02 18:54:39 -08:00
kustrun
94857fc913 fix(session): Auto-accept Claude bypass permissions warning dialog
When Claude starts with --dangerously-skip-permissions, it shows a warning
dialog requiring Down+Enter to accept. This blocked automated polecat and
agent startup.

Added AcceptBypassPermissionsWarning() to tmux package that:
- Checks if the warning dialog is present by capturing pane content
- Only sends Down+Enter if "Bypass Permissions mode" text is found
- Avoids interfering with sessions that don't show the warning

Updated all Claude startup locations:
- session/manager.go (polecat sessions)
- cmd/up.go (mayor, witness, crew, polecat cold starts)
- daemon/daemon.go (crashed polecat restarts)
- daemon/lifecycle.go (role session starts)
2026-01-02 22:44:58 +01:00
morsov
a16b94ffa8 fix(daemon): Reduce heartbeat from 10 min to 3 min for faster stuck detection (gt-6y5o3)
The 10-minute recovery heartbeat was too slow to detect stuck agents promptly.
Reduced to 3 minutes for better balance between detection speed and overhead.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 20:02:12 -08:00
gastown/polecats/nux
6f1b6269b1 Fix Deacon spin indefinitely bug (hq-oosxt)
- Add heartbeat checking to Boot degraded triage: detects stale Deacon
  heartbeat (>15min nudges, >30min restarts)
- Add checkDeaconHeartbeat to daemon heartbeat cycle as fallback
- This ensures the Deacon is monitored continuously, not just at startup

The mol-deacon-patrol formula was also updated separately to use
gt deacon health-check instead of ephemeral context memory tracking.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 11:47:10 -08:00
gastown/crew/gus
626a24e013 Refactor startup paths to use RuntimeConfig (gt-j0546)
Replaced all hardcoded 'claude --dangerously-skip-permissions' invocations
with configurable helpers from internal/config:

- GetRuntimeCommand(rigPath) - simple command string
- GetRuntimeCommandWithPrompt(rigPath, prompt) - with initial prompt
- BuildAgentStartupCommand(role, bdActor, rigPath, prompt) - generic agent
- BuildPolecatStartupCommand(rigName, polecatName, rigPath, prompt) - polecat
- BuildCrewStartupCommand(rigName, crewName, rigPath, prompt) - crew
- BuildStartupCommand(envVars, rigPath, prompt) - custom env vars

Files updated:
- internal/cmd/start.go (4 locations)
- internal/cmd/crew_lifecycle.go (2 locations)
- internal/cmd/crew_at.go (2 locations)
- internal/cmd/deacon.go
- internal/cmd/witness.go
- internal/cmd/up.go (2 locations)
- internal/cmd/handoff.go (2 locations)
- internal/daemon/daemon.go (3 locations)
- internal/daemon/lifecycle.go
- internal/session/manager.go
- internal/refinery/manager.go
- internal/boot/boot.go

This enables future support for alternative LLM runtimes (aider, etc.)
via rig/town settings configuration.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 23:48:34 -08:00
Steve Yegge
b241a353f3 Clean up stale issue references in comments (gt-z5bri)
- daemon.go: Update gt-2hzl4 comment to past tense (timeout fallback is implemented)
- lifecycle.go: Remove "to be removed" promise for gt-psuw7 (code stays)
- keepalive_test.go: Remove tombstoned gt-gaxo epic reference
- doctor.go: Remove tombstoned gt-gaxo epic reference
- daemon_test.go: Remove tombstoned gt-gaxo epic reference

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 22:29:23 -08:00
gastown/polecats/furiosa
108afdbc52 Remove keepalive file infrastructure for feed-based wake model (gt-vdprb.2)
- Remove TouchTownActivity() calls from root.go PersistentPreRun hook
- Remove ReadTownActivity() and activity-based backoff from daemon.go
- Delete TouchTownActivity/ReadTownActivity functions from keepalive.go
- Replace dynamic backoff with fixed 10-min recovery heartbeat

Normal wake is now handled by feed subscription (bd activity --follow).
The daemon is a safety net for dead sessions, GUPP violations, and orphaned work.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 22:13:14 -08:00
Steve Yegge
2112804aba Implement Boot: daemon entry point dog for Deacon triage (gt-rwd5j)
Boot is a watchdog that the daemon pokes instead of Deacon directly,
centralizing the 'when to wake Deacon' decision in an agent that can
reason about context.

Key changes:
- Add internal/boot package with marker file and status tracking
- Add gt boot commands: status, spawn, triage
- Add mol-boot-triage formula for Boot's triage cycle
- Modify daemon to call ensureBootRunning instead of ensureDeaconRunning
- Add tmux.IsAvailable() for degraded mode detection
- Add boot.md.tmpl role template

Boot lifecycle:
1. Daemon tick spawns Boot (fresh each time)
2. Boot runs triage: observe, decide, act
3. Boot cleans stale handoffs from Deacon inbox
4. Boot exits (or handoffs in non-degraded mode)

In degraded mode (no tmux), Boot runs mechanical triage directly.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 16:20:00 -08:00
Steve Yegge
3099d99424 Set GIT_AUTHOR_NAME per agent session (gt-6r18e.1)
Export GIT_AUTHOR_NAME alongside BD_ACTOR in all agent session startup
locations. This enables git log --author queries for agent work while
keeping GIT_AUTHOR_EMAIL as the workspace owner.

Files updated:
- internal/session/manager.go (polecat sessions)
- internal/daemon/daemon.go (deacon, witness, polecat via daemon)
- internal/daemon/lifecycle.go (polecat lifecycle)
- internal/cmd/*.go (crew, mayor, deacon, witness, refinery, up, handoff)
- internal/session/manager_test.go (updated test expectations)
- docs/federation.md (marked feature as implemented)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 16:14:33 -08:00
Steve Yegge
e72882cdcb Add feed curator to daemon for activity feed curation (gt-38doh)
Implements the feed daemon goroutine that:
- Tails ~/gt/.events.jsonl for raw events
- Filters by visibility tag (keeps feed/both, drops audit)
- Deduplicates repeated done events from same actor
- Aggregates sling events when multiple occur in window
- Writes curated events to ~/gt/.feed.jsonl with summaries

The curator runs as a goroutine in the gt daemon, starting when
the daemon starts and stopping gracefully on shutdown.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 02:12:01 -08:00