* feat: Add automatic orphaned claude process cleanup
Claude Code's Task tool spawns subagent processes that sometimes don't clean up
properly after completion. These accumulate and consume significant memory
(observed: 17 processes using ~6GB RAM).
This change adds automatic cleanup in two places:
1. **Deacon patrol** (primary): New patrol step "orphan-process-cleanup" runs
`gt deacon cleanup-orphans` early in each cycle. More responsive (~30s).
2. **Daemon heartbeat** (fallback): Runs cleanup every 3 minutes as safety net
when deacon is down.
Detection uses TTY column - processes with TTY "?" have no controlling terminal.
This is safe because:
- Processes in terminals (user sessions) have a TTY like "pts/0" - untouched
- Only kills processes with no controlling terminal
- Orphaned subagents are children of tmux server with no TTY
New files:
- internal/util/orphan.go: FindOrphanedClaudeProcesses, CleanupOrphanedClaudeProcesses
- internal/util/orphan_test.go: Tests for orphan detection
New command:
- `gt deacon cleanup-orphans`: Manual/patrol-triggered cleanup
Fixes#587
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(orphan): add Windows build tag and minimum age check
Addresses review feedback on PR #588:
1. Add //go:build !windows to orphan.go and orphan_test.go
- The code uses Unix-specific syscalls (SIGTERM, ESRCH) and
ps command options that don't exist on Windows
2. Add minimum age check (60 seconds) to prevent false positives
- Prevents race conditions with newly spawned subagents
- Addresses reviewer concern about cron/systemd processes
- Uses portable etime format instead of Linux-only etimes
3. Add parseEtime helper with comprehensive tests
- Parses [[DD-]HH:]MM:SS format (works on both Linux and macOS)
- etimes (seconds) is Linux-specific, etime is portable
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(orphan): add proper SIGTERM→SIGKILL escalation with state tracking
Previous approach used process age which doesn't work: a Task subagent
runs without TTY from birth, so a long-running legitimate subagent that
later fails to exit would be immediately SIGKILLed without trying SIGTERM.
New approach uses a state file to track signal history:
1. First encounter → SIGTERM, record PID + timestamp in state file
2. Next cycle (after 60s grace period) → if still alive, SIGKILL
3. Next cycle → if survived SIGKILL, log as unkillable and remove
State file: $XDG_RUNTIME_DIR/gastown-orphan-state (or /tmp/)
Format: "<pid> <signal> <unix_timestamp>" per line
The state file is automatically cleaned up:
- Dead processes removed on load
- Unkillable processes removed after logging
Also updates callers to use new CleanupResult type which includes
the signal sent (SIGTERM, SIGKILL, or UNKILLABLE).
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* fix(daemon): prevent runaway refinery session spawning
Fixes#566
The daemon spawned 812 refinery sessions over 4 days because:
1. Zombie detection was too strict - used IsAgentRunning(session, "node")
but Claude reports pane command as version number (e.g., "2.1.7"),
causing healthy sessions to be killed and recreated every heartbeat.
2. daemon.json patrol config was completely ignored - the daemon never
loaded or checked the enabled flags.
Changes:
- refinery/manager.go: Use IsClaudeRunning() instead of IsAgentRunning()
for robust Claude detection (handles "node", "claude", version patterns)
- daemon/types.go: Add PatrolConfig types and LoadPatrolConfig() to read
mayor/daemon.json
- daemon/daemon.go: Load patrol config at startup, check enabled flags
before calling ensureRefineriesRunning/ensureWitnessesRunning, add
diagnostic logging for "already running" cases
Tested: Verified over multiple heartbeats that refinery shows "already
running, skipping spawn" instead of spawning new sessions.
* fix: Add grace period to prevent Deacon restart loop
The daemon had a race condition where:
1. ensureDeaconRunning() starts a new Deacon session
2. checkDeaconHeartbeat() runs in same heartbeat cycle
3. Heartbeat file is stale (from before crash)
4. Session is immediately killed
5. Infinite restart loop every 3 minutes
Fix:
- Track when Deacon was last started (deaconLastStarted field)
- Skip heartbeat check during 5-minute grace period
- Add config support for Deacon (consistency with refinery/witness)
After grace period, normal heartbeat checking resumes. Genuinely
stuck sessions (no heartbeat update after 5+ min) are still detected.
Fixes#589
---------
Co-authored-by: mayor <your-github-email@example.com>
* fix(sling_test): update test for cook dir change
The cook command no longer needs database context and runs from cwd,
not the target rig directory. Update test to match this behavior
change from bd2a5ab5.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(tests): skip tests requiring missing binaries, handle --allow-stale
- Add skipIfAgentBinaryMissing helper to skip tests when codex/gemini
binaries aren't available in the test environment
- Update rig manager test stub to handle --allow-stale flag
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* refactor(config): remove BEADS_DIR from agent environment
Stop exporting BEADS_DIR in AgentEnv - agents should use beads redirect
mechanism instead of relying on environment variable. This prevents
prefix mismatches when agents operate across different beads databases.
Changes:
- Remove BeadsDir field from AgentEnvConfig
- Remove BEADS_DIR from env vars set on agent sessions
- Update doctor env_check to not expect BEADS_DIR
- Update all manager Start() calls to not pass BeadsDir
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(doctor): detect BEADS_DIR in tmux session environment
Add a doctor check that warns when BEADS_DIR is set in any Gas Town
tmux session. BEADS_DIR in the environment overrides prefix-based
routing and breaks multi-rig lookups - agents should use the beads
redirect mechanism instead.
The check:
- Iterates over all Gas Town tmux sessions (gt-* and hq-*)
- Checks if BEADS_DIR is set in the session environment
- Returns a warning with fix hint to restart sessions
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: julianknutsen <julianknutsen@users.noreply.github>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* fix(beads): cache version check and add timeout to prevent cli lag
* fix(mail_queue): add nil check for queue config
Prevents potential nil pointer panic when queue config exists
in map but has nil value. Added || queueCfg == nil check to
the queue lookup condition in runMailClaim function.
Fixes potential panic that could occur if a queue entry exists
in config but with a nil value.
* fix(migrate_agents_test): fix icon expectations to match actual output
The printMigrationResult function uses icons with two leading spaces
(" ✓", " ⊘", " ✗") but the test expected icons without spaces.
This fixes the test expectations to match the actual output format.
* fix(hook): handle error from events.LogFeed
Previously the error from LogFeed was silently ignored with _.
Now we log the error to stderr at warning level but don't fail
the operation since the primary hook action succeeded.
* fix(tmux): security and error handling improvements
- Fix unchecked regexp error in IsClaudeRunning (CVE-like)
- Add input sanitization to SetPaneDiedHook to prevent shell injection
- Add session name validation to SetDynamicStatus
- Sanitize mail from/subject in SendNotificationBanner
- Return error on parse failure in GetEnvironment
- Track skipped lines in ListSessionIDs for debuggability
See: tmux.fix for full analysis
* fix(daemon): improve error handling and security
- Capture stderr in syncWorkspace for better debuggability
- Fail fast on git fetch failures to prevent stale code
- Add logging to previously silent bd list errors
- Change notification state file permissions to 0600
- Improve error messages with actual stderr content
This prevents agents from starting with stale code and provides
better visibility into daemon operations.
* feat(refinery,boot): add --agent flag for model selection (hq-7d5m)
Add --agent flag to gt refinery start/attach/restart and gt boot spawn
commands for consistent model selection across all agent launch points.
Implementation follows the existing pattern from gt deacon start:
- Add StringVar flag for agent alias
- Pass override to Manager/Boot via SetAgentOverride()
- Use BuildAgentStartupCommandWithAgentOverride when override is set
Files affected:
- cmd/gt/refinery.go: add flags to start/attach/restart commands
- internal/refinery/manager.go: add SetAgentOverride and use in Start()
- cmd/gt/boot.go: add flag to spawn command
- internal/boot/boot.go: add SetAgentOverride and use in spawnTmux()
Closes#438
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* refactor(refinery,boot): use parameter-passing pattern for --agent flag
Address PR review feedback:
1. ADD TESTS: Add tests for --agent flag existence following witness_test.go pattern
- internal/cmd/refinery_test.go: tests for start/attach/restart
- internal/cmd/boot_test.go: test for spawn
2. ALIGN PATTERN: Change from setter pattern to parameter-passing pattern
- Manager.Start(foreground, agentOverride) instead of SetAgentOverride + Start
- Boot.Spawn(agentOverride) instead of SetAgentOverride + Spawn
- Matches witness.go style: Start(foreground bool, agentOverride string, ...)
Updated all callers to pass empty string for default agent:
- internal/daemon/daemon.go
- internal/cmd/rig.go
- internal/cmd/start.go
- internal/cmd/up.go
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: furiosa <will@saults.io>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Add ConvoyWatcher that monitors bd activity for issue closes and
triggers convoy completion checks immediately rather than waiting
for patrol.
- Watch bd activity --follow --town --json for status=closed events
- Query SQLite for convoys tracking the closed issue
- Trigger gt convoy check when tracked issue closes
- Convoys close within seconds of last issue closing
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(config): don't export empty GT_ROOT/BEADS_DIR in AgentEnv
Fix polecats not having GT_ROOT environment variable set. The symptom was
polecat sessions showing GT_ROOT="" instead of the expected town root.
Root cause: AgentEnvSimple doesn't set TownRoot, but AgentEnv was always
setting env["GT_ROOT"] = cfg.TownRoot even when empty. This empty value
in export commands would override the tmux session environment.
Changes:
- Only set GT_ROOT and BEADS_DIR in env map if non-empty
- Refactor daemon.go to use AgentEnv with full AgentEnvConfig instead
of AgentEnvSimple + manual additions
- Update test to verify keys are absent rather than empty
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(lint): silence unparam for unused executeExternalActions args
The external action params (beadID, severity, description) are reserved
for future email/SMS/slack implementations but currently unused.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: julianknutsen <julianknutsen@users.noreply.github>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: max <steve.yegge@gmail.com>
The comment incorrectly referred to polecats without hooked work as "idle".
With the self-cleaning model, polecats self-nuke on completion - there are
no idle polecats. A polecat without work is orphaned (needs cleanup).
Closes: gt-0jn0k
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Two fixes in this commit:
1. daemon/lifecycle.go: Fix agent bead ID pattern for GUPP/orphaned work checks
- Wrong: gt-polecat-<rig>-<name> (e.g., gt-polecat-gastown-nux)
- Correct: <prefix>-<rig>-polecat-<name> (e.g., gt-gastown-polecat-nux)
- Use config.GetRigPrefix() instead of hardcoding gt prefix
- Use beads.ParseAgentBeadID() in extractRigFromAgentID
2. beads/beads.go: Fix invalid --add-label flag in bd create calls
- bd create uses --labels, not --add-label
- bd update uses --add-label (unchanged, was correct)
- Fixed Create, CreateWithID, CreateAgentBead, CreateRigBead
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Extends the --agent flag with a more general --env flag that allows
setting arbitrary environment variables when starting a witness.
Precedence (highest to lowest):
1. CLI --env overrides
2. Role bead env_vars
3. config.AgentEnv() defaults
Examples:
gt witness start greenplace --env ANTHROPIC_MODEL=claude-3-haiku
gt witness restart greenplace --env DEBUG=1 --env VERBOSE=true
Co-authored-by: joshuavial <git@codewithjv.com>
Create centralized AgentEnv function as single source of truth for all
agent environment variables. All agents now consistently receive:
- GT_ROLE, BD_ACTOR, GIT_AUTHOR_NAME (role identity)
- GT_ROOT, BEADS_DIR (workspace paths)
- GT_RIG, GT_POLECAT/GT_CREW (rig-specific identity)
- BEADS_AGENT_NAME, BEADS_NO_DAEMON (beads config)
- CLAUDE_CONFIG_DIR (optional account selection)
Remove RoleEnvVars in favor of AgentEnvSimple wrapper.
Remove IncludeBeadsEnv flag - beads env vars always included.
Update all manager and cmd call sites to use AgentEnv.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Previously, polecat startup used hardcoded paths for BEADS_DIR that
didn't follow redirects for repos with tracked beads. This meant
polecats working in worktrees (where .beads/redirect points to the
actual beads location) would use the wrong beads directory.
Fixed locations:
- daemon.go: polecat startup now uses ResolveBeadsDir
- polecat/session_manager.go: session startup now uses ResolveBeadsDir
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Consolidates all role startup code to use the shared RoleEnvVars()
function, ensuring consistent env vars across tmux SetEnvironment
and Claude startup command exports.
Updated:
- Mayor manager
- Deacon startup (daemon.go)
- Witness manager
- Refinery manager
- Polecat startup (daemon.go)
- BuildPolecatStartupCommand, BuildCrewStartupCommand helpers
This ensures all agents receive the same identity env vars regardless
of startup path.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add comprehensive crash logging improvements to help diagnose mass session death events:
- Add TypeSessionDeath and TypeMassDeath event types for feed visibility
- Log pre-death events before killing sessions (who killed, why)
- Add mass death detection in daemon (3+ deaths in 30s triggers alert)
- Add macOS crash report check in gt doctor
- Support session death events in townlog and feed curator
Closes hq-kt1o6
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Changes polecat worktree structure from:
polecats/<name>/
to:
polecats/<name>/<rigname>/
This gives Claude Code agents a recognizable directory name (e.g., tidepool/)
in their cwd instead of just the polecat name, preventing confusion about
which repo they are working in.
Key changes:
- Add clonePath() method to manager.go and session_manager.go for the actual
git worktree path, keeping polecatDir() for existence checks
- Update Add(), RepairWorktree(), Remove() to use new structure
- Update daemon lifecycle and restart code for new paths
- Update witness handlers to detect both structures
- Update doctor checks (rig_check, branch_check, config_check,
claude_settings_check) for backward compatibility
- All code includes fallback to old structure for existing polecats
Fixes#283
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds support for alternative AI runtime backends (Codex, OpenCode) alongside
the default Claude backend through a runtime abstraction layer.
- internal/runtime/runtime.go - Runtime-agnostic helper functions
- Extended RuntimeConfig with provider-specific settings
- internal/opencode/ for OpenCode plugin support
- Updated session managers to use runtime abstraction
- Removed unused ensureXxxSession functions
- Fixed daemon.go indentation, updated terminology to runtime
Backward compatible: Claude remains default runtime.
Co-Authored-By: Ben Kraus <ben@cinematicsoftware.com>
Co-Authored-By: Cameron Palmer <cameronmpalmer@users.noreply.github.com>
- Remove unused ensureRefinerySession function from start.go
- Remove unused ensureSession and ensureWitness functions from up.go
- Remove unused ensureWitnessSession function from witness.go
- Remove orphaned imports (runtime, session, constants, config, rig, filepath, time)
- Fix indentation error in daemon.go triggerPendingSpawns comment
These functions were added as part of the Codex/OpenCode runtime support
but were never wired up. The existing managers (refinery.Manager.Start,
witness.Manager.Start) already handle session creation.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add custom types config after bd init in daemon tests
- Replace fixed sleeps with poll-based waiting in tmux tests
- Skip beads integration test for JSONL-only repos
Fixes flaky test failures in parallel execution.
The heartbeat now explicitly calls ensureDeaconRunning() for basic
"is Deacon alive" checks, while Boot handles intelligent triage
(stuck/nudge/interrupt decisions).
Changed ensureDeaconRunning to use deacon.Manager.Start() instead of
duplicating startup logic. This gives daemon the same benefits:
- WaitForShellReady (fixes race condition)
- Claude settings setup
- Theming
- StartupNudge and PropulsionNudge (GUPP)
Heartbeat order:
1. ensureDeaconRunning - restart if dead (via Manager)
2. ensureBootRunning - intelligent triage for stuck states
3. checkDeaconHeartbeat - belt-and-suspenders fallback
4-11. Other checks (witnesses, refineries, polecats, etc.)
This was inadvertently removed when Boot was introduced, which
delegated all Deacon checks to Boot. But Boot's mol doesn't actually
restart Deacon - it just reports. Now responsibilities are clear:
daemon ensures alive, Boot ensures responsive.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Logs a warning when checking rig operational state if the wisp
config file doesn't exist. This helps diagnose cases where a
parked rig unexpectedly restarts because its parked state was lost.
Daemon's restartPolecatSession was calling BuildPolecatStartupCommand
with empty rigPath, causing polecats to fall back to town-level defaults
instead of honoring rig-specific agent settings.
Now passes rigPath so rig agent settings are honored.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Refactors all agent startup paths (witness, refinery, crew, polecat) to use
a consistent Manager interface with Start(), Stop(), IsRunning(), and
SessionName() methods.
Includes:
- Witness manager with GUPP propulsion nudge for startup
- Refinery manager for engineer sessions
- Crew manager for worker agents
- Session/polecat manager updates
- claude_settings_check doctor check for settings validation
- Settings management consolidated from rig/manager.go
- Settings location moved outside source repos to prevent conflicts
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Complete the "discover, don't track" refactoring:
- checkGUPPViolations: use tmux.IsClaudeRunning() instead of agent_state
- checkOrphanedWork: derive dead agents from tmux, not agent_state=dead
- assessStaleness: rely on HasActiveSession (tmux), not agent_state
Non-observable states (stuck, awaiting-gate) are still respected.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update daemon to check rig config before auto-starting agents:
- Check wisp config "status" - skip if parked or docked
- Check "auto_restart" config - skip if blocked or false
- Log skip reason for visibility
Affects ensureWitnessRunning, ensureRefineryRunning,
restartPolecatSession, and lifecycle restartSession.
(gt-68c46)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Additional cleanup from the agent_state refactoring:
- Remove dead code: checkStaleAgents(), markAgentDead() in lifecycle.go
- Remove dead code: reportAgentState(), getAgentFields() in prime.go
- Update getAgentBeadState() comment to clarify non-observable states only
- Update mol-witness-patrol.formula.toml to use tmux discovery
- Update mol-polecat-lease.formula.toml to use POLECAT_DONE mail
- Update docs/watchdog-chain.md to reflect new architecture
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The agent_state field was recording observable state like "running",
"dead", "idle" which violated the "Discover, Don't Track" principle.
This caused stale state bugs where agents were marked "dead" in beads
but actually running in tmux.
Changes:
- Remove daemon's checkStaleAgents() which marked agents "dead"
- Simplify ensureXxxRunning() to use tmux.IsClaudeRunning() directly
- Remove reportAgentState() calls from gt prime and gt handoff
- Add SetHookBead/ClearHookBead helpers that don't update agent_state
- Use ClearHookBead in gt done and gt unsling
- Simplify gt status to derive state from tmux, not bead
Non-observable states (stuck, awaiting-gate, muted, paused) are still
set because they represent intentional agent decisions that can't be
discovered from tmux state.
Fixes: gt-zecmc
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When the daemon detects that an agent bead state doesn't match tmux
(e.g., bead says stopped but Claude is running), it now:
1. Logs the divergence clearly with STATE DIVERGENCE prefix
2. Nudges the agent with an actionable command to fix its state
3. Still skips the restart (safety - don't kill healthy sessions)
This prevents silent state drift where bead state diverges from reality.
Applied to: Deacon, Witness, Refinery ensure functions.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Resolved conflict in internal/witness/manager.go:
- Kept session import (used by PR code)
- Kept PR's more accurate comment for PID check
- Removed duplicate sessionName method introduced by merge
* feat: Beads redirect architecture for tracked and local beads
This change implements proper redirect handling so that all rig agents
(Witness, Refinery, Crew, Polecats) can work with both:
- Tracked beads: .beads/ checked into git at mayor/rig/.beads
- Local beads: .beads/ created at rig root during gt rig add
Key changes:
1. SetupRedirect now handles tracked beads by skipping redirect chains.
The bd CLI doesn't support chains (A→B→C), so worktrees redirect
directly to the final destination (mayor/rig/.beads for tracked).
2. ResolveBeadsDir is now used consistently in polecat and refinery
managers instead of hardcoded mayor/rig paths.
3. Rig-level agents (witness, refinery) now use rig beads with rig
prefix instead of town beads. This follows the architecture where
town beads are only for Mayor/Deacon.
4. prime.go simplified to always use ../../.beads for crew redirects,
letting rig-level redirect handle tracked vs local routing.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* feat(doctor): Add beads-redirect check for tracked beads
When a repo has .beads/ tracked in git (at mayor/rig/.beads), the rig root
needs a redirect file pointing to that location. This check:
- Detects missing rig-level redirect for tracked beads
- Verifies redirect points to correct location (mayor/rig/.beads)
- Auto-fixes with 'gt doctor --fix'
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix: Handle fileLock.Unlock error in daemon
Wrap fileLock.Unlock() return value to satisfy errcheck linter.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
- Handle empty repos in ConfigureSparseCheckout (skip read-tree when no HEAD)
- Fix errcheck: wrap fileLock.Unlock() error in defer
- Fix unparam: remove unused *rig.Rig return from getWitnessManager
- Fix unparam: mark unused agentType parameter with blank identifier
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Witness and Refinery startup was duplicated across cmd/witness.go, cmd/up.go,
cmd/rig.go, and daemon.go. Worse, not all code paths sent the propulsion nudge
(GUPP - Gas Town Universal Propulsion Principle). Now unified in Manager.Start()
which handles everything including nudges.
Changes:
- witness/manager.go: Full rewrite with session creation, env vars, theming,
WaitForClaudeReady, startup nudge, and propulsion nudge (GUPP)
- refinery/manager.go: Add propulsion nudge sequence after Claude startup
- cmd/witness.go: Simplify to just call mgr.Start(), remove ensureWitnessSession
- cmd/rig.go: Use witness.Manager.Start() instead of inline session creation
- cmd/start.go: Use witness.Manager.Start()
- cmd/up.go: Use witness.Manager.Start(), remove ensureWitness(),
add EnsureSettingsForRole in ensureSession()
- daemon.go: Use witness.Manager.Start() and refinery.Manager.Start() for
unified startup with proper nudges
This ensures all agent startup paths (gt witness start, gt rig boot, gt up,
daemon restarts) consistently apply GUPP propulsion nudges.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Per docs/architecture.md, Witness and Refinery are rig-level agents that
should use the rig's configured prefix (e.g., pi- for pixelforge) instead
of hardcoded "gt-".
This extends PR #183's creation fix to also fix all lookup paths:
- internal/rig/manager.go: Create agent beads in rig beads with rig prefix
- internal/daemon/daemon.go: Use rig prefix when looking up agent state
- internal/daemon/lifecycle.go: Use rig prefix for identity-to-bead mapping
- internal/cmd/sling.go: Pass townRoot for prefix lookup
- internal/cmd/unsling.go: Pass townRoot for prefix lookup
- internal/cmd/molecule_status.go: Use rig prefix for agent bead lookups
- internal/cmd/molecule_attach.go: Use rig prefix for agent bead lookups
- internal/config/loader.go: Add GetRigPrefix helper
Without this fix, the daemon would:
- Create pi-gastown-witness but look for gt-gastown-witness
- Report agents as missing/dead when they are running
- Fail to manage agent lifecycle correctly
Based on work by Johann Taberlet in PR #183.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Johann Taberlet <johann.taberlet@gmail.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace Unix-only syscall.Flock with gofrs/flock library for
cross-platform file locking. This enables the daemon to run on
Windows in addition to Unix-like systems.
- Add github.com/gofrs/flock v0.13.0 dependency
- Replace syscall.Flock calls with flock.TryLock/Unlock
- Maintain same non-blocking exclusive lock semantics
(gt-5354h)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When Deacon respawns refinery/witness sessions via LIFECYCLE requests,
the new sessions were starting at the Claude welcome screen without
the propulsion nudge that triggers autonomous execution.
Added StartupNudge and PropulsionNudgeForRole calls to restartSession()
in lifecycle.go, matching the pattern used in ensureRefinerySession()
in start.go. This ensures respawned agents receive the GUPP nudge and
begin autonomous work immediately.
Fixes: gt-01jpg
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Town-level services (Mayor, Deacon) now use hq- prefix instead of gt-:
- hq-mayor (was gt-mayor)
- hq-deacon (was gt-deacon)
This distinguishes town-level sessions from rig-level sessions which
continue to use gt- prefix (gt-gastown-witness, gt-gastown-crew-max, etc).
Changes:
- session.MayorSessionName() returns "hq-mayor"
- session.DeaconSessionName() returns "hq-deacon"
- ParseSessionName() handles both hq- and gt- prefixes
- categorizeSession() handles both prefixes
- categorizeSessions() accepts both prefixes
- Updated all tests and documentation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add --branch flag to `gt rig add` to specify a custom default branch
instead of auto-detecting from remote. This supports repositories that
use non-standard default branches like `develop` or `release`.
Changes:
- Add --branch flag to `gt rig add` command
- Store default_branch in rig config.json
- Propagate default branch to refinery, witness, daemon, and all commands
- Rename ensureMainBranch to ensureDefaultBranch for clarity
- Add Rig.DefaultBranch() method for consistent access
- Update crew/manager.go and swarm/manager.go to use rig config
Based on PR #49 by @kustrun - rebased and extended with additional fixes.
Co-authored-by: kustrun <kustrun@users.noreply.github.com>
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add syscall.Flock() exclusive lock in daemon.Run() to prevent TOCTOU
race condition where concurrent 'gt daemon start' commands could spawn
multiple daemons. Only the first to acquire the lock succeeds; others
exit cleanly. Lock is per-town (in townRoot/daemon/daemon.lock) so
multiple GT instances from different directories work independently.
Also detect race losers in runDaemonStart() by comparing spawned PID
with PID file, reporting 'already running' instead of false success.
Agent directories (witness/, refinery/, mayor/) contained state.json files
with last_active timestamps that were never updated, making them stale and
misleading. This change removes:
- initAgentStates function that created vestigial state.json files
- AgentState type and related Load/Save functions from config package
- MayorStateValidCheck from doctor checks
- requesting_* lifecycle verification (dead code - flags were never set)
- FileStateJSON constant and MayorStatePath function
Kept intact:
- daemon/state.json (actively used for daemon runtime state)
- crew/<name>/state.json (operational CrewWorker metadata)
- Agent state tracking via beads (the ZFC-compliant approach)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>