All agents now receive their startup beacon + role-specific instructions via
the CLI prompt, making sessions identifiable in /resume picker while removing
unreliable post-startup nudges.
Changes:
- Rename FormatStartupNudge → FormatStartupBeacon, StartupNudgeConfig → BeaconConfig
- Remove StartupNudge() function (no longer needed)
- Remove PropulsionNudge() and PropulsionNudgeForRole() functions
- Update deacon, witness, refinery, polecat managers to include beacon in CLI prompt
- Update boot to inline beacon (can't import session due to import cycle)
- Update daemon/lifecycle.go to include beacon via BuildCommandWithPrompt
- Update cmd/deacon.go to include beacon in startup command
- Remove redundant StartupNudge and PropulsionNudge calls from all startup paths
The beacon is now part of the CLI prompt which is queued before Claude starts,
making it more reliable than post-startup nudges which had timing issues.
SessionStart hook runs gt prime automatically, so PropulsionNudge was redundant.
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
- Remove unused workDir field from witness manager
- Use witMgr.IsRunning() consistently instead of direct tmux call
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove state files from witness and refinery managers, following
the "Discover, Don't Track" principle. Tmux session existence is
now the source of truth for running state (like deacon).
Changes:
- Add IsRunning() that checks tmux HasSession
- Change Status() to return *tmux.SessionInfo
- Remove loadState/saveState/stateManager
- Simplify Start()/Stop() to not use state files
- Update CLI commands (witness/refinery/rig) for new API
- Update tests to be ZFC-compliant
This fixes state file divergence issues where witness/refinery
could show "running" when the actual tmux session was dead.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add initial prompt to deacon and witness startup commands to trigger
autonomous patrol. Without this, agents sit idle at the prompt after
SessionStart hooks run.
Implements GUPP (Gas Town Universal Propulsion Principle): agents start
patrol immediately when Claude launches.
Fixes#525: gt up reports deacon success but session doesn't actually start
Previously, WaitForCommand failures were marked as "non-fatal" in the
manager Start() methods used by gt up. This caused gt up to report
success even when Claude failed to start, because the error was silently
ignored.
Now when WaitForCommand or WaitForRuntimeReady times out:
1. The zombie tmux session is killed
2. An error is returned to the caller
3. gt up properly reports the failure
This aligns the manager Start() behavior with the cmd start functions
(e.g., gt deacon start) which already had fatal WaitForCommand behavior.
Changed files:
- internal/deacon/manager.go
- internal/mayor/manager.go
- internal/witness/manager.go
- internal/refinery/manager.go
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* fix(sling_test): update test for cook dir change
The cook command no longer needs database context and runs from cwd,
not the target rig directory. Update test to match this behavior
change from bd2a5ab5.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(tests): skip tests requiring missing binaries, handle --allow-stale
- Add skipIfAgentBinaryMissing helper to skip tests when codex/gemini
binaries aren't available in the test environment
- Update rig manager test stub to handle --allow-stale flag
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* refactor(config): remove BEADS_DIR from agent environment
Stop exporting BEADS_DIR in AgentEnv - agents should use beads redirect
mechanism instead of relying on environment variable. This prevents
prefix mismatches when agents operate across different beads databases.
Changes:
- Remove BeadsDir field from AgentEnvConfig
- Remove BEADS_DIR from env vars set on agent sessions
- Update doctor env_check to not expect BEADS_DIR
- Update all manager Start() calls to not pass BeadsDir
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(doctor): detect BEADS_DIR in tmux session environment
Add a doctor check that warns when BEADS_DIR is set in any Gas Town
tmux session. BEADS_DIR in the environment overrides prefix-based
routing and breaks multi-rig lookups - agents should use the beads
redirect mechanism instead.
The check:
- Iterates over all Gas Town tmux sessions (gt-* and hq-*)
- Checks if BEADS_DIR is set in the session environment
- Returns a warning with fix hint to restart sessions
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: julianknutsen <julianknutsen@users.noreply.github>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Migrate witness, boot, and deacon spawns to use NewSessionWithCommand
instead of NewSession+SendKeys to ensure BD_ACTOR is visible in the
process tree for orphan detection via ps.
Refs: gt-emi5b
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The witness manager was using rig-level beads path to look up role
configuration, but role beads use the hq- prefix and live in town-level
beads. This caused "unexpected end of JSON input" errors when starting
witnesses because the rig database (with gt- prefix) couldn't find
hq-witness-role.
Changed roleConfig() to use townRoot instead of rig.BeadsPath() to
correctly resolve town-level role beads.
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Replace ProcessExists() checks in witness and refinery managers with
tmux session detection. Agent liveness should be derived from tmux
session state, not PID probing (per ZFC tracking principles).
- Remove util.ProcessExists() from witness/manager.go and refinery/manager.go
- Delete internal/util/process.go and process_test.go (now unused)
- Foreground mode and Stop() now rely solely on tmux HasSession/KillSession
Closes: hq-yxkdr (recentDeaths already removed)
Closes: hq-1sd4o (ProcessExists removed)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Extends the --agent flag with a more general --env flag that allows
setting arbitrary environment variables when starting a witness.
Precedence (highest to lowest):
1. CLI --env overrides
2. Role bead env_vars
3. config.AgentEnv() defaults
Examples:
gt witness start greenplace --env ANTHROPIC_MODEL=claude-3-haiku
gt witness restart greenplace --env DEBUG=1 --env VERBOSE=true
Co-authored-by: joshuavial <git@codewithjv.com>
Create centralized AgentEnv function as single source of truth for all
agent environment variables. All agents now consistently receive:
- GT_ROLE, BD_ACTOR, GIT_AUTHOR_NAME (role identity)
- GT_ROOT, BEADS_DIR (workspace paths)
- GT_RIG, GT_POLECAT/GT_CREW (rig-specific identity)
- BEADS_AGENT_NAME, BEADS_NO_DAEMON (beads config)
- CLAUDE_CONFIG_DIR (optional account selection)
Remove RoleEnvVars in favor of AgentEnvSimple wrapper.
Remove IncludeBeadsEnv flag - beads env vars always included.
Update all manager and cmd call sites to use AgentEnv.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Consolidates all role startup code to use the shared RoleEnvVars()
function, ensuring consistent env vars across tmux SetEnvironment
and Claude startup command exports.
Updated:
- Mayor manager
- Deacon startup (daemon.go)
- Witness manager
- Refinery manager
- Polecat startup (daemon.go)
- BuildPolecatStartupCommand, BuildCrewStartupCommand helpers
This ensures all agents receive the same identity env vars regardless
of startup path.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Agent sessions would fail on startup because send-keys arrived before the
shell was ready, causing 'bad pattern' and 'command not found' errors.
Fix: Create sessions with the command directly using tmux new-session's
command argument. This runs the agent as the pane's initial process,
avoiding shell readiness timing issues entirely.
Updated all agent managers: mayor, deacon, witness, refinery, polecat, crew.
Also fixes pre-existing build error in polecat/manager.go (polecatPath →
clonePath/newClonePath).
Closes#280
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add WaitForShellReady call before SendKeys in all agent managers
(deacon, mayor, witness, refinery). This prevents intermittent
"can't find pane" errors that occur when the tmux session is
created but the shell isn't ready to receive input yet.
The issue manifests under load (e.g., during `gt up` when multiple
agents start in sequence) where the 200ms delay in SendKeysDelayed
isn't sufficient for the pane to be fully initialized.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Witness was calling BuildAgentStartupCommand with empty rigPath,
causing it to fall back to town-level defaults instead of honoring
rig-specific agent settings (like RigSettings.Agent).
Now passes m.rig.Path so rig agent settings are honored, consistent
with how refinery already passes the rig path.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Create mayor.Manager for mayor lifecycle (Start/Stop/IsRunning/Status)
- Create deacon.Manager for deacon lifecycle with respawn loop
- Move session.Manager to polecat.SessionManager (clearer naming)
- Add zombie session detection for mayor/deacon (kills tmux if Claude dead)
- Remove duplicate session startup code from up.go, start.go, mayor.go
- Rename sessMgr -> polecatMgr for consistency
- Make witness/refinery SessionName() public for status display
All agent types now follow the same Manager pattern:
mgr := agent.NewManager(...)
mgr.Start(...)
mgr.Stop()
mgr.IsRunning()
mgr.Status()
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Refactors all agent startup paths (witness, refinery, crew, polecat) to use
a consistent Manager interface with Start(), Stop(), IsRunning(), and
SessionName() methods.
Includes:
- Witness manager with GUPP propulsion nudge for startup
- Refinery manager for engineer sessions
- Crew manager for worker agents
- Session/polecat manager updates
- claude_settings_check doctor check for settings validation
- Settings management consolidated from rig/manager.go
- Settings location moved outside source repos to prevent conflicts
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Resolved conflict in internal/witness/manager.go:
- Kept session import (used by PR code)
- Kept PR's more accurate comment for PID check
- Removed duplicate sessionName method introduced by merge
The witness manager's Stop() method was only updating runtime JSON state
without killing the tmux session, causing 'gt rig shutdown' to leave
witness sessions running.
Added sessionName() method and tmux kill-session logic to match the
refinery's existing implementation.
Fixes: bd-gxaf
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Witness and Refinery startup was duplicated across cmd/witness.go, cmd/up.go,
cmd/rig.go, and daemon.go. Worse, not all code paths sent the propulsion nudge
(GUPP - Gas Town Universal Propulsion Principle). Now unified in Manager.Start()
which handles everything including nudges.
Changes:
- witness/manager.go: Full rewrite with session creation, env vars, theming,
WaitForClaudeReady, startup nudge, and propulsion nudge (GUPP)
- refinery/manager.go: Add propulsion nudge sequence after Claude startup
- cmd/witness.go: Simplify to just call mgr.Start(), remove ensureWitnessSession
- cmd/rig.go: Use witness.Manager.Start() instead of inline session creation
- cmd/start.go: Use witness.Manager.Start()
- cmd/up.go: Use witness.Manager.Start(), remove ensureWitness(),
add EnsureSettingsForRole in ensureSession()
- daemon.go: Use witness.Manager.Start() and refinery.Manager.Start() for
unified startup with proper nudges
This ensures all agent startup paths (gt witness start, gt rig boot, gt up,
daemon restarts) consistently apply GUPP propulsion nudges.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Create internal/agent package with shared State type and StateManager
- StateManager uses Go generics for type-safe load/save operations
- Update witness and refinery to use shared State type alias
- Replace loadState/saveState implementations with StateManager delegation
- Maintains backwards compatibility through re-exported constants
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Move duplicated processExists function to shared util package:
- Create internal/util/process.go with ProcessExists function
- Add internal/util/process_test.go with basic tests
- Update witness/manager.go to use util.ProcessExists
- Update refinery/manager.go to use util.ProcessExists
- Remove local processExists functions from both files
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Prevents data loss from concurrent/interrupted state file writes by using
atomic write pattern (write to .tmp, then rename).
Changes:
- Add internal/util package with AtomicWriteJSON/AtomicWriteFile helpers
- Update witness/manager.go saveState to use atomic writes
- Update refinery/manager.go saveState to use atomic writes
- Update crew/manager.go saveState to use atomic writes
- Update daemon/types.go SaveState to use atomic writes
- Update polecat/namepool.go Save to use atomic writes
- Add comprehensive tests for atomic write utilities
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
All 156 instances of _ = error suppression in non-test code now have
explanatory comments documenting why the error is intentionally ignored.
Categories of intentional suppressions:
- non-fatal: session works without these - tmux environment setup
- non-fatal: theming failure does not affect operation - visual styling
- best-effort cleanup - defer cleanup on failure paths
- best-effort notification - mail/notifications that should not block
- best-effort interrupt - graceful shutdown attempts
- crypto/rand.Read only fails on broken system - random ID generation
- output errors non-actionable - fmt.Fprint to io.Writer
This addresses the silent failure and debugging concerns raised in the
issue by making the intentionality explicit in the code.
Generated with Claude Code https://claude.com/claude-code
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove embedded molecule management from witness - this logic belongs
in molecule definitions (YAML + hooks), not Go code.
Removed:
- WitnessHandoffState, WorkerState types
- Handoff bead management (load/save/ensure)
- Patrol instance creation (ensurePatrolInstance)
- Polecat arm tracking (ensurePolecatArm, closePolecatArm)
- updateWorkerActivity function
Simplified:
- Nudge tracking now uses only SpawnedIssues (in-memory)
- run() no longer loads handoff state or creates patrol instances
- healthCheck() no longer manages tracking arms
Fixed:
- escalateToMayor: bd mail send → gt mail send
- ackMessage: bd mail ack → gt mail archive
The witness now does its core job (health checks, nudges, escalation,
cleanup) without trying to manage molecule state. Molecule tracking
should be handled by the molecule system itself via bd mol commands.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add arm closing to witness cleanup:
- Add closePolecatArm(name, reason) method that:
- Looks up arm ID in handoff state
- Closes the arm issue with the given reason
- Clears arm ID from handoff state
- Call closePolecatArm() at end of cleanupPolecat()
This completes the arm lifecycle: created when polecat discovered,
closed when polecat cleaned up. The Christmas Ornament pattern now
tracks the full polecat lifecycle.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add arm bonding to witness health check:
- Add ensurePolecatArm() method that:
- Checks if arm already exists for polecat (ArmID in WorkerState)
- If not, calls `gt mol bond mol-polecat-arm` with template vars
- Parses arm ID from output and stores in handoff state
- Call ensurePolecatArm() in healthCheck() for each running polecat
This creates the Christmas Ornament pattern where mol-witness-patrol
has dynamically bonded mol-polecat-arm children, one per active polecat.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add patrol instance tracking to witness manager:
- Add PatrolInstanceID to WitnessHandoffState for tracking
- Add ArmID to WorkerState for per-polecat arm tracking
- Add ensurePatrolInstance() method that:
- Checks if patrol instance already exists (from previous session)
- Creates new patrol instance if needed
- Persists instance ID in handoff state
This enables the molecule-based tracking ledger pattern where the
Go-based witness creates beads issues to track its patrol state.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add BranchPushedToRemote() to git package that properly handles
polecat branches without upstream tracking. Update verifyPolecatState
in witness to check that branches are pushed before allowing cleanup.
This prevents the scenario where polecats close issues without pushing
their work, causing all commits to be lost when the worktree is deleted.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add handling for POLECAT_DONE messages in processShutdownRequests()
- Track which polecats have signaled done (using SpawnedIssues with "done:" prefix)
- For LIFECYCLE:shutdown requests, wait for POLECAT_DONE before cleanup
- Add checkPendingCompletions() to nudge polecats with closed issues
- Add 10-minute timeout with force-kill after waiting for POLECAT_DONE
- Protects against losing MR submissions when Witness cleans up too early
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add persistent state storage for Witness across wisp burns:
- Add WorkerState and WitnessHandoffState types
- Implement loadHandoffState/saveHandoffState for bead persistence
- Update getNudgeCount/recordNudge to use persistent state
- Add activity tracking integration into healthCheck
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- .beads-ephemeral/ -> .beads-wisp/
- Rename doctor checks: EphemeralCheck -> WispCheck
- Update all docs to use 'transient' for polecats, 'wisp' for molecules
- Preserve 'ephemeral' only as descriptive adjective for wisps
- Steam engine metaphor: wisps are steam vapors that dissipate
Part of Christmas launch wisp terminology unification.
Add isChildOfEpic() function that checks if an issue is a child of
the configured epic by verifying the issue has the epic in its
dependents with dependency_type="blocks".
When EpicID is configured in witness config, only issues that block
that epic will be considered for auto-spawning. Issues that cannot
be verified are safely skipped.
Closes gt-zhm5.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- docs/architecture.md: update mail routing explanation
- internal/witness/manager.go: fix actual bd mail calls to gt mail
- internal/cmd/mail.go: remove stale compatibility note
- internal/daemon/lifecycle.go: update comment
Add comprehensive uncommitted work checks before any polecat cleanup:
- Check for uncommitted changes (modified/untracked files)
- Check for stashes
- Check for unpushed commits
Affected commands:
- gt polecat remove: now refuses if uncommitted work exists
- gt rig shutdown: checks all polecats before shutdown
- Witness cleanup: refuses to clean polecats with uncommitted work
- gt spawn: warns if spawning to polecat with uncommitted work
Safety model:
- --force: bypasses uncommitted changes check only
- --nuclear: bypasses ALL safety checks (will lose work)
New git helpers:
- StashCount(): count stashes in repo
- UnpushedCommits(): count commits not pushed to upstream
- CheckUncommittedWork(): comprehensive work status check
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements the core Witness functionality:
- gt witness start: Creates tmux session with Claude, theming, auto-priming
- gt witness stop: Kills tmux session and updates state
- gt witness status: Shows session state reconciled with tmux
- Shutdown handler: Verifies git clean state before cleanup, sends nudges
- Auto-spawn: Spawns polecats for ready work up to configurable capacity
- Health checks: Monitors polecat activity, nudges stuck workers, escalates
Also updates handoff to include polecat name in lifecycle requests.
Closes: gt-53w6, gt-mxyj, gt-5wtw, gt-cpm2, gt-es1i
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fix ~50 errcheck warnings across the codebase:
- Add explicit `_ =` for intentionally ignored error returns (cleanup,
best-effort operations, etc.)
- Use `defer func() { _ = ... }()` pattern for defer statements
- Handle tmux SetEnvironment, KillSession, SendKeysRaw returns
- Handle mail router.Send returns
- Handle os.RemoveAll, os.Rename in cleanup paths
- Handle rand.Read returns for ID generation
- Handle fmt.Fprint* returns when writing to io.Writer
- Fix for-select with single case to use for-range
- Handle cobra MarkFlagRequired returns
All tests pass. Code compiles without errors.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This implements the ephemeral polecat model where polecats are spawned
fresh for each task and deleted upon completion.
Key changes:
**Spawn (internal/cmd/spawn.go):**
- Always create fresh worktree from main branch
- Run bd init in new worktree to initialize beads
- Remove --create flag (now implicit)
- Replace stale polecats with fresh worktrees
**Handoff (internal/cmd/handoff.go):**
- Add rig/polecat detection from environment and tmux session
- Send shutdown requests to correct witness (rig/witness)
- Include polecat name in lifecycle request body
**Witness (internal/witness/manager.go):**
- Add mail checking in monitoring loop
- Process LIFECYCLE shutdown requests
- Implement full cleanup sequence:
- Kill tmux session
- Remove git worktree
- Delete polecat branch
**Polecat state machine (internal/polecat/types.go):**
- Primary states: working, done, stuck
- Deprecate idle/active (kept for backward compatibility)
- New polecats start in working state
- ClearIssue transitions to done (not idle)
**Polecat commands (internal/cmd/polecat.go):**
- Update list to show "Active Polecats"
- Normalize legacy states for display
- Add deprecation warnings to wake/sleep commands
Closes gt-7ik
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add the witness monitoring agent command with start, stop, and status
subcommands. The witness monitors polecats for stuck/idle states and
can nudge blocked workers.
Closes gt-kcee
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>