Replace ProcessExists() checks in witness and refinery managers with
tmux session detection. Agent liveness should be derived from tmux
session state, not PID probing (per ZFC tracking principles).
- Remove util.ProcessExists() from witness/manager.go and refinery/manager.go
- Delete internal/util/process.go and process_test.go (now unused)
- Foreground mode and Stop() now rely solely on tmux HasSession/KillSession
Closes: hq-yxkdr (recentDeaths already removed)
Closes: hq-1sd4o (ProcessExists removed)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Extends the --agent flag with a more general --env flag that allows
setting arbitrary environment variables when starting a witness.
Precedence (highest to lowest):
1. CLI --env overrides
2. Role bead env_vars
3. config.AgentEnv() defaults
Examples:
gt witness start greenplace --env ANTHROPIC_MODEL=claude-3-haiku
gt witness restart greenplace --env DEBUG=1 --env VERBOSE=true
Co-authored-by: joshuavial <git@codewithjv.com>
Create centralized AgentEnv function as single source of truth for all
agent environment variables. All agents now consistently receive:
- GT_ROLE, BD_ACTOR, GIT_AUTHOR_NAME (role identity)
- GT_ROOT, BEADS_DIR (workspace paths)
- GT_RIG, GT_POLECAT/GT_CREW (rig-specific identity)
- BEADS_AGENT_NAME, BEADS_NO_DAEMON (beads config)
- CLAUDE_CONFIG_DIR (optional account selection)
Remove RoleEnvVars in favor of AgentEnvSimple wrapper.
Remove IncludeBeadsEnv flag - beads env vars always included.
Update all manager and cmd call sites to use AgentEnv.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Consolidates all role startup code to use the shared RoleEnvVars()
function, ensuring consistent env vars across tmux SetEnvironment
and Claude startup command exports.
Updated:
- Mayor manager
- Deacon startup (daemon.go)
- Witness manager
- Refinery manager
- Polecat startup (daemon.go)
- BuildPolecatStartupCommand, BuildCrewStartupCommand helpers
This ensures all agents receive the same identity env vars regardless
of startup path.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Agent sessions would fail on startup because send-keys arrived before the
shell was ready, causing 'bad pattern' and 'command not found' errors.
Fix: Create sessions with the command directly using tmux new-session's
command argument. This runs the agent as the pane's initial process,
avoiding shell readiness timing issues entirely.
Updated all agent managers: mayor, deacon, witness, refinery, polecat, crew.
Also fixes pre-existing build error in polecat/manager.go (polecatPath →
clonePath/newClonePath).
Closes#280
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add WaitForShellReady call before SendKeys in all agent managers
(deacon, mayor, witness, refinery). This prevents intermittent
"can't find pane" errors that occur when the tmux session is
created but the shell isn't ready to receive input yet.
The issue manifests under load (e.g., during `gt up` when multiple
agents start in sequence) where the 200ms delay in SendKeysDelayed
isn't sufficient for the pane to be fully initialized.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Witness was calling BuildAgentStartupCommand with empty rigPath,
causing it to fall back to town-level defaults instead of honoring
rig-specific agent settings (like RigSettings.Agent).
Now passes m.rig.Path so rig agent settings are honored, consistent
with how refinery already passes the rig path.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Create mayor.Manager for mayor lifecycle (Start/Stop/IsRunning/Status)
- Create deacon.Manager for deacon lifecycle with respawn loop
- Move session.Manager to polecat.SessionManager (clearer naming)
- Add zombie session detection for mayor/deacon (kills tmux if Claude dead)
- Remove duplicate session startup code from up.go, start.go, mayor.go
- Rename sessMgr -> polecatMgr for consistency
- Make witness/refinery SessionName() public for status display
All agent types now follow the same Manager pattern:
mgr := agent.NewManager(...)
mgr.Start(...)
mgr.Stop()
mgr.IsRunning()
mgr.Status()
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Refactors all agent startup paths (witness, refinery, crew, polecat) to use
a consistent Manager interface with Start(), Stop(), IsRunning(), and
SessionName() methods.
Includes:
- Witness manager with GUPP propulsion nudge for startup
- Refinery manager for engineer sessions
- Crew manager for worker agents
- Session/polecat manager updates
- claude_settings_check doctor check for settings validation
- Settings management consolidated from rig/manager.go
- Settings location moved outside source repos to prevent conflicts
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Resolved conflict in internal/witness/manager.go:
- Kept session import (used by PR code)
- Kept PR's more accurate comment for PID check
- Removed duplicate sessionName method introduced by merge
The witness manager's Stop() method was only updating runtime JSON state
without killing the tmux session, causing 'gt rig shutdown' to leave
witness sessions running.
Added sessionName() method and tmux kill-session logic to match the
refinery's existing implementation.
Fixes: bd-gxaf
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Witness and Refinery startup was duplicated across cmd/witness.go, cmd/up.go,
cmd/rig.go, and daemon.go. Worse, not all code paths sent the propulsion nudge
(GUPP - Gas Town Universal Propulsion Principle). Now unified in Manager.Start()
which handles everything including nudges.
Changes:
- witness/manager.go: Full rewrite with session creation, env vars, theming,
WaitForClaudeReady, startup nudge, and propulsion nudge (GUPP)
- refinery/manager.go: Add propulsion nudge sequence after Claude startup
- cmd/witness.go: Simplify to just call mgr.Start(), remove ensureWitnessSession
- cmd/rig.go: Use witness.Manager.Start() instead of inline session creation
- cmd/start.go: Use witness.Manager.Start()
- cmd/up.go: Use witness.Manager.Start(), remove ensureWitness(),
add EnsureSettingsForRole in ensureSession()
- daemon.go: Use witness.Manager.Start() and refinery.Manager.Start() for
unified startup with proper nudges
This ensures all agent startup paths (gt witness start, gt rig boot, gt up,
daemon restarts) consistently apply GUPP propulsion nudges.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Create internal/agent package with shared State type and StateManager
- StateManager uses Go generics for type-safe load/save operations
- Update witness and refinery to use shared State type alias
- Replace loadState/saveState implementations with StateManager delegation
- Maintains backwards compatibility through re-exported constants
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Move duplicated processExists function to shared util package:
- Create internal/util/process.go with ProcessExists function
- Add internal/util/process_test.go with basic tests
- Update witness/manager.go to use util.ProcessExists
- Update refinery/manager.go to use util.ProcessExists
- Remove local processExists functions from both files
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Prevents data loss from concurrent/interrupted state file writes by using
atomic write pattern (write to .tmp, then rename).
Changes:
- Add internal/util package with AtomicWriteJSON/AtomicWriteFile helpers
- Update witness/manager.go saveState to use atomic writes
- Update refinery/manager.go saveState to use atomic writes
- Update crew/manager.go saveState to use atomic writes
- Update daemon/types.go SaveState to use atomic writes
- Update polecat/namepool.go Save to use atomic writes
- Add comprehensive tests for atomic write utilities
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
All 156 instances of _ = error suppression in non-test code now have
explanatory comments documenting why the error is intentionally ignored.
Categories of intentional suppressions:
- non-fatal: session works without these - tmux environment setup
- non-fatal: theming failure does not affect operation - visual styling
- best-effort cleanup - defer cleanup on failure paths
- best-effort notification - mail/notifications that should not block
- best-effort interrupt - graceful shutdown attempts
- crypto/rand.Read only fails on broken system - random ID generation
- output errors non-actionable - fmt.Fprint to io.Writer
This addresses the silent failure and debugging concerns raised in the
issue by making the intentionality explicit in the code.
Generated with Claude Code https://claude.com/claude-code
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove embedded molecule management from witness - this logic belongs
in molecule definitions (YAML + hooks), not Go code.
Removed:
- WitnessHandoffState, WorkerState types
- Handoff bead management (load/save/ensure)
- Patrol instance creation (ensurePatrolInstance)
- Polecat arm tracking (ensurePolecatArm, closePolecatArm)
- updateWorkerActivity function
Simplified:
- Nudge tracking now uses only SpawnedIssues (in-memory)
- run() no longer loads handoff state or creates patrol instances
- healthCheck() no longer manages tracking arms
Fixed:
- escalateToMayor: bd mail send → gt mail send
- ackMessage: bd mail ack → gt mail archive
The witness now does its core job (health checks, nudges, escalation,
cleanup) without trying to manage molecule state. Molecule tracking
should be handled by the molecule system itself via bd mol commands.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add arm closing to witness cleanup:
- Add closePolecatArm(name, reason) method that:
- Looks up arm ID in handoff state
- Closes the arm issue with the given reason
- Clears arm ID from handoff state
- Call closePolecatArm() at end of cleanupPolecat()
This completes the arm lifecycle: created when polecat discovered,
closed when polecat cleaned up. The Christmas Ornament pattern now
tracks the full polecat lifecycle.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add arm bonding to witness health check:
- Add ensurePolecatArm() method that:
- Checks if arm already exists for polecat (ArmID in WorkerState)
- If not, calls `gt mol bond mol-polecat-arm` with template vars
- Parses arm ID from output and stores in handoff state
- Call ensurePolecatArm() in healthCheck() for each running polecat
This creates the Christmas Ornament pattern where mol-witness-patrol
has dynamically bonded mol-polecat-arm children, one per active polecat.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add patrol instance tracking to witness manager:
- Add PatrolInstanceID to WitnessHandoffState for tracking
- Add ArmID to WorkerState for per-polecat arm tracking
- Add ensurePatrolInstance() method that:
- Checks if patrol instance already exists (from previous session)
- Creates new patrol instance if needed
- Persists instance ID in handoff state
This enables the molecule-based tracking ledger pattern where the
Go-based witness creates beads issues to track its patrol state.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add BranchPushedToRemote() to git package that properly handles
polecat branches without upstream tracking. Update verifyPolecatState
in witness to check that branches are pushed before allowing cleanup.
This prevents the scenario where polecats close issues without pushing
their work, causing all commits to be lost when the worktree is deleted.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add handling for POLECAT_DONE messages in processShutdownRequests()
- Track which polecats have signaled done (using SpawnedIssues with "done:" prefix)
- For LIFECYCLE:shutdown requests, wait for POLECAT_DONE before cleanup
- Add checkPendingCompletions() to nudge polecats with closed issues
- Add 10-minute timeout with force-kill after waiting for POLECAT_DONE
- Protects against losing MR submissions when Witness cleans up too early
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add persistent state storage for Witness across wisp burns:
- Add WorkerState and WitnessHandoffState types
- Implement loadHandoffState/saveHandoffState for bead persistence
- Update getNudgeCount/recordNudge to use persistent state
- Add activity tracking integration into healthCheck
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- .beads-ephemeral/ -> .beads-wisp/
- Rename doctor checks: EphemeralCheck -> WispCheck
- Update all docs to use 'transient' for polecats, 'wisp' for molecules
- Preserve 'ephemeral' only as descriptive adjective for wisps
- Steam engine metaphor: wisps are steam vapors that dissipate
Part of Christmas launch wisp terminology unification.
Add isChildOfEpic() function that checks if an issue is a child of
the configured epic by verifying the issue has the epic in its
dependents with dependency_type="blocks".
When EpicID is configured in witness config, only issues that block
that epic will be considered for auto-spawning. Issues that cannot
be verified are safely skipped.
Closes gt-zhm5.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- docs/architecture.md: update mail routing explanation
- internal/witness/manager.go: fix actual bd mail calls to gt mail
- internal/cmd/mail.go: remove stale compatibility note
- internal/daemon/lifecycle.go: update comment
Add comprehensive uncommitted work checks before any polecat cleanup:
- Check for uncommitted changes (modified/untracked files)
- Check for stashes
- Check for unpushed commits
Affected commands:
- gt polecat remove: now refuses if uncommitted work exists
- gt rig shutdown: checks all polecats before shutdown
- Witness cleanup: refuses to clean polecats with uncommitted work
- gt spawn: warns if spawning to polecat with uncommitted work
Safety model:
- --force: bypasses uncommitted changes check only
- --nuclear: bypasses ALL safety checks (will lose work)
New git helpers:
- StashCount(): count stashes in repo
- UnpushedCommits(): count commits not pushed to upstream
- CheckUncommittedWork(): comprehensive work status check
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements the core Witness functionality:
- gt witness start: Creates tmux session with Claude, theming, auto-priming
- gt witness stop: Kills tmux session and updates state
- gt witness status: Shows session state reconciled with tmux
- Shutdown handler: Verifies git clean state before cleanup, sends nudges
- Auto-spawn: Spawns polecats for ready work up to configurable capacity
- Health checks: Monitors polecat activity, nudges stuck workers, escalates
Also updates handoff to include polecat name in lifecycle requests.
Closes: gt-53w6, gt-mxyj, gt-5wtw, gt-cpm2, gt-es1i
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fix ~50 errcheck warnings across the codebase:
- Add explicit `_ =` for intentionally ignored error returns (cleanup,
best-effort operations, etc.)
- Use `defer func() { _ = ... }()` pattern for defer statements
- Handle tmux SetEnvironment, KillSession, SendKeysRaw returns
- Handle mail router.Send returns
- Handle os.RemoveAll, os.Rename in cleanup paths
- Handle rand.Read returns for ID generation
- Handle fmt.Fprint* returns when writing to io.Writer
- Fix for-select with single case to use for-range
- Handle cobra MarkFlagRequired returns
All tests pass. Code compiles without errors.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This implements the ephemeral polecat model where polecats are spawned
fresh for each task and deleted upon completion.
Key changes:
**Spawn (internal/cmd/spawn.go):**
- Always create fresh worktree from main branch
- Run bd init in new worktree to initialize beads
- Remove --create flag (now implicit)
- Replace stale polecats with fresh worktrees
**Handoff (internal/cmd/handoff.go):**
- Add rig/polecat detection from environment and tmux session
- Send shutdown requests to correct witness (rig/witness)
- Include polecat name in lifecycle request body
**Witness (internal/witness/manager.go):**
- Add mail checking in monitoring loop
- Process LIFECYCLE shutdown requests
- Implement full cleanup sequence:
- Kill tmux session
- Remove git worktree
- Delete polecat branch
**Polecat state machine (internal/polecat/types.go):**
- Primary states: working, done, stuck
- Deprecate idle/active (kept for backward compatibility)
- New polecats start in working state
- ClearIssue transitions to done (not idle)
**Polecat commands (internal/cmd/polecat.go):**
- Update list to show "Active Polecats"
- Normalize legacy states for display
- Add deprecation warnings to wake/sleep commands
Closes gt-7ik
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add the witness monitoring agent command with start, stop, and status
subcommands. The witness monitors polecats for stuck/idle states and
can nudge blocked workers.
Closes gt-kcee
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>