gastown

Author	SHA1	Message	Date
aleiby	22064b0730	feat: Add automatic orphaned claude process cleanup (#588) * feat: Add automatic orphaned claude process cleanup Claude Code's Task tool spawns subagent processes that sometimes don't clean up properly after completion. These accumulate and consume significant memory (observed: 17 processes using ~6GB RAM). This change adds automatic cleanup in two places: 1. Deacon patrol (primary): New patrol step "orphan-process-cleanup" runs `gt deacon cleanup-orphans` early in each cycle. More responsive (~30s). 2. Daemon heartbeat (fallback): Runs cleanup every 3 minutes as safety net when deacon is down. Detection uses TTY column - processes with TTY "?" have no controlling terminal. This is safe because: - Processes in terminals (user sessions) have a TTY like "pts/0" - untouched - Only kills processes with no controlling terminal - Orphaned subagents are children of tmux server with no TTY New files: - internal/util/orphan.go: FindOrphanedClaudeProcesses, CleanupOrphanedClaudeProcesses - internal/util/orphan_test.go: Tests for orphan detection New command: - `gt deacon cleanup-orphans`: Manual/patrol-triggered cleanup Fixes #587 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix(orphan): add Windows build tag and minimum age check Addresses review feedback on PR #588: 1. Add //go:build !windows to orphan.go and orphan_test.go - The code uses Unix-specific syscalls (SIGTERM, ESRCH) and ps command options that don't exist on Windows 2. Add minimum age check (60 seconds) to prevent false positives - Prevents race conditions with newly spawned subagents - Addresses reviewer concern about cron/systemd processes - Uses portable etime format instead of Linux-only etimes 3. Add parseEtime helper with comprehensive tests - Parses [[DD-]HH:]MM:SS format (works on both Linux and macOS) - etimes (seconds) is Linux-specific, etime is portable Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix(orphan): add proper SIGTERM→SIGKILL escalation with state tracking Previous approach used process age which doesn't work: a Task subagent runs without TTY from birth, so a long-running legitimate subagent that later fails to exit would be immediately SIGKILLed without trying SIGTERM. New approach uses a state file to track signal history: 1. First encounter → SIGTERM, record PID + timestamp in state file 2. Next cycle (after 60s grace period) → if still alive, SIGKILL 3. Next cycle → if survived SIGKILL, log as unkillable and remove State file: $XDG_RUNTIME_DIR/gastown-orphan-state (or /tmp/) Format: "<pid> <signal> <unix_timestamp>" per line The state file is automatically cleaned up: - Dead processes removed on load - Unkillable processes removed after logging Also updates callers to use new CleanupResult type which includes the signal sent (SIGTERM, SIGKILL, or UNKILLABLE). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-16 15:35:48 -08:00
Walter McGivney	29f8dd67e2	fix: Add grace period to prevent Deacon restart loop (#590) * fix(daemon): prevent runaway refinery session spawning Fixes #566 The daemon spawned 812 refinery sessions over 4 days because: 1. Zombie detection was too strict - used IsAgentRunning(session, "node") but Claude reports pane command as version number (e.g., "2.1.7"), causing healthy sessions to be killed and recreated every heartbeat. 2. daemon.json patrol config was completely ignored - the daemon never loaded or checked the enabled flags. Changes: - refinery/manager.go: Use IsClaudeRunning() instead of IsAgentRunning() for robust Claude detection (handles "node", "claude", version patterns) - daemon/types.go: Add PatrolConfig types and LoadPatrolConfig() to read mayor/daemon.json - daemon/daemon.go: Load patrol config at startup, check enabled flags before calling ensureRefineriesRunning/ensureWitnessesRunning, add diagnostic logging for "already running" cases Tested: Verified over multiple heartbeats that refinery shows "already running, skipping spawn" instead of spawning new sessions. * fix: Add grace period to prevent Deacon restart loop The daemon had a race condition where: 1. ensureDeaconRunning() starts a new Deacon session 2. checkDeaconHeartbeat() runs in same heartbeat cycle 3. Heartbeat file is stale (from before crash) 4. Session is immediately killed 5. Infinite restart loop every 3 minutes Fix: - Track when Deacon was last started (deaconLastStarted field) - Skip heartbeat check during 5-minute grace period - Add config support for Deacon (consistency with refinery/witness) After grace period, normal heartbeat checking resumes. Genuinely stuck sessions (no heartbeat update after 5+ min) are still detected. Fixes #589 --------- Co-authored-by: mayor <your-github-email@example.com>	2026-01-16 15:27:41 -08:00
JJ	b1a5241430	fix(beads): align agent bead prefixes and force multi-hyphen IDs (#482) * fix(beads): align agent bead prefixes and force multi-hyphen IDs * fix(checkpoint): treat threshold as stale at boundary	2026-01-16 12:33:51 -08:00
Julian Knutsen	e7ca4908dc	refactor(config): remove BEADS_DIR from agent environment and add doctor check (#455) * fix(sling_test): update test for cook dir change The cook command no longer needs database context and runs from cwd, not the target rig directory. Update test to match this behavior change from `bd2a5ab5`. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix(tests): skip tests requiring missing binaries, handle --allow-stale - Add skipIfAgentBinaryMissing helper to skip tests when codex/gemini binaries aren't available in the test environment - Update rig manager test stub to handle --allow-stale flag Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * refactor(config): remove BEADS_DIR from agent environment Stop exporting BEADS_DIR in AgentEnv - agents should use beads redirect mechanism instead of relying on environment variable. This prevents prefix mismatches when agents operate across different beads databases. Changes: - Remove BeadsDir field from AgentEnvConfig - Remove BEADS_DIR from env vars set on agent sessions - Update doctor env_check to not expect BEADS_DIR - Update all manager Start() calls to not pass BeadsDir Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix(doctor): detect BEADS_DIR in tmux session environment Add a doctor check that warns when BEADS_DIR is set in any Gas Town tmux session. BEADS_DIR in the environment overrides prefix-based routing and breaks multi-rig lookups - agents should use the beads redirect mechanism instead. The check: - Iterates over all Gas Town tmux sessions (gt-* and hq-*) - Checks if BEADS_DIR is set in the session environment - Returns a warning with fix hint to restart sessions Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: julianknutsen <julianknutsen@users.noreply.github> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-13 22:13:57 -08:00
sigfawn	3cf77b2e8b	fix(daemon): improve error handling and security (#445) * fix(beads): cache version check and add timeout to prevent cli lag * fix(mail_queue): add nil check for queue config Prevents potential nil pointer panic when queue config exists in map but has nil value. Added || queueCfg == nil check to the queue lookup condition in runMailClaim function. Fixes potential panic that could occur if a queue entry exists in config but with a nil value. * fix(migrate_agents_test): fix icon expectations to match actual output The printMigrationResult function uses icons with two leading spaces (" ✓", " ⊘", " ✗") but the test expected icons without spaces. This fixes the test expectations to match the actual output format. * fix(hook): handle error from events.LogFeed Previously the error from LogFeed was silently ignored with _. Now we log the error to stderr at warning level but don't fail the operation since the primary hook action succeeded. * fix(tmux): security and error handling improvements - Fix unchecked regexp error in IsClaudeRunning (CVE-like) - Add input sanitization to SetPaneDiedHook to prevent shell injection - Add session name validation to SetDynamicStatus - Sanitize mail from/subject in SendNotificationBanner - Return error on parse failure in GetEnvironment - Track skipped lines in ListSessionIDs for debuggability See: tmux.fix for full analysis * fix(daemon): improve error handling and security - Capture stderr in syncWorkspace for better debuggability - Fail fast on git fetch failures to prevent stale code - Add logging to previously silent bd list errors - Change notification state file permissions to 0600 - Improve error messages with actual stderr content This prevents agents from starting with stale code and provides better visibility into daemon operations.	2026-01-13 22:13:54 -08:00
Johann Dirry	5d96243414	fix: Windows build support with platform-specific process/signal handling Separate platform-dependent code into build-tagged files: - process_unix.go / process_windows.go: isProcessRunning() implementation - signals_unix.go / signals_windows.go: daemon signal handling (Windows lacks SIGUSR1) Windows implementation uses windows.OpenProcess with PROCESS_QUERY_LIMITED_INFORMATION and checks exit code against STILL_ACTIVE (259). Original-PR: #447 Co-Authored-By: Johann Dirry <johann.dirry@microsea.at>	2026-01-13 20:59:15 -08:00
Will Saults	bda248fb9a	feat(refinery,boot): add --agent flag for model selection (#469) * feat(refinery,boot): add --agent flag for model selection (hq-7d5m) Add --agent flag to gt refinery start/attach/restart and gt boot spawn commands for consistent model selection across all agent launch points. Implementation follows the existing pattern from gt deacon start: - Add StringVar flag for agent alias - Pass override to Manager/Boot via SetAgentOverride() - Use BuildAgentStartupCommandWithAgentOverride when override is set Files affected: - cmd/gt/refinery.go: add flags to start/attach/restart commands - internal/refinery/manager.go: add SetAgentOverride and use in Start() - cmd/gt/boot.go: add flag to spawn command - internal/boot/boot.go: add SetAgentOverride and use in spawnTmux() Closes #438 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * refactor(refinery,boot): use parameter-passing pattern for --agent flag Address PR review feedback: 1. ADD TESTS: Add tests for --agent flag existence following witness_test.go pattern - internal/cmd/refinery_test.go: tests for start/attach/restart - internal/cmd/boot_test.go: test for spawn 2. ALIGN PATTERN: Change from setter pattern to parameter-passing pattern - Manager.Start(foreground, agentOverride) instead of SetAgentOverride + Start - Boot.Spawn(agentOverride) instead of SetAgentOverride + Spawn - Matches witness.go style: Start(foreground bool, agentOverride string, ...) Updated all callers to pass empty string for default agent: - internal/daemon/daemon.go - internal/cmd/rig.go - internal/cmd/start.go - internal/cmd/up.go Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: furiosa <will@saults.io> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-13 13:14:47 -08:00
gastown/crew/george	f79614d764	feat(daemon): event-driven convoy completion check (hq-5kmkl) Add ConvoyWatcher that monitors bd activity for issue closes and triggers convoy completion checks immediately rather than waiting for patrol. - Watch bd activity --follow --town --json for status=closed events - Query SQLite for convoys tracking the closed issue - Trigger gt convoy check when tracked issue closes - Convoys close within seconds of last issue closing Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-12 18:39:11 -08:00
Julian Knutsen	3caf32f9f7	fix(config): don't export empty GT_ROOT/BEADS_DIR in AgentEnv (#385) * fix(config): don't export empty GT_ROOT/BEADS_DIR in AgentEnv Fix polecats not having GT_ROOT environment variable set. The symptom was polecat sessions showing GT_ROOT="" instead of the expected town root. Root cause: AgentEnvSimple doesn't set TownRoot, but AgentEnv was always setting env["GT_ROOT"] = cfg.TownRoot even when empty. This empty value in export commands would override the tmux session environment. Changes: - Only set GT_ROOT and BEADS_DIR in env map if non-empty - Refactor daemon.go to use AgentEnv with full AgentEnvConfig instead of AgentEnvSimple + manual additions - Update test to verify keys are absent rather than empty Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix(lint): silence unparam for unused executeExternalActions args The external action params (beadID, severity, description) are reserved for future email/SMS/slack implementations but currently unused. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: julianknutsen <julianknutsen@users.noreply.github> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: max <steve.yegge@gmail.com>	2026-01-12 02:45:03 -08:00
abhijit	833724a7ed	new changes	2026-01-11 19:03:06 -08:00
mayor	0f6759e4a2	docs(daemon): update comment to reflect self-cleaning model The comment incorrectly referred to polecats without hooked work as "idle". With the self-cleaning model, polecats self-nuke on completion - there are no idle polecats. A polecat without work is orphaned (needs cleanup). Closes: gt-0jn0k Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-10 22:46:51 -08:00
nux	c4fcdd88c8	fix(daemon,beads): use correct agent bead ID format and bd create flags Two fixes in this commit: 1. daemon/lifecycle.go: Fix agent bead ID pattern for GUPP/orphaned work checks - Wrong: gt-polecat-<rig>-<name> (e.g., gt-polecat-gastown-nux) - Correct: <prefix>-<rig>-polecat-<name> (e.g., gt-gastown-polecat-nux) - Use config.GetRigPrefix() instead of hardcoding gt prefix - Use beads.ParseAgentBeadID() in extractRigFromAgentID 2. beads/beads.go: Fix invalid --add-label flag in bd create calls - bd create uses --labels, not --add-label - bd update uses --add-label (unchanged, was correct) - Fixed Create, CreateWithID, CreateAgentBead, CreateRigBead Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-09 22:08:55 -08:00
gastown/crew/gus	86751e1ea5	feat(witness): add --env flag for environment variable overrides Extends the --agent flag with a more general --env flag that allows setting arbitrary environment variables when starting a witness. Precedence (highest to lowest): 1. CLI --env overrides 2. Role bead env_vars 3. config.AgentEnv() defaults Examples: gt witness start greenplace --env ANTHROPIC_MODEL=claude-3-haiku gt witness restart greenplace --env DEBUG=1 --env VERBOSE=true Co-authored-by: joshuavial <git@codewithjv.com>	2026-01-09 22:00:43 -08:00
joshuavial	0d3f6c9654	feat: allow witness restart agent override	2026-01-09 21:56:53 -08:00
gastown/crew/gus	7a1ed80068	fix: remove unused identity parameter from setSessionEnvironment	2026-01-09 21:54:54 -08:00
julianknutsen	e999ceb1c1	refactor: consolidate agent env vars into config.AgentEnv Create centralized AgentEnv function as single source of truth for all agent environment variables. All agents now consistently receive: - GT_ROLE, BD_ACTOR, GIT_AUTHOR_NAME (role identity) - GT_ROOT, BEADS_DIR (workspace paths) - GT_RIG, GT_POLECAT/GT_CREW (rig-specific identity) - BEADS_AGENT_NAME, BEADS_NO_DAEMON (beads config) - CLAUDE_CONFIG_DIR (optional account selection) Remove RoleEnvVars in favor of AgentEnvSimple wrapper. Remove IncludeBeadsEnv flag - beads env vars always included. Update all manager and cmd call sites to use AgentEnv. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-09 21:52:30 -08:00
julianknutsen	1d88a73eaa	fix: use ResolveBeadsDir for polecat BEADS_DIR Previously, polecat startup used hardcoded paths for BEADS_DIR that didn't follow redirects for repos with tracked beads. This meant polecats working in worktrees (where .beads/redirect points to the actual beads location) would use the wrong beads directory. Fixed locations: - daemon.go: polecat startup now uses ResolveBeadsDir - polecat/session_manager.go: session startup now uses ResolveBeadsDir Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-09 21:52:09 -08:00
julianknutsen	7150ce2624	refactor: update managers to use RoleEnvVars Consolidates all role startup code to use the shared RoleEnvVars() function, ensuring consistent env vars across tmux SetEnvironment and Claude startup command exports. Updated: - Mayor manager - Deacon startup (daemon.go) - Witness manager - Refinery manager - Polecat startup (daemon.go) - BuildPolecatStartupCommand, BuildCrewStartupCommand helpers This ensures all agents receive the same identity env vars regardless of startup path. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-09 21:52:09 -08:00
nux	692d6819f2	feat(crash): improve crash logging and mass death detection Add comprehensive crash logging improvements to help diagnose mass session death events: - Add TypeSessionDeath and TypeMassDeath event types for feed visibility - Log pre-death events before killing sessions (who killed, why) - Add mass death detection in daemon (3+ deaths in 30s triggers alert) - Add macOS crash report check in gt doctor - Support session death events in townlog and feed curator Closes hq-kt1o6 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-09 14:11:09 -08:00
gastown/crew/max	9b2f4a7652	feat(polecat): add repo path to worktrees for LLM ergonomics (GH#283) Changes polecat worktree structure from: polecats/<name>/ to: polecats/<name>/<rigname>/ This gives Claude Code agents a recognizable directory name (e.g., tidepool/) in their cwd instead of just the polecat name, preventing confusion about which repo they are working in. Key changes: - Add clonePath() method to manager.go and session_manager.go for the actual git worktree path, keeping polecatDir() for existence checks - Update Add(), RepairWorktree(), Remove() to use new structure - Update daemon lifecycle and restart code for new paths - Update witness handlers to detect both structures - Update doctor checks (rig_check, branch_check, config_check, claude_settings_check) for backward compatibility - All code includes fallback to old structure for existing polecats Fixes #283 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-08 23:16:10 -08:00
george	65c3e90374	feat: Add Codex and OpenCode runtime backend support (#281) Adds support for alternative AI runtime backends (Codex, OpenCode) alongside the default Claude backend through a runtime abstraction layer. - internal/runtime/runtime.go - Runtime-agnostic helper functions - Extended RuntimeConfig with provider-specific settings - internal/opencode/ for OpenCode plugin support - Updated session managers to use runtime abstraction - Removed unused ensureXxxSession functions - Fixed daemon.go indentation, updated terminology to runtime Backward compatible: Claude remains default runtime. Co-Authored-By: Ben Kraus <ben@cinematicsoftware.com> Co-Authored-By: Cameron Palmer <cameronmpalmer@users.noreply.github.com>	2026-01-08 22:56:37 -08:00
Cameron Palmer	9fe9323b9c	fix: clean up dead code and fix indentation in runtime PR - Remove unused ensureRefinerySession function from start.go - Remove unused ensureSession and ensureWitness functions from up.go - Remove unused ensureWitnessSession function from witness.go - Remove orphaned imports (runtime, session, constants, config, rig, filepath, time) - Fix indentation error in daemon.go triggerPendingSpawns comment These functions were added as part of the Codex/OpenCode runtime support but were never wired up. The existing managers (refinery.Manager.Start, witness.Manager.Start) already handle session creation. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-08 22:51:51 -08:00
Subhrajit Makur	c8c765a239	fix: improve integration test reliability (#13) - Add custom types config after bd init in daemon tests - Replace fixed sleeps with poll-based waiting in tmux tests - Skip beads integration test for JSONL-only repos Fixes flaky test failures in parallel execution.	2026-01-08 17:58:16 -08:00
Steve Yegge	da906847dd	Merge pull request #279 from joshuavial/fix/polecat-dotdir-scan fix: extend polecat dot-dir filtering beyond #258	2026-01-08 17:23:26 -08:00
Ben Kraus	38adfa4d8b	codex	2026-01-08 12:36:54 -05:00
joshuavial	c699e3e2ed	Stabilize bd role config tests	2026-01-08 22:43:31 +13:00
julianknutsen	65ecb6cafd	fix(daemon): restore ensureDeaconRunning to heartbeat and use Manager The heartbeat now explicitly calls ensureDeaconRunning() for basic "is Deacon alive" checks, while Boot handles intelligent triage (stuck/nudge/interrupt decisions). Changed ensureDeaconRunning to use deacon.Manager.Start() instead of duplicating startup logic. This gives daemon the same benefits: - WaitForShellReady (fixes race condition) - Claude settings setup - Theming - StartupNudge and PropulsionNudge (GUPP) Heartbeat order: 1. ensureDeaconRunning - restart if dead (via Manager) 2. ensureBootRunning - intelligent triage for stuck states 3. checkDeaconHeartbeat - belt-and-suspenders fallback 4-11. Other checks (witnesses, refineries, polecats, etc.) This was inadvertently removed when Boot was introduced, which delegated all Deacon checks to Boot. But Boot's mol doesn't actually restart Deacon - it just reports. Now responsibilities are clear: daemon ensures alive, Boot ensures responsive. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-08 01:23:51 -08:00
Joshua Vial	a9ed342be6	fix: ignore hidden directories when enumerating polecats (#258) * fix(sling): route bd mol commands to target rig directory * Fix daemon polecat enumeration to ignore hidden dirs * Ignore hidden dirs when discovering rig polecats * Fix CI: enable beads custom types during install --------- Co-authored-by: joshuavial <git@codewithjv.com>	2026-01-07 20:48:09 -08:00
gus	06d40925d1	feat(daemon): warn when rig wisp config missing Logs a warning when checking rig operational state if the wisp config file doesn't exist. This helps diagnose cases where a parked rig unexpectedly restarts because its parked state was lost.	2026-01-07 01:34:49 -08:00
julianknutsen	29e2c6ed9c	Fix daemon polecat respawn to pass rigPath for rig agent settings Daemon's restartPolecatSession was calling BuildPolecatStartupCommand with empty rigPath, causing polecats to fall back to town-level defaults instead of honoring rig-specific agent settings. Now passes rigPath so rig agent settings are honored. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-06 22:44:16 -08:00
julianknutsen	72544cc06d	Unify agent startup with Manager pattern Refactors all agent startup paths (witness, refinery, crew, polecat) to use a consistent Manager interface with Start(), Stop(), IsRunning(), and SessionName() methods. Includes: - Witness manager with GUPP propulsion nudge for startup - Refinery manager for engineer sessions - Crew manager for worker agents - Session/polecat manager updates - claude_settings_check doctor check for settings validation - Settings management consolidated from rig/manager.go - Settings location moved outside source repos to prevent conflicts 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-06 21:44:04 -08:00
gastown/crew/joe	fc4b9de02c	fix: use tmux for agent liveness in daemon checks (gt-zecmc) Complete the "discover, don't track" refactoring: - checkGUPPViolations: use tmux.IsClaudeRunning() instead of agent_state - checkOrphanedWork: derive dead agents from tmux, not agent_state=dead - assessStaleness: rely on HasActiveSession (tmux), not agent_state Non-observable states (stuck, awaiting-gate) are still respected. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-06 20:50:01 -08:00
rictus	d00e73f110	feat(daemon): respect rig operational state for auto-start Update daemon to check rig config before auto-starting agents: - Check wisp config "status" - skip if parked or docked - Check "auto_restart" config - skip if blocked or false - Log skip reason for visibility Affects ensureWitnessRunning, ensureRefineryRunning, restartPolecatSession, and lifecycle restartSession. (gt-68c46) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-06 20:43:00 -08:00
gastown/crew/joe	87169a3fc7	fix: complete removal of agent_state observable tracking (gt-zecmc) Additional cleanup from the agent_state refactoring: - Remove dead code: checkStaleAgents(), markAgentDead() in lifecycle.go - Remove dead code: reportAgentState(), getAgentFields() in prime.go - Update getAgentBeadState() comment to clarify non-observable states only - Update mol-witness-patrol.formula.toml to use tmux discovery - Update mol-polecat-lease.formula.toml to use POLECAT_DONE mail - Update docs/watchdog-chain.md to reflect new architecture 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-06 20:42:23 -08:00
gastown/crew/joe	1f44482ad0	fix: remove observable states from agent_state (discover, don't track) The agent_state field was recording observable state like "running", "dead", "idle" which violated the "Discover, Don't Track" principle. This caused stale state bugs where agents were marked "dead" in beads but actually running in tmux. Changes: - Remove daemon's checkStaleAgents() which marked agents "dead" - Simplify ensureXxxRunning() to use tmux.IsClaudeRunning() directly - Remove reportAgentState() calls from gt prime and gt handoff - Add SetHookBead/ClearHookBead helpers that don't update agent_state - Use ClearHookBead in gt done and gt unsling - Simplify gt status to derive state from tmux, not bead Non-observable states (stuck, awaiting-gate, muted, paused) are still set because they represent intentional agent decisions that can't be discovered from tmux state. Fixes: gt-zecmc 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-06 20:32:02 -08:00
gastown/crew/joe	6dbb841e22	fix(daemon): nudge agents on state divergence instead of silent accept When the daemon detects that an agent bead state doesn't match tmux (e.g., bead says stopped but Claude is running), it now: 1. Logs the divergence clearly with STATE DIVERGENCE prefix 2. Nudges the agent with an actionable command to fix its state 3. Still skips the restart (safety - don't kill healthy sessions) This prevents silent state drift where bead state diverges from reality. Applied to: Deacon, Witness, Refinery ensure functions. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-06 19:23:52 -08:00
jv	99aae0bf02	fix: use canonical hq role bead IDs	2026-01-06 19:13:14 -08:00
jv	02ca9e43fa	fix: honor rig agent when starting witness/refinery	2026-01-06 19:12:55 -08:00
gastown/crew/joe	c2451b85e7	Merge origin/main into fix/205-address-claude-startup-issues Resolved conflict in internal/witness/manager.go: - Kept session import (used by PR code) - Kept PR's more accurate comment for PID check - Removed duplicate sessionName method introduced by merge	2026-01-06 19:04:29 -08:00
Julian Knutsen	9d7dcde1e2	feat: Unified beads redirect for tracked and local beads (#222) * feat: Beads redirect architecture for tracked and local beads This change implements proper redirect handling so that all rig agents (Witness, Refinery, Crew, Polecats) can work with both: - Tracked beads: .beads/ checked into git at mayor/rig/.beads - Local beads: .beads/ created at rig root during gt rig add Key changes: 1. SetupRedirect now handles tracked beads by skipping redirect chains. The bd CLI doesn't support chains (A→B→C), so worktrees redirect directly to the final destination (mayor/rig/.beads for tracked). 2. ResolveBeadsDir is now used consistently in polecat and refinery managers instead of hardcoded mayor/rig paths. 3. Rig-level agents (witness, refinery) now use rig beads with rig prefix instead of town beads. This follows the architecture where town beads are only for Mayor/Deacon. 4. prime.go simplified to always use ../../.beads for crew redirects, letting rig-level redirect handle tracked vs local routing. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat(doctor): Add beads-redirect check for tracked beads When a repo has .beads/ tracked in git (at mayor/rig/.beads), the rig root needs a redirect file pointing to that location. This check: - Detects missing rig-level redirect for tracked beads - Verifies redirect points to correct location (mayor/rig/.beads) - Auto-fixes with 'gt doctor --fix' 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: Handle fileLock.Unlock error in daemon Wrap fileLock.Unlock() return value to satisfy errcheck linter. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-06 12:59:37 -08:00
julianknutsen	09bbb0f430	Fix lint issues and sparse checkout for empty repos - Handle empty repos in ConfigureSparseCheckout (skip read-tree when no HEAD) - Fix errcheck: wrap fileLock.Unlock() error in defer - Fix unparam: remove unused *rig.Rig return from getWitnessManager - Fix unparam: mark unused agentType parameter with blank identifier 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-06 02:16:15 -08:00
mayor	31a32c084b	Unify agent startup with GUPP propulsion nudge Witness and Refinery startup was duplicated across cmd/witness.go, cmd/up.go, cmd/rig.go, and daemon.go. Worse, not all code paths sent the propulsion nudge (GUPP - Gas Town Universal Propulsion Principle). Now unified in Manager.Start() which handles everything including nudges. Changes: - witness/manager.go: Full rewrite with session creation, env vars, theming, WaitForClaudeReady, startup nudge, and propulsion nudge (GUPP) - refinery/manager.go: Add propulsion nudge sequence after Claude startup - cmd/witness.go: Simplify to just call mgr.Start(), remove ensureWitnessSession - cmd/rig.go: Use witness.Manager.Start() instead of inline session creation - cmd/start.go: Use witness.Manager.Start() - cmd/up.go: Use witness.Manager.Start(), remove ensureWitness(), add EnsureSettingsForRole in ensureSession() - daemon.go: Use witness.Manager.Start() and refinery.Manager.Start() for unified startup with proper nudges This ensures all agent startup paths (gt witness start, gt rig boot, gt up, daemon restarts) consistently apply GUPP propulsion nudges. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-06 01:51:25 -08:00
gastown/crew/jack	e8d27e7212	fix: Create and lookup rig agent beads with correct prefix Per docs/architecture.md, Witness and Refinery are rig-level agents that should use the rig's configured prefix (e.g., pi- for pixelforge) instead of hardcoded "gt-". This extends PR #183's creation fix to also fix all lookup paths: - internal/rig/manager.go: Create agent beads in rig beads with rig prefix - internal/daemon/daemon.go: Use rig prefix when looking up agent state - internal/daemon/lifecycle.go: Use rig prefix for identity-to-bead mapping - internal/cmd/sling.go: Pass townRoot for prefix lookup - internal/cmd/unsling.go: Pass townRoot for prefix lookup - internal/cmd/molecule_status.go: Use rig prefix for agent bead lookups - internal/cmd/molecule_attach.go: Use rig prefix for agent bead lookups - internal/config/loader.go: Add GetRigPrefix helper Without this fix, the daemon would: - Create pi-gastown-witness but look for gt-gastown-witness - Report agents as missing/dead when they are running - Fail to manage agent lifecycle correctly Based on work by Johann Taberlet in PR #183. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Johann Taberlet <johann.taberlet@gmail.com> Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-05 19:39:57 -08:00
mayor	5224dfb50d	Merge remote-tracking branch 'origin/polecat/slit-mk1wa0rj'	2026-01-05 19:39:01 -08:00
gastown	43cca06460	feat: Add Windows-compatible file locking for daemon Replace Unix-only syscall.Flock with gofrs/flock library for cross-platform file locking. This enables the daemon to run on Windows in addition to Unix-like systems. - Add github.com/gofrs/flock v0.13.0 dependency - Replace syscall.Flock calls with flock.TryLock/Unlock - Maintain same non-blocking exclusive lock semantics (gt-5354h) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-05 19:21:09 -08:00
slit	8110aab257	fix: Add GUPP propulsion nudge to daemon restartSession When Deacon respawns refinery/witness sessions via LIFECYCLE requests, the new sessions were starting at the Claude welcome screen without the propulsion nudge that triggers autonomous execution. Added StartupNudge and PropulsionNudgeForRole calls to restartSession() in lifecycle.go, matching the pattern used in ensureRefinerySession() in start.go. This ensures respawned agents receive the GUPP nudge and begin autonomous work immediately. Fixes: gt-01jpg 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-05 17:14:49 -08:00
gastown/crew/jack	6b8c897e37	feat: use hq- prefix for Mayor and Deacon session names Town-level services (Mayor, Deacon) now use hq- prefix instead of gt-: - hq-mayor (was gt-mayor) - hq-deacon (was gt-deacon) This distinguishes town-level sessions from rig-level sessions which continue to use gt- prefix (gt-gastown-witness, gt-gastown-crew-max, etc). Changes: - session.MayorSessionName() returns "hq-mayor" - session.DeaconSessionName() returns "hq-deacon" - ParseSessionName() handles both hq- and gt- prefixes - categorizeSession() handles both prefixes - categorizeSessions() accepts both prefixes - Updated all tests and documentation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-05 00:42:24 -08:00
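A rough sketch of how a parser might accept both prefixes from 6b8c897e37 follows. The type and function names (`sessionScope`, `classifySession`) are illustrative assumptions and do not reproduce the repository's `ParseSessionName()`; only the `hq-` and `gt-` prefixes and the example session names come from the message.

```go
package main

import (
	"fmt"
	"strings"
)

// sessionScope is a hypothetical classification of a session name.
type sessionScope int

const (
	scopeUnknown sessionScope = iota
	scopeTown                 // hq- prefix: Mayor, Deacon
	scopeRig                  // gt- prefix: witness, refinery, crew
)

// classifySession returns the scope and the name with its prefix removed.
// Names with an unknown prefix or an empty remainder are scopeUnknown.
func classifySession(name string) (sessionScope, string) {
	switch {
	case strings.HasPrefix(name, "hq-") && len(name) > len("hq-"):
		return scopeTown, strings.TrimPrefix(name, "hq-")
	case strings.HasPrefix(name, "gt-") && len(name) > len("gt-"):
		return scopeRig, strings.TrimPrefix(name, "gt-")
	default:
		return scopeUnknown, ""
	}
}

func main() {
	for _, n := range []string{"hq-mayor", "hq-deacon", "gt-gastown-witness", "other"} {
		scope, rest := classifySession(n)
		fmt.Printf("%s -> scope=%d rest=%q\n", n, scope, rest)
	}
}
```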
gastown/crew/jack	eea4435269	feat(rig): Add --branch flag for custom default branch Add --branch flag to `gt rig add` to specify a custom default branch instead of auto-detecting from remote. This supports repositories that use non-standard default branches like `develop` or `release`. Changes: - Add --branch flag to `gt rig add` command - Store default_branch in rig config.json - Propagate default branch to refinery, witness, daemon, and all commands - Rename ensureMainBranch to ensureDefaultBranch for clarity - Add Rig.DefaultBranch() method for consistent access - Update crew/manager.go and swarm/manager.go to use rig config Based on PR #49 by @kustrun - rebased and extended with additional fixes. Co-authored-by: kustrun <kustrun@users.noreply.github.com> 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-04 14:01:02 -08:00
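As a sketch of the `default_branch` setting from eea4435269, the snippet below reads the field from a rig `config.json` payload and exposes it through an accessor. Only the `default_branch` key comes from the message. The `rigConfig` type, the `parseRigConfig` helper, and the fallback to `main` when the field is absent are assumptions; the real code may auto-detect the branch from the remote instead.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rigConfig is a hypothetical subset of a rig's config.json.
type rigConfig struct {
	DefaultBranch string `json:"default_branch"`
}

// defaultBranch returns the configured branch.
// Assumption: fall back to "main" when nothing is configured.
func (c rigConfig) defaultBranch() string {
	if c.DefaultBranch != "" {
		return c.DefaultBranch
	}
	return "main"
}

// parseRigConfig decodes the JSON payload into a rigConfig.
func parseRigConfig(data []byte) (rigConfig, error) {
	var c rigConfig
	err := json.Unmarshal(data, &c)
	return c, err
}

func main() {
	c, err := parseRigConfig([]byte(`{"default_branch": "develop"}`))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(c.defaultBranch())
}
```

Keeping the fallback inside one accessor means callers such as a refinery or witness never need to know whether the branch was set explicitly.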
jakehemmerle	c25ff34b5b	fix(daemon): prevent orphan daemons via file locking Add syscall.Flock() exclusive lock in daemon.Run() to prevent TOCTOU race condition where concurrent 'gt daemon start' commands could spawn multiple daemons. Only the first to acquire the lock succeeds; others exit cleanly. Lock is per-town (in townRoot/daemon/daemon.lock) so multiple GT instances from different directories work independently. Also detect race losers in runDaemonStart() by comparing spawned PID with PID file, reporting 'already running' instead of false success.	2026-01-04 15:30:57 -05:00
splendid	acd2565a5b	fix: remove vestigial state.json files from agent directories Agent directories (witness/, refinery/, mayor/) contained state.json files with last_active timestamps that were never updated, making them stale and misleading. This change removes: - initAgentStates function that created vestigial state.json files - AgentState type and related Load/Save functions from config package - MayorStateValidCheck from doctor checks - requesting_* lifecycle verification (dead code - flags were never set) - FileStateJSON constant and MayorStatePath function Kept intact: - daemon/state.json (actively used for daemon runtime state) - crew/<name>/state.json (operational CrewWorker metadata) - Agent state tracking via beads (the ZFC-compliant approach) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-04 10:36:23 -08:00

119 Commits