Commit Graph

120 Commits

Author SHA1 Message Date
gastown/crew/max
a610283078 feat(roles): switch daemon to config-based roles, remove role beads (Phase 2+3)
Phase 2: Daemon now uses config.LoadRoleDefinition() instead of role beads
- lifecycle.go: getRoleConfigForIdentity() reads from TOML configs
- Layered override resolution: builtin → town → rig
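
The layered resolution above (builtin, then town, then rig) can be sketched as a simple overlay. This is only an illustration: `RoleConfig`, `overlay`, and `resolveRole` are hypothetical names, and the real structure returned by config.LoadRoleDefinition() is not shown in this log.

```go
package main

import "fmt"

// RoleConfig is a hypothetical, simplified role definition (assumption:
// the real TOML-backed struct has different and more numerous fields).
type RoleConfig struct {
	StartCommand string
	Env          map[string]string
}

// overlay returns base with every non-empty field of over applied on top.
func overlay(base, over RoleConfig) RoleConfig {
	out := RoleConfig{StartCommand: base.StartCommand, Env: map[string]string{}}
	for k, v := range base.Env {
		out.Env[k] = v
	}
	if over.StartCommand != "" {
		out.StartCommand = over.StartCommand
	}
	for k, v := range over.Env {
		out.Env[k] = v
	}
	return out
}

// resolveRole applies the layers in order, so the rig layer wins over
// the town layer, which wins over the builtin layer.
func resolveRole(builtin, town, rig RoleConfig) RoleConfig {
	return overlay(overlay(builtin, town), rig)
}

func main() {
	builtin := RoleConfig{StartCommand: "default-agent", Env: map[string]string{"LOG": "info"}}
	town := RoleConfig{Env: map[string]string{"LOG": "debug"}}
	rig := RoleConfig{StartCommand: "custom-agent"}
	fmt.Printf("%+v\n", resolveRole(builtin, town, rig))
}
```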

Phase 3: Remove role bead creation and references
- Remove RoleBead field from AgentFields struct
- gt install no longer creates role beads
- Remove 'role' from custom types list
- Delete migrate_agents.go (no longer needed)
- Deprecate beads_role.go (kept for reading existing beads)
- Rewrite role_beads_check.go to validate TOML configs

Existing role beads are orphaned but harmless.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 12:58:01 -08:00
aleiby
22064b0730 feat: Add automatic orphaned claude process cleanup (#588)
* feat: Add automatic orphaned claude process cleanup

Claude Code's Task tool spawns subagent processes that sometimes don't clean up
properly after completion. These accumulate and consume significant memory
(observed: 17 processes using ~6GB RAM).

This change adds automatic cleanup in two places:

1. **Deacon patrol** (primary): New patrol step "orphan-process-cleanup" runs
   `gt deacon cleanup-orphans` early in each cycle. More responsive (~30s).

2. **Daemon heartbeat** (fallback): Runs cleanup every 3 minutes as safety net
   when deacon is down.

Detection uses TTY column - processes with TTY "?" have no controlling terminal.
This is safe because:
- Processes in terminals (user sessions) have a TTY like "pts/0" - untouched
- Only kills processes with no controlling terminal
- Orphaned subagents are children of tmux server with no TTY

New files:
- internal/util/orphan.go: FindOrphanedClaudeProcesses, CleanupOrphanedClaudeProcesses
- internal/util/orphan_test.go: Tests for orphan detection

New command:
- `gt deacon cleanup-orphans`: Manual/patrol-triggered cleanup

Fixes #587

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(orphan): add Windows build tag and minimum age check

Addresses review feedback on PR #588:

1. Add //go:build !windows to orphan.go and orphan_test.go
   - The code uses Unix-specific syscalls (SIGTERM, ESRCH) and
     ps command options that don't exist on Windows

2. Add minimum age check (60 seconds) to prevent false positives
   - Prevents race conditions with newly spawned subagents
   - Addresses reviewer concern about cron/systemd processes
   - Uses portable etime format instead of Linux-only etimes

3. Add parseEtime helper with comprehensive tests
   - Parses [[DD-]HH:]MM:SS format (works on both Linux and macOS)
   - etimes (seconds) is Linux-specific, etime is portable
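
A minimal sketch of how a parser for the `[[DD-]HH:]MM:SS` format might look. It is not the parseEtime added in this commit (that code is not shown here); the signature and error messages are assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseEtime converts a ps "etime" value ([[DD-]HH:]MM:SS) to seconds.
// Illustrative only; the real helper may differ in signature and checks.
func parseEtime(s string) (int, error) {
	s = strings.TrimSpace(s)
	days := 0
	// An optional "DD-" prefix carries the day count.
	if i := strings.Index(s, "-"); i >= 0 {
		d, err := strconv.Atoi(s[:i])
		if err != nil {
			return 0, fmt.Errorf("bad day field in %q: %w", s, err)
		}
		days = d
		s = s[i+1:]
	}
	parts := strings.Split(s, ":")
	if len(parts) < 2 || len(parts) > 3 {
		return 0, fmt.Errorf("unexpected etime format %q", s)
	}
	// Fold HH, MM, SS (or MM, SS) into seconds, base 60.
	total := 0
	for _, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return 0, fmt.Errorf("bad field %q: %w", p, err)
		}
		total = total*60 + n
	}
	return days*86400 + total, nil
}

func main() {
	for _, in := range []string{"01:02", "02:03:04", "1-02:03:04"} {
		secs, err := parseEtime(in)
		fmt.Println(in, secs, err)
	}
}
```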

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(orphan): add proper SIGTERM→SIGKILL escalation with state tracking

The previous approach used process age, which doesn't work: a Task subagent
runs without TTY from birth, so a long-running legitimate subagent that
later fails to exit would be immediately SIGKILLed without trying SIGTERM.

New approach uses a state file to track signal history:

1. First encounter → SIGTERM, record PID + timestamp in state file
2. Next cycle (after 60s grace period) → if still alive, SIGKILL
3. Next cycle → if survived SIGKILL, log as unkillable and remove

State file: $XDG_RUNTIME_DIR/gastown-orphan-state (or /tmp/)
Format: "<pid> <signal> <unix_timestamp>" per line

The state file is automatically cleaned up:
- Dead processes removed on load
- Unkillable processes removed after logging

Also updates callers to use new CleanupResult type which includes
the signal sent (SIGTERM, SIGKILL, or UNKILLABLE).
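
The escalation steps and the state-file line format described above can be sketched as follows. The names `stateEntry`, `nextAction`, `parseStateLine`, and `formatStateLine` are hypothetical; the sketch only decides which action to take and does not send signals or touch the real state file.

```go
package main

import (
	"fmt"
	"time"
)

// stateEntry records the last signal sent to a PID (hypothetical type).
type stateEntry struct {
	Signal string // "SIGTERM" or "SIGKILL"
	SentAt time.Time
}

// gracePeriod mirrors the 60s grace period mentioned in the commit message.
const gracePeriod = 60 * time.Second

// nextAction decides what to do with an orphan that is still alive.
// It returns "SIGTERM", "SIGKILL", "UNKILLABLE", or "" (wait longer).
func nextAction(prev *stateEntry, now time.Time) string {
	switch {
	case prev == nil:
		return "SIGTERM" // first encounter
	case prev.Signal == "SIGTERM" && now.Sub(prev.SentAt) >= gracePeriod:
		return "SIGKILL" // survived SIGTERM past the grace period
	case prev.Signal == "SIGKILL":
		return "UNKILLABLE" // survived SIGKILL: log and drop from state
	default:
		return "" // still inside the grace period
	}
}

// formatStateLine renders "<pid> <signal> <unix_timestamp>".
func formatStateLine(pid int, e stateEntry) string {
	return fmt.Sprintf("%d %s %d", pid, e.Signal, e.SentAt.Unix())
}

// parseStateLine is the inverse of formatStateLine.
func parseStateLine(line string) (int, stateEntry, error) {
	var pid int
	var sig string
	var ts int64
	if _, err := fmt.Sscanf(line, "%d %s %d", &pid, &sig, &ts); err != nil {
		return 0, stateEntry{}, err
	}
	return pid, stateEntry{Signal: sig, SentAt: time.Unix(ts, 0)}, nil
}

func main() {
	now := time.Now()
	e := stateEntry{Signal: "SIGTERM", SentAt: now.Add(-2 * time.Minute)}
	fmt.Println(formatStateLine(4242, e))
	fmt.Println(nextAction(&e, now))
}
```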

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-16 15:35:48 -08:00
Walter McGivney
29f8dd67e2 fix: Add grace period to prevent Deacon restart loop (#590)
* fix(daemon): prevent runaway refinery session spawning

Fixes #566

The daemon spawned 812 refinery sessions over 4 days because:

1. Zombie detection was too strict - used IsAgentRunning(session, "node")
   but Claude reports pane command as version number (e.g., "2.1.7"),
   causing healthy sessions to be killed and recreated every heartbeat.

2. daemon.json patrol config was completely ignored - the daemon never
   loaded or checked the enabled flags.

Changes:
- refinery/manager.go: Use IsClaudeRunning() instead of IsAgentRunning()
  for robust Claude detection (handles "node", "claude", version patterns)
- daemon/types.go: Add PatrolConfig types and LoadPatrolConfig() to read
  mayor/daemon.json
- daemon/daemon.go: Load patrol config at startup, check enabled flags
  before calling ensureRefineriesRunning/ensureWitnessesRunning, add
  diagnostic logging for "already running" cases

Tested: Verified over multiple heartbeats that refinery shows "already
running, skipping spawn" instead of spawning new sessions.
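
To illustrate why matching only "node" misfires, here is a rough sketch of a pane-command check that accepts "node", "claude", and version-like strings such as "2.1.7". It is an assumption about the general idea, not the actual IsClaudeRunning() logic, and `looksLikeClaude` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
)

// versionRe matches pane commands that look like a semantic version,
// e.g. "2.1.7" (assumed pattern; the real check may be stricter or looser).
var versionRe = regexp.MustCompile(`^\d+\.\d+\.\d+`)

// looksLikeClaude reports whether a tmux pane command plausibly belongs
// to a running Claude session.
func looksLikeClaude(paneCmd string) bool {
	switch paneCmd {
	case "node", "claude":
		return true
	}
	return versionRe.MatchString(paneCmd)
}

func main() {
	for _, cmd := range []string{"node", "claude", "2.1.7", "bash"} {
		fmt.Println(cmd, looksLikeClaude(cmd))
	}
}
```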

* fix: Add grace period to prevent Deacon restart loop

The daemon had a race condition where:
1. ensureDeaconRunning() starts a new Deacon session
2. checkDeaconHeartbeat() runs in same heartbeat cycle
3. Heartbeat file is stale (from before crash)
4. Session is immediately killed
5. Infinite restart loop every 3 minutes

Fix:
- Track when Deacon was last started (deaconLastStarted field)
- Skip heartbeat check during 5-minute grace period
- Add config support for Deacon (consistency with refinery/witness)

After grace period, normal heartbeat checking resumes. Genuinely
stuck sessions (no heartbeat update after 5+ min) are still detected.
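
The grace-period logic might look roughly like the sketch below. Only the deaconLastStarted field name and the 5-minute value come from the message above; the `daemon` struct shape and the method names are assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// deaconGracePeriod is the 5-minute window from the commit message.
const deaconGracePeriod = 5 * time.Minute

// daemon is a stripped-down stand-in for the real daemon type.
type daemon struct {
	deaconLastStarted time.Time
}

// markDeaconStarted records when a new Deacon session was launched.
func (d *daemon) markDeaconStarted(now time.Time) {
	d.deaconLastStarted = now
}

// inGracePeriod reports whether the heartbeat check should be skipped
// because the Deacon was started too recently to have written a heartbeat.
func (d *daemon) inGracePeriod(now time.Time) bool {
	if d.deaconLastStarted.IsZero() {
		return false // never started by this daemon: check normally
	}
	return now.Sub(d.deaconLastStarted) < deaconGracePeriod
}

func main() {
	d := &daemon{}
	now := time.Now()
	d.markDeaconStarted(now)
	fmt.Println(d.inGracePeriod(now.Add(time.Minute)))     // within the window
	fmt.Println(d.inGracePeriod(now.Add(6 * time.Minute))) // past the window
}
```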

Fixes #589

---------

Co-authored-by: mayor <your-github-email@example.com>
2026-01-16 15:27:41 -08:00
JJ
b1a5241430 fix(beads): align agent bead prefixes and force multi-hyphen IDs (#482)
* fix(beads): align agent bead prefixes and force multi-hyphen IDs

* fix(checkpoint): treat threshold as stale at boundary
2026-01-16 12:33:51 -08:00
Julian Knutsen
e7ca4908dc refactor(config): remove BEADS_DIR from agent environment and add doctor check (#455)
* fix(sling_test): update test for cook dir change

The cook command no longer needs database context and runs from cwd,
not the target rig directory. Update test to match this behavior
change from bd2a5ab5.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(tests): skip tests requiring missing binaries, handle --allow-stale

- Add skipIfAgentBinaryMissing helper to skip tests when codex/gemini
  binaries aren't available in the test environment
- Update rig manager test stub to handle --allow-stale flag

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor(config): remove BEADS_DIR from agent environment

Stop exporting BEADS_DIR in AgentEnv - agents should use beads redirect
mechanism instead of relying on environment variable. This prevents
prefix mismatches when agents operate across different beads databases.

Changes:
- Remove BeadsDir field from AgentEnvConfig
- Remove BEADS_DIR from env vars set on agent sessions
- Update doctor env_check to not expect BEADS_DIR
- Update all manager Start() calls to not pass BeadsDir

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(doctor): detect BEADS_DIR in tmux session environment

Add a doctor check that warns when BEADS_DIR is set in any Gas Town
tmux session. BEADS_DIR in the environment overrides prefix-based
routing and breaks multi-rig lookups - agents should use the beads
redirect mechanism instead.

The check:
- Iterates over all Gas Town tmux sessions (gt-* and hq-*)
- Checks if BEADS_DIR is set in the session environment
- Returns a warning with fix hint to restart sessions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: julianknutsen <julianknutsen@users.noreply.github>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-13 22:13:57 -08:00
sigfawn
3cf77b2e8b fix(daemon): improve error handling and security (#445)
* fix(beads): cache version check and add timeout to prevent cli lag

* fix(mail_queue): add nil check for queue config

Prevents potential nil pointer panic when queue config exists
in map but has nil value. Added || queueCfg == nil check to
the queue lookup condition in runMailClaim function.

Fixes potential panic that could occur if a queue entry exists
in config but with a nil value.
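
The failure mode is the usual one for maps of pointers: the key exists but the value is nil. A small sketch, with `QueueConfig` and `lookupQueue` as hypothetical stand-ins for the types used by runMailClaim:

```go
package main

import "fmt"

// QueueConfig is a hypothetical stand-in for the real queue config type.
type QueueConfig struct {
	Workers []string
}

// lookupQueue treats a missing key and a nil value the same way, so
// callers never dereference a nil *QueueConfig.
func lookupQueue(queues map[string]*QueueConfig, name string) (*QueueConfig, error) {
	queueCfg, ok := queues[name]
	if !ok || queueCfg == nil {
		return nil, fmt.Errorf("unknown queue %q", name)
	}
	return queueCfg, nil
}

func main() {
	queues := map[string]*QueueConfig{"builds": nil}
	_, err := lookupQueue(queues, "builds")
	fmt.Println(err)
}
```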

* fix(migrate_agents_test): fix icon expectations to match actual output

The printMigrationResult function uses icons with two leading spaces
("  ✓", "  ⊘", "  ✗") but the test expected icons without spaces.
This fixes the test expectations to match the actual output format.

* fix(hook): handle error from events.LogFeed

Previously the error from LogFeed was silently ignored with _.
Now we log the error to stderr at warning level but don't fail
the operation since the primary hook action succeeded.

* fix(tmux): security and error handling improvements

- Fix unchecked regexp error in IsClaudeRunning (CVE-like)
- Add input sanitization to SetPaneDiedHook to prevent shell injection
- Add session name validation to SetDynamicStatus
- Sanitize mail from/subject in SendNotificationBanner
- Return error on parse failure in GetEnvironment
- Track skipped lines in ListSessionIDs for debuggability

See: tmux.fix for full analysis

* fix(daemon): improve error handling and security

- Capture stderr in syncWorkspace for better debuggability
- Fail fast on git fetch failures to prevent stale code
- Add logging to previously silent bd list errors
- Change notification state file permissions to 0600
- Improve error messages with actual stderr content

This prevents agents from starting with stale code and provides
better visibility into daemon operations.
2026-01-13 22:13:54 -08:00
Johann Dirry
5d96243414 fix: Windows build support with platform-specific process/signal handling
Separate platform-dependent code into build-tagged files:
- process_unix.go / process_windows.go: isProcessRunning() implementation
- signals_unix.go / signals_windows.go: daemon signal handling (Windows lacks SIGUSR1)

Windows implementation uses windows.OpenProcess with PROCESS_QUERY_LIMITED_INFORMATION
and checks exit code against STILL_ACTIVE (259).

Original-PR: #447
Co-Authored-By: Johann Dirry <johann.dirry@microsea.at>
2026-01-13 20:59:15 -08:00
Will Saults
bda248fb9a feat(refinery,boot): add --agent flag for model selection (#469)
* feat(refinery,boot): add --agent flag for model selection (hq-7d5m)

Add --agent flag to gt refinery start/attach/restart and gt boot spawn
commands for consistent model selection across all agent launch points.

Implementation follows the existing pattern from gt deacon start:
- Add StringVar flag for agent alias
- Pass override to Manager/Boot via SetAgentOverride()
- Use BuildAgentStartupCommandWithAgentOverride when override is set

Files affected:
- cmd/gt/refinery.go: add flags to start/attach/restart commands
- internal/refinery/manager.go: add SetAgentOverride and use in Start()
- cmd/gt/boot.go: add flag to spawn command
- internal/boot/boot.go: add SetAgentOverride and use in spawnTmux()

Closes #438

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor(refinery,boot): use parameter-passing pattern for --agent flag

Address PR review feedback:

1. ADD TESTS: Add tests for --agent flag existence following witness_test.go pattern
   - internal/cmd/refinery_test.go: tests for start/attach/restart
   - internal/cmd/boot_test.go: test for spawn

2. ALIGN PATTERN: Change from setter pattern to parameter-passing pattern
   - Manager.Start(foreground, agentOverride) instead of SetAgentOverride + Start
   - Boot.Spawn(agentOverride) instead of SetAgentOverride + Spawn
   - Matches witness.go style: Start(foreground bool, agentOverride string, ...)

Updated all callers to pass empty string for default agent:
- internal/daemon/daemon.go
- internal/cmd/rig.go
- internal/cmd/start.go
- internal/cmd/up.go

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: furiosa <will@saults.io>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-13 13:14:47 -08:00
gastown/crew/george
f79614d764 feat(daemon): event-driven convoy completion check (hq-5kmkl)
Add ConvoyWatcher that monitors bd activity for issue closes and
triggers convoy completion checks immediately rather than waiting
for patrol.

- Watch bd activity --follow --town --json for status=closed events
- Query SQLite for convoys tracking the closed issue
- Trigger gt convoy check when tracked issue closes
- Convoys close within seconds of last issue closing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-12 18:39:11 -08:00
Julian Knutsen
3caf32f9f7 fix(config): don't export empty GT_ROOT/BEADS_DIR in AgentEnv (#385)
* fix(config): don't export empty GT_ROOT/BEADS_DIR in AgentEnv

Fix polecats not having GT_ROOT environment variable set. The symptom was
polecat sessions showing GT_ROOT="" instead of the expected town root.

Root cause: AgentEnvSimple doesn't set TownRoot, but AgentEnv was always
setting env["GT_ROOT"] = cfg.TownRoot even when empty. This empty value
in export commands would override the tmux session environment.

Changes:
- Only set GT_ROOT and BEADS_DIR in env map if non-empty
- Refactor daemon.go to use AgentEnv with full AgentEnvConfig instead
  of AgentEnvSimple + manual additions
- Update test to verify keys are absent rather than empty
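
A minimal sketch of the "only set if non-empty" rule. `AgentEnvConfig` here is a reduced, assumed version of the real struct, and `agentEnv` is a hypothetical name for the builder.

```go
package main

import "fmt"

// AgentEnvConfig is a reduced, assumed subset of the real config struct.
type AgentEnvConfig struct {
	Role     string
	TownRoot string
	BeadsDir string
}

// agentEnv builds the env map, leaving GT_ROOT and BEADS_DIR out entirely
// when they are empty so an empty export cannot override the value
// already present in the tmux session environment.
func agentEnv(cfg AgentEnvConfig) map[string]string {
	env := map[string]string{"GT_ROLE": cfg.Role}
	if cfg.TownRoot != "" {
		env["GT_ROOT"] = cfg.TownRoot
	}
	if cfg.BeadsDir != "" {
		env["BEADS_DIR"] = cfg.BeadsDir
	}
	return env
}

func main() {
	fmt.Println(agentEnv(AgentEnvConfig{Role: "polecat"}))
	fmt.Println(agentEnv(AgentEnvConfig{Role: "polecat", TownRoot: "/town"}))
}
```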

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(lint): silence unparam for unused executeExternalActions args

The external action params (beadID, severity, description) are reserved
for future email/SMS/slack implementations but currently unused.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: julianknutsen <julianknutsen@users.noreply.github>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: max <steve.yegge@gmail.com>
2026-01-12 02:45:03 -08:00
abhijit
833724a7ed new changes 2026-01-11 19:03:06 -08:00
mayor
0f6759e4a2 docs(daemon): update comment to reflect self-cleaning model
The comment incorrectly referred to polecats without hooked work as "idle".
With the self-cleaning model, polecats self-nuke on completion - there are
no idle polecats. A polecat without work is orphaned (needs cleanup).

Closes: gt-0jn0k

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 22:46:51 -08:00
nux
c4fcdd88c8 fix(daemon,beads): use correct agent bead ID format and bd create flags
Two fixes in this commit:

1. daemon/lifecycle.go: Fix agent bead ID pattern for GUPP/orphaned work checks
   - Wrong: gt-polecat-<rig>-<name> (e.g., gt-polecat-gastown-nux)
   - Correct: <prefix>-<rig>-polecat-<name> (e.g., gt-gastown-polecat-nux)
   - Use config.GetRigPrefix() instead of hardcoding gt prefix
   - Use beads.ParseAgentBeadID() in extractRigFromAgentID

2. beads/beads.go: Fix invalid --add-label flag in bd create calls
   - bd create uses --labels, not --add-label
   - bd update uses --add-label (unchanged, was correct)
   - Fixed Create, CreateWithID, CreateAgentBead, CreateRigBead
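
For illustration, the corrected ID layout can be built and taken apart as below. This is not beads.ParseAgentBeadID(); both function names are hypothetical, and the parser assumes the prefix itself contains no hyphen.

```go
package main

import (
	"fmt"
	"strings"
)

// polecatBeadID builds <prefix>-<rig>-polecat-<name>.
func polecatBeadID(prefix, rig, name string) string {
	return fmt.Sprintf("%s-%s-polecat-%s", prefix, rig, name)
}

// rigFromPolecatBeadID extracts the rig from an ID in the corrected
// layout. Assumption: the prefix has no hyphen, so the first hyphen in
// the part before "-polecat-" separates prefix from rig.
func rigFromPolecatBeadID(id string) (string, bool) {
	head, _, ok := strings.Cut(id, "-polecat-")
	if !ok {
		return "", false
	}
	_, rig, ok := strings.Cut(head, "-")
	if !ok || rig == "" {
		return "", false
	}
	return rig, true
}

func main() {
	id := polecatBeadID("gt", "gastown", "nux")
	rig, ok := rigFromPolecatBeadID(id)
	fmt.Println(id, rig, ok)
}
```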

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 22:08:55 -08:00
gastown/crew/gus
86751e1ea5 feat(witness): add --env flag for environment variable overrides
Extends the --agent flag with a more general --env flag that allows
setting arbitrary environment variables when starting a witness.

Precedence (highest to lowest):
1. CLI --env overrides
2. Role bead env_vars
3. config.AgentEnv() defaults

Examples:
  gt witness start greenplace --env ANTHROPIC_MODEL=claude-3-haiku
  gt witness restart greenplace --env DEBUG=1 --env VERBOSE=true
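
The precedence list above amounts to merging three maps from lowest to highest priority. A hedged sketch, where `mergeEnv` and the sample values are hypothetical:

```go
package main

import "fmt"

// mergeEnv merges layers from lowest to highest precedence; a key in a
// later layer replaces the same key from an earlier one.
func mergeEnv(layers ...map[string]string) map[string]string {
	out := map[string]string{}
	for _, layer := range layers {
		for k, v := range layer {
			out[k] = v
		}
	}
	return out
}

func main() {
	defaults := map[string]string{"GT_ROLE": "witness", "DEBUG": "0"} // config.AgentEnv() defaults
	roleBead := map[string]string{"DEBUG": "1"}                       // role bead env_vars
	cli := map[string]string{"VERBOSE": "true"}                       // --env overrides
	fmt.Println(mergeEnv(defaults, roleBead, cli))
}
```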

Co-authored-by: joshuavial <git@codewithjv.com>
2026-01-09 22:00:43 -08:00
joshuavial
0d3f6c9654 feat: allow witness restart agent override 2026-01-09 21:56:53 -08:00
gastown/crew/gus
7a1ed80068 fix: remove unused identity parameter from setSessionEnvironment 2026-01-09 21:54:54 -08:00
julianknutsen
e999ceb1c1 refactor: consolidate agent env vars into config.AgentEnv
Create centralized AgentEnv function as single source of truth for all
agent environment variables. All agents now consistently receive:
- GT_ROLE, BD_ACTOR, GIT_AUTHOR_NAME (role identity)
- GT_ROOT, BEADS_DIR (workspace paths)
- GT_RIG, GT_POLECAT/GT_CREW (rig-specific identity)
- BEADS_AGENT_NAME, BEADS_NO_DAEMON (beads config)
- CLAUDE_CONFIG_DIR (optional account selection)

Remove RoleEnvVars in favor of AgentEnvSimple wrapper.
Remove IncludeBeadsEnv flag - beads env vars always included.
Update all manager and cmd call sites to use AgentEnv.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 21:52:30 -08:00
julianknutsen
1d88a73eaa fix: use ResolveBeadsDir for polecat BEADS_DIR
Previously, polecat startup used hardcoded paths for BEADS_DIR that
didn't follow redirects for repos with tracked beads. This meant
polecats working in worktrees (where .beads/redirect points to the
actual beads location) would use the wrong beads directory.

Fixed locations:
- daemon.go: polecat startup now uses ResolveBeadsDir
- polecat/session_manager.go: session startup now uses ResolveBeadsDir

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 21:52:09 -08:00
julianknutsen
7150ce2624 refactor: update managers to use RoleEnvVars
Consolidates all role startup code to use the shared RoleEnvVars()
function, ensuring consistent env vars across tmux SetEnvironment
and Claude startup command exports.

Updated:
- Mayor manager
- Deacon startup (daemon.go)
- Witness manager
- Refinery manager
- Polecat startup (daemon.go)
- BuildPolecatStartupCommand, BuildCrewStartupCommand helpers

This ensures all agents receive the same identity env vars regardless
of startup path.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 21:52:09 -08:00
nux
692d6819f2 feat(crash): improve crash logging and mass death detection
Add comprehensive crash logging improvements to help diagnose mass session death events:

- Add TypeSessionDeath and TypeMassDeath event types for feed visibility
- Log pre-death events before killing sessions (who killed, why)
- Add mass death detection in daemon (3+ deaths in 30s triggers alert)
- Add macOS crash report check in gt doctor
- Support session death events in townlog and feed curator
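
The "3+ deaths in 30s" rule is a sliding-window count. The sketch below is one possible shape under stated assumptions: `deathTracker` and its methods are hypothetical names, and timestamps are plain Unix seconds to keep the example small.

```go
package main

import "fmt"

// deathTracker keeps recent session-death timestamps (Unix seconds).
type deathTracker struct {
	windowSecs int64
	threshold  int
	deaths     []int64
}

// newDeathTracker builds a tracker, e.g. newDeathTracker(30, 3).
func newDeathTracker(windowSecs int64, threshold int) *deathTracker {
	return &deathTracker{windowSecs: windowSecs, threshold: threshold}
}

// record adds a death at time now and reports whether the number of
// deaths inside the window has reached the threshold.
func (t *deathTracker) record(now int64) bool {
	cutoff := now - t.windowSecs
	kept := t.deaths[:0]
	for _, d := range t.deaths {
		if d > cutoff { // drop entries that fell out of the window
			kept = append(kept, d)
		}
	}
	t.deaths = append(kept, now)
	return len(t.deaths) >= t.threshold
}

func main() {
	t := newDeathTracker(30, 3)
	for _, ts := range []int64{100, 110, 125} {
		fmt.Println(ts, t.record(ts))
	}
}
```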

Closes hq-kt1o6

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 14:11:09 -08:00
gastown/crew/max
9b2f4a7652 feat(polecat): add repo path to worktrees for LLM ergonomics (GH#283)
Changes polecat worktree structure from:
  polecats/<name>/
to:
  polecats/<name>/<rigname>/

This gives Claude Code agents a recognizable directory name (e.g., tidepool/)
in their cwd instead of just the polecat name, preventing confusion about
which repo they are working in.

Key changes:
- Add clonePath() method to manager.go and session_manager.go for the actual
  git worktree path, keeping polecatDir() for existence checks
- Update Add(), RepairWorktree(), Remove() to use new structure
- Update daemon lifecycle and restart code for new paths
- Update witness handlers to detect both structures
- Update doctor checks (rig_check, branch_check, config_check,
  claude_settings_check) for backward compatibility
- All code includes fallback to old structure for existing polecats
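
One way the new-layout path with a fallback for existing polecats could be resolved is sketched here. The actual clonePath() is a method whose signature is not shown in this log; the free function below and its `.git` probe are assumptions.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// clonePath returns the git worktree path for a polecat. It prefers the
// new layout polecats/<name>/<rigname>/ and falls back to the old layout
// polecats/<name>/ only when that directory already holds a worktree.
func clonePath(rigPath, rigName, polecat string) string {
	oldPath := filepath.Join(rigPath, "polecats", polecat)
	newPath := filepath.Join(oldPath, rigName)
	if _, err := os.Stat(newPath); err == nil {
		return newPath
	}
	// Assumption: an old-style worktree is recognizable by its .git entry.
	if _, err := os.Stat(filepath.Join(oldPath, ".git")); err == nil {
		return oldPath
	}
	return newPath // nothing exists yet: use the new layout
}

func main() {
	fmt.Println(clonePath("/town/tidepool", "tidepool", "nux"))
}
```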

Fixes #283

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 23:16:10 -08:00
george
65c3e90374 feat: Add Codex and OpenCode runtime backend support (#281)
Adds support for alternative AI runtime backends (Codex, OpenCode) alongside
the default Claude backend through a runtime abstraction layer.

- internal/runtime/runtime.go - Runtime-agnostic helper functions
- Extended RuntimeConfig with provider-specific settings
- internal/opencode/ for OpenCode plugin support
- Updated session managers to use runtime abstraction
- Removed unused ensureXxxSession functions
- Fixed daemon.go indentation, updated terminology to runtime

Backward compatible: Claude remains default runtime.

Co-Authored-By: Ben Kraus <ben@cinematicsoftware.com>
Co-Authored-By: Cameron Palmer <cameronmpalmer@users.noreply.github.com>
2026-01-08 22:56:37 -08:00
Cameron Palmer
9fe9323b9c fix: clean up dead code and fix indentation in runtime PR
- Remove unused ensureRefinerySession function from start.go
- Remove unused ensureSession and ensureWitness functions from up.go
- Remove unused ensureWitnessSession function from witness.go
- Remove orphaned imports (runtime, session, constants, config, rig, filepath, time)
- Fix indentation error in daemon.go triggerPendingSpawns comment

These functions were added as part of the Codex/OpenCode runtime support
but were never wired up. The existing managers (refinery.Manager.Start,
witness.Manager.Start) already handle session creation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 22:51:51 -08:00
Subhrajit Makur
c8c765a239 fix: improve integration test reliability (#13)
- Add custom types config after bd init in daemon tests
- Replace fixed sleeps with poll-based waiting in tmux tests
- Skip beads integration test for JSONL-only repos

Fixes flaky test failures in parallel execution.
2026-01-08 17:58:16 -08:00
Steve Yegge
da906847dd Merge pull request #279 from joshuavial/fix/polecat-dotdir-scan
fix: extend polecat dot-dir filtering beyond #258
2026-01-08 17:23:26 -08:00
Ben Kraus
38adfa4d8b codex 2026-01-08 12:36:54 -05:00
joshuavial
c699e3e2ed Stabilize bd role config tests 2026-01-08 22:43:31 +13:00
julianknutsen
65ecb6cafd fix(daemon): restore ensureDeaconRunning to heartbeat and use Manager
The heartbeat now explicitly calls ensureDeaconRunning() for basic
"is Deacon alive" checks, while Boot handles intelligent triage
(stuck/nudge/interrupt decisions).

Changed ensureDeaconRunning to use deacon.Manager.Start() instead of
duplicating startup logic. This gives daemon the same benefits:
- WaitForShellReady (fixes race condition)
- Claude settings setup
- Theming
- StartupNudge and PropulsionNudge (GUPP)

Heartbeat order:
1. ensureDeaconRunning - restart if dead (via Manager)
2. ensureBootRunning - intelligent triage for stuck states
3. checkDeaconHeartbeat - belt-and-suspenders fallback
4-11. Other checks (witnesses, refineries, polecats, etc.)

This was inadvertently removed when Boot was introduced, which
delegated all Deacon checks to Boot. But Boot's mol doesn't actually
restart Deacon - it just reports. Now responsibilities are clear:
daemon ensures alive, Boot ensures responsive.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 01:23:51 -08:00
Joshua Vial
a9ed342be6 fix: ignore hidden directories when enumerating polecats (#258)
* fix(sling): route bd mol commands to target rig directory

* Fix daemon polecat enumeration to ignore hidden dirs

* Ignore hidden dirs when discovering rig polecats

* Fix CI: enable beads custom types during install
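
The hidden-directory filter is simple enough to sketch directly. `visiblePolecats` is a hypothetical helper operating on directory names that have already been listed; the real enumeration code is not shown in this log.

```go
package main

import (
	"fmt"
	"strings"
)

// visiblePolecats drops dot-directories (such as .claude or .git) from a
// list of directory names found under polecats/.
func visiblePolecats(names []string) []string {
	out := make([]string, 0, len(names))
	for _, n := range names {
		if strings.HasPrefix(n, ".") {
			continue // hidden directory, not a polecat
		}
		out = append(out, n)
	}
	return out
}

func main() {
	fmt.Println(visiblePolecats([]string{".claude", "nux", ".git", "slit"}))
}
```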

---------

Co-authored-by: joshuavial <git@codewithjv.com>
2026-01-07 20:48:09 -08:00
gus
06d40925d1 feat(daemon): warn when rig wisp config missing
Logs a warning when checking rig operational state if the wisp
config file doesn't exist. This helps diagnose cases where a
parked rig unexpectedly restarts because its parked state was lost.
2026-01-07 01:34:49 -08:00
julianknutsen
29e2c6ed9c Fix daemon polecat respawn to pass rigPath for rig agent settings
Daemon's restartPolecatSession was calling BuildPolecatStartupCommand
with empty rigPath, causing polecats to fall back to town-level defaults
instead of honoring rig-specific agent settings.

Now passes rigPath so rig agent settings are honored.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 22:44:16 -08:00
julianknutsen
72544cc06d Unify agent startup with Manager pattern
Refactors all agent startup paths (witness, refinery, crew, polecat) to use
a consistent Manager interface with Start(), Stop(), IsRunning(), and
SessionName() methods.

Includes:
- Witness manager with GUPP propulsion nudge for startup
- Refinery manager for engineer sessions
- Crew manager for worker agents
- Session/polecat manager updates
- claude_settings_check doctor check for settings validation
- Settings management consolidated from rig/manager.go
- Settings location moved outside source repos to prevent conflicts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 21:44:04 -08:00
gastown/crew/joe
fc4b9de02c fix: use tmux for agent liveness in daemon checks (gt-zecmc)
Complete the "discover, don't track" refactoring:

- checkGUPPViolations: use tmux.IsClaudeRunning() instead of agent_state
- checkOrphanedWork: derive dead agents from tmux, not agent_state=dead
- assessStaleness: rely on HasActiveSession (tmux), not agent_state

Non-observable states (stuck, awaiting-gate) are still respected.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 20:50:01 -08:00
rictus
d00e73f110 feat(daemon): respect rig operational state for auto-start
Update daemon to check rig config before auto-starting agents:
- Check wisp config "status" - skip if parked or docked
- Check "auto_restart" config - skip if blocked or false
- Log skip reason for visibility

Affects ensureWitnessRunning, ensureRefineryRunning,
restartPolecatSession, and lifecycle restartSession.
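
A rough sketch of the skip decision described above. The `RigState` struct and `skipReason` function are hypothetical; only the "status" and "auto_restart" keys and their blocking values are taken from the message.

```go
package main

import "fmt"

// RigState holds the two settings the daemon consults (assumed shape).
type RigState struct {
	Status      string // wisp config "status", e.g. "parked" or "docked"
	AutoRestart string // "auto_restart" config, e.g. "blocked" or "false"
}

// skipReason returns a non-empty reason when auto-start must be skipped,
// or "" when the agent may be started.
func skipReason(s RigState) string {
	switch s.Status {
	case "parked", "docked":
		return "rig is " + s.Status
	}
	switch s.AutoRestart {
	case "blocked", "false":
		return "auto_restart is " + s.AutoRestart
	}
	return ""
}

func main() {
	for _, s := range []RigState{{Status: "parked"}, {AutoRestart: "false"}, {}} {
		if reason := skipReason(s); reason != "" {
			fmt.Println("skipping auto-start:", reason)
			continue
		}
		fmt.Println("starting agent")
	}
}
```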

(gt-68c46)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 20:43:00 -08:00
gastown/crew/joe
87169a3fc7 fix: complete removal of agent_state observable tracking (gt-zecmc)
Additional cleanup from the agent_state refactoring:

- Remove dead code: checkStaleAgents(), markAgentDead() in lifecycle.go
- Remove dead code: reportAgentState(), getAgentFields() in prime.go
- Update getAgentBeadState() comment to clarify non-observable states only
- Update mol-witness-patrol.formula.toml to use tmux discovery
- Update mol-polecat-lease.formula.toml to use POLECAT_DONE mail
- Update docs/watchdog-chain.md to reflect new architecture

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 20:42:23 -08:00
gastown/crew/joe
1f44482ad0 fix: remove observable states from agent_state (discover, don't track)
The agent_state field was recording observable state like "running",
"dead", "idle" which violated the "Discover, Don't Track" principle.
This caused stale state bugs where agents were marked "dead" in beads
but actually running in tmux.

Changes:
- Remove daemon's checkStaleAgents() which marked agents "dead"
- Simplify ensureXxxRunning() to use tmux.IsClaudeRunning() directly
- Remove reportAgentState() calls from gt prime and gt handoff
- Add SetHookBead/ClearHookBead helpers that don't update agent_state
- Use ClearHookBead in gt done and gt unsling
- Simplify gt status to derive state from tmux, not bead

Non-observable states (stuck, awaiting-gate, muted, paused) are still
set because they represent intentional agent decisions that can't be
discovered from tmux state.

Fixes: gt-zecmc

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 20:32:02 -08:00
gastown/crew/joe
6dbb841e22 fix(daemon): nudge agents on state divergence instead of silent accept
When the daemon detects that an agent bead state doesn't match tmux
(e.g., bead says stopped but Claude is running), it now:

1. Logs the divergence clearly with STATE DIVERGENCE prefix
2. Nudges the agent with an actionable command to fix its state
3. Still skips the restart (safety - don't kill healthy sessions)

This prevents silent state drift where bead state diverges from reality.
Applied to: Deacon, Witness, Refinery ensure functions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 19:23:52 -08:00
jv
99aae0bf02 fix: use canonical hq role bead IDs 2026-01-06 19:13:14 -08:00
jv
02ca9e43fa fix: honor rig agent when starting witness/refinery 2026-01-06 19:12:55 -08:00
gastown/crew/joe
c2451b85e7 Merge origin/main into fix/205-address-claude-startup-issues
Resolved conflict in internal/witness/manager.go:
- Kept session import (used by PR code)
- Kept PR's more accurate comment for PID check
- Removed duplicate sessionName method introduced by merge
2026-01-06 19:04:29 -08:00
Julian Knutsen
9d7dcde1e2 feat: Unified beads redirect for tracked and local beads (#222)
* feat: Beads redirect architecture for tracked and local beads

This change implements proper redirect handling so that all rig agents
(Witness, Refinery, Crew, Polecats) can work with both:
- Tracked beads: .beads/ checked into git at mayor/rig/.beads
- Local beads: .beads/ created at rig root during gt rig add

Key changes:

1. SetupRedirect now handles tracked beads by skipping redirect chains.
   The bd CLI doesn't support chains (A→B→C), so worktrees redirect
   directly to the final destination (mayor/rig/.beads for tracked).

2. ResolveBeadsDir is now used consistently in polecat and refinery
   managers instead of hardcoded mayor/rig paths.

3. Rig-level agents (witness, refinery) now use rig beads with rig
   prefix instead of town beads. This follows the architecture where
   town beads are only for Mayor/Deacon.

4. prime.go simplified to always use ../../.beads for crew redirects,
   letting rig-level redirect handle tracked vs local routing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(doctor): Add beads-redirect check for tracked beads

When a repo has .beads/ tracked in git (at mayor/rig/.beads), the rig root
needs a redirect file pointing to that location. This check:

- Detects missing rig-level redirect for tracked beads
- Verifies redirect points to correct location (mayor/rig/.beads)
- Auto-fixes with 'gt doctor --fix'

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: Handle fileLock.Unlock error in daemon

Wrap fileLock.Unlock() return value to satisfy errcheck linter.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 12:59:37 -08:00
julianknutsen
09bbb0f430 Fix lint issues and sparse checkout for empty repos
- Handle empty repos in ConfigureSparseCheckout (skip read-tree when no HEAD)
- Fix errcheck: wrap fileLock.Unlock() error in defer
- Fix unparam: remove unused *rig.Rig return from getWitnessManager
- Fix unparam: mark unused agentType parameter with blank identifier

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 02:16:15 -08:00
mayor
31a32c084b Unify agent startup with GUPP propulsion nudge
Witness and Refinery startup was duplicated across cmd/witness.go, cmd/up.go,
cmd/rig.go, and daemon.go. Worse, not all code paths sent the propulsion nudge
(GUPP - Gas Town Universal Propulsion Principle). Now unified in Manager.Start()
which handles everything including nudges.

Changes:
- witness/manager.go: Full rewrite with session creation, env vars, theming,
  WaitForClaudeReady, startup nudge, and propulsion nudge (GUPP)
- refinery/manager.go: Add propulsion nudge sequence after Claude startup
- cmd/witness.go: Simplify to just call mgr.Start(), remove ensureWitnessSession
- cmd/rig.go: Use witness.Manager.Start() instead of inline session creation
- cmd/start.go: Use witness.Manager.Start()
- cmd/up.go: Use witness.Manager.Start(), remove ensureWitness(),
  add EnsureSettingsForRole in ensureSession()
- daemon.go: Use witness.Manager.Start() and refinery.Manager.Start() for
  unified startup with proper nudges

This ensures all agent startup paths (gt witness start, gt rig boot, gt up,
daemon restarts) consistently apply GUPP propulsion nudges.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 01:51:25 -08:00
gastown/crew/jack
e8d27e7212 fix: Create and lookup rig agent beads with correct prefix
Per docs/architecture.md, Witness and Refinery are rig-level agents that
should use the rig's configured prefix (e.g., pi- for pixelforge) instead
of hardcoded "gt-".

This extends PR #183's creation fix to also fix all lookup paths:
- internal/rig/manager.go: Create agent beads in rig beads with rig prefix
- internal/daemon/daemon.go: Use rig prefix when looking up agent state
- internal/daemon/lifecycle.go: Use rig prefix for identity-to-bead mapping
- internal/cmd/sling.go: Pass townRoot for prefix lookup
- internal/cmd/unsling.go: Pass townRoot for prefix lookup
- internal/cmd/molecule_status.go: Use rig prefix for agent bead lookups
- internal/cmd/molecule_attach.go: Use rig prefix for agent bead lookups
- internal/config/loader.go: Add GetRigPrefix helper

Without this fix, the daemon would:
- Create pi-gastown-witness but look for gt-gastown-witness
- Report agents as missing/dead when they are running
- Fail to manage agent lifecycle correctly

Based on work by Johann Taberlet in PR #183.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Johann Taberlet <johann.taberlet@gmail.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 19:39:57 -08:00
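Note on e8d27e7212: the bug was a mismatch between the prefix used when creating an agent bead and the prefix used when looking it up. The sketch below shows the idea of deriving both from one helper. The `prefix-rig-role` layout is inferred from the IDs quoted in the commit message; `prefixForRig`, `agentBeadID`, and the prefix map are hypothetical and do not reproduce the actual `GetRigPrefix` helper:

```go
package main

import (
	"fmt"
	"strings"
)

// defaultPrefix is used when a rig has no configured prefix (assumption).
const defaultPrefix = "gt"

// prefixForRig returns the configured prefix for a rig, without a trailing
// dash, falling back to defaultPrefix when none is configured.
func prefixForRig(prefixes map[string]string, rig string) string {
	if p, ok := prefixes[rig]; ok && p != "" {
		return strings.TrimSuffix(p, "-")
	}
	return defaultPrefix
}

// agentBeadID builds the bead ID for a rig-level agent. Creation and lookup
// should both call this so they can never disagree on the prefix.
func agentBeadID(prefixes map[string]string, rig, role string) string {
	return fmt.Sprintf("%s-%s-%s", prefixForRig(prefixes, rig), rig, role)
}

func main() {
	prefixes := map[string]string{"pixelforge": "pi-"}
	fmt.Println(agentBeadID(prefixes, "pixelforge", "witness"))
	fmt.Println(agentBeadID(prefixes, "gastown", "refinery"))
}
```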
mayor
5224dfb50d Merge remote-tracking branch 'origin/polecat/slit-mk1wa0rj'
2026-01-05 19:39:01 -08:00
gastown
43cca06460 feat: Add Windows-compatible file locking for daemon
Replace Unix-only syscall.Flock with gofrs/flock library for
cross-platform file locking. This enables the daemon to run on
Windows in addition to Unix-like systems.

- Add github.com/gofrs/flock v0.13.0 dependency
- Replace syscall.Flock calls with flock.TryLock/Unlock
- Maintain same non-blocking exclusive lock semantics

(gt-5354h)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 19:21:09 -08:00
slit
8110aab257 fix: Add GUPP propulsion nudge to daemon restartSession
When Deacon respawns refinery/witness sessions via LIFECYCLE requests,
the new sessions were starting at the Claude welcome screen without
the propulsion nudge that triggers autonomous execution.

Added StartupNudge and PropulsionNudgeForRole calls to restartSession()
in lifecycle.go, matching the pattern used in ensureRefinerySession()
in start.go. This ensures respawned agents receive the GUPP nudge and
begin autonomous work immediately.

Fixes: gt-01jpg

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 17:14:49 -08:00
gastown/crew/jack
6b8c897e37 feat: use hq- prefix for Mayor and Deacon session names
Town-level services (Mayor, Deacon) now use hq- prefix instead of gt-:
- hq-mayor (was gt-mayor)
- hq-deacon (was gt-deacon)

This distinguishes town-level sessions from rig-level sessions which
continue to use gt- prefix (gt-gastown-witness, gt-gastown-crew-max, etc).

Changes:
- session.MayorSessionName() returns "hq-mayor"
- session.DeaconSessionName() returns "hq-deacon"
- ParseSessionName() handles both hq- and gt- prefixes
- categorizeSession() handles both prefixes
- categorizeSessions() accepts both prefixes
- Updated all tests and documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 00:42:24 -08:00
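Note on 6b8c897e37: after this change, session names carry either an hq- prefix (town-level) or a gt- prefix (rig-level), and parsing code has to accept both. A minimal sketch of that classification follows; `classifySession` and its return values are hypothetical and are not the signature of the real `ParseSessionName`:

```go
package main

import (
	"fmt"
	"strings"
)

// classifySession splits a session name into its scope and the remainder
// after the prefix. ok is false for names with neither known prefix.
func classifySession(name string) (scope, rest string, ok bool) {
	switch {
	case strings.HasPrefix(name, "hq-"):
		return "town", strings.TrimPrefix(name, "hq-"), true
	case strings.HasPrefix(name, "gt-"):
		return "rig", strings.TrimPrefix(name, "gt-"), true
	}
	return "", "", false
}

func main() {
	for _, n := range []string{"hq-mayor", "hq-deacon", "gt-gastown-witness", "scratch"} {
		scope, rest, ok := classifySession(n)
		fmt.Printf("%s: scope=%q rest=%q ok=%v\n", n, scope, rest, ok)
	}
}
```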
gastown/crew/jack
eea4435269 feat(rig): Add --branch flag for custom default branch
Add --branch flag to `gt rig add` to specify a custom default branch
instead of auto-detecting from remote. This supports repositories that
use non-standard default branches like `develop` or `release`.

Changes:
- Add --branch flag to `gt rig add` command
- Store default_branch in rig config.json
- Propagate default branch to refinery, witness, daemon, and all commands
- Rename ensureMainBranch to ensureDefaultBranch for clarity
- Add Rig.DefaultBranch() method for consistent access
- Update crew/manager.go and swarm/manager.go to use rig config

Based on PR #49 by @kustrun - rebased and extended with additional fixes.

Co-authored-by: kustrun <kustrun@users.noreply.github.com>

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 14:01:02 -08:00
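Note on eea4435269: the commit stores `default_branch` in the rig's config.json and exposes it through a single accessor. The sketch below illustrates that pattern under stated assumptions: the `rigConfig` struct and `loadRigConfig` are hypothetical, and the fallback to "main" when the field is empty is a simplification (the commit message says the real default is auto-detected from the remote):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rigConfig models only the one field this sketch needs.
type rigConfig struct {
	DefaultBranch string `json:"default_branch"`
}

// branch returns the configured default branch, or "main" when unset.
// The "main" fallback is an assumption made for this example.
func (c rigConfig) branch() string {
	if c.DefaultBranch != "" {
		return c.DefaultBranch
	}
	return "main"
}

// loadRigConfig decodes a config.json payload.
func loadRigConfig(data []byte) (rigConfig, error) {
	var c rigConfig
	err := json.Unmarshal(data, &c)
	return c, err
}

func main() {
	for _, raw := range []string{`{"default_branch":"develop"}`, `{}`} {
		c, err := loadRigConfig([]byte(raw))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(c.branch())
	}
}
```

Routing every caller through one accessor is what lets the refinery, witness, and daemon agree on the branch without each re-reading the config.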
jakehemmerle
c25ff34b5b fix(daemon): prevent orphan daemons via file locking
Add syscall.Flock() exclusive lock in daemon.Run() to prevent TOCTOU
race condition where concurrent 'gt daemon start' commands could spawn
multiple daemons. Only the first to acquire the lock succeeds; others
exit cleanly. Lock is per-town (in townRoot/daemon/daemon.lock) so
multiple GT instances from different directories work independently.

Also detect race losers in runDaemonStart() by comparing spawned PID
with PID file, reporting 'already running' instead of false success.
2026-01-04 15:30:57 -05:00