Two improvements:
1. gt crew start now infers rig from cwd when first arg is not a valid
rig name (gt-czltv). Previously, running `gt crew start bob` from
within a rig directory would fail because "bob" was treated as the
rig name. Now it checks if the arg is a valid rig first.
2. Refactored copyOverlay to shared rig.CopyOverlay utility:
- Eliminates code duplication between crew and polecat managers
- Preserves source file permissions instead of hardcoding 0644
- Follows PR #278 improvements
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds a visible "CRITICAL" warning and pre-submission checklist to the
polecat template. Explicitly notes that polecats should NOT manually
close the root issue - the Refinery handles that after merge.
This addresses the intent of PR #287 while avoiding the conflicting
`bd close` instruction that would break the Refinery workflow.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add new step to mol-deacon-patrol.formula.toml that discovers molecules
blocked on gates that have now closed, and dispatches them to the
appropriate rig's polecat pool.
This completes the async resume cycle without explicit waiter tracking.
The molecule state IS the waiter - patrol discovers reality each cycle.
- Uses bd mol ready --gated to find gate-ready molecules
- Dispatches via gt sling <mol-id> <rig>/polecats
- Runs after gate-evaluation, before health-scan
- Bumps formula version to 6
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When rig/.beads doesn't exist, fall back to mayor/rig/.beads (tracked
beads architecture) with a warning suggesting 'bd doctor' to fix.
This restores behavior that was inadvertently removed in #290, which
simplified SetupRedirect but removed the fallback path.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace the hand-rolled contains() function with the standard library
strings.Contains(). Also removes the redundant len(data) > 0 check
since strings.Contains handles empty strings correctly.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The t.Skipf call had a raw newline inside a double-quoted string,
which is invalid Go syntax. Use \n escape sequence instead.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The SessionHookCheck was incorrectly flagging 'gt prime --hook' as invalid,
only accepting 'session-start.sh' wrapper. The --hook flag properly handles
session_id passthrough via stdin JSON, making it a valid alternative.
Changes:
- Update usesSessionStartScript to accept --hook flag
- Add containsFlag helper to prevent false positives (e.g., --hookup)
- Update error messages and fix hints to suggest both options
- Add comprehensive tests including edge cases
Tests cover:
- Bare gt prime (fails)
- gt prime --hook (passes)
- gt prime --hookup (fails - not a valid flag)
- gt prime --verbose --hook (passes - flag order doesn't matter)
- session-start.sh (passes)
- Mixed valid/invalid hooks in same file
- Town-level and rig-level settings
- Add custom types config after bd init in daemon tests
- Replace fixed sleeps with poll-based waiting in tmux tests
- Skip beads integration test for JSONL-only repos
Fixes flaky test failures in parallel execution.
Adds RigBeadsCheck to gt doctor to verify rig identity beads exist.
These beads track rig metadata (git URL, prefix, state) and are created
by gt rig add. The check scans routes.jsonl and verifies each rig
has an identity bead, with --fix to create missing ones.
Recovered from furiosa's uncommitted work after worker interruption.
Co-Authored-By: furiosa <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add overlay directory support to automatically copy gitignored files
(like .env, config files) from <rig>/.runtime/overlay/ to polecat
and crew worktree roots when they are spawned.
This allows services started by polecats/crew to have their required
configuration files at the root without committing them to git.
Changes:
- Add copyOverlay() function to polecat and crew managers
- Call copyOverlay() after setupSharedBeads() in AddWithOptions/RepairWorktreeWithOptions
- Non-fatal: overlay copy failures only log warnings, don't block spawn
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Claude Code can report its pane command as "node", "claude", or a version
number like "2.0.76". Previously only "node" was detected, causing healthy
sessions to be incorrectly identified as zombies and killed during daemon
heartbeat recovery.
This fix detects all three patterns to prevent witness sessions from being
killed every 3 minutes.
Based on michaellady's work in PR #174.
Co-Authored-By: michaellady <michaellady@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The orphan-processes check previously killed any Claude process without
a tmux ancestor, which incorrectly targeted user's personal Claude
sessions running in regular terminals.
Now the check is informational only:
- Changed from FixableCheck to BaseCheck (no auto-fix)
- Returns StatusOK with details listing processes outside tmux
- Message advises user to verify processes are expected
- Removed Fix method and related helpers
The orphan-sessions check remains fixable since it only targets gt-*
sessions that don't match valid Gas Town patterns.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The heartbeat now explicitly calls ensureDeaconRunning() for basic
"is Deacon alive" checks, while Boot handles intelligent triage
(stuck/nudge/interrupt decisions).
Changed ensureDeaconRunning to use deacon.Manager.Start() instead of
duplicating startup logic. This gives daemon the same benefits:
- WaitForShellReady (fixes race condition)
- Claude settings setup
- Theming
- StartupNudge and PropulsionNudge (GUPP)
Heartbeat order:
1. ensureDeaconRunning - restart if dead (via Manager)
2. ensureBootRunning - intelligent triage for stuck states
3. checkDeaconHeartbeat - belt-and-suspenders fallback
4-11. Other checks (witnesses, refineries, polecats, etc.)
This was inadvertently removed when Boot was introduced, which
delegated all Deacon checks to Boot. But Boot's mol doesn't actually
restart Deacon - it just reports. Now responsibilities are clear:
daemon ensures alive, Boot ensures responsive.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The deacon tmux session is named hq-deacon, not gt-deacon. Fix the
incorrect references in mol-boot-triage and mol-gastown-boot formulas.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The formula parser only supports TOML (uses toml.Decode). The JSON
version of mol-gastown-boot was never used - it was likely created
by mistake or for an abandoned experiment.
Changes:
- Remove .beads/formulas/mol-gastown-boot.formula.json
- Remove internal/formula/formulas/mol-gastown-boot.formula.json
- Simplify go:generate to only copy .formula.toml files
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Previously GT_ROOT was documented as a formula search path but never
actually set, making $GT_ROOT/.beads/formulas/ unreachable for agents.
Now BuildStartupCommand automatically sets GT_ROOT to the town root,
enabling all agents (witness, refinery, polecat, crew, etc.) to find
town-level formulas without relying on cwd-relative paths.
Also adds a doctor check (gt-root-env) that warns when existing sessions
are missing GT_ROOT, with instructions to restart sessions.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The SetupRedirect function was failing for rigs that use the tracked
beads architecture where the canonical beads location is mayor/rig/.beads
and there is no rig-level .beads directory.
This fix now checks for both locations:
1. rig/.beads (with optional redirect to mayor/rig/.beads)
2. mayor/rig/.beads directly (if no rig/.beads exists)
This ensures crew and polecat workspaces get the correct redirect file
pointing to the shared beads database in all configurations.
Closes: gt-jy77g
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds the ability to switch between Claude Code accounts with a single command:
gt account switch <handle>
The command:
1. Detects current account by checking ~/.claude symlink target
2. If ~/.claude is a real directory, moves it to the current account config_dir
3. Removes existing ~/.claude symlink (if any)
4. Creates symlink from ~/.claude to target account config_dir
5. Updates default account in accounts.json
6. Prints confirmation with restart reminder
Handles edge cases:
- Already on target account (no-op with message)
- Target account does not exist (error with list of valid accounts)
- ~/.claude is real directory (first-time setup scenario)
Closes gt-jd8m1
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add explicit handoff/cycling heuristics for the Witness role:
- Hand off after 15 patrol loops (vs Deacon's 20)
- Immediate handoff after extraordinary actions
- Define extraordinary actions specific to Witness role
- Add Handoff (Wisp-Based) section explaining idempotent patrols
This brings Witness documentation in line with Deacon's level of
detail for context cycling.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The manager refactors (ea8bef2, 72544cc0) conflicted with the agent
override feature, causing regressions:
Deacon (ea8bef2):
- Lost agentOverride parameter
- Re-added respawn loop (removed in 5f2e16f)
- Lost GUPP (startup + propulsion nudges)
Crew (72544cc0):
- Lost agentOverride wiring to StartOptions
- --agent flag had no effect on crew refresh/restart
This fix restores agent override support and GUPP while keeping
improvements from the manager refactors (zombie detection, etc).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* feat: Add rig-level custom agent support
Implement rig-level custom agent configuration support to enable per-rig
agent definitions in <rig>/settings/config.json, following the same pattern as
town-level agents in settings/config.json.
Changes:
- Added RigSettings.Agents field to internal/config/types.go
- Added DefaultRigAgentRegistryPath() and LoadRigAgentRegistry() functions to internal/config/agents.go
- Updated ResolveAgentConfigWithOverride() to accept and pass rigSettings parameter
- Updated GetRuntimeCommandWithAgentOverride() to use rigSettings when available
- Updated GetRuntimeCommandWithPromptAndAgentOverride() to use rigSettings
- Updated all Build*WithOverride functions to pass rigSettings
This fixes the issue where rig-level agent settings were loaded but
ignored by lookupAgentConfig, enabling per-rig custom agents for
polecats and crew members.
* test: Add rig-level custom agent tests
Added comprehensive unit tests for rig agent registry functions:
- TestDefaultRigAgentRegistryPath: verifies path construction
- TestLoadRigAgentRegistry: verifies file loading and JSON parsing
- TestLookupAgentConfigWithRigSettings: verifies agent lookup priority (rig > town > builtin)
Added placeholder integration test for future CI/CD setup.
* initial commit
* fix: resolve compilation errors in rig-level custom agent support
- Add missing RigAgentRegistryPath function (alias for DefaultRigAgentRegistryPath)
- Restore ResolveAgentConfigWithOverride function that was incorrectly removed
- Fix ResolveAgentConfig to return single value (not triple)
- Add initRegistryLocked() call to LoadRigAgentRegistry to prevent nil panic
- Fix DefaultRigAgentRegistryPath to use rigPath directly (not parent dir)
- Fix test file syntax errors (remove EOF artifacts)
- Fix test parameter order for lookupAgentConfig calls
- Fix test expectations to match correct custom agent override behavior
* test: implement rig-level custom agent integration test
- Add stub agent script that simulates AI agent with Q&A capability
- Test ResolveAgentConfig correctly picks up rig-level agents
- Test BuildPolecatStartupCommand includes custom agent command
- Test ResolveAgentConfigWithOverride respects rig agents
- Test rig agents override town agents with same name
- Add tmux integration test that spawns session and verifies output
- Stub agent echoes 'STUB_AGENT_STARTED' and handles ping/pong Q&A
- All tests pass including real tmux session verification
* docs: add OpenCode custom agent example to reference
- Show settings/agents.json format for advanced configs
- Include OpenCode example with session resume flags
- Document OPENCODE_PERMISSION env var for autonomous mode
* fix: improve rig-level agent support with docs and test fixes
- Add rig-level agent documentation to reference.md
- Document agent resolution order (rig → town → built-in)
- Deduplicate LoadAgentRegistry/LoadRigAgentRegistry into shared helper
- Fix test isolation in TestLoadRigAgentRegistry
- Fix nil pointer dereference in test assertions (use t.Fatal not t.Error)
Replaces inline ensureRefinerySession function with refinery.NewManager(r).Start(false) in gt start --all. Gains zombie detection, proper state tracking, and WaitForShellReady fix.
CI failures (lint in beads.go, integration tests) are pre-existing issues unrelated to this PR's changes.
Co-Authored-By: julianknutsen <julianknutsen@users.noreply.github.com>
The Deacon patrol formula's zombie-scan step now:
- Only detects zombies via --dry-run, never kills directly
- Files death warrants for Boot to handle interrogation/execution
- Includes psychological weight language about termination gravity
This prevents accidental destruction of worker context, mid-task
progress, and unsaved state. Kill authority belongs to Boot.
Bumped version to 5.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add WaitForShellReady call before SendKeys in all agent managers
(deacon, mayor, witness, refinery). This prevents intermittent
"can't find pane" errors that occur when the tmux session is
created but the shell isn't ready to receive input yet.
The issue manifests under load (e.g., during `gt up` when multiple
agents start in sequence) where the 200ms delay in SendKeysDelayed
isn't sufficient for the pane to be fully initialized.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* bd sync: 2026-01-05 06:22:43
* bd sync: 2026-01-05 07:08:42
* bd sync: 2026-01-05 07:24:58
* feat: Add code coverage PR comment to GitHub Actions
Adds a step to the CI workflow that:
- Collects code coverage during test runs
- Parses per-package coverage percentages
- Posts a markdown table comment on PRs with:
- Overall coverage percentage
- Per-package breakdown table
- Updates existing comment on subsequent pushes
Closes: ga-tl5
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(ci): handle fork PR permissions for coverage comment
Fork PRs cannot write comments via GITHUB_TOKEN due to security
restrictions. Add condition to skip comment step for external PRs
and upload coverage report as artifact instead.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* refactor(ci): separate coverage into dedicated job
- Test job now uploads coverage.out and test-output.txt as artifacts
- New Coverage Report job runs after tests complete
- Downloads coverage data, generates report, uploads as artifact
- Always uploads coverage-report artifact (for both fork and internal PRs)
- Comments on PR only for internal PRs (fork PRs get notice message)
- Cleaner separation of concerns
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(ci): coverage job waits for both test and integration
Coverage Report job now depends on [test, integration] to ensure
it only runs after all test stages complete successfully.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(ci): restore Coverage Report job after Test and Integration
Coverage Report job now properly:
- Depends on [test, integration] - waits for both to complete
- Downloads coverage data from Test job
- Generates and uploads coverage-report artifact
- Comments on internal PRs only
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* test: add debugging output to TestInstallTownRoleSlots
Add logging for gt install output and bd list to help diagnose
CI failures where agent beads may not be created.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(ci): update beads to @main and fix lint errors
- Change CI to install beads from @main instead of @latest
(latest release doesn't support role/agent issue types)
- Remove error return from cleanBeadsRuntimeFiles since all
errors are intentionally ignored (best-effort cleanup)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(ci): pin beads to v0.44.0 for agent/role types
Beads main recently extracted Gas Town-specific types (agent, role, etc.)
from core. Pin CI to v0.44.0 which still has these types.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(ci): unpin beads version back to @latest
Beads v0.46.0 now supports agent/role types again.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* chore: remove stale gastown/.beads files from PR
These beads files are local runtime state that shouldn't be committed.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Defines the state machine that Dogs execute for death warrants:
- 3-attempt interrogation with escalating timeouts (60s, 120s, 240s)
- PARDON path when session responds with ALIVE
- EXECUTE path after all attempts exhausted
- EPITAPH step for audit logging
Key design decisions documented:
- Dogs are goroutines, not Claude sessions
- Timeout gates close on timer OR early response detection
- State persisted to ~/gt/deacon/dogs/active/ for crash recovery
Implements specification for gt-cd404.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When gt doctor --fix detects stale Claude settings at town root, it was
automatically killing ALL Gas Town sessions (gt-* and hq-*). This is too
disruptive because:
1. Deacon runs gt doctor automatically, creating a restart loop
2. Active crew/polecat work could be lost mid-task
3. Settings are only read at startup, so running agents already have
the config loaded in memory
Instead, warn the user and tell them to restart agents manually:
"Town-root settings were moved. Restart agents to pick up new config:
gt up --restart"
Addresses PR #239 feedback.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>