The Start() function was returning success even if the pane died during
initialization (e.g., if Claude failed to start). This caused the caller
to get a confusing "getting pane" error when trying to use the session.
Now Start() verifies the session is still running at the end, returning
a clear error message if the session died during startup.
Fixes: gt-0cif0s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When a polecat runs gt done, the worktree is removed but the parent
polecat directory could be left behind containing only .beads/. This
caused gt polecat list to show ghost entries since exists() checks
if the polecatDir exists.
The fix adds explicit cleanup of .beads/ directories:
1. After git worktree remove succeeds, clean up any leftover .beads/
in the clonePath that was not fully removed
2. For new structure polecats, also clean up any .beads/ at the
polecatDir level before trying to remove the parent directory
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds show subcommand to gt bead that delegates to gt show (which
delegates to bd show). This completes gt-zdwy58.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add prominent warnings about the mandatory gt done requirement:
- New 'THE IDLE POLECAT HERESY' section at top of both templates
- Emphasize that sitting idle after completing work is a critical failure
- Add MANDATORY labels to completion protocol sections
- Add final reminder section before metadata block
This addresses the bug where polecats complete work but don't run gt done,
sitting idle and wasting resources instead of properly shutting down.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The gt orphans kill command now performs a unified cleanup that removes
orphaned commits via git gc AND kills orphaned Claude processes in one
operation, with a single confirmation prompt.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Pin bd (beads CLI) to v0.47.1 in CI workflows and fix test agent IDs
that trigger bd's isLikelyHash() prefix extraction logic.
Changes:
- Pin bd to v0.47.1 in ci.yml and integration.yml (v0.47.2 has routing
defaults that cause prefix mismatch errors)
- Fix TestCloseAndClearAgentBead_FieldClearing: change agent IDs from
`test-testrig-polecat-0` to `test-testrig-polecat-all_fields_populated`
- Fix TestCloseAndClearAgentBead_ReasonVariations: change agent IDs from
`test-testrig-polecat-reason0` to `test-testrig-polecat-empty_reason`
Root cause: bd v0.47.1's isLikelyHash() treats suffixes of 3-8 chars
(with digits for 4+ chars) as potential git hashes. Patterns like `-0`
(single digit) and `-reason0` (7 chars with digit) caused bd to extract
the wrong prefix from agent IDs.
Using test names as suffixes (e.g., `all_fields_populated`) avoids this
because they're all >8 characters and won't trigger hash detection.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Formula scaffold beads (created when formulas are installed) were
appearing as actionable work items in `gt ready`. These are template
beads, not actual work.
Add filtering to exclude issues whose ID:
- Matches a formula name exactly (e.g., "mol-deacon-patrol")
- Starts with "<formula-name>." (step scaffolds like "mol-deacon-patrol.inbox-check")
The fix reads the formulas directory to get installed formula names
and filters issues accordingly for both town and rig beads.
Fixes: gt-579
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* feat: Add automatic orphaned claude process cleanup
Claude Code's Task tool spawns subagent processes that sometimes don't clean up
properly after completion. These accumulate and consume significant memory
(observed: 17 processes using ~6GB RAM).
This change adds automatic cleanup in two places:
1. **Deacon patrol** (primary): New patrol step "orphan-process-cleanup" runs
`gt deacon cleanup-orphans` early in each cycle. More responsive (~30s).
2. **Daemon heartbeat** (fallback): Runs cleanup every 3 minutes as safety net
when deacon is down.
Detection uses TTY column - processes with TTY "?" have no controlling terminal.
This is safe because:
- Processes in terminals (user sessions) have a TTY like "pts/0" - untouched
- Only kills processes with no controlling terminal
- Orphaned subagents are children of tmux server with no TTY
New files:
- internal/util/orphan.go: FindOrphanedClaudeProcesses, CleanupOrphanedClaudeProcesses
- internal/util/orphan_test.go: Tests for orphan detection
New command:
- `gt deacon cleanup-orphans`: Manual/patrol-triggered cleanup
Fixes#587
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(orphan): add Windows build tag and minimum age check
Addresses review feedback on PR #588:
1. Add //go:build !windows to orphan.go and orphan_test.go
- The code uses Unix-specific syscalls (SIGTERM, ESRCH) and
ps command options that don't exist on Windows
2. Add minimum age check (60 seconds) to prevent false positives
- Prevents race conditions with newly spawned subagents
- Addresses reviewer concern about cron/systemd processes
- Uses portable etime format instead of Linux-only etimes
3. Add parseEtime helper with comprehensive tests
- Parses [[DD-]HH:]MM:SS format (works on both Linux and macOS)
- etimes (seconds) is Linux-specific, etime is portable
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(orphan): add proper SIGTERM→SIGKILL escalation with state tracking
Previous approach used process age which doesn't work: a Task subagent
runs without TTY from birth, so a long-running legitimate subagent that
later fails to exit would be immediately SIGKILLed without trying SIGTERM.
New approach uses a state file to track signal history:
1. First encounter → SIGTERM, record PID + timestamp in state file
2. Next cycle (after 60s grace period) → if still alive, SIGKILL
3. Next cycle → if survived SIGKILL, log as unkillable and remove
State file: $XDG_RUNTIME_DIR/gastown-orphan-state (or /tmp/)
Format: "<pid> <signal> <unix_timestamp>" per line
The state file is automatically cleaned up:
- Dead processes removed on load
- Unkillable processes removed after logging
Also updates callers to use new CleanupResult type which includes
the signal sent (SIGTERM, SIGKILL, or UNKILLABLE).
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Fixes#566
The daemon spawned 812 refinery sessions over 4 days because:
1. Zombie detection was too strict - used IsAgentRunning(session, "node")
but Claude reports pane command as version number (e.g., "2.1.7"),
causing healthy sessions to be killed and recreated every heartbeat.
2. daemon.json patrol config was completely ignored - the daemon never
loaded or checked the enabled flags.
Changes:
- refinery/manager.go: Use IsClaudeRunning() instead of IsAgentRunning()
for robust Claude detection (handles "node", "claude", version patterns)
- daemon/types.go: Add PatrolConfig types and LoadPatrolConfig() to read
mayor/daemon.json
- daemon/daemon.go: Load patrol config at startup, check enabled flags
before calling ensureRefineriesRunning/ensureWitnessesRunning, add
diagnostic logging for "already running" cases
Tested: Verified over multiple heartbeats that refinery shows "already
running, skipping spawn" instead of spawning new sessions.
Co-authored-by: mayor <your-github-email@example.com>
Each rig now gets a deterministic theme based on its name instead of
always defaulting to mad-max. Uses a prime multiplier hash (×31) for
good distribution across themes. Same rig name always gets the same
theme. Users can still override with `gt namepool set`.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix: stabilize beads and config tests
* fix: remove t.Parallel() incompatible with t.Setenv()
The test now uses t.Setenv() which cannot be used with t.Parallel() in Go.
This completes the conflict resolution from the rebase.
* style: fix gofmt issue in beads_test.go
Remove extra blank line in comment block.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
- gt hook --clear: alias for 'gt unhook' (gt-eod2iv)
- gt close: wrapper for 'bd close' (gt-msak6o)
- gt bead move: move beads between repos (gt-dzdbr7)
These commands were natural guesses that agents tried but didn't exist.
Following the desire-paths approach to improve agent ergonomics.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When `gt rig add` fails due to GitHub password auth being disabled,
provide a helpful error message that:
- Explains that GitHub no longer supports password authentication
- Suggests the equivalent SSH URL for GitHub/GitLab repos
- Falls back to generic SSH suggestion for other hosts
Also adds tests for the URL conversion function.
Fixes#548
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
When attaching to a session from within tmux, use 'tmux switch-client'
instead of 'tmux attach-session' to avoid the nested session error.
Fixes#603
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(daemon): prevent runaway refinery session spawning
Fixes#566
The daemon spawned 812 refinery sessions over 4 days because:
1. Zombie detection was too strict - used IsAgentRunning(session, "node")
but Claude reports pane command as version number (e.g., "2.1.7"),
causing healthy sessions to be killed and recreated every heartbeat.
2. daemon.json patrol config was completely ignored - the daemon never
loaded or checked the enabled flags.
Changes:
- refinery/manager.go: Use IsClaudeRunning() instead of IsAgentRunning()
for robust Claude detection (handles "node", "claude", version patterns)
- daemon/types.go: Add PatrolConfig types and LoadPatrolConfig() to read
mayor/daemon.json
- daemon/daemon.go: Load patrol config at startup, check enabled flags
before calling ensureRefineriesRunning/ensureWitnessesRunning, add
diagnostic logging for "already running" cases
Tested: Verified over multiple heartbeats that refinery shows "already
running, skipping spawn" instead of spawning new sessions.
* fix: Add grace period to prevent Deacon restart loop
The daemon had a race condition where:
1. ensureDeaconRunning() starts a new Deacon session
2. checkDeaconHeartbeat() runs in same heartbeat cycle
3. Heartbeat file is stale (from before crash)
4. Session is immediately killed
5. Infinite restart loop every 3 minutes
Fix:
- Track when Deacon was last started (deaconLastStarted field)
- Skip heartbeat check during 5-minute grace period
- Add config support for Deacon (consistency with refinery/witness)
After grace period, normal heartbeat checking resumes. Genuinely
stuck sessions (no heartbeat update after 5+ min) are still detected.
Fixes#589
---------
Co-authored-by: mayor <your-github-email@example.com>
When JSON parsing of inbox output fails, the code falls back to plain
text mode. However, the error from the fallback `gt mail inbox` command
was being silently ignored with `_`, masking failures and making
debugging difficult.
This change properly captures and returns the error if the fallback
command fails.
Co-authored-by: Gastown Bot <bot@gastown.ai>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Add CleanExcludingBeads() method that returns true if the only uncommitted
changes are .beads/ database files. These files are synced across worktrees
and shouldn't block polecat cleanup.
Fixes#516
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
KillSessionWithProcesses was only killing descendant processes,
assuming the session kill would terminate the pane process itself.
However, if the pane process (claude) calls setsid(), it detaches
from the controlling terminal and survives the session kill.
This fix explicitly kills the pane PID after killing descendants,
before killing the tmux session. This catches processes that have
escaped the process tree via setsid().
Fixes#513
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
When multiple agents start simultaneously (e.g., `gt up`), each runs
`gt nudge deacon session-started` in their SessionStart hook. These
nudges arrive concurrently and can interleave in the tmux input buffer,
causing:
1. Text from one nudge mixing with another
2. Enter keys not properly submitting messages
3. Garbled input requiring manual intervention
This fix adds per-session mutex serialization to NudgeSession() and
NudgePane(). Concurrent nudges to the same session now queue and
execute one at a time.
## Root Cause
The NudgeSession pattern sends text, waits 500ms, sends Escape, waits
100ms, then sends Enter. When multiple nudges arrive within this ~800ms
window, their send-keys commands interleave, corrupting the input.
## Alternatives Considered
1. **Delay deacon nudges** - Add sleep before nudge in SessionStart
- Simplest (one-line change)
- But: doesn't prevent concurrent nudges from multiple agents
2. **Debounce session-started** - Deacon ignores rapid-fire nudges
- Medium complexity
- But: only helps session-started, not other nudge types
3. **File-based signaling** - Replace tmux nudges with file watches
- Avoids tmux input issues entirely
- But: significant architectural change
4. **File upstream bug** - Report to Claude Code team
- SessionStart hooks fire async and can interleave
- But: fix timeline unknown, need robustness now
## Tradeoffs
- Concurrent nudges to same session now queue (adds latency)
- Memory: one mutex per unique session name (bounded, acceptable)
- Does not fix Claude Code's underlying async hook behavior
## Testing
- Build passes
- All tmux package tests pass
- Manual testing: started deacon + multiple witnesses concurrently,
nudges processed correctly without garbled input
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Add tests to verify that rig.Manager.AddRig correctly creates witness
and refinery agent beads via initAgentBeads. Also improve mock bd:
- Fix mock bd to handle --no-daemon --allow-stale global flags
- Return valid JSON for create commands with bead ID
- Log create commands for test verification
- Add TestRigAddCreatesAgentBeads integration test
- Add TestAgentBeadIDs unit test for bead ID generation
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* fix(mq): skip closed MRs in list, next, and ready views (gt-qtb3w)
The gt mq list command with --status=open filter was incorrectly displaying
CLOSED merge requests as 'ready'. This occurred because bd list --status=open
was returning closed issues.
Added manual status filtering in three locations:
- mq_list.go: Filter closed MRs in all list views
- mq_next.go: Skip closed MRs when finding next ready MR
- engineer.go: Skip closed MRs in refinery's ready queue
Also fixed build error in mail_queue.go where QueueConfig struct (non-pointer)
was being compared to nil.
Workaround for upstream bd list status filter bug.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* style: fix gofmt issue in engineer.go comment block
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
The help text claimed 'gt mail read' marks messages as read, but this
was intentionally removed in 71d313ed to preserve handoff messages.
Update the help text to accurately reflect the current behavior and
point users to 'gt mail mark-read' for explicit read marking.
Add validateIssue() to check that an issue exists and is not tombstoned
before creating the tmux session. This prevents CPU spin loops from
agents retrying work on invalid issues.
Fixes#569
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When gt doctor runs, it now detects and kills zombie sessions - tmux
sessions that are valid Gas Town sessions (gt-*, hq-*) but have no
Claude/node process running inside. These occur when Claude exits or
crashes but the tmux session remains.
Previously, OrphanSessionCheck only validated session names but did not
check if Claude was actually running. This left empty sessions
accumulating over time.
Fixes#472
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
WriteRoutes() would fail if the beads directory didn't exist yet.
Add os.MkdirAll before creating the routes file.
Fixes#552
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Formula scaffolds (beads with IDs starting with "mol-") are templates
created when formulas are installed, not actual work items. They were
incorrectly appearing in gt ready output as actionable work.
Fixes#579
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
ListUnread() was returning all messages in beads mode instead of
filtering by the Read field. Apply the same filtering logic used
in legacy mode to both code paths.
Fixes#595
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The test was duplicating the icon selection logic in a switch statement
instead of calling the actual function being tested. Extract the icon
logic into getMigrationStatusIcon() and have the test call it directly.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(beads): cache version check and add timeout to prevent cli lag
* fix(migrate_agents_test): fix icon expectations to match actual output
The printMigrationResult function uses icons with two leading spaces
(" ✓", " ⊘", " ✗") but the test expected icons without spaces.
This fixes the test expectations to match the actual output format.
When using `gt sling <formula> --on <bead>`, the wisp was bonded to the
target bead but the attached_molecule field wasn't being set in the
bead's description. This caused `gt hook` to report "No molecule
attached" even though the formula was correctly bonded.
Now both sling.go (--on mode) and sling_formula.go (standalone formula)
call storeAttachedMoleculeInBead() to record the molecule attachment
after wisp creation. This ensures gt hook can properly display molecule
progress.
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Branch names like "polecat/furiosa-mkb0vq9f" don't contain the actual
issue ID, causing gt done to incorrectly parse "furiosa-mkb0vq9f" as the
issue. This broke integration branch auto-detection since the wrong issue
was used for parent epic lookup.
Changes:
- After parsing branch name, check the agent's hook_bead field which
contains the actual issue ID (e.g., "gt-845.1")
- Fix parseBranchName to not extract fake issue IDs from modern polecat branches
- Fix detectIntegrationBranch to traverse full parent chain (molecule → bug → epic)
- Include issue ID in polecat branch names when HookBead is set
Added tests covering:
- Agent hook returns correct issue ID
- Modern polecat branch format parsing
- Integration branch detection through parent chain
Fixes#411
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
IsSilentExit used type assertion which fails on wrapped errors.
Changed to errors.As to properly unwrap and detect SilentExitError.
Added test to verify wrapped error detection works.
The stop hook runs 'gt costs record' which executes 'bd create' to
record session costs. When run from a role subdirectory (e.g., mayor/)
that doesn't have its own .beads database, bd fails with:
'database not initialized: issue_prefix config is missing'
Fix by using workspace.FindFromCwd() to locate the town root and
setting bdCmd.Dir to run bd from there, where the .beads database
exists.
- Add sqlite3 to README.md prerequisites section
- Add gt doctor check that warns if sqlite3 CLI is not found
- Documents that sqlite3 is required for convoy database queries
Fixes#534
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
detectTownRoot() was only checking for mayor/town.json, but some
workspaces only have the mayor/ directory without town.json.
This caused mail routing to fail silently - messages showed
success but werent persisted because townRoot was empty.
Now uses workspace.Find() which supports both primary marker
(mayor/town.json) and secondary marker (mayor/ directory).
Fixes: gt-6v7z89
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Desire path: agents naturally try 'gt show <id>' to inspect beads.
This wraps 'bd show' via syscall.Exec, passing all flags through.
- Works with any prefix (gt-, bd-, hq-, etc.)
- Routes to correct beads database automatically
- DisableFlagParsing passes all flags to bd show
Closes gt-82jxwx
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements the desire-path from bd-dcahx: agents naturally try
'gt cat <bead-id>' to view bead content, following Unix conventions.
The command validates bead ID prefixes (bd-*, hq-*, mol-*) and
delegates to 'bd show' for the actual display.
Supports --json flag for programmatic use.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The 'cat' alias for 'gt polecat' was never used by agents.
Removing it frees up 'cat' for a more intuitive use case:
displaying bead content (gt cat <bead-id>).
See: bd-dcahx
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add CreatedAt timestamp to CreateGroupBead() in beads_group.go
- Add CreatedAt timestamp to CreateChannelBead() in beads_channel.go
- Check channel status before sending in router.go sendToChannel()
- Reject sends to closed channels with appropriate error message
Closes: gt-yibjdm, gt-bv2f97
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The router was missing support for beads-native channel addresses.
When mail_send.go resolved an address to RecipientChannel, it set
msg.To to "channel:<name>" but router.Send() had no handler for this
prefix, causing channel messages to fail silently.
Added:
- isChannelAddress() and parseChannelName() helper functions
- sendToChannel() method that creates messages with proper channel:
labels for channel queries
- Channel validation before sending
- Retention enforcement after message creation
Also updated docs/beads-native-messaging.md with more comprehensive
documentation of the beads-native messaging system.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Groups and channels are town-level entities that span rigs, so they
should use the hq- prefix rather than gt- (rig-level).
Changes:
- GroupBeadID: gt-group- → hq-group-
- ChannelBeadID: gt-channel- → hq-channel-
- Add --force flag to bypass prefix validation (town beads may have
mixed prefixes from test runs)
- Update tests and documentation
Also adds docs/beads-native-messaging.md documenting:
- New bead types (gt:group, gt:queue, gt:channel)
- CLI commands (gt mail group, gt mail channel)
- Address resolution logic
- Usage examples
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add beads-native queue management commands to gt mail:
- gt mail queue create <name> --claimers <pattern>
- gt mail queue show <name>
- gt mail queue list
- gt mail queue delete <name>
Also enhanced QueueFields struct with CreatedBy and CreatedAt fields
to support queue metadata tracking.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Change ChannelBeadID to use hq-channel-* prefix instead of gt-channel-*
to match the town-level beads database prefix, fixing the "prefix mismatch"
error when creating channels.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implement claiming for queue messages using beads-native approach:
- Add claim_pattern field to QueueFields for eligibility checking
- Add MatchClaimPattern function for pattern matching (wildcards supported)
- Add FindEligibleQueues to find all queues an agent can claim from
- Rewrite runMailClaim to use beads-native queue lookup
- Support optional queue argument (claim from any eligible if not specified)
- Use claimed-by/claimed-at labels instead of changing assignee
- Update runMailRelease to work with new claiming approach
- Add comprehensive tests for pattern matching and validation
Queue messages are now claimed via labels:
- claimed-by: <agent-identity>
- claimed-at: <RFC3339 timestamp>
Messages with queue:<name> label but no claimed-by are unclaimed.
Closes gt-xfqh1e.11
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add gt mail channel subcommands for beads-native channels:
- gt mail channel [name] - list channels or show messages
- gt mail channel list - list all channels
- gt mail channel show <name> - show channel messages
- gt mail channel create <name> [--retain-count=N] [--retain-hours=N]
- gt mail channel delete <name>
Channels are pub/sub streams for broadcast messaging with retention policies.
Messages are stored with channel:<name> label and retrieved via beads queries.
Part of gt-xfqh1e.12 (channel viewing task).
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>