Commit Graph

96 Commits

Author SHA1 Message Date
aleiby
b9f5797b9e fix(refinery): use role-specific runtime config for startup (#756) (#847)
Use ResolveRoleAgentConfig instead of LoadRuntimeConfig to properly
resolve role-specific agent settings (like ready_prompt_prefix) when
starting the refinery. This fixes timeout issues when role_agents.refinery
is configured with a non-default agent.

Also move AcceptBypassPermissionsWarning before WaitForRuntimeReady to
avoid a race condition where the bypass dialog can block prompt detection.

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21 19:11:39 -08:00
Advaya Krishna
d67aa0212c feat(refinery): use squash merge to eliminate redundant merge commits (#856)
Replace MergeNoFF with MergeSquash in the Refinery to preserve the original
conventional commit message (feat:/fix:) from polecat branches instead of
creating "Merge polecat/... into main" commits.

Changes:
- Add MergeSquash function to internal/git/git.go
- Add GetBranchCommitMessage helper to retrieve branch commit messages
- Update engineer.go doMerge to use squash merge with original message

Fixes #855

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21 18:12:40 -08:00
dementus
d0a1e165e5 feat(convoy): add redundant observers to Witness and Refinery
Per PRIMING.md principle "Redundant Monitoring Is Resilience", add convoy
completion checks to Witness and Refinery for redundant observation:

- New internal/convoy/observer.go with shared CheckConvoysForIssue function
- Witness: checks convoys after successful polecat nuke in HandleMerged
- Refinery: checks convoys after closing source issue in both success handlers

Multiple observers closing the same convoy is idempotent - each checks if
convoy is already closed before running `gt convoy check`.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 20:48:02 -08:00
furiosa
78ca8bd5bf fix(witness,refinery): remove ZFC-violating state types
Remove Witness and Refinery structs that recorded observable state
(State, PID, StartedAt, etc.) in violation of ZFC and "Discover,
Don't Track" principles.

Changes:
- Remove Witness struct and State type alias from witness/types.go
- Remove Refinery struct and State type alias from refinery/types.go
- Remove deprecated run(*Refinery) method from refinery/manager.go
- Update witness/types_test.go to remove tests for deleted types

The managers already derive running state from tmux sessions
(following the deacon pattern). The deleted types were vestigial
and unused.

Resolves: gt-r5pui

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 20:45:28 -08:00
gastown/crew/mel
5218102f49 refactor(witness,refinery): ZFC-compliant state management
Remove state files from witness and refinery managers, following
the "Discover, Don't Track" principle. Tmux session existence is
now the source of truth for running state (like deacon).

Changes:
- Add IsRunning() that checks tmux HasSession
- Change Status() to return *tmux.SessionInfo
- Remove loadState/saveState/stateManager
- Simplify Start()/Stop() to not use state files
- Update CLI commands (witness/refinery/rig) for new API
- Update tests to be ZFC-compliant

This fixes state file divergence issues where witness/refinery
could show "running" when the actual tmux session was dead.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 20:00:56 -08:00
aleiby
15d1dc8fa8 fix: Make WaitForCommand/WaitForRuntimeReady fatal in manager Start() (#529)
Fixes #525: gt up reports deacon success but session doesn't actually start

Previously, WaitForCommand failures were marked as "non-fatal" in the
manager Start() methods used by gt up. This caused gt up to report
success even when Claude failed to start, because the error was silently
ignored.

Now when WaitForCommand or WaitForRuntimeReady times out:
1. The zombie tmux session is killed
2. An error is returned to the caller
3. gt up properly reports the failure

This aligns the manager Start() behavior with the cmd start functions
(e.g., gt deacon start) which already had fatal WaitForCommand behavior.

Changed files:
- internal/deacon/manager.go
- internal/mayor/manager.go
- internal/witness/manager.go
- internal/refinery/manager.go

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 00:00:53 -08:00
Walter McGivney
29f8dd67e2 fix: Add grace period to prevent Deacon restart loop (#590)
* fix(daemon): prevent runaway refinery session spawning

Fixes #566

The daemon spawned 812 refinery sessions over 4 days because:

1. Zombie detection was too strict - used IsAgentRunning(session, "node")
   but Claude reports pane command as version number (e.g., "2.1.7"),
   causing healthy sessions to be killed and recreated every heartbeat.

2. daemon.json patrol config was completely ignored - the daemon never
   loaded or checked the enabled flags.

Changes:
- refinery/manager.go: Use IsClaudeRunning() instead of IsAgentRunning()
  for robust Claude detection (handles "node", "claude", version patterns)
- daemon/types.go: Add PatrolConfig types and LoadPatrolConfig() to read
  mayor/daemon.json
- daemon/daemon.go: Load patrol config at startup, check enabled flags
  before calling ensureRefineriesRunning/ensureWitnessesRunning, add
  diagnostic logging for "already running" cases

Tested: Verified over multiple heartbeats that refinery shows "already
running, skipping spawn" instead of spawning new sessions.

* fix: Add grace period to prevent Deacon restart loop

The daemon had a race condition where:
1. ensureDeaconRunning() starts a new Deacon session
2. checkDeaconHeartbeat() runs in same heartbeat cycle
3. Heartbeat file is stale (from before crash)
4. Session is immediately killed
5. Infinite restart loop every 3 minutes

Fix:
- Track when Deacon was last started (deaconLastStarted field)
- Skip heartbeat check during 5-minute grace period
- Add config support for Deacon (consistency with refinery/witness)

After grace period, normal heartbeat checking resumes. Genuinely
stuck sessions (no heartbeat update after 5+ min) are still detected.

Fixes #589

---------

Co-authored-by: mayor <your-github-email@example.com>
2026-01-16 15:27:41 -08:00
Erik LaBianca
4fa6cfa0da fix(mq): skip closed MRs in list, next, and ready views (#563)
* fix(mq): skip closed MRs in list, next, and ready views (gt-qtb3w)

The gt mq list command with --status=open filter was incorrectly displaying
CLOSED merge requests as 'ready'. This occurred because bd list --status=open
was returning closed issues.

Added manual status filtering in three locations:
- mq_list.go: Filter closed MRs in all list views
- mq_next.go: Skip closed MRs when finding next ready MR
- engineer.go: Skip closed MRs in refinery's ready queue

Also fixed build error in mail_queue.go where QueueConfig struct (non-pointer)
was being compared to nil.

Workaround for upstream bd list status filter bug.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* style: fix gofmt issue in engineer.go comment block

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-16 15:23:28 -08:00
Julian Knutsen
e7ca4908dc refactor(config): remove BEADS_DIR from agent environment and add doctor check (#455)
* fix(sling_test): update test for cook dir change

The cook command no longer needs database context and runs from cwd,
not the target rig directory. Update test to match this behavior
change from bd2a5ab5.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(tests): skip tests requiring missing binaries, handle --allow-stale

- Add skipIfAgentBinaryMissing helper to skip tests when codex/gemini
  binaries aren't available in the test environment
- Update rig manager test stub to handle --allow-stale flag

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor(config): remove BEADS_DIR from agent environment

Stop exporting BEADS_DIR in AgentEnv - agents should use beads redirect
mechanism instead of relying on environment variable. This prevents
prefix mismatches when agents operate across different beads databases.

Changes:
- Remove BeadsDir field from AgentEnvConfig
- Remove BEADS_DIR from env vars set on agent sessions
- Update doctor env_check to not expect BEADS_DIR
- Update all manager Start() calls to not pass BeadsDir

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(doctor): detect BEADS_DIR in tmux session environment

Add a doctor check that warns when BEADS_DIR is set in any Gas Town
tmux session. BEADS_DIR in the environment overrides prefix-based
routing and breaks multi-rig lookups - agents should use the beads
redirect mechanism instead.

The check:
- Iterates over all Gas Town tmux sessions (gt-* and hq-*)
- Checks if BEADS_DIR is set in the session environment
- Returns a warning with fix hint to restart sessions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: julianknutsen <julianknutsen@users.noreply.github>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-13 22:13:57 -08:00
Subhrajit Makur
c61b67eb03 fix(config): implement role_agents support in BuildStartupCommand (#19) (#456)
* fix(config): implement role_agents support in BuildStartupCommand

The role_agents field in TownSettings and RigSettings existed but was
not being used by the startup command builders. All services fell back
to the default agent instead of using role-specific agent assignments.

Changes:
- BuildStartupCommand now extracts GT_ROLE from envVars and uses
  ResolveRoleAgentConfig() for role-based agent selection
- BuildStartupCommandWithAgentOverride follows the same pattern when
  no explicit override is provided
- refinery/manager.go uses ResolveRoleAgentConfig with constants
- cmd/start.go uses ResolveRoleAgentConfig with constants
- Updated comments from hardcoded agent name to generic "agent"
- Added ValidateAgentConfig() to check agent exists and binary is in PATH
- Added lookupAgentConfigIfExists() helper for validation
- ResolveRoleAgentConfig now warns to stderr and falls back to default
  if configured agent is invalid or binary is missing

Resolution priority (now working):
1. Explicit --agent override
2. Rig's role_agents[role] (validated)
3. Town's role_agents[role] (validated)
4. Rig's agent setting
5. Town's default_agent
6. Hardcoded default fallback

Adds tests for:
- TestBuildStartupCommand_UsesRoleAgentsFromTownSettings
- TestBuildStartupCommand_RigRoleAgentsOverridesTownRoleAgents
- TestBuildAgentStartupCommand_UsesRoleAgents
- TestValidateAgentConfig
- TestResolveRoleAgentConfig_FallsBackOnInvalidAgent

Fixes: role_agents configuration not being applied to services

* fix(config): add GT_ROOT to BuildStartupCommandWithAgentOverride

- Fixes missing GT_ROOT and GT_SESSION_ID_ENV exports in
  BuildStartupCommandWithAgentOverride, matching BuildStartupCommand behavior
- Adds test for override priority over role_agents
- Adds test verifying GT_ROOT is included in command

This addresses the Greptile review comment about agents started with
an override not having access to town-level resources.

Co-authored-by: Steve Yegge <steve.yegge@gmail.com>
2026-01-13 13:34:22 -08:00
Will Saults
bda248fb9a feat(refinery,boot): add --agent flag for model selection (#469)
* feat(refinery,boot): add --agent flag for model selection (hq-7d5m)

Add --agent flag to gt refinery start/attach/restart and gt boot spawn
commands for consistent model selection across all agent launch points.

Implementation follows the existing pattern from gt deacon start:
- Add StringVar flag for agent alias
- Pass override to Manager/Boot via SetAgentOverride()
- Use BuildAgentStartupCommandWithAgentOverride when override is set

Files affected:
- cmd/gt/refinery.go: add flags to start/attach/restart commands
- internal/refinery/manager.go: add SetAgentOverride and use in Start()
- cmd/gt/boot.go: add flag to spawn command
- internal/boot/boot.go: add SetAgentOverride and use in spawnTmux()

Closes #438

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor(refinery,boot): use parameter-passing pattern for --agent flag

Address PR review feedback:

1. ADD TESTS: Add tests for --agent flag existence following witness_test.go pattern
   - internal/cmd/refinery_test.go: tests for start/attach/restart
   - internal/cmd/boot_test.go: test for spawn

2. ALIGN PATTERN: Change from setter pattern to parameter-passing pattern
   - Manager.Start(foreground, agentOverride) instead of SetAgentOverride + Start
   - Boot.Spawn(agentOverride) instead of SetAgentOverride + Spawn
   - Matches witness.go style: Start(foreground bool, agentOverride string, ...)

Updated all callers to pass empty string for default agent:
- internal/daemon/daemon.go
- internal/cmd/rig.go
- internal/cmd/start.go
- internal/cmd/up.go

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: furiosa <will@saults.io>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-13 13:14:47 -08:00
gastown/refinery
9315248134 fix(mq): persist MR rejection to beads storage
The RejectMR function was modifying the in-memory MR object but never
persisting the change to beads storage. This caused rejected MRs to
continue showing in the queue with status "open".

Fix: Call beads.CloseWithReason() to properly close the MR bead before
updating the in-memory state.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-13 12:54:54 -08:00
george
58207a00ec refactor(mrqueue): remove mrqueue package, use beads for MRs (gt-dqi)
Remove the mrqueue side-channel from gastown. The merge queue now uses
beads merge-request wisps exclusively, not parallel .beads/mq/*.json files.

Changes:
- Delete internal/mrqueue/ package (~830 lines removed)
- Move scoring logic to internal/refinery/score.go
- Update Refinery engineer to query beads via ReadyWithType("merge-request")
- Add MRInfo struct to replace mrqueue.MR
- Add ClaimMR/ReleaseMR methods using beads assignee field
- Update HandleMergeReady to not create duplicate queue entries
- Update gt refinery commands (claim, release, unclaimed) to use beads
- Stub out MQEventSource (no longer needed)

The Refinery now:
- Lists MRs via beads.ReadyWithType("merge-request")
- Claims via beads.Update(..., {Assignee: worker})
- Closes via beads.CloseWithReason("merged", mrID)
- Blocks on conflicts via beads.AddDependency(mrID, taskID)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-12 23:48:56 -08:00
gastown/crew/joe
b9ecb7b82e docs: clarify name pool vs polecat pool misconception
Fix misleading language that could suggest polecats wait in an idle pool:

- refinery/engineer.go: "available polecat" → "fresh polecat (spawned on demand)"
- namepool.go: Clarify this pools NAMES not polecats; polecats are spawned
  fresh and nuked when done, only name slots are reused
- dog-pool-architecture.md: "Pool allocation pattern" → "Name slot allocation
  pattern (pool of names, not instances)"

There is no idle pool of polecats. They are spawned for work and nuked when done.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-12 02:25:31 -08:00
Bo
2a0a8c760b fix(refinery): delete remote polecat branches after merge (#369)
Since the self-cleaning model (Jan 10), polecats push branches to origin
before `gt done`. The refinery was only deleting local branches after
merge, causing stale `polecat/*` branches to accumulate on the remote.

Now deletes both local and remote branches after successful merge.
Uses existing `git.DeleteRemoteBranch()` function. Remote deletion is
non-fatal if the branch doesn't exist.

Fixes #359
2026-01-11 23:08:29 -08:00
gastown/crew/max
f7d497ba07 fix(zfc): remove strings.Contains conflict detection from Go code
hq-hcil1: Remove deprecated HasConflict/HasAuthFailure/IsNotARepo/HasRebaseConflict
methods that violated ZFC by having Go code decide error types based on stderr parsing.

Changes:
- Remove deprecated helper methods from GitError and SwarmGitError
- Export GetConflictingFiles() which uses git porcelain output (diff --diff-filter=U)
- Update CheckConflicts(), engineer.go, and integration.go to use GetConflictingFiles()
- Update tests to verify raw stderr is available for agent observation

ZFC principle: Go code transports raw output to agents; agents observe and decide.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 22:31:55 -08:00
gastown/crew/max
131dac91c8 refactor(zfc): derive state from files instead of in-memory cache
Apply ZFC (Zero Forge Cache) principle across git error handling and
feed curation. Agents now observe raw git output and make their own
decisions rather than relying on pre-interpreted error types.

- Add GitError type with raw stdout/stderr for observation
- Add SwarmGitError following the same pattern
- Remove in-memory deduplication maps from Curator
- Curator now reads state from feed/events files

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 22:23:44 -08:00
gastown/crew/george
c94d59dca7 fix: ZFC improvements - query tmux directly instead of marker TTL
Two ZFC fixes:

1. Boot marker file (hq-zee5n): Changed IsRunning() to query
   tmux.HasSession() directly instead of checking marker file
   freshness with TTL. Removed stale marker check from doctor.

2. Branch pattern matching (hq-zwuh6): Replaced hardcoded "polecat/"
   strings with constants.BranchPolecatPrefix for consistency.

Also removed 60-second WaitForCommand blocking from crew Start()
which was causing gt crew start to hang.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 22:08:12 -08:00
dennis
0f633be4b1 fix(zfc): remove PID-based agent liveness detection
Replace ProcessExists() checks in witness and refinery managers with
tmux session detection. Agent liveness should be derived from tmux
session state, not PID probing (per ZFC tracking principles).

- Remove util.ProcessExists() from witness/manager.go and refinery/manager.go
- Delete internal/util/process.go and process_test.go (now unused)
- Foreground mode and Stop() now rely solely on tmux HasSession/KillSession

Closes: hq-yxkdr (recentDeaths already removed)
Closes: hq-1sd4o (ProcessExists removed)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 22:02:34 -08:00
julianknutsen
e999ceb1c1 refactor: consolidate agent env vars into config.AgentEnv
Create centralized AgentEnv function as single source of truth for all
agent environment variables. All agents now consistently receive:
- GT_ROLE, BD_ACTOR, GIT_AUTHOR_NAME (role identity)
- GT_ROOT, BEADS_DIR (workspace paths)
- GT_RIG, GT_POLECAT/GT_CREW (rig-specific identity)
- BEADS_AGENT_NAME, BEADS_NO_DAEMON (beads config)
- CLAUDE_CONFIG_DIR (optional account selection)

Remove RoleEnvVars in favor of AgentEnvSimple wrapper.
Remove IncludeBeadsEnv flag - beads env vars always included.
Update all manager and cmd call sites to use AgentEnv.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 21:52:30 -08:00
julianknutsen
7150ce2624 refactor: update managers to use RoleEnvVars
Consolidates all role startup code to use the shared RoleEnvVars()
function, ensuring consistent env vars across tmux SetEnvironment
and Claude startup command exports.

Updated:
- Mayor manager
- Deacon startup (daemon.go)
- Witness manager
- Refinery manager
- Polecat startup (daemon.go)
- BuildPolecatStartupCommand, BuildCrewStartupCommand helpers

This ensures all agents receive the same identity env vars regardless
of startup path.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 21:52:09 -08:00
mayor
ff3d1b2e23 fix(refinery): use mayor/rig fallback for correct git remote
Refinery was using rig.Path which found the town's .git with rig-named
remotes (e.g., 'gastown') instead of 'origin'. This caused refinery to
miss polecat branches when fetching.

Now falls back to mayor/rig (which has 'origin' pointing to the project
repo) when refinery/rig doesn't exist.

Fixes: hq-uvrzt
2026-01-09 12:42:14 -08:00
jack
afff85cdff fix(tmux): use NewSessionWithCommand to avoid send-keys race condition
Agent sessions would fail on startup because send-keys arrived before the
shell was ready, causing 'bad pattern' and 'command not found' errors.

Fix: Create sessions with the command directly using tmux new-session's
command argument. This runs the agent as the pane's initial process,
avoiding shell readiness timing issues entirely.

Updated all agent managers: mayor, deacon, witness, refinery, polecat, crew.

Also fixes pre-existing build error in polecat/manager.go (polecatPath →
clonePath/newClonePath).

Closes #280

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 23:35:31 -08:00
Ben Kraus
38adfa4d8b codex 2026-01-08 12:36:54 -05:00
Julian Knutsen
28a9de64d5 fix(tmux): wait for shell ready before sending keys (#264)
Add WaitForShellReady call before SendKeys in all agent managers
(deacon, mayor, witness, refinery). This prevents intermittent
"can't find pane" errors that occur when the tmux session is
created but the shell isn't ready to receive input yet.

The issue manifests under load (e.g., during `gt up` when multiple
agents start in sequence) where the 200ms delay in SendKeysDelayed
isn't sufficient for the pane to be fully initialized.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 20:49:25 -08:00
julianknutsen
ea8bef2029 refactor: unify agent startup with Manager pattern
- Create mayor.Manager for mayor lifecycle (Start/Stop/IsRunning/Status)
- Create deacon.Manager for deacon lifecycle with respawn loop
- Move session.Manager to polecat.SessionManager (clearer naming)
- Add zombie session detection for mayor/deacon (kills tmux if Claude dead)
- Remove duplicate session startup code from up.go, start.go, mayor.go
- Rename sessMgr -> polecatMgr for consistency
- Make witness/refinery SessionName() public for status display

All agent types now follow the same Manager pattern:
  mgr := agent.NewManager(...)
  mgr.Start(...)
  mgr.Stop()
  mgr.IsRunning()
  mgr.Status()

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 22:32:35 -08:00
julianknutsen
72544cc06d Unify agent startup with Manager pattern
Refactors all agent startup paths (witness, refinery, crew, polecat) to use
a consistent Manager interface with Start(), Stop(), IsRunning(), and
SessionName() methods.

Includes:
- Witness manager with GUPP propulsion nudge for startup
- Refinery manager for engineer sessions
- Crew manager for worker agents
- Session/polecat manager updates
- claude_settings_check doctor check for settings validation
- Settings management consolidated from rig/manager.go
- Settings location moved outside source repos to prevent conflicts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 21:44:04 -08:00
jv
22693c1dcc feat: runtime-aware tmux agent checks 2026-01-06 19:13:14 -08:00
jv
02ca9e43fa fix: honor rig agent when starting witness/refinery 2026-01-06 19:12:55 -08:00
gastown/crew/joe
c2451b85e7 Merge origin/main into fix/205-address-claude-startup-issues
Resolved conflict in internal/witness/manager.go:
- Kept session import (used by PR code)
- Kept PR's more accurate comment for PID check
- Removed duplicate sessionName method introduced by merge
2026-01-06 19:04:29 -08:00
gastown/crew/george
c306879a31 fix(refinery): merge local branches instead of fetching from origin (gt-cio03)
Phase 2 of Heresy Correction: Local-Only Polecat Branches

Changes:
- Replace FetchBranch("origin", branch) with BranchExists() check
- Use local branch directly for CheckConflicts() and MergeNoFF()
- Remove "origin/" prefix from branch references

The Refinery worktree shares .repo.git with polecat worktrees, so
branches created by polecats are already visible locally without
needing to fetch from origin.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 13:14:58 -08:00
markov-kernel
6fe25c757c fix(refinery): Send MERGE_FAILED to Witness when merge is rejected
When the Refinery detects a build error or test failure and refuses
to merge, the polecat was never notified. This fixes the notification
pipeline by:

1. Adding MERGE_FAILED protocol support to Witness:
   - PatternMergeFailed regex pattern
   - ProtoMergeFailed protocol type constant
   - MergeFailedPayload struct with all failure details
   - ParseMergeFailed parser function
   - ClassifyMessage case for MERGE_FAILED

2. Adding HandleMergeFailed handler to Witness:
   - Parses the failure notification
   - Sends HIGH priority mail to polecat with fix instructions
   - Includes branch, issue, failure type, and error details

3. Adding mail notification in Refinery's handleFailureFromQueue:
   - Creates mail.Router for sending protocol messages
   - Sends MERGE_FAILED to Witness when merge fails
   - Includes failure type (build/tests/conflict) and error

4. Adding comprehensive unit tests:
   - TestParseMergeFailed for full body parsing
   - TestParseMergeFailed_MinimalBody for minimal body
   - TestParseMergeFailed_InvalidSubject for error handling
   - ClassifyMessage test cases for MERGE_FAILED

Fixes #114

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 13:02:46 -08:00
Julian Knutsen
9d7dcde1e2 feat: Unified beads redirect for tracked and local beads (#222)
* feat: Beads redirect architecture for tracked and local beads

This change implements proper redirect handling so that all rig agents
(Witness, Refinery, Crew, Polecats) can work with both:
- Tracked beads: .beads/ checked into git at mayor/rig/.beads
- Local beads: .beads/ created at rig root during gt rig add

Key changes:

1. SetupRedirect now handles tracked beads by skipping redirect chains.
   The bd CLI doesn't support chains (A→B→C), so worktrees redirect
   directly to the final destination (mayor/rig/.beads for tracked).

2. ResolveBeadsDir is now used consistently in polecat and refinery
   managers instead of hardcoded mayor/rig paths.

3. Rig-level agents (witness, refinery) now use rig beads with rig
   prefix instead of town beads. This follows the architecture where
   town beads are only for Mayor/Deacon.

4. prime.go simplified to always use ../../.beads for crew redirects,
   letting rig-level redirect handle tracked vs local routing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(doctor): Add beads-redirect check for tracked beads

When a repo has .beads/ tracked in git (at mayor/rig/.beads), the rig root
needs a redirect file pointing to that location. This check:

- Detects missing rig-level redirect for tracked beads
- Verifies redirect points to correct location (mayor/rig/.beads)
- Auto-fixes with 'gt doctor --fix'

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: Handle fileLock.Unlock error in daemon

Wrap fileLock.Unlock() return value to satisfy errcheck linter.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 12:59:37 -08:00
mayor
31a32c084b Unify agent startup with GUPP propulsion nudge
Witness and Refinery startup was duplicated across cmd/witness.go, cmd/up.go,
cmd/rig.go, and daemon.go. Worse, not all code paths sent the propulsion nudge
(GUPP - Gas Town Universal Propulsion Principle). Now unified in Manager.Start()
which handles everything including nudges.

Changes:
- witness/manager.go: Full rewrite with session creation, env vars, theming,
  WaitForClaudeReady, startup nudge, and propulsion nudge (GUPP)
- refinery/manager.go: Add propulsion nudge sequence after Claude startup
- cmd/witness.go: Simplify to just call mgr.Start(), remove ensureWitnessSession
- cmd/rig.go: Use witness.Manager.Start() instead of inline session creation
- cmd/start.go: Use witness.Manager.Start()
- cmd/up.go: Use witness.Manager.Start(), remove ensureWitness(),
  add EnsureSettingsForRole in ensureSession()
- daemon.go: Use witness.Manager.Start() and refinery.Manager.Start() for
  unified startup with proper nudges

This ensures all agent startup paths (gt witness start, gt rig boot, gt up,
daemon restarts) consistently apply GUPP propulsion nudges.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 01:51:25 -08:00
mayor
fc0b506253 Merge polecat branches, resolve conflicts 2026-01-05 19:39:25 -08:00
mediocre
f49197243d refactor: extract shared AgentStateManager pattern (gt-gaw8e)
- Create internal/agent package with shared State type and StateManager
- StateManager uses Go generics for type-safe load/save operations
- Update witness and refinery to use shared State type alias
- Replace loadState/saveState implementations with StateManager delegation
- Maintains backwards compatibility through re-exported constants

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 00:19:41 -08:00
wraith
ef248a1824 refactor: extract ExecWithOutput utility for command execution (gt-vurfr)
Create util.ExecWithOutput and util.ExecRun to consolidate repeated
exec.Command patterns across witness/handlers.go and refinery/manager.go.

Changes:
- Add internal/util/exec.go with ExecWithOutput (returns stdout) and
  ExecRun (runs command without output)
- Refactor witness/handlers.go to use utility functions (7 call sites)
- Refactor refinery/manager.go, removing unused gitRun/gitOutput methods
- Add comprehensive tests in exec_test.go

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 00:18:47 -08:00
gastown/crew/jack
eea4435269 feat(rig): Add --branch flag for custom default branch
Add --branch flag to `gt rig add` to specify a custom default branch
instead of auto-detecting from remote. This supports repositories that
use non-standard default branches like `develop` or `release`.

Changes:
- Add --branch flag to `gt rig add` command
- Store default_branch in rig config.json
- Propagate default branch to refinery, witness, daemon, and all commands
- Rename ensureMainBranch to ensureDefaultBranch for clarity
- Add Rig.DefaultBranch() method for consistent access
- Update crew/manager.go and swarm/manager.go to use rig config

Based on PR #49 by @kustrun - rebased and extended with additional fixes.

Co-authored-by: kustrun <kustrun@users.noreply.github.com>

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 14:01:02 -08:00
max
1b69576573 fix: Address golangci-lint errors (errcheck, gosec) (#76)
Apply PR #76 from dannomayernotabot:

- Add golangci exclusions for internal package false positives
- Tighten file permissions (0644 -> 0600) for sensitive files
- Add ReadHeaderTimeout to HTTP server (slowloris prevention)
- Explicit error ignoring with _ = for intentional cases
- Add //nolint comments with justifications
- Spelling: cancelled -> canceled (US locale)

Co-Authored-By: dannomayernotabot <noreply@github.com>

🤖 Generated with Claude Code
2026-01-03 16:11:55 -08:00
Olivier Debeuf De Rijcker
7f9795f630 fix: Close MR beads after successful merge from queue (#52)
The handleSuccessFromQueue function was missing critical steps that
exist in the legacy handleSuccess function:

1. Fetch and update the MR bead with merge_commit SHA and close_reason
2. Close the MR bead with CloseWithReason("merged", mr.ID)

Without these steps, the MR bead stayed "open" in beads even after the
queue file was deleted. This caused Mayor (which queries beads for open
merge-requests) to think there were pending MRs while Refinery (which
uses the queue) reported completion.

Fixes #46
2026-01-03 11:54:14 -08:00
keeper
354219033a feat: Add 'ensure' semantics to witness/refinery start commands
gt witness start and gt refinery start now detect zombie sessions
(tmux alive but Claude dead) and automatically kill and recreate them.

This makes the start commands idempotent:
- If no session exists → create new session
- If session exists and healthy → do nothing (already running)
- If session exists but zombie → kill and recreate

Previously users had to manually run stop then start, or use restart.

Closes: gt-ekc5u

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-02 18:53:52 -08:00
furiosa
efd9434ee1 feat(refinery): Integrate merge-slot gate for conflict resolution
Adds merge-slot integration to the Refinery's Engineer for serializing
conflict resolution. When a conflict is detected:
- Acquire the merge slot before creating a conflict resolution task
- If slot is held, defer task creation (MR stays in queue)
- Release slot after successful merge

This prevents cascading conflicts from multiple polecats racing to
resolve conflicts simultaneously.

Adds MergeSlot wrapper functions to beads package for slot operations.

(gt-4u49x)

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-02 18:26:45 -08:00
nux
f7839c2724 feat(refinery): Non-blocking delegation via bead-gates (gt-hibbj)
When merge conflicts occur, the Refinery now creates a conflict resolution
task and blocks the MR on that task, allowing the queue to continue to the
next MR without waiting.

Changes:
- Add BlockedBy field to mrqueue.MR for tracking blocking tasks
- Update handleFailureFromQueue to set BlockedBy after creating conflict task
- Add ListReady method to mrqueue that filters out blocked MRs
- Add ListBlocked method for monitoring blocked MRs
- Add IsBeadOpen, ListReadyMRs, ListBlockedMRs helpers to Engineer
- Add 'gt refinery ready' command (unclaimed AND unblocked MRs)
- Add 'gt refinery blocked' command (shows blocked MRs)

When the conflict resolution task closes, the MR unblocks and re-enters
the ready queue for processing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-02 18:25:15 -08:00
joe
c3e83a3d09 Fix remaining worker startups for gt seance metadata
- startCrewMember: now uses BuildCrewStartupCommand (was GetRuntimeCommand)
- refinery/manager.go: now uses BuildAgentStartupCommand (was GetRuntimeCommand)

Both now properly inject BD_ACTOR and GT_ROLE so seance can identify
sessions correctly. This completes the seance metadata fix started in
the previous commit.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-02 18:09:42 -08:00
nux
8517ff0650 feat(mq): Add priority-ordered queue display and processing (gt-si8rq.6)
Implements priority scoring for merge queue ordering:

## Changes to gt mq list
- Add SCORE column showing priority score (higher = process first)
- Sort MRs by score descending instead of simple priority
- Add CONVOY column showing convoy ID if tracked

## New gt mq next command
- Returns highest-score MR ready for processing
- Supports --strategy=fifo for FIFO ordering fallback
- Supports --quiet for just printing MR ID
- Supports --json for programmatic access

## Changes to Refinery
- Queue() now sorts by priority score instead of simple priority
- Uses ScoreMR from mrqueue package for consistent scoring

## MR Fields Extended
- Added retry_count, last_conflict_sha, conflict_task_id
- Added convoy_id, convoy_created_at for convoy tracking
- These fields feed into priority scoring function

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-02 17:18:27 -08:00
slit
8f276febd3 feat(refinery): Create conflict-resolution tasks when rebase fails (gt-si8rq.3)
When the Refinery Engineer detects a merge conflict during rebase,
it now creates a dispatchable task bead for conflict resolution.

Task format:
- Title: Resolve merge conflicts: <original-issue-title>
- Type: task
- Priority: boosted from original (P2 -> P1, P1 -> P0)
- Description includes: original MR ID, branch, conflict SHA,
  source issue, retry count, and step-by-step instructions

The task appears in bd ready and can be dispatched to available
polecats by the Witness. After resolution and force-push, the
Refinery will automatically retry the merge.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-02 01:40:39 -08:00
gastown/crew/jack
1e53cd78a6 fix: Security fixes and docs updates (gt-jsm2s, gt-d47q0, gt-orujk)
- convoy.go: Escape single quotes in SQL to prevent injection
- engineer.go: Add comment clarifying test command trust model
  (config.json is trusted infra, not PR-controlled)
- agents.go, prime.go, mayor.md.tmpl: Fix 'gt polecats' -> 'gt polecat list'

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 11:02:03 -08:00
mayor
4ea2b719ba fix(review): Address code review feedback for hq-eggh5
- Rename issueToBead → issueToMR (clearer naming)
- Handle empty TargetBranch with fallback to main
- Sort queue by priority (P0 first)
- Remove dead code (discoverWorkBranches, branchToMR)
- Fix parseTime fallback format
2025-12-31 11:36:48 -08:00
mayor
36f17bbadd fix: Refinery queue uses beads MQ as source of truth (hq-eggh5)
The refinery was checking git branches instead of the beads merge queue.
This caused MRs to pile up when branches were deleted but MR beads remained.

- manager.go: Queue() now queries beads for type=merge-request issues
- refinery.md.tmpl: Updated queue-scan to use gt mq list
- mol-refinery-patrol.formula.toml: Updated queue-scan step instructions
2025-12-31 11:36:48 -08:00
gastown/crew/gus
626a24e013 Refactor startup paths to use RuntimeConfig (gt-j0546)
Replaced all hardcoded 'claude --dangerously-skip-permissions' invocations
with configurable helpers from internal/config:

- GetRuntimeCommand(rigPath) - simple command string
- GetRuntimeCommandWithPrompt(rigPath, prompt) - with initial prompt
- BuildAgentStartupCommand(role, bdActor, rigPath, prompt) - generic agent
- BuildPolecatStartupCommand(rigName, polecatName, rigPath, prompt) - polecat
- BuildCrewStartupCommand(rigName, crewName, rigPath, prompt) - crew
- BuildStartupCommand(envVars, rigPath, prompt) - custom env vars

Files updated:
- internal/cmd/start.go (4 locations)
- internal/cmd/crew_lifecycle.go (2 locations)
- internal/cmd/crew_at.go (2 locations)
- internal/cmd/deacon.go
- internal/cmd/witness.go
- internal/cmd/up.go (2 locations)
- internal/cmd/handoff.go (2 locations)
- internal/daemon/daemon.go (3 locations)
- internal/daemon/lifecycle.go
- internal/session/manager.go
- internal/refinery/manager.go
- internal/boot/boot.go

This enables future support for alternative LLM runtimes (aider, etc.)
via rig/town settings configuration.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 23:48:34 -08:00