fix(config): implement role_agents support in BuildStartupCommand (#19) (#456)

* fix(config): implement role_agents support in BuildStartupCommand

The role_agents field in TownSettings and RigSettings existed but was
not being used by the startup command builders. All services fell back
to the default agent instead of using role-specific agent assignments.

Changes:
- BuildStartupCommand now extracts GT_ROLE from envVars and uses
  ResolveRoleAgentConfig() for role-based agent selection
- BuildStartupCommandWithAgentOverride follows the same pattern when
  no explicit override is provided
- refinery/manager.go uses ResolveRoleAgentConfig with constants
- cmd/start.go uses ResolveRoleAgentConfig with constants
- Updated comments from hardcoded agent name to generic "agent"
- Added ValidateAgentConfig() to check agent exists and binary is in PATH
- Added lookupAgentConfigIfExists() helper for validation
- ResolveRoleAgentConfig now warns to stderr and falls back to default
  if configured agent is invalid or binary is missing

Resolution priority (now working):
1. Explicit --agent override
2. Rig's role_agents[role] (validated)
3. Town's role_agents[role] (validated)
4. Rig's agent setting
5. Town's default_agent
6. Hardcoded default fallback

Adds tests for:
- TestBuildStartupCommand_UsesRoleAgentsFromTownSettings
- TestBuildStartupCommand_RigRoleAgentsOverridesTownRoleAgents
- TestBuildAgentStartupCommand_UsesRoleAgents
- TestValidateAgentConfig
- TestResolveRoleAgentConfig_FallsBackOnInvalidAgent

Fixes: role_agents configuration not being applied to services

* fix(config): add GT_ROOT to BuildStartupCommandWithAgentOverride

- Fixes missing GT_ROOT and GT_SESSION_ID_ENV exports in
  BuildStartupCommandWithAgentOverride, matching BuildStartupCommand behavior
- Adds test for override priority over role_agents
- Adds test verifying GT_ROOT is included in command

This addresses the Greptile review comment about agents started with
an override not having access to town-level resources.

Co-authored-by: Steve Yegge <steve.yegge@gmail.com>
This commit is contained in:
Subhrajit Makur
2026-01-14 03:04:22 +05:30
committed by GitHub
parent fa99e615f0
commit c61b67eb03
4 changed files with 393 additions and 30 deletions

View File

@@ -116,7 +116,7 @@ func (m *Manager) Start(foreground bool, agentOverride string) error {
if foreground {
// In foreground mode, check tmux session (no PID inference per ZFC)
townRoot := filepath.Dir(m.rig.Path)
agentCfg := config.ResolveAgentConfig(townRoot, m.rig.Path)
agentCfg := config.ResolveRoleAgentConfig(constants.RoleRefinery, townRoot, m.rig.Path)
if running, _ := t.HasSession(sessionID); running && t.IsAgentRunning(sessionID, config.ExpectedPaneCommands(agentCfg)...) {
return ErrAlreadyRunning
}
@@ -138,15 +138,15 @@ func (m *Manager) Start(foreground bool, agentOverride string) error {
// Background mode: check if session already exists
running, _ := t.HasSession(sessionID)
if running {
// Session exists - check if Claude is actually running (healthy vs zombie)
// Session exists - check if agent is actually running (healthy vs zombie)
townRoot := filepath.Dir(m.rig.Path)
agentCfg := config.ResolveAgentConfig(townRoot, m.rig.Path)
agentCfg := config.ResolveRoleAgentConfig(constants.RoleRefinery, townRoot, m.rig.Path)
if t.IsAgentRunning(sessionID, config.ExpectedPaneCommands(agentCfg)...) {
// Healthy - Claude is running
// Healthy - agent is running
return ErrAlreadyRunning
}
// Zombie - tmux alive but Claude dead. Kill and recreate.
_, _ = fmt.Fprintln(m.output, "⚠ Detected zombie session (tmux alive, Claude dead). Recreating...")
// Zombie - tmux alive but agent dead. Kill and recreate.
_, _ = fmt.Fprintln(m.output, "⚠ Detected zombie session (tmux alive, agent dead). Recreating...")
if err := t.KillSession(sessionID); err != nil {
return fmt.Errorf("killing zombie session: %w", err)
}