Commit Graph

85 Commits

Author SHA1 Message Date
gastown/crew/joe
1f44482ad0 fix: remove observable states from agent_state (discover, don't track)
The agent_state field was recording observable state like "running",
"dead", "idle" which violated the "Discover, Don't Track" principle.
This caused stale state bugs where agents were marked "dead" in beads
but actually running in tmux.

Changes:
- Remove daemon's checkStaleAgents() which marked agents "dead"
- Simplify ensureXxxRunning() to use tmux.IsClaudeRunning() directly
- Remove reportAgentState() calls from gt prime and gt handoff
- Add SetHookBead/ClearHookBead helpers that don't update agent_state
- Use ClearHookBead in gt done and gt unsling
- Simplify gt status to derive state from tmux, not bead

Non-observable states (stuck, awaiting-gate, muted, paused) are still
set because they represent intentional agent decisions that can't be
discovered from tmux state.

Fixes: gt-zecmc

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 20:32:02 -08:00
gastown/crew/joe
6dbb841e22 fix(daemon): nudge agents on state divergence instead of silent accept
When the daemon detects that an agent bead state doesn't match tmux
(e.g., bead says stopped but Claude is running), it now:

1. Logs the divergence clearly with STATE DIVERGENCE prefix
2. Nudges the agent with an actionable command to fix its state
3. Still skips the restart (safety - don't kill healthy sessions)

This prevents silent state drift where bead state diverges from reality.
Applied to: Deacon, Witness, Refinery ensure functions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 19:23:52 -08:00
jv
99aae0bf02 fix: use canonical hq role bead IDs 2026-01-06 19:13:14 -08:00
jv
02ca9e43fa fix: honor rig agent when starting witness/refinery 2026-01-06 19:12:55 -08:00
gastown/crew/joe
c2451b85e7 Merge origin/main into fix/205-address-claude-startup-issues
Resolved conflict in internal/witness/manager.go:
- Kept session import (used by PR code)
- Kept PR's more accurate comment for PID check
- Removed duplicate sessionName method introduced by merge
2026-01-06 19:04:29 -08:00
Julian Knutsen
9d7dcde1e2 feat: Unified beads redirect for tracked and local beads (#222)
* feat: Beads redirect architecture for tracked and local beads

This change implements proper redirect handling so that all rig agents
(Witness, Refinery, Crew, Polecats) can work with both:
- Tracked beads: .beads/ checked into git at mayor/rig/.beads
- Local beads: .beads/ created at rig root during gt rig add

Key changes:

1. SetupRedirect now handles tracked beads by skipping redirect chains.
   The bd CLI doesn't support chains (A→B→C), so worktrees redirect
   directly to the final destination (mayor/rig/.beads for tracked).

2. ResolveBeadsDir is now used consistently in polecat and refinery
   managers instead of hardcoded mayor/rig paths.

3. Rig-level agents (witness, refinery) now use rig beads with rig
   prefix instead of town beads. This follows the architecture where
   town beads are only for Mayor/Deacon.

4. prime.go simplified to always use ../../.beads for crew redirects,
   letting rig-level redirect handle tracked vs local routing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(doctor): Add beads-redirect check for tracked beads

When a repo has .beads/ tracked in git (at mayor/rig/.beads), the rig root
needs a redirect file pointing to that location. This check:

- Detects missing rig-level redirect for tracked beads
- Verifies redirect points to correct location (mayor/rig/.beads)
- Auto-fixes with 'gt doctor --fix'

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: Handle fileLock.Unlock error in daemon

Wrap fileLock.Unlock() return value to satisfy errcheck linter.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 12:59:37 -08:00
julianknutsen
09bbb0f430 Fix lint issues and sparse checkout for empty repos
- Handle empty repos in ConfigureSparseCheckout (skip read-tree when no HEAD)
- Fix errcheck: wrap fileLock.Unlock() error in defer
- Fix unparam: remove unused *rig.Rig return from getWitnessManager
- Fix unparam: mark unused agentType parameter with blank identifier

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 02:16:15 -08:00
mayor
31a32c084b Unify agent startup with GUPP propulsion nudge
Witness and Refinery startup was duplicated across cmd/witness.go, cmd/up.go,
cmd/rig.go, and daemon.go. Worse, not all code paths sent the propulsion nudge
(GUPP - Gas Town Universal Propulsion Principle). Now unified in Manager.Start()
which handles everything including nudges.

Changes:
- witness/manager.go: Full rewrite with session creation, env vars, theming,
  WaitForClaudeReady, startup nudge, and propulsion nudge (GUPP)
- refinery/manager.go: Add propulsion nudge sequence after Claude startup
- cmd/witness.go: Simplify to just call mgr.Start(), remove ensureWitnessSession
- cmd/rig.go: Use witness.Manager.Start() instead of inline session creation
- cmd/start.go: Use witness.Manager.Start()
- cmd/up.go: Use witness.Manager.Start(), remove ensureWitness(),
  add EnsureSettingsForRole in ensureSession()
- daemon.go: Use witness.Manager.Start() and refinery.Manager.Start() for
  unified startup with proper nudges

This ensures all agent startup paths (gt witness start, gt rig boot, gt up,
daemon restarts) consistently apply GUPP propulsion nudges.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 01:51:25 -08:00
gastown/crew/jack
e8d27e7212 fix: Create and lookup rig agent beads with correct prefix
Per docs/architecture.md, Witness and Refinery are rig-level agents that
should use the rig's configured prefix (e.g., pi- for pixelforge) instead
of hardcoded "gt-".

This extends PR #183's creation fix to also fix all lookup paths:
- internal/rig/manager.go: Create agent beads in rig beads with rig prefix
- internal/daemon/daemon.go: Use rig prefix when looking up agent state
- internal/daemon/lifecycle.go: Use rig prefix for identity-to-bead mapping
- internal/cmd/sling.go: Pass townRoot for prefix lookup
- internal/cmd/unsling.go: Pass townRoot for prefix lookup
- internal/cmd/molecule_status.go: Use rig prefix for agent bead lookups
- internal/cmd/molecule_attach.go: Use rig prefix for agent bead lookups
- internal/config/loader.go: Add GetRigPrefix helper

Without this fix, the daemon would:
- Create pi-gastown-witness but look for gt-gastown-witness
- Report agents as missing/dead when they are running
- Fail to manage agent lifecycle correctly

Based on work by Johann Taberlet in PR #183.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Johann Taberlet <johann.taberlet@gmail.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 19:39:57 -08:00
mayor
5224dfb50d Merge remote-tracking branch 'origin/polecat/slit-mk1wa0rj' 2026-01-05 19:39:01 -08:00
gastown
43cca06460 feat: Add Windows-compatible file locking for daemon
Replace Unix-only syscall.Flock with gofrs/flock library for
cross-platform file locking. This enables the daemon to run on
Windows in addition to Unix-like systems.

- Add github.com/gofrs/flock v0.13.0 dependency
- Replace syscall.Flock calls with flock.TryLock/Unlock
- Maintain same non-blocking exclusive lock semantics

(gt-5354h)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 19:21:09 -08:00
slit
8110aab257 fix: Add GUPP propulsion nudge to daemon restartSession
When Deacon respawns refinery/witness sessions via LIFECYCLE requests,
the new sessions were starting at the Claude welcome screen without
the propulsion nudge that triggers autonomous execution.

Added StartupNudge and PropulsionNudgeForRole calls to restartSession()
in lifecycle.go, matching the pattern used in ensureRefinerySession()
in start.go. This ensures respawned agents receive the GUPP nudge and
begin autonomous work immediately.

Fixes: gt-01jpg

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 17:14:49 -08:00
gastown/crew/jack
6b8c897e37 feat: use hq- prefix for Mayor and Deacon session names
Town-level services (Mayor, Deacon) now use hq- prefix instead of gt-:
- hq-mayor (was gt-mayor)
- hq-deacon (was gt-deacon)

This distinguishes town-level sessions from rig-level sessions which
continue to use gt- prefix (gt-gastown-witness, gt-gastown-crew-max, etc).

Changes:
- session.MayorSessionName() returns "hq-mayor"
- session.DeaconSessionName() returns "hq-deacon"
- ParseSessionName() handles both hq- and gt- prefixes
- categorizeSession() handles both prefixes
- categorizeSessions() accepts both prefixes
- Updated all tests and documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 00:42:24 -08:00
gastown/crew/jack
eea4435269 feat(rig): Add --branch flag for custom default branch
Add --branch flag to `gt rig add` to specify a custom default branch
instead of auto-detecting from remote. This supports repositories that
use non-standard default branches like `develop` or `release`.

Changes:
- Add --branch flag to `gt rig add` command
- Store default_branch in rig config.json
- Propagate default branch to refinery, witness, daemon, and all commands
- Rename ensureMainBranch to ensureDefaultBranch for clarity
- Add Rig.DefaultBranch() method for consistent access
- Update crew/manager.go and swarm/manager.go to use rig config

Based on PR #49 by @kustrun - rebased and extended with additional fixes.

Co-authored-by: kustrun <kustrun@users.noreply.github.com>

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 14:01:02 -08:00
jakehemmerle
c25ff34b5b fix(daemon): prevent orphan daemons via file locking
Add syscall.Flock() exclusive lock in daemon.Run() to prevent TOCTOU
race condition where concurrent 'gt daemon start' commands could spawn
multiple daemons. Only the first to acquire the lock succeeds; others
exit cleanly. Lock is per-town (in townRoot/daemon/daemon.lock) so
multiple GT instances from different directories work independently.

Also detect race losers in runDaemonStart() by comparing spawned PID
with PID file, reporting 'already running' instead of false success.
2026-01-04 15:30:57 -05:00
splendid
acd2565a5b fix: remove vestigial state.json files from agent directories
Agent directories (witness/, refinery/, mayor/) contained state.json files
with last_active timestamps that were never updated, making them stale and
misleading. This change removes:

- initAgentStates function that created vestigial state.json files
- AgentState type and related Load/Save functions from config package
- MayorStateValidCheck from doctor checks
- requesting_* lifecycle verification (dead code - flags were never set)
- FileStateJSON constant and MayorStatePath function

Kept intact:
- daemon/state.json (actively used for daemon runtime state)
- crew/<name>/state.json (operational CrewWorker metadata)
- Agent state tracking via beads (the ZFC-compliant approach)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 10:36:23 -08:00
gastown/crew/george
60ecf1ff76 refactor: migrate callers from deprecated MayorBeadID/DeaconBeadID to Town variants (gt-gbw0a)
Update all callers of deprecated MayorBeadID()/DeaconBeadID() to use
MayorBeadIDTown()/DeaconBeadIDTown() which return hq- prefix IDs for
town-level beads storage.

Changes:
- internal/daemon/lifecycle.go: identityToAgentBeadID and checkStaleAgents
- internal/cmd/prime.go: getAgentBeadID
- internal/cmd/molecule_status.go: buildAgentBeadID
- internal/cmd/prime_test.go: update expected values to hq-*
- Comments updated to reflect hq- prefix for town-level agents

The deprecated functions remain for backward compatibility and are used
by the migration tool (migrate_agents.go).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 21:53:30 -08:00
max
1b69576573 fix: Address golangci-lint errors (errcheck, gosec) (#76)
Apply PR #76 from dannomayernotabot:

- Add golangci exclusions for internal package false positives
- Tighten file permissions (0644 -> 0600) for sensitive files
- Add ReadHeaderTimeout to HTTP server (slowloris prevention)
- Explicit error ignoring with _ = for intentional cases
- Add //nolint comments with justifications
- Spelling: cancelled -> canceled (US locale)

Co-Authored-By: dannomayernotabot <noreply@github.com>

🤖 Generated with Claude Code
2026-01-03 16:11:55 -08:00
joe
4bcf50bf1c Revert to simple gt-mayor/gt-deacon session names
Reverts the session naming changes from PR #70. Multi-town support
on a single machine is not a real use case - rigs provide project
isolation, and true isolation should use containers/VMs.

Changes:
- MayorSessionName() and DeaconSessionName() no longer take townName parameter
- ParseSessionName() handles simple gt-mayor and gt-deacon formats
- Removed Town field from AgentIdentity and AgentSession structs
- Updated all callers and tests

Generated with Claude Code

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 14:33:24 -08:00
rictus
40b8dec2a0 feat(daemon): Add circuit breaker for stuck agents (gt-72cqu)
- Add Refinery monitoring to daemon heartbeat (ensureRefineriesRunning)
- Add circuit breaker: agents marked dead by checkStaleAgents() are
  now force-killed and restarted, even if Claude process is alive
- Fixes zombie Claude sessions that werent updating their bead state

The circuit breaker flow:
1. Agent gets stuck (stops updating bead state)
2. After 15 minutes: checkStaleAgents() marks bead as dead
3. On next heartbeat: ensure*Running() sees state=dead
4. Force-kill session and restart fresh

Generated with Claude Code

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 13:12:39 -08:00
markov-kernel
e7145cfd77 fix: Make Mayor/Deacon session names include town name
Session names `gt-mayor` and `gt-deacon` were hardcoded, causing tmux
session name collisions when running multiple towns simultaneously.

Changed to `gt-{town}-mayor` and `gt-{town}-deacon` format (e.g.,
`gt-ai-mayor`) to allow concurrent multi-town operation.

Key changes:
- session.MayorSessionName() and DeaconSessionName() now take townName param
- Added workspace.GetTownName() helper to load town name from config
- Updated all callers in cmd/, daemon/, doctor/, mail/, rig/, templates/
- Updated tests with new session name format
- Bead IDs remain unchanged (already scoped by .beads/ directory)

Fixes #60

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 21:37:05 +01:00
mayor
a83d5c6e0b fix: Add tmux health check fallback to prevent killing healthy sessions
When the daemon checks if Deacon/Witness is running, it first checks the
agent bead state. If this check fails (bead not found, JSON parse error,
or stale state), it would previously attempt to restart the session -
even if the tmux session was perfectly healthy.

This caused "session already exists" errors when:
1. Agent bead state couldn't be read (prefix mismatch, missing bead)
2. But the tmux session was actually running with Claude active

Fix: Add a tmux session health check as fallback before attempting restart.
If the session exists AND Claude is running in it, skip the restart and
log that we're preserving the healthy session despite stale bead state.

This maintains ZFC compliance (still trusts agent bead as primary source)
while adding a defensive check to prevent unnecessary session kills.

Fixes #63

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-03 09:01:34 -08:00
dag
9ad826cd8c fix(daemon): Kill zombie tmux sessions before recreating
The daemon was failing to restart agents when zombie tmux sessions existed
(session alive but Claude dead). Added EnsureSessionFresh() helper to
tmux package that:
- Checks if session exists
- If exists but Claude not running (zombie), kills the session
- Creates fresh session

Updated all daemon session creation points to use EnsureSessionFresh:
- ensureDeaconRunning()
- ensureWitnessRunning()
- restartPolecatSession()
- restartSession() in lifecycle.go

Added tests for the new helper function. (gt-j1i0r)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-02 18:54:39 -08:00
kustrun
94857fc913 fix(session): Auto-accept Claude bypass permissions warning dialog
When Claude starts with --dangerously-skip-permissions, it shows a warning
dialog requiring Down+Enter to accept. This blocked automated polecat and
agent startup.

Added AcceptBypassPermissionsWarning() to tmux package that:
- Checks if the warning dialog is present by capturing pane content
- Only sends Down+Enter if "Bypass Permissions mode" text is found
- Avoids interfering with sessions that don't show the warning

Updated all Claude startup locations:
- session/manager.go (polecat sessions)
- cmd/up.go (mayor, witness, crew, polecat cold starts)
- daemon/daemon.go (crashed polecat restarts)
- daemon/lifecycle.go (role session starts)
2026-01-02 22:44:58 +01:00
morsov
a16b94ffa8 fix(daemon): Reduce heartbeat from 10 min to 3 min for faster stuck detection (gt-6y5o3)
The 10-minute recovery heartbeat was too slow to detect stuck agents promptly.
Reduced to 3 minutes for better balance between detection speed and overhead.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 20:02:12 -08:00
gastown/polecats/furiosa
278abf15d6 fix(hook): Read hook_bead from database column, not description (gt-7m33w)
The hook discovery code was reading hook_bead from the agent bead's
description field (parsed via ParseAgentFieldsFromDescription), but
the slot update code writes to the hook_bead database column via
'bd slot set'. This mismatch caused polecats to see stale hook values
from the description instead of the current value from the database.

Fixed in:
- molecule_status.go: Use agentBead.HookBead instead of parsing description
- status.go: Use issue.HookBead directly
- lifecycle.go: Update all GUPP and orphan detection to read from
  database columns instead of parsing description

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 17:42:31 -08:00
gastown/polecats/nux
6f1b6269b1 Fix Deacon spin indefinitely bug (hq-oosxt)
- Add heartbeat checking to Boot degraded triage: detects stale Deacon
  heartbeat (>15min nudges, >30min restarts)
- Add checkDeaconHeartbeat to daemon heartbeat cycle as fallback
- This ensures the Deacon is monitored continuously, not just at startup

The mol-deacon-patrol formula was also updated separately to use
gt deacon health-check instead of ephemeral context memory tracking.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 11:47:10 -08:00
gastown/crew/gus
626a24e013 Refactor startup paths to use RuntimeConfig (gt-j0546)
Replaced all hardcoded 'claude --dangerously-skip-permissions' invocations
with configurable helpers from internal/config:

- GetRuntimeCommand(rigPath) - simple command string
- GetRuntimeCommandWithPrompt(rigPath, prompt) - with initial prompt
- BuildAgentStartupCommand(role, bdActor, rigPath, prompt) - generic agent
- BuildPolecatStartupCommand(rigName, polecatName, rigPath, prompt) - polecat
- BuildCrewStartupCommand(rigName, crewName, rigPath, prompt) - crew
- BuildStartupCommand(envVars, rigPath, prompt) - custom env vars

Files updated:
- internal/cmd/start.go (4 locations)
- internal/cmd/crew_lifecycle.go (2 locations)
- internal/cmd/crew_at.go (2 locations)
- internal/cmd/deacon.go
- internal/cmd/witness.go
- internal/cmd/up.go (2 locations)
- internal/cmd/handoff.go (2 locations)
- internal/daemon/daemon.go (3 locations)
- internal/daemon/lifecycle.go
- internal/session/manager.go
- internal/refinery/manager.go
- internal/boot/boot.go

This enables future support for alternative LLM runtimes (aider, etc.)
via rig/town settings configuration.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 23:48:34 -08:00
gastown/crew/jack
d37bd53a90 Dynamically discover rigs in checkStaleAgents
Instead of hardcoding the rig list (gastown, beads), now loads rigs
from mayor/rigs.json using config.LoadRigsConfig. Falls back gracefully
to just checking global agents if the config cannot be loaded.

(gt-mmp0q)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 23:23:05 -08:00
Steve Yegge
b241a353f3 Clean up stale issue references in comments (gt-z5bri)
- daemon.go: Update gt-2hzl4 comment to past tense (timeout fallback is implemented)
- lifecycle.go: Remove "to be removed" promise for gt-psuw7 (code stays)
- keepalive_test.go: Remove tombstoned gt-gaxo epic reference
- doctor.go: Remove tombstoned gt-gaxo epic reference
- daemon_test.go: Remove tombstoned gt-gaxo epic reference

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 22:29:23 -08:00
gastown/polecats/furiosa
108afdbc52 Remove keepalive file infrastructure for feed-based wake model (gt-vdprb.2)
- Remove TouchTownActivity() calls from root.go PersistentPreRun hook
- Remove ReadTownActivity() and activity-based backoff from daemon.go
- Delete TouchTownActivity/ReadTownActivity functions from keepalive.go
- Replace dynamic backoff with fixed 10-min recovery heartbeat

Normal wake is now handled by feed subscription (bd activity --follow).
The daemon is a safety net for dead sessions, GUPP violations, and orphaned work.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 22:13:14 -08:00
Steve Yegge
2112804aba Implement Boot: daemon entry point dog for Deacon triage (gt-rwd5j)
Boot is a watchdog that the daemon pokes instead of Deacon directly,
centralizing the 'when to wake Deacon' decision in an agent that can
reason about context.

Key changes:
- Add internal/boot package with marker file and status tracking
- Add gt boot commands: status, spawn, triage
- Add mol-boot-triage formula for Boot's triage cycle
- Modify daemon to call ensureBootRunning instead of ensureDeaconRunning
- Add tmux.IsAvailable() for degraded mode detection
- Add boot.md.tmpl role template

Boot lifecycle:
1. Daemon tick spawns Boot (fresh each time)
2. Boot runs triage: observe, decide, act
3. Boot cleans stale handoffs from Deacon inbox
4. Boot exits (or handoffs in non-degraded mode)

In degraded mode (no tmux), Boot runs mechanical triage directly.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 16:20:00 -08:00
Steve Yegge
3099d99424 Set GIT_AUTHOR_NAME per agent session (gt-6r18e.1)
Export GIT_AUTHOR_NAME alongside BD_ACTOR in all agent session startup
locations. This enables git log --author queries for agent work while
keeping GIT_AUTHOR_EMAIL as the workspace owner.

Files updated:
- internal/session/manager.go (polecat sessions)
- internal/daemon/daemon.go (deacon, witness, polecat via daemon)
- internal/daemon/lifecycle.go (polecat lifecycle)
- internal/cmd/*.go (crew, mayor, deacon, witness, refinery, up, handoff)
- internal/session/manager_test.go (updated test expectations)
- docs/federation.md (marked feature as implemented)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 16:14:33 -08:00
Steve Yegge
437d42c7fa ZFC #4: Replace daemon identity parsing with agent self-registration
Implements role-based lifecycle configuration where agent types self-register
via role beads instead of hardcoded identity string parsing in the daemon.

Changes:
- Add RoleConfig struct with lifecycle fields (session_pattern, work_dir_pattern,
  needs_pre_sync, start_command, env_vars)
- Add ParseRoleConfig/FormatRoleConfig/ExpandRolePattern to beads package
- Add role bead ID helpers (RoleBeadID, MayorRoleBeadID, etc.)
- Refactor daemon to use single parseIdentity function as ONLY place where
  identity strings are parsed
- Daemon now looks up role beads to get lifecycle config, with fallback to
  defaults when role bead is missing or has no config
- Updated all role beads (mayor, deacon, witness, refinery, crew, polecat)
  with structured lifecycle configuration fields
- Add comprehensive unit tests for RoleConfig parsing and expansion

This makes the daemon ZFC-compliant by trusting what agents self-report in
their role beads rather than encoding agent-specific knowledge in Go code.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 02:13:14 -08:00
Steve Yegge
e72882cdcb Add feed curator to daemon for activity feed curation (gt-38doh)
Implements the feed daemon goroutine that:
- Tails ~/gt/.events.jsonl for raw events
- Filters by visibility tag (keeps feed/both, drops audit)
- Deduplicates repeated done events from same actor
- Aggregates sling events when multiple occur in window
- Writes curated events to ~/gt/.feed.jsonl with summaries

The curator runs as a goroutine in the gt daemon, starting when
the daemon starts and stopping gracefully on shutdown.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 02:12:01 -08:00
Steve Yegge
85ec39c487 fix: Add session health monitoring and auto-restart for crashed polecats (gt-i7wcn)
This fix addresses the issue where polecat sessions terminate unexpectedly
during work without recovery:

Changes:
- Add `checkPolecatSessionHealth()` to daemon heartbeat loop
  - Proactively validates tmux sessions are alive for polecats
  - Detects crashed polecats that have work-on-hook
  - Auto-restarts crashed polecats with proper environment setup
  - Notifies Witness if restart fails as fallback

- Add polecat support to lifecycle identity mapping
  - `identityToSession()` now handles polecat identities
  - `restartSession()` can restart crashed polecat sessions
  - `identityToStateFile()` handles polecat state files
  - `identityToAgentBeadID()` handles polecat agent beads
  - `identityToBDActor()` handles polecat BD_ACTOR conversion

- Add `gt session check` command for manual health checking
  - Validates tmux sessions exist for all polecats
  - Shows summary of healthy vs not-running sessions
  - Useful for debugging session issues

This provides faster recovery (within heartbeat interval) compared to
waiting for GUPP violation timeout (30 min) or Witness detection.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 22:13:24 -08:00
Steve Yegge
553e29626a Daemon heartbeat becomes recovery-focused (gt-vdprb.4)
Change daemon from wake-focused to recovery-focused:

Before: Daemon pokes agents every 5-60min as primary wake
After: Daemon only checks for edge cases that feed-wake cannot handle

Recovery checks:
- Dead sessions that need restart (ensureDeaconRunning, ensureWitnessesRunning)
- Stale agents that crashed without updating state (checkStaleAgents)
- GUPP violations: agents with work-on-hook not progressing (checkGUPPViolations)
- Orphaned work: work assigned to dead agents (checkOrphanedWork)

Removed:
- pokeDeacon() - no longer sending HEARTBEAT messages
- pokeWitness()/pokeWitnesses() - no longer sending HEARTBEAT messages
- MOTD message arrays - only used by poke functions

Normal agent wake is now handled by feed subscription (bd activity --follow).
The daemon is the safety net for edge cases, not the primary propulsion.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 18:10:20 -08:00
Steve Yegge
c92b11d1bd feat: Standardize agent bead naming to prefix-rig-role-name (gt-zvte2)
Implements canonical naming convention for agent bead IDs:
- Town-level: gt-mayor, gt-deacon (unchanged)
- Rig-level: gt-<rig>-witness, gt-<rig>-refinery (was gt-witness-<rig>)
- Named: gt-<rig>-crew-<name>, gt-<rig>-polecat-<name> (was gt-crew-<rig>-<name>)

Changes:
- Added AgentBeadID helper functions to internal/beads/beads.go
- Updated all ID generation call sites to use helpers
- Fixed session parsing in theme.go, statusline.go, agents.go
- Updated doctor check and fix to use canonical format
- Updated tests for new format

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 14:54:37 -08:00
Steve Yegge
40cb3eb9fc Extract timing constants to constants package (gt-795e8)
Added 6 timing constants:
- ShutdownNotifyDelay (500ms)
- ClaudeStartTimeout (15s)
- ShellReadyTimeout (5s)
- DefaultDebounceMs (100)
- DefaultDisplayMs (5000)
- PollInterval (100ms)

Updated 7 files to use these constants instead of magic numbers.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 17:18:03 -08:00
Steve Yegge
213b3bab20 feat: Add atomic write pattern for state files (gt-wled7)
Prevents data loss from concurrent/interrupted state file writes by using
atomic write pattern (write to .tmp, then rename).

Changes:
- Add internal/util package with AtomicWriteJSON/AtomicWriteFile helpers
- Update witness/manager.go saveState to use atomic writes
- Update refinery/manager.go saveState to use atomic writes
- Update crew/manager.go saveState to use atomic writes
- Update daemon/types.go SaveState to use atomic writes
- Update polecat/namepool.go Save to use atomic writes
- Add comprehensive tests for atomic write utilities

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 15:57:53 -08:00
Steve Yegge
2655a85124 chore: remove issue ID references from comments
Strip (gt-xxx), (bd-xxx) suffixes from code comments. The descriptions
remain meaningful without the ephemeral issue IDs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 10:05:16 -08:00
Steve Yegge
34f5ef0e93 feat: Wire witness patrol formula to daemon cron (gt-qpoxz)
Adds witness patrol to daemon heartbeat cycle:
- ensureWitnessesRunning(): Start witness for each rig if not running
- pokeWitnesses(): Send heartbeat messages to all witnesses
- getKnownRigs(): Read rig list from mayor/rigs.json

The daemon now ensures both Deacon and Witnesses are running
and sends periodic heartbeats to trigger their patrol molecules.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 02:44:37 -08:00
Steve Yegge
2d6b93f26b fix: Remove PID/tmux state inference (gt-psuw7)
ZFC compliance: daemon becomes pure transport layer, trusting agent beads.

Changes:
- refinery Status(): Simply returns loaded state, no PID/tmux reconciliation
- witness Status(): Simply returns loaded state, no PID inference
- daemon ensureDeaconRunning(): Trusts agent bead state, no tmux fallback
- daemon pokeDeacon(): Trusts agent bead state, no HasSession check

Removed:
- 78 lines of state inference code (PID checks, tmux session parsing)
- "Reconciliation" logic that overwrote agent-reported state

Note: Timeout fallback for dead agents is gt-2hzl4 (separate issue).

Reference: ~/gt/docs/zfc-violations-audit.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 01:56:42 -08:00
Steve Yegge
597c6b8071 Add timeout fallback for dead agents (gt-2hzl4)
- Add checkStaleAgents() to detect agents reporting "running" but not updating
- Add markAgentDead() to update agent bead state to "dead"
- Integrate stale agent check into heartbeat cycle
- DeadAgentTimeout set to 15 minutes

This is a safety mechanism for agents that crash without updating their state.
The daemon now marks them as dead so they can be restarted.

Also fixes duplicate AgentFields declaration - now uses beads.go version with
ParseAgentFieldsFromDescription alias in fields.go.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 01:55:52 -08:00
Steve Yegge
a1715fa91f refactor: Move agent field parsing to shared beads package
- Add AgentFields and ParseAgentFieldsFromDescription to internal/beads/fields.go
- Update daemon/lifecycle.go to use shared parsing
- Update cmd/molecule_status.go to use shared parsing
- Remove duplicate parsing code and unused isAgentRunningByBead function

This consolidates agent bead field parsing in one place, following the pattern
established for AttachmentFields and MRFields.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 01:43:43 -08:00
Steve Yegge
9c47f99767 Daemon reads agent bead state (gt-39ttg)
- Add getAgentBeadState() and getAgentBeadInfo() to read agent state from beads
- Add identityToAgentBeadID() to map daemon identities to agent bead IDs
- Update ensureDeaconRunning() to check agent bead state first (ZFC compliant)
- Add agent bead state logging in executeLifecycleAction()

This is the first step toward ZFC-compliant state detection. Dependent tasks:
- gt-psuw7: Remove PID/tmux state inference
- gt-2hzl4: Add timeout fallback for dead agents

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 01:35:29 -08:00
Steve Yegge
fa0dfc324e feat: Add crew session cycling fix and daemon exponential backoff (gt-ws8ol)
- Fix crew next/prev: Pass session name via key binding to avoid run-shell context issue
- Add TouchTownActivity() for town-level activity signaling
- Implement daemon exponential backoff based on activity.json:
  - 0-5 min idle → 5 min heartbeat
  - 5-15 min idle → 10 min heartbeat
  - 15-45 min idle → 30 min heartbeat
  - 45+ min idle → 60 min heartbeat (max)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 21:15:08 -08:00
Steve Yegge
a68e5bf95b fix: ignore lifecycle requests older than 6 hours (gt-a41um)
Defense in depth: Even if message deletion fails, stale lifecycle
requests (>6 hours old) are now ignored and deleted without execution.

This prevents ancient LIFECYCLE messages from killing sessions if
they somehow survive in the inbox.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 17:52:48 -08:00
Steve Yegge
96ce991bb7 fix: delete LIFECYCLE messages before executing action (gt-a41um)
P0 bug: Crew workers were being killed every 5 minutes due to stale
LIFECYCLE messages from days ago being reprocessed on each heartbeat.

Root cause: closeMessage() was only called after successful action
execution. Failed actions left messages in inbox to be reprocessed.

Fix: "Claim then execute" pattern - delete message FIRST, before
attempting the action. This guarantees stale messages are cleaned up
regardless of action outcome. If a lifecycle action is needed, the
sender must re-request.

Also improved closeMessage() to capture and log actual error output.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 17:44:21 -08:00
Steve Yegge
34b5a3bb8d Document intentional error suppressions with comments (gt-zn9m)
All 156 instances of _ = error suppression in non-test code now have
explanatory comments documenting why the error is intentionally ignored.

Categories of intentional suppressions:
- non-fatal: session works without these - tmux environment setup
- non-fatal: theming failure does not affect operation - visual styling
- best-effort cleanup - defer cleanup on failure paths
- best-effort notification - mail/notifications that should not block
- best-effort interrupt - graceful shutdown attempts
- crypto/rand.Read only fails on broken system - random ID generation
- output errors non-actionable - fmt.Fprint to io.Writer

This addresses the silent failure and debugging concerns raised in the
issue by making the intentionality explicit in the code.

Generated with Claude Code https://claude.com/claude-code

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-25 23:14:29 -08:00