Commit Graph

77 Commits

Author SHA1 Message Date
gastown/crew/jack
e8d27e7212 fix: Create and lookup rig agent beads with correct prefix
Per docs/architecture.md, Witness and Refinery are rig-level agents that
should use the rig's configured prefix (e.g., pi- for pixelforge) instead
of hardcoded "gt-".

This extends PR #183's creation fix to also fix all lookup paths:
- internal/rig/manager.go: Create agent beads in rig beads with rig prefix
- internal/daemon/daemon.go: Use rig prefix when looking up agent state
- internal/daemon/lifecycle.go: Use rig prefix for identity-to-bead mapping
- internal/cmd/sling.go: Pass townRoot for prefix lookup
- internal/cmd/unsling.go: Pass townRoot for prefix lookup
- internal/cmd/molecule_status.go: Use rig prefix for agent bead lookups
- internal/cmd/molecule_attach.go: Use rig prefix for agent bead lookups
- internal/config/loader.go: Add GetRigPrefix helper

Without this fix, the daemon would:
- Create pi-gastown-witness but look for gt-gastown-witness
- Report agents as missing/dead when they are running
- Fail to manage agent lifecycle correctly

Based on work by Johann Taberlet in PR #183.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Johann Taberlet <johann.taberlet@gmail.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 19:39:57 -08:00
mayor
5224dfb50d Merge remote-tracking branch 'origin/polecat/slit-mk1wa0rj' 2026-01-05 19:39:01 -08:00
gastown
43cca06460 feat: Add Windows-compatible file locking for daemon
Replace Unix-only syscall.Flock with gofrs/flock library for
cross-platform file locking. This enables the daemon to run on
Windows in addition to Unix-like systems.

- Add github.com/gofrs/flock v0.13.0 dependency
- Replace syscall.Flock calls with flock.TryLock/Unlock
- Maintain same non-blocking exclusive lock semantics

(gt-5354h)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 19:21:09 -08:00
slit
8110aab257 fix: Add GUPP propulsion nudge to daemon restartSession
When Deacon respawns refinery/witness sessions via LIFECYCLE requests,
the new sessions were starting at the Claude welcome screen without
the propulsion nudge that triggers autonomous execution.

Added StartupNudge and PropulsionNudgeForRole calls to restartSession()
in lifecycle.go, matching the pattern used in ensureRefinerySession()
in start.go. This ensures respawned agents receive the GUPP nudge and
begin autonomous work immediately.

Fixes: gt-01jpg

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 17:14:49 -08:00
gastown/crew/jack
6b8c897e37 feat: use hq- prefix for Mayor and Deacon session names
Town-level services (Mayor, Deacon) now use hq- prefix instead of gt-:
- hq-mayor (was gt-mayor)
- hq-deacon (was gt-deacon)

This distinguishes town-level sessions from rig-level sessions which
continue to use gt- prefix (gt-gastown-witness, gt-gastown-crew-max, etc).

Changes:
- session.MayorSessionName() returns "hq-mayor"
- session.DeaconSessionName() returns "hq-deacon"
- ParseSessionName() handles both hq- and gt- prefixes
- categorizeSession() handles both prefixes
- categorizeSessions() accepts both prefixes
- Updated all tests and documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 00:42:24 -08:00
gastown/crew/jack
eea4435269 feat(rig): Add --branch flag for custom default branch
Add --branch flag to `gt rig add` to specify a custom default branch
instead of auto-detecting from remote. This supports repositories that
use non-standard default branches like `develop` or `release`.

Changes:
- Add --branch flag to `gt rig add` command
- Store default_branch in rig config.json
- Propagate default branch to refinery, witness, daemon, and all commands
- Rename ensureMainBranch to ensureDefaultBranch for clarity
- Add Rig.DefaultBranch() method for consistent access
- Update crew/manager.go and swarm/manager.go to use rig config

Based on PR #49 by @kustrun - rebased and extended with additional fixes.

Co-authored-by: kustrun <kustrun@users.noreply.github.com>

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 14:01:02 -08:00
jakehemmerle
c25ff34b5b fix(daemon): prevent orphan daemons via file locking
Add syscall.Flock() exclusive lock in daemon.Run() to prevent TOCTOU
race condition where concurrent 'gt daemon start' commands could spawn
multiple daemons. Only the first to acquire the lock succeeds; others
exit cleanly. Lock is per-town (in townRoot/daemon/daemon.lock) so
multiple GT instances from different directories work independently.

Also detect race losers in runDaemonStart() by comparing spawned PID
with PID file, reporting 'already running' instead of false success.
2026-01-04 15:30:57 -05:00
splendid
acd2565a5b fix: remove vestigial state.json files from agent directories
Agent directories (witness/, refinery/, mayor/) contained state.json files
with last_active timestamps that were never updated, making them stale and
misleading. This change removes:

- initAgentStates function that created vestigial state.json files
- AgentState type and related Load/Save functions from config package
- MayorStateValidCheck from doctor checks
- requesting_* lifecycle verification (dead code - flags were never set)
- FileStateJSON constant and MayorStatePath function

Kept intact:
- daemon/state.json (actively used for daemon runtime state)
- crew/<name>/state.json (operational CrewWorker metadata)
- Agent state tracking via beads (the ZFC-compliant approach)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 10:36:23 -08:00
gastown/crew/george
60ecf1ff76 refactor: migrate callers from deprecated MayorBeadID/DeaconBeadID to Town variants (gt-gbw0a)
Update all callers of deprecated MayorBeadID()/DeaconBeadID() to use
MayorBeadIDTown()/DeaconBeadIDTown() which return hq- prefix IDs for
town-level beads storage.

Changes:
- internal/daemon/lifecycle.go: identityToAgentBeadID and checkStaleAgents
- internal/cmd/prime.go: getAgentBeadID
- internal/cmd/molecule_status.go: buildAgentBeadID
- internal/cmd/prime_test.go: update expected values to hq-*
- Comments updated to reflect hq- prefix for town-level agents

The deprecated functions remain for backward compatibility and are used
by the migration tool (migrate_agents.go).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 21:53:30 -08:00
max
1b69576573 fix: Address golangci-lint errors (errcheck, gosec) (#76)
Apply PR #76 from dannomayernotabot:

- Add golangci exclusions for internal package false positives
- Tighten file permissions (0644 -> 0600) for sensitive files
- Add ReadHeaderTimeout to HTTP server (slowloris prevention)
- Explicit error ignoring with _ = for intentional cases
- Add //nolint comments with justifications
- Spelling: cancelled -> canceled (US locale)

Co-Authored-By: dannomayernotabot <noreply@github.com>

🤖 Generated with Claude Code
2026-01-03 16:11:55 -08:00
joe
4bcf50bf1c Revert to simple gt-mayor/gt-deacon session names
Reverts the session naming changes from PR #70. Multi-town support
on a single machine is not a real use case - rigs provide project
isolation, and true isolation should use containers/VMs.

Changes:
- MayorSessionName() and DeaconSessionName() no longer take townName parameter
- ParseSessionName() handles simple gt-mayor and gt-deacon formats
- Removed Town field from AgentIdentity and AgentSession structs
- Updated all callers and tests

Generated with Claude Code

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 14:33:24 -08:00
rictus
40b8dec2a0 feat(daemon): Add circuit breaker for stuck agents (gt-72cqu)
- Add Refinery monitoring to daemon heartbeat (ensureRefineriesRunning)
- Add circuit breaker: agents marked dead by checkStaleAgents() are
  now force-killed and restarted, even if Claude process is alive
- Fixes zombie Claude sessions that werent updating their bead state

The circuit breaker flow:
1. Agent gets stuck (stops updating bead state)
2. After 15 minutes: checkStaleAgents() marks bead as dead
3. On next heartbeat: ensure*Running() sees state=dead
4. Force-kill session and restart fresh

Generated with Claude Code

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 13:12:39 -08:00
markov-kernel
e7145cfd77 fix: Make Mayor/Deacon session names include town name
Session names `gt-mayor` and `gt-deacon` were hardcoded, causing tmux
session name collisions when running multiple towns simultaneously.

Changed to `gt-{town}-mayor` and `gt-{town}-deacon` format (e.g.,
`gt-ai-mayor`) to allow concurrent multi-town operation.

Key changes:
- session.MayorSessionName() and DeaconSessionName() now take townName param
- Added workspace.GetTownName() helper to load town name from config
- Updated all callers in cmd/, daemon/, doctor/, mail/, rig/, templates/
- Updated tests with new session name format
- Bead IDs remain unchanged (already scoped by .beads/ directory)

Fixes #60

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 21:37:05 +01:00
mayor
a83d5c6e0b fix: Add tmux health check fallback to prevent killing healthy sessions
When the daemon checks if Deacon/Witness is running, it first checks the
agent bead state. If this check fails (bead not found, JSON parse error,
or stale state), it would previously attempt to restart the session -
even if the tmux session was perfectly healthy.

This caused "session already exists" errors when:
1. Agent bead state couldn't be read (prefix mismatch, missing bead)
2. But the tmux session was actually running with Claude active

Fix: Add a tmux session health check as fallback before attempting restart.
If the session exists AND Claude is running in it, skip the restart and
log that we're preserving the healthy session despite stale bead state.

This maintains ZFC compliance (still trusts agent bead as primary source)
while adding a defensive check to prevent unnecessary session kills.

Fixes #63

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-03 09:01:34 -08:00
dag
9ad826cd8c fix(daemon): Kill zombie tmux sessions before recreating
The daemon was failing to restart agents when zombie tmux sessions existed
(session alive but Claude dead). Added EnsureSessionFresh() helper to
tmux package that:
- Checks if session exists
- If exists but Claude not running (zombie), kills the session
- Creates fresh session

Updated all daemon session creation points to use EnsureSessionFresh:
- ensureDeaconRunning()
- ensureWitnessRunning()
- restartPolecatSession()
- restartSession() in lifecycle.go

Added tests for the new helper function. (gt-j1i0r)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-02 18:54:39 -08:00
kustrun
94857fc913 fix(session): Auto-accept Claude bypass permissions warning dialog
When Claude starts with --dangerously-skip-permissions, it shows a warning
dialog requiring Down+Enter to accept. This blocked automated polecat and
agent startup.

Added AcceptBypassPermissionsWarning() to tmux package that:
- Checks if the warning dialog is present by capturing pane content
- Only sends Down+Enter if "Bypass Permissions mode" text is found
- Avoids interfering with sessions that don't show the warning

Updated all Claude startup locations:
- session/manager.go (polecat sessions)
- cmd/up.go (mayor, witness, crew, polecat cold starts)
- daemon/daemon.go (crashed polecat restarts)
- daemon/lifecycle.go (role session starts)
2026-01-02 22:44:58 +01:00
morsov
a16b94ffa8 fix(daemon): Reduce heartbeat from 10 min to 3 min for faster stuck detection (gt-6y5o3)
The 10-minute recovery heartbeat was too slow to detect stuck agents promptly.
Reduced to 3 minutes for better balance between detection speed and overhead.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 20:02:12 -08:00
gastown/polecats/furiosa
278abf15d6 fix(hook): Read hook_bead from database column, not description (gt-7m33w)
The hook discovery code was reading hook_bead from the agent bead's
description field (parsed via ParseAgentFieldsFromDescription), but
the slot update code writes to the hook_bead database column via
'bd slot set'. This mismatch caused polecats to see stale hook values
from the description instead of the current value from the database.

Fixed in:
- molecule_status.go: Use agentBead.HookBead instead of parsing description
- status.go: Use issue.HookBead directly
- lifecycle.go: Update all GUPP and orphan detection to read from
  database columns instead of parsing description

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 17:42:31 -08:00
gastown/polecats/nux
6f1b6269b1 Fix Deacon spin indefinitely bug (hq-oosxt)
- Add heartbeat checking to Boot degraded triage: detects stale Deacon
  heartbeat (>15min nudges, >30min restarts)
- Add checkDeaconHeartbeat to daemon heartbeat cycle as fallback
- This ensures the Deacon is monitored continuously, not just at startup

The mol-deacon-patrol formula was also updated separately to use
gt deacon health-check instead of ephemeral context memory tracking.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 11:47:10 -08:00
gastown/crew/gus
626a24e013 Refactor startup paths to use RuntimeConfig (gt-j0546)
Replaced all hardcoded 'claude --dangerously-skip-permissions' invocations
with configurable helpers from internal/config:

- GetRuntimeCommand(rigPath) - simple command string
- GetRuntimeCommandWithPrompt(rigPath, prompt) - with initial prompt
- BuildAgentStartupCommand(role, bdActor, rigPath, prompt) - generic agent
- BuildPolecatStartupCommand(rigName, polecatName, rigPath, prompt) - polecat
- BuildCrewStartupCommand(rigName, crewName, rigPath, prompt) - crew
- BuildStartupCommand(envVars, rigPath, prompt) - custom env vars

Files updated:
- internal/cmd/start.go (4 locations)
- internal/cmd/crew_lifecycle.go (2 locations)
- internal/cmd/crew_at.go (2 locations)
- internal/cmd/deacon.go
- internal/cmd/witness.go
- internal/cmd/up.go (2 locations)
- internal/cmd/handoff.go (2 locations)
- internal/daemon/daemon.go (3 locations)
- internal/daemon/lifecycle.go
- internal/session/manager.go
- internal/refinery/manager.go
- internal/boot/boot.go

This enables future support for alternative LLM runtimes (aider, etc.)
via rig/town settings configuration.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 23:48:34 -08:00
gastown/crew/jack
d37bd53a90 Dynamically discover rigs in checkStaleAgents
Instead of hardcoding the rig list (gastown, beads), now loads rigs
from mayor/rigs.json using config.LoadRigsConfig. Falls back gracefully
to just checking global agents if the config cannot be loaded.

(gt-mmp0q)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 23:23:05 -08:00
Steve Yegge
b241a353f3 Clean up stale issue references in comments (gt-z5bri)
- daemon.go: Update gt-2hzl4 comment to past tense (timeout fallback is implemented)
- lifecycle.go: Remove "to be removed" promise for gt-psuw7 (code stays)
- keepalive_test.go: Remove tombstoned gt-gaxo epic reference
- doctor.go: Remove tombstoned gt-gaxo epic reference
- daemon_test.go: Remove tombstoned gt-gaxo epic reference

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 22:29:23 -08:00
gastown/polecats/furiosa
108afdbc52 Remove keepalive file infrastructure for feed-based wake model (gt-vdprb.2)
- Remove TouchTownActivity() calls from root.go PersistentPreRun hook
- Remove ReadTownActivity() and activity-based backoff from daemon.go
- Delete TouchTownActivity/ReadTownActivity functions from keepalive.go
- Replace dynamic backoff with fixed 10-min recovery heartbeat

Normal wake is now handled by feed subscription (bd activity --follow).
The daemon is a safety net for dead sessions, GUPP violations, and orphaned work.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 22:13:14 -08:00
Steve Yegge
2112804aba Implement Boot: daemon entry point dog for Deacon triage (gt-rwd5j)
Boot is a watchdog that the daemon pokes instead of Deacon directly,
centralizing the 'when to wake Deacon' decision in an agent that can
reason about context.

Key changes:
- Add internal/boot package with marker file and status tracking
- Add gt boot commands: status, spawn, triage
- Add mol-boot-triage formula for Boot's triage cycle
- Modify daemon to call ensureBootRunning instead of ensureDeaconRunning
- Add tmux.IsAvailable() for degraded mode detection
- Add boot.md.tmpl role template

Boot lifecycle:
1. Daemon tick spawns Boot (fresh each time)
2. Boot runs triage: observe, decide, act
3. Boot cleans stale handoffs from Deacon inbox
4. Boot exits (or handoffs in non-degraded mode)

In degraded mode (no tmux), Boot runs mechanical triage directly.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 16:20:00 -08:00
Steve Yegge
3099d99424 Set GIT_AUTHOR_NAME per agent session (gt-6r18e.1)
Export GIT_AUTHOR_NAME alongside BD_ACTOR in all agent session startup
locations. This enables git log --author queries for agent work while
keeping GIT_AUTHOR_EMAIL as the workspace owner.

Files updated:
- internal/session/manager.go (polecat sessions)
- internal/daemon/daemon.go (deacon, witness, polecat via daemon)
- internal/daemon/lifecycle.go (polecat lifecycle)
- internal/cmd/*.go (crew, mayor, deacon, witness, refinery, up, handoff)
- internal/session/manager_test.go (updated test expectations)
- docs/federation.md (marked feature as implemented)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 16:14:33 -08:00
Steve Yegge
437d42c7fa ZFC #4: Replace daemon identity parsing with agent self-registration
Implements role-based lifecycle configuration where agent types self-register
via role beads instead of hardcoded identity string parsing in the daemon.

Changes:
- Add RoleConfig struct with lifecycle fields (session_pattern, work_dir_pattern,
  needs_pre_sync, start_command, env_vars)
- Add ParseRoleConfig/FormatRoleConfig/ExpandRolePattern to beads package
- Add role bead ID helpers (RoleBeadID, MayorRoleBeadID, etc.)
- Refactor daemon to use single parseIdentity function as ONLY place where
  identity strings are parsed
- Daemon now looks up role beads to get lifecycle config, with fallback to
  defaults when role bead is missing or has no config
- Updated all role beads (mayor, deacon, witness, refinery, crew, polecat)
  with structured lifecycle configuration fields
- Add comprehensive unit tests for RoleConfig parsing and expansion

This makes the daemon ZFC-compliant by trusting what agents self-report in
their role beads rather than encoding agent-specific knowledge in Go code.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 02:13:14 -08:00
Steve Yegge
e72882cdcb Add feed curator to daemon for activity feed curation (gt-38doh)
Implements the feed daemon goroutine that:
- Tails ~/gt/.events.jsonl for raw events
- Filters by visibility tag (keeps feed/both, drops audit)
- Deduplicates repeated done events from same actor
- Aggregates sling events when multiple occur in window
- Writes curated events to ~/gt/.feed.jsonl with summaries

The curator runs as a goroutine in the gt daemon, starting when
the daemon starts and stopping gracefully on shutdown.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 02:12:01 -08:00
Steve Yegge
85ec39c487 fix: Add session health monitoring and auto-restart for crashed polecats (gt-i7wcn)
This fix addresses the issue where polecat sessions terminate unexpectedly
during work without recovery:

Changes:
- Add `checkPolecatSessionHealth()` to daemon heartbeat loop
  - Proactively validates tmux sessions are alive for polecats
  - Detects crashed polecats that have work-on-hook
  - Auto-restarts crashed polecats with proper environment setup
  - Notifies Witness if restart fails as fallback

- Add polecat support to lifecycle identity mapping
  - `identityToSession()` now handles polecat identities
  - `restartSession()` can restart crashed polecat sessions
  - `identityToStateFile()` handles polecat state files
  - `identityToAgentBeadID()` handles polecat agent beads
  - `identityToBDActor()` handles polecat BD_ACTOR conversion

- Add `gt session check` command for manual health checking
  - Validates tmux sessions exist for all polecats
  - Shows summary of healthy vs not-running sessions
  - Useful for debugging session issues

This provides faster recovery (within heartbeat interval) compared to
waiting for GUPP violation timeout (30 min) or Witness detection.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 22:13:24 -08:00
Steve Yegge
553e29626a Daemon heartbeat becomes recovery-focused (gt-vdprb.4)
Change daemon from wake-focused to recovery-focused:

Before: Daemon pokes agents every 5-60min as primary wake
After: Daemon only checks for edge cases that feed-wake cannot handle

Recovery checks:
- Dead sessions that need restart (ensureDeaconRunning, ensureWitnessesRunning)
- Stale agents that crashed without updating state (checkStaleAgents)
- GUPP violations: agents with work-on-hook not progressing (checkGUPPViolations)
- Orphaned work: work assigned to dead agents (checkOrphanedWork)

Removed:
- pokeDeacon() - no longer sending HEARTBEAT messages
- pokeWitness()/pokeWitnesses() - no longer sending HEARTBEAT messages
- MOTD message arrays - only used by poke functions

Normal agent wake is now handled by feed subscription (bd activity --follow).
The daemon is the safety net for edge cases, not the primary propulsion.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 18:10:20 -08:00
Steve Yegge
c92b11d1bd feat: Standardize agent bead naming to prefix-rig-role-name (gt-zvte2)
Implements canonical naming convention for agent bead IDs:
- Town-level: gt-mayor, gt-deacon (unchanged)
- Rig-level: gt-<rig>-witness, gt-<rig>-refinery (was gt-witness-<rig>)
- Named: gt-<rig>-crew-<name>, gt-<rig>-polecat-<name> (was gt-crew-<rig>-<name>)

Changes:
- Added AgentBeadID helper functions to internal/beads/beads.go
- Updated all ID generation call sites to use helpers
- Fixed session parsing in theme.go, statusline.go, agents.go
- Updated doctor check and fix to use canonical format
- Updated tests for new format

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 14:54:37 -08:00
Steve Yegge
40cb3eb9fc Extract timing constants to constants package (gt-795e8)
Added 6 timing constants:
- ShutdownNotifyDelay (500ms)
- ClaudeStartTimeout (15s)
- ShellReadyTimeout (5s)
- DefaultDebounceMs (100)
- DefaultDisplayMs (5000)
- PollInterval (100ms)

Updated 7 files to use these constants instead of magic numbers.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 17:18:03 -08:00
Steve Yegge
213b3bab20 feat: Add atomic write pattern for state files (gt-wled7)
Prevents data loss from concurrent/interrupted state file writes by using
atomic write pattern (write to .tmp, then rename).

Changes:
- Add internal/util package with AtomicWriteJSON/AtomicWriteFile helpers
- Update witness/manager.go saveState to use atomic writes
- Update refinery/manager.go saveState to use atomic writes
- Update crew/manager.go saveState to use atomic writes
- Update daemon/types.go SaveState to use atomic writes
- Update polecat/namepool.go Save to use atomic writes
- Add comprehensive tests for atomic write utilities

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 15:57:53 -08:00
Steve Yegge
2655a85124 chore: remove issue ID references from comments
Strip (gt-xxx), (bd-xxx) suffixes from code comments. The descriptions
remain meaningful without the ephemeral issue IDs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 10:05:16 -08:00
Steve Yegge
34f5ef0e93 feat: Wire witness patrol formula to daemon cron (gt-qpoxz)
Adds witness patrol to daemon heartbeat cycle:
- ensureWitnessesRunning(): Start witness for each rig if not running
- pokeWitnesses(): Send heartbeat messages to all witnesses
- getKnownRigs(): Read rig list from mayor/rigs.json

The daemon now ensures both Deacon and Witnesses are running
and sends periodic heartbeats to trigger their patrol molecules.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 02:44:37 -08:00
Steve Yegge
2d6b93f26b fix: Remove PID/tmux state inference (gt-psuw7)
ZFC compliance: daemon becomes pure transport layer, trusting agent beads.

Changes:
- refinery Status(): Simply returns loaded state, no PID/tmux reconciliation
- witness Status(): Simply returns loaded state, no PID inference
- daemon ensureDeaconRunning(): Trusts agent bead state, no tmux fallback
- daemon pokeDeacon(): Trusts agent bead state, no HasSession check

Removed:
- 78 lines of state inference code (PID checks, tmux session parsing)
- "Reconciliation" logic that overwrote agent-reported state

Note: Timeout fallback for dead agents is gt-2hzl4 (separate issue).

Reference: ~/gt/docs/zfc-violations-audit.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 01:56:42 -08:00
Steve Yegge
597c6b8071 Add timeout fallback for dead agents (gt-2hzl4)
- Add checkStaleAgents() to detect agents reporting "running" but not updating
- Add markAgentDead() to update agent bead state to "dead"
- Integrate stale agent check into heartbeat cycle
- DeadAgentTimeout set to 15 minutes

This is a safety mechanism for agents that crash without updating their state.
The daemon now marks them as dead so they can be restarted.

Also fixes duplicate AgentFields declaration - now uses beads.go version with
ParseAgentFieldsFromDescription alias in fields.go.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 01:55:52 -08:00
Steve Yegge
a1715fa91f refactor: Move agent field parsing to shared beads package
- Add AgentFields and ParseAgentFieldsFromDescription to internal/beads/fields.go
- Update daemon/lifecycle.go to use shared parsing
- Update cmd/molecule_status.go to use shared parsing
- Remove duplicate parsing code and unused isAgentRunningByBead function

This consolidates agent bead field parsing in one place, following the pattern
established for AttachmentFields and MRFields.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 01:43:43 -08:00
Steve Yegge
9c47f99767 Daemon reads agent bead state (gt-39ttg)
- Add getAgentBeadState() and getAgentBeadInfo() to read agent state from beads
- Add identityToAgentBeadID() to map daemon identities to agent bead IDs
- Update ensureDeaconRunning() to check agent bead state first (ZFC compliant)
- Add agent bead state logging in executeLifecycleAction()

This is the first step toward ZFC-compliant state detection. Dependent tasks:
- gt-psuw7: Remove PID/tmux state inference
- gt-2hzl4: Add timeout fallback for dead agents

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 01:35:29 -08:00
Steve Yegge
fa0dfc324e feat: Add crew session cycling fix and daemon exponential backoff (gt-ws8ol)
- Fix crew next/prev: Pass session name via key binding to avoid run-shell context issue
- Add TouchTownActivity() for town-level activity signaling
- Implement daemon exponential backoff based on activity.json:
  - 0-5 min idle → 5 min heartbeat
  - 5-15 min idle → 10 min heartbeat
  - 15-45 min idle → 30 min heartbeat
  - 45+ min idle → 60 min heartbeat (max)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 21:15:08 -08:00
Steve Yegge
a68e5bf95b fix: ignore lifecycle requests older than 6 hours (gt-a41um)
Defense in depth: Even if message deletion fails, stale lifecycle
requests (>6 hours old) are now ignored and deleted without execution.

This prevents ancient LIFECYCLE messages from killing sessions if
they somehow survive in the inbox.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 17:52:48 -08:00
Steve Yegge
96ce991bb7 fix: delete LIFECYCLE messages before executing action (gt-a41um)
P0 bug: Crew workers were being killed every 5 minutes due to stale
LIFECYCLE messages from days ago being reprocessed on each heartbeat.

Root cause: closeMessage() was only called after successful action
execution. Failed actions left messages in inbox to be reprocessed.

Fix: "Claim then execute" pattern - delete message FIRST, before
attempting the action. This guarantees stale messages are cleaned up
regardless of action outcome. If a lifecycle action is needed, the
sender must re-request.

Also improved closeMessage() to capture and log actual error output.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 17:44:21 -08:00
Steve Yegge
34b5a3bb8d Document intentional error suppressions with comments (gt-zn9m)
All 156 instances of _ = error suppression in non-test code now have
explanatory comments documenting why the error is intentionally ignored.

Categories of intentional suppressions:
- non-fatal: session works without these - tmux environment setup
- non-fatal: theming failure does not affect operation - visual styling
- best-effort cleanup - defer cleanup on failure paths
- best-effort notification - mail/notifications that should not block
- best-effort interrupt - graceful shutdown attempts
- crypto/rand.Read only fails on broken system - random ID generation
- output errors non-actionable - fmt.Fprint to io.Writer

This addresses the silent failure and debugging concerns raised in the
issue by making the intentionality explicit in the code.

Generated with Claude Code https://claude.com/claude-code

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-25 23:14:29 -08:00
Steve Yegge
39faf79322 Fix polecat auto-start by triggering pending spawns in daemon heartbeat (gt-iu23)
The daemon now calls triggerPendingSpawns() on each heartbeat, which:
- Checks for POLECAT_STARTED messages in Deacon inbox
- Polls pending spawns with WaitForClaudeReady (2s timeout)
- Nudges ready polecats with Begin message
- Prunes stale pending spawns (>5 min old)

This ensures polecats get triggered even when Deacon is not in an
active patrol cycle. Uses bootstrap mode regex detection which
is acceptable for daemon operations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-25 22:13:31 -08:00
Steve Yegge
9afd6c5572 feat: Set BD_ACTOR env var when spawning agents (gt-rhfji)
When gt spawns agents (polecats, crew, patrol roles), it now sets the
BD_ACTOR env var so that bd commands (like `bd hook`) know the agent
identity without coupling to gt.

Updated spawn points:
- gt up (mayor, deacon, witness via ensureSession/ensureWitness)
- gt deacon start
- gt witness start
- gt start refinery
- gt mayor start
- Daemon deacon restart
- Daemon lifecycle restart
- Handoff respawn
- Refinery manager start

BD_ACTOR uses slash format (e.g., gastown/witness, gastown/crew/max)
while GT_ROLE may use dash format internally.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-25 13:26:38 -08:00
Steve Yegge
b6817899b4 refactor: ZFC cleanup - move Go heuristics to Deacon molecule (gt-gaxo)
Remove Go code that makes workflow decisions. All health checking,
staleness detection, nudging, and escalation belongs in the Deacon
molecule where Claude executes it.

Removed:
- internal/daemon/backoff.go (190 lines) - exponential backoff decisions
- internal/doctor/stale_check.go (284 lines) - staleness detection
- IsFresh/IsStale/IsVeryStale from keepalive.go
- pokeMayor, pokeWitnesses, pokeWitness from daemon.go
- Heartbeat staleness classification from pokeDeacon

Changed:
- Lifecycle parsing now uses structured body (JSON or simple text)
  instead of keyword matching on subject line
- Daemon now only ensures Deacon is running and sends simple heartbeats
- No backoff, no staleness classification, no decision-making

Total: ~800 lines removed from Go code

The Deacon molecule will handle all health checking, nudging, and
escalation. Go is now just a message router.

See gt-gaxo epic for full rationale.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-24 00:11:15 -08:00
Steve Yegge
8d47911623 Fix deacon startup: use ~/gt/deacon dir and export GT_ROLE
The daemon was creating the deacon session in ~/gt (town root) which loaded
the Mayor CLAUDE.md instead of the Deacon CLAUDE.md. Also, tmux SetEnvironment
does not export variables to spawned processes.

Changes:
- Create deacon session in ~/gt/deacon (correct CLAUDE.md)
- Export GT_ROLE=deacon in launch command (process inherits env)
- Apply to both initial launch and restart paths

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-23 12:39:04 -08:00
Steve Yegge
5f2e16f789 refactor: remove shell respawn loops from witness and deacon (gt-zxgu)
Shell loops bypassed the proper lifecycle architecture. Now:

- Witness: Launches Claude directly, daemon/deacon health-scan handles restart
- Deacon: Launches Claude directly, daemon detects exit and restarts
- Daemon: ensureDeaconRunning() now checks if Claude is running (pane cmd)
         and restarts it if session exists but Claude has exited
- gt deacon restart: Now does stop+start instead of Ctrl-C

This enforces proper lifecycle flow through LIFECYCLE mail and daemon
heartbeat rather than bypassing it with shell loops.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-23 04:02:17 -08:00
Steve Yegge
a5a37f5d63 Fix: /handoff regression - use respawn-pane with direct claude command
Three related fixes:

1. lifecycle.go: Use gt mail delete instead of gt mail read to prevent
   lifecycle requests from accumulating.

2. handoff.go: Return exec claude command for respawn-pane instead of
   gt crew at which tries to attach to existing session.

3. handoff.md skill: Work without --cycle and -m flags, send mail separately.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-23 00:23:40 -08:00
Steve Yegge
c9dcaf0c51 bd sync: 2025-12-22 22:04:04 2025-12-22 22:04:11 -08:00
Steve Yegge
ae513a5db4 Deacon uses wisp-based patrol (gt-3x0z.9)
Daemon changes:
- Remove checkDeaconAttachment() - Deacon self-spawns wisps
- Remove findDeaconPatrolMolecule() - unused
- Remove nudgeDeaconForPatrol() - unused
- Remove DeaconPatrolMolecule const - unused
- Remove beads import - no longer needed

Deacon template changes:
- Update to wisp-based patrol model
- Replace bd mol run with bd mol spawn (wisps by default)
- Remove pinned molecule concept for patrol
- Add Why Wisps section explaining ephemeral design
- Update startup/handoff protocols for wisp-based cycles

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-22 02:13:24 -08:00