Commit Graph

4 Commits

Author SHA1 Message Date
Walter McGivney
29f8dd67e2 fix: Add grace period to prevent Deacon restart loop (#590)
* fix(daemon): prevent runaway refinery session spawning

Fixes #566

The daemon spawned 812 refinery sessions over 4 days because:

1. Zombie detection was too strict - used IsAgentRunning(session, "node")
   but Claude reports pane command as version number (e.g., "2.1.7"),
   causing healthy sessions to be killed and recreated every heartbeat.

2. daemon.json patrol config was completely ignored - the daemon never
   loaded or checked the enabled flags.

Changes:
- refinery/manager.go: Use IsClaudeRunning() instead of IsAgentRunning()
  for robust Claude detection (handles "node", "claude", version patterns)
- daemon/types.go: Add PatrolConfig types and LoadPatrolConfig() to read
  mayor/daemon.json
- daemon/daemon.go: Load patrol config at startup, check enabled flags
  before calling ensureRefineriesRunning/ensureWitnessesRunning, add
  diagnostic logging for "already running" cases

Tested: Verified over multiple heartbeats that refinery shows "already
running, skipping spawn" instead of spawning new sessions.

* fix: Add grace period to prevent Deacon restart loop

The daemon had a race condition where:
1. ensureDeaconRunning() starts a new Deacon session
2. checkDeaconHeartbeat() runs in same heartbeat cycle
3. Heartbeat file is stale (from before crash)
4. Session is immediately killed
5. Infinite restart loop every 3 minutes

Fix:
- Track when Deacon was last started (deaconLastStarted field)
- Skip heartbeat check during 5-minute grace period
- Add config support for Deacon (consistency with refinery/witness)

After grace period, normal heartbeat checking resumes. Genuinely
stuck sessions (no heartbeat update after 5+ min) are still detected.

Fixes #589

---------

Co-authored-by: mayor <your-github-email@example.com>
2026-01-16 15:27:41 -08:00
Steve Yegge
213b3bab20 feat: Add atomic write pattern for state files (gt-wled7)
Prevents data loss from concurrent/interrupted state file writes by using
atomic write pattern (write to .tmp, then rename).

Changes:
- Add internal/util package with AtomicWriteJSON/AtomicWriteFile helpers
- Update witness/manager.go saveState to use atomic writes
- Update refinery/manager.go saveState to use atomic writes
- Update crew/manager.go saveState to use atomic writes
- Update daemon/types.go SaveState to use atomic writes
- Update polecat/namepool.go Save to use atomic writes
- Add comprehensive tests for atomic write utilities

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 15:57:53 -08:00
Steve Yegge
1554380228 feat(deacon): improve timing and add heartbeat command
Timing changes for more relaxed poke intervals:
- Daemon heartbeat: 60s → 5 minutes
- Backoff base: 60s → 5 minutes
- Backoff max: 10m → 30 minutes
- Fresh threshold: <2min → <5min
- Stale threshold: 2-5min → 5-15min
- Very stale threshold: >5min → >15min

New command:
- `gt deacon heartbeat [action]` - Touch heartbeat file easily

Template rewrite:
- Clearer wake/sleep model
- Documents wake sources (daemon poke, mail, timer callbacks)
- Simpler rounds with `gt deacon heartbeat` instead of bash echo
- Mentions plugins as optional maintenance tasks
- Explains timer callbacks pattern

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 02:12:21 -08:00
Steve Yegge
1265df70f9 feat: add gt daemon for town-level background service
Implements the town daemon (gt-99m) that handles:
- Periodic heartbeat to poke Mayor and Witnesses
- Lifecycle request processing (cycle, restart, shutdown)
- Session management for agent restarts

Commands:
- gt daemon start: Start daemon in background
- gt daemon stop: Stop running daemon
- gt daemon status: Show daemon status and stats
- gt daemon logs: View daemon log file

The daemon is a "dumb scheduler" - all intelligence remains in agents.
It simply pokes them on schedule and executes lifecycle requests.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-18 13:06:20 -08:00