refactor: ZFC cleanup - move Go heuristics to Deacon molecule (gt-gaxo)

Remove Go code that makes workflow decisions. All health checking,
staleness detection, nudging, and escalation belongs in the Deacon
molecule where Claude executes it.

Removed:
- internal/daemon/backoff.go (190 lines) - exponential backoff decisions
- internal/doctor/stale_check.go (284 lines) - staleness detection
- IsFresh/IsStale/IsVeryStale from keepalive.go
- pokeMayor, pokeWitnesses, pokeWitness from daemon.go
- Heartbeat staleness classification from pokeDeacon

Changed:
- Lifecycle parsing now uses structured body (JSON or simple text)
  instead of keyword matching on subject line
- Daemon now only ensures Deacon is running and sends simple heartbeats
- No backoff, no staleness classification, no decision-making
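
A minimal sketch of the structured-body approach, assuming a hypothetical
LifecycleRequest type and field names (the real schema lives in
daemon/lifecycle.go): Go reads the action field verbatim and never
interprets free text.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// LifecycleRequest is a hypothetical structured mail body: the daemon
// reads Action directly instead of keyword-matching the subject line.
type LifecycleRequest struct {
	Action string `json:"action"` // "restart" | "shutdown" | "cycle"
}

// parseLifecycle accepts either JSON ({"action":"cycle"}) or simple text
// ("action: cycle"). Anything else is rejected, not guessed at.
func parseLifecycle(body string) (string, error) {
	var req LifecycleRequest
	if err := json.Unmarshal([]byte(body), &req); err == nil && req.Action != "" {
		return req.Action, nil
	}
	if rest, ok := strings.CutPrefix(strings.TrimSpace(body), "action:"); ok {
		return strings.TrimSpace(rest), nil
	}
	return "", fmt.Errorf("no structured action in body")
}

func main() {
	a, _ := parseLifecycle(`{"action":"cycle"}`)
	fmt.Println(a)
	b, _ := parseLifecycle("action: shutdown")
	fmt.Println(b)
}
```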

Total: ~800 lines removed from Go code

The Deacon molecule will handle all health checking, nudging, and
escalation. Go is now just a message router.

See gt-gaxo epic for full rationale.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Author: Steve Yegge
Date: 2025-12-24 00:11:15 -08:00
parent 0f88c793f8
commit b6817899b4
13 changed files with 145 additions and 1224 deletions


@@ -487,7 +487,7 @@
{"id":"gt-gaxo.3","title":"Remove doctor staleness detection","description":"**File:** doctor/stale_check.go (lines 11-48)\n\n**Current behavior:**\n- DefaultStaleThreshold = 1 hour hardcoded\n- Scans molecules for in_progress status older than threshold\n- Reports \"stale\" molecules automatically\n\n**Fix:**\n- Remove automatic staleness detection from doctor\n- Doctor becomes pure diagnostic tool (reports facts, not judgments)\n- Deacon molecule step does \"orphan-check\" instead:\n \"Find issues in_progress with no active polecat\"\n\n**Lines to remove:**\n- stale_check.go:13 (DefaultStaleThreshold constant)\n- stale_check.go:243-256 (staleness classification logic)","status":"closed","priority":1,"issue_type":"task","created_at":"2025-12-23T23:59:04.36334-08:00","updated_at":"2025-12-24T00:06:56.685681-08:00","closed_at":"2025-12-24T00:06:56.685681-08:00","close_reason":"Deleted stale_check.go, removed from doctor registration","dependencies":[{"issue_id":"gt-gaxo.3","depends_on_id":"gt-gaxo","type":"parent-child","created_at":"2025-12-23T23:59:04.36382-08:00","created_by":"daemon"}]}
{"id":"gt-gaxo.4","title":"Remove polecat state derivation from issue status","description":"**File:** polecat/manager.go (lines 556-599)\n\n**Current behavior:**\n- Switch on issue.Status to derive polecat state\n- Go decides: open/in_progress → Working, closed → Done\n\n**Fix:**\n- Polecat state comes from polecat, not inferred by Go\n- Polecat signals state via mail or explicit field\n- Or: remove state derivation entirely, just report issue status\n\n**Lines to refactor:**\n- manager.go:576-588 (switch statement)\n\n**Priority:** P2 - less critical than daemon/heartbeat logic","status":"open","priority":2,"issue_type":"task","created_at":"2025-12-23T23:59:05.85152-08:00","updated_at":"2025-12-23T23:59:05.85152-08:00","dependencies":[{"issue_id":"gt-gaxo.4","depends_on_id":"gt-gaxo","type":"parent-child","created_at":"2025-12-23T23:59:05.852006-08:00","created_by":"daemon"}]}
{"id":"gt-gaxo.5","title":"Design Deacon molecule health-check step","description":"**Context:** After removing Go heuristics, Deacon molecule needs the logic.\n\n**New molecule step: health-scan**\n\nClaude executes:\n1. Check Witness heartbeat: gt witness status \u003crig\u003e\n2. Check Refinery heartbeat: gt refinery status \u003crig\u003e\n3. For each, assess: responsive? stuck? needs restart?\n4. If unresponsive for N cycles, send escalation mail\n\n**Key difference from Go approach:**\n- Claude makes the judgment call, not hardcoded thresholds\n- Claude can read context (what was the agent working on?)\n- Claude can ask questions or check additional signals\n- Thresholds come from molecule config, not Go constants\n\n**Deliverables:**\n- Update mol-deacon-patrol health-scan step\n- Add configurable thresholds as molecule variables\n- Test with simulated stuck agents","status":"open","priority":2,"issue_type":"task","created_at":"2025-12-23T23:59:07.247548-08:00","updated_at":"2025-12-23T23:59:07.247548-08:00","dependencies":[{"issue_id":"gt-gaxo.5","depends_on_id":"gt-gaxo","type":"parent-child","created_at":"2025-12-23T23:59:07.248049-08:00","created_by":"daemon"}]}
{"id":"gt-gaxo.6","title":"Remove lifecycle intent parsing from Go","description":"daemon/lifecycle.go parses mail subjects with regex looking for restart/shutdown/cycle keywords, then executes actions. Fix: use structured message types in mail body instead of parsing subjects. Go reads action field, does not interpret text.","status":"in_progress","priority":1,"issue_type":"task","created_at":"2025-12-23T23:59:15.765947-08:00","updated_at":"2025-12-24T00:07:05.083161-08:00","dependencies":[{"issue_id":"gt-gaxo.6","depends_on_id":"gt-gaxo","type":"parent-child","created_at":"2025-12-23T23:59:15.766409-08:00","created_by":"daemon"}]}
{"id":"gt-gaxo.6","title":"Remove lifecycle intent parsing from Go","description":"daemon/lifecycle.go parses mail subjects with regex looking for restart/shutdown/cycle keywords, then executes actions. Fix: use structured message types in mail body instead of parsing subjects. Go reads action field, does not interpret text.","status":"closed","priority":1,"issue_type":"task","created_at":"2025-12-23T23:59:15.765947-08:00","updated_at":"2025-12-24T00:10:56.167018-08:00","closed_at":"2025-12-24T00:10:56.167018-08:00","close_reason":"Changed lifecycle parsing to use structured body instead of keyword matching on subject","dependencies":[{"issue_id":"gt-gaxo.6","depends_on_id":"gt-gaxo","type":"parent-child","created_at":"2025-12-23T23:59:15.766409-08:00","created_by":"daemon"}]}
{"id":"gt-gby","title":"gt handoff: Unified agent lifecycle command","description":"## Summary\n\nUnified `gt handoff` command for ALL agent types to request lifecycle actions.\n\n## Usage\n\ngt handoff # Context-aware default\ngt handoff --shutdown # Terminate, cleanup, don't restart\ngt handoff --cycle # Restart with handoff mail\ngt handoff --restart # Fresh restart, no handoff\n\n## Context-Aware Defaults\n\n| Agent Type | Default | Reason |\n|------------|---------|--------|\n| Polecat | --shutdown | Ephemeral, work is done |\n| Witness | --cycle | Long-running, context full |\n| Refinery | --cycle | Long-running, context full |\n| Mayor | --cycle | Long-running, context full |\n| Crew | (sends mail only) | Human-managed |\n\n## What gt handoff Does\n\n1. **Verify safe to stop**\n - Git state clean (no uncommitted changes)\n - Work handed off (PR exists for polecats)\n\n2. **Send handoff mail to self** (for cycle/restart)\n - Captures current state\n - New session will read this\n\n3. **Send lifecycle request to manager**\n - Polecats/Refinery → Witness\n - Witness/Mayor → Daemon\n - Format: mail to \u003cmanager\u003e with action type\n\n4. **Set state: requesting_\u003caction\u003e**\n - Lifecycle manager checks this before acting\n\n5. **Wait for termination**\n - Don't self-exit - let manager kill session\n - Ensures clean handoff\n\n## Lifecycle Request Flow\n\nAgent Lifecycle Manager\n | |\n | 1. gt handoff --cycle |\n | a. Verify git clean |\n | b. Send handoff mail to self |\n | c. Set requesting_cycle=true |\n | d. Send lifecycle request |\n |------------------------------------→|\n | |\n | 2. Receive request\n | 3. Verify state |\n | 4. Kill session |\n | 5. Start new |\n | (for cycle) |\n | |\n | New session reads handoff |\n | Resumes work |\n\n## Who Manages Whom\n\n| Agent | Sends lifecycle request to |\n|-------|---------------------------|\n| Polecat | \u003crig\u003e/witness |\n| Refinery | \u003crig\u003e/witness |\n| Witness | daemon/ |\n| Mayor | daemon/ |\n\n## Implementation\n\n1. Detect current role (polecat, witness, refinery, mayor, crew)\n2. Apply context-aware default if no flag specified\n3. Run pre-flight checks (git clean, work handed off)\n4. Send handoff mail to self (if cycling)\n5. Send lifecycle request to appropriate manager\n6. Set requesting_\u003caction\u003e in state.json\n7. Wait (manager will kill us)\n\n## For Polecats (--shutdown)\n\nAdditional cleanup after kill:\n- Witness removes worktree\n- Witness deletes polecat branch\n- Polecat ceases to exist\n\n## Related Issues\n\n- gt-99m: Daemon (handles Mayor/Witness lifecycle)\n- gt-7ik: Ephemeral polecats (polecat cleanup)\n- gt-eu9: Witness session cycling","status":"in_progress","priority":1,"issue_type":"task","created_at":"2025-12-18T11:39:40.806863-08:00","updated_at":"2025-12-18T18:18:22.35369-08:00","dependencies":[{"issue_id":"gt-gby","depends_on_id":"gt-7ik","type":"blocks","created_at":"2025-12-18T11:39:46.423945-08:00","created_by":"daemon"},{"issue_id":"gt-gby","depends_on_id":"gt-eu9","type":"blocks","created_at":"2025-12-18T11:39:46.547204-08:00","created_by":"daemon"},{"issue_id":"gt-gby","depends_on_id":"gt-99m","type":"blocks","created_at":"2025-12-18T11:50:24.142182-08:00","created_by":"daemon"}]}
{"id":"gt-ggmc","title":"Merge: gt-83k0","description":"branch: polecat/furiosa\ntarget: main\nsource_issue: gt-83k0\nrig: gastown","status":"closed","priority":1,"issue_type":"merge-request","created_at":"2025-12-22T23:36:24.551025-08:00","updated_at":"2025-12-22T23:38:38.536524-08:00","closed_at":"2025-12-22T23:38:38.536524-08:00","close_reason":"Merged to main"}
{"id":"gt-gl2","title":"Clarify Mayor vs Witness cleanup responsibilities","description":"Document the cleanup authority model: Witness owns ALL per-worker cleanup, Mayor never involved.\n\n## The Rule\n\n**Witness handles ALL per-worker cleanup. Mayor is never involved.**\n\n## Why This Matters\n\n1. Separation of concerns: Mayor strategic, Witness operational\n2. Reduced coordination overhead: No back-and-forth for routine cleanup\n3. Faster shutdown: Witness kills workers immediately upon verification\n4. Cleaner escalation: Mayor only hears about problems\n\n## What Witness Handles\n\n- Verifying worker git state before kill\n- Nudging workers to fix dirty state\n- Killing worker sessions\n- Updating worker state (sleep/wake)\n- Logging verification results\n\n## What Mayor Handles\n\n- Receiving swarm complete notifications\n- Deciding whether to start new swarms\n- Handling escalations (stuck workers after 3 retries)\n- Cross-rig coordination\n\n## Escalation Path\n\nWorker stuck -\u003e Witness nudges (up to 3x) -\u003e Witness escalates to Mayor -\u003e Mayor decides: force kill, reassign, or human\n\n## Anti-Patterns\n\nDO NOT: Mayor asks Witness if worker X is clean\nDO: Witness reports swarm complete, all workers verified\n\nDO NOT: Mayor kills worker sessions directly\nDO: Mayor tells Witness to abort swarm, Witness handles cleanup\n\nDO NOT: Workers report done to Mayor\nDO: Workers report to Witness, Witness aggregates and reports up","status":"open","priority":1,"issue_type":"task","created_at":"2025-12-15T19:48:56.678724-08:00","updated_at":"2025-12-15T20:48:12.068964-08:00","dependencies":[{"issue_id":"gt-gl2","depends_on_id":"gt-82y","type":"blocks","created_at":"2025-12-15T19:49:05.929877-08:00","created_by":"daemon"}]}
@@ -893,6 +893,8 @@
{"id":"gt-w775","title":"MR: gt-svi.1 (polecat/Furiosa)","description":"branch: polecat/Furiosa\ntarget: main\nsource_issue: gt-svi.1","status":"closed","priority":1,"issue_type":"merge-request","created_at":"2025-12-18T20:21:40.921429-08:00","updated_at":"2025-12-18T20:21:54.163532-08:00","closed_at":"2025-12-18T20:21:54.163532-08:00"}
{"id":"gt-w98d","title":"witness Handoff","description":"attached_molecule: gt-87jz\nattached_at: 2025-12-24T00:23:42Z","status":"pinned","priority":2,"issue_type":"task","created_at":"2025-12-23T16:23:42.292529-08:00","updated_at":"2025-12-23T16:23:42.603527-08:00"}
{"id":"gt-w9o","title":"/restart: Personal slash command for in-place agent restart","description":"Create ~/.claude/commands/restart.md that restarts current Gas Town agent in place.\n\n## Detection\n- Read tmux session name: gt-mayor, gt-witness-*, gt-refinery-*, gt-polecat-*\n- Fallback: check GT_ROLE env var\n\n## Behavior by role\n- mayor: gt mayor restart (sends Ctrl-C, loop respawns)\n- witness: gt witness restart\n- refinery: gt refinery restart \n- polecat: gt polecat restart (or witness-mediated)\n\n## Command format\nUses backticks for inline bash to detect context, then instructs Claude to run appropriate restart.","status":"closed","priority":1,"issue_type":"task","created_at":"2025-12-18T18:32:30.043125-08:00","updated_at":"2025-12-18T18:43:17.182303-08:00","closed_at":"2025-12-18T18:43:17.182303-08:00"}
{"id":"gt-wewf","title":"Test Patrol for Bonding","description":"Parent issue for mol bond CLI test","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-24T00:10:43.346135-08:00","updated_at":"2025-12-24T00:10:43.691283-08:00","closed_at":"2025-12-24T00:10:43.691283-08:00","close_reason":"Closed"}
{"id":"gt-wewf.1","title":"Polecat Arm (arm-toast)","description":"Single polecat inspection and action cycle.\n\nThis molecule is bonded dynamically by mol-witness-patrol's survey-workers step.\nEach polecat being monitored gets one arm that runs in parallel with other arms.\n\n## Variables\n\n| Variable | Required | Description |\n|----------|----------|-------------|\n| polecat_name | Yes | Name of the polecat to inspect |\n| rig | Yes | Rig containing the polecat |\n\n## Step: capture\nCapture recent tmux output for toast.\n\n```bash\ntmux capture-pane -t gt-gastown-toast -p | tail -50\n```\n\nRecord:\n- Last activity timestamp (when was last tool call?)\n- Visible errors or stack traces\n- Completion indicators (\"Done\", \"Finished\", etc.)\n\n## Step: assess\nCategorize polecat state based on captured output.\n\nStates:\n- **working**: Recent tool calls, active processing\n- **idle**: At prompt, no recent activity\n- **error**: Showing errors or stack traces\n- **requesting_shutdown**: Sent LIFECYCLE/Shutdown mail\n- **done**: Showing completion indicators\n\nCalculate: minutes since last activity.\nNeeds: capture\n\n## Step: load-history\nRead nudge history for toast from patrol state.\n\n```\nnudge_count = state.nudges[toast].count\nlast_nudge_time = state.nudges[toast].timestamp\n```\n\nThis data was loaded by the parent patrol's load-state step and passed\nto the arm via the bonding context.\nNeeds: assess\n\n## Step: decide\nApply the nudge matrix to determine action for toast.\n\n| State | Idle Time | Nudge Count | Action |\n|-------|-----------|-------------|--------|\n| working | any | any | none |\n| idle | \u003c10min | any | none |\n| idle | 10-15min | 0 | nudge-1 (gentle) |\n| idle | 15-20min | 1 | nudge-2 (direct) |\n| idle | 20+min | 2 | nudge-3 (final) |\n| idle | any | 3 | escalate |\n| error | any | any | assess-severity |\n| requesting_shutdown | any | any | pre-kill-verify |\n| done | any | any | pre-kill-verify |\n\nNudge text:\n1. \"How's progress? Need any help?\"\n2. \"Please wrap up soon. What's blocking you?\"\n3. \"Final check. Will escalate in 5 min if no response.\"\n\nRecord decision and rationale.\nNeeds: load-history\n\n## Step: execute\nTake the decided action for toast.\n\n**nudge-N**:\n```bash\ntmux send-keys -t gt-gastown-toast \"{{nudge_text}}\" Enter\n```\n\n**pre-kill-verify**:\n```bash\ncd polecats/toast\ngit status # Must be clean\ngit log origin/main..HEAD # Check for unpushed\nbd show \u003cassigned-issue\u003e # Verify closed/deferred\n```\nIf clean: kill session, remove worktree, delete branch\nIf dirty: record failure, retry next cycle\n\n**escalate**:\n```bash\ngt mail send mayor/ -s \"Escalation: toast stuck\" -m \"...\"\n```\n\n**none**: No action needed.\n\nRecord: action taken, result, updated nudge count.\nNeeds: decide\n\n## Output\n\nThe arm completes with:\n- action_taken: none | nudge-1 | nudge-2 | nudge-3 | killed | escalated\n- result: success | failed | pending\n- updated_state: New nudge count and timestamp for toast\n\nThis data feeds back to the parent patrol's aggregate step.\n---\nbonded_from: mol-polecat-arm\nbonded_to: gt-wewf\nbonded_ref: arm-toast\nbonded_at: 2025-12-23T10:00:00Z\n","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-24T00:10:43.433441-08:00","updated_at":"2025-12-24T00:10:43.600601-08:00","closed_at":"2025-12-24T00:10:43.600601-08:00","close_reason":"Closed","dependencies":[{"issue_id":"gt-wewf.1","depends_on_id":"gt-wewf","type":"parent-child","created_at":"2025-12-24T00:10:43.433852-08:00","created_by":"daemon"}]}
{"id":"gt-wpg","title":"Replaceable notifications via Claude Code queue","description":"Leverage Claude Code's ability to replace queued text for notifications that supersede previous ones.\n\n## Problem\n\nIf daemon sends 10 heartbeats while agent is busy, agent returns to see 10 stacked messages. Wasteful and noisy.\n\n## Solution\n\nUse Claude Code's queue replacement for:\n- Heartbeat messages (only latest matters)\n- Status updates that supersede previous\n- Progress notifications\n\n## Implementation\n\nNotifications get a 'slot' identifier. New notification in same slot replaces old one:\n- Slot: 'heartbeat' → only one heartbeat queued at a time\n- Slot: 'status-\u003crig\u003e' → latest status per rig\n- No slot → stacks normally (for unique messages)\n\n## Research Needed\n\n- How does Claude Code expose queue replacement?\n- tmux send-keys behavior with pending input\n- Alternative: clear + resend pattern","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-18T14:19:29.821949-08:00","updated_at":"2025-12-20T13:19:00.398942-08:00","closed_at":"2025-12-20T13:19:00.398942-08:00","dependencies":[{"issue_id":"gt-wpg","depends_on_id":"gt-99m","type":"blocks","created_at":"2025-12-18T14:19:46.656972-08:00","created_by":"daemon"}]}
{"id":"gt-wrw2","title":"Test2","description":"Testing gt mail","status":"open","priority":2,"issue_type":"message","assignee":"gastown-alpha","created_at":"2025-12-20T21:39:05.875792-08:00","updated_at":"2025-12-20T21:39:05.875792-08:00","labels":["thread:thread-1fd9f932cef0"],"sender":"Steve Yegge","wisp":true}
{"id":"gt-wusk","title":"Layered context onboarding pattern","description":"Pattern from handoff discussion:\n\n## Pattern: Layered Context Onboarding\n\nTown CLAUDE.md (user/org) -\u003e Rig CLAUDE.md (project) -\u003e Role priming\n\n## Ultra-compressed HOP for workers (no reveal)\n\n- Permanent record: All work tracked. Outcomes matter.\n- Quality gates: Molecule steps exist for a reason.\n- Attribution: Completions build your track record.\n- Handoff clean: Leave state any worker can continue.\n\n## Recommendation\n\nCreate Town @AGENTS.md for shared worker context that all workers see.\nThis provides common behavioral guidance without revealing full HOP context.","status":"open","priority":2,"issue_type":"task","created_at":"2025-12-20T00:55:11.984103-08:00","updated_at":"2025-12-20T00:55:11.984103-08:00"}
@@ -927,6 +929,8 @@
{"id":"gt-yt6g","title":"Standardize session end: gt handoff for all roles","description":"## Summary\n\nStandardize session ending across all agent roles to use `gt handoff` as the canonical command. This is critical for the beads propulsion cycle - turning agent sessions from pets into cattle.\n\n## Current State (Inconsistent)\n\n| Role | Current Guidance | Command |\n|------|-----------------|---------|\n| Mayor | Manual mail send | `gt mail send mayor/ -s 'HANDOFF:...' -m '...'` |\n| Crew | Manual mail send | `gt mail send \u003crig\u003e/crew/\u003cname\u003e -s 'HANDOFF:...' -m '...'` |\n| Witness | Manual mail send | `gt mail send \u003crig\u003e/witness -s 'HANDOFF:...' -m '...'` |\n| Refinery | Manual mail send | `gt mail send \u003crig\u003e/refinery -s 'HANDOFF:...' -m '...'` |\n| Deacon | Exit on high context | (implicit) |\n| Polecat | `gt done` | `gt done [--exit TYPE]` |\n\n## Target State (Unified)\n\nAll roles use `gt handoff`:\n- `gt handoff` - Hand off current session to fresh instance\n- `gt handoff -s 'context' -m 'details'` - Hand off with custom message\n- For polecats: `gt handoff` internally calls `gt done`\n\n## Changes Required\n\n### 1. Code Changes\n- [ ] Update `gt handoff` to detect polecat role and call `gt done` internally\n- [ ] Consider adding `--exit` flag to `gt handoff` for polecat compatibility\n\n### 2. CLAUDE.md Updates (gastown)\n- [ ] ~/gt/CLAUDE.md (Mayor)\n- [ ] gastown/crew/max/CLAUDE.md\n- [ ] gastown/crew/joe/CLAUDE.md\n- [ ] gastown/witness/CLAUDE.md\n- [ ] gastown/refinery/CLAUDE.md (and rig/)\n- [ ] deacon/CLAUDE.md\n\n### 3. CLAUDE.md Updates (beads)\n- [ ] beads/mayor/rig/CLAUDE.md\n- [ ] beads/crew/dave/CLAUDE.md\n- [ ] beads/crew/zoey/CLAUDE.md\n- [ ] beads/witness/CLAUDE.md\n- [ ] beads/refinery/CLAUDE.md (and rig/)\n\n### 4. Architecture Docs\n- [ ] docs/patrol-system-design.md\n- [ ] gastown/mayor/rig/docs/prompts.md\n- [ ] gastown/mayor/rig/docs/session-management.md\n\n## New Session End Checklist (Universal)\n\n```\n# SESSION CLOSE PROTOCOL\n\n[ ] 1. git status (check uncommitted changes)\n[ ] 2. git add \u003cfiles\u003e (stage changes)\n[ ] 3. git commit -m '...' (commit with issue ID)\n[ ] 4. bd sync (sync beads)\n[ ] 5. git push (push to remote - CRITICAL)\n[ ] 6. gt handoff (hand off to fresh session)\n OR gt handoff -s 'Context' -m 'Details for next session'\n```\n\n## Why This Matters\n\nThe handoff mechanism is what turns agent sessions from **pets** (precious, long-lived) into **cattle** (disposable, replaceable). At any time, any agent can:\n1. Send itself a detailed handoff mail (or sling itself a mol)\n2. System shuts them down and restarts them\n3. Fresh session runs priming and reads mail\n4. Work continues seamlessly\n\nThis enables:\n- Unlimited context through automatic cycling\n- Clean recovery from any state\n- Consistent behavior across all roles\n- Simplified agent instructions","status":"closed","priority":1,"issue_type":"task","created_at":"2025-12-23T12:57:25.246279-08:00","updated_at":"2025-12-23T13:05:12.92201-08:00","closed_at":"2025-12-23T13:05:12.92201-08:00","close_reason":"Standardized session end guidance across all roles to use gt handoff as canonical method"}
{"id":"gt-yx4","title":"Town root .beads has schema mismatch with bd","description":"The .beads directory at town root (/Users/stevey/gt/.beads) has an incompatible schema:\n\n```\nError: failed to open database: failed to initialize schema: sqlite3: SQL logic error: no such column: thread_id\n```\n\nMeanwhile, gastown/.beads (symlinked to mayor/rig/.beads) works fine.\n\n## Impact\n\n- gt mail inbox fails at town root\n- gt handoff sends mail to broken db\n- Daemon can't check its inbox\n\n## Options\n\n1. Delete town root .beads/beads.db and let it recreate\n2. Symlink town root .beads to gastown/.beads\n3. Run schema migration on existing db\n\n## Root Cause\n\nLikely a beads version upgrade that added thread_id column, but the town root db was created before that.","status":"closed","priority":1,"issue_type":"bug","created_at":"2025-12-18T14:31:35.559042-08:00","updated_at":"2025-12-19T00:39:32.211083-08:00","closed_at":"2025-12-19T00:39:32.211083-08:00"}
{"id":"gt-yzms","title":"Merge polecat/rictus: Add molecule phase lifecycle diagram","description":"Branch: polecat/rictus\n\nAdds molecule phase lifecycle diagram to architecture.md showing Proto → Mol/Wisp → Digest state transitions with the 'states of matter' metaphor. Also documents when to use Mol (durable) vs Wisp (ephemeral).\n\nCloses: gt-c6zs","status":"closed","priority":1,"issue_type":"merge-request","created_at":"2025-12-21T16:41:58.139439-08:00","updated_at":"2025-12-21T17:20:27.50075-08:00","closed_at":"2025-12-21T17:20:27.50075-08:00","close_reason":"ORPHANED: Branch never pushed, worktree deleted"}
{"id":"gt-yzx4","title":"Test Patrol Parent","description":"Test parent for Christmas Ornament pattern","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-24T00:10:42.806241-08:00","updated_at":"2025-12-24T00:10:43.15213-08:00","closed_at":"2025-12-24T00:10:43.15213-08:00","close_reason":"Closed"}
{"id":"gt-yzx4.1","title":"Test Polecat Arm","description":"Test child for bonding pattern","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-24T00:10:42.904273-08:00","updated_at":"2025-12-24T00:10:43.017692-08:00","closed_at":"2025-12-24T00:10:43.017692-08:00","close_reason":"Closed","dependencies":[{"issue_id":"gt-yzx4.1","depends_on_id":"gt-yzx4","type":"parent-child","created_at":"2025-12-24T00:10:42.904708-08:00","created_by":"daemon"}]}
{"id":"gt-z3qf","title":"Overhaul gt mol to match bd mol chemistry interface","description":"## The Sling: Unified Work Dispatch\n\nThis issue tracks the overhaul of `gt molecule` to align with chemistry metaphor and introduce the **Universal Gas Town Propulsion Principle**.\n\n### The Propulsion Principle\n\n\u003e **If you find something on your hook, YOU RUN IT.**\n\nThis is the one rule that drives all Gas Town agents.\n\n### The Sling Operation\n\n`gt sling \u003cthing\u003e \u003ctarget\u003e [options]` - unified command for spawn + assign + pin.\n\nSee: `gastown/mayor/rig/docs/sling-design.md`\n\n### Implementation Tasks\n\n| Issue | Title | Priority |\n|-------|-------|----------|\n| gt-4ev4 | Implement gt sling command | P1 |\n| gt-uym5 | Implement gt mol status command | P1 |\n| gt-i4kq | Update templates for Propulsion Principle | P1 |\n| gt-7hor | Document the Propulsion Principle | P2 |\n\n### Command Changes\n\n| Old | New |\n|-----|-----|\n| `gt molecule instantiate` | `gt sling` |\n| `gt molecule attach` | `gt sling --force` |\n| `gt molecule detach` | `gt mol burn` |\n| `gt molecule progress` | `gt mol status` |\n| `gt molecule list` | `gt mol catalog` |\n| `gt spawn --molecule` | `gt sling` |\n\n### Acceptance Criteria\n\n- [ ] `gt sling` works for protos, issues, and epics\n- [ ] `gt mol status` shows hook state\n- [ ] Templates updated for propulsion principle\n- [ ] Old commands deprecated with warnings\n- [ ] Documentation complete","status":"closed","priority":1,"issue_type":"feature","created_at":"2025-12-22T03:02:38.049324-08:00","updated_at":"2025-12-22T14:37:37.562677-08:00","closed_at":"2025-12-22T14:37:37.562677-08:00","close_reason":"Core sling work complete (gt-4ev4, gt-uym5, gt-7hor closed). gt-i4kq (template updates) remains open but is independent polish work.","dependencies":[{"issue_id":"gt-z3qf","depends_on_id":"gt-4ev4","type":"blocks","created_at":"2025-12-22T12:10:42.394653-08:00","created_by":"daemon"},{"issue_id":"gt-z3qf","depends_on_id":"gt-uym5","type":"blocks","created_at":"2025-12-22T12:10:42.46834-08:00","created_by":"daemon"},{"issue_id":"gt-z3qf","depends_on_id":"gt-i4kq","type":"blocks","created_at":"2025-12-22T12:10:42.541384-08:00","created_by":"daemon"},{"issue_id":"gt-z3qf","depends_on_id":"gt-7hor","type":"blocks","created_at":"2025-12-22T12:10:42.613099-08:00","created_by":"daemon"}]}
{"id":"gt-z4g","title":"Plugin: Plan-to-Epic converter","description":"## Purpose\n\nHelp users create beads epics from various planning inputs.\n\n## Inputs\n- Markdown task lists\n- GitHub issues\n- Linear/Jira exports\n- Free-form descriptions\n- Existing beads epics\n\n## Output\n- Beads epic with properly structured children\n- Dependencies set for wave ordering\n- Priorities assigned\n- Ready for `gt spawn --epic \u003cid\u003e`\n\n## Implementation Options\n\n### Option A: CLI Tool\n```bash\ngt plan import --from github --repo owner/repo --label batch-candidate\ngt plan import --from markdown tasks.md\ngt plan structure \u003cepic-id\u003e # analyze and add dependencies\n```\n\n### Option B: Plugin Agent\nA plugin at `\u003crig\u003e/plugins/plan-oracle/` that:\n- Receives planning requests via mail\n- Analyzes scope and requirements\n- Creates structured beads epic\n- Sets dependencies based on analysis\n\n### Option C: Interactive Mode\n```bash\ngt plan create\n# Walks through questions, creates epic interactively\n```\n\n## Axiom\n\nAs stated: 'The Planning phase should end in the creation of a workable Beads plan.'\n\nThis plugin bridges the gap between human planning and machine-executable work.\n\n## Priority\n\nP2 - Nice to have for MVP. Manual epic creation works for now.\n\n## Note\n\nNo \"swarm IDs\" - output is just a beads epic with children. Workers process it independently.","status":"open","priority":2,"issue_type":"task","created_at":"2025-12-16T02:10:20.663549-08:00","updated_at":"2025-12-16T17:26:41.087304-08:00"}
{"id":"gt-z94m","title":"load-state","description":"Read handoff bead and get nudge counts.\n\nNeeds: check-refinery","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-23T01:41:54.505607-08:00","updated_at":"2025-12-23T04:39:39.698069-08:00","closed_at":"2025-12-23T04:39:39.698069-08:00","close_reason":"Parent gt-751s superseded by Christmas Ornament pattern","dependencies":[{"issue_id":"gt-z94m","depends_on_id":"gt-751s","type":"parent-child","created_at":"2025-12-23T01:41:54.542384-08:00","created_by":"stevey"}],"wisp":true}


@@ -3,3 +3,29 @@
See **CLAUDE.md** for complete agent context and instructions.
This file exists for compatibility with tools that look for AGENTS.md.
## Landing the Plane (Session Completion)
**When ending a work session**, you MUST complete ALL steps below. Work is NOT complete until `git push` succeeds.
**MANDATORY WORKFLOW:**
1. **File issues for remaining work** - Create issues for anything that needs follow-up
2. **Run quality gates** (if code changed) - Tests, linters, builds
3. **Update issue status** - Close finished work, update in-progress items
4. **PUSH TO REMOTE** - This is MANDATORY:
```bash
git pull --rebase
bd sync
git push
git status # MUST show "up to date with origin"
```
5. **Clean up** - Clear stashes, prune remote branches
6. **Verify** - All changes committed AND pushed
7. **Hand off** - Provide context for next session
**CRITICAL RULES:**
- Work is NOT complete until `git push` succeeds
- NEVER stop before pushing - that leaves work stranded locally
- NEVER say "ready to push when you are" - YOU must push
- If push fails, resolve and retry until it succeeds
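The push sequence above (steps 4-6) can be sketched as a shell helper. This is a hypothetical `land_the_plane` function, not part of the gt toolchain; it assumes `bd` is on PATH, and the `DRY_RUN` switch is added here purely for illustration:
```bash
# Hypothetical helper sketching the mandatory close-out sequence.
# DRY_RUN=1 prints each command instead of executing it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

land_the_plane() {
  run git pull --rebase &&
  run bd sync &&
  run git push &&
  run git status
}
```
Because the steps are chained with `&&`, a failed `git push` stops the sequence, matching the rule that work is not complete until the push succeeds.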


@@ -85,8 +85,8 @@ func runDoctor(cmd *cobra.Command, args []string) error {
d.Register(doctor.NewPatrolPluginsAccessibleCheck())
d.Register(doctor.NewPatrolRolesHavePromptsCheck())
// Attachment checks
d.Register(doctor.NewStaleAttachmentsCheck())
// NOTE: StaleAttachmentsCheck removed - staleness detection belongs in Deacon molecule
// See gt-gaxo epic for ZFC cleanup rationale
// Config architecture checks
d.Register(doctor.NewSettingsCheck())


@@ -1,190 +0,0 @@
package daemon
import (
"time"
)
// BackoffStrategy defines how intervals grow.
type BackoffStrategy string
const (
// StrategyFixed keeps the same interval (no backoff).
StrategyFixed BackoffStrategy = "fixed"
// StrategyGeometric multiplies by a factor each miss (1.5x).
StrategyGeometric BackoffStrategy = "geometric"
// StrategyExponential doubles interval each miss (2x).
StrategyExponential BackoffStrategy = "exponential"
)
// BackoffConfig holds backoff configuration.
type BackoffConfig struct {
// Strategy determines how intervals grow.
Strategy BackoffStrategy
// BaseInterval is the starting interval (default 5m, per DefaultBackoffConfig).
BaseInterval time.Duration
// MaxInterval is the cap on how large intervals can grow (default 30m).
MaxInterval time.Duration
// Factor is the multiplier for geometric backoff (default 1.5).
Factor float64
}
// DefaultBackoffConfig returns sensible defaults.
// Base interval is 5 minutes since deacon rounds may take a while
// (health checks, plugins, syncing clones, complex remediation).
// Max interval is 30 minutes - beyond that, something is likely wrong.
func DefaultBackoffConfig() *BackoffConfig {
return &BackoffConfig{
Strategy: StrategyGeometric,
BaseInterval: 5 * time.Minute,
MaxInterval: 30 * time.Minute,
Factor: 1.5,
}
}
// AgentBackoff tracks backoff state for a single agent.
type AgentBackoff struct {
// AgentID identifies the agent (e.g., "mayor", "gastown-witness").
AgentID string
// BaseInterval is the starting interval.
BaseInterval time.Duration
// CurrentInterval is the current (possibly backed-off) interval.
CurrentInterval time.Duration
// MaxInterval caps how large intervals can grow.
MaxInterval time.Duration
// ConsecutiveMiss counts pokes with no response.
ConsecutiveMiss int
// LastPoke is when we last poked this agent.
LastPoke time.Time
// LastActivity is when the agent last showed activity.
LastActivity time.Time
}
// NewAgentBackoff creates backoff state for an agent.
func NewAgentBackoff(agentID string, config *BackoffConfig) *AgentBackoff {
if config == nil {
config = DefaultBackoffConfig()
}
return &AgentBackoff{
AgentID: agentID,
BaseInterval: config.BaseInterval,
CurrentInterval: config.BaseInterval,
MaxInterval: config.MaxInterval,
}
}
// ShouldPoke returns true if enough time has passed since the last poke.
func (ab *AgentBackoff) ShouldPoke() bool {
if ab.LastPoke.IsZero() {
return true // Never poked
}
return time.Since(ab.LastPoke) >= ab.CurrentInterval
}
// RecordPoke records that we poked the agent.
func (ab *AgentBackoff) RecordPoke() {
ab.LastPoke = time.Now()
}
// RecordMiss records that the agent didn't respond since last poke.
// This increases the backoff interval.
func (ab *AgentBackoff) RecordMiss(config *BackoffConfig) {
ab.ConsecutiveMiss++
if config == nil {
config = DefaultBackoffConfig()
}
switch config.Strategy {
case StrategyFixed:
// No change
case StrategyGeometric:
ab.CurrentInterval = time.Duration(float64(ab.CurrentInterval) * config.Factor)
case StrategyExponential:
ab.CurrentInterval = ab.CurrentInterval * 2
}
// Cap at max interval
if ab.CurrentInterval > ab.MaxInterval {
ab.CurrentInterval = ab.MaxInterval
}
}
// RecordActivity records that the agent showed activity.
// This resets the backoff to the base interval.
func (ab *AgentBackoff) RecordActivity() {
ab.ConsecutiveMiss = 0
ab.CurrentInterval = ab.BaseInterval
ab.LastActivity = time.Now()
}
// BackoffManager tracks backoff state for all agents.
type BackoffManager struct {
config *BackoffConfig
agents map[string]*AgentBackoff
}
// NewBackoffManager creates a new backoff manager.
func NewBackoffManager(config *BackoffConfig) *BackoffManager {
if config == nil {
config = DefaultBackoffConfig()
}
return &BackoffManager{
config: config,
agents: make(map[string]*AgentBackoff),
}
}
// GetOrCreate returns backoff state for an agent, creating if needed.
func (bm *BackoffManager) GetOrCreate(agentID string) *AgentBackoff {
if ab, ok := bm.agents[agentID]; ok {
return ab
}
ab := NewAgentBackoff(agentID, bm.config)
bm.agents[agentID] = ab
return ab
}
// ShouldPoke returns true if we should poke the given agent.
func (bm *BackoffManager) ShouldPoke(agentID string) bool {
return bm.GetOrCreate(agentID).ShouldPoke()
}
// RecordPoke records that we poked an agent.
func (bm *BackoffManager) RecordPoke(agentID string) {
bm.GetOrCreate(agentID).RecordPoke()
}
// RecordMiss records that an agent didn't respond.
func (bm *BackoffManager) RecordMiss(agentID string) {
bm.GetOrCreate(agentID).RecordMiss(bm.config)
}
// RecordActivity records that an agent showed activity.
func (bm *BackoffManager) RecordActivity(agentID string) {
bm.GetOrCreate(agentID).RecordActivity()
}
// GetInterval returns the current interval for an agent.
func (bm *BackoffManager) GetInterval(agentID string) time.Duration {
return bm.GetOrCreate(agentID).CurrentInterval
}
// Stats returns a map of agent ID to current interval for logging.
func (bm *BackoffManager) Stats() map[string]time.Duration {
stats := make(map[string]time.Duration, len(bm.agents))
for id, ab := range bm.agents {
stats[id] = ab.CurrentInterval
}
return stats
}

View File

@@ -1,290 +0,0 @@
package daemon
import (
"testing"
"time"
)
func TestDefaultBackoffConfig(t *testing.T) {
config := DefaultBackoffConfig()
if config.Strategy != StrategyGeometric {
t.Errorf("expected strategy Geometric, got %v", config.Strategy)
}
if config.BaseInterval != 5*time.Minute {
t.Errorf("expected base interval 5m, got %v", config.BaseInterval)
}
if config.MaxInterval != 30*time.Minute {
t.Errorf("expected max interval 30m, got %v", config.MaxInterval)
}
if config.Factor != 1.5 {
t.Errorf("expected factor 1.5, got %v", config.Factor)
}
}
func TestNewAgentBackoff(t *testing.T) {
config := DefaultBackoffConfig()
ab := NewAgentBackoff("test-agent", config)
if ab.AgentID != "test-agent" {
t.Errorf("expected agent ID 'test-agent', got %s", ab.AgentID)
}
if ab.BaseInterval != 5*time.Minute {
t.Errorf("expected base interval 5m, got %v", ab.BaseInterval)
}
if ab.CurrentInterval != 5*time.Minute {
t.Errorf("expected current interval 5m, got %v", ab.CurrentInterval)
}
if ab.ConsecutiveMiss != 0 {
t.Errorf("expected consecutive miss 0, got %d", ab.ConsecutiveMiss)
}
}
func TestAgentBackoff_ShouldPoke(t *testing.T) {
config := &BackoffConfig{
Strategy: StrategyGeometric,
BaseInterval: 100 * time.Millisecond, // Short for testing
MaxInterval: 1 * time.Second,
Factor: 1.5,
}
ab := NewAgentBackoff("test", config)
// Should poke immediately (never poked)
if !ab.ShouldPoke() {
t.Error("expected ShouldPoke=true for new agent")
}
// Record a poke
ab.RecordPoke()
// Should not poke immediately after
if ab.ShouldPoke() {
t.Error("expected ShouldPoke=false immediately after poke")
}
// Wait for interval
time.Sleep(110 * time.Millisecond)
// Now should poke again
if !ab.ShouldPoke() {
t.Error("expected ShouldPoke=true after interval elapsed")
}
}
func TestAgentBackoff_GeometricBackoff(t *testing.T) {
config := &BackoffConfig{
Strategy: StrategyGeometric,
BaseInterval: 100 * time.Millisecond,
MaxInterval: 1 * time.Second,
Factor: 1.5,
}
ab := NewAgentBackoff("test", config)
// Initial interval
if ab.CurrentInterval != 100*time.Millisecond {
t.Errorf("expected initial interval 100ms, got %v", ab.CurrentInterval)
}
// First miss: 100ms * 1.5 = 150ms
ab.RecordMiss(config)
if ab.CurrentInterval != 150*time.Millisecond {
t.Errorf("expected interval 150ms after 1 miss, got %v", ab.CurrentInterval)
}
if ab.ConsecutiveMiss != 1 {
t.Errorf("expected consecutive miss 1, got %d", ab.ConsecutiveMiss)
}
// Second miss: 150ms * 1.5 = 225ms
ab.RecordMiss(config)
if ab.CurrentInterval != 225*time.Millisecond {
t.Errorf("expected interval 225ms after 2 misses, got %v", ab.CurrentInterval)
}
// Third miss: 225ms * 1.5 = 337.5ms
ab.RecordMiss(config)
expected := time.Duration(337500000) // 337.5ms in nanoseconds
if ab.CurrentInterval != expected {
t.Errorf("expected interval ~337.5ms after 3 misses, got %v", ab.CurrentInterval)
}
}
func TestAgentBackoff_ExponentialBackoff(t *testing.T) {
config := &BackoffConfig{
Strategy: StrategyExponential,
BaseInterval: 100 * time.Millisecond,
MaxInterval: 1 * time.Second,
Factor: 2.0, // Ignored for exponential
}
ab := NewAgentBackoff("test", config)
// First miss: 100ms * 2 = 200ms
ab.RecordMiss(config)
if ab.CurrentInterval != 200*time.Millisecond {
t.Errorf("expected interval 200ms after 1 miss, got %v", ab.CurrentInterval)
}
// Second miss: 200ms * 2 = 400ms
ab.RecordMiss(config)
if ab.CurrentInterval != 400*time.Millisecond {
t.Errorf("expected interval 400ms after 2 misses, got %v", ab.CurrentInterval)
}
// Third miss: 400ms * 2 = 800ms
ab.RecordMiss(config)
if ab.CurrentInterval != 800*time.Millisecond {
t.Errorf("expected interval 800ms after 3 misses, got %v", ab.CurrentInterval)
}
}
func TestAgentBackoff_FixedStrategy(t *testing.T) {
config := &BackoffConfig{
Strategy: StrategyFixed,
BaseInterval: 100 * time.Millisecond,
MaxInterval: 1 * time.Second,
Factor: 1.5,
}
ab := NewAgentBackoff("test", config)
// Multiple misses should not change interval
ab.RecordMiss(config)
ab.RecordMiss(config)
ab.RecordMiss(config)
if ab.CurrentInterval != 100*time.Millisecond {
t.Errorf("expected interval to stay at 100ms with fixed strategy, got %v", ab.CurrentInterval)
}
if ab.ConsecutiveMiss != 3 {
t.Errorf("expected consecutive miss 3, got %d", ab.ConsecutiveMiss)
}
}
func TestAgentBackoff_MaxInterval(t *testing.T) {
config := &BackoffConfig{
Strategy: StrategyExponential,
BaseInterval: 100 * time.Millisecond,
MaxInterval: 500 * time.Millisecond,
Factor: 2.0,
}
ab := NewAgentBackoff("test", config)
// Keep missing until we hit the cap
for i := 0; i < 10; i++ {
ab.RecordMiss(config)
}
if ab.CurrentInterval != 500*time.Millisecond {
t.Errorf("expected interval capped at 500ms, got %v", ab.CurrentInterval)
}
}
func TestAgentBackoff_RecordActivity(t *testing.T) {
config := &BackoffConfig{
Strategy: StrategyGeometric,
BaseInterval: 100 * time.Millisecond,
MaxInterval: 1 * time.Second,
Factor: 1.5,
}
ab := NewAgentBackoff("test", config)
// Build up some backoff
ab.RecordMiss(config)
ab.RecordMiss(config)
ab.RecordMiss(config)
if ab.CurrentInterval == 100*time.Millisecond {
t.Error("expected interval to have increased")
}
if ab.ConsecutiveMiss != 3 {
t.Errorf("expected consecutive miss 3, got %d", ab.ConsecutiveMiss)
}
// Record activity - should reset
ab.RecordActivity()
if ab.CurrentInterval != 100*time.Millisecond {
t.Errorf("expected interval reset to 100ms, got %v", ab.CurrentInterval)
}
if ab.ConsecutiveMiss != 0 {
t.Errorf("expected consecutive miss reset to 0, got %d", ab.ConsecutiveMiss)
}
if ab.LastActivity.IsZero() {
t.Error("expected LastActivity to be set")
}
}
func TestBackoffManager_GetOrCreate(t *testing.T) {
bm := NewBackoffManager(DefaultBackoffConfig())
// First call creates
ab1 := bm.GetOrCreate("agent1")
if ab1 == nil {
t.Fatal("expected agent backoff to be created")
}
if ab1.AgentID != "agent1" {
t.Errorf("expected agent ID 'agent1', got %s", ab1.AgentID)
}
// Second call returns same instance
ab2 := bm.GetOrCreate("agent1")
if ab1 != ab2 {
t.Error("expected same instance on second call")
}
// Different agent creates new instance
ab3 := bm.GetOrCreate("agent2")
if ab1 == ab3 {
t.Error("expected different instance for different agent")
}
}
func TestBackoffManager_Stats(t *testing.T) {
config := &BackoffConfig{
Strategy: StrategyGeometric,
BaseInterval: 100 * time.Millisecond,
MaxInterval: 1 * time.Second,
Factor: 1.5,
}
bm := NewBackoffManager(config)
// Create some agents with different backoff states
bm.RecordPoke("agent1")
bm.RecordMiss("agent1")
bm.RecordPoke("agent2")
bm.RecordMiss("agent2")
bm.RecordMiss("agent2")
stats := bm.Stats()
if len(stats) != 2 {
t.Errorf("expected 2 agents in stats, got %d", len(stats))
}
// agent1: 100ms * 1.5 = 150ms
if stats["agent1"] != 150*time.Millisecond {
t.Errorf("expected agent1 interval 150ms, got %v", stats["agent1"])
}
// agent2: 100ms * 1.5 * 1.5 = 225ms
if stats["agent2"] != 225*time.Millisecond {
t.Errorf("expected agent2 interval 225ms, got %v", stats["agent2"])
}
}
func TestExtractRigName(t *testing.T) {
tests := []struct {
session string
expected string
}{
{"gt-gastown-witness", "gastown"},
{"gt-myrig-witness", "myrig"},
{"gt-my-rig-name-witness", "my-rig-name"},
}
for _, tc := range tests {
result := extractRigName(tc.session)
if result != tc.expected {
t.Errorf("extractRigName(%q) = %q, expected %q", tc.session, result, tc.expected)
}
}
}

View File

@@ -2,34 +2,27 @@ package daemon
import (
"context"
"encoding/json"
"fmt"
"log"
"os"
"os/signal"
"path/filepath"
"strconv"
"strings"
"syscall"
"time"
"github.com/steveyegge/gastown/internal/config"
"github.com/steveyegge/gastown/internal/constants"
"github.com/steveyegge/gastown/internal/git"
"github.com/steveyegge/gastown/internal/keepalive"
"github.com/steveyegge/gastown/internal/rig"
"github.com/steveyegge/gastown/internal/tmux"
)
// Daemon is the town-level background service.
// Its only job is to ensure Deacon is running and send periodic heartbeats.
// All health checking, nudging, and decision-making belongs in the Deacon molecule.
type Daemon struct {
config *Config
tmux *tmux.Tmux
logger *log.Logger
ctx context.Context
cancel context.CancelFunc
backoff *BackoffManager
notifications *NotificationManager
lastMOTDIndex int // tracks last MOTD to avoid consecutive repeats
}
@@ -50,18 +43,12 @@ func New(config *Config) (*Daemon, error) {
logger := log.New(logFile, "", log.LstdFlags)
ctx, cancel := context.WithCancel(context.Background())
// Initialize notification manager for slot-based deduplication
notifDir := filepath.Join(daemonDir, "notifications")
notifMaxAge := 5 * time.Minute // Notifications expire after 5 minutes
return &Daemon{
config: config,
tmux: tmux.NewTmux(),
logger: logger,
ctx: ctx,
cancel: cancel,
backoff: NewBackoffManager(DefaultBackoffConfig()),
notifications: NewNotificationManager(notifDir, notifMaxAge),
config: config,
tmux: tmux.NewTmux(),
logger: logger,
ctx: ctx,
cancel: cancel,
}, nil
}
@@ -121,17 +108,15 @@ func (d *Daemon) Run() error {
}
// heartbeat performs one heartbeat cycle.
// The daemon's job is minimal: ensure Deacon is running and send heartbeats.
// All health checking and decision-making belongs in the Deacon molecule.
func (d *Daemon) heartbeat(state *State) {
d.logger.Println("Heartbeat starting")
// 0. Clean up stale notification slots periodically
_ = d.notifications.ClearStaleSlots()
// 1. Ensure Deacon is running (the Deacon is the heartbeat of the system)
// 1. Ensure Deacon is running (process management)
d.ensureDeaconRunning()
// 2. Poke Deacon - the Deacon monitors Mayor and Witnesses
// Note: Deacon self-spawns wisps for patrol cycles (no daemon attachment needed)
// 2. Send heartbeat to Deacon (simple notification, no decision-making)
d.pokeDeacon()
// 3. Process lifecycle requests
@@ -243,10 +228,9 @@ func (d *Daemon) ensureDeaconRunning() {
}
// pokeDeacon sends a heartbeat message to the Deacon session.
// The Deacon is responsible for monitoring Mayor and Witnesses.
// Simple notification - no staleness checking or backoff logic.
// The Deacon molecule decides what to do with heartbeats.
func (d *Daemon) pokeDeacon() {
const agentID = "deacon"
running, err := d.tmux.HasSession(DeaconSessionName)
if err != nil {
d.logger.Printf("Error checking Deacon session: %v", err)
@@ -258,49 +242,6 @@ func (d *Daemon) pokeDeacon() {
return
}
// Check deacon heartbeat to see if it's active
deaconHeartbeatFile := filepath.Join(d.config.TownRoot, "deacon", "heartbeat.json")
var isFresh, isStale, isVeryStale bool
data, err := os.ReadFile(deaconHeartbeatFile)
if err == nil {
var hb struct {
Timestamp time.Time `json:"timestamp"`
}
if json.Unmarshal(data, &hb) == nil {
age := time.Since(hb.Timestamp)
isFresh = age < 2*time.Minute
isStale = age >= 2*time.Minute && age < 5*time.Minute
isVeryStale = age >= 5*time.Minute
} else {
isVeryStale = true
}
} else {
isVeryStale = true // No heartbeat file
}
if isFresh {
// Deacon is actively working, reset backoff and mark notifications consumed
d.backoff.RecordActivity(agentID)
_ = d.notifications.MarkConsumed(DeaconSessionName, SlotHeartbeat)
d.logger.Println("Deacon is fresh, skipping poke")
return
}
// Check if we should poke based on backoff interval
if !d.backoff.ShouldPoke(agentID) {
interval := d.backoff.GetInterval(agentID)
d.logger.Printf("Deacon backoff in effect (interval: %v), skipping poke", interval)
return
}
// Check if we should send (slot-based deduplication)
shouldSend, _ := d.notifications.ShouldSend(DeaconSessionName, SlotHeartbeat)
if !shouldSend {
d.logger.Println("Heartbeat already pending for Deacon, skipping")
return
}
// Send heartbeat message with rotating MOTD
motd := d.nextMOTD()
msg := fmt.Sprintf("HEARTBEAT: %s", motd)
@@ -309,253 +250,12 @@ func (d *Daemon) pokeDeacon() {
return
}
// Record the send for slot deduplication
_ = d.notifications.RecordSend(DeaconSessionName, SlotHeartbeat, msg)
d.backoff.RecordPoke(agentID)
// Adjust backoff based on staleness
if isVeryStale {
d.backoff.RecordMiss(agentID)
interval := d.backoff.GetInterval(agentID)
d.logger.Printf("Poked Deacon (very stale, backoff now: %v)", interval)
} else if isStale {
d.logger.Println("Poked Deacon (stale)")
} else {
d.logger.Println("Poked Deacon")
}
d.logger.Println("Poked Deacon")
}
// pokeMayor sends a heartbeat to the Mayor session.
func (d *Daemon) pokeMayor() {
mayorSession := constants.SessionMayor
agentID := constants.RoleMayor
running, err := d.tmux.HasSession(mayorSession)
if err != nil {
d.logger.Printf("Error checking Mayor session: %v", err)
return
}
if !running {
d.logger.Println("Mayor session not running, skipping poke")
return
}
// Check keepalive to see if agent is active
state := keepalive.Read(d.config.TownRoot)
if state != nil && state.IsFresh() {
// Agent is actively working, reset backoff and mark notifications consumed
d.backoff.RecordActivity(agentID)
_ = d.notifications.MarkConsumed(mayorSession, SlotHeartbeat)
d.logger.Printf("Mayor is fresh (cmd: %s), skipping poke", state.LastCommand)
return
}
// Check if we should poke based on backoff interval
if !d.backoff.ShouldPoke(agentID) {
interval := d.backoff.GetInterval(agentID)
d.logger.Printf("Mayor backoff in effect (interval: %v), skipping poke", interval)
return
}
// Check if we should send (slot-based deduplication)
shouldSend, _ := d.notifications.ShouldSend(mayorSession, SlotHeartbeat)
if !shouldSend {
d.logger.Println("Heartbeat already pending for Mayor, skipping")
return
}
// Send heartbeat message via tmux, replacing any pending input
msg := "HEARTBEAT: check your rigs"
if err := d.tmux.SendKeysReplace(mayorSession, msg, 50); err != nil {
d.logger.Printf("Error poking Mayor: %v", err)
return
}
// Record the send for slot deduplication
_ = d.notifications.RecordSend(mayorSession, SlotHeartbeat, msg)
d.backoff.RecordPoke(agentID)
// If agent is stale or very stale, record a miss (increase backoff)
if state == nil || state.IsVeryStale() {
d.backoff.RecordMiss(agentID)
interval := d.backoff.GetInterval(agentID)
d.logger.Printf("Poked Mayor (very stale, backoff now: %v)", interval)
} else if state.IsStale() {
// Stale but not very stale - don't increase backoff, but don't reset either
d.logger.Println("Poked Mayor (stale)")
} else {
d.logger.Println("Poked Mayor")
}
}
// pokeWitnesses sends heartbeats to all Witness sessions.
// Uses proper rig discovery from rigs.json instead of scanning tmux sessions.
func (d *Daemon) pokeWitnesses() {
// Discover rigs from configuration
rigs := d.discoverRigs()
if len(rigs) == 0 {
d.logger.Println("No rigs discovered")
return
}
for _, r := range rigs {
session := fmt.Sprintf("gt-%s-witness", r.Name)
// Check if witness session exists
running, err := d.tmux.HasSession(session)
if err != nil {
d.logger.Printf("Error checking witness session for rig %s: %v", r.Name, err)
continue
}
if !running {
// Rig exists but no witness session - log for visibility
d.logger.Printf("Rig %s has no witness session (may need: gt witness start %s)", r.Name, r.Name)
continue
}
d.pokeWitness(session)
}
}
// discoverRigs finds all registered rigs using the rig manager.
// Falls back to directory scanning if rigs.json is not available.
func (d *Daemon) discoverRigs() []*rig.Rig {
// Load rigs config from mayor/rigs.json
rigsConfigPath := constants.MayorRigsPath(d.config.TownRoot)
rigsConfig, err := config.LoadRigsConfig(rigsConfigPath)
if err != nil {
// Try fallback: scan town directory for rig directories
return d.discoverRigsFromDirectory()
}
// Use rig manager for proper discovery
g := git.NewGit(d.config.TownRoot)
mgr := rig.NewManager(d.config.TownRoot, rigsConfig, g)
rigs, err := mgr.DiscoverRigs()
if err != nil {
d.logger.Printf("Error discovering rigs from config: %v", err)
return d.discoverRigsFromDirectory()
}
return rigs
}
// discoverRigsFromDirectory scans the town directory for rig directories.
// A directory is considered a rig if it has a .beads subdirectory or config.json.
func (d *Daemon) discoverRigsFromDirectory() []*rig.Rig {
entries, err := os.ReadDir(d.config.TownRoot)
if err != nil {
d.logger.Printf("Error reading town directory: %v", err)
return nil
}
var rigs []*rig.Rig
for _, entry := range entries {
if !entry.IsDir() {
continue
}
name := entry.Name()
// Skip known non-rig directories
if name == "mayor" || name == "daemon" || name == ".git" || name[0] == '.' {
continue
}
dirPath := filepath.Join(d.config.TownRoot, name)
// Check for .beads directory (indicates a rig)
beadsPath := filepath.Join(dirPath, ".beads")
if _, err := os.Stat(beadsPath); err == nil {
rigs = append(rigs, &rig.Rig{Name: name, Path: dirPath})
continue
}
// Check for config.json with type: rig
configPath := filepath.Join(dirPath, "config.json")
if _, err := os.Stat(configPath); err == nil {
// For simplicity, assume any directory with config.json is a rig
rigs = append(rigs, &rig.Rig{Name: name, Path: dirPath})
}
}
return rigs
}
// pokeWitness sends a heartbeat to a single witness session with backoff.
func (d *Daemon) pokeWitness(session string) {
// Extract rig name from session (gt-<rig>-witness -> <rig>)
rigName := extractRigName(session)
agentID := session // Use session name as agent ID
// Find the rig's workspace for keepalive check
rigWorkspace := filepath.Join(d.config.TownRoot, "gastown", rigName)
// Check keepalive to see if the witness is active
state := keepalive.Read(rigWorkspace)
if state != nil && state.IsFresh() {
// Witness is actively working, reset backoff and mark notifications consumed
d.backoff.RecordActivity(agentID)
_ = d.notifications.MarkConsumed(session, SlotHeartbeat)
d.logger.Printf("Witness %s is fresh (cmd: %s), skipping poke", session, state.LastCommand)
return
}
// Check if we should poke based on backoff interval
if !d.backoff.ShouldPoke(agentID) {
interval := d.backoff.GetInterval(agentID)
d.logger.Printf("Witness %s backoff in effect (interval: %v), skipping poke", session, interval)
return
}
// Check if we should send (slot-based deduplication)
shouldSend, _ := d.notifications.ShouldSend(session, SlotHeartbeat)
if !shouldSend {
d.logger.Printf("Heartbeat already pending for Witness %s, skipping", session)
return
}
// Send heartbeat message, replacing any pending input
msg := "HEARTBEAT: check your workers"
if err := d.tmux.SendKeysReplace(session, msg, 50); err != nil {
d.logger.Printf("Error poking Witness %s: %v", session, err)
return
}
// Record the send for slot deduplication
_ = d.notifications.RecordSend(session, SlotHeartbeat, msg)
d.backoff.RecordPoke(agentID)
// If agent is stale or very stale, record a miss (increase backoff)
if state == nil || state.IsVeryStale() {
d.backoff.RecordMiss(agentID)
interval := d.backoff.GetInterval(agentID)
d.logger.Printf("Poked Witness %s (very stale, backoff now: %v)", session, interval)
} else if state.IsStale() {
d.logger.Printf("Poked Witness %s (stale)", session)
} else {
d.logger.Printf("Poked Witness %s", session)
}
}
// extractRigName extracts the rig name from a witness session name.
// "gt-gastown-witness" -> "gastown"
func extractRigName(session string) string {
// Remove "gt-" prefix and "-witness" suffix
name := strings.TrimPrefix(session, "gt-")
name = strings.TrimSuffix(name, "-witness")
return name
}
// isWitnessSession checks if a session name is a witness session.
func isWitnessSession(name string) bool {
// Pattern: gt-<rig>-witness
if len(name) < 12 { // "gt-x-witness" minimum
return false
}
return name[:3] == "gt-" && name[len(name)-8:] == "-witness"
}
// NOTE: pokeMayor, pokeWitnesses, and pokeWitness have been removed.
// The Deacon molecule is responsible for monitoring Mayor and Witnesses.
// The daemon only ensures Deacon is running and sends it heartbeats.
// processLifecycleRequests checks for and processes lifecycle requests.
func (d *Daemon) processLifecycleRequests() {

View File

@@ -206,32 +206,8 @@ func TestSaveLoadState_Roundtrip(t *testing.T) {
}
}
func TestIsWitnessSession(t *testing.T) {
tests := []struct {
name string
expected bool
}{
{"gt-gastown-witness", true},
{"gt-myrig-witness", true},
{"gt-my-rig-name-witness", true},
{"gt-a-witness", true}, // minimum valid
{"gt-witness", false}, // no rig name
{"gastown-witness", false}, // missing gt- prefix
{"gt-gastown", false}, // missing -witness suffix
{"gt-mayor", false}, // not a witness
{"random-session", false},
{"", false},
{"gt-", false},
{"witness", false},
}
for _, tc := range tests {
result := isWitnessSession(tc.name)
if result != tc.expected {
t.Errorf("isWitnessSession(%q) = %v, expected %v", tc.name, result, tc.expected)
}
}
}
// NOTE: TestIsWitnessSession removed - isWitnessSession function was deleted
// as part of ZFC cleanup (gt-gaxo). Witness poking is now Deacon's responsibility.
func TestLifecycleAction_Constants(t *testing.T) {
// Verify constants have expected string values

View File

@@ -70,46 +70,55 @@ func (d *Daemon) ProcessLifecycleRequests() {
}
}
// parseLifecycleRequest extracts a lifecycle request from a message.
func (d *Daemon) parseLifecycleRequest(msg *BeadsMessage) *LifecycleRequest {
// Look for lifecycle keywords in subject
// Expected format: "LIFECYCLE: <role> requesting <action>"
subject := strings.ToLower(msg.Subject)
// LifecycleBody is the structured body format for lifecycle requests.
// Claude should send mail with JSON body: {"action": "cycle"} or {"action": "shutdown"}
type LifecycleBody struct {
Action string `json:"action"`
}
// parseLifecycleRequest extracts a lifecycle request from a message.
// Uses structured body parsing instead of keyword matching on subject.
func (d *Daemon) parseLifecycleRequest(msg *BeadsMessage) *LifecycleRequest {
// Gate: subject must start with "LIFECYCLE:"
subject := strings.ToLower(msg.Subject)
if !strings.HasPrefix(subject, "lifecycle:") {
return nil
}
var action LifecycleAction
var from string
// Parse structured body for action
var body LifecycleBody
if err := json.Unmarshal([]byte(msg.Body), &body); err != nil {
// Fallback: check for simple action strings in body
bodyLower := strings.ToLower(strings.TrimSpace(msg.Body))
switch {
case bodyLower == "restart" || bodyLower == "action: restart":
body.Action = "restart"
case bodyLower == "shutdown" || bodyLower == "action: shutdown" || bodyLower == "stop":
body.Action = "shutdown"
case bodyLower == "cycle" || bodyLower == "action: cycle":
body.Action = "cycle"
default:
d.logger.Printf("Lifecycle request with unparseable body: %q", msg.Body)
return nil
}
}
// Check restart/shutdown before cycle.
// Note: Can't use Contains(subject, "cycle") because "lifecycle:" contains "cycle".
// Use " cycle" (with leading space) to match the word, not the prefix.
if strings.Contains(subject, "restart") {
// Map action string to enum
var action LifecycleAction
switch strings.ToLower(body.Action) {
case "restart":
action = ActionRestart
} else if strings.Contains(subject, "shutdown") || strings.Contains(subject, "stop") {
case "shutdown", "stop":
action = ActionShutdown
} else if strings.Contains(subject, " cycle") || strings.Contains(subject, "cycling") {
case "cycle":
action = ActionCycle
} else {
default:
d.logger.Printf("Unknown lifecycle action: %q", body.Action)
return nil
}
// Extract role from subject: "LIFECYCLE: <role> requesting ..."
// Parse between "lifecycle: " and " requesting"
parts := strings.Split(subject, " requesting")
if len(parts) >= 1 {
rolePart := strings.TrimPrefix(parts[0], "lifecycle:")
from = strings.TrimSpace(rolePart)
}
if from == "" {
from = msg.From // fallback
}
return &LifecycleRequest{
From: from,
From: msg.From,
Action: action,
Timestamp: time.Now(),
}
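The new parser accepts either a JSON body or a few plain-text fallbacks. A standalone sketch of that mapping; `parseAction` is hypothetical and simply mirrors the fallback logic above:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// parseAction maps a lifecycle message body to an action string:
// structured JSON first, then the simple text forms, else failure.
func parseAction(body string) (string, bool) {
	var b struct {
		Action string `json:"action"`
	}
	if err := json.Unmarshal([]byte(body), &b); err == nil && b.Action != "" {
		return strings.ToLower(b.Action), true
	}
	switch strings.ToLower(strings.TrimSpace(body)) {
	case "restart", "action: restart":
		return "restart", true
	case "shutdown", "action: shutdown", "stop":
		return "shutdown", true
	case "cycle", "action: cycle":
		return "cycle", true
	}
	return "", false
}

func main() {
	for _, body := range []string{`{"action": "cycle"}`, "stop", "action: restart", "garbage"} {
		action, ok := parseAction(body)
		fmt.Printf("%q -> action=%q ok=%v\n", body, action, ok)
	}
}
```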

View File

@@ -1,14 +1,16 @@
package daemon
import (
"io"
"log"
"testing"
)
// testDaemon creates a minimal Daemon for testing.
// We only need the struct to call methods on it.
func testDaemon() *Daemon {
return &Daemon{
config: &Config{TownRoot: "/tmp/test"},
logger: log.New(io.Discard, "", 0), // silent logger for tests
}
}
@@ -16,59 +18,62 @@ func TestParseLifecycleRequest_Cycle(t *testing.T) {
d := testDaemon()
tests := []struct {
title string
subject string
body string
expected LifecycleAction
}{
// Explicit cycle requests
{"LIFECYCLE: mayor requesting cycle", ActionCycle},
{"lifecycle: gastown-witness requesting cycling", ActionCycle},
{"LIFECYCLE: witness requesting cycle now", ActionCycle},
// JSON body format
{"LIFECYCLE: requesting action", `{"action": "cycle"}`, ActionCycle},
// Simple text body format
{"LIFECYCLE: requesting action", "cycle", ActionCycle},
{"lifecycle: action request", "action: cycle", ActionCycle},
}
for _, tc := range tests {
msg := &BeadsMessage{
Subject: tc.title,
Subject: tc.subject,
Body: tc.body,
From: "test-sender",
}
result := d.parseLifecycleRequest(msg)
if result == nil {
t.Errorf("parseLifecycleRequest(%q) returned nil, expected action %s", tc.title, tc.expected)
t.Errorf("parseLifecycleRequest(subject=%q, body=%q) returned nil, expected action %s", tc.subject, tc.body, tc.expected)
continue
}
if result.Action != tc.expected {
t.Errorf("parseLifecycleRequest(%q) action = %s, expected %s", tc.title, result.Action, tc.expected)
t.Errorf("parseLifecycleRequest(subject=%q, body=%q) action = %s, expected %s", tc.subject, tc.body, result.Action, tc.expected)
}
}
}
func TestParseLifecycleRequest_RestartAndShutdown(t *testing.T) {
// Verify that restart and shutdown are correctly parsed.
// Previously, the "lifecycle:" prefix contained "cycle", which caused
// all messages to match as cycle. Fixed by checking restart/shutdown
// before cycle, and using " cycle" (with space) to avoid prefix match.
// Verify that restart and shutdown are correctly parsed using structured body.
d := testDaemon()
tests := []struct {
title string
subject string
body string
expected LifecycleAction
}{
{"LIFECYCLE: mayor requesting restart", ActionRestart},
{"LIFECYCLE: mayor requesting shutdown", ActionShutdown},
{"lifecycle: witness requesting stop", ActionShutdown},
{"LIFECYCLE: action", `{"action": "restart"}`, ActionRestart},
{"LIFECYCLE: action", `{"action": "shutdown"}`, ActionShutdown},
{"lifecycle: action", "stop", ActionShutdown},
{"LIFECYCLE: action", "restart", ActionRestart},
}
for _, tc := range tests {
msg := &BeadsMessage{
Subject: tc.title,
Subject: tc.subject,
Body: tc.body,
From: "test-sender",
}
result := d.parseLifecycleRequest(msg)
if result == nil {
t.Errorf("parseLifecycleRequest(%q) returned nil", tc.title)
t.Errorf("parseLifecycleRequest(subject=%q, body=%q) returned nil", tc.subject, tc.body)
continue
}
if result.Action != tc.expected {
t.Errorf("parseLifecycleRequest(%q) action = %s, expected %s", tc.title, result.Action, tc.expected)
t.Errorf("parseLifecycleRequest(subject=%q, body=%q) action = %s, expected %s", tc.subject, tc.body, result.Action, tc.expected)
}
}
}
@@ -96,52 +101,53 @@ func TestParseLifecycleRequest_NotLifecycle(t *testing.T) {
}
}
func TestParseLifecycleRequest_ExtractsFrom(t *testing.T) {
func TestParseLifecycleRequest_UsesFromField(t *testing.T) {
d := testDaemon()
// Now that we use structured body, the From field comes directly from the message
tests := []struct {
title string
subject string
body string
sender string
expectedFrom string
}{
{"LIFECYCLE: mayor requesting cycle", "fallback", "mayor"},
{"LIFECYCLE: gastown-witness requesting restart", "fallback", "gastown-witness"},
{"lifecycle: my-rig-witness requesting shutdown", "fallback", "my-rig-witness"},
{"LIFECYCLE: action", `{"action": "cycle"}`, "mayor", "mayor"},
{"LIFECYCLE: action", "restart", "gastown-witness", "gastown-witness"},
{"lifecycle: action", "shutdown", "my-rig-refinery", "my-rig-refinery"},
}
for _, tc := range tests {
msg := &BeadsMessage{
Subject: tc.title,
Subject: tc.subject,
Body: tc.body,
From: tc.sender,
}
result := d.parseLifecycleRequest(msg)
if result == nil {
t.Errorf("parseLifecycleRequest(%q) returned nil", tc.title)
t.Errorf("parseLifecycleRequest(body=%q) returned nil", tc.body)
continue
}
if result.From != tc.expectedFrom {
t.Errorf("parseLifecycleRequest(%q) from = %q, expected %q", tc.title, result.From, tc.expectedFrom)
t.Errorf("parseLifecycleRequest() from = %q, expected %q", result.From, tc.expectedFrom)
}
}
}
func TestParseLifecycleRequest_FallsBackToSender(t *testing.T) {
func TestParseLifecycleRequest_AlwaysUsesFromField(t *testing.T) {
d := testDaemon()
// When the title doesn't contain a parseable "from", use sender
// With structured body parsing, From always comes from message From field
msg := &BeadsMessage{
Subject: "LIFECYCLE: requesting cycle", // no role before "requesting"
From: "fallback-sender",
Subject: "LIFECYCLE: action",
Body: "cycle",
From: "the-sender",
}
result := d.parseLifecycleRequest(msg)
if result == nil {
t.Fatal("expected non-nil result")
}
// The "from" should be empty string from title parsing, then fallback to sender
if result.From != "fallback-sender" && result.From != "" {
// Note: the actual behavior may just be empty string if parsing gives nothing
// Let's check what actually happens
t.Logf("parseLifecycleRequest fallback: from=%q", result.From)
if result.From != "the-sender" {
t.Errorf("parseLifecycleRequest() from = %q, expected 'the-sender'", result.From)
}
}
@@ -1,283 +0,0 @@
package doctor
import (
"fmt"
"path/filepath"
"time"
"github.com/steveyegge/gastown/internal/beads"
)
// DefaultStaleThreshold is the default time after which an attachment is considered stale.
// Attachments with no molecule activity in this duration may indicate stuck work.
const DefaultStaleThreshold = 1 * time.Hour
// StaleAttachmentsCheck detects attached molecules that haven't been updated in too long.
// This may indicate stuck work - a polecat that crashed or got stuck during processing.
type StaleAttachmentsCheck struct {
BaseCheck
Threshold time.Duration // Configurable staleness threshold
}
// NewStaleAttachmentsCheck creates a new stale attachments check with the default threshold.
func NewStaleAttachmentsCheck() *StaleAttachmentsCheck {
return NewStaleAttachmentsCheckWithThreshold(DefaultStaleThreshold)
}
// NewStaleAttachmentsCheckWithThreshold creates a new stale attachments check with a custom threshold.
func NewStaleAttachmentsCheckWithThreshold(threshold time.Duration) *StaleAttachmentsCheck {
return &StaleAttachmentsCheck{
BaseCheck: BaseCheck{
CheckName: "stale-attachments",
CheckDescription: "Check for attached molecules that haven't been updated in too long",
},
Threshold: threshold,
}
}
// StaleAttachment represents a single stale attachment finding.
type StaleAttachment struct {
Rig string
PinnedBeadID string
PinnedTitle string
Assignee string
MoleculeID string
MoleculeTitle string
LastUpdated time.Time
StaleDuration time.Duration
}
// Run checks for stale attachments across all rigs.
func (c *StaleAttachmentsCheck) Run(ctx *CheckContext) *CheckResult {
// If a specific rig is specified, only check that one
var rigsToCheck []string
if ctx.RigName != "" {
rigsToCheck = []string{ctx.RigName}
} else {
// Discover all rigs
rigs, err := discoverRigs(ctx.TownRoot)
if err != nil {
return &CheckResult{
Name: c.Name(),
Status: StatusError,
Message: "Failed to discover rigs",
Details: []string{err.Error()},
}
}
rigsToCheck = rigs
}
if len(rigsToCheck) == 0 {
return &CheckResult{
Name: c.Name(),
Status: StatusOK,
Message: "No rigs configured",
}
}
// Find stale attachments across all rigs
var staleAttachments []StaleAttachment
var checkedCount int
cutoff := time.Now().Add(-c.Threshold)
for _, rigName := range rigsToCheck {
stale, checked, err := c.checkRig(ctx.TownRoot, rigName, cutoff)
if err != nil {
// Log but continue with other rigs
continue
}
staleAttachments = append(staleAttachments, stale...)
checkedCount += checked
}
// Also check town-level beads for pinned attachments
townStale, townChecked, err := c.checkBeadsDir(ctx.TownRoot, filepath.Join(ctx.TownRoot, ".beads"), cutoff)
if err == nil {
staleAttachments = append(staleAttachments, townStale...)
checkedCount += townChecked
}
if len(staleAttachments) > 0 {
details := make([]string, 0, len(staleAttachments))
for _, sa := range staleAttachments {
location := sa.Rig
if location == "" {
location = "town"
}
assigneeInfo := ""
if sa.Assignee != "" {
assigneeInfo = fmt.Sprintf(" (assignee: %s)", sa.Assignee)
}
details = append(details, fmt.Sprintf("%s: %s → %s%s (stale for %s)",
location, sa.PinnedTitle, sa.MoleculeTitle, assigneeInfo, formatDuration(sa.StaleDuration)))
}
return &CheckResult{
Name: c.Name(),
Status: StatusWarning,
Message: fmt.Sprintf("%d stale attachment(s) found (no activity for >%s)", len(staleAttachments), formatDuration(c.Threshold)),
Details: details,
FixHint: "Check if polecats are stuck or crashed. Use 'gt witness nudge <polecat>' or 'gt polecat kill <name>' if needed",
}
}
if checkedCount == 0 {
return &CheckResult{
Name: c.Name(),
Status: StatusOK,
Message: "No attachments to check",
}
}
return &CheckResult{
Name: c.Name(),
Status: StatusOK,
Message: fmt.Sprintf("Checked %d attachment(s), none stale", checkedCount),
}
}
// checkRig checks a single rig for stale attachments.
func (c *StaleAttachmentsCheck) checkRig(townRoot, rigName string, cutoff time.Time) ([]StaleAttachment, int, error) {
// Check rig-level beads and polecats
rigPath := filepath.Join(townRoot, rigName)
// Each polecat has its own beads directory
polecatsDir := filepath.Join(rigPath, "polecats")
polecatDirs, err := filepath.Glob(filepath.Join(polecatsDir, "*", ".beads"))
if err != nil {
return nil, 0, err
}
var allStale []StaleAttachment
var totalChecked int
for _, beadsPath := range polecatDirs {
// Extract polecat name from path
polecatPath := filepath.Dir(beadsPath)
polecatName := filepath.Base(polecatPath)
stale, checked, err := c.checkBeadsDirWithContext(rigPath, beadsPath, cutoff, rigName, polecatName)
if err != nil {
continue
}
allStale = append(allStale, stale...)
totalChecked += checked
}
// Also check rig-level beads (crew workers, etc.)
crewDir := filepath.Join(rigPath, "crew")
crewDirs, err := filepath.Glob(filepath.Join(crewDir, "*", ".beads"))
if err == nil {
for _, beadsPath := range crewDirs {
workerPath := filepath.Dir(beadsPath)
workerName := filepath.Base(workerPath)
stale, checked, err := c.checkBeadsDirWithContext(rigPath, beadsPath, cutoff, rigName, "crew/"+workerName)
if err != nil {
continue
}
allStale = append(allStale, stale...)
totalChecked += checked
}
}
return allStale, totalChecked, nil
}
// checkBeadsDir checks a beads directory for stale attachments.
func (c *StaleAttachmentsCheck) checkBeadsDir(townRoot, beadsDir string, cutoff time.Time) ([]StaleAttachment, int, error) {
return c.checkBeadsDirWithContext(townRoot, beadsDir, cutoff, "", "")
}
// checkBeadsDirWithContext checks a beads directory for stale attachments with rig context.
func (c *StaleAttachmentsCheck) checkBeadsDirWithContext(workDir, beadsDir string, cutoff time.Time, rigName, workerName string) ([]StaleAttachment, int, error) {
// Create beads client for the directory containing .beads
parentDir := filepath.Dir(beadsDir)
bd := beads.New(parentDir)
// List all pinned beads (attachments are stored on pinned beads)
pinnedIssues, err := bd.List(beads.ListOptions{
Status: beads.StatusPinned,
Priority: -1, // No filter
})
if err != nil {
return nil, 0, err
}
var staleAttachments []StaleAttachment
var checked int
for _, pinned := range pinnedIssues {
// Parse attachment fields
attachment := beads.ParseAttachmentFields(pinned)
if attachment == nil || attachment.AttachedMolecule == "" {
continue // No attachment
}
checked++
// Fetch the attached molecule to check its updated_at timestamp
mol, err := bd.Show(attachment.AttachedMolecule)
if err != nil {
// Molecule might have been deleted or is inaccessible
// This itself could be a problem worth reporting
staleAttachments = append(staleAttachments, StaleAttachment{
Rig: rigName,
PinnedBeadID: pinned.ID,
PinnedTitle: pinned.Title,
Assignee: pinned.Assignee,
MoleculeID: attachment.AttachedMolecule,
MoleculeTitle: "(molecule not found)",
LastUpdated: time.Time{},
StaleDuration: time.Since(cutoff) + c.Threshold, // Report as stale
})
continue
}
// Parse the molecule's updated_at timestamp
updatedAt, err := parseTimestamp(mol.UpdatedAt)
if err != nil {
continue // Skip if we can't parse the timestamp
}
// Check if the molecule is stale (hasn't been updated since cutoff)
// Only check molecules that are still in progress
if mol.Status == "in_progress" && updatedAt.Before(cutoff) {
staleAttachments = append(staleAttachments, StaleAttachment{
Rig: rigName,
PinnedBeadID: pinned.ID,
PinnedTitle: pinned.Title,
Assignee: pinned.Assignee,
MoleculeID: mol.ID,
MoleculeTitle: mol.Title,
LastUpdated: updatedAt,
StaleDuration: time.Since(updatedAt),
})
}
}
return staleAttachments, checked, nil
}
// parseTimestamp parses an ISO 8601 timestamp string.
func parseTimestamp(ts string) (time.Time, error) {
// Try RFC3339 first (most common)
t, err := time.Parse(time.RFC3339, ts)
if err == nil {
return t, nil
}
// Try without timezone
t, err = time.Parse("2006-01-02T15:04:05", ts)
if err == nil {
return t, nil
}
// Try date only
t, err = time.Parse("2006-01-02", ts)
if err == nil {
return t, nil
}
return time.Time{}, fmt.Errorf("unable to parse timestamp: %s", ts)
}
@@ -86,26 +86,7 @@ func Read(workspaceRoot string) *State {
// Returns a very large duration if the state is nil.
func (s *State) Age() time.Duration {
if s == nil {
return 24 * time.Hour * 365 // Very stale
return 24 * time.Hour * 365 // No keepalive
}
return time.Since(s.Timestamp)
}
// IsFresh returns true if the keepalive is less than 2 minutes old.
func (s *State) IsFresh() bool {
return s != nil && s.Age() < 2*time.Minute
}
// IsStale returns true if the keepalive is 2-5 minutes old.
func (s *State) IsStale() bool {
if s == nil {
return false
}
age := s.Age()
return age >= 2*time.Minute && age < 5*time.Minute
}
// IsVeryStale returns true if the keepalive is more than 5 minutes old.
func (s *State) IsVeryStale() bool {
return s == nil || s.Age() >= 5*time.Minute
}
@@ -39,47 +39,29 @@ func TestReadNonExistent(t *testing.T) {
}
func TestStateAge(t *testing.T) {
// Test nil state
// Test nil state returns very large age
var nilState *State
if nilState.Age() < 24*time.Hour {
t.Error("nil state should have very large age")
}
// Test fresh state
// Test fresh state returns accurate age
freshState := &State{Timestamp: time.Now().Add(-30 * time.Second)}
if !freshState.IsFresh() {
t.Error("30-second-old state should be fresh")
}
if freshState.IsStale() {
t.Error("30-second-old state should not be stale")
}
if freshState.IsVeryStale() {
t.Error("30-second-old state should not be very stale")
age := freshState.Age()
if age < 29*time.Second || age > 31*time.Second {
t.Errorf("expected ~30s age, got %v", age)
}
// Test stale state (3 minutes)
staleState := &State{Timestamp: time.Now().Add(-3 * time.Minute)}
if staleState.IsFresh() {
t.Error("3-minute-old state should not be fresh")
}
if !staleState.IsStale() {
t.Error("3-minute-old state should be stale")
}
if staleState.IsVeryStale() {
t.Error("3-minute-old state should not be very stale")
// Test older state returns accurate age
olderState := &State{Timestamp: time.Now().Add(-5 * time.Minute)}
age = olderState.Age()
if age < 4*time.Minute+55*time.Second || age > 5*time.Minute+5*time.Second {
t.Errorf("expected ~5m age, got %v", age)
}
// Test very stale state (10 minutes)
veryStaleState := &State{Timestamp: time.Now().Add(-10 * time.Minute)}
if veryStaleState.IsFresh() {
t.Error("10-minute-old state should not be fresh")
}
if veryStaleState.IsStale() {
t.Error("10-minute-old state should not be stale (it's very stale)")
}
if !veryStaleState.IsVeryStale() {
t.Error("10-minute-old state should be very stale")
}
// NOTE: IsFresh(), IsStale(), IsVeryStale() were removed as part of ZFC cleanup.
// Staleness classification belongs in Deacon molecule, not Go code.
// See gt-gaxo epic for rationale.
}
func TestDirectoryCreation(t *testing.T) {