Implement Witness handoff bead state persistence (gt-caih)
Add persistent state storage for Witness across wisp burns: - Add WorkerState and WitnessHandoffState types - Implement loadHandoffState/saveHandoffState for bead persistence - Update getNudgeCount/recordNudge to use persistent state - Add activity tracking integration into healthCheck 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
@@ -414,7 +414,7 @@
|
||||
{"id":"gt-iua8","title":"Merge: gt-frs","description":"branch: polecat/Slit\ntarget: main\nsource_issue: gt-frs\nrig: gastown","status":"closed","priority":2,"issue_type":"merge-request","created_at":"2025-12-19T16:30:05.529099-08:00","updated_at":"2025-12-19T18:26:14.104887-08:00","closed_at":"2025-12-19T17:48:44.654109-08:00"}
|
||||
{"id":"gt-j4nu","title":"Merge: gt-g44u.3","description":"branch: polecat/Ace\ntarget: main\nsource_issue: gt-g44u.3\nrig: gastown","status":"closed","priority":0,"issue_type":"merge-request","created_at":"2025-12-19T16:14:52.767156-08:00","updated_at":"2025-12-19T17:35:36.663796-08:00","closed_at":"2025-12-19T17:35:36.663796-08:00"}
|
||||
{"id":"gt-j5tk","title":"Work assignment messages should auto-close on completion","description":"When a polecat completes work on an issue, the work assignment message (msg-type:task) stays open. Found 7 stale work assignments in gastown after swarm completed.\n\nProposal: When bd close is called on an issue, auto-close any work assignment messages that reference that issue in their body.\n\nAlternative: Work assignment messages could use a different lifecycle - perhaps they should be acked (closed) when the polecat starts working, not when they finish.","status":"open","priority":2,"issue_type":"feature","created_at":"2025-12-20T03:12:28.403974-08:00","updated_at":"2025-12-20T03:12:28.403974-08:00"}
|
||||
{"id":"gt-j6s8","title":"Refinery startup: bond mol-refinery-patrol on start","description":"Wire up Refinery to automatically bond its patrol molecule on startup.\n\n## Current state\n- mol-refinery-patrol exists in builtin_molecules.go\n- prompts/roles/refinery.md describes the protocol\n- Refinery doesn't auto-bond on startup\n\n## Desired behavior\nOn Refinery session start:\n1. gt prime detects RoleRefinery\n2. Check for existing in-progress patrol: bd list --status=in_progress --assignee=refinery\n3. If found: resume from current step\n4. If not found: bd mol bond mol-refinery-patrol --wisp\n5. Output patrol context to agent\n\n## Implementation options\nA) Add to gt prime (outputRefineryPatrolContext)\nB) Add startup hook in refinery CLAUDE.md\nC) Both (prime detects, template reinforces)\n\n## Testing\n- Start refinery session\n- Verify patrol bonds automatically\n- Kill mid-patrol, restart, verify resumes\n\n## Depends on\n- gt-3x0z.10 (existing issue for Refinery patrol)","status":"in_progress","priority":1,"issue_type":"task","assignee":"gastown/dementus","created_at":"2025-12-22T16:43:34.739741-08:00","updated_at":"2025-12-23T00:02:35.269457-08:00"}
|
||||
{"id":"gt-j6s8","title":"Refinery startup: bond mol-refinery-patrol on start","description":"Wire up Refinery to automatically bond its patrol molecule on startup.\n\n## Current state\n- mol-refinery-patrol exists in builtin_molecules.go\n- prompts/roles/refinery.md describes the protocol\n- Refinery doesn't auto-bond on startup\n\n## Desired behavior\nOn Refinery session start:\n1. gt prime detects RoleRefinery\n2. Check for existing in-progress patrol: bd list --status=in_progress --assignee=refinery\n3. If found: resume from current step\n4. If not found: bd mol bond mol-refinery-patrol --wisp\n5. Output patrol context to agent\n\n## Implementation options\nA) Add to gt prime (outputRefineryPatrolContext)\nB) Add startup hook in refinery CLAUDE.md\nC) Both (prime detects, template reinforces)\n\n## Testing\n- Start refinery session\n- Verify patrol bonds automatically\n- Kill mid-patrol, restart, verify resumes\n\n## Depends on\n- gt-3x0z.10 (existing issue for Refinery patrol)","status":"open","priority":1,"issue_type":"task","created_at":"2025-12-22T16:43:34.739741-08:00","updated_at":"2025-12-22T16:43:34.739741-08:00"}
|
||||
{"id":"gt-j87","title":"Design: Work flow simulation and validation","description":"Validate GGT designs through simulation before implementation.\n\n## Validation Approaches\n\n### 1. Dry-Run Simulation (Recommended First)\nMayor walks through scenarios mentally/on paper:\n- \"If polecat Toast signals done with dirty git state, what happens?\"\n- \"If Witness context fills mid-verification, what state is lost?\"\n- \"If two polecats try to close same issue, what happens?\"\n\nCreate beads for any gaps discovered.\n\n### 2. Real Work in gastown-py\nUse Python Gas Town to stress-test assumptions:\n- Run actual batch work on test repos\n- Observe edge cases in practice\n- Document issues found\n\n### 3. Edge Case Analysis\nSystematic review of failure modes:\n- Agent crashes mid-operation\n- Network failures during sync\n- Concurrent access to shared state\n- Context limits hit at bad times\n\n## Key Scenarios to Validate\n\n- [ ] Witness session cycling (state preservation)\n- [ ] Polecat decommission with dirty state\n- [ ] Merge conflicts in queue\n- [ ] Beads sync conflicts between workers\n- [ ] Escalation path (stuck worker -\u003e Mayor)\n- [ ] Cross-rig communication\n- [ ] Federation mail routing (future)\n\n## Success Criteria\n\n- No data loss scenarios identified\n- Clear recovery paths for all failure modes\n- Edge cases either handled or documented as limitations\n- Design improves as model cognition improves\n\n## Output\n\nFor each scenario validated:\n1. Document in relevant bead if issue found\n2. Create new beads for missing functionality\n3. Update architecture.md if design changes","status":"open","priority":1,"issue_type":"epic","created_at":"2025-12-15T20:24:11.251841-08:00","updated_at":"2025-12-16T17:25:49.858717-08:00"}
|
||||
{"id":"gt-jgdx","title":"Digest: mol-deacon-patrol","description":"Test patrol cycle - first run, no actual work done","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-22T02:07:03.388821-08:00","updated_at":"2025-12-22T02:07:03.388821-08:00","closed_at":"2025-12-22T02:07:03.388793-08:00","close_reason":"Squashed from 5 wisps"}
|
||||
{"id":"gt-jpt","title":"Town-level beads: Real DB for coordination mail","description":"Implement Option A from mail redesign: Town gets real beads DB for coordination.\n\n## Background\n\nMail is now Beads. But currently:\n- Town .beads/redirect points to rig beads\n- mayor/mail/ has legacy JSONL files\n- Cross-rig coordination has no clear home\n\n## Design\n\nTown beads = coordination, cross-rig mail, mayor inbox, handoffs\nRig beads = project issues, work items\n\nMatches HOP hierarchy: platform \u003e project \u003e worker\n\n## Structure\n\n~/gt/\n .beads/ # REAL beads DB (prefix: gm-)\n mayor/\n town.json\n state.json # NO mail/ directory\n gastown/\n .beads/ # Rig beads (prefix: ga-)\n\n## Tasks\n\n1. Delete ~/gt/.beads/redirect\n2. Run bd init --prefix gm at ~/gt/ (town beads)\n3. Delete ~/gt/mayor/mail/ directory\n4. Update gt mail to use beads not JSONL\n5. Add mail fields (thread_id, reply_to, msg_type)\n6. Update gt prime for two-tier model\n7. Update docs/architecture.md\n\n## Addressing\n\n- mayor/ -\u003e town beads\n- rig/agent -\u003e rig beads\n- Cross-rig -\u003e town beads","status":"closed","priority":1,"issue_type":"epic","created_at":"2025-12-17T19:09:55.855955-08:00","updated_at":"2025-12-19T01:57:17.032558-08:00","closed_at":"2025-12-19T01:57:17.032558-08:00"}
|
||||
@@ -643,7 +643,7 @@
|
||||
{"id":"gt-upom","title":"Witness patrol: cleanup idle orphan polecats","description":"Add patrol step to find and cleanup polecats that are idle with no assigned issue. These orphans occur when polecats crash before sending DONE or Witness misses the message. Patrol should verify git is clean before removing worktree. Part of gt-rana.","status":"open","priority":2,"issue_type":"feature","created_at":"2025-12-21T23:09:41.756753-08:00","updated_at":"2025-12-21T23:09:41.756753-08:00"}
|
||||
{"id":"gt-us8","title":"Daemon: configurable heartbeat interval","description":"Heartbeat interval is hardcoded to 60s. Should be configurable via:\n- town.json config\n- Command line flag\n- Environment variable\n\nDefault 60s is reasonable but some deployments may want faster/slower.","status":"open","priority":3,"issue_type":"task","created_at":"2025-12-18T13:38:14.282216-08:00","updated_at":"2025-12-18T13:38:14.282216-08:00","dependencies":[{"issue_id":"gt-us8","depends_on_id":"gt-99m","type":"blocks","created_at":"2025-12-18T13:38:26.704111-08:00","created_by":"daemon"}]}
|
||||
{"id":"gt-usy0","title":"Merge: gt-3x0z.3","description":"branch: polecat/rictus\ntarget: main\nsource_issue: gt-3x0z.3\nrig: gastown","status":"closed","priority":2,"issue_type":"merge-request","created_at":"2025-12-21T16:03:43.535266-08:00","updated_at":"2025-12-21T17:20:27.505696-08:00","closed_at":"2025-12-21T17:20:27.505696-08:00","close_reason":"ORPHANED: Branch never pushed, worktree deleted"}
|
||||
{"id":"gt-utwc","title":"Self-mail should suppress tmux notification","description":"When sending mail to yourself (e.g., mayor sending to mayor/), the tmux notification shouldn't fire.\n\n**Rationale:**\n- Self-mail is intended for future-you (next session handoff)\n- Present-you just sent it, so you already know about it\n- The notification is redundant/confusing in this case\n\n**Fix:**\nSuppress tmux notification when sender == recipient address.","status":"closed","priority":3,"issue_type":"bug","created_at":"2025-12-22T17:55:39.573705-08:00","updated_at":"2025-12-22T23:58:02.827026-08:00","closed_at":"2025-12-22T23:58:02.827026-08:00","close_reason":"Skip tmux notification when sender == recipient"}
|
||||
{"id":"gt-utwc","title":"Self-mail should suppress tmux notification","description":"When sending mail to yourself (e.g., mayor sending to mayor/), the tmux notification shouldn't fire.\n\n**Rationale:**\n- Self-mail is intended for future-you (next session handoff)\n- Present-you just sent it, so you already know about it\n- The notification is redundant/confusing in this case\n\n**Fix:**\nSuppress tmux notification when sender == recipient address.","status":"open","priority":3,"issue_type":"bug","created_at":"2025-12-22T17:55:39.573705-08:00","updated_at":"2025-12-22T17:55:39.573705-08:00"}
|
||||
{"id":"gt-uym5","title":"Implement gt mol status command","description":"Show what's on an agent's hook.\n\n```bash\ngt mol status [target]\n```\n\nOutput:\n- What's slung (molecule name, associated issue)\n- Current phase and progress\n- Whether it's a wisp\n- Next action hint\n\nIf no target, shows current agent's status.\n\nAcceptance:\n- [ ] Read pinned bead attachment\n- [ ] Display molecule/issue info\n- [ ] Show phase progress\n- [ ] Indicate wisp vs durable","status":"closed","priority":1,"issue_type":"task","assignee":"gastown/nux","created_at":"2025-12-22T03:17:34.679963-08:00","updated_at":"2025-12-22T12:34:19.942265-08:00","closed_at":"2025-12-22T12:34:19.942265-08:00","close_reason":"Implemented gt mol status command with mol alias, auto-detection, progress display, wisp detection, and next action hints"}
|
||||
{"id":"gt-v5hv","title":"Work on ga-y6b: Implement Refinery as Claude agent. Conve...","description":"Work on ga-y6b: Implement Refinery as Claude agent. Convert from shell to Claude agent that processes MRs in merge queue, runs tests, merges to integration branch. When done, submit MR (not PR) to integration branch for Refinery.","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-19T22:58:17.576892-08:00","updated_at":"2025-12-19T23:23:22.778407-08:00","closed_at":"2025-12-19T23:23:22.778407-08:00"}
|
||||
{"id":"gt-v5k","title":"Design: Failure modes and recovery","description":"Document failure modes and recovery strategies for Gas Town operations.\n\n## Critical Failure Modes\n\n### 1. Agent Crash Mid-Operation\n\n**Scenario**: Polecat crashes while committing, Witness crashes while verifying\n\n**Detection**:\n- Session suddenly gone (tmux check fails)\n- State shows 'working' but no session\n- Heartbeat stops (for Witness)\n\n**Recovery**:\n- Doctor detects via ZombieSessionCheck\n- Capture any recoverable state\n- Reset agent state to 'idle'\n- For Witness: auto-restart via supervisor or manual gt witness start\n\n### 2. Git State Corruption\n\n**Scenario**: Merge conflict, failed rebase, detached HEAD\n\n**Detection**:\n- Git commands fail\n- Dirty state that won't commit\n- Branch diverged from origin\n\n**Recovery**:\n- gt doctor reports git health issues\n- Manual intervention recommended\n- Severe cases: remove clone, re-clone\n\n### 3. Beads Sync Conflict\n\n**Scenario**: Two polecats modify same issue\n\n**Detection**:\n- bd sync fails with conflict\n- Beads tombstone mechanism handles most cases\n\n**Recovery**:\n- Beads has last-write-wins semantics\n- bd sync --force in extreme cases\n- Issues may need manual dedup\n\n### 4. Tmux Failure\n\n**Scenario**: Tmux server crashes, socket issues\n\n**Detection**:\n- All sessions inaccessible\n- \"no server running\" errors\n\n**Recovery**:\n- Kill any orphan processes\n- tmux kill-server \u0026\u0026 tmux start-server\n- All agent states reset to idle\n- Re-spawn active work\n\n### 5. Claude API Issues\n\n**Scenario**: Rate limits, outages, context limits\n\n**Detection**:\n- Sessions hang or produce errors\n- Repeated failure patterns\n\n**Recovery**:\n- Exponential backoff (handled by Claude Code)\n- For context limits: session cycling (mail-to-self)\n- For outages: wait and retry\n\n### 6. Disk Full\n\n**Scenario**: Clones, logs, or beads fill disk\n\n**Detection**:\n- Write operations fail\n- git/bd commands error\n\n**Recovery**:\n- Clean up logs: rm ~/.gastown/logs/*\n- Remove old polecat clones\n- gt doctor --fix can clean some cruft\n\n### 7. Network Failure\n\n**Scenario**: Can't reach GitHub, API servers\n\n**Detection**:\n- git fetch/push fails\n- Claude sessions hang\n\n**Recovery**:\n- Work continues locally\n- Queue pushes for later\n- Sync when connectivity restored\n\n## Recovery Principles\n\n1. **Fail safe**: Prefer stopping over corrupting\n2. **State is recoverable**: Git and beads have recovery mechanisms\n3. **Doctor heals**: gt doctor --fix handles common issues\n4. **Emergency stop**: gt stop --all as last resort\n5. **Human escalation**: Some failures need Overseer intervention\n\n## Implementation\n\n- Document each failure mode in architecture.md\n- Ensure doctor checks cover detection\n- Add recovery hints to error messages\n- Log all failures for debugging","status":"open","priority":1,"issue_type":"task","created_at":"2025-12-15T23:19:07.198289-08:00","updated_at":"2025-12-15T23:19:28.171942-08:00"}
|
||||
|
||||
@@ -27,8 +27,9 @@ var (
|
||||
|
||||
// Manager handles witness lifecycle and monitoring operations.
|
||||
type Manager struct {
|
||||
rig *rig.Rig
|
||||
workDir string
|
||||
rig *rig.Rig
|
||||
workDir string
|
||||
handoffState *WitnessHandoffState // Cached handoff state for persistence across burns
|
||||
}
|
||||
|
||||
// NewManager creates a new witness manager for a rig.
|
||||
@@ -80,6 +81,166 @@ func (m *Manager) saveState(w *Witness) error {
|
||||
return os.WriteFile(m.stateFile(), data, 0644)
|
||||
}
|
||||
|
||||
// handoffBeadID returns the well-known ID for this rig's witness handoff bead.
|
||||
func (m *Manager) handoffBeadID() string {
|
||||
return fmt.Sprintf("gt-%s-%s", m.rig.Name, HandoffBeadID)
|
||||
}
|
||||
|
||||
// loadHandoffState loads worker states from the handoff bead.
|
||||
// If the bead doesn't exist, returns an empty state and creates the bead.
|
||||
func (m *Manager) loadHandoffState() (*WitnessHandoffState, error) {
|
||||
beadID := m.handoffBeadID()
|
||||
|
||||
// Try to read the bead
|
||||
cmd := exec.Command("bd", "show", beadID, "--json")
|
||||
cmd.Dir = m.workDir
|
||||
|
||||
var stdout, stderr bytes.Buffer
|
||||
cmd.Stdout = &stdout
|
||||
cmd.Stderr = &stderr
|
||||
|
||||
if err := cmd.Run(); err != nil {
|
||||
// Bead doesn't exist - create it
|
||||
if strings.Contains(stderr.String(), "not found") || strings.Contains(stderr.String(), "No issue") {
|
||||
if err := m.ensureHandoffBead(); err != nil {
|
||||
return nil, fmt.Errorf("creating handoff bead: %w", err)
|
||||
}
|
||||
return &WitnessHandoffState{
|
||||
WorkerStates: make(map[string]WorkerState),
|
||||
}, nil
|
||||
}
|
||||
return nil, fmt.Errorf("reading handoff bead: %s", stderr.String())
|
||||
}
|
||||
|
||||
// Parse the bead JSON
|
||||
var issues []struct {
|
||||
Description string `json:"description"`
|
||||
}
|
||||
if err := json.Unmarshal(stdout.Bytes(), &issues); err != nil {
|
||||
return nil, fmt.Errorf("parsing handoff bead: %w", err)
|
||||
}
|
||||
|
||||
if len(issues) == 0 {
|
||||
return &WitnessHandoffState{
|
||||
WorkerStates: make(map[string]WorkerState),
|
||||
}, nil
|
||||
}
|
||||
|
||||
// The description contains our JSON state
|
||||
desc := issues[0].Description
|
||||
|
||||
// Extract JSON from description (skip any markdown header)
|
||||
state := &WitnessHandoffState{
|
||||
WorkerStates: make(map[string]WorkerState),
|
||||
}
|
||||
|
||||
// Try to find JSON in the description
|
||||
if idx := strings.Index(desc, "{"); idx >= 0 {
|
||||
jsonPart := desc[idx:]
|
||||
// Find the matching closing brace
|
||||
if endIdx := findMatchingBrace(jsonPart); endIdx > 0 {
|
||||
jsonPart = jsonPart[:endIdx+1]
|
||||
if err := json.Unmarshal([]byte(jsonPart), state); err != nil {
|
||||
// If parsing fails, just return empty state
|
||||
return &WitnessHandoffState{
|
||||
WorkerStates: make(map[string]WorkerState),
|
||||
}, nil
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return state, nil
|
||||
}
|
||||
|
||||
// findMatchingBrace finds the index of the matching closing brace.
|
||||
func findMatchingBrace(s string) int {
|
||||
depth := 0
|
||||
inString := false
|
||||
escaped := false
|
||||
|
||||
for i, c := range s {
|
||||
if escaped {
|
||||
escaped = false
|
||||
continue
|
||||
}
|
||||
if c == '\\' && inString {
|
||||
escaped = true
|
||||
continue
|
||||
}
|
||||
if c == '"' {
|
||||
inString = !inString
|
||||
continue
|
||||
}
|
||||
if inString {
|
||||
continue
|
||||
}
|
||||
if c == '{' {
|
||||
depth++
|
||||
} else if c == '}' {
|
||||
depth--
|
||||
if depth == 0 {
|
||||
return i
|
||||
}
|
||||
}
|
||||
}
|
||||
return -1
|
||||
}
|
||||
|
||||
// saveHandoffState persists worker states to the handoff bead.
|
||||
func (m *Manager) saveHandoffState(state *WitnessHandoffState) error {
|
||||
beadID := m.handoffBeadID()
|
||||
|
||||
// Serialize state to JSON
|
||||
stateJSON, err := json.MarshalIndent(state, "", " ")
|
||||
if err != nil {
|
||||
return fmt.Errorf("serializing state: %w", err)
|
||||
}
|
||||
|
||||
// Update the bead's description with the JSON state
|
||||
desc := fmt.Sprintf("Witness handoff state for %s.\n\n```json\n%s\n```", m.rig.Name, string(stateJSON))
|
||||
|
||||
cmd := exec.Command("bd", "update", beadID, "--description", desc)
|
||||
cmd.Dir = m.workDir
|
||||
|
||||
if out, err := cmd.CombinedOutput(); err != nil {
|
||||
return fmt.Errorf("updating handoff bead: %s", strings.TrimSpace(string(out)))
|
||||
}
|
||||
|
||||
return nil
|
||||
}
|
||||
|
||||
// ensureHandoffBead creates the handoff bead if it doesn't exist.
|
||||
func (m *Manager) ensureHandoffBead() error {
|
||||
beadID := m.handoffBeadID()
|
||||
title := fmt.Sprintf("Witness handoff state (%s)", m.rig.Name)
|
||||
desc := fmt.Sprintf("Witness handoff state for %s.\n\n```json\n{\"worker_states\": {}, \"last_patrol\": null}\n```", m.rig.Name)
|
||||
|
||||
// Create pinned handoff bead with specific ID
|
||||
cmd := exec.Command("bd", "create",
|
||||
"--id", beadID,
|
||||
"--title", title,
|
||||
"--type", "task",
|
||||
"--priority", "4", // Low priority - just state storage
|
||||
"--description", desc,
|
||||
)
|
||||
cmd.Dir = m.workDir
|
||||
|
||||
if out, err := cmd.CombinedOutput(); err != nil {
|
||||
// If it already exists, that's fine
|
||||
if strings.Contains(string(out), "already exists") {
|
||||
return nil
|
||||
}
|
||||
return fmt.Errorf("creating handoff bead: %s", strings.TrimSpace(string(out)))
|
||||
}
|
||||
|
||||
// Pin the bead so it survives cleanup
|
||||
cmd = exec.Command("bd", "update", beadID, "--pinned")
|
||||
cmd.Dir = m.workDir
|
||||
_ = cmd.Run() // Best effort - pinning might not be supported
|
||||
|
||||
return nil
|
||||
}
|
||||
|
||||
// Status returns the current witness status.
|
||||
func (m *Manager) Status() (*Witness, error) {
|
||||
w, err := m.loadState()
|
||||
@@ -165,6 +326,17 @@ func (m *Manager) run(w *Witness) error {
|
||||
fmt.Println("Witness running...")
|
||||
fmt.Println("Press Ctrl+C to stop")
|
||||
|
||||
// Load handoff state from persistent bead (survives wisp burns)
|
||||
handoffState, err := m.loadHandoffState()
|
||||
if err != nil {
|
||||
fmt.Printf("Warning: could not load handoff state: %v\n", err)
|
||||
handoffState = &WitnessHandoffState{
|
||||
WorkerStates: make(map[string]WorkerState),
|
||||
}
|
||||
}
|
||||
m.handoffState = handoffState
|
||||
fmt.Printf("Loaded handoff state with %d worker(s)\n", len(m.handoffState.WorkerStates))
|
||||
|
||||
// Initial check immediately
|
||||
m.checkAndProcess(w)
|
||||
|
||||
@@ -195,6 +367,13 @@ func (m *Manager) checkAndProcess(w *Witness) {
|
||||
fmt.Printf("Auto-spawn error: %v\n", err)
|
||||
}
|
||||
}
|
||||
|
||||
// Update last patrol time and persist handoff state
|
||||
if m.handoffState != nil {
|
||||
now := time.Now()
|
||||
m.handoffState.LastPatrol = &now
|
||||
// Note: individual nudge/activity updates already persist, so this is just for LastPatrol
|
||||
}
|
||||
}
|
||||
|
||||
// healthCheck performs a health check on all monitored polecats.
|
||||
@@ -225,6 +404,9 @@ func (m *Manager) healthCheck(w *Witness) error {
|
||||
status := m.checkPolecatHealth(p.Name, p.ClonePath)
|
||||
if status == PolecatStuck {
|
||||
m.handleStuckPolecat(w, p.Name)
|
||||
} else if status == PolecatHealthy {
|
||||
// Worker is active - update activity tracking and clear nudge count
|
||||
m.updateWorkerActivity(p.Name, "")
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -331,9 +513,16 @@ func (m *Manager) handleStuckPolecat(w *Witness, polecatName string) {
|
||||
}
|
||||
|
||||
// getNudgeCount returns how many times a polecat has been nudged.
|
||||
// Uses handoff state for persistence across wisp burns.
|
||||
func (m *Manager) getNudgeCount(w *Witness, polecatName string) int {
|
||||
// Count occurrences in SpawnedIssues that start with "nudge:" prefix
|
||||
// We reuse SpawnedIssues to track nudges with a "nudge:<name>" pattern
|
||||
// First check handoff state (persistent across burns)
|
||||
if m.handoffState != nil {
|
||||
if ws, ok := m.handoffState.WorkerStates[polecatName]; ok {
|
||||
return ws.NudgeCount
|
||||
}
|
||||
}
|
||||
|
||||
// Fallback to legacy SpawnedIssues for backwards compatibility
|
||||
count := 0
|
||||
nudgeKey := "nudge:" + polecatName
|
||||
for _, entry := range w.SpawnedIssues {
|
||||
@@ -345,11 +534,70 @@ func (m *Manager) getNudgeCount(w *Witness, polecatName string) int {
|
||||
}
|
||||
|
||||
// recordNudge records that a nudge was sent to a polecat.
|
||||
// Updates both handoff state (persistent) and legacy SpawnedIssues.
|
||||
func (m *Manager) recordNudge(w *Witness, polecatName string) {
|
||||
now := time.Now()
|
||||
|
||||
// Update handoff state (persistent across burns)
|
||||
if m.handoffState != nil {
|
||||
if m.handoffState.WorkerStates == nil {
|
||||
m.handoffState.WorkerStates = make(map[string]WorkerState)
|
||||
}
|
||||
ws := m.handoffState.WorkerStates[polecatName]
|
||||
ws.NudgeCount++
|
||||
ws.LastNudge = &now
|
||||
m.handoffState.WorkerStates[polecatName] = ws
|
||||
|
||||
// Persist to handoff bead
|
||||
if err := m.saveHandoffState(m.handoffState); err != nil {
|
||||
fmt.Printf("Warning: failed to persist handoff state: %v\n", err)
|
||||
}
|
||||
}
|
||||
|
||||
// Also update legacy SpawnedIssues for backwards compatibility
|
||||
nudgeKey := "nudge:" + polecatName
|
||||
w.SpawnedIssues = append(w.SpawnedIssues, nudgeKey)
|
||||
}
|
||||
|
||||
// clearNudgeCount clears the nudge count for a polecat (e.g., when they become active again).
|
||||
func (m *Manager) clearNudgeCount(polecatName string) {
|
||||
if m.handoffState != nil && m.handoffState.WorkerStates != nil {
|
||||
if ws, ok := m.handoffState.WorkerStates[polecatName]; ok {
|
||||
ws.NudgeCount = 0
|
||||
ws.LastNudge = nil
|
||||
now := time.Now()
|
||||
ws.LastActive = &now
|
||||
m.handoffState.WorkerStates[polecatName] = ws
|
||||
|
||||
// Persist to handoff bead
|
||||
if err := m.saveHandoffState(m.handoffState); err != nil {
|
||||
fmt.Printf("Warning: failed to persist handoff state: %v\n", err)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// updateWorkerActivity updates the last active time for a worker.
|
||||
func (m *Manager) updateWorkerActivity(polecatName, issueID string) {
|
||||
if m.handoffState != nil {
|
||||
if m.handoffState.WorkerStates == nil {
|
||||
m.handoffState.WorkerStates = make(map[string]WorkerState)
|
||||
}
|
||||
ws := m.handoffState.WorkerStates[polecatName]
|
||||
now := time.Now()
|
||||
ws.LastActive = &now
|
||||
if issueID != "" {
|
||||
ws.Issue = issueID
|
||||
}
|
||||
// Reset nudge count if worker is active
|
||||
if ws.NudgeCount > 0 {
|
||||
ws.NudgeCount = 0
|
||||
ws.LastNudge = nil
|
||||
}
|
||||
m.handoffState.WorkerStates[polecatName] = ws
|
||||
}
|
||||
}
|
||||
|
||||
// escalateToMayor sends an escalation message to the Mayor.
|
||||
func (m *Manager) escalateToMayor(polecatName string) error {
|
||||
subject := fmt.Sprintf("ESCALATION: Polecat %s stuck", polecatName)
|
||||
|
||||
@@ -84,3 +84,32 @@ type WitnessStats struct {
|
||||
// TodayNudges is the number of nudges today.
|
||||
TodayNudges int `json:"today_nudges"`
|
||||
}
|
||||
|
||||
// WorkerState tracks the state of a single worker (polecat) across wisp burns.
|
||||
type WorkerState struct {
|
||||
// Issue is the current issue the worker is assigned to.
|
||||
Issue string `json:"issue,omitempty"`
|
||||
|
||||
// NudgeCount is how many times this worker has been nudged.
|
||||
NudgeCount int `json:"nudge_count"`
|
||||
|
||||
// LastNudge is when the worker was last nudged.
|
||||
LastNudge *time.Time `json:"last_nudge,omitempty"`
|
||||
|
||||
// LastActive is when the worker was last seen active.
|
||||
LastActive *time.Time `json:"last_active,omitempty"`
|
||||
}
|
||||
|
||||
// WitnessHandoffState tracks all worker states across wisp burns.
|
||||
// This is persisted in a pinned handoff bead that survives wisp burns.
|
||||
type WitnessHandoffState struct {
|
||||
// WorkerStates maps polecat names to their state.
|
||||
WorkerStates map[string]WorkerState `json:"worker_states"`
|
||||
|
||||
// LastPatrol is when the last patrol cycle completed.
|
||||
LastPatrol *time.Time `json:"last_patrol,omitempty"`
|
||||
}
|
||||
|
||||
// HandoffBeadID is the well-known ID suffix for the witness handoff bead.
|
||||
// The full ID is constructed as "<rig>-witness-state" (e.g., "gastown-witness-state").
|
||||
const HandoffBeadID = "witness-state"
|
||||
|
||||
Reference in New Issue
Block a user