Commit Graph

3 Commits

Author SHA1 Message Date
gastown/crew/gus
e0858096f6 fix(zfc): move stuck detection thresholds to agent-controlled config
Per ZFC principle: 'Let agents decide thresholds. Stuck is a judgment call.'

Changes:
- Add health check threshold fields to RoleConfig (ping_timeout,
  consecutive_failures, kill_cooldown, stuck_threshold)
- Add LoadStuckConfig() to read thresholds from hq-deacon-role bead
- Update patrol_check.go to use configurable stuck threshold
- Defaults remain as fallbacks when no role bead config exists

Agents can now configure their stuck detection by adding fields to their
role bead, e.g.:
  ping_timeout: 45s
  consecutive_failures: 5
  kill_cooldown: 10m
  stuck_threshold: 2h

Fixes: hq-2355b
2026-01-09 22:07:35 -08:00
max
1b69576573 fix: Address golangci-lint errors (errcheck, gosec) (#76)
Apply PR #76 from dannomayernotabot:

- Add golangci exclusions for internal package false positives
- Tighten file permissions (0644 -> 0600) for sensitive files
- Add ReadHeaderTimeout to HTTP server (slowloris prevention)
- Explicit error ignoring with _ = for intentional cases
- Add //nolint comments with justifications
- Spelling: cancelled -> canceled (US locale)

Co-Authored-By: dannomayernotabot <noreply@github.com>

🤖 Generated with Claude Code
2026-01-03 16:11:55 -08:00
gastown/polecats/slit
3dd18a1981 Add Deacon stuck-session detection and force-kill protocol
Implements gt-l6ro3.4: Deacon can now detect and kill genuinely stuck/hung
Claude Code sessions during health rounds.

New commands:
- gt deacon health-check <agent>: Pings agent, waits for response, tracks
  consecutive failures. Returns exit code 2 when force-kill threshold reached.
- gt deacon force-kill <agent>: Kills tmux session, updates agent bead state,
  notifies mayor. Respects cooldown period.
- gt deacon health-state: Shows health check state for all monitored agents.

Detection protocol:
1. Send HEALTH_CHECK nudge to agent
2. Wait for agent bead update (configurable timeout, default 30s)
3. Track consecutive failures (default threshold: 3)
4. Recommend force-kill when threshold exceeded

Force-kill protocol:
1. Log intervention (mail to agent)
2. Kill tmux session
3. Update agent bead state to "killed"
4. Notify mayor (optional)

Configurable parameters:
- --timeout: How long to wait for response (default 30s)
- --failures: Consecutive failures before force-kill (default 3)
- --cooldown: Minimum time between force-kills (default 5m)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 22:14:05 -08:00