Commit Graph

20 Commits

Author SHA1 Message Date
Johann Dirry
3d5a66f850 Fixing unit tests on windows (#813)
* Add Windows stub for orphan cleanup

* Fix account switch tests on Windows

* Make query session events test portable

* Disable beads daemon in query session events test

* Add Windows bd stubs for sling tests

* Make expandOutputPath test OS-agnostic

* Make role_agents test Windows-friendly

* Make config path tests OS-agnostic

* Make HealthCheckStateFile test OS-agnostic

* Skip orphan process check on Windows

* Normalize sparse checkout detail paths

* Make dog path tests OS-agnostic

* Fix bare repo refspec config on Windows

* Add Windows process detection for locks

* Add Windows CI workflow

* Make mail path tests OS-agnostic

* Skip plugin file mode test on Windows

* Skip tmux-dependent polecat tests on Windows

* Normalize polecat paths and AGENTS.md content

* Make beads init failure test Windows-friendly

* Skip rig agent bead init test on Windows

* Make XDG path tests OS-agnostic

* Make exec tests portable on Windows

* Adjust atomic write tests for Windows

* Make wisp tests Windows-friendly

* Make workspace find tests OS-agnostic

* Fix Windows rig add integration test

* Make sling var logging Windows-friendly

* Fix sling attached molecule update ordering

---------

Co-authored-by: Johann Dirry <johann.dirry@microsea.at>
2026-01-20 14:17:35 -08:00
dustin
fd61259336 feat: add initial prompt for autonomous patrol startup (deacon & witness) (#769)
Add initial prompt to deacon and witness startup commands to trigger
autonomous patrol. Without this, agents sit idle at the prompt after
SessionStart hooks run.

Implements GUPP (Gas Town Universal Propulsion Principle): agents start
patrol immediately when Claude launches.
2026-01-20 14:10:30 -08:00
aleiby
15d1dc8fa8 fix: Make WaitForCommand/WaitForRuntimeReady fatal in manager Start() (#529)
Fixes #525: gt up reports deacon success but session doesn't actually start

Previously, WaitForCommand failures were marked as "non-fatal" in the
manager Start() methods used by gt up. This caused gt up to report
success even when Claude failed to start, because the error was silently
ignored.

Now when WaitForCommand or WaitForRuntimeReady times out:
1. The zombie tmux session is killed
2. An error is returned to the caller
3. gt up properly reports the failure

This aligns the manager Start() behavior with the cmd start functions
(e.g., gt deacon start) which already had fatal WaitForCommand behavior.

Changed files:
- internal/deacon/manager.go
- internal/mayor/manager.go
- internal/witness/manager.go
- internal/refinery/manager.go

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 00:00:53 -08:00
Julian Knutsen
e7ca4908dc refactor(config): remove BEADS_DIR from agent environment and add doctor check (#455)
* fix(sling_test): update test for cook dir change

The cook command no longer needs database context and runs from cwd,
not the target rig directory. Update test to match this behavior
change from bd2a5ab5.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(tests): skip tests requiring missing binaries, handle --allow-stale

- Add skipIfAgentBinaryMissing helper to skip tests when codex/gemini
  binaries aren't available in the test environment
- Update rig manager test stub to handle --allow-stale flag

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor(config): remove BEADS_DIR from agent environment

Stop exporting BEADS_DIR in AgentEnv - agents should use beads redirect
mechanism instead of relying on environment variable. This prevents
prefix mismatches when agents operate across different beads databases.

Changes:
- Remove BeadsDir field from AgentEnvConfig
- Remove BEADS_DIR from env vars set on agent sessions
- Update doctor env_check to not expect BEADS_DIR
- Update all manager Start() calls to not pass BeadsDir

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(doctor): detect BEADS_DIR in tmux session environment

Add a doctor check that warns when BEADS_DIR is set in any Gas Town
tmux session. BEADS_DIR in the environment overrides prefix-based
routing and breaks multi-rig lookups - agents should use the beads
redirect mechanism instead.

The check:
- Iterates over all Gas Town tmux sessions (gt-* and hq-*)
- Checks if BEADS_DIR is set in the session environment
- Returns a warning with fix hint to restart sessions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: julianknutsen <julianknutsen@users.noreply.github>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-13 22:13:57 -08:00
gastown/crew/gus
e0858096f6 fix(zfc): move stuck detection thresholds to agent-controlled config
Per ZFC principle: 'Let agents decide thresholds. Stuck is a judgment call.'

Changes:
- Add health check threshold fields to RoleConfig (ping_timeout,
  consecutive_failures, kill_cooldown, stuck_threshold)
- Add LoadStuckConfig() to read thresholds from hq-deacon-role bead
- Update patrol_check.go to use configurable stuck threshold
- Defaults remain as fallbacks when no role bead config exists

Agents can now configure their stuck detection by adding fields to their
role bead, e.g.:
  ping_timeout: 45s
  consecutive_failures: 5
  kill_cooldown: 10m
  stuck_threshold: 2h

Fixes: hq-2355b
2026-01-09 22:07:35 -08:00
julianknutsen
e999ceb1c1 refactor: consolidate agent env vars into config.AgentEnv
Create centralized AgentEnv function as single source of truth for all
agent environment variables. All agents now consistently receive:
- GT_ROLE, BD_ACTOR, GIT_AUTHOR_NAME (role identity)
- GT_ROOT, BEADS_DIR (workspace paths)
- GT_RIG, GT_POLECAT/GT_CREW (rig-specific identity)
- BEADS_AGENT_NAME, BEADS_NO_DAEMON (beads config)
- CLAUDE_CONFIG_DIR (optional account selection)

Remove RoleEnvVars in favor of AgentEnvSimple wrapper.
Remove IncludeBeadsEnv flag - beads env vars always included.
Update all manager and cmd call sites to use AgentEnv.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 21:52:30 -08:00
jack
afff85cdff fix(tmux): use NewSessionWithCommand to avoid send-keys race condition
Agent sessions would fail on startup because send-keys arrived before the
shell was ready, causing 'bad pattern' and 'command not found' errors.

Fix: Create sessions with the command directly using tmux new-session's
command argument. This runs the agent as the pane's initial process,
avoiding shell readiness timing issues entirely.

Updated all agent managers: mayor, deacon, witness, refinery, polecat, crew.

Also fixes pre-existing build error in polecat/manager.go (polecatPath →
clonePath/newClonePath).

Closes #280

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 23:35:31 -08:00
Steve Yegge
c4d956ebe7 feat(deacon): implement bulletproof pause mechanism (gt-bpo2c) (#265)
Add multi-layer pause mechanism to prevent Deacon from causing damage:

Layer 1: File-based pause state
- Location: ~/.runtime/deacon/paused.json
- Stores: paused flag, reason, timestamp, paused_by

Layer 2: Commands
- `gt deacon pause [--reason="..."]` - pause with optional reason
- `gt deacon resume` - remove pause file
- `gt deacon status` - shows pause state prominently

Layer 3: Guards
- `gt prime` for deacon: shows PAUSED message, skips patrol context
- `gt deacon heartbeat`: fails when paused

Helper package:
- internal/deacon/pause.go with IsPaused/Pause/Resume functions

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 21:56:46 -08:00
julianknutsen
c91ab85457 fix: restore agent override support lost in manager refactors
The manager refactors (ea8bef2, 72544cc0) conflicted with the agent
override feature, causing regressions:

Deacon (ea8bef2):
- Lost agentOverride parameter
- Re-added respawn loop (removed in 5f2e16f)
- Lost GUPP (startup + propulsion nudges)

Crew (72544cc0):
- Lost agentOverride wiring to StartOptions
- --agent flag had no effect on crew refresh/restart

This fix restores agent override support and GUPP while keeping
improvements from the manager refactors (zombie detection, etc).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 21:08:51 -08:00
Julian Knutsen
28a9de64d5 fix(tmux): wait for shell ready before sending keys (#264)
Add WaitForShellReady call before SendKeys in all agent managers
(deacon, mayor, witness, refinery). This prevents intermittent
"can't find pane" errors that occur when the tmux session is
created but the shell isn't ready to receive input yet.

The issue manifests under load (e.g., during `gt up` when multiple
agents start in sequence) where the 200ms delay in SendKeysDelayed
isn't sufficient for the pane to be fully initialized.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 20:49:25 -08:00
Steve Yegge
a6157829d7 Merge pull request #239 from julianknutsen/fix/claude-settings-inheritance
fix: Isolate Claude settings outside source repos, unify agent startup
2026-01-07 00:10:05 -08:00
max
bf16f7894b fix(deacon): use session package for hq- session names (gt-r38pj)
stale_hooks.go was using hardcoded 'gt-deacon' and 'gt-mayor' instead of
session.DeaconSessionName() and session.MayorSessionName() which return
'hq-deacon' and 'hq-mayor'. This caused incorrect session lookups.

Also fixes duplicate WorktreeAddFromRef method from merge conflict.
2026-01-06 23:42:03 -08:00
julianknutsen
ea8bef2029 refactor: unify agent startup with Manager pattern
- Create mayor.Manager for mayor lifecycle (Start/Stop/IsRunning/Status)
- Create deacon.Manager for deacon lifecycle with respawn loop
- Move session.Manager to polecat.SessionManager (clearer naming)
- Add zombie session detection for mayor/deacon (kills tmux if Claude dead)
- Remove duplicate session startup code from up.go, start.go, mayor.go
- Rename sessMgr -> polecatMgr for consistency
- Make witness/refinery SessionName() public for status display

All agent types now follow the same Manager pattern:
  mgr := agent.NewManager(...)
  mgr.Start(...)
  mgr.Stop()
  mgr.IsRunning()
  mgr.Status()

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 22:32:35 -08:00
gastown/crew/gus
74409dc32b feat(deacon): add stale hooked bead cleanup (gt-2yls3)
Add `gt deacon stale-hooks` command to find and unhook stale beads.

Problem: Beads can get stuck in 'hooked' status when agents die or
abandon work without properly unhooking.

Solution:
- New command scans for hooked beads older than threshold (default 1h)
- Checks if assignee agent is still alive (tmux session exists)
- Unhooks beads with dead agents (sets status back to 'open')
- Supports --dry-run to preview without making changes

Also adds "stale-hook-check" step to Deacon patrol formula.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 13:20:45 -08:00
max
1b69576573 fix: Address golangci-lint errors (errcheck, gosec) (#76)
Apply PR #76 from dannomayernotabot:

- Add golangci exclusions for internal package false positives
- Tighten file permissions (0644 -> 0600) for sensitive files
- Add ReadHeaderTimeout to HTTP server (slowloris prevention)
- Explicit error ignoring with _ = for intentional cases
- Add //nolint comments with justifications
- Spelling: cancelled -> canceled (US locale)

Co-Authored-By: dannomayernotabot <noreply@github.com>

🤖 Generated with Claude Code
2026-01-03 16:11:55 -08:00
gastown/polecats/slit
3dd18a1981 Add Deacon stuck-session detection and force-kill protocol
Implements gt-l6ro3.4: Deacon can now detect and kill genuinely stuck/hung
Claude Code sessions during health rounds.

New commands:
- gt deacon health-check <agent>: Pings agent, waits for response, tracks
  consecutive failures. Returns exit code 2 when force-kill threshold reached.
- gt deacon force-kill <agent>: Kills tmux session, updates agent bead state,
  notifies mayor. Respects cooldown period.
- gt deacon health-state: Shows health check state for all monitored agents.

Detection protocol:
1. Send HEALTH_CHECK nudge to agent
2. Wait for agent bead update (configurable timeout, default 30s)
3. Track consecutive failures (default threshold: 3)
4. Recommend force-kill when threshold exceeded

Force-kill protocol:
1. Log intervention (mail to agent)
2. Kill tmux session
3. Update agent bead state to "killed"
4. Notify mayor (optional)

Configurable parameters:
- --timeout: How long to wait for response (default 30s)
- --failures: Consecutive failures before force-kill (default 3)
- --cooldown: Minimum time between force-kills (default 5m)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 22:14:05 -08:00
Steve Yegge
d3164f72a6 refactor: Move pending spawn tracking to gt spawn pending (hq-466n)
- Moved pending.go from deacon/ to polecat/ package
- Changed storage location from deacon/pending.json to spawn/pending.json
- Added 'gt spawn pending' subcommand for listing/clearing pending spawns
- Deprecated 'gt deacon pending' (still works, shows deprecation notice)

This decouples pending spawn observation from the Deacon role - anyone
can now observe pending polecats (Mayor, humans, debugging, etc.).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-25 13:50:26 -08:00
Steve Yegge
2e8dfcbe75 feat: Add gt deacon trigger-pending command (gt-t5uk)
Implements auto-triggering of polecats after spawn:
- New pending.go in deacon package tracks pending spawns
- CheckInboxForSpawns reads POLECAT_STARTED messages
- TriggerPendingSpawns polls WaitForClaudeReady and nudges when ready
- PruneStalePending removes spawns older than 5 minutes

The Deacon can now call `gt deacon trigger-pending` during patrol to
automatically send "Begin." to newly spawned polecats once Claude is ready.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-25 12:19:28 -08:00
Steve Yegge
1554380228 feat(deacon): improve timing and add heartbeat command
Timing changes for more relaxed poke intervals:
- Daemon heartbeat: 60s → 5 minutes
- Backoff base: 60s → 5 minutes
- Backoff max: 10m → 30 minutes
- Fresh threshold: <2min → <5min
- Stale threshold: 2-5min → 5-15min
- Very stale threshold: >5min → >15min

New command:
- `gt deacon heartbeat [action]` - Touch heartbeat file easily

Template rewrite:
- Clearer wake/sleep model
- Documents wake sources (daemon poke, mail, timer callbacks)
- Simpler rounds with `gt deacon heartbeat` instead of bash echo
- Mentions plugins as optional maintenance tasks
- Explains timer callbacks pattern

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 02:12:21 -08:00
Steve Yegge
33455e6d9c feat(deacon): add heartbeat mechanism for daemon coordination (gt-5af.6)
Add deacon package with heartbeat read/write helpers:
- Heartbeat struct with Timestamp, Cycle, LastAction, health counts
- WriteHeartbeat/ReadHeartbeat for persistence to deacon/heartbeat.json
- IsFresh/IsStale/IsVeryStale for age checks
- ShouldPoke to determine if daemon should wake the Deacon
- Touch/TouchWithAction convenience functions

The Deacon writes heartbeat on each wake cycle. The Go daemon reads it
to decide whether to poke the Deacon (only if very stale >5 minutes).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 17:26:53 -08:00