docs: clarify polecat three-state model (working/stalled/zombie)

Polecats have exactly three operating conditions - there is no idle pool:

- Working: session active, doing assigned work
- Stalled: session stopped unexpectedly, never nudged back
- Zombie: gt done called but cleanup failed

Key clarifications:

- These are SESSION states; polecat identity persists across sessions
- "Stalled" and "zombie" are detected conditions, not stored states
- The status:idle label only applies to persistent agents, not polecats

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

committed by Steve Yegge
parent 3247b57926
commit 98b11eda3c

@@ -8,6 +8,27 @@ Polecats have three distinct lifecycle layers that operate independently. Confus
 these layers leads to bugs like "idle polecats" and misunderstanding when
 recycling occurs.
 
+## The Three Operating States
+
+Polecats have exactly three operating states. There is **no idle pool**.
+
+| State | Description | How it happens |
+|-------|-------------|----------------|
+| **Working** | Actively doing assigned work | Normal operation |
+| **Stalled** | Session stopped mid-work | Interrupted, crashed, or timed out without being nudged |
+| **Zombie** | Completed work but failed to die | `gt done` failed during cleanup |
+
+**The key distinction:** Zombies completed their work; stalled polecats did not.
+
+- **Stalled** = supposed to be working, but stopped. The polecat was interrupted or
+crashed and was never nudged back to life. Work is incomplete.
+- **Zombie** = finished work, tried to exit via `gt done`, but cleanup failed. The
+session should have shut down but didn't. Work is complete, just stuck in limbo.
+
+There is no "idle" state. Polecats don't wait around between tasks. When work is
+done, `gt done` shuts down the session. If you see a non-working polecat, something
+is broken.
+
 ## The Self-Cleaning Polecat Model
 
 **Polecats are responsible for their own cleanup.** When a polecat completes its
@@ -23,7 +44,7 @@ never sit idle. The simple model: **sandbox dies with session**.
 ### Why Self-Cleaning?
 
 - **No idle polecats** - There's no state where a polecat exists without work
-- **Reduced watchdog overhead** - Deacon doesn't need to patrol for zombies
+- **Reduced watchdog overhead** - Deacon patrols for stalled/zombie polecats, not idle ones
 - **Faster turnover** - Resources freed immediately on completion
 - **Simpler mental model** - Done means gone
 
@@ -158,19 +179,24 @@ during normal operation.
 
 ## Anti-Patterns
 
-### Idle Polecats
+### "Idle" Polecats (They Don't Exist)
 
-**Myth:** Polecats wait between tasks in an idle state.
+**Myth:** Polecats wait between tasks in an idle pool.
 
-**Reality:** Polecats don't exist without work. The lifecycle is:
+**Reality:** There is no idle state. Polecats don't exist without work:
 1. Work assigned → polecat spawned
-2. Work done → polecat nuked
-3. There is no idle state
+2. Work done → `gt done` → session exits → polecat nuked
+3. There is no step 3 where they wait around
 
-If you see a polecat without work, something is broken. Either:
-- The hook was lost (bug)
-- The session crashed before loading context
-- Manual intervention corrupted state
+If you see a non-working polecat, it's in a **failure state**:
+
+| What you see | What it is | What went wrong |
+|--------------|------------|-----------------|
+| Session exists but not working | **Stalled** | Interrupted/crashed, never nudged |
+| Session done but didn't exit | **Zombie** | `gt done` failed during cleanup |
+
+Don't call these "idle" - that implies they're waiting for work. They're not.
+A stalled polecat is *supposed* to be working. A zombie is *supposed* to be dead.
 
 ### Manual State Transitions
 
@@ -192,20 +218,23 @@ gt polecat nuke Toast # (from Witness, after verification)
 Polecats manage their own session lifecycle. The Witness manages sandbox lifecycle.
 External manipulation bypasses verification.
 
-### Sandboxes Without Work
+### Sandboxes Without Work (Stalled Polecats)
 
-**Anti-pattern:** A sandbox exists but no molecule is hooked.
+**Anti-pattern:** A sandbox exists but no molecule is hooked, or the session isn't running.
 
-This means:
-- The polecat was spawned incorrectly
-- The hook was lost during crash
+This is a **stalled** polecat. It means:
+- The session crashed and wasn't nudged back to life
+- The hook was lost during a crash
 - State corruption occurred
 
+This is NOT an "idle" polecat waiting for work. It's stalled - supposed to be
+working but stopped unexpectedly.
+
 **Recovery:**
 ```bash
 # From Witness:
-gt polecat nuke Toast # Clean slate
-gt sling gt-abc gastown # Respawn with work
+gt polecat nuke Toast # Clean up the stalled polecat
+gt sling gt-abc gastown # Respawn with fresh polecat
 ```
 
 ### Confusing Session with Sandbox
@@ -244,10 +273,10 @@ The Witness monitors polecats but does NOT:
 - Nuke polecats (polecats self-nuke via `gt done`)
 
 The Witness DOES:
+- Detect and nudge stalled polecats (sessions that stopped unexpectedly)
+- Clean up zombie polecats (sessions where `gt done` failed)
 - Respawn crashed sessions
-- Nudge stuck polecats
-- Handle escalations
-- Clean up orphaned polecats (crash before `gt done`)
+- Handle escalations from stuck polecats (polecats that explicitly asked for help)
 
 ## Polecat Identity
 

@@ -67,7 +67,12 @@ Events capture the full history. Labels cache the current state for fast queries
 Labels use `<dimension>:<value>` format:
 - `patrol:muted` / `patrol:active`
 - `mode:degraded` / `mode:normal`
-- `status:idle` / `status:working`
+- `status:idle` / `status:working` (for persistent agents only - see note)
+
+**Note on polecats:** The `status:idle` label does NOT apply to polecats. Polecats
+have no idle state - they're either working, stalled (stopped unexpectedly), or
+zombie (`gt done` failed). This label is for persistent agents like Deacon, Witness,
+and Crew members who can legitimately be idle between tasks.
 
 ### State Change Flow
 

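The label rule in the hunk above (labels are `<dimension>:<value>`, and `status:idle` does not apply to polecats) could be enforced with a small check. The following is a minimal sketch, not code from this commit: `AgentKind`, `ParseLabel`, and `ValidateLabel` are hypothetical names introduced for illustration, and only the `status:idle` case is rejected because that is the only restriction the note states explicitly.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// AgentKind is a hypothetical classification used only in this sketch.
type AgentKind string

const (
	KindPolecat    AgentKind = "polecat"
	KindPersistent AgentKind = "persistent" // e.g. Deacon, Witness, Crew members
)

// ParseLabel splits a label of the form "<dimension>:<value>".
func ParseLabel(label string) (string, string, error) {
	dimension, value, ok := strings.Cut(label, ":")
	if !ok || dimension == "" || value == "" {
		return "", "", fmt.Errorf("label %q is not in <dimension>:<value> format", label)
	}
	return dimension, value, nil
}

// ValidateLabel rejects malformed labels and status:idle on a polecat.
// Every other well-formed label is accepted in this sketch.
func ValidateLabel(kind AgentKind, label string) error {
	dimension, value, err := ParseLabel(label)
	if err != nil {
		return err
	}
	if kind == KindPolecat && dimension == "status" && value == "idle" {
		return errors.New("status:idle does not apply to polecats: they have no idle state")
	}
	return nil
}

func main() {
	fmt.Println(ValidateLabel(KindPolecat, "status:idle"))
	fmt.Println(ValidateLabel(KindPersistent, "status:idle"))
}
```

Rejecting the label at write time, rather than filtering it at query time, keeps the cached label set consistent with the three-state model; whether the real system validates labels at all is not stated in the source.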
@@ -3,20 +3,41 @@ package polecat
 
 import "time"
 
-// State represents the current state of a polecat.
-// In the transient model, polecats exist only while working.
+// State represents the current session state of a polecat.
+//
+// IMPORTANT: There is NO idle state. Polecats have three operating conditions:
+//
+// - Working: Session active, doing assigned work (normal operation)
+// - Stalled: Session stopped unexpectedly, was never nudged back to life
+// - Zombie: Session called 'gt done' but cleanup failed - tried to die but couldn't
+//
+// The distinction matters: zombies completed their work; stalled polecats did not.
+// Neither is "idle" - stalled polecats are SUPPOSED to be working, zombies are
+// SUPPOSED to be dead. There is no idle pool where polecats wait for work.
+//
+// Note: These are SESSION states. The polecat IDENTITY (CV chain, mailbox, work
+// history) persists across sessions. A stalled or zombie session doesn't destroy
+// the polecat's identity - it just means the session needs intervention.
+//
+// "Stalled" and "zombie" are detected conditions, not stored states. The Witness
+// detects them through monitoring (tmux state, age in StateDone, etc.).
 type State string
 
 const (
-	// StateWorking means the polecat is actively working on an issue.
+	// StateWorking means the polecat session is actively working on an issue.
 	// This is the initial and primary state for transient polecats.
+	// Working is the ONLY healthy operating state - there is no idle pool.
 	StateWorking State = "working"
 
-	// StateDone means the polecat has completed its assigned work
-	// and is ready for cleanup by the Witness.
+	// StateDone means the polecat has completed its assigned work and called
+	// 'gt done'. This is normally a transient state - the session should exit
+	// immediately after. If a polecat remains in StateDone, it's a "zombie":
+	// the cleanup failed and the session is stuck.
 	StateDone State = "done"
 
-	// StateStuck means the polecat needs assistance.
+	// StateStuck means the polecat has explicitly signaled it needs assistance.
+	// This is an intentional request for help from the polecat itself.
+	// Different from "stalled" (detected externally when session stops working).
 	StateStuck State = "stuck"
 
 	// StateActive is deprecated: use StateWorking.

@@ -55,7 +55,12 @@ You:
 - Nuke your own sandbox and session
 - Exit immediately
 
-There is no idle state. Done means gone.
+**There is no idle state.** Polecats have exactly three operating states:
+- **Working** - actively doing assigned work (normal)
+- **Stalled** - session stopped mid-work (failure: should be working)
+- **Zombie** - `gt done` failed during cleanup (failure: should be dead)
+
+Done means gone. If `gt done` succeeds, you cease to exist.
 
 **Important:** Your molecule already has step beads. Use `bd ready` to find them.
 Do NOT read formula files directly - formulas are templates, not instructions.
@@ -167,9 +172,10 @@ The `gt done` command (self-cleaning):
 - Nukes your sandbox (worktree cleanup)
 - Exits your session immediately
 
-**You are gone after `gt done`.** No idle waiting. The Refinery will merge
-your work from the MQ. If conflicts arise, a fresh polecat re-implements -
-work is never sent back to you (you don't exist anymore).
+**You are gone after `gt done`.** The session shuts down - there's no idle state
+where you wait for more work. The Refinery will merge your work from the MQ.
+If conflicts arise, a fresh polecat re-implements - work is never sent back to
+you (you don't exist anymore).
 
 ### No PRs in Maintainer Repos
 
@@ -236,8 +242,10 @@ If you forget to handoff:
 - Work continues from hook (molecule state preserved)
 - No work is lost
 
-**The Witness role**: Witness monitors for stuck polecats (long idle on same step)
-but does NOT force recycle between steps. You manage your own session lifecycle.
+**The Witness role**: Witness monitors for stalled polecats (sessions that stopped
+unexpectedly) but does NOT force recycle between steps. You manage your own session
+lifecycle. Note: "stalled" means you stopped when you should be working - it's not
+an idle state.
 
 ---
 