Polecats have exactly three operating conditions - there is no idle pool: - Working: session active, doing assigned work - Stalled: session stopped unexpectedly, never nudged back - Zombie: gt done called but cleanup failed Key clarifications: - These are SESSION states; polecat identity persists across sessions - "Stalled" and "zombie" are detected conditions, not stored states - The status:idle label only applies to persistent agents, not polecats Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
313 lines
12 KiB
Markdown
313 lines
12 KiB
Markdown
# Polecat Lifecycle
|
|
|
|
> Understanding the three-layer architecture of polecat workers
|
|
|
|
## Overview
|
|
|
|
Polecats have three distinct lifecycle layers that operate independently. Confusing
|
|
these layers leads to bugs like "idle polecats" and misunderstanding when
|
|
recycling occurs.
|
|
|
|
## The Three Operating States
|
|
|
|
Polecats have exactly three operating states. There is **no idle pool**.
|
|
|
|
| State | Description | How it happens |
|
|
|-------|-------------|----------------|
|
|
| **Working** | Actively doing assigned work | Normal operation |
|
|
| **Stalled** | Session stopped mid-work | Interrupted, crashed, or timed out without being nudged |
|
|
| **Zombie** | Completed work but failed to die | `gt done` failed during cleanup |
|
|
|
|
**The key distinction:** Zombies completed their work; stalled polecats did not.
|
|
|
|
- **Stalled** = supposed to be working, but stopped. The polecat was interrupted or
|
|
crashed and was never nudged back to life. Work is incomplete.
|
|
- **Zombie** = finished work, tried to exit via `gt done`, but cleanup failed. The
|
|
session should have shut down but didn't. Work is complete, just stuck in limbo.
|
|
|
|
There is no "idle" state. Polecats don't wait around between tasks. When work is
|
|
done, `gt done` shuts down the session. If you see a non-working polecat, something
|
|
is broken.
|
|
|
|
## The Self-Cleaning Polecat Model
|
|
|
|
**Polecats are responsible for their own cleanup.** When a polecat completes its
|
|
work unit, it:
|
|
|
|
1. Signals completion via `gt done`
|
|
2. Exits its session immediately (no idle waiting)
|
|
3. Requests its own nuke (self-delete)
|
|
|
|
This removes dependency on the Witness/Deacon for cleanup and ensures polecats
|
|
never sit idle. The simple model: **sandbox dies with session**.
|
|
|
|
### Why Self-Cleaning?
|
|
|
|
- **No idle polecats** - There's no state where a polecat exists without work
|
|
- **Reduced watchdog overhead** - Deacon patrols for stalled/zombie polecats, not idle ones
|
|
- **Faster turnover** - Resources freed immediately on completion
|
|
- **Simpler mental model** - Done means gone
|
|
|
|
### What About Pending Merges?
|
|
|
|
The Refinery owns the merge queue. Once `gt done` submits work:
|
|
- The branch is pushed to origin
|
|
- Work exists in the MQ, not in the polecat
|
|
- If rebase fails, Refinery re-implements on new baseline (fresh polecat)
|
|
- The original polecat is already gone - no sending work "back"
|
|
|
|
## The Three Layers
|
|
|
|
| Layer | Component | Lifecycle | Persistence |
|
|
|-------|-----------|-----------|-------------|
|
|
| **Session** | Claude (tmux pane) | Ephemeral | Cycles per step/handoff |
|
|
| **Sandbox** | Git worktree | Persistent | Until nuke |
|
|
| **Slot** | Name from pool | Persistent | Until nuke |
|
|
|
|
### Session Layer
|
|
|
|
The Claude session is **ephemeral**. It cycles frequently:
|
|
|
|
- After each molecule step (via `gt handoff`)
|
|
- On context compaction
|
|
- On crash/timeout
|
|
- After extended work periods
|
|
|
|
**Key insight:** Session cycling is **normal operation**, not failure. The polecat
|
|
continues working—only the Claude context refreshes.
|
|
|
|
```
|
|
Session 1: Steps 1-2 → handoff
|
|
Session 2: Steps 3-4 → handoff
|
|
Session 3: Step 5 → gt done
|
|
```
|
|
|
|
All three sessions are the **same polecat**. The sandbox and slot persist throughout.
|
|
|
|
### Sandbox Layer
|
|
|
|
The sandbox is the **git worktree**—the polecat's working directory:
|
|
|
|
```
|
|
~/gt/gastown/polecats/Toast/
|
|
```
|
|
|
|
This worktree:
|
|
- Exists from `gt sling` until `gt polecat nuke`
|
|
- Survives all session cycles
|
|
- Contains uncommitted work, staged changes, branch state
|
|
- Is independent of other polecat sandboxes
|
|
|
|
The Witness never destroys sandboxes mid-work. Only `nuke` removes them.
|
|
|
|
### Slot Layer
|
|
|
|
The slot is the **name allocation** from the polecat pool:
|
|
|
|
```bash
|
|
# Pool: [Toast, Shadow, Copper, Ash, Storm...]
|
|
# Toast is allocated to work gt-abc
|
|
```
|
|
|
|
The slot:
|
|
- Determines the sandbox path (`polecats/Toast/`)
|
|
- Maps to a tmux session (`gt-gastown-Toast`)
|
|
- Appears in attribution (`gastown/polecats/Toast`)
|
|
- Is released only on nuke
|
|
|
|
## Correct Lifecycle
|
|
|
|
```
|
|
┌─────────────────────────────────────────────────────────────┐
|
|
│ gt sling │
|
|
│ → Allocate slot from pool (Toast) │
|
|
│ → Create sandbox (worktree on new branch) │
|
|
│ → Start session (Claude in tmux) │
|
|
│ → Hook molecule to polecat │
|
|
└─────────────────────────────────────────────────────────────┘
|
|
│
|
|
▼
|
|
┌─────────────────────────────────────────────────────────────┐
|
|
│ Work Happens │
|
|
│ │
|
|
│ Session cycles happen here: │
|
|
│ - gt handoff between steps │
|
|
│ - Compaction triggers respawn │
|
|
│ - Crash → Witness respawns │
|
|
│ │
|
|
│ Sandbox persists through ALL session cycles │
|
|
└─────────────────────────────────────────────────────────────┘
|
|
│
|
|
▼
|
|
┌─────────────────────────────────────────────────────────────┐
|
|
│ gt done (self-cleaning) │
|
|
│ → Push branch to origin │
|
|
│ → Submit work to merge queue (MR bead) │
|
|
│ → Request self-nuke (sandbox + session cleanup) │
|
|
│ → Exit immediately │
|
|
│ │
|
|
│ Work now lives in MQ, not in polecat. │
|
|
│ Polecat is GONE. No idle state. │
|
|
└─────────────────────────────────────────────────────────────┘
|
|
│
|
|
▼
|
|
┌─────────────────────────────────────────────────────────────┐
|
|
│ Refinery: merge queue │
|
|
│ → Rebase and merge to main │
|
|
│ → Close the issue │
|
|
│ → If conflict: spawn FRESH polecat to re-implement │
|
|
│ (never send work back to original polecat - it's gone) │
|
|
└─────────────────────────────────────────────────────────────┘
|
|
```
|
|
|
|
## What "Recycle" Means
|
|
|
|
**Session cycling**: Normal. Claude restarts, sandbox stays, slot stays.
|
|
|
|
```bash
|
|
gt handoff # Session cycles, polecat continues
|
|
```
|
|
|
|
**Sandbox recreation**: Repair only. Should be rare.
|
|
|
|
```bash
|
|
gt polecat repair Toast # Emergency: recreate corrupted worktree
|
|
```
|
|
|
|
Session cycling happens constantly. Sandbox recreation should almost never happen
|
|
during normal operation.
|
|
|
|
## Anti-Patterns
|
|
|
|
### "Idle" Polecats (They Don't Exist)
|
|
|
|
**Myth:** Polecats wait between tasks in an idle pool.
|
|
|
|
**Reality:** There is no idle state. Polecats don't exist without work:
|
|
1. Work assigned → polecat spawned
|
|
2. Work done → `gt done` → session exits → polecat nuked
|
|
3. There is no step 3 where they wait around
|
|
|
|
If you see a non-working polecat, it's in a **failure state**:
|
|
|
|
| What you see | What it is | What went wrong |
|
|
|--------------|------------|-----------------|
|
|
| Session exists but not working | **Stalled** | Interrupted/crashed, never nudged |
|
|
| Session done but didn't exit | **Zombie** | `gt done` failed during cleanup |
|
|
|
|
Don't call these "idle" - that implies they're waiting for work. They're not.
|
|
A stalled polecat is *supposed* to be working. A zombie is *supposed* to be dead.
|
|
|
|
### Manual State Transitions
|
|
|
|
**Anti-pattern:**
|
|
```bash
|
|
gt polecat done Toast # DON'T: external state manipulation
|
|
gt polecat reset Toast # DON'T: manual lifecycle control
|
|
```
|
|
|
|
**Correct:**
|
|
```bash
|
|
# Polecat signals its own completion:
|
|
gt done # (from inside the polecat session)
|
|
|
|
# Only Witness nukes polecats:
|
|
gt polecat nuke Toast # (from Witness, after verification)
|
|
```
|
|
|
|
Polecats manage their own session lifecycle. The Witness manages sandbox lifecycle.
|
|
External manipulation bypasses verification.
|
|
|
|
### Sandboxes Without Work (Stalled Polecats)
|
|
|
|
**Anti-pattern:** A sandbox exists but no molecule is hooked, or the session isn't running.
|
|
|
|
This is a **stalled** polecat. It means:
|
|
- The session crashed and wasn't nudged back to life
|
|
- The hook was lost during a crash
|
|
- State corruption occurred
|
|
|
|
This is NOT an "idle" polecat waiting for work. It's stalled - supposed to be
|
|
working but stopped unexpectedly.
|
|
|
|
**Recovery:**
|
|
```bash
|
|
# From Witness:
|
|
gt polecat nuke Toast # Clean up the stalled polecat
|
|
gt sling gt-abc gastown # Respawn with fresh polecat
|
|
```
|
|
|
|
### Confusing Session with Sandbox
|
|
|
|
**Anti-pattern:** Thinking session restart = losing work.
|
|
|
|
```bash
|
|
# Session ends (handoff, crash, compaction)
|
|
# Work is NOT lost because:
|
|
# - Git commits persist in sandbox
|
|
# - Staged changes persist in sandbox
|
|
# - Molecule state persists in beads
|
|
# - Hook persists across sessions
|
|
```
|
|
|
|
The new session picks up where the old one left off via `gt prime`.
|
|
|
|
## Session Lifecycle Details
|
|
|
|
Sessions cycle for these reasons:
|
|
|
|
| Trigger | Action | Result |
|
|
|---------|--------|--------|
|
|
| `gt handoff` | Voluntary | Clean cycle to fresh context |
|
|
| Context compaction | Automatic | Forced by Claude Code |
|
|
| Crash/timeout | Failure | Witness respawns |
|
|
| `gt done` | Completion | Session exits, Witness takes over |
|
|
|
|
All except `gt done` result in continued work. Only `gt done` signals completion.
|
|
|
|
## Witness Responsibilities
|
|
|
|
The Witness monitors polecats but does NOT:
|
|
- Force session cycles (polecats self-manage via handoff)
|
|
- Interrupt mid-step (unless truly stuck)
|
|
- Nuke polecats (polecats self-nuke via `gt done`)
|
|
|
|
The Witness DOES:
|
|
- Detect and nudge stalled polecats (sessions that stopped unexpectedly)
|
|
- Clean up zombie polecats (sessions where `gt done` failed)
|
|
- Respawn crashed sessions
|
|
- Handle escalations from stuck polecats (polecats that explicitly asked for help)
|
|
|
|
## Polecat Identity
|
|
|
|
**Key insight:** Polecat *identity* is long-lived; only sessions and sandboxes are ephemeral.
|
|
|
|
In the HOP model, every entity has a chain (CV) that tracks:
|
|
- What work they've done
|
|
- Success/failure rates
|
|
- Skills demonstrated
|
|
- Quality metrics
|
|
|
|
The polecat *name* (Toast, Shadow, etc.) is a slot from a pool - truly ephemeral.
|
|
But the *agent identity* that executes as that polecat accumulates a work history.
|
|
|
|
```
|
|
POLECAT IDENTITY (persistent) SESSION (ephemeral) SANDBOX (ephemeral)
|
|
├── CV chain ├── Claude instance ├── Git worktree
|
|
├── Work history ├── Context window ├── Branch
|
|
├── Skills demonstrated └── Dies on handoff └── Dies on gt done
|
|
└── Credit for work or gt done
|
|
```
|
|
|
|
This distinction matters for:
|
|
- **Attribution** - Who gets credit for the work?
|
|
- **Skill routing** - Which agent is best for this task?
|
|
- **Cost accounting** - Who pays for inference?
|
|
- **Federation** - Agents having their own chains in a distributed world
|
|
|
|
## Related Documentation
|
|
|
|
- [Overview](../overview.md) - Role taxonomy and architecture
|
|
- [Molecules](molecules.md) - Molecule execution and polecat workflow
|
|
- [Propulsion Principle](propulsion-principle.md) - Why work triggers immediate execution
|