docs: add plugin and escalation system designs

Plugin System (gt-n08ix):
- Deacon-dispatched periodic automation
- Dog execution model (non-blocking)
- Wisps for state tracking (no state.json)
- Gate types: cooldown, cron, condition, event
- First plugin: rebuild-gt for stale binary detection

Escalation System (gt-i9r20):
- Unified gt escalate command with severity routing
- Config-driven: settings/escalation.json
- Escalation beads for tracking
- Stale escalation re-escalation
- Actions: bead, mail, email, sms

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Committed by Steve Yegge
Parent: a106796a0e
Commit: f5832188a6
576
docs/design/escalation-system.md
Normal file
@@ -0,0 +1,576 @@
# Escalation System Design

> Detailed design for the Gas Town unified escalation system.
> Written 2026-01-11, crew/george session.
> Parent epic: gt-i9r20

## Problem Statement

Current escalation is ad-hoc "mail Mayor". Issues:
- Mayor gets backlogged easily (especially during swarms)
- No severity differentiation
- No alternative channels (email, SMS, Slack)
- No tracking of stale/unacknowledged escalations
- No visibility into escalation history

## Design Goals

1. **Unified API**: Single `gt escalate` command for all escalation needs
2. **Severity-based routing**: Different severities go to different channels
3. **Config-driven**: Town config controls routing, no code changes needed
4. **Audit trail**: All escalations tracked as beads
5. **Stale detection**: Unacknowledged escalations re-escalate automatically
6. **Extensible**: Easy to add new notification channels

---

## Architecture

### Components

```
┌─────────────────────────────────────────────────────────────┐
│                    gt escalate command                      │
│    --severity --subject --body --source                     │
└─────────────────────┬───────────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────────────────┐
│                   Escalation Manager                        │
│  1. Read config (settings/escalation.json)                  │
│  2. Create escalation bead                                  │
│  3. Execute route actions for severity                      │
└─────────────────────┬───────────────────────────────────────┘
                      │
          ┌───────────┼───────────┬───────────┐
          ▼           ▼           ▼           ▼
      ┌───────┐  ┌─────────┐  ┌───────┐   ┌───────┐
      │ Bead  │  │  Mail   │  │ Email │   │  SMS  │
      │Create │  │ Action  │  │Action │   │Action │
      └───────┘  └─────────┘  └───────┘   └───────┘
```

### Data Flow

1. Agent calls `gt escalate --severity=high --subject="..." --body="..."`
2. Command loads escalation config from `settings/escalation.json`
3. Creates escalation bead with severity, subject, body, source labels
4. Looks up route for severity level
5. Executes each action in the route (bead already created, then mail, email, etc.)
6. Returns escalation bead ID
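
The route lookup and action loop in steps 4 and 5 could be sketched in Go roughly as follows. This is a minimal sketch under stated assumptions, not the actual Escalation Manager: `runRoute`, `actionFunc`, and the in-memory registry are hypothetical names, and continuing past a failed action (instead of stopping at the first error) is a choice the design does not specify.

```go
package main

import (
    "errors"
    "fmt"
)

// actionFunc is a hypothetical stand-in for one route action (bead, mail, ...).
type actionFunc func(subject string) error

// runRoute looks up the action list for a severity and runs each action in
// order. It returns the names of the actions that succeeded plus a joined
// error for the ones that failed (assumption: later actions still run).
func runRoute(routes map[string][]string, registry map[string]actionFunc, severity, subject string) ([]string, error) {
    names, ok := routes[severity]
    if !ok {
        return nil, fmt.Errorf("no route configured for severity %q", severity)
    }
    var done []string
    var errs []error
    for _, name := range names {
        act, known := registry[name]
        if !known {
            errs = append(errs, fmt.Errorf("unknown action %q", name))
            continue
        }
        if err := act(subject); err != nil {
            errs = append(errs, fmt.Errorf("action %q: %w", name, err))
            continue
        }
        done = append(done, name)
    }
    // errors.Join returns nil when no action failed.
    return done, errors.Join(errs...)
}

func main() {
    routes := map[string][]string{"high": {"bead", "mail:mayor", "email:human"}}
    registry := map[string]actionFunc{
        "bead":        func(string) error { return nil },
        "mail:mayor":  func(string) error { return nil },
        "email:human": func(string) error { return errors.New("not configured") },
    }
    done, err := runRoute(routes, registry, "high", "Plugin FAILED: rebuild-gt")
    fmt.Println("succeeded:", done)
    fmt.Println("error:", err)
}
```

A non-nil joined error here would correspond to exit code 2 ("Action failed") in the command's exit-code table.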

### Stale Escalation Flow

1. Deacon patrol (or plugin) runs `gt escalate stale`
2. Queries for escalation beads older than threshold without `acknowledged:true`
3. For each stale escalation:
   - Bump severity (low→medium, medium→high, high→critical)
   - Re-execute route for new severity
   - Add `reescalated:true` label and timestamp
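
The severity bump in step 3 can be illustrated with a small Go sketch. The helper name `nextSeverity` and its signature are hypothetical; the level ordering and the cap on re-escalations (`max_reescalations`, default 2) come from this document.

```go
package main

import "fmt"

// severityOrder lists the levels from the design, lowest first.
var severityOrder = []string{"low", "medium", "high", "critical"}

// nextSeverity returns the bumped level for a stale escalation. ok is false
// when the escalation has already been re-escalated maxReescalations times,
// is already critical, or carries an unknown level.
func nextSeverity(current string, count, maxReescalations int) (next string, ok bool) {
    if count >= maxReescalations {
        return current, false
    }
    for i, level := range severityOrder {
        if level == current && i+1 < len(severityOrder) {
            return severityOrder[i+1], true
        }
    }
    return current, false
}

func main() {
    // Walk one escalation upward with the default limit of 2.
    level, count := "low", 0
    for {
        next, ok := nextSeverity(level, count, 2)
        if !ok {
            fmt.Printf("%s: no further re-escalation (count %d)\n", level, count)
            break
        }
        fmt.Printf("%s -> %s\n", level, next)
        level, count = next, count+1
    }
}
```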

---

## Configuration

### File Location

`~/gt/settings/escalation.json`

This follows the existing pattern where `~/gt/settings/` contains town-level behavioral config.

### Schema

```go
// EscalationConfig represents escalation routing configuration.
type EscalationConfig struct {
    Type    string `json:"type"`    // "escalation"
    Version int    `json:"version"` // schema version

    // Routes maps severity levels to action lists.
    // Actions are executed in order.
    Routes map[string][]string `json:"routes"`

    // Contacts contains contact information for actions.
    Contacts EscalationContacts `json:"contacts"`

    // StaleThreshold is how long before an unacknowledged escalation
    // is considered stale and gets re-escalated. Default: "4h"
    StaleThreshold string `json:"stale_threshold,omitempty"`

    // MaxReescalations limits how many times an escalation can be
    // re-escalated. Default: 2 (low→medium→high, then stops)
    MaxReescalations int `json:"max_reescalations,omitempty"`
}

// EscalationContacts contains contact information.
type EscalationContacts struct {
    HumanEmail   string `json:"human_email,omitempty"`
    HumanSMS     string `json:"human_sms,omitempty"`
    SlackWebhook string `json:"slack_webhook,omitempty"`
}

const CurrentEscalationVersion = 1
```

### Default Configuration

```json
{
  "type": "escalation",
  "version": 1,
  "routes": {
    "low": ["bead"],
    "medium": ["bead", "mail:mayor"],
    "high": ["bead", "mail:mayor", "email:human"],
    "critical": ["bead", "mail:mayor", "email:human", "sms:human"]
  },
  "contacts": {
    "human_email": "",
    "human_sms": ""
  },
  "stale_threshold": "4h",
  "max_reescalations": 2
}
```

### Action Types

| Action | Format | Behavior |
|--------|--------|----------|
| `bead` | `bead` | Create escalation bead (always first, implicit) |
| `mail:<target>` | `mail:mayor` | Send gt mail to target |
| `email:human` | `email:human` | Send email to `contacts.human_email` |
| `sms:human` | `sms:human` | Send SMS to `contacts.human_sms` |
| `slack` | `slack` | Post to `contacts.slack_webhook` |
| `log` | `log` | Write to escalation log file |

### Severity Levels

| Level | Use Case | Default Route |
|-------|----------|---------------|
| `low` | Informational, non-urgent | bead only |
| `medium` | Needs attention soon | bead + mail mayor |
| `high` | Urgent, needs human | bead + mail + email |
| `critical` | Emergency, immediate | bead + mail + email + SMS |

---

## Escalation Beads

### Bead Format

```yaml
id: gt-esc-abc123
type: escalation
status: open
title: "Plugin FAILED: rebuild-gt"
labels:
  - severity:high
  - source:plugin:rebuild-gt
  - acknowledged:false
  - reescalated:true
  - reescalation_count:1
description: |
  Build failed: make returned exit code 2

  ## Context
  - Source: plugin:rebuild-gt
  - Original severity: medium
  - Escalated at: 2026-01-11T19:00:00Z
created_at: 2026-01-11T15:00:00Z
```

### Label Schema

| Label | Values | Purpose |
|-------|--------|---------|
| `severity:<level>` | low, medium, high, critical | Current severity |
| `source:<type>:<name>` | plugin:rebuild-gt, patrol:deacon | What triggered it |
| `acknowledged:<bool>` | true, false | Has human acknowledged |
| `reescalated:<bool>` | true, false | Has been re-escalated |
| `reescalation_count:<n>` | 0, 1, 2, ... | Times re-escalated |
| `original_severity:<level>` | low, medium, high | Initial severity |

---

## Commands

### gt escalate

Create a new escalation.

```bash
gt escalate \
  --severity=<low|medium|high|critical> \
  --subject="Short description" \
  --body="Detailed explanation" \
  [--source="plugin:rebuild-gt"]
```

**Flags:**
- `--severity` (required): Escalation severity level
- `--subject` (required): Short description (becomes bead title)
- `--body` (required): Detailed explanation (becomes bead description)
- `--source`: Source identifier for tracking (e.g., "plugin:rebuild-gt")
- `--dry-run`: Show what would happen without executing
- `--json`: Output escalation bead ID as JSON

**Exit codes:**
- 0: Success
- 1: Config error or invalid flags
- 2: Action failed (e.g., email send failed)

**Example:**
```bash
gt escalate \
  --severity=high \
  --subject="Plugin FAILED: rebuild-gt" \
  --body="Build failed: make returned exit code 2. Working directory: ~/gt/gastown/crew/george" \
  --source="plugin:rebuild-gt"

# Output:
# ✓ Created escalation gt-esc-abc123 (severity: high)
#   → Created bead
#   → Mailed mayor/
#   → Emailed steve@example.com
```

### gt escalate ack

Acknowledge an escalation.

```bash
gt escalate ack <bead-id> [--note="Investigating"]
```

**Behavior:**
- Sets `acknowledged:true` label
- Optionally adds note to bead
- Prevents re-escalation

**Example:**
```bash
gt escalate ack gt-esc-abc123 --note="Looking into it"
# ✓ Acknowledged gt-esc-abc123
```

### gt escalate list

List escalations.

```bash
gt escalate list [--severity=...] [--stale] [--unacked] [--all]
```

**Flags:**
- `--severity`: Filter by severity level
- `--stale`: Show only stale (past threshold, unacked)
- `--unacked`: Show only unacknowledged
- `--all`: Include acknowledged/closed
- `--json`: Output as JSON

**Example:**
```bash
gt escalate list --unacked
# 📢 Unacknowledged Escalations (2)
#
#   ● gt-esc-abc123 [HIGH] Plugin FAILED: rebuild-gt
#     Source: plugin:rebuild-gt · Age: 2h · Stale in: 2h
#   ● gt-esc-def456 [MEDIUM] Witness unresponsive
#     Source: patrol:deacon · Age: 30m · Stale in: 3h30m
```

### gt escalate stale

Check for and re-escalate stale escalations.

```bash
gt escalate stale [--dry-run]
```

**Behavior:**
- Queries unacked escalations older than `stale_threshold`
- For each, bumps severity and re-executes route
- Respects `max_reescalations` limit

**Example:**
```bash
gt escalate stale
# 🔄 Re-escalating stale escalations...
#
#   gt-esc-abc123: medium → high (age: 5h, reescalation: 1/2)
#     → Emailed steve@example.com
#
# ✓ Re-escalated 1 escalation
```

### gt escalate close

Close an escalation (resolved).

```bash
gt escalate close <bead-id> [--reason="Fixed in commit abc123"]
```

**Behavior:**
- Sets status to closed
- Adds resolution note
- Records who closed it

---

## Implementation Details

### File: internal/cmd/escalate.go

```go
package cmd

import "github.com/spf13/cobra"

// escalateCmd is the parent command for escalation management.
var escalateCmd = &cobra.Command{
    Use:   "escalate",
    Short: "Manage escalations",
    Long:  `Create, acknowledge, and manage escalations with severity-based routing.`,
}

// escalateCreateCmd creates a new escalation.
var escalateCreateCmd = &cobra.Command{
    Use:   "escalate --severity=<level> --subject=<text> --body=<text>",
    Short: "Create a new escalation",
    // ... implementation
}

// escalateAckCmd acknowledges an escalation.
var escalateAckCmd = &cobra.Command{
    Use:   "ack <bead-id>",
    Short: "Acknowledge an escalation",
    // ... implementation
}

// escalateListCmd lists escalations.
var escalateListCmd = &cobra.Command{
    Use:   "list",
    Short: "List escalations",
    // ... implementation
}

// escalateStaleCmd checks for stale escalations.
var escalateStaleCmd = &cobra.Command{
    Use:   "stale",
    Short: "Re-escalate stale escalations",
    // ... implementation
}

// escalateCloseCmd closes an escalation.
var escalateCloseCmd = &cobra.Command{
    Use:   "close <bead-id>",
    Short: "Close an escalation",
    // ... implementation
}
```

### File: internal/escalation/manager.go

```go
package escalation

// Manager handles escalation creation and routing.
type Manager struct {
    config *config.EscalationConfig
    beads  *beads.Client
    mailer *mail.Client
}

// Escalate creates a new escalation and executes the route.
func (m *Manager) Escalate(ctx context.Context, opts EscalateOptions) (*Escalation, error) {
    // 1. Validate options
    // 2. Create escalation bead
    // 3. Look up route for severity
    // 4. Execute each action
    // 5. Return escalation with results
}

// Acknowledge marks an escalation as acknowledged.
func (m *Manager) Acknowledge(ctx context.Context, beadID string, note string) error {
    // 1. Load escalation bead
    // 2. Set acknowledged:true label
    // 3. Add note if provided
}

// ReescalateStale finds and re-escalates stale escalations.
func (m *Manager) ReescalateStale(ctx context.Context) ([]Reescalation, error) {
    // 1. Query unacked escalations older than threshold
    // 2. For each, bump severity
    // 3. Execute new route
    // 4. Update labels
}
```

### File: internal/escalation/actions.go

```go
package escalation

// Action is an escalation route action.
type Action interface {
    Execute(ctx context.Context, esc *Escalation) error
    String() string
}

// BeadAction creates the escalation bead.
type BeadAction struct{}

// MailAction sends gt mail.
type MailAction struct {
    Target string // e.g., "mayor"
}

// EmailAction sends email.
type EmailAction struct {
    Recipient string // from config.contacts
}

// SMSAction sends SMS.
type SMSAction struct {
    Recipient string // from config.contacts
}

// ParseAction parses an action string into an Action.
func ParseAction(s string) (Action, error) {
    // "bead" -> BeadAction{}
    // "mail:mayor" -> MailAction{Target: "mayor"}
    // "email:human" -> EmailAction{Recipient: "human"}
    // etc.
}
```
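
The parsing rule sketched in the `ParseAction` comments could look roughly like this. To stay self-contained, the example returns a plain `parsedAction` struct instead of the `Action` interface. That struct, the lowercase function name, and the choice to reject a target on `bead`, `slack`, and `log` are all assumptions for illustration.

```go
package main

import (
    "fmt"
    "strings"
)

// parsedAction is a hypothetical flat form of a route entry such as
// "mail:mayor": Kind is "mail", Target is "mayor".
type parsedAction struct {
    Kind   string
    Target string
}

// parseAction splits an action string from the routes config into kind and
// target, validating it against the action types listed in this document.
func parseAction(s string) (parsedAction, error) {
    kind, target, hasTarget := strings.Cut(s, ":")
    switch kind {
    case "bead", "slack", "log":
        if hasTarget {
            return parsedAction{}, fmt.Errorf("action %q takes no target", kind)
        }
        return parsedAction{Kind: kind}, nil
    case "mail", "email", "sms":
        if target == "" {
            return parsedAction{}, fmt.Errorf("action %q requires a target", kind)
        }
        return parsedAction{Kind: kind, Target: target}, nil
    default:
        return parsedAction{}, fmt.Errorf("unknown action %q", s)
    }
}

func main() {
    for _, s := range []string{"bead", "mail:mayor", "email:human", "pager"} {
        a, err := parseAction(s)
        if err != nil {
            fmt.Println("error:", err)
            continue
        }
        fmt.Printf("%s -> kind=%s target=%s\n", s, a.Kind, a.Target)
    }
}
```

Validating every route entry at config-load time would let a typo such as `"mial:mayor"` be reported as a config error (exit code 1) before any escalation is attempted.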

### Email/SMS Implementation

For v1, use simple exec of external commands:

```go
// EmailAction sends email using the 'mail' command or similar.
func (a *EmailAction) Execute(ctx context.Context, esc *Escalation) error {
    // Option 1: Use system mail command
    // Option 2: Use sendgrid/ses API (future)
    // Option 3: Use configured webhook

    // For now, just log a placeholder
    // Real implementation can be added based on user's infrastructure
    log.Printf("escalation: email action for %q is a stub; nothing sent", a.Recipient)
    return nil
}
```

The email/SMS actions can start as stubs that log warnings, with real implementations added based on the user's infrastructure (SendGrid, Twilio, etc.).

---

## Integration Points

### Plugin System

Plugins use escalation for failure notification:

````markdown
# In plugin.md execution section:

On failure:
```bash
gt escalate \
  --severity=medium \
  --subject="Plugin FAILED: rebuild-gt" \
  --body="$ERROR" \
  --source="plugin:rebuild-gt"
```
````

### Deacon Patrol

Deacon uses escalation for health issues:

```bash
# In health-scan step:
if [ "$unresponsive_cycles" -ge 5 ]; then
  gt escalate \
    --severity=high \
    --subject="Witness unresponsive: gastown" \
    --body="Witness has been unresponsive for $unresponsive_cycles cycles" \
    --source="patrol:deacon:health-scan"
fi
```

### Stale Escalation Check

Can be either:
1. A Deacon patrol step
2. A plugin (dogfood!)
3. Part of `gt escalate` itself (run periodically)

Recommendation: Start as patrol step, migrate to plugin later.

---

## Testing Plan

### Unit Tests

- Config loading and validation
- Action parsing
- Severity level ordering
- Re-escalation logic

### Integration Tests

- Create escalation → bead exists
- Acknowledge → label updated
- Stale detection → re-escalation triggers
- Route execution → all actions called

### Manual Testing

1. `gt escalate --severity=low --subject="Test" --body="Testing"`
2. `gt escalate list --unacked`
3. `gt escalate ack <id>`
4. Wait for stale threshold, run `gt escalate stale`

---

## Dependencies

### Internal Dependencies (task order)

```
gt-i9r20.2 (Config Schema)
    │
    ▼
gt-i9r20.1 (gt escalate command)
    │
    ├──▶ gt-i9r20.4 (gt escalate ack)
    │
    └──▶ gt-i9r20.3 (Stale patrol)
```

### External Dependencies

- `bd create` for creating escalation beads
- `bd list` for querying escalations
- `bd label` for updating labels
- `gt mail send` for mail action

---

## Open Questions (Resolved)

1. **Where to store config?** → `settings/escalation.json` (follows existing pattern)
2. **How to implement email/SMS?** → Start with stubs, add real impl based on infrastructure
3. **Stale check: patrol step or plugin?** → Start as patrol step, can migrate to plugin
4. **Escalation bead type?** → `type: escalation` (new bead type)

---

## Future Enhancements

1. **Slack integration**: Post to Slack channels
2. **PagerDuty integration**: Create incidents
3. **Escalation dashboard**: Web UI for escalation management
4. **Scheduled escalations**: "Remind me in 2h if not resolved"
5. **Escalation templates**: Pre-defined escalation types
485
docs/design/plugin-system.md
Normal file
@@ -0,0 +1,485 @@
# Plugin System Design

> Design document for the Gas Town plugin system.
> Written 2026-01-11, crew/george session.

## Problem Statement

Gas Town needs extensible, project-specific automation that runs during Deacon patrol cycles. The immediate use case is rebuilding stale binaries (gt, bd, wv), but the pattern generalizes to any periodic maintenance task.

Current state:
- Plugin infrastructure exists conceptually (patrol step mentions it)
- `~/gt/plugins/` directory exists with README
- No actual plugins in production use
- No formalized execution model

## Design Principles Applied

### Discover, Don't Track
> Reality is truth. State is derived.

Plugin state (last run, run count, results) lives on the ledger as wisps, not in shadow state files. Gate evaluation queries the ledger directly.

### ZFC: Zero Framework Cognition
> Agent decides. Go transports.

The Deacon (agent) evaluates gates and decides whether to dispatch. Go code provides transport (`gt dog dispatch`) but doesn't make decisions.

### MEOW Stack Integration

| Layer | Plugin Analog |
|-------|---------------|
| **M**olecule | `plugin.md` - work template with TOML frontmatter |
| **E**phemeral | Plugin-run wisps - high-volume, digestible |
| **O**bservable | Plugin runs appear in `bd activity` feed |
| **W**orkflow | Gate → Dispatch → Execute → Record → Digest |

---

## Architecture

### Plugin Locations

```
~/gt/
├── plugins/                  # Town-level plugins (universal)
│   └── README.md
├── gastown/
│   └── plugins/              # Rig-level plugins
│       └── rebuild-gt/
│           └── plugin.md
├── beads/
│   └── plugins/
│       └── rebuild-bd/
│           └── plugin.md
└── wyvern/
    └── plugins/
        └── rebuild-wv/
            └── plugin.md
```

**Town-level** (`~/gt/plugins/`): Universal plugins that apply everywhere.
**Rig-level** (`<rig>/plugins/`): Project-specific plugins.

The Deacon scans both locations during patrol.

### Execution Model: Dog Dispatch

**Key insight**: Plugin execution should not block Deacon patrol.

Dogs are reusable workers designed for infrastructure tasks. Plugin execution is dispatched to dogs:

```
Deacon Patrol                    Dog Worker
─────────────────                ─────────────────
1. Scan plugins
2. Evaluate gates
3. For open gates:
   └─ gt dog dispatch plugin ──→ 4. Execute plugin
      (non-blocking)             5. Create result wisp
                                 6. Send DOG_DONE
4. Continue patrol
   ...
5. Process DOG_DONE      ←──     (next cycle)
```

Benefits:
- Deacon stays responsive
- Multiple plugins can run concurrently (different dogs)
- Plugin failures don't stall patrol
- Consistent with Dogs' purpose (infrastructure work)

### State Tracking: Wisps on the Ledger

Each plugin run creates a wisp:

```bash
bd wisp create \
  --label type:plugin-run \
  --label plugin:rebuild-gt \
  --label rig:gastown \
  --label result:success \
  --body "Rebuilt gt: abc123 → def456 (5 commits)"
```

**Gate evaluation** queries wisps instead of state files:

```bash
# Cooldown check: any runs in last hour?
bd list --type=wisp --label=plugin:rebuild-gt --since=1h --limit=1
```

**Derived state** (no state.json needed):

| Query | Command |
|-------|---------|
| Last run time | `bd list --label=plugin:X --limit=1 --json` |
| Run count | `bd list --label=plugin:X --json \| jq length` |
| Last result | Parse `result:` label from latest wisp |
| Failure rate | Count `result:failure` vs total |

### Digest Pattern

Like cost digests, plugin wisps accumulate and get squashed daily:

```bash
gt plugin digest --yesterday
```

Creates: `Plugin Digest 2026-01-10` bead with summary
Deletes: Individual plugin-run wisps from that day

This keeps the ledger clean while preserving audit history.

---

## Plugin Format Specification

### File Structure

```
rebuild-gt/
└── plugin.md          # Definition with TOML frontmatter
```

### plugin.md Format

```markdown
+++
name = "rebuild-gt"
description = "Rebuild stale gt binary from source"
version = 1

[gate]
type = "cooldown"
duration = "1h"

[tracking]
labels = ["plugin:rebuild-gt", "rig:gastown", "category:maintenance"]
digest = true

[execution]
timeout = "5m"
notify_on_failure = true
+++

# Rebuild gt Binary

Instructions for the dog worker to execute...
```
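
A reader of this format first has to separate the `+++`-delimited frontmatter from the Markdown body. The following is a minimal standard-library sketch of that split. `splitFrontmatter` is a hypothetical helper, and decoding the TOML itself would need a TOML library, which is not shown.

```go
package main

import (
    "errors"
    "fmt"
    "strings"
)

// splitFrontmatter separates a plugin.md document into the raw TOML between
// the two "+++" delimiter lines and the Markdown body that follows.
func splitFrontmatter(doc string) (frontmatter, body string, err error) {
    const delim = "+++"
    lines := strings.Split(doc, "\n")
    if strings.TrimSpace(lines[0]) != delim {
        return "", "", errors.New("missing opening +++ delimiter")
    }
    for i := 1; i < len(lines); i++ {
        if strings.TrimSpace(lines[i]) == delim {
            frontmatter = strings.Join(lines[1:i], "\n")
            body = strings.TrimLeft(strings.Join(lines[i+1:], "\n"), "\n")
            return frontmatter, body, nil
        }
    }
    return "", "", errors.New("missing closing +++ delimiter")
}

func main() {
    doc := "+++\nname = \"rebuild-gt\"\nversion = 1\n+++\n\n# Rebuild gt Binary\n"
    fm, body, err := splitFrontmatter(doc)
    if err != nil {
        fmt.Println("error:", err)
        return
    }
    fmt.Printf("frontmatter:\n%s\n\nbody:\n%s", fm, body)
}
```

Keeping the split separate from TOML decoding means the body can be handed to the dog worker unchanged, whichever parser ends up reading the frontmatter.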

### TOML Frontmatter Schema

```toml
# Required
name = "string"            # Unique plugin identifier
description = "string"     # Human-readable description
version = 1                # Schema version (for future evolution)

[gate]
type = "cooldown|cron|condition|event|manual"
# Type-specific fields:
duration = "1h"            # For cooldown
schedule = "0 9 * * *"     # For cron
check = "gt stale -q"      # For condition (exit 0 = run)
on = "startup"             # For event

[tracking]
labels = ["label:value", ...]  # Labels for execution wisps
digest = true|false            # Include in daily digest

[execution]
timeout = "5m"             # Max execution time
notify_on_failure = true   # Escalate on failure
severity = "low"           # Escalation severity if failed
```

### Gate Types

| Type | Config | Behavior |
|------|--------|----------|
| `cooldown` | `duration = "1h"` | Query wisps, run if none in window |
| `cron` | `schedule = "0 9 * * *"` | Run on cron schedule |
| `condition` | `check = "cmd"` | Run check command, run if exit 0 |
| `event` | `on = "startup"` | Run on Deacon startup |
| `manual` | (no gate section) | Never auto-run, dispatch explicitly |

### Instructions Section

The markdown body after the frontmatter contains agent-executable instructions. The dog worker reads and executes these steps.

Standard sections:
- **Detection**: Check if action is needed
- **Action**: The actual work
- **Record Result**: Create the execution wisp
- **Notification**: On success/failure

---

## Escalation System

### Problem

Current escalation is ad-hoc "mail Mayor". Issues:
- Mayor gets backlogged easily
- No severity differentiation
- No alternative channels (email, SMS, etc.)
- No tracking of stale escalations

### Solution: Unified Escalation API

New command:

```bash
gt escalate \
  --severity=<low|medium|high|critical> \
  --subject="Plugin FAILED: rebuild-gt" \
  --body="Build failed: make returned exit code 2" \
  --source="plugin:rebuild-gt"
```

### Escalation Routing

The command reads town config (`~/gt/config.json` or similar) for routing rules:

```json
{
  "escalation": {
    "routes": {
      "low": ["bead"],
      "medium": ["bead", "mail:mayor"],
      "high": ["bead", "mail:mayor", "email:human"],
      "critical": ["bead", "mail:mayor", "email:human", "sms:human"]
    },
    "contacts": {
      "human_email": "steve@example.com",
      "human_sms": "+1234567890"
    },
    "stale_threshold": "4h"
  }
}
```

### Escalation Actions

| Action | Behavior |
|--------|----------|
| `bead` | Create escalation bead with severity label |
| `mail:mayor` | Send mail to mayor/ |
| `email:human` | Send email via configured service |
| `sms:human` | Send SMS via configured service |

### Escalation Beads

Every escalation creates a bead:

```yaml
type: escalation
status: open
labels:
  - severity:high
  - source:plugin:rebuild-gt
  - acknowledged:false
```

### Stale Escalation Patrol

A patrol step (or plugin!) checks for unacknowledged escalations:

```bash
bd list --type=escalation --label=acknowledged:false --older-than=4h
```

Stale escalations get re-escalated at higher severity.

### Acknowledging Escalations

```bash
gt escalate ack <bead-id>
# Sets label acknowledged:true
```

---
|
||||||
|
|
||||||
|
## New Commands Required
|
||||||
|
|
||||||
|
### gt stale
|
||||||
|
|
||||||
|
Expose binary staleness check:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
gt stale # Human-readable output
|
||||||
|
gt stale --json # Machine-readable
|
||||||
|
gt stale --quiet # Exit code only (0=stale, 1=fresh)
|
||||||
|
```

### gt dog dispatch

Formalized plugin dispatch to dogs:

```bash
gt dog dispatch --plugin <name> [--rig <rig>]
```

This:
1. Finds the plugin definition
2. Slings a standardized work unit to an idle dog
3. Returns immediately (non-blocking)

### gt escalate

Unified escalation API:

```bash
gt escalate \
  --severity=<level> \
  --subject="..." \
  --body="..." \
  [--source="..."]

gt escalate ack <bead-id>
gt escalate list [--severity=...] [--stale]
```

### gt plugin

Plugin management:

```bash
gt plugin list                    # List all plugins
gt plugin show <name>             # Show plugin details
gt plugin run <name> [--force]    # Manual trigger
gt plugin digest [--yesterday]    # Squash wisps to digest
gt plugin history <name>          # Show execution history
```

---

## Implementation Plan

### Phase 1: Foundation

1. **`gt stale` command** - Expose `CheckStaleBinary()` via CLI
2. **Plugin format spec** - Finalize TOML schema
3. **Plugin scanning** - Deacon scans town + rig plugin dirs

### Phase 2: Execution

4. **`gt dog dispatch --plugin`** - Formalized dog dispatch
5. **Plugin execution in dogs** - Dog reads plugin.md, executes
6. **Wisp creation** - Record results on ledger

### Phase 3: Gates & State

7. **Gate evaluation** - Cooldown via wisp query
8. **Other gate types** - Cron, condition, event
9. **Plugin digest** - Daily squash of plugin wisps

### Phase 4: Escalation

10. **`gt escalate` command** - Unified escalation API
11. **Escalation routing** - Config-driven multi-channel
12. **Stale escalation patrol** - Check unacknowledged

### Phase 5: First Plugin

13. **`rebuild-gt` plugin** - The actual gastown plugin
14. **Documentation** - So Beads/Wyvern can create theirs

---

## Example: rebuild-gt Plugin

````markdown
+++
name = "rebuild-gt"
description = "Rebuild stale gt binary from gastown source"
version = 1

[gate]
type = "cooldown"
duration = "1h"

[tracking]
labels = ["plugin:rebuild-gt", "rig:gastown", "category:maintenance"]
digest = true

[execution]
timeout = "5m"
notify_on_failure = true
severity = "medium"
+++

# Rebuild gt Binary

Checks if the gt binary is stale (built from an older commit than HEAD) and rebuilds.

## Gate Check

The Deacon evaluates this before dispatch. If the gate is closed, skip.

## Detection

Check binary staleness:

```bash
gt stale --json
```

If `"stale": false`, record a success wisp and exit early.

## Action

Rebuild from source:

```bash
cd ~/gt/gastown/crew/george && make build && make install
```

## Record Result

On success:
```bash
bd wisp create \
  --label type:plugin-run \
  --label plugin:rebuild-gt \
  --label rig:gastown \
  --label result:success \
  --body "Rebuilt gt: $OLD → $NEW ($N commits)"
```

On failure:
```bash
bd wisp create \
  --label type:plugin-run \
  --label plugin:rebuild-gt \
  --label rig:gastown \
  --label result:failure \
  --body "Build failed: $ERROR"

gt escalate --severity=medium \
  --subject="Plugin FAILED: rebuild-gt" \
  --body="$ERROR" \
  --source="plugin:rebuild-gt"
```
````

---

## Open Questions

1. **Plugin discovery in multiple clones**: If gastown has crew/george, crew/max, crew/joe - which clone's plugins/ dir is canonical? Probably: scan all, dedupe by name, prefer rig-root if it exists.

2. **Dog assignment**: Should specific plugins prefer specific dogs? Or any idle dog?

3. **Plugin dependencies**: Can plugins depend on other plugins? Probably not in v1.

4. **Plugin disable/enable**: How to temporarily disable a plugin without deleting it? Label on a plugin bead? `enabled = false` in frontmatter?
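
The tentative answer to question 1 (scan all, dedupe by name, prefer rig-root) could look like the following sketch. The `(name, location)` pairs, the `"rig-root"` marker, and the `lint` plugin are hypothetical stand-ins for whatever the scanner actually yields; breaking ties between crew clones by first-seen order is also an assumption.

```python
def dedupe_plugins(found):
    """Pick one location per plugin name from (name, location) pairs.

    A "rig-root" location wins over any clone; otherwise the first
    location seen for a name is kept.
    """
    chosen = {}
    for name, location in found:
        if name not in chosen or location == "rig-root":
            chosen[name] = location
    return chosen


found = [
    ("rebuild-gt", "crew/george"),
    ("rebuild-gt", "crew/max"),
    ("rebuild-gt", "rig-root"),
    ("lint", "crew/joe"),
    ("lint", "crew/max"),
]
print(dedupe_plugins(found))
```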

---

## References

- PRIMING.md - Core design principles
- mol-deacon-patrol.formula.toml - Patrol step plugin-run
- ~/gt/plugins/README.md - Current plugin stub