From 0c9c3f556346ffa567867005bc6dc18c859349d0 Mon Sep 17 00:00:00 2001
From: Steve Yegge
Date: Tue, 16 Dec 2025 18:07:15 -0800
Subject: [PATCH] docs: expand federation section with Outpost architecture
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Replace stub "Future: Federation" with comprehensive section
- Add Outpost interface and Worker interface definitions
- Document Cloud Run workers with persistent connections
- Document SSH/VM outposts for long-running work
- Add configuration example (outposts.yaml)
- Add architecture diagram showing outpost hierarchy
- Add Key Design Decision #14: Outpost Abstraction

Also reopened gt-u1j.6 and gt-u1j.12 (mail tasks) since beads mail support (bd-kwro) is not yet complete.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5
---
 docs/architecture.md | 167 +++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 161 insertions(+), 6 deletions(-)

diff --git a/docs/architecture.md b/docs/architecture.md
index 77e187ed..7d3f7d86 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -750,6 +750,27 @@ sequenceDiagram
 - If at limit, wait for workers to complete
 - Prioritize higher-priority ready issues
 
+### 14. Outpost Abstraction for Federation
+
+**Decision**: Federation uses an "Outpost" abstraction to support multiple compute backends (local, SSH/VM, Cloud Run, etc.) through a unified interface.
+
+**Rationale**:
+- Different workloads need different compute: burst vs long-running, cheap vs fast
+- Cloud Run's pay-per-use model is ideal for elastic burst capacity
+- VMs are better for autonomous long-running work
+- Local is always the default for development
+- Platform flexibility lets users choose based on their needs and budget
+
+**Key insight**: Cloud Run's persistent HTTP/2 connections solve the "zero to one" cold start problem, making container workers viable for interactive-ish work at ~$0.017 per 5-minute session.
+
+**Design principles**:
+1. **Local-first** - Remote outposts are overflow, not primary
+2. **Git remains source of truth** - All outposts sync via git
+3. **HTTP for Cloud Run** - Don't force filesystem mail onto containers
+4. **Graceful degradation** - System works with any subset of outposts
+
+**See**: `docs/federation-design.md` for full architectural analysis.
+
 ## Multi-Wave Work Processing
 
 For large task trees (like implementing GGT itself), workers can process multiple "waves" of work automatically based on the dependency graph.
@@ -1070,13 +1091,147 @@ Gas Town is designed for resilience. Common failure modes and their recovery:
 
 Run `gt doctor` regularly. Run `gt doctor --fix` to auto-repair issues.
 
-## Future: Federation
+## Federation: Outposts
 
-Federation enables work distribution across multiple machines via SSH. Not yet implemented, but the architecture supports:
-- Machine registry (local, ssh, gcp)
-- Extended addressing: `[machine:]rig/polecat`
-- Cross-machine mail routing
-- Remote session management
+Federation enables Gas Town to scale across machines via **Outposts** - remote compute environments that can run workers.
+
+**Full design**: See `docs/federation-design.md`
+
+### Outpost Types
+
+| Type | Description | Cost Model | Best For |
+|------|-------------|------------|----------|
+| Local | Current tmux model | Free | Development, primary work |
+| SSH/VM | Full Gas Town clone on VM | Always-on | Long-running, autonomous |
+| CloudRun | Container workers on GCP | Pay-per-use | Burst, elastic, background |
+
+### Core Abstraction
+
+```go
+type Outpost interface {
+    Name() string
+    Type() OutpostType  // local, ssh, cloudrun
+    MaxWorkers() int
+    ActiveWorkers() int
+    Spawn(issue string, config WorkerConfig) (Worker, error)
+    Workers() []Worker
+    Ping() error
+}
+
+type Worker interface {
+    ID() string
+    Outpost() string
+    Status() WorkerStatus  // idle, working, done, failed
+    Issue() string
+    Attach() error         // for interactive outposts
+    Logs() (io.Reader, error)
+    Stop() error
+}
+```
+
+### Configuration
+
+```yaml
+# ~/ai/config/outposts.yaml
+outposts:
+  - name: local
+    type: local
+    max_workers: 4
+
+  - name: gce-burst
+    type: ssh
+    host: 10.0.0.5
+    user: steve
+    town_path: /home/steve/ai
+    max_workers: 8
+
+  - name: cloudrun-burst
+    type: cloudrun
+    project: my-gcp-project
+    region: us-central1
+    service: gastown-worker
+    max_workers: 20
+    cost_cap_hourly: 5.00
+
+policy:
+  default_preference: [local, gce-burst, cloudrun-burst]
+```
+
+### Cloud Run Workers
+
+Cloud Run enables elastic, pay-per-use workers:
+- **Persistent HTTP/2 connections** solve cold start (zero-to-one) problem
+- **Cost**: ~$0.017 per 5-minute worker session
+- **Scaling**: 0→N automatically based on demand
+- **When idle**: Scales to zero, costs nothing
+
+Workers receive work via HTTP, clone code from git, run Claude, push results. No filesystem mail needed - HTTP is the control plane.
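
To make the cost figures above concrete, here is a rough back-of-the-envelope sketch in Go. It only uses numbers quoted in this section (~$0.017 per 5-minute session, `max_workers: 20`, `cost_cap_hourly: 5.00`); the function names `hourlyCost` and `maxWorkersUnderCap` are hypothetical and not part of Gas Town. It assumes the worst case where every worker runs back-to-back sessions for a full hour.

```go
package main

import "fmt"

// Figures quoted in the Cloud Run section; treat them as estimates.
const (
	costPerSession = 0.017 // dollars per 5-minute worker session
	sessionMinutes = 5.0
)

// hourlyCost estimates dollars per hour for `workers` workers, assuming each
// one runs back-to-back sessions for the whole hour (a worst-case assumption).
func hourlyCost(workers int) float64 {
	sessionsPerHour := 60.0 / sessionMinutes
	return float64(workers) * sessionsPerHour * costPerSession
}

// maxWorkersUnderCap returns roughly how many continuously busy workers fit
// under capHourly dollars. It uses floating-point division, so values right
// at a boundary should be treated as approximate.
func maxWorkersUnderCap(capHourly float64) int {
	if capHourly <= 0 {
		return 0
	}
	return int(capHourly / hourlyCost(1))
}

func main() {
	fmt.Printf("20 busy workers: ~$%.2f/hour\n", hourlyCost(20))
	fmt.Printf("busy workers under a $5.00/hour cap: %d\n", maxWorkersUnderCap(5.00))
}
```

Under these assumptions, the example configuration's 20 Cloud Run workers would cost about $4.08 per hour when fully busy, which sits below the `cost_cap_hourly: 5.00` setting. Real billing depends on actual session lengths and idle time, so this is only a sanity check.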
+
+### SSH/VM Outposts
+
+Full Gas Town clone on remote machines:
+- **Model**: Complete town installation via SSH
+- **Workers**: Remote tmux sessions
+- **Sync**: Git for code and beads
+- **Good for**: Long-running work, full autonomy if disconnected
+
+### Design Principles
+
+1. **Outpost abstraction** - Support multiple backends via unified interface
+2. **Local-first** - Remote outposts are for overflow/burst, not primary
+3. **Git as source of truth** - Code and beads sync everywhere
+4. **HTTP for Cloud Run** - Don't force mail onto stateless containers
+5. **Graceful degradation** - System works with any subset of outposts
+
+### Architecture Diagram
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│                            MAYOR                            │
+│  ┌──────────────────────────────────────────────────────┐   │
+│  │                   Outpost Manager                    │   │
+│  │  - Tracks all registered outposts                    │   │
+│  │  - Routes work to appropriate outpost                │   │
+│  │  - Monitors worker status across outposts            │   │
+│  └──────────────────────────────────────────────────────┘   │
+│       │              │                  │                   │
+│       ▼              ▼                  ▼                   │
+│  ┌──────────┐   ┌──────────┐     ┌──────────────┐           │
+│  │  Local   │   │   SSH    │     │   CloudRun   │           │
+│  │ Outpost  │   │ Outpost  │     │   Outpost    │           │
+│  └────┬─────┘   └────┬─────┘     └──────┬───────┘           │
+└───────┼──────────────┼──────────────────┼───────────────────┘
+        │              │                  │
+        ▼              ▼                  ▼
+   ┌─────────┐    ┌─────────┐      ┌─────────────┐
+   │  tmux   │    │   SSH   │      │   HTTP/2    │
+   │  panes  │    │sessions │      │ connections │
+   └─────────┘    └─────────┘      └─────────────┘
+        │              │                  │
+        └──────────────┼──────────────────┘
+                       ▼
+              ┌─────────────────┐
+              │    Git Repos    │
+              │ (code + beads)  │
+              └─────────────────┘
+```
+
+### CLI Commands
+
+```bash
+gt outpost list              # List configured outposts
+gt outpost status [name]     # Detailed status
+gt outpost add ...           # Add new outpost
+gt outpost ping              # Test connectivity
+```
+
+### Implementation Status
+
+Federation is tracked in **gt-9a2** (P3 epic). Key tasks:
+- `gt-9a2.1`: Outpost/Worker interfaces
+- `gt-9a2.2`: LocalOutpost (refactor current spawning)
+- `gt-9a2.5`: SSHOutpost
+- `gt-9a2.8`: CloudRunOutpost
 
 ## Implementation Status