diff --git a/docs/federation.md b/docs/federation.md
index 4c9a38d6..7b8983be 100644
--- a/docs/federation.md
+++ b/docs/federation.md
@@ -7,6 +7,19 @@
 Federation enables multiple Gas Town instances to reference each other's work,
 coordinate across organizations, and track distributed projects.
 
+## Why Federation?
+
+Real enterprise projects don't live in a single repo:
+
+- **Microservices:** 50 repos, tight dependencies, coordinated releases
+- **Platform teams:** Shared libraries used by dozens of downstream projects
+- **Contractors:** External teams working on components you need to track
+- **Acquisitions:** New codebases that need to integrate with existing work
+
+Traditional tools force you to choose: unified tracking (monorepo) or team
+autonomy (multi-repo with fragmented visibility). Federation provides both:
+each workspace is autonomous, but cross-workspace references are first-class.
+
 ## Entity Model
 
 ### Three Levels
@@ -217,3 +230,17 @@ Completion cascades up. Attribution preserved.
 3. **Git-native** - Federation uses git mechanics (remotes, refs)
 4. **Incremental** - Works standalone, gains power with federation
 5. **Privacy-preserving** - Each entity controls their chain visibility
+
+## Enterprise Benefits
+
+| Challenge | Without Federation | With Federation |
+|-----------|-------------------|-----------------|
+| Cross-repo dependencies | "Check with backend team" | Explicit dependency tracking |
+| Contractor visibility | Email updates, status calls | Live status, same tooling |
+| Release coordination | Spreadsheets, Slack threads | Unified timeline view |
+| Agent attribution | Per-repo, fragmented | Cross-workspace CV |
+| Compliance audit | Stitch together logs | Query across workspaces |
+
+Federation isn't just about connecting repos - it's about treating distributed
+engineering as a first-class concern, with the same visibility and tooling
+you'd expect from a monorepo, while preserving team autonomy.
diff --git a/docs/identity.md b/docs/identity.md
index 2a142d02..40d6459b 100644
--- a/docs/identity.md
+++ b/docs/identity.md
@@ -2,6 +2,19 @@
 
 > Canonical format for agent identity in Gas Town
 
+## Why Identity Matters
+
+When you deploy AI agents at scale, anonymous work creates real problems:
+
+- **Debugging:** "The AI broke it" isn't actionable. *Which* AI?
+- **Quality tracking:** You can't improve what you can't measure.
+- **Compliance:** Auditors ask "who approved this code?" - you need an answer.
+- **Performance management:** Some agents are better than others at certain tasks.
+
+Gas Town solves this with **universal attribution**: every action, every commit,
+every bead update is linked to a specific agent identity. This enables work
+history tracking, capability-based routing, and objective quality measurement.
+
 ## BD_ACTOR Format Convention
 
 The `BD_ACTOR` environment variable identifies agents in slash-separated path format.
@@ -200,8 +213,42 @@ bd cv steve@example.com
 # Discovers all towns, aggregates work, derives skills
 ```
 
-### Enterprise Framing
-
-"Work attribution for audit and compliance. Track which agents produce clean work. Enable cross-project visibility into developer productivity and skill development."
-
 See `~/gt/docs/hop/decisions/008-identity-model.md` for architectural rationale.
+
+## Enterprise Use Cases
+
+### Compliance and Audit
+
+```bash
+# Who touched this file in the last 90 days?
+git log --since="90 days ago" -- path/to/sensitive/file.go
+
+# All changes by a specific agent
+bd audit --actor=gastown/polecats/toast --since=2025-01-01
+```
+
+### Performance Tracking
+
+```bash
+# Completion rate by agent
+bd stats --group-by=actor
+
+# Average time to completion
+bd stats --actor=gastown/polecats/* --metric=cycle-time
+```
+
+### Model Comparison
+
+When agents use different underlying models, attribution enables A/B comparison:
+
+```bash
+# Tag agents by model
+# gastown/polecats/claude-1 uses Claude
+# gastown/polecats/gpt-1 uses GPT-4
+
+# Compare quality signals
+bd stats --actor=gastown/polecats/claude-* --metric=revision-count
+bd stats --actor=gastown/polecats/gpt-* --metric=revision-count
+```
+
+Lower revision counts suggest higher first-pass quality.
diff --git a/docs/understanding-gas-town.md b/docs/understanding-gas-town.md
index 7a2900e9..74296e32 100644
--- a/docs/understanding-gas-town.md
+++ b/docs/understanding-gas-town.md
@@ -3,6 +3,19 @@
 This document provides a conceptual overview of Gas Town's architecture,
 focusing on the role taxonomy and how different agents interact.
 
+## Why Gas Town Exists
+
+As AI agents become central to engineering workflows, teams face new challenges:
+
+- **Accountability:** Who did what? Which agent introduced this bug?
+- **Quality:** Which agents are reliable? Which need tuning?
+- **Efficiency:** How do you route work to the right agent?
+- **Scale:** How do you coordinate agents across repos and teams?
+
+Gas Town is an orchestration layer that treats AI agent work as structured data.
+Every action is attributed. Every agent has a track record. Every piece of work
+has provenance. See [Why These Features](why-these-features.md) for the full rationale.
+
 ## Role Taxonomy
 
 Gas Town has several agent types, each with distinct responsibilities and lifecycles.
@@ -186,6 +199,27 @@ All Gas Town agents follow the same core principle:
 This applies regardless of role. The hook is your assignment.
 Execute it immediately without waiting for confirmation. Gas Town is a steam engine - agents are pistons.
 
+## Model Evaluation and A/B Testing
+
+Gas Town's attribution and work history features enable objective model comparison:
+
+```bash
+# Deploy different models on similar tasks
+gt sling gt-abc gastown --model=claude-sonnet
+gt sling gt-def gastown --model=gpt-4
+
+# Compare outcomes
+bd stats --actor=gastown/polecats/* --group-by=model
+```
+
+Because every task has completion time, quality signals, and revision count,
+you can make data-driven decisions about which models to deploy where.
+
+This is particularly valuable for:
+- **Model selection:** Which model handles your codebase best?
+- **Capability mapping:** Claude for architecture, GPT for tests?
+- **Cost optimization:** When is a smaller model sufficient?
+
 ## Common Mistakes
 
 1. **Using dogs for user work**: Dogs are Deacon infrastructure. Use crew or polecats.
diff --git a/docs/why-these-features.md b/docs/why-these-features.md
new file mode 100644
index 00000000..d3564e05
--- /dev/null
+++ b/docs/why-these-features.md
@@ -0,0 +1,260 @@
+# Why These Features?
+
+> Gas Town's architecture explained through enterprise AI challenges
+
+## The Problem
+
+You have AI agents. Maybe a lot of them. They're writing code, reviewing PRs,
+fixing bugs, adding features. But you can't answer basic questions:
+
+- **Who did what?** Which agent wrote this buggy code?
+- **Who's reliable?** Which agents consistently deliver quality?
+- **Who can do this?** Which agent should handle this Go refactor?
+- **What's connected?** Does this frontend change depend on a backend PR?
+- **What's the full picture?** How's the project doing across 12 repos?
+
+Traditional tools don't help. CI/CD tracks builds, not capability. Git tracks
+commits, not agent performance. Project management tracks tickets, not the
+nuanced reality of who actually did what, and how well.
+
+## The Solution: A Work Ledger
+
+Gas Town treats work as structured data. Every action is recorded. Every agent
+has a track record. Every piece of work has provenance.
+
+This isn't about surveillance. It's about **visibility** - the same visibility
+you'd expect from any serious engineering system.
+
+---
+
+## Feature: Entity Tracking and Attribution
+
+**The problem:** You deploy 50 agents across 10 projects. One of them introduces
+a critical bug. Which one? Traditional git blame shows a generic "AI Assistant"
+or worse, the human's name.
+
+**The solution:** Every Gas Town agent has a distinct identity. Every action is
+attributed:
+
+```
+Git commits: gastown/polecats/toast
+Beads records: created_by: gastown/crew/joe
+Event logs: actor: gastown/polecats/nux
+```
+
+**Why it matters:**
+- **Debugging:** Trace problems to specific agents
+- **Compliance:** Audit trails for SOX, GDPR, enterprise policy
+- **Accountability:** Know exactly who touched what, when
+
+---
+
+## Feature: Work History (Agent CVs)
+
+**The problem:** You want to assign a complex Go refactor. You have 20 agents.
+Some are great at Go. Some have never touched it. Some are flaky. How do you
+choose?
+
+**The solution:** Every agent accumulates a work history:
+
+```bash
+# What has this agent done?
+bd audit --actor=gastown/polecats/toast
+
+# Success rate on Go projects
+bd stats --actor=gastown/polecats/toast --tag=go
+```
+
+**Why it matters:**
+- **Performance management:** Objective data on agent reliability
+- **Capability matching:** Route work to proven agents
+- **Continuous improvement:** Identify underperforming agents for tuning
+
+This is particularly valuable when **A/B testing models**. Deploy Claude vs GPT
+on similar tasks, track their completion rates and quality, make informed decisions.
+
+---
+
+## Feature: Capability-Based Routing
+
+**The problem:** You have work in Go, Python, TypeScript, Rust. You have agents
+with varying capabilities. Manual assignment doesn't scale.
+
+**The solution:** Work carries skill requirements. Agents have demonstrated
+capabilities (derived from their work history). Matching is automatic:
+
+```bash
+# Agent capabilities (derived from work history)
+bd skills gastown/polecats/toast
+# → go: 47 tasks, python: 12 tasks, typescript: 3 tasks
+
+# Route based on fit
+gt dispatch gt-xyz --prefer-skill=go
+```
+
+**Why it matters:**
+- **Efficiency:** Right agent for the right task
+- **Quality:** Agents work in their strengths
+- **Scale:** No human bottleneck on assignment
+
+---
+
+## Feature: Recursive Work Decomposition
+
+**The problem:** Enterprise projects are complex. A "feature" becomes 50 tasks
+across 8 repos involving 4 teams. Flat issue lists don't capture this structure.
+
+**The solution:** Work decomposes naturally:
+
+```
+Epic: User Authentication System
+├── Feature: Login Flow
+│   ├── Task: API endpoint
+│   ├── Task: Frontend component
+│   └── Task: Integration tests
+├── Feature: Session Management
+│   └── ...
+└── Feature: Password Reset
+    └── ...
+```
+
+Each level has its own chain. Roll-ups are automatic. You always know where
+you stand.
+
+**Why it matters:**
+- **Visibility:** See the forest and the trees
+- **Coordination:** Dependencies are explicit
+- **Progress tracking:** Accurate status at every level
+
+---
+
+## Feature: Cross-Project References
+
+**The problem:** Your frontend can't ship until the backend API lands. They're
+in different repos. Traditional tools don't track this.
+
+**The solution:** Explicit cross-project dependencies:
+
+```yaml
+depends_on:
+  - beads://github/acme/backend/be-456 # Backend API
+  - beads://github/acme/shared/sh-789 # Shared types
+```
+
+**Why it matters:**
+- **No surprises:** You know what's blocking
+- **Coordination:** Teams see their impact on others
+- **Planning:** Realistic schedules based on actual dependencies
+
+---
+
+## Feature: Federation
+
+**The problem:** Enterprise projects span multiple repositories, multiple teams,
+sometimes multiple organizations (contractors, partners). Visibility is fragmented.
+
+**The solution:** Federated workspaces that reference each other:
+
+```bash
+# Register remote workspace
+gt remote add partner hop://partner.com/their-project
+
+# Query across workspaces
+bd list --remote=partner --tag=integration
+```
+
+**Why it matters:**
+- **Enterprise scale:** Not limited to single-repo thinking
+- **Contractor coordination:** Track delegated work
+- **Distributed teams:** Unified view despite separate repos
+
+---
+
+## Feature: Validation and Quality Gates
+
+**The problem:** An agent says "done." Is it actually done? Is the code quality
+acceptable? Did it pass review?
+
+**The solution:** Structured validation with attribution:
+
+```json
+{
+  "validated_by": "gastown/refinery",
+  "validation_type": "merge",
+  "timestamp": "2025-01-15T10:30:00Z",
+  "quality_signals": {
+    "tests_passed": true,
+    "review_approved": true,
+    "lint_clean": true
+  }
+}
+```
+
+**Why it matters:**
+- **Quality control:** Don't trust, verify
+- **Audit trails:** Who approved what, when
+- **Process enforcement:** Gates are data, not just policy
+
+---
+
+## Feature: Real-Time Activity Feed
+
+**The problem:** Complex multi-agent work is opaque. You don't know what's
+happening until it's done (or failed).
+
+**The solution:** Work state as a real-time stream:
+
+```bash
+bd activity --follow
+
+[14:32:08] + patrol-x7k.arm-ace bonded (5 steps)
+[14:32:09] → patrol-x7k.arm-ace.capture in_progress
+[14:32:10] ✓ patrol-x7k.arm-ace.capture completed
+[14:32:14] ✓ patrol-x7k.arm-ace.decide completed
+[14:32:17] ✓ patrol-x7k.arm-ace COMPLETE
+```
+
+**Why it matters:**
+- **Debugging in real-time:** See problems as they happen
+- **Status awareness:** Always know what's running
+- **Pattern recognition:** Spot bottlenecks and inefficiencies
+
+---
+
+## The Enterprise Value Proposition
+
+Gas Town is a developer tool - like an IDE, but for AI orchestration. However,
+the architecture provides enterprise-grade foundations:
+
+| Capability | Developer Benefit | Enterprise Benefit |
+|------------|-------------------|-------------------|
+| Attribution | Debug agent issues | Compliance audits |
+| Work history | Tune agent assignments | Performance management |
+| Skill routing | Faster task completion | Resource optimization |
+| Federation | Multi-repo projects | Cross-org visibility |
+| Validation | Quality assurance | Process enforcement |
+| Activity feed | Real-time debugging | Operational awareness |
+
+**For model evaluation:** Deploy different models on comparable tasks, track
+outcomes objectively, make data-driven decisions about which models to use where.
+
+**For long-horizon projects:** See how agents perform not just on single tasks,
+but across complex, multi-phase, cross-functional initiatives.
+
+**For cross-functional teams:** Unified visibility across repos, teams, and
+even organizations.
+
+---
+
+## Design Philosophy
+
+These features aren't bolted on. They're foundational:
+
+1. **Attribution is not optional.** Every action has an actor.
+2. **Work is data.** Not just tickets - structured, queryable data.
+3. **History matters.** Track records determine trust.
+4. **Scale is assumed.** Multi-repo, multi-agent, multi-org from day one.
+5. **Verification over trust.** Quality gates are first-class primitives.
+
+Gas Town is built to answer the questions enterprises will ask as AI agents
+become central to their engineering workflows.