beads/docs/contributor-workflow-analysis.md
Steve Yegge 9e60ed1c97 docs: Complete multi-repo documentation gaps
- Add exclusive lock protocol compatibility (bd-u8j)
- Add library consumer migration guide (bd-824)
- Add self-hosting project guidance (bd-x47)
- Add performance benchmarks and targets (bd-wta)
- Clarify JSONL size bounds: per-repo <25k (bd-4ry)

Closes bd-u8j, bd-824, bd-x47, bd-wta, bd-4ry
2025-11-05 14:19:11 -08:00

# Beads Contributor Workflow Analysis
**Date**: 2025-11-03
**Context**: Design discussion on how to handle beads issues in PR/OSS contribution workflows
## The Problem (from #207)
When contributing to OSS projects with beads installed:
- Git hooks automatically commit contributor's personal planning to PRs
- Contributor's experimental musings pollute the upstream project's issue tracker
- No clear ownership/permission model for external contributors
- Difficult to keep beads changes out of commits
**Core tension**: Beads is great for team planning (shared namespace), but breaks down for OSS contributions (hierarchical gatekeeping).
## Key Insights from Discussion
### Beads as "Moving Frontier"
Beads is not a traditional issue tracker. It captures the **active working set** - the sliding window of issues currently under attention:
- Work moves fast with AI agents (10x-50x acceleration)
- Completed work fades quickly (95% never revisited, should be pruned aggressively)
- Future work is mostly blocked (small frontier of ready tasks)
- The frontier is bounded by team size (dozens to hundreds of issues, not thousands)
**Design principle**: Beads should focus on the "what's next" cloud, not long-term planning or historical archive.
### The Git Ledger is Fundamental
Beads achieves overall reliability despite individually unreliable mechanics (merge conflicts, sync issues, stale data) through two guarantees:
**A. Git is the ledger and immutable backstop for forensics**
**B. AI is the ultimate arbiter and problem-solver when things go wrong**
Any solution that removes the git ledger (e.g., gitignored contributor files) breaks this model entirely.
### Requirements for Contributors
Contributors need:
- Git-backed persistence (multi-clone sync, forensics, AI repair)
- Isolated planning space (don't pollute upstream)
- Ability to propose selected issues upstream
- Support for multiple workers across multiple clones of the same repo
## Proposed Solutions
### Idea 1: Fork-Aware Hooks + Two-File System
**Structure**:
```
# Upstream repo
.beads/
  beads.jsonl   # Canonical frontier (committed)
  .gitignore    # Ignores local.jsonl

# Contributor's fork
.beads/
  beads.jsonl   # Synced from upstream (read-only)
  local.jsonl   # Contributor planning (committed to fork)
  beads.db      # Hydrated from both
```
**Detection**: Check for `upstream` remote to distinguish fork from canonical repo
**Workflow**:
```bash
# In fork
$ bd add "Experiment" # → local.jsonl (committed to fork)
$ bd sync # → Pulls upstream's beads.jsonl
$ bd show # → Shows both
$ bd propose bd-a3f8e9 # → Moves issue to beads.jsonl for PR
```
**Pros**:
- Git ledger preserved (local.jsonl committed to fork)
- Multi-clone sync works
- Upstream .gitignore prevents pollution
**Cons**:
- Fork detection doesn't help teams using branches (most common workflow)
- Two files to manage
- Requires discipline to use `bd propose`
### Idea 2: Ownership Metadata + Smart PR Filtering
**Structure**:
```jsonl
{"id":"bd-123","owner":"upstream","title":"Canonical issue",...}
{"id":"bd-456","owner":"stevey","title":"My planning",...}
```
**Workflow**:
```bash
$ bd add "Experiment" # → Creates with owner="stevey"
$ bd propose bd-456 # → Changes owner to "upstream"
$ bd clean-pr # → Filters commit to only upstream-owned issues
$ git push # → PR contains only proposed issues
```
**Pros**:
- Single file (simpler)
- Works with any git workflow (branch, fork, etc)
- Git ledger fully preserved
**Cons**:
- Requires discipline to run `bd clean-pr`
- Clean commit is awkward (temporarily removing data)
- Merge conflicts if upstream and contributor both modify beads.jsonl
### Idea 3: Branch-Scoped Databases
Track which issues belong to which branch, filter at PR time.
**Implementation**: Similar to #2 but uses labels/metadata to track branch instead of owner.
**Challenge**: Complex with multiple feature branches, requires tracking branch scope.
### Idea 4: Separate Planning Repo (Most Isolated)
**Structure**:
```bash
# Main project repos (many)
~/projects/beads/.beads/beads.jsonl
~/projects/foo/.beads/beads.jsonl
# Single planning repo (one)
~/.beads-planning/.beads/beads.jsonl
# Configuration links them
~/projects/beads/.beads/config.toml:
  planning_repo = "~/.beads-planning"
```
**Workflow**:
```bash
$ cd ~/projects/beads
$ bd add "My idea" # → Commits to ~/.beads-planning/
$ bd show # → Shows beads canonical + my planning
$ bd propose bd-a3f8e9 # → Exports to beads repo for PR
```
**Pros**:
- Complete isolation (separate git histories, zero PR pollution risk)
- Git ledger fully preserved (both repos tracked)
- Multi-clone works perfectly (clone both repos)
- No special filtering/detection needed
- **Scales better**: One planning repo for all projects
**Cons**:
- Two repos to manage
- Less obvious for new users (where's my planning?)
## Analysis: Fork vs Clone vs Branch
**Clone**: Local copy of a repo (`git clone <url>`)
- `origin` remote points to source
- Push directly to origin (if you have write access)
**Fork**: Server-side copy on GitHub
- For contributors without write access
- `origin` → your fork, `upstream` → original repo
- Push to fork, then PR from fork → upstream
**Branch**: Feature branches in same repo
- Most common for teams with write access
- Push to same repo, PR from branch → main
**Key insight**: Branches are universal, forks are only for external contributors. Most teams work on branches in a shared repo.
## Current Thinking: Idea 4 is Cleanest
After analysis, **separate planning repo (#4)** is likely the best solution because:
1. **Only solution that truly prevents PR pollution** (separate git histories)
2. **Git ledger fully preserved** (both repos tracked)
3. **Multi-clone works perfectly** (just clone both)
4. **No complex filtering/detection needed** (simple config)
5. **Better scaling**: One planning repo across all projects you contribute to
The "managing two repos" concern is actually an advantage: your planning is centralized and project-agnostic.
## Open Questions
### About the Workflow
1. **Where does PR pollution actually happen?**
- Scenario A: Feature branch → upstream/main includes all beads changes from that branch?
- Scenario B: Something else?
2. **Multi-clone usage pattern**:
- Multiple clones on different machines?
- All push/pull to same remote?
- Workers coordinate via git sync?
- PRs created from feature branches?
### About Implementation
1. **Proposed issue IDs**: When moving issue from planning → canonical, keep same ID? (Hash-based IDs are globally unique)
2. **Upstream acceptance sync**: If upstream accepts/modifies a proposal, how to sync back to contributor?
- `bd sync` detects accepted proposals
- Moves from planning repo to project's canonical beads.jsonl
3. **Multiple projects**: One planning repo for all projects you contribute to, or one per project?
4. **Backwards compatibility**: Single-user projects unchanged (single beads.jsonl)
5. **Discovery**: How do users discover this feature? Auto-detect and prompt?
## Next Steps
Need to clarify:
1. User's actual multi-clone workflow (understand the real use case)
2. Where exactly PR pollution occurs (branch vs fork workflow)
3. Which solution best fits the "git ledger + multi-clone" requirements
4. Whether centralized planning repo (#4) or per-project isolation (#1/#2) is preferred
## Design Principles to Preserve
From the conversation, these are non-negotiable:
- **Git as ledger**: Everything must be git-tracked for forensics and AI repair
- **Moving frontier**: Focus on active work, aggressively prune completed work
- **Multi-clone sync**: Workers across clones must coordinate via git
- **Small databases**: Keep beads.jsonl small enough for agents to read (<25k per repo, see below)
- **Simple defaults**: Don't break single-user workflows
- **Explicit over implicit**: Clear boundaries between personal and canonical
### JSONL Size Bounds with Multi-Repo
**Critical clarification**: The <25k limit applies **per-repo**, not to total hydrated size.
#### The Rule
**Per-repo limit**: Each individual JSONL file should stay <25k (roughly 100-200 issues depending on metadata).
**Why per-repo, not total**:
1. **Git operations**: Each repo is independently versioned. Git performance depends on per-file size, not aggregate.
2. **AI readability**: Agents read JSONLs for forensics/repair. Reading one 20k file is easy; reading the union of 10 files is still manageable.
3. **Bounded growth**: Total size naturally bounded by number of repos (typically N=1-3, rarely >10).
4. **Pruning granularity**: Completed work is pruned per-repo, keeping each repo's frontier small.
#### Example Scenarios
| Primary | Planning | Team Shared | Total Hydrated | Valid? |
|---------|----------|-------------|----------------|--------|
| 20k | - | - | 20k | ✅ Single-repo, well under limit |
| 20k | 15k | - | 35k | ✅ Each repo <25k (per-repo rule) |
| 20k | 15k | 18k | 53k | ✅ Each repo <25k (per-repo rule) |
| 30k | 15k | - | 45k | ❌ Primary exceeds 25k |
| 20k | 28k | - | 48k | ❌ Planning exceeds 25k |
#### Rationale: Why 25k?
**Agent context limits**: AI agents have finite context windows. A 25k JSONL file is:
- ~100-200 issues with metadata
- ~500-1000 lines of JSON
- Comfortably fits in GPT-4 context (128k tokens)
- Small enough to read/parse in <500ms
**Moving frontier principle**: Beads tracks **active work**, not historical archive. With aggressive pruning:
- Completed issues get compacted/archived
- Blocked work stays dormant
- Only ready + in-progress issues are "hot"
- Typical frontier: 50-100 issues per repo
#### Monitoring Size with Multi-Repo
**Per-repo monitoring**:
```bash
# Check each repo's JSONL size
$ wc -c .beads/beads.jsonl
20480 .beads/beads.jsonl
$ wc -c ~/.beads-planning/beads.jsonl
15360 ~/.beads-planning/beads.jsonl
# Total hydrated size (informational, not a hard limit)
$ expr 20480 + 15360
35840
```
**Automated check**:
```go
// Check all configured repos (cfg.Repos.All and getFileSize are
// illustrative helpers, not the actual beads API)
for _, repo := range cfg.Repos.All() {
	jsonlPath := filepath.Join(repo, "beads.jsonl")
	size, _ := getFileSize(jsonlPath)
	if size > 25*1024 { // 25k per-repo limit
		log.Warnf("Repo %s exceeds 25k limit: %d bytes", repo, size)
	}
}
```
#### Pruning Strategy with Multi-Repo
Each repo should be pruned independently:
```bash
# Prune completed work from primary repo
$ bd compact --repo . --older-than 30d
# Prune experimental planning repo
$ bd compact --repo ~/.beads-planning --older-than 7d
# Shared team planning (longer retention)
$ bd compact --repo ~/team-shared/.beads --older-than 90d
```
Different repos can have different retention policies based on their role.
#### Total Size Soft Limit (Guideline Only)
While per-repo limit is the hard rule, consider total hydrated size for performance:
**Guideline**: Keep total hydrated size <100k for optimal performance.
**Why 100k total**:
- SQLite hydration: Parsing 100k JSON still fast (<1s)
- Agent queries: Dependency graphs with 300-500 total issues remain tractable
- Memory footprint: In-memory SQLite comfortably handles 500 issues
**If total exceeds 100k**:
- Not a hard error, but performance may degrade
- Consider pruning completed work more aggressively
- Evaluate if all repos are still needed
- Check if any repos should be archived/removed
#### Summary
| Limit Type | Value | Enforcement |
|------------|-------|-------------|
| **Per-repo (hard limit)** | <25k | ⚠️ Warn if exceeded, agents may struggle |
| **Total hydrated (guideline)** | <100k | Informational, affects performance |
| **Typical usage** | 20k-50k total | ✅ Expected range for active development |
**Bottom line**: Monitor per-repo size (<25k each). Total size naturally bounded by N repos × 25k.
---
# Decision: Separate Repos (Solution #4)
**Date**: 2025-11-03 (continued discussion)
## Why Separate Repos
After consideration, **Solution #4 (Separate Planning Repos)** is the chosen approach:
### Key Rationale
1. **Beads as a Separate Channel**: Beads is fundamentally a separate communication channel that happens to use git/VCS for persistence, not a git-centric tool. It should work with any VCS (jujutsu, sapling, mercurial, etc.).
2. **VCS-Agnostic Design**: Solution #1 (fork detection) is too git-centric and wouldn't work with other version control systems. Separate repos work regardless of VCS.
3. **Maximum Flexibility**: Supports multiple workflows and personas:
- OSS contributor with personal planning
- Multi-phase development (different beads DBs for different stages)
- Multiple personas (architect, implementer, reviewer)
- Team vs personal planning separation
4. **Zero PR Pollution Risk**: Completely separate git histories guarantee no accidental pollution of upstream projects.
5. **Proven Pain Point**: Experience shows that accidental bulk commits (100k issues) can be catastrophic and traumatic to recover from. Complete isolation is worth the complexity.
## Core Architecture Principles
### 1. Multi-Repo Support (N ≥ 1)
**Configuration should support N repos, including N=1 for backward compatibility:**
When N=1 (default), this is the current single-repo workflow - no changes needed.
When N≥2, multiple repos are hydrated together.
```toml
# .beads/config.toml

# Default mode: single repo (backwards compatible)
mode = "single"

# Multi-repo mode
[repos]
# Primary repo: where canonical issues live
primary = "."
# Additional repos to hydrate into the database
additional = [
  "~/.beads-planning",    # Personal planning across all projects
  "~/.beads-work/phase1", # Architecting phase
  "~/.beads-work/phase2", # Implementation phase
  "~/team-shared/.beads", # Shared team planning
]

# Routing: where do new issues go?
[routing]
mode = "auto"                 # auto | explicit
default = "~/.beads-planning" # Default for `bd add`

# Auto-detection: based on user permissions
[routing.auto]
maintainer = "."                  # If maintainer, use primary
contributor = "~/.beads-planning" # Otherwise use planning repo
```
### 2. Hydration Model
On `bd show`, `bd list`, etc., the database hydrates from multiple sources:
```
beads.db ← [
  ./.beads/beads.jsonl              (primary, read-write if maintainer)
  ~/.beads-planning/beads.jsonl     (personal, read-write)
  ~/team-shared/.beads/beads.jsonl  (shared, read-write if team member)
]
```
**Metadata tracking**:
```jsonl
{
  "id": "bd-a3f8e9",
  "title": "Add dark mode",
  "source_repo": "~/.beads-planning",  # Which repo owns this issue
  "visibility": "local",               # local | proposed | canonical
  ...
}
```
### 3. Visibility States
Issues can be in different states of visibility:
- **local**: Personal planning, only in planning repo
- **proposed**: Exported for upstream consideration (staged for PR)
- **canonical**: In the primary repo (upstream accepted it)
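The three states and the propose/withdraw/accept moves between them can be captured in a small transition table. A sketch with assumed names, not code from the beads codebase:

```go
package main

import "fmt"

// Visibility mirrors the three states above.
type Visibility string

const (
	Local     Visibility = "local"     // personal planning only
	Proposed  Visibility = "proposed"  // staged for upstream PR
	Canonical Visibility = "canonical" // accepted upstream
)

// allowed encodes the legal moves: propose, withdraw, accept.
var allowed = map[Visibility][]Visibility{
	Local:    {Proposed},         // bd propose
	Proposed: {Local, Canonical}, // bd withdraw / bd accept
}

// transition returns the new state, or an error for an illegal move.
func transition(from, to Visibility) (Visibility, error) {
	for _, next := range allowed[from] {
		if next == to {
			return to, nil
		}
	}
	return from, fmt.Errorf("illegal transition %s -> %s", from, to)
}

func main() {
	v, _ := transition(Local, Proposed)
	fmt.Println(v) // proposed
}
```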
### 4. VCS-Agnostic Operations
Beads should not assume git. Core operations:
- **Sync**: `bd sync` should work with git, jj, hg, sl, etc.
- **Ledger**: Each repo uses whatever VCS it's under (or none)
- **Transport**: Issues move between repos via export/import, not git-specific operations
## Workflow Examples
### Use Case 1: OSS Contributor
```bash
# One-time setup
$ mkdir ~/.beads-planning
$ cd ~/.beads-planning
$ git init && bd init
# Contributing to upstream project
$ cd ~/projects/some-oss-project
$ bd config --add-repo ~/.beads-planning --routing contributor
# Work
$ bd add "Explore dark mode implementation"
# → Goes to ~/.beads-planning/beads.jsonl
# → Commits to planning repo (git tracked, forensic trail)
$ bd show
# → Shows upstream's canonical issues (read-only)
# → Shows my planning issues (read-write)
$ bd work bd-a3f8e9
$ bd status bd-a3f8e9 in-progress
# Ready to propose
$ bd propose bd-a3f8e9 --target upstream
# → Exports issue from planning repo
# → Creates issue in ./.beads/beads.jsonl (staged for PR)
# → Marks as visibility="proposed" in planning repo
$ git add .beads/beads.jsonl
$ git commit -m "Propose: Add dark mode"
$ git push origin feature-branch
# → PR contains only the proposed issue, not all my planning
```
### Use Case 2: Multi-Phase Development
```bash
# Setup phases
$ mkdir -p ~/.beads-work/{architecture,implementation,testing}
$ for dir in ~/.beads-work/*; do (cd "$dir" && git init && bd init); done
# Configure project
$ cd ~/my-big-project
$ bd config --add-repo ~/.beads-work/architecture
$ bd config --add-repo ~/.beads-work/implementation
$ bd config --add-repo ~/.beads-work/testing
# Architecture phase
$ bd add "Design authentication system" --repo ~/.beads-work/architecture
$ bd show --repo ~/.beads-work/architecture
# → Only architecture issues
# Implementation phase (later)
$ bd add "Implement JWT validation" --repo ~/.beads-work/implementation
# View all phases
$ bd show
# → Shows all issues from all configured repos
```
### Use Case 3: Multiple Contributors on Same Project
```bash
# Team member Alice (maintainer)
$ cd ~/project
$ bd add "Fix bug in parser"
# → Goes to ./.beads/beads.jsonl (she's maintainer)
# → Commits to project repo
# Team member Bob (contributor)
$ cd ~/project
$ bd add "Explore performance optimization"
# → Goes to ~/.beads-planning/beads.jsonl (he's contributor)
# → Does NOT pollute project repo
$ bd show
# → Sees Alice's canonical issue
# → Sees his own planning
$ bd propose bd-xyz
# → Proposes to Alice's canonical repo
```
## Implementation Outline
### Phase 1: Core Multi-Repo Support
**Commands**:
```bash
bd config --add-repo <path> # Add a repo to hydration
bd config --remove-repo <path> # Remove a repo
bd config --list-repos # Show all configured repos
bd config --routing <mode> # Set routing: single|auto|explicit
```
**Config schema**:
```toml
[repos]
primary = "."
additional = ["path1", "path2", ...]
[routing]
default = "path" # Where `bd add` goes by default
mode = "auto" # auto | explicit
```
**Database changes**:
- Add `source_repo` field to issues
- Hydration layer reads from multiple JSONLs
- Writes go to correct JSONL based on source_repo
### Phase 2: Proposal Flow
**Commands**:
```bash
bd propose <issue-id> [--target <repo>] # Move issue to target repo
bd withdraw <issue-id> # Un-propose (move back)
bd accept <issue-id> # Maintainer accepts proposal
```
**States**:
- `visibility: local` → Personal planning
- `visibility: proposed` → Staged for PR
- `visibility: canonical` → Accepted by upstream
### Phase 3: Routing Rules
**Auto-detection**:
- Detect if user is maintainer (git config, permissions)
- Auto-route to primary vs planning repo
**Config-based routing** (no new schema fields):
```toml
[routing]
mode = "auto" # auto | explicit
default = "~/.beads-planning" # Fallback for contributors
# Auto-detection rules
[routing.auto]
maintainer = "." # If user is maintainer, use primary repo
contributor = "~/.beads-planning" # Otherwise use planning repo
```
**Explicit routing** via CLI flag:
```bash
# Override auto-detection for specific issues
bd add "Design system" --repo ~/.beads-work/architecture
```
**Discovered issue inheritance**:
- Issues with parent_id automatically inherit parent's source_repo
- Keeps related work co-located
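The inheritance rule above reduces to a parent lookup at creation time. A sketch, with field names assumed rather than taken from the real beads schema:

```go
package main

import "fmt"

type Issue struct {
	ID         string
	ParentID   string
	SourceRepo string
}

// inheritSourceRepo applies the rule: a discovered child issue lands in
// the same repo as its parent; otherwise it goes to the routing default.
func inheritSourceRepo(child Issue, byID map[string]Issue, defaultRepo string) Issue {
	if child.SourceRepo != "" {
		return child // explicitly routed, leave alone
	}
	if parent, ok := byID[child.ParentID]; ok {
		child.SourceRepo = parent.SourceRepo // keep related work co-located
	} else {
		child.SourceRepo = defaultRepo
	}
	return child
}

func main() {
	parents := map[string]Issue{"bd-1": {ID: "bd-1", SourceRepo: "~/.beads-planning"}}
	child := inheritSourceRepo(Issue{ID: "bd-2", ParentID: "bd-1"}, parents, ".")
	fmt.Println(child.SourceRepo) // ~/.beads-planning
}
```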
### Phase 4: VCS-Agnostic Sync
**Sync operations**:
- Detect VCS type per repo (git, jj, hg, sl)
- Use appropriate sync commands
- Fall back to manual sync if no VCS
**Example**:
```bash
$ bd sync
# Auto-detects:
# - . is git → runs git pull
# - ~/.beads-planning is jj → runs jj git fetch && jj rebase
# - ~/other is hg → runs hg pull && hg update
```
## Migration Path
### Existing Users (Single Repo)
No changes required. Current workflow continues to work:
```bash
$ bd add "Task"
# → .beads/beads.jsonl (as before)
```
### Library Consumers (Go/TypeScript)
**Critical for projects like VC that use beads as a library.**
#### Backward Compatibility (No Changes Required)
Your existing code continues to work unchanged. The storage layer automatically reads `.beads/config.toml` if present:
```go
// Before multi-repo (v0.17.3)
store, err := beadsLib.NewSQLiteStorage(".beads/vc.db")
// After multi-repo (v0.18.0+) - EXACT SAME CODE
store, err := beadsLib.NewSQLiteStorage(".beads/vc.db")
// If .beads/config.toml exists, additional repos are auto-hydrated
// If .beads/config.toml doesn't exist, single-repo mode (backward compatible)
```
**What happens automatically**:
1. Storage layer checks for `.beads/config.toml`
2. If found: Reads `repos.additional`, hydrates from all configured repos
3. If not found: Single-repo mode (current behavior)
4. Your code doesn't need to know which mode is active
#### Explicit Multi-Repo Configuration (Optional)
If you need to override config.toml or configure repos programmatically:
```go
// Explicit multi-repo configuration (beadsLib API shape as sketched in
// this doc; note Go's stdlib has no filepath.ExpandUser, so expand ~
// via os.UserHomeDir)
home, _ := os.UserHomeDir()
cfg := beadsLib.Config{
	Primary: ".beads/vc.db",
	Additional: []string{
		filepath.Join(home, ".beads-planning"),
		filepath.Join(home, "team-shared/.beads"),
	},
}
store, err := beadsLib.NewStorageWithConfig(cfg)
```
**When to use explicit configuration**:
- Testing: Override config for test isolation
- Dynamic repos: Add repos based on runtime conditions
- No config file: Programmatic setup without `.beads/config.toml`
#### When to Use Multi-Repo vs Single-Repo
**Single-repo (default, recommended for most library consumers)**:
```go
// VC executor managing its own database
store, err := beadsLib.NewSQLiteStorage(".beads/vc.db")
// Stays single-repo by default, no config.toml needed
```
**Multi-repo (opt-in for specific use cases)**:
- **Team planning**: VC executor needs to see team-wide issues from shared repo
- **Multi-phase dev**: Different repos for architecture, implementation, testing phases
- **Personal planning**: User wants to track personal experiments separate from VC's canonical DB
**Example: VC with team planning**:
```toml
# .beads/config.toml
[repos]
primary = "."
additional = ["~/team-shared/.beads"]
[routing]
default = "." # VC-generated issues go to primary
```
```go
// VC executor code (unchanged)
store, err := beadsLib.NewSQLiteStorage(".beads/vc.db")
// GetReadyWork() now returns issues from:
// - .beads/vc.db (VC-generated issues)
// - ~/team-shared/.beads (team planning)
ready, err := store.GetReadyWork(ctx)
```
#### Migration Checklist for Library Consumers
1. **Test with config.toml**: Create `.beads/config.toml`, verify auto-hydration works
2. **Verify performance**: Ensure multi-repo hydration meets your latency requirements (see Performance section)
3. **Update exclusive locks**: If using locks, decide if you need per-repo or all-repo locking (see Exclusive Lock Protocol section)
4. **Review routing**: Ensure auto-generated issues (e.g., VC's `discovered:blocker`) go to correct repo
5. **Test backward compat**: Verify code works with and without config.toml
#### API Compatibility Matrix
| API Call | v0.17.3 (single-repo) | v0.18.0+ (multi-repo) | Breaking? |
|----------|----------------------|----------------------|-----------|
| `NewSQLiteStorage(path)` | ✅ Single repo | ✅ Auto-detects config | ❌ No |
| `GetReadyWork()` | ✅ Returns from single DB | ✅ Returns from all repos | ❌ No |
| `CreateIssue()` | ✅ Writes to single DB | ✅ Writes to primary (or routing config) | ❌ No |
| `UpdateIssue()` | ✅ Updates in single DB | ✅ Updates in source repo | ❌ No |
| Exclusive locks | ✅ Locks single DB | ✅ Locks per-repo | ❌ No |
**Summary**: Zero breaking changes. Multi-repo is transparent to library consumers.
### Opting Into Multi-Repo (CLI Users)
```bash
# Create planning repo
$ mkdir ~/.beads-planning && cd ~/.beads-planning
$ git init && bd init
# Link to project
$ cd ~/my-project
$ bd config --add-repo ~/.beads-planning
$ bd config --routing auto # Auto-detect maintainer vs contributor
# Optionally migrate existing issues
$ bd migrate --move-to ~/.beads-planning --filter "author=me"
```
### Teams Adopting Beads
```bash
# Maintainer sets up project
$ cd ~/team-project
$ bd init
$ git add .beads/ && git commit -m "Initialize beads"
# Contributors clone and configure
$ git clone team-project
$ cd team-project
$ mkdir ~/.beads-planning && cd ~/.beads-planning
$ git init && bd init
$ cd ~/team-project
$ bd config --add-repo ~/.beads-planning --routing contributor
```
### Self-Hosting Projects (VC, Internal Tools, Pet Projects)
**Important**: The multi-repo design is primarily for **OSS contributors** making PRs to upstream projects. Self-hosting projects have different needs.
#### What is Self-Hosting?
Projects that use beads to build themselves:
- **VC (VibeCoder)**: Uses beads to track development of VC itself
- **Internal tools**: Company tools that track their own roadmap
- **Pet projects**: Personal projects with beads-based planning
**Key difference from OSS contribution**:
- No upstream/downstream distinction (you ARE the project)
- Direct commit access (no PR workflow)
- Often have automated executors/agents
- Bootstrap/early phase stability matters
#### Default Recommendation: Stay Single-Repo
**For most self-hosting projects, single-repo is the right choice:**
```bash
# Simple, stable, proven
$ cd ~/my-project
$ bd init
$ bd create "Task" -p 1
# → .beads/beads.jsonl (committed to project repo)
```
**Why single-repo for self-hosting**:
- **Simpler**: No config, no routing decisions, no multi-repo complexity
- **Proven**: Current architecture, battle-tested
- **Sufficient**: All issues live with the project they describe
- **Stable**: No hydration overhead, no cross-repo coordination
#### When to Adopt Multi-Repo
Multi-repo makes sense for self-hosting projects only in specific scenarios:
**Scenario 1: Team Planning Separation**
Your project has multiple developers with different permission levels:
```toml
# .beads/config.toml
[repos]
primary = "." # Canonical project issues (maintainers only)
additional = ["~/team-shared/.beads"] # Team planning (all contributors)
```
**Scenario 2: Multi-Phase Development**
Your project uses distinct phases (architecture → implementation → testing):
```toml
# .beads/config.toml
[repos]
primary = "." # Current active work
additional = [
  "~/.beads-work/architecture",   # Design decisions
  "~/.beads-work/implementation", # Implementation backlog
]
```
**Scenario 3: Experimental Work Isolation**
You want to keep experimental ideas separate from canonical roadmap:
```toml
# .beads/config.toml
[repos]
primary = "." # Committed roadmap
additional = ["~/.beads-experiments"] # Experimental ideas
```
#### Automated Executors with Multi-Repo
**Critical for projects like VC with automated agents.**
**Default behavior (recommended)**:
```go
// Executor sees ONLY primary repo (canonical work)
store, err := beadsLib.NewSQLiteStorage(".beads/vc.db")
// No config.toml = single-repo mode
ready, err := store.GetReadyWork(ctx) // Only canonical issues
```
**With multi-repo (opt-in)**:
```toml
# .beads/config.toml
[repos]
primary = "."
additional = ["~/team-shared/.beads"]
[routing]
default = "." # Executor-created issues stay in primary
```
```go
// Executor code (unchanged)
store, err := beadsLib.NewSQLiteStorage(".beads/vc.db")
// Auto-reads config.toml, hydrates from both repos
ready, err := store.GetReadyWork(ctx)
// Returns issues from primary + team-shared
// When executor creates discovered issues:
discovered := &Issue{Title: "Found blocker", ...}
store.CreateIssue(discovered)
// → Goes to primary repo (routing.default = ".")
```
**Recommendation for executors**: Stay single-repo unless you have a clear team coordination need.
#### Bootstrap Phase Considerations
**If your project is in early/bootstrap phase (like VC), extra caution:**
1. **Prioritize stability**: Multi-repo adds complexity. Delay until proven need.
2. **Test thoroughly**: If adopting multi-repo, test with small repos first.
3. **Monitor performance**: Ensure executor polling loops stay sub-second (see Performance section).
4. **Plan rollback**: Keep single-repo workflow working so you can revert if needed.
**Bootstrap-phase checklist**:
- [ ] Do you have multiple developers with different permissions? → Maybe multi-repo
- [ ] Do you have team planning separate from executor roadmap? → Maybe multi-repo
- [ ] Are you solo or small team with unified planning? → Stay single-repo
- [ ] Is executor stability critical right now? → Stay single-repo
- [ ] Can you afford multi-repo testing/debugging time? → If no, stay single-repo
#### Migration Path for Self-Hosting Projects
**From single-repo to multi-repo (when ready)**:
```bash
# Step 1: Create planning repo
$ mkdir ~/.beads-planning && cd ~/.beads-planning
$ git init && bd init
# Step 2: Configure multi-repo (test mode)
$ cd ~/my-project
$ bd config --add-repo ~/.beads-planning --routing auto
# Step 3: Test with small workload
$ bd create "Test issue" --repo ~/.beads-planning
$ bd show # Verify hydration works
$ bd ready # Verify queries work
# Step 4: Verify executor compatibility
# - Run executor with multi-repo config
# - Check GetReadyWork() latency (<100ms)
# - Verify discovered issues route correctly
# Step 5: Migrate planning issues (optional)
$ bd migrate --move-to ~/.beads-planning --filter "label=experimental"
```
**Rollback (if needed)**:
```bash
# Remove config.toml to revert to single-repo
$ rm .beads/config.toml
$ bd show # Back to single-repo mode
```
#### Summary: Self-Hosting Decision Tree
```
Is your project self-hosting? (building itself with beads)
├─ YES
│ ├─ Solo developer or unified team?
│ │ └─ Stay single-repo (simple, stable)
│ ├─ Multiple developers, different permissions?
│ │ └─ Consider multi-repo (team planning separation)
│ ├─ Multi-phase development (arch → impl → test)?
│ │ └─ Consider multi-repo (phase isolation)
│ ├─ Bootstrap/early phase?
│ │ └─ Stay single-repo (stability > flexibility)
│ └─ Automated executor?
│ └─ Stay single-repo unless team coordination needed
└─ NO (OSS contributor)
└─ Use multi-repo (planning repo separate from upstream)
```
**Bottom line for self-hosting**: Default to single-repo. Only adopt multi-repo when you have a proven, specific need.
## Design Decisions (Resolved)
### 1. Namespace Collisions: **Option B (Global Uniqueness)**
**Decision**: Use globally unique hash-based IDs that include timestamp + random component.
**Rationale** (from VC feedback):
- Option C (allow collisions) breaks dependency references: `bd dep add bd-a3f8e9 bd-b7c2d1` becomes ambiguous
- Need to support cross-repo dependencies without repo-scoped namespacing
- Hash should be: `hash(title + description + timestamp_ms + random_4bytes)`
- Collision probability: ~1 in 10^12 (acceptable)
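A sketch of this ID scheme, hashing title + description + millisecond timestamp + 4 random bytes. The 6-hex-char `bd-` prefix matches the example IDs in this doc, though the real implementation may keep a longer prefix (a 24-bit prefix alone would not reach the ~1 in 10^12 collision bound):

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"fmt"
	"time"
)

// newIssueID hashes title + description + timestamp_ms + random_4bytes
// and keeps a short hex prefix, per the decision above.
func newIssueID(title, description string) string {
	nonce := make([]byte, 4)
	if _, err := rand.Read(nonce); err != nil {
		panic(err) // crypto/rand failure is unrecoverable
	}
	ts := time.Now().UnixMilli()
	sum := sha256.Sum256([]byte(fmt.Sprintf("%s|%s|%d|%x", title, description, ts, nonce)))
	return fmt.Sprintf("bd-%x", sum[:3]) // 6 hex chars, like bd-a3f8e9
}

func main() {
	fmt.Println(newIssueID("Add dark mode", "explore CSS variables"))
}
```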
### 2. Cross-Repo Dependencies: **Yes, Fully Supported**
**Decision**: Dependencies work transparently across all repos.
**Implementation**:
- Hydrated database contains all issues from all repos
- Dependencies stored by ID only (no repo qualifier needed)
- `bd ready` checks dependency graph across all repos
- Writes route back to correct JSONL via `source_repo` metadata
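The readiness check over the hydrated set can be sketched as follows; because dependencies are bare IDs, the lookup is repo-agnostic (field names are assumptions, not the real beads schema):

```go
package main

import "fmt"

type Issue struct {
	ID        string
	Closed    bool
	DependsOn []string
}

// readyIssues returns open issues whose dependencies are all closed,
// regardless of which repo each issue hydrated from.
func readyIssues(all map[string]Issue) []string {
	var ready []string
	for id, is := range all {
		if is.Closed {
			continue
		}
		blocked := false
		for _, dep := range is.DependsOn {
			if d, found := all[dep]; !found || !d.Closed {
				blocked = true // open or missing dependency
				break
			}
		}
		if !blocked {
			ready = append(ready, id)
		}
	}
	return ready
}

func main() {
	fmt.Println(readyIssues(map[string]Issue{"bd-1": {ID: "bd-1"}})) // [bd-1]
}
```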
### 3. Routing Mechanism: **Config-Based, No Schema Changes**
**Decision**: Use config-based routing + explicit `--repo` flag. No new schema fields.
**Rationale**:
- `IssueType` already exists and is used semantically (bug, feature, task, epic, chore)
- Labels are used semantically by VC (`discovered:blocker`, `no-auto-claim`)
- Routing is a storage concern, not issue metadata
- Simpler: auto-detect maintainer vs contributor from config
- Discovered issues inherit parent's `source_repo` automatically
### 4. Performance: **Smart Caching with File Mtime Tracking**
**Decision**: SQLite DB is the cache, JSONLs are source of truth.
**Implementation**:
```go
type MultiRepoStorage struct {
    repos      []RepoConfig
    db         *sql.DB
    repoMtimes map[string]time.Time // Track file modification times
}
func (s *MultiRepoStorage) GetReadyWork(ctx context.Context) ([]Issue, error) {
    // Fast path: check if ANY JSONL changed since the last hydration
    needSync := false
    for repo, jsonlPath := range s.jsonlPaths() {
        info, err := os.Stat(jsonlPath)
        if err != nil {
            return nil, err
        }
        if info.ModTime().After(s.repoMtimes[repo]) {
            needSync = true
            s.repoMtimes[repo] = info.ModTime()
        }
    }
    // Only re-hydrate if something changed (expensive but rare)
    if needSync {
        if err := s.rehydrate(ctx); err != nil {
            return nil, err
        }
    }
    // Query is fast (in-memory SQLite)
    return s.queryIssues(ctx, "SELECT * FROM issues WHERE ...")
}
```
**Rationale**: VC's polling loop (every 5-10 seconds) requires sub-second queries. File stat is microseconds, re-parsing only when needed.
#### Performance Benchmarks and Targets
**Critical for library consumers (VC) with automated polling.**
##### Performance Targets
Based on VC's polling loop requirements (every 5-10 seconds):
| Operation | Target | Rationale |
|-----------|--------|-----------|
| **File stat** (per repo) | <1ms | Checking mtime of N JSONLs must be negligible |
| **Hydration** (full re-parse) | <500ms | Only happens when JSONL changes, rare in polling loop |
| **Query** (from cached DB) | <10ms | Common case: no JSONL changes, pure SQLite query |
| **Total GetReadyWork()** | <100ms | VC's hard requirement for a responsive executor |
##### Scale Testing Targets
Test at multiple repo counts to ensure scaling:
| Repo Count | File Stat Total | Hydration (worst case) | Query (cached) | Total (cached) |
|------------|-----------------|------------------------|----------------|----------------|
| **N=1** (baseline) | <1ms | <200ms | <5ms | <10ms |
| **N=3** (typical) | <3ms | <500ms | <10ms | <20ms |
| **N=10** (edge case) | <10ms | <2s | <50ms | <100ms |
**Assumptions**:
- JSONL size: <25k per repo (see Design Principles)
- SQLite: In-memory mode (`:memory:` or `file::memory:?cache=shared`)
- Cached case: No JSONL changes since last hydration (99% of polling loops)
##### Benchmark Suite (To Be Implemented)
```go
// benchmark/multi_repo_test.go
func BenchmarkFileStatOverhead(b *testing.B) {
// Test: Stat N JSONL files
// Target: <1ms per repo
}
func BenchmarkHydrationN1(b *testing.B) {
// Test: Full hydration from 1 JSONL (20k file)
// Target: <200ms
}
func BenchmarkHydrationN3(b *testing.B) {
// Test: Full hydration from 3 JSONLs (20k each)
// Target: <500ms
}
func BenchmarkHydrationN10(b *testing.B) {
// Test: Full hydration from 10 JSONLs (20k each)
// Target: <2s
}
func BenchmarkQueryCached(b *testing.B) {
// Test: GetReadyWork() with no JSONL changes
// Target: <10ms
}
func BenchmarkGetReadyWorkN3(b *testing.B) {
// Test: Realistic polling loop (3 repos, cached)
// Target: <20ms total
}
```
##### Performance Optimization Notes
If benchmarks fail to meet targets, optimization strategies:
1. **Parallel file stats**: Use goroutines to stat N JSONLs concurrently
2. **Incremental hydration**: Only re-parse changed repos, merge into DB
3. **Smarter caching**: Hash-based cache invalidation (mtime + file size)
4. **SQLite tuning**: `PRAGMA synchronous = OFF`, `PRAGMA journal_mode = MEMORY`
5. **Lazy hydration**: Defer hydration until first query after mtime change
##### Status
**Benchmarks**: ⏳ Not implemented yet (tracked in bd-wta)
**Targets**: ✅ Documented above
**Validation**: ⏳ Pending implementation
**Next steps**:
1. Implement benchmark suite in `benchmark/multi_repo_test.go`
2. Run benchmarks on realistic workloads (VC-sized DBs)
3. Document results in this section
4. File optimization issues if targets not met
### 5. Visibility Field: **Optional, Backward Compatible**
**Decision**: Add `visibility` as optional field, defaults to "canonical" if missing.
**Schema**:
```go
type Issue struct {
// ... existing fields ...
Visibility *string `json:"visibility,omitempty"` // nil = canonical
}
```
**States**:
- `local`: Personal planning only
- `proposed`: Staged for upstream PR
- `canonical`: Accepted by upstream (or default for existing issues)
**Orthogonality**: Visibility and Status are independent:
- `status: in_progress, visibility: local` → Working on personal planning
- `status: open, visibility: proposed` → Proposed to upstream, awaiting review
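The backward-compatibility claim is easy to verify: with a pointer field and `omitempty`, existing JSONL lines never mention the field, and readers treat a missing value as canonical. A minimal sketch with a trimmed `Issue` struct (only the fields needed for the demonstration):

```go
import "encoding/json"

type Issue struct {
    ID         string  `json:"id"`
    Status     string  `json:"status"`
    Visibility *string `json:"visibility,omitempty"` // nil = canonical
}

// toJSONL renders one issue as a JSONL line.
func toJSONL(i Issue) (string, error) {
    b, err := json.Marshal(i)
    return string(b), err
}

// effectiveVisibility treats a missing field as "canonical".
func effectiveVisibility(i Issue) string {
    if i.Visibility == nil {
        return "canonical"
    }
    return *i.Visibility
}
```

Old issues round-trip unchanged; only `local` and `proposed` issues ever serialize the new field.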
### 6. Library API Stability: **Transparent Hydration**
**Decision**: Hybrid approach - transparent by default, explicit opt-in available.
**Backward Compatible**:
```go
// Existing code keeps working - reads config.toml automatically
store, err := beadsLib.NewSQLiteStorage(".beads/vc.db")
```
**Explicit Override**:
```go
// Library consumers can override config
cfg := beadsLib.Config{
Primary: ".beads/vc.db",
Additional: []string{"~/.beads-planning"},
}
store, err := beadsLib.NewStorageWithConfig(cfg)
```
### 7. ACID Guarantees: **Per-Repo File Locking**
**Decision**: Use file-based locks per JSONL, atomic within single repo.
**Implementation**:
```go
func (s *Storage) UpdateIssue(issue Issue) error {
    sourceRepo := issue.SourceRepo
    // Lock that repo's JSONL (advisory lock, e.g. github.com/gofrs/flock)
    lock := flock.New(filepath.Join(sourceRepo, "beads.jsonl.lock"))
    if err := lock.Lock(); err != nil {
        return err
    }
    defer lock.Unlock()
    // Read-modify-write
    issues := s.readJSONL(sourceRepo)
    issues.Update(issue)
    if err := s.writeJSONL(sourceRepo, issues); err != nil {
        return err
    }
    // Update in-memory DB
    return s.db.Update(issue)
}
```
**Limitation**: Cross-repo transactions are NOT atomic (acceptable, rare use case).
#### Compatibility with Exclusive Lock Protocol
The per-repo file locking (Decision #7) is **fully compatible** with the existing exclusive lock protocol (see [EXCLUSIVE_LOCK.md](../EXCLUSIVE_LOCK.md)).
**How they work together**:
1. **Exclusive locks are daemon-level**: The `.beads/.exclusive-lock` prevents the bd daemon from operating on a specific database
2. **File locks are operation-level**: Per-JSONL file locks (`flock`) ensure atomic read-modify-write for individual operations
3. **Different scopes, complementary purposes**:
- Exclusive lock: "This entire database is off-limits to the daemon"
- File lock: "This specific JSONL is being modified right now"
**Multi-repo behavior**:
With multi-repo configuration, each repo can have its own exclusive lock:
```bash
# VC executor locks its primary database
.beads/.exclusive-lock # Locks primary repo operations
# Planning repo can be locked independently
~/.beads-planning/.exclusive-lock # Locks planning repo operations
```
**When both are active**:
- If primary repo is locked: Daemon skips all operations on primary, but can still sync planning repo
- If planning repo is locked: Daemon skips planning repo, but can still sync primary
- If both locked: Daemon skips entire multi-repo workspace
**No migration needed for library consumers**:
Existing VC code (v0.17.3+) using exclusive locks will continue to work:
```go
// VC's existing lock acquisition
lock, _ := types.NewExclusiveLock("vc-executor", "1.0.0")
data, _ := json.Marshal(lock) // lock file contents per EXCLUSIVE_LOCK.md
lockPath := filepath.Join(".beads", ".exclusive-lock")
os.WriteFile(lockPath, data, 0644)
// Works the same with multi-repo:
// - Locks .beads/ (primary repo)
// - Daemon skips primary, can still sync ~/.beads-planning if configured
```
**Atomic multi-repo locking**:
If a library consumer needs to lock **all** repos atomically:
```go
// Lock all configured repos
home, _ := os.UserHomeDir() // filepath has no ExpandUser; resolve ~ manually
repos := []string{".beads", filepath.Join(home, ".beads-planning")}
for _, repo := range repos {
lockPath := filepath.Join(repo, ".exclusive-lock")
os.WriteFile(lockPath, lockData, 0644)
}
defer func() {
for _, repo := range repos {
os.Remove(filepath.Join(repo, ".exclusive-lock"))
}
}()
// Now daemon skips all repos until locks released
```
**Summary**: No breaking changes. Exclusive locks work per-repo in multi-repo configs, preventing daemon interference at repo granularity.
## Key Learnings from VC Feedback
The VC project (VibeCoder) provided critical feedback as a real downstream consumer that uses beads as a library. Key insights:
### 1. Two Consumer Models
Beads has two distinct consumer types:
- **CLI users**: Use `bd` commands directly
- **Library consumers**: Use `beadsLib` in Go/TypeScript/etc. (like VC)
Multi-repo must work transparently for both.
### 2. Performance is Critical for Automation
VC's executor polls `GetReadyWork()` every 5-10 seconds. Multi-repo hydration must:
- Use smart caching (file mtime tracking)
- Avoid re-parsing JSONLs on every query
- Keep queries sub-second (ideally <100ms)
### 3. Special Labels Must Work Across Repos
VC uses semantic labels that must work regardless of repo:
- `discovered:blocker` - Auto-generated blocker issues (priority boost)
- `discovered:related` - Auto-generated related work
- `no-auto-claim` - Prevent executor from claiming
- `baseline-failure` - Self-healing baseline failures
These are **semantic labels**, not routing labels. Don't overload labels for routing.
### 4. Discovered Issues Routing
When VC's analysis phase auto-creates issues with `discovered:blocker` label, they should:
- Inherit parent's `source_repo` automatically
- Stay co-located with related work
- Not require manual routing decisions
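The inheritance rule is small enough to sketch directly. This is a hypothetical helper (not VC or beads code); the label and field names follow the examples above:

```go
type Issue struct {
    Title      string
    Labels     []string
    SourceRepo string
}

// newDiscoveredIssue creates an auto-generated blocker that stays
// co-located with the work that surfaced it: no routing decision needed.
func newDiscoveredIssue(parent Issue, title string) Issue {
    return Issue{
        Title:      title,
        Labels:     []string{"discovered:blocker"},
        SourceRepo: parent.SourceRepo, // inherit, never re-route
    }
}
```

Whether the parent lives in the upstream repo or the planning repo, its discovered children land in the same JSONL automatically.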
### 5. Library API Stability is Non-Negotiable
VC's code uses `beadsLib.NewSQLiteStorage()`. Must not break. Solution:
- Read `.beads/config.toml` automatically (transparent)
- Provide `NewStorageWithConfig()` for explicit override
- Hydration happens at storage layer, invisible to library consumers
## Remaining Open Questions
1. **Sync semantics**: When upstream accepts a proposed issue and modifies it, how to sync back?
- Option A: Mark as "accepted" in planning repo, keep both copies
- Option B: Delete from planning repo (it's now canonical)
- Option C: Keep in planning repo but mark as read-only mirror
2. **Discovery**: How do users learn about this feature?
- Auto-prompt when detecting fork/contributor status?
- Docs + examples?
- `bd init --contributor` wizard?
3. **Metadata fields**: Should `source_repo` be exposed in JSON export, or keep it internal to storage layer?
4. **Proposed issue lifecycle**: What happens to proposed issues after PR is merged/rejected?
- Auto-delete from planning repo?
- Mark as "accepted" or "rejected"?
- Manual cleanup via `bd withdraw`?
## Success Metrics
How we'll know this works:
1. **Zero pollution**: No contributor planning issues accidentally merged upstream
2. **Multi-clone sync**: Workers on different machines see consistent state (via VCS sync)
3. **Flexibility**: Users can configure for their workflow (personas, phases, etc.)
4. **Backwards compat**: Existing single-repo users unaffected
5. **VCS-agnostic**: Works with git, jj, hg, sl, or no VCS
## Next Actions
Suggested epics/issues to create (can be done in follow-up session):
1. **Epic: Multi-repo hydration layer**
- Design schema for source_repo metadata
- Implement config parsing for repos.additional
- Build hydration logic (read from N JSONLs)
- Build write routing (write to correct JSONL)
2. **Epic: Proposal workflow**
- Implement `bd propose` command
- Implement `bd withdraw` command
- Implement `bd accept` command (maintainer only)
- Design visibility state machine
3. **Epic: Auto-routing**
- Detect maintainer vs contributor status
- Implement routing rules (label, priority, custom)
- Make `bd add` route to correct repo
4. **Epic: VCS-agnostic sync**
- Detect VCS type per repo
- Implement sync adapters (git, jj, hg, sl)
- Handle mixed-VCS multi-repo configs
5. **Epic: Migration and onboarding**
- Write migration guide
- Implement `bd migrate` command
- Create init wizards for common scenarios
- Update documentation
---
## Summary and Next Steps
This document represents the design evolution for multi-repo support in beads, driven by:
1. **Original problem** (GitHub #207): Contributors' personal planning pollutes upstream PRs
2. **Core insight**: Beads is a separate communication channel that happens to use VCS
3. **VC feedback**: Real-world library consumer with specific performance and API stability needs
### Final Architecture
**Solution #4 (Separate Repos)** with these refinements:
- **N ≥ 1 repos**: Single repo (N=1) is default, multi-repo is opt-in
- **VCS-agnostic**: Works with git, jj, hg, sapling, or no VCS
- **Config-based routing**: No schema changes, auto-detect maintainer vs contributor
- **Smart caching**: File mtime tracking, SQLite DB as cache layer
- **Transparent hydration**: Library API remains stable, config-driven
- **Global namespace**: Hash-based IDs with timestamp + random for uniqueness
- **Cross-repo dependencies**: Fully supported, transparent to users
- **Discovered issues**: Inherit parent's source_repo automatically
### Why This Design Wins
1. **Zero PR pollution**: Separate git histories = impossible to accidentally merge planning
2. **Git ledger preserved**: All repos are VCS-tracked, full forensic capability
3. **Maximum flexibility**: Supports OSS contributors, multi-phase dev, multi-persona workflows
4. **Backward compatible**: Existing single-repo users unchanged
5. **Performance**: Sub-second queries even with polling loops
6. **Library-friendly**: Transparent to downstream consumers like VC
### Related Documents
- Original issue: GitHub #207
- VC feedback: `./vc-feedback-on-multi-repo.md`
- Implementation tracking: TBD (epics to be created)
### Status
**Design**: ✅ Complete (pending resolution of open questions)
**Implementation**: ⏳ Not started
**Target**: TBD
Last updated: 2025-11-03