Beads Contributor Workflow Analysis
Date: 2025-11-03
Context: Design discussion on how to handle beads issues in PR/OSS contribution workflows
The Problem (from #207)
When contributing to OSS projects with beads installed:
- Git hooks automatically commit contributor's personal planning to PRs
- Contributor's experimental musings pollute the upstream project's issue tracker
- No clear ownership/permission model for external contributors
- Difficult to keep beads changes out of commits
Core tension: Beads is great for team planning (shared namespace), but breaks down for OSS contributions (hierarchical gatekeeping).
Key Insights from Discussion
Beads as "Moving Frontier"
Beads is not a traditional issue tracker. It captures the active working set - the sliding window of issues currently under attention:
- Work moves fast with AI agents (10x-50x acceleration)
- Completed work fades quickly (95% never revisited, should be pruned aggressively)
- Future work is mostly blocked (small frontier of ready tasks)
- The frontier is bounded by team size (dozens to hundreds of issues, not thousands)
Design principle: Beads should focus on the "what's next" cloud, not long-term planning or historical archive.
The Git Ledger is Fundamental
Beads stays reliable in practice despite its unreliable parts (merge conflicts, sync issues, data staleness) because of two things:
A. Git is the ledger and immutable backstop for forensics
B. AI is the ultimate arbiter and problem-solver when things go wrong
Any solution that removes the git ledger (e.g., gitignored contributor files) breaks this model entirely.
Requirements for Contributors
Contributors need:
- Git-backed persistence (multi-clone sync, forensics, AI repair)
- Isolated planning space (don't pollute upstream)
- Ability to propose selected issues upstream
- Support for multiple workers across multiple clones of the same repo
Proposed Solutions
Idea 1: Fork-Aware Hooks + Two-File System
Structure:
# Upstream repo
.beads/
  beads.jsonl # Canonical frontier (committed)
  .gitignore # Ignores local.jsonl
# Contributor's fork
.beads/
  beads.jsonl # Synced from upstream (read-only)
  local.jsonl # Contributor planning (committed to fork)
  beads.db # Hydrated from both
Detection: Check for upstream remote to distinguish fork from canonical repo
Workflow:
# In fork
$ bd add "Experiment" # → local.jsonl (committed to fork)
$ bd sync # → Pulls upstream's beads.jsonl
$ bd show # → Shows both
$ bd propose bd-a3f8e9 # → Moves issue to beads.jsonl for PR
Pros:
- Git ledger preserved (local.jsonl committed to fork)
- Multi-clone sync works
- Upstream .gitignore prevents pollution
Cons:
- Fork detection doesn't help teams using branches (most common workflow)
- Two files to manage
- Requires discipline to use bd propose
Idea 2: Ownership Metadata + Smart PR Filtering
Structure:
{"id":"bd-123","owner":"upstream","title":"Canonical issue",...}
{"id":"bd-456","owner":"stevey","title":"My planning",...}
Workflow:
$ bd add "Experiment" # → Creates with owner="stevey"
$ bd propose bd-456 # → Changes owner to "upstream"
$ bd clean-pr # → Filters commit to only upstream-owned issues
$ git push # → PR contains only proposed issues
Pros:
- Single file (simpler)
- Works with any git workflow (branch, fork, etc)
- Git ledger fully preserved
Cons:
- Requires discipline to run bd clean-pr
- Clean commit is awkward (temporarily removing data)
- Merge conflicts if upstream and contributor both modify beads.jsonl
Idea 3: Branch-Scoped Databases
Track which issues belong to which branch, filter at PR time.
Implementation: Similar to #2 but uses labels/metadata to track branch instead of owner.
Challenge: Complex with multiple feature branches, requires tracking branch scope.
Idea 4: Separate Planning Repo (Most Isolated)
Structure:
# Main project repos (many)
~/projects/beads/.beads/beads.jsonl
~/projects/foo/.beads/beads.jsonl
# Single planning repo (one)
~/.beads-planning/.beads/beads.jsonl
# Configuration links them
~/projects/beads/.beads/config.toml:
planning_repo = "~/.beads-planning"
Workflow:
$ cd ~/projects/beads
$ bd add "My idea" # → Commits to ~/.beads-planning/
$ bd show # → Shows beads canonical + my planning
$ bd propose bd-a3f8e9 # → Exports to beads repo for PR
Pros:
- Complete isolation (separate git histories, zero PR pollution risk)
- Git ledger fully preserved (both repos tracked)
- Multi-clone works perfectly (clone both repos)
- No special filtering/detection needed
- Scales better: One planning repo for all projects
Cons:
- Two repos to manage
- Less obvious for new users (where's my planning?)
Analysis: Fork vs Clone vs Branch
Clone: Local copy of a repo (git clone <url>)
- origin remote points to source
- Push directly to origin (if you have write access)
Fork: Server-side copy on GitHub
- For contributors without write access
- origin → your fork, upstream → original repo
- Push to fork, then PR from fork → upstream
Branch: Feature branches in same repo
- Most common for teams with write access
- Push to same repo, PR from branch → main
Key insight: Branches are universal, forks are only for external contributors. Most teams work on branches in a shared repo.
Current Thinking: Idea 4 is Cleanest
After analysis, separate planning repo (#4) is likely the best solution because:
- Only solution that truly prevents PR pollution (separate git histories)
- Git ledger fully preserved (both repos tracked)
- Multi-clone works perfectly (just clone both)
- No complex filtering/detection needed (simple config)
- Better scaling: One planning repo across all projects you contribute to
The "managing two repos" concern is actually an advantage: your planning is centralized and project-agnostic.
Open Questions
About the Workflow
- Where does PR pollution actually happen?
  - Scenario A: Feature branch → upstream/main includes all beads changes from that branch?
  - Scenario B: Something else?
- Multi-clone usage pattern:
  - Multiple clones on different machines?
  - All push/pull to same remote?
  - Workers coordinate via git sync?
  - PRs created from feature branches?
About Implementation
- Proposed issue IDs: When moving issue from planning → canonical, keep same ID? (Hash-based IDs are globally unique)
- Upstream acceptance sync: If upstream accepts/modifies a proposal, how to sync back to contributor?
  - bd sync detects accepted proposals
  - Moves from planning repo to project's canonical beads.jsonl
- Multiple projects: One planning repo for all projects you contribute to, or one per project?
- Backwards compatibility: Single-user projects unchanged (single beads.jsonl)
- Discovery: How do users discover this feature? Auto-detect and prompt?
Next Steps
Need to clarify:
- User's actual multi-clone workflow (understand the real use case)
- Where exactly PR pollution occurs (branch vs fork workflow)
- Which solution best fits the "git ledger + multi-clone" requirements
- Whether centralized planning repo (#4) or per-project isolation (#1/#2) is preferred
Design Principles to Preserve
From the conversation, these are non-negotiable:
- Git as ledger: Everything must be git-tracked for forensics and AI repair
- Moving frontier: Focus on active work, aggressively prune completed work
- Multi-clone sync: Workers across clones must coordinate via git
- Small databases: Keep beads.jsonl small enough for agents to read (<25k per repo, see below)
- Simple defaults: Don't break single-user workflows
- Explicit over implicit: Clear boundaries between personal and canonical
JSONL Size Bounds with Multi-Repo
Critical clarification: The <25k limit applies per-repo, not to total hydrated size.
The Rule
Per-repo limit: Each individual JSONL file should stay <25k, meaning 25 KiB (25 × 1024 bytes), or roughly 100-200 issues depending on metadata.
Why per-repo, not total:
- Git operations: Each repo is independently versioned. Git performance depends on per-file size, not aggregate.
- AI readability: Agents read JSONLs for forensics/repair. Reading one 20k file is easy; reading the union of 10 files is still manageable.
- Bounded growth: Total size naturally bounded by number of repos (typically N=1-3, rarely >10).
- Pruning granularity: Completed work is pruned per-repo, keeping each repo's frontier small.
Example Scenarios
| Primary | Planning | Team Shared | Total Hydrated | Valid? |
|---|---|---|---|---|
| 20k | - | - | 20k | ✅ Single-repo, well under limit |
| 20k | 15k | - | 35k | ✅ Each repo <25k (per-repo rule) |
| 20k | 15k | 18k | 53k | ✅ Each repo <25k (per-repo rule) |
| 30k | 15k | - | 45k | ❌ Primary exceeds 25k |
| 20k | 28k | - | 48k | ❌ Planning exceeds 25k |
Rationale: Why 25k?
Agent context limits: AI agents have finite context windows. A 25k JSONL file is:
- ~100-200 issues with metadata
- ~500-1000 lines of JSON
- Comfortably fits in GPT-4 context (128k tokens)
- Small enough to read/parse in <500ms
Moving frontier principle: Beads tracks active work, not historical archive. With aggressive pruning:
- Completed issues get compacted/archived
- Blocked work stays dormant
- Only ready + in-progress issues are "hot"
- Typical frontier: 50-100 issues per repo
Monitoring Size with Multi-Repo
Per-repo monitoring:
# Check each repo's JSONL size
$ wc -c .beads/beads.jsonl
20480 .beads/beads.jsonl
$ wc -c ~/.beads-planning/beads.jsonl
15360 ~/.beads-planning/beads.jsonl
# Total hydrated size (informational, not a hard limit)
$ expr 20480 + 15360
35840
Automated check:
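// Check all configured repos (cfg.Repos.All() is assumed to return repo paths)
for _, repo := range cfg.Repos.All() {
    jsonlPath := filepath.Join(repo, "beads.jsonl")
    info, err := os.Stat(jsonlPath)
    if err != nil {
        log.Printf("Cannot stat %s: %v", jsonlPath, err)
        continue
    }
    if size := info.Size(); size > 25*1024 { // 25k
        log.Printf("Repo %s exceeds 25k limit: %d bytes", repo, size)
    }
}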
Pruning Strategy with Multi-Repo
Each repo should be pruned independently:
# Prune completed work from primary repo
$ bd compact --repo . --older-than 30d
# Prune experimental planning repo
$ bd compact --repo ~/.beads-planning --older-than 7d
# Shared team planning (longer retention)
$ bd compact --repo ~/team-shared/.beads --older-than 90d
Different repos can have different retention policies based on their role.
Total Size Soft Limit (Guideline Only)
While the per-repo limit is the hard rule, also consider total hydrated size for performance:
Guideline: Keep total hydrated size <100k for optimal performance.
Why 100k total:
- SQLite hydration: Parsing 100k JSON still fast (<1s)
- Agent queries: Dependency graphs with 300-500 total issues remain tractable
- Memory footprint: In-memory SQLite comfortably handles 500 issues
If total exceeds 100k:
- Not a hard error, but performance may degrade
- Consider pruning completed work more aggressively
- Evaluate if all repos are still needed
- Check if any repos should be archived/removed
Summary
| Limit Type | Value | Enforcement |
|---|---|---|
| Per-repo (hard limit) | <25k | ⚠️ Warn if exceeded, agents may struggle |
| Total hydrated (guideline) | <100k | ℹ️ Informational, affects performance |
| Typical usage | 20k-50k total | ✅ Expected range for active development |
Bottom line: Monitor per-repo size (<25k each). Total size naturally bounded by N repos × 25k.
Decision: Separate Repos (Solution #4)
Date: 2025-11-03 (continued discussion)
Why Separate Repos
After consideration, Solution #4 (Separate Planning Repos) is the chosen approach:
Key Rationale
- Beads as a Separate Channel: Beads is fundamentally a separate communication channel that happens to use git/VCS for persistence, not a git-centric tool. It should work with any VCS (jujutsu, sapling, mercurial, etc.).
- VCS-Agnostic Design: Solution #1 (fork detection) is too git-centric and wouldn't work with other version control systems. Separate repos work regardless of VCS.
- Maximum Flexibility: Supports multiple workflows and personas:
  - OSS contributor with personal planning
  - Multi-phase development (different beads DBs for different stages)
  - Multiple personas (architect, implementer, reviewer)
  - Team vs personal planning separation
- Zero PR Pollution Risk: Completely separate git histories guarantee no accidental pollution of upstream projects.
- Proven Pain Point: Experience shows that accidental bulk commits (100k issues) can be catastrophic and traumatic to recover from. Complete isolation is worth the complexity.
Core Architecture Principles
1. Multi-Repo Support (N ≥ 1)
Configuration should support N repos, including N=1 for backward compatibility. When N=1 (default), this is the current single-repo workflow and no changes are needed. When N≥2, multiple repos are hydrated together:
# .beads/config.toml
# Default mode: single repo (backwards compatible)
mode = "single"
# Multi-repo mode
[repos]
# Primary repo: where canonical issues live
primary = "."
# Additional repos to hydrate into the database
additional = [
"~/.beads-planning", # Personal planning across all projects
"~/.beads-work/phase1", # Architecting phase
"~/.beads-work/phase2", # Implementation phase
"~/team-shared/.beads", # Shared team planning
]
# Routing: where do new issues go?
[routing]
mode = "auto" # auto | explicit
default = "~/.beads-planning" # Default for `bd add`
# Auto-detection: based on user permissions
[routing.auto]
maintainer = "." # If maintainer, use primary
contributor = "~/.beads-planning" # Otherwise use planning repo
2. Hydration Model
On bd show, bd list, etc., the database hydrates from multiple sources:
beads.db ← [
./.beads/beads.jsonl (primary, read-write if maintainer)
~/.beads-planning/beads.jsonl (personal, read-write)
~/team-shared/.beads/beads.jsonl (shared, read-write if team member)
]
Metadata tracking:
{
"id": "bd-a3f8e9",
"title": "Add dark mode",
"source_repo": "~/.beads-planning", # Which repo owns this issue
"visibility": "local", # local | proposed | canonical
...
}
3. Visibility States
Issues can be in different states of visibility:
- local: Personal planning, only in planning repo
- proposed: Exported for upstream consideration (staged for PR)
- canonical: In the primary repo (upstream accepted it)
4. VCS-Agnostic Operations
Beads should not assume git. Core operations:
- Sync: bd sync should work with git, jj, hg, sl, etc.
- Ledger: Each repo uses whatever VCS it's under (or none)
- Transport: Issues move between repos via export/import, not git-specific operations
Workflow Examples
Use Case 1: OSS Contributor
# One-time setup
$ mkdir ~/.beads-planning
$ cd ~/.beads-planning
$ git init && bd init
# Contributing to upstream project
$ cd ~/projects/some-oss-project
$ bd config --add-repo ~/.beads-planning --routing contributor
# Work
$ bd add "Explore dark mode implementation"
# → Goes to ~/.beads-planning/beads.jsonl
# → Commits to planning repo (git tracked, forensic trail)
$ bd show
# → Shows upstream's canonical issues (read-only)
# → Shows my planning issues (read-write)
$ bd work bd-a3f8e9
$ bd status bd-a3f8e9 in-progress
# Ready to propose
$ bd propose bd-a3f8e9 --target upstream
# → Exports issue from planning repo
# → Creates issue in ./.beads/beads.jsonl (staged for PR)
# → Marks as visibility="proposed" in planning repo
$ git add .beads/beads.jsonl
$ git commit -m "Propose: Add dark mode"
$ git push origin feature-branch
# → PR contains only the proposed issue, not all my planning
Use Case 2: Multi-Phase Development
# Setup phases
$ mkdir -p ~/.beads-work/{architecture,implementation,testing}
$ for dir in ~/.beads-work/*; do (cd $dir && git init && bd init); done
# Configure project
$ cd ~/my-big-project
$ bd config --add-repo ~/.beads-work/architecture
$ bd config --add-repo ~/.beads-work/implementation
$ bd config --add-repo ~/.beads-work/testing
# Architecture phase
$ bd add "Design authentication system" --repo ~/.beads-work/architecture
$ bd show --repo ~/.beads-work/architecture
# → Only architecture issues
# Implementation phase (later)
$ bd add "Implement JWT validation" --repo ~/.beads-work/implementation
# View all phases
$ bd show
# → Shows all issues from all configured repos
Use Case 3: Multiple Contributors on Same Project
# Team member Alice (maintainer)
$ cd ~/project
$ bd add "Fix bug in parser"
# → Goes to ./.beads/beads.jsonl (she's maintainer)
# → Commits to project repo
# Team member Bob (contributor)
$ cd ~/project
$ bd add "Explore performance optimization"
# → Goes to ~/.beads-planning/beads.jsonl (he's contributor)
# → Does NOT pollute project repo
$ bd show
# → Sees Alice's canonical issue
# → Sees his own planning
$ bd propose bd-xyz
# → Proposes to Alice's canonical repo
Implementation Outline
Phase 1: Core Multi-Repo Support
Commands:
bd config --add-repo <path> # Add a repo to hydration
bd config --remove-repo <path> # Remove a repo
bd config --list-repos # Show all configured repos
bd config --routing <mode> # Set routing: single|auto|explicit
Config schema:
[repos]
primary = "."
additional = ["path1", "path2", ...]
[routing]
default = "path" # Where `bd add` goes by default
mode = "auto" # auto | explicit
Database changes:
- Add source_repo field to issues
- Hydration layer reads from multiple JSONLs
- Writes go to correct JSONL based on source_repo
Phase 2: Proposal Flow
Commands:
bd propose <issue-id> [--target <repo>] # Move issue to target repo
bd withdraw <issue-id> # Un-propose (move back)
bd accept <issue-id> # Maintainer accepts proposal
States:
- visibility: local → Personal planning
- visibility: proposed → Staged for PR
- visibility: canonical → Accepted by upstream
Phase 3: Routing Rules
Auto-detection:
- Detect if user is maintainer (git config, permissions)
- Auto-route to primary vs planning repo
Config-based routing (no new schema fields):
[routing]
mode = "auto" # auto | explicit
default = "~/.beads-planning" # Fallback for contributors
# Auto-detection rules
[routing.auto]
maintainer = "." # If user is maintainer, use primary repo
contributor = "~/.beads-planning" # Otherwise use planning repo
Explicit routing via CLI flag:
# Override auto-detection for specific issues
bd add "Design system" --repo ~/.beads-work/architecture
Discovered issue inheritance:
- Issues with parent_id automatically inherit parent's source_repo
- Keeps related work co-located
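The routing rules in this phase can be combined into a single resolution function. The sketch below is illustrative only: the struct and function names are hypothetical, and the precedence order (explicit --repo flag, then the parent's source_repo, then auto-detection, then the default) is an assumption, since the document does not state how the rules rank against each other.

```go
package main

import "fmt"

// RoutingConfig mirrors the [routing] and [routing.auto] tables above.
type RoutingConfig struct {
	Mode        string // "auto" or "explicit"
	Default     string // routing.default
	Maintainer  string // routing.auto.maintainer
	Contributor string // routing.auto.contributor
}

// NewIssueRequest carries the inputs that influence routing (hypothetical type).
type NewIssueRequest struct {
	ExplicitRepo     string // value of --repo, empty if not given
	ParentSourceRepo string // source_repo of the parent issue, empty if none
	IsMaintainer     bool   // result of maintainer detection, however it is done
}

// resolveRepo picks the repo a new issue is written to.
// Assumed precedence: --repo flag > parent's repo > auto-detection > default.
func resolveRepo(cfg RoutingConfig, req NewIssueRequest) string {
	if req.ExplicitRepo != "" {
		return req.ExplicitRepo
	}
	if req.ParentSourceRepo != "" {
		return req.ParentSourceRepo
	}
	if cfg.Mode == "auto" {
		if req.IsMaintainer && cfg.Maintainer != "" {
			return cfg.Maintainer
		}
		if !req.IsMaintainer && cfg.Contributor != "" {
			return cfg.Contributor
		}
	}
	return cfg.Default
}

func main() {
	cfg := RoutingConfig{
		Mode:        "auto",
		Default:     "~/.beads-planning",
		Maintainer:  ".",
		Contributor: "~/.beads-planning",
	}
	fmt.Println(resolveRepo(cfg, NewIssueRequest{IsMaintainer: true}))
	fmt.Println(resolveRepo(cfg, NewIssueRequest{ExplicitRepo: "~/.beads-work/architecture"}))
}
```

Keeping the decision in one pure function makes the precedence easy to test and to change if a different ordering turns out to be preferable.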
Phase 4: VCS-Agnostic Sync
Sync operations:
- Detect VCS type per repo (git, jj, hg, sl)
- Use appropriate sync commands
- Fall back to manual sync if no VCS
Example:
$ bd sync
# Auto-detects:
# - . is git → runs git pull
# - ~/.beads-planning is jj → runs jj git fetch && jj rebase
# - ~/other is hg → runs hg pull && hg update
Migration Path
Existing Users (Single Repo)
No changes required. Current workflow continues to work:
$ bd add "Task"
# → .beads/beads.jsonl (as before)
Library Consumers (Go/TypeScript)
Critical for projects like VC that use beads as a library.
Backward Compatibility (No Changes Required)
Your existing code continues to work unchanged. The storage layer automatically reads .beads/config.toml if present:
// Before multi-repo (v0.17.3)
store, err := beadsLib.NewSQLiteStorage(".beads/vc.db")
// After multi-repo (v0.18.0+) - EXACT SAME CODE
store, err := beadsLib.NewSQLiteStorage(".beads/vc.db")
// If .beads/config.toml exists, additional repos are auto-hydrated
// If .beads/config.toml doesn't exist, single-repo mode (backward compatible)
What happens automatically:
- Storage layer checks for .beads/config.toml
- If found: Reads repos.additional, hydrates from all configured repos
- If not found: Single-repo mode (current behavior)
- Your code doesn't need to know which mode is active
Explicit Multi-Repo Configuration (Optional)
If you need to override config.toml or configure repos programmatically:
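// Explicit multi-repo configuration.
// Go's filepath package has no "~" expansion, so resolve the home directory first.
home, err := os.UserHomeDir()
if err != nil {
    return err
}
cfg := beadsLib.Config{
    Primary: ".beads/vc.db",
    Additional: []string{
        filepath.Join(home, ".beads-planning"),
        filepath.Join(home, "team-shared", ".beads"),
    },
}
store, err := beadsLib.NewStorageWithConfig(cfg)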
When to use explicit configuration:
- Testing: Override config for test isolation
- Dynamic repos: Add repos based on runtime conditions
- No config file: Programmatic setup without .beads/config.toml
When to Use Multi-Repo vs Single-Repo
Single-repo (default, recommended for most library consumers):
// VC executor managing its own database
store, err := beadsLib.NewSQLiteStorage(".beads/vc.db")
// Stays single-repo by default, no config.toml needed
Multi-repo (opt-in for specific use cases):
- Team planning: VC executor needs to see team-wide issues from shared repo
- Multi-phase dev: Different repos for architecture, implementation, testing phases
- Personal planning: User wants to track personal experiments separate from VC's canonical DB
Example: VC with team planning:
# .beads/config.toml
[repos]
primary = "."
additional = ["~/team-shared/.beads"]
[routing]
default = "." # VC-generated issues go to primary
// VC executor code (unchanged)
store, err := beadsLib.NewSQLiteStorage(".beads/vc.db")
// GetReadyWork() now returns issues from:
// - .beads/vc.db (VC-generated issues)
// - ~/team-shared/.beads (team planning)
ready, err := store.GetReadyWork(ctx)
Migration Checklist for Library Consumers
- Test with config.toml: Create .beads/config.toml, verify auto-hydration works
- Verify performance: Ensure multi-repo hydration meets your latency requirements (see Performance section)
- Update exclusive locks: If using locks, decide if you need per-repo or all-repo locking (see Exclusive Lock Protocol section)
- Review routing: Ensure auto-generated issues (e.g., VC's discovered:blocker) go to correct repo
- Test backward compat: Verify code works with and without config.toml
API Compatibility Matrix
| API Call | v0.17.3 (single-repo) | v0.18.0+ (multi-repo) | Breaking? |
|---|---|---|---|
| NewSQLiteStorage(path) | ✅ Single repo | ✅ Auto-detects config | ❌ No |
| GetReadyWork() | ✅ Returns from single DB | ✅ Returns from all repos | ❌ No |
| CreateIssue() | ✅ Writes to single DB | ✅ Writes to primary (or routing config) | ❌ No |
| UpdateIssue() | ✅ Updates in single DB | ✅ Updates in source repo | ❌ No |
| Exclusive locks | ✅ Locks single DB | ✅ Locks per-repo | ❌ No |
Summary: Zero breaking changes. Multi-repo is transparent to library consumers.
Opting Into Multi-Repo (CLI Users)
# Create planning repo
$ mkdir ~/.beads-planning && cd ~/.beads-planning
$ git init && bd init
# Link to project
$ cd ~/my-project
$ bd config --add-repo ~/.beads-planning
$ bd config --routing auto # Auto-detect maintainer vs contributor
# Optionally migrate existing issues
$ bd migrate --move-to ~/.beads-planning --filter "author=me"
Teams Adopting Beads
# Maintainer sets up project
$ cd ~/team-project
$ bd init
$ git add .beads/ && git commit -m "Initialize beads"
# Contributors clone and configure
$ git clone team-project
$ cd team-project
$ mkdir ~/.beads-planning && cd ~/.beads-planning
$ git init && bd init
$ cd ~/team-project
$ bd config --add-repo ~/.beads-planning --routing contributor
Self-Hosting Projects (VC, Internal Tools, Pet Projects)
Important: The multi-repo design is primarily for OSS contributors making PRs to upstream projects. Self-hosting projects have different needs.
What is Self-Hosting?
Projects that use beads to build themselves:
- VC (VibeCoder): Uses beads to track development of VC itself
- Internal tools: Company tools that track their own roadmap
- Pet projects: Personal projects with beads-based planning
Key difference from OSS contribution:
- No upstream/downstream distinction (you ARE the project)
- Direct commit access (no PR workflow)
- Often have automated executors/agents
- Bootstrap/early phase stability matters
Default Recommendation: Stay Single-Repo
For most self-hosting projects, single-repo is the right choice:
# Simple, stable, proven
$ cd ~/my-project
$ bd init
$ bd create "Task" -p 1
# → .beads/beads.jsonl (committed to project repo)
Why single-repo for self-hosting:
- ✅ Simpler: No config, no routing decisions, no multi-repo complexity
- ✅ Proven: Current architecture, battle-tested
- ✅ Sufficient: All issues live with the project they describe
- ✅ Stable: No hydration overhead, no cross-repo coordination
When to Adopt Multi-Repo
Multi-repo makes sense for self-hosting projects only in specific scenarios:
Scenario 1: Team Planning Separation
Your project has multiple developers with different permission levels:
# .beads/config.toml
[repos]
primary = "." # Canonical project issues (maintainers only)
additional = ["~/team-shared/.beads"] # Team planning (all contributors)
Scenario 2: Multi-Phase Development
Your project uses distinct phases (architecture → implementation → testing):
# .beads/config.toml
[repos]
primary = "." # Current active work
additional = [
"~/.beads-work/architecture", # Design decisions
"~/.beads-work/implementation", # Implementation backlog
]
Scenario 3: Experimental Work Isolation
You want to keep experimental ideas separate from canonical roadmap:
# .beads/config.toml
[repos]
primary = "." # Committed roadmap
additional = ["~/.beads-experiments"] # Experimental ideas
Automated Executors with Multi-Repo
Critical for projects like VC with automated agents.
Default behavior (recommended):
// Executor sees ONLY primary repo (canonical work)
store, err := beadsLib.NewSQLiteStorage(".beads/vc.db")
// No config.toml = single-repo mode
ready, err := store.GetReadyWork(ctx) // Only canonical issues
With multi-repo (opt-in):
# .beads/config.toml
[repos]
primary = "."
additional = ["~/team-shared/.beads"]
[routing]
default = "." # Executor-created issues stay in primary
// Executor code (unchanged)
store, err := beadsLib.NewSQLiteStorage(".beads/vc.db")
// Auto-reads config.toml, hydrates from both repos
ready, err := store.GetReadyWork(ctx)
// Returns issues from primary + team-shared
// When executor creates discovered issues:
discovered := &Issue{Title: "Found blocker", ...}
store.CreateIssue(discovered)
// → Goes to primary repo (routing.default = ".")
Recommendation for executors: Stay single-repo unless you have a clear team coordination need.
Bootstrap Phase Considerations
If your project is in an early/bootstrap phase (like VC), take extra caution:
- Prioritize stability: Multi-repo adds complexity. Delay until proven need.
- Test thoroughly: If adopting multi-repo, test with small repos first.
- Monitor performance: Ensure executor polling loops stay sub-second (see Performance section).
- Plan rollback: Keep single-repo workflow working so you can revert if needed.
Bootstrap-phase checklist:
- Do you have multiple developers with different permissions? → Maybe multi-repo
- Do you have team planning separate from executor roadmap? → Maybe multi-repo
- Are you solo or small team with unified planning? → Stay single-repo
- Is executor stability critical right now? → Stay single-repo
- Can you afford multi-repo testing/debugging time? → If no, stay single-repo
Migration Path for Self-Hosting Projects
From single-repo to multi-repo (when ready):
# Step 1: Create planning repo
$ mkdir ~/.beads-planning && cd ~/.beads-planning
$ git init && bd init
# Step 2: Configure multi-repo (test mode)
$ cd ~/my-project
$ bd config --add-repo ~/.beads-planning --routing auto
# Step 3: Test with small workload
$ bd create "Test issue" --repo ~/.beads-planning
$ bd show # Verify hydration works
$ bd ready # Verify queries work
# Step 4: Verify executor compatibility
# - Run executor with multi-repo config
# - Check GetReadyWork() latency (<100ms)
# - Verify discovered issues route correctly
# Step 5: Migrate planning issues (optional)
$ bd migrate --move-to ~/.beads-planning --filter "label=experimental"
Rollback (if needed):
# Remove config.toml to revert to single-repo
$ rm .beads/config.toml
$ bd show # Back to single-repo mode
Summary: Self-Hosting Decision Tree
Is your project self-hosting? (building itself with beads)
├─ YES
│ ├─ Solo developer or unified team?
│ │ └─ Stay single-repo (simple, stable)
│ ├─ Multiple developers, different permissions?
│ │ └─ Consider multi-repo (team planning separation)
│ ├─ Multi-phase development (arch → impl → test)?
│ │ └─ Consider multi-repo (phase isolation)
│ ├─ Bootstrap/early phase?
│ │ └─ Stay single-repo (stability > flexibility)
│ └─ Automated executor?
│ └─ Stay single-repo unless team coordination needed
└─ NO (OSS contributor)
└─ Use multi-repo (planning repo separate from upstream)
Bottom line for self-hosting: Default to single-repo. Only adopt multi-repo when you have a proven, specific need.
Design Decisions (Resolved)
1. Namespace Collisions: Option B (Global Uniqueness)
Decision: Use globally unique hash-based IDs that include timestamp + random component.
Rationale (from VC feedback):
- Option C (allow collisions) breaks dependency references: bd dep add bd-a3f8e9 bd-b7c2d1 becomes ambiguous
- Need to support cross-repo dependencies without repo-scoped namespacing
- Hash should be: hash(title + description + timestamp_ms + random_4bytes)
- Collision probability: ~1 in 10^12 (acceptable)
2. Cross-Repo Dependencies: Yes, Fully Supported
Decision: Dependencies work transparently across all repos.
Implementation:
- Hydrated database contains all issues from all repos
- Dependencies stored by ID only (no repo qualifier needed)
- bd ready checks dependency graph across all repos
- Writes route back to correct JSONL via source_repo metadata
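Because dependencies are stored by ID only, a readiness check over the hydrated set does not need to know which repo an issue came from. The sketch below illustrates this under stated assumptions: the Issue fields, the "closed" status name, and readyIssues are hypothetical, and a dependency missing from the hydrated set is treated as unresolved.

```go
package main

import (
	"fmt"
	"sort"
)

// Issue holds only the fields needed for the readiness check.
type Issue struct {
	ID         string
	Status     string   // "open", "in_progress", "closed" ("closed" is an assumed name)
	DependsOn  []string // IDs only, no repo qualifier
	SourceRepo string
}

// readyIssues returns the sorted IDs of open issues whose dependencies are
// all closed. A dependency not present in the hydrated set blocks the issue.
func readyIssues(all []Issue) []string {
	byID := make(map[string]Issue, len(all))
	for _, is := range all {
		byID[is.ID] = is
	}
	var ready []string
	for _, is := range all {
		if is.Status != "open" {
			continue
		}
		blocked := false
		for _, dep := range is.DependsOn {
			if d, ok := byID[dep]; !ok || d.Status != "closed" {
				blocked = true
				break
			}
		}
		if !blocked {
			ready = append(ready, is.ID)
		}
	}
	sort.Strings(ready)
	return ready
}

func main() {
	all := []Issue{
		{ID: "bd-1", Status: "closed", SourceRepo: "."},
		{ID: "bd-2", Status: "open", DependsOn: []string{"bd-1"}, SourceRepo: "~/.beads-planning"},
		{ID: "bd-3", Status: "open", DependsOn: []string{"bd-2"}, SourceRepo: "."},
	}
	fmt.Println(readyIssues(all))
}
```

Here the planning-repo issue depends on a primary-repo issue and vice versa, and the lookup works the same either way because the map is keyed on the globally unique ID.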
3. Routing Mechanism: Config-Based, No Schema Changes
Decision: Use config-based routing + explicit --repo flag. No new schema fields.
Rationale:
- IssueType already exists and is used semantically (bug, feature, task, epic, chore)
- Labels are used semantically by VC (discovered:blocker, no-auto-claim)
- Routing is a storage concern, not issue metadata
- Simpler: auto-detect maintainer vs contributor from config
- Discovered issues inherit parent's source_repo automatically
4. Performance: Smart Caching with File Mtime Tracking
Decision: SQLite DB is the cache, JSONLs are source of truth.
Implementation:
type MultiRepoStorage struct {
    repos      []RepoConfig
    db         *sql.DB
    repoMtimes map[string]time.Time // Track file modification times
}

func (s *MultiRepoStorage) GetReadyWork(ctx context.Context) ([]Issue, error) {
    // Fast path: check if ANY JSONL changed
    needSync := false
    for repo, jsonlPath := range s.jsonlPaths() {
        info, err := os.Stat(jsonlPath)
        if err != nil {
            return nil, fmt.Errorf("stat %s: %w", jsonlPath, err)
        }
        if mtime := info.ModTime(); mtime.After(s.repoMtimes[repo]) {
            needSync = true
            s.repoMtimes[repo] = mtime
        }
    }

    // Only re-hydrate if something changed
    if needSync {
        if err := s.rehydrate(ctx); err != nil { // Expensive but rare
            return nil, err
        }
    }

    // Query is fast (in-memory SQLite)
    rows, err := s.db.QueryContext(ctx, "SELECT * FROM issues WHERE ...")
    if err != nil {
        return nil, err
    }
    defer rows.Close()
    return scanIssues(rows) // scanIssues (not shown) converts rows to []Issue
}
Rationale: VC's polling loop (every 5-10 seconds) requires sub-second queries. File stat is microseconds, re-parsing only when needed.
Performance Benchmarks and Targets
Critical for library consumers (VC) with automated polling.
Performance Targets
Based on VC's polling loop requirements (every 5-10 seconds):
| Operation | Target | Rationale |
|---|---|---|
| File stat (per repo) | <1ms | Checking mtime of N JSONLs must be negligible |
| Hydration (full re-parse) | <500ms | Only happens when JSONL changes, rare in polling loop |
| Query (from cached DB) | <10ms | Common case: no JSONL changes, pure SQLite query |
| Total GetReadyWork() | <100ms | VC's hard requirement for responsive executor |
Scale Testing Targets
Test at multiple repo counts to ensure scaling:
| Repo Count | File Stat Total | Hydration (worst case) | Query (cached) | Total (cached) |
|---|---|---|---|---|
| N=1 (baseline) | <1ms | <200ms | <5ms | <10ms |
| N=3 (typical) | <3ms | <500ms | <10ms | <20ms |
| N=10 (edge case) | <10ms | <2s | <50ms | <100ms |
Assumptions:
- JSONL size: <25k per repo (see Design Principles)
- SQLite: In-memory mode (:memory: or file::memory:?cache=shared)
- Cached case: No JSONL changes since last hydration (99% of polling loops)
Benchmark Suite (To Be Implemented)
// benchmark/multi_repo_test.go

func BenchmarkFileStatOverhead(b *testing.B) {
    // Test: Stat N JSONL files
    // Target: <1ms per repo
}

func BenchmarkHydrationN1(b *testing.B) {
    // Test: Full hydration from 1 JSONL (20k file)
    // Target: <200ms
}

func BenchmarkHydrationN3(b *testing.B) {
    // Test: Full hydration from 3 JSONLs (20k each)
    // Target: <500ms
}

func BenchmarkHydrationN10(b *testing.B) {
    // Test: Full hydration from 10 JSONLs (20k each)
    // Target: <2s
}

func BenchmarkQueryCached(b *testing.B) {
    // Test: GetReadyWork() with no JSONL changes
    // Target: <10ms
}

func BenchmarkGetReadyWorkN3(b *testing.B) {
    // Test: Realistic polling loop (3 repos, cached)
    // Target: <20ms total
}
Performance Optimization Notes
If benchmarks fail to meet targets, optimization strategies:
- Parallel file stats: Use goroutines to stat N JSONLs concurrently
- Incremental hydration: Only re-parse changed repos, merge into DB
- Smarter caching: Hash-based cache invalidation (mtime + file size)
- SQLite tuning: `PRAGMA synchronous = OFF`, `PRAGMA journal_mode = MEMORY`
- Lazy hydration: Defer hydration until first query after mtime change
Status
- Benchmarks: ⏳ Not implemented yet (tracked in bd-wta)
- Targets: ✅ Documented above
- Validation: ⏳ Pending implementation
Next steps:
- Implement benchmark suite in `benchmark/multi_repo_test.go`
- Run benchmarks on realistic workloads (VC-sized DBs)
- Document results in this section
- File optimization issues if targets not met
5. Visibility Field: Optional, Backward Compatible
Decision: Add visibility as optional field, defaults to "canonical" if missing.
Schema:
type Issue struct {
    // ... existing fields ...
    Visibility *string `json:"visibility,omitempty"` // nil = canonical
}
States:
- `local`: Personal planning only
- `proposed`: Staged for upstream PR
- `canonical`: Accepted by upstream (or default for existing issues)
Orthogonality: Visibility and Status are independent:
- `status: in_progress, visibility: local` → Working on personal planning
- `status: open, visibility: proposed` → Proposed to upstream, awaiting review
6. Library API Stability: Transparent Hydration
Decision: Hybrid approach - transparent by default, explicit opt-in available.
Backward Compatible:
// Existing code keeps working - reads config.toml automatically
store, err := beadsLib.NewSQLiteStorage(".beads/vc.db")
Explicit Override:
// Library consumers can override config
cfg := beadsLib.Config{
    Primary:    ".beads/vc.db",
    Additional: []string{"~/.beads-planning"},
}
store, err := beadsLib.NewStorageWithConfig(cfg)
7. ACID Guarantees: Per-Repo File Locking
Decision: Use file-based locks per JSONL, atomic within single repo.
Implementation:
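func (s *Storage) UpdateIssue(issue Issue) error {
    sourceRepo := issue.SourceRepo
    // Lock that repo's JSONL (flock stands in for an exclusive file-lock helper)
    lock, err := flock(sourceRepo + "/beads.jsonl.lock")
    if err != nil {
        return err
    }
    defer lock.Unlock()
    // Read-modify-write
    issues, err := s.readJSONL(sourceRepo)
    if err != nil {
        return err
    }
    issues.Update(issue)
    if err := s.writeJSONL(sourceRepo, issues); err != nil {
        return err
    }
    // Update in-memory DB
    return s.db.Update(issue)
}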
Limitation: Cross-repo transactions are NOT atomic (acceptable, rare use case).
Compatibility with Exclusive Lock Protocol
The per-repo file locking (Decision #7) is fully compatible with the existing exclusive lock protocol (see EXCLUSIVE_LOCK.md).
How they work together:
- Exclusive locks are daemon-level: The `.beads/.exclusive-lock` prevents the bd daemon from operating on a specific database
- File locks are operation-level: Per-JSONL file locks (`flock`) ensure atomic read-modify-write for individual operations
- Different scopes, complementary purposes:
- Exclusive lock: "This entire database is off-limits to the daemon"
- File lock: "This specific JSONL is being modified right now"
Multi-repo behavior:
With multi-repo configuration, each repo can have its own exclusive lock:
# VC executor locks its primary database
.beads/.exclusive-lock # Locks primary repo operations
# Planning repo can be locked independently
~/.beads-planning/.exclusive-lock # Locks planning repo operations
When both are active:
- If primary repo is locked: Daemon skips all operations on primary, but can still sync planning repo
- If planning repo is locked: Daemon skips planning repo, but can still sync primary
- If both locked: Daemon skips entire multi-repo workspace
No migration needed for library consumers:
Existing VC code (v0.17.3+) using exclusive locks will continue to work:
// VC's existing lock acquisition
lock, err := types.NewExclusiveLock("vc-executor", "1.0.0")
lockPath := filepath.Join(".beads", ".exclusive-lock")
os.WriteFile(lockPath, data, 0644)
// Works the same with multi-repo:
// - Locks .beads/ (primary repo)
// - Daemon skips primary, can still sync ~/.beads-planning if configured
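Locking all repos together:
If a library consumer needs to lock every configured repo, it can write one lock file per repo. The locks are acquired one at a time, so this is best-effort rather than strictly atomic across repos:
// Lock all configured repos. Go does not expand "~", so resolve the home directory explicitly.
home, err := os.UserHomeDir()
if err != nil {
    return err
}
repos := []string{".beads", filepath.Join(home, ".beads-planning")}
// Register cleanup first so a partial failure still removes any locks already written
defer func() {
    for _, repo := range repos {
        os.Remove(filepath.Join(repo, ".exclusive-lock"))
    }
}()
for _, repo := range repos {
    lockPath := filepath.Join(repo, ".exclusive-lock")
    if err := os.WriteFile(lockPath, lockData, 0644); err != nil {
        return err
    }
}
// Now daemon skips all repos until locks released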
Summary: No breaking changes. Exclusive locks work per-repo in multi-repo configs, preventing daemon interference at repo granularity.
Key Learnings from VC Feedback
The VC project (VibeCoder) provided critical feedback as a real downstream consumer that uses beads as a library. Key insights:
1. Two Consumer Models
Beads has two distinct consumer types:
- CLI users: Use `bd` commands directly
- Library consumers: Use `beadsLib` in Go/TypeScript/etc. (like VC)
Multi-repo must work transparently for both.
2. Performance is Critical for Automation
VC's executor polls GetReadyWork() every 5-10 seconds. Multi-repo hydration must:
- Use smart caching (file mtime tracking)
- Avoid re-parsing JSONLs on every query
- Keep queries sub-second (ideally <100ms)
3. Special Labels Must Work Across Repos
VC uses semantic labels that must work regardless of repo:
- `discovered:blocker` - Auto-generated blocker issues (priority boost)
- `discovered:related` - Auto-generated related work
- `no-auto-claim` - Prevent executor from claiming
- `baseline-failure` - Self-healing baseline failures
These are semantic labels, not routing labels. Don't overload labels for routing.
4. Discovered Issues Routing
When VC's analysis phase auto-creates issues with discovered:blocker label, they should:
- Inherit parent's `source_repo` automatically
- Stay co-located with related work
- Not require manual routing decisions
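The inheritance rule is small enough to state in code. In this sketch the `Issue` struct is trimmed to four fields and `newDiscoveredIssue` is a hypothetical constructor; how VC actually creates discovered issues is not shown in this document.

```go
package main

// Issue is trimmed to the fields this sketch needs. SourceRepo mirrors the
// source_repo metadata described above.
type Issue struct {
	ID         string
	Title      string
	Labels     []string
	SourceRepo string
}

// newDiscoveredIssue builds an auto-discovered issue that inherits the
// parent's SourceRepo, so it is written to the same JSONL as the work that
// surfaced it and needs no manual routing decision.
func newDiscoveredIssue(parent Issue, id, title, label string) Issue {
	return Issue{
		ID:         id,
		Title:      title,
		Labels:     []string{label},
		SourceRepo: parent.SourceRepo,
	}
}
```

The label (for example `discovered:blocker`) stays purely semantic here: routing is decided by `SourceRepo`, never by the label.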
5. Library API Stability is Non-Negotiable
VC's code uses beadsLib.NewSQLiteStorage(). Must not break. Solution:
- Read `.beads/config.toml` automatically (transparent)
- Provide `NewStorageWithConfig()` for explicit override
- Hydration happens at storage layer, invisible to library consumers
Remaining Open Questions
- Sync semantics: When upstream accepts a proposed issue and modifies it, how to sync back?
  - Option A: Mark as "accepted" in planning repo, keep both copies
  - Option B: Delete from planning repo (it's now canonical)
  - Option C: Keep in planning repo but mark as read-only mirror
- Discovery: How do users learn about this feature?
  - Auto-prompt when detecting fork/contributor status?
  - Docs + examples?
  - `bd init --contributor` wizard?
- Metadata fields: Should `source_repo` be exposed in JSON export, or keep it internal to storage layer?
- Proposed issue lifecycle: What happens to proposed issues after PR is merged/rejected?
  - Auto-delete from planning repo?
  - Mark as "accepted" or "rejected"?
  - Manual cleanup via `bd withdraw`?
Success Metrics
How we'll know this works:
- Zero pollution: No contributor planning issues accidentally merged upstream
- Multi-clone sync: Workers on different machines see consistent state (via VCS sync)
- Flexibility: Users can configure for their workflow (personas, phases, etc.)
- Backwards compat: Existing single-repo users unaffected
- VCS-agnostic: Works with git, jj, hg, sl, or no VCS
Next Actions
Suggested epics/issues to create (can be done in follow-up session):
- Epic: Multi-repo hydration layer
  - Design schema for source_repo metadata
  - Implement config parsing for repos.additional
  - Build hydration logic (read from N JSONLs)
  - Build write routing (write to correct JSONL)
- Epic: Proposal workflow
  - Implement `bd propose` command
  - Implement `bd withdraw` command
  - Implement `bd accept` command (maintainer only)
  - Design visibility state machine
- Epic: Auto-routing
  - Detect maintainer vs contributor status
  - Implement routing rules (label, priority, custom)
  - Make `bd add` route to correct repo
- Epic: VCS-agnostic sync
  - Detect VCS type per repo
  - Implement sync adapters (git, jj, hg, sl)
  - Handle mixed-VCS multi-repo configs
- Epic: Migration and onboarding
  - Write migration guide
  - Implement `bd migrate` command
  - Create init wizards for common scenarios
  - Update documentation
Summary and Next Steps
This document represents the design evolution for multi-repo support in beads, driven by:
- Original problem (GitHub #207): Contributors' personal planning pollutes upstream PRs
- Core insight: Beads is a separate communication channel that happens to use VCS
- VC feedback: Real-world library consumer with specific performance and API stability needs
Final Architecture
Solution #4 (Separate Repos) with these refinements:
- N ≥ 1 repos: Single repo (N=1) is default, multi-repo is opt-in
- VCS-agnostic: Works with git, jj, hg, sapling, or no VCS
- Config-based routing: No schema changes, auto-detect maintainer vs contributor
- Smart caching: File mtime tracking, SQLite DB as cache layer
- Transparent hydration: Library API remains stable, config-driven
- Global namespace: Hash-based IDs with timestamp + random for uniqueness
- Cross-repo dependencies: Fully supported, transparent to users
- Discovered issues: Inherit parent's source_repo automatically
Why This Design Wins
- Zero PR pollution: Separate git histories = impossible to accidentally merge planning
- Git ledger preserved: All repos are VCS-tracked, full forensic capability
- Maximum flexibility: Supports OSS contributors, multi-phase dev, multi-persona workflows
- Backward compatible: Existing single-repo users unchanged
- Performance: Sub-second queries even with polling loops
- Library-friendly: Transparent to downstream consumers like VC
Related Documents
- Original issue: GitHub #207
- VC feedback: `./vc-feedback-on-multi-repo.md`
- Implementation tracking: TBD (epics to be created)
Status
- Design: ✅ Complete (pending resolution of open questions)
- Implementation: ⏳ Not started
- Target: TBD
Last updated: 2025-11-03