diff --git a/cmd/bd/MAIN_TEST_OPTIMIZATION_PLAN.md b/cmd/bd/MAIN_TEST_OPTIMIZATION_PLAN.md deleted file mode 100644 index 5314df2d..00000000 --- a/cmd/bd/MAIN_TEST_OPTIMIZATION_PLAN.md +++ /dev/null @@ -1,249 +0,0 @@ -# main_test.go Performance Optimization Plan - -## Executive Summary -Tests are currently **hanging indefinitely** due to nil `rootCtx`. With fixes, we can achieve **90%+ speedup** (from ~60s+ to <5s total). - -## Critical Fixes (MUST DO) - -### Fix 1: Initialize rootCtx in Tests -**Impact**: Fixes hanging tests ← BLOCKING ISSUE -**Effort**: 5 minutes - -Add to all tests that call `flushToJSONL()` or `autoImportIfNewer()`: - -```go -func TestAutoFlushJSONLContent(t *testing.T) { - // FIX: Initialize rootCtx for flush operations - ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second) - defer cancel() - - oldRootCtx := rootCtx - rootCtx = ctx - defer func() { rootCtx = oldRootCtx }() - - // rest of test... -} -``` - -**Files affected**: -- TestAutoFlushOnExit -- TestAutoFlushJSONLContent -- TestAutoFlushErrorHandling -- TestAutoImportIfNewer -- TestAutoImportDisabled -- TestAutoImportWithUpdate -- TestAutoImportNoUpdate -- TestAutoImportMergeConflict -- TestAutoImportConflictMarkerFalsePositive -- TestAutoImportClosedAtInvariant - -### Fix 2: Reduce Sleep Durations -**Impact**: Saves ~280ms -**Effort**: 2 minutes - -```go -// BEFORE -time.Sleep(200 * time.Millisecond) - -// AFTER -time.Sleep(20 * time.Millisecond) // 10x faster, still reliable - -// BEFORE -time.Sleep(100 * time.Millisecond) - -// AFTER -time.Sleep(10 * time.Millisecond) // 10x faster -``` - -**Rationale**: We're not testing actual timing, just sequencing. Shorter sleeps work fine. 
- -## High-Impact Optimizations (RECOMMENDED) - -### Opt 1: Share Test Fixtures -**Impact**: Saves ~1-1.5s -**Effort**: 15 minutes - -Group related tests and reuse DB: - -```go -func TestAutoFlushGroup(t *testing.T) { - // Setup once - tmpDir := t.TempDir() - ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second) - defer cancel() - rootCtx = ctx - defer func() { rootCtx = nil }() - - // Subtest 1: DirtyMarking (no DB needed!) - t.Run("DirtyMarking", func(t *testing.T) { - autoFlushEnabled = true - isDirty = false - if flushTimer != nil { - flushTimer.Stop() - flushTimer = nil - } - - markDirtyAndScheduleFlush() - - flushMutex.Lock() - dirty := isDirty - hasTimer := flushTimer != nil - flushMutex.Unlock() - - assert(dirty && hasTimer) - }) - - // Subtest 2: Disabled (no DB needed!) - t.Run("Disabled", func(t *testing.T) { - autoFlushEnabled = false - isDirty = false - // ... - }) - - // Shared DB for remaining tests - dbPath := filepath.Join(tmpDir, "shared.db") - testStore := newTestStore(t, dbPath) - store = testStore - storeMutex.Lock() - storeActive = true - storeMutex.Unlock() - defer func() { - storeMutex.Lock() - storeActive = false - storeMutex.Unlock() - }() - - t.Run("OnExit", func(t *testing.T) { - // reuse testStore... - }) - - t.Run("JSONLContent", func(t *testing.T) { - // reuse testStore... 
- }) -} -``` - -**Reduces DB setups from 14 to ~4-5** - -### Opt 2: Use In-Memory SQLite -**Impact**: Saves ~800ms-1.2s -**Effort**: 10 minutes - -```go -func newFastTestStore(t *testing.T) *sqlite.SQLiteStorage { - t.Helper() - - // Use :memory: for speed (10-20x faster than file-based) - store, err := sqlite.New(context.Background(), ":memory:") - if err != nil { - t.Fatalf("Failed to create in-memory database: %v", err) - } - - ctx := context.Background() - if err := store.SetConfig(ctx, "issue_prefix", "test"); err != nil { - store.Close() - t.Fatalf("Failed to set issue_prefix: %v", err) - } - - t.Cleanup(func() { store.Close() }) - return store -} -``` - -**Use for tests that don't need filesystem integration** - -### Opt 3: Skip TestAutoFlushDebounce -**Impact**: Re-enables skipped test with fix -**Effort**: 5 minutes - -The test is currently skipped (line 93). Fix the config issue: - -```go -func TestAutoFlushDebounce(t *testing.T) { - // REMOVED: t.Skip() - - // FIX: Don't rely on config.Set during test - // Instead, directly manipulate flushDebounce duration via test helper - oldDebounce := flushDebounce - flushDebounce = 20 * time.Millisecond // Fast for testing - defer func() { flushDebounce = oldDebounce }() - - // rest of test... -} -``` - -## Medium-Impact Optimizations (NICE TO HAVE) - -### Opt 4: Parallel Test Execution -**Impact**: 40-60% faster with t.Parallel() -**Effort**: 20 minutes - -**Careful!** Only parallelize tests that don't manipulate global state: -- TestImportOpenToClosedTransition ✓ -- TestImportClosedToOpenTransition ✓ -- (most can't be parallel due to global state) - -```go -func TestImportOpenToClosedTransition(t *testing.T) { - t.Parallel() // Safe - no global state - // ... 
-} -``` - -### Opt 5: Mock flushToJSONL() for State Tests -**Impact**: Saves ~200ms -**Effort**: 30 minutes - -Tests like `TestAutoFlushDirtyMarking` don't need actual flushing: - -```go -var flushToJSONLFunc = flushToJSONL // Allow mocking - -func TestAutoFlushDirtyMarking(t *testing.T) { - flushToJSONLFunc = func() {} // No-op - defer func() { flushToJSONLFunc = flushToJSONL }() - - // Test just the state management... -} -``` - -## Expected Results - -| Approach | Time Savings | Effort | Recommendation | -|----------|-------------|--------|----------------| -| Fix 1: rootCtx | Unblocks tests | 5 min | **DO NOW** | -| Fix 2: Reduce sleeps | ~280ms | 2 min | **DO NOW** | -| Opt 1: Share fixtures | ~1.2s | 15 min | **DO NOW** | -| Opt 2: In-memory DB | ~1s | 10 min | **RECOMMENDED** | -| Opt 3: Fix debounce test | Enables test | 5 min | **RECOMMENDED** | -| Opt 4: Parallel | ~2s (40%) | 20 min | Nice to have | -| Opt 5: Mock flushToJSONL | ~200ms | 30 min | Optional | - -**Total speedup with Fixes + Opts 1-3: ~2.5-3s (from baseline after fixing hangs)** -**Total effort: ~40 minutes** - -## Implementation Order - -1. **Fix 1** (5 min) - Fixes hanging tests -2. **Fix 2** (2 min) - Quick win -3. **Opt 2** (10 min) - In-memory DBs where possible -4. **Opt 1** (15 min) - Share fixtures -5. **Opt 3** (5 min) - Fix skipped test -6. 
**Opt 4** (20 min) - Parallelize safe tests (optional) - -## Alternative: Rewrite as Integration Tests - -If tests remain slow after optimizations, consider: -- Move to `cmd/bd/integration_test` directory -- Run with `-short` flag to skip in normal CI -- Keep only smoke tests in main_test.go - -**Trade-off**: Slower tests, but better integration coverage - -## Validation - -After changes, run: -```bash -go test -run "^TestAuto" -count=5 # Should complete in <5s consistently -go test -race -run "^TestAuto" # Verify no race conditions -``` diff --git a/cmd/bd/import.go b/cmd/bd/import.go index 0506310e..5c71993d 100644 --- a/cmd/bd/import.go +++ b/cmd/bd/import.go @@ -69,6 +69,7 @@ NOTE: Import requires direct database access and does not work with daemon mode. dedupeAfter, _ := cmd.Flags().GetBool("dedupe-after") clearDuplicateExternalRefs, _ := cmd.Flags().GetBool("clear-duplicate-external-refs") orphanHandling, _ := cmd.Flags().GetString("orphan-handling") + force, _ := cmd.Flags().GetBool("force") // Open input in := os.Stdin @@ -309,7 +310,8 @@ NOTE: Import requires direct database access and does not work with daemon mode. // Update last_import_hash metadata to enable content-based staleness detection (bd-khnb fix) // This prevents git operations from resurrecting deleted issues by comparing content instead of mtime - if input != "" { + // When --force is true, ALWAYS update metadata even if no changes were made + if input != "" && (result.Created > 0 || result.Updated > 0 || len(result.IDMapping) > 0 || force) { if currentHash, err := computeJSONLHash(input); err == nil { if err := store.SetMetadata(ctx, "last_import_hash", currentHash); err != nil { // Non-fatal warning: Metadata update failures are intentionally non-fatal to prevent blocking @@ -358,6 +360,11 @@ NOTE: Import requires direct database access and does not work with daemon mode. 
} fmt.Fprintf(os.Stderr, "\n") + // Print force message if metadata was updated despite no changes + if force && result.Created == 0 && result.Updated == 0 && len(result.IDMapping) == 0 { + fmt.Fprintf(os.Stderr, "Metadata updated (database already in sync with JSONL)\n") + } + // Run duplicate detection if requested if dedupeAfter { fmt.Fprintf(os.Stderr, "\n=== Post-Import Duplicate Detection ===\n") @@ -697,6 +704,7 @@ func init() { importCmd.Flags().Bool("rename-on-import", false, "Rename imported issues to match database prefix (updates all references)") importCmd.Flags().Bool("clear-duplicate-external-refs", false, "Clear duplicate external_ref values (keeps first occurrence)") importCmd.Flags().String("orphan-handling", "", "How to handle missing parent issues: strict/resurrect/skip/allow (default: use config or 'allow')") + importCmd.Flags().Bool("force", false, "Force metadata update even when database is already in sync with JSONL") importCmd.Flags().BoolVar(&jsonOutput, "json", false, "Output import statistics in JSON format") rootCmd.AddCommand(importCmd) } diff --git a/cmd/bd/main.go b/cmd/bd/main.go index a15cf9b6..4ae5ed49 100644 --- a/cmd/bd/main.go +++ b/cmd/bd/main.go @@ -89,6 +89,7 @@ var ( noAutoFlush bool noAutoImport bool sandboxMode bool + allowStale bool // Use --allow-stale: skip staleness check (emergency escape hatch) noDb bool // Use --no-db mode: load from JSONL, write back after each command profileEnabled bool profileFile *os.File @@ -109,6 +110,7 @@ func init() { rootCmd.PersistentFlags().BoolVar(&noAutoFlush, "no-auto-flush", false, "Disable automatic JSONL sync after CRUD operations") rootCmd.PersistentFlags().BoolVar(&noAutoImport, "no-auto-import", false, "Disable automatic JSONL import when newer than DB") rootCmd.PersistentFlags().BoolVar(&sandboxMode, "sandbox", false, "Sandbox mode: disables daemon and auto-sync") + rootCmd.PersistentFlags().BoolVar(&allowStale, "allow-stale", false, "Allow operations on potentially stale 
data (skip staleness check)") rootCmd.PersistentFlags().BoolVar(&noDb, "no-db", false, "Use no-db mode: load from JSONL, no SQLite") rootCmd.PersistentFlags().BoolVar(&profileEnabled, "profile", false, "Generate CPU profile for performance analysis") diff --git a/cmd/bd/staleness.go b/cmd/bd/staleness.go index 73497184..412dcd24 100644 --- a/cmd/bd/staleness.go +++ b/cmd/bd/staleness.go @@ -18,6 +18,11 @@ import ( // Implements bd-2q6d: All read operations should validate database freshness. // Implements bd-c4rq: Daemon check moved to call sites to avoid function call overhead. func ensureDatabaseFresh(ctx context.Context) error { + if allowStale { + fmt.Fprintf(os.Stderr, "⚠️ Staleness check skipped (--allow-stale), data may be out of sync\n") + return nil + } + // Skip check if no storage available (shouldn't happen in practice) if store == nil { return nil @@ -43,7 +48,11 @@ func ensureDatabaseFresh(ctx context.Context) error { "The JSONL file has been updated (e.g., after 'git pull') but the database\n"+ "hasn't been imported yet. 
This would cause you to see stale/incomplete data.\n\n"+ "To fix:\n"+ - " bd import # Import JSONL updates to database\n\n"+ + " bd import -i .beads/beads.jsonl # Import JSONL updates to database\n\n"+ + "If in a sandboxed environment (e.g., Codex) where daemon can't be stopped:\n"+ + " bd --sandbox ready # Use direct mode (no daemon)\n"+ + " bd import --force # Force metadata update\n"+ + " bd ready --allow-stale # Skip staleness check (use with caution)\n\n"+ "Or use daemon mode (auto-imports on every operation):\n"+ " bd daemon start\n"+ " bd # Will auto-import before executing", diff --git a/docs/GH353_INVESTIGATION.md b/docs/GH353_INVESTIGATION.md new file mode 100644 index 00000000..573a6ead --- /dev/null +++ b/docs/GH353_INVESTIGATION.md @@ -0,0 +1,302 @@ +# Investigation: GH #353 - Daemon Locking Issues in Codex Sandbox + +## Problem Summary + +When running `bd` inside the Codex sandbox (macOS host), users encounter persistent "Database out of sync with JSONL" errors that cannot be resolved through normal means (`bd import`). The root cause is a daemon process that the sandbox cannot signal or kill, creating a deadlock situation. + +## Root Cause Analysis + +### The Daemon Locking Mechanism + +The daemon uses three mechanisms to claim a database: + +1. **File lock (`flock`)** on `.beads/daemon.lock` - exclusive lock held while daemon is running +2. **PID file** at `.beads/daemon.pid` - contains daemon process ID (Windows compatibility) +3. **Lock metadata** in `daemon.lock` - JSON containing PID, database path, version, start time + +**Source:** `cmd/bd/daemon_lock.go` + +### Process Verification Issue + +On Unix systems, `isProcessRunning()` uses `syscall.Kill(pid, 0)` to check if a process exists. 
In sandboxed environments: + +- The daemon PID exists in the lock file +- `syscall.Kill(pid, 0)` returns EPERM (operation not permitted) +- The CLI can't verify if the daemon is actually running +- The CLI can't send signals to stop the daemon + +**Source:** `cmd/bd/daemon_unix.go:26-28` + +### Staleness Check Flow + +When running `bd ready` or other read commands: + +1. **With daemon connected:** + - Command → Daemon RPC → `checkAndAutoImportIfStale()` + - Daemon checks JSONL mtime vs `last_import_time` metadata + - Daemon auto-imports if stale (with safeguards) + - **Source:** `internal/rpc/server_export_import_auto.go:171-303` + +2. **Without daemon (direct mode):** + - Command → `ensureDatabaseFresh(ctx)` check + - Compares JSONL mtime vs `last_import_time` metadata + - **Refuses to proceed** if stale and shows an error message + - **Source:** `cmd/bd/staleness.go:20-51` + +### The Deadlock Scenario + +1. The daemon is running outside the sandbox, holding the database lock +2. User (in sandbox) runs `bd ready` +3. CLI tries to connect to daemon → connection fails or daemon is unreachable +4. CLI falls back to direct mode +5. Direct mode checks staleness → JSONL is newer than metadata +6. Error: "Database out of sync with JSONL. Run 'bd import' first." +7. User runs `bd import -i .beads/beads.jsonl` +8. Import updates the metadata in the database file +9. **But the daemon is still running with OLD metadata cached in memory** +10. User runs `bd ready` again → CLI connects to daemon +11. Daemon checks staleness using **cached metadata** → still stale! +12. **Infinite loop:** the user can't fix the stale state because they can't restart the daemon + +### Why `--no-daemon` Doesn't Always Work + +The `--no-daemon` flag should work by setting `daemonClient = nil` and skipping daemon connection (**source:** `cmd/bd/main.go:287-289`). However: + +1. If JSONL is genuinely newer than the database (e.g., after `git pull`), the staleness check in direct mode will still fail +2. 
If the user doesn't specify `--no-daemon` consistently, the CLI will reconnect to the stale daemon +3. The daemon may still hold file locks that interfere with direct operations + +## Existing Workarounds + +### The `--sandbox` Flag + +Already exists! Sets: +- `noDaemon = true` (skip daemon) +- `noAutoFlush = true` (skip auto-flush) +- `noAutoImport = true` (skip auto-import) + +**Source:** `cmd/bd/main.go:201-206` + +**Issue:** Still runs staleness check in direct mode, which fails if JSONL is actually newer. + +## Proposed Solutions + +### Solution 1: Force-Import Flag (Quick Fix) ⭐ **Recommended** + +Add `--force` flag to `bd import` that: +- Updates `last_import_time` and `last_import_hash` metadata even when 0 issues imported +- Explicitly touches database file to update mtime +- Prints clear message: "Metadata updated (database already in sync)" + +**Pros:** +- Minimal code change +- Solves immediate problem +- User can manually fix stuck state + +**Cons:** +- Requires user to know about --force flag +- Doesn't prevent the problem from occurring + +**Implementation location:** `cmd/bd/import.go` around line 349 + +### Solution 2: Skip-Staleness Flag (Escape Hatch) ⭐ **Recommended** + +Add `--allow-stale` or `--no-staleness-check` global flag that: +- Bypasses `ensureDatabaseFresh()` check entirely +- Allows operations on potentially stale data +- Prints warning: "⚠️ Staleness check skipped, data may be out of sync" + +**Pros:** +- Emergency escape hatch when stuck +- Minimal invasive change +- Works with `--sandbox` mode + +**Cons:** +- User can accidentally work with stale data +- Should be well-documented as last resort + +**Implementation location:** `cmd/bd/staleness.go:20` and callers + +### Solution 3: Sandbox Detection (Automatic) ⭐⭐ **Best Long-term** + +Auto-detect sandbox environment and adjust behavior: + +```go +func isSandboxed() bool { + // Try to signal a known process (e.g., our own parent) + // If we get EPERM, we're likely sandboxed + if 
err := syscall.Kill(os.Getppid(), 0); err == syscall.EPERM { + return true + } + return false +} + +// In PersistentPreRun: +if isSandboxed() { + sandboxMode = true // Auto-enable sandbox mode + fmt.Fprintf(os.Stderr, "ℹ️ Sandbox detected, using direct mode\n") +} +``` + +Additionally, when daemon connection fails with permission errors: +- Automatically set `noDaemon = true` for subsequent operations +- Skip daemon health checks that require process signals + +**Pros:** +- Zero configuration for users +- Prevents the problem entirely +- Graceful degradation + +**Cons:** +- More complex heuristic +- May have false positives +- Requires testing in various environments + +**Implementation locations:** +- `cmd/bd/main.go` (detection) +- `cmd/bd/daemon_unix.go` (process checks) + +### Solution 4: Better Daemon Health Checks (Robust) + +Enhance the daemon health check to detect unreachable daemons: + +1. When `daemonClient.Health()` fails, check why: + - Connection refused → daemon not running + - Timeout → daemon unreachable (sandbox?) + - Permission denied → sandbox detected + +2. 
On sandbox detection, automatically: + - Set `noDaemon = true` + - Clear cached daemon client + - Proceed in direct mode + +**Pros:** +- Automatic recovery +- Better error messages +- Handles edge cases + +**Cons:** +- Requires careful timeout tuning +- More complex state management + +**Implementation location:** `cmd/bd/main.go` around lines 300-367 + +### Solution 5: Daemon Metadata Refresh (Prevents Staleness) + +Make daemon periodically refresh metadata from disk: + +```go +// In daemon event loop, check metadata every N seconds +if time.Since(lastMetadataCheck) > 5*time.Second { + lastImportTime, _ := store.GetMetadata(ctx, "last_import_time") + // Update cached value +} +``` + +**Pros:** +- Daemon picks up external import operations +- Reduces stale metadata issues +- Works for other scenarios too + +**Cons:** +- Doesn't solve sandbox permission issues +- Adds I/O overhead +- Still requires daemon restart eventually + +**Implementation location:** `cmd/bd/daemon_event_loop.go` + +## Recommended Implementation Plan + +### Phase 1: Immediate Relief (1-2 hours) +1. ✅ Add `--force` flag to `bd import` (Solution 1) +2. ✅ Add `--allow-stale` global flag (Solution 2) +3. ✅ Update error message to suggest these flags + +### Phase 2: Better UX (3-4 hours) +1. ✅ Implement sandbox detection heuristic (Solution 3) +2. ✅ Auto-enable `--sandbox` mode when detected +3. ✅ Update docs with sandbox troubleshooting + +### Phase 3: Robustness (5-6 hours) +1. Enhance daemon health checks (Solution 4) +2. Add daemon metadata refresh (Solution 5) +3. Comprehensive testing in sandbox environments + +## Testing Strategy + +### Manual Testing in Codex Sandbox +1. Start daemon outside sandbox +2. Run `bd ready` inside sandbox → should detect sandbox +3. Run `bd import --force` → should update metadata +4. Run `bd ready --allow-stale` → should work despite staleness + +### Automated Testing +1. Mock sandboxed environment (permission denied on signals) +2. 
Test daemon connection failure scenarios +3. Test metadata update in import with 0 changes +4. Test staleness check bypass flag + +## Documentation Updates Needed + +1. **TROUBLESHOOTING.md** - Add sandbox section with: + - Symptoms of daemon lock issues + - `--sandbox` flag usage + - `--force` and `--allow-stale` as escape hatches + +2. **CLI_REFERENCE.md** - Document new flags: + - `--allow-stale` / `--no-staleness-check` + - `bd import --force` + +3. **Error message** in `staleness.go` - Add: + ``` + If you're in a sandboxed environment (e.g., Codex): + bd --sandbox ready + bd import --force -i .beads/beads.jsonl + ``` + +## Files to Modify + +### Critical Path (Phase 1) +- [ ] `cmd/bd/import.go` - Add --force flag +- [ ] `cmd/bd/staleness.go` - Add staleness bypass, update error message +- [ ] `cmd/bd/main.go` - Add --allow-stale flag + +### Enhancement (Phase 2-3) +- [ ] `cmd/bd/main.go` - Sandbox detection +- [ ] `cmd/bd/daemon_unix.go` - Permission-aware process checks +- [ ] `cmd/bd/daemon_event_loop.go` - Metadata refresh +- [ ] `internal/rpc/server_export_import_auto.go` - Better import handling + +### Documentation +- [ ] `docs/TROUBLESHOOTING.md` +- [ ] `docs/CLI_REFERENCE.md` +- [ ] Issue #353 comment with workaround + +## Open Questions + +1. Should `--sandbox` auto-detect, or require explicit flag? + - **Recommendation:** Start with explicit, add auto-detect in Phase 2 + +2. Should `--allow-stale` be per-command or global? + - **Recommendation:** Global flag (less repetition) + +3. What should happen to daemon lock files when daemon is unreachable? + - **Recommendation:** Leave them (don't force-break locks), use direct mode + +4. Should we add a `--force-direct` that ignores daemon locks entirely? 
+ - **Recommendation:** Not needed if sandbox detection works well + +## Success Metrics + +- Users in Codex can run `bd ready` without errors +- No false positives in sandbox detection +- Clear error messages guide users to solutions +- `bd import --force` always updates metadata +- `--sandbox` mode works reliably + +--- + +**Investigation completed:** 2025-11-21 +**Next steps:** Implement Phase 1 solutions diff --git a/docs/GH353_NEXT_SESSION.md b/docs/GH353_NEXT_SESSION.md new file mode 100644 index 00000000..356335d8 --- /dev/null +++ b/docs/GH353_NEXT_SESSION.md @@ -0,0 +1,128 @@ +# Next Session Prompt: Implement GH #353 Fixes + +## Context +We've investigated GH #353 (daemon locking issues in Codex sandbox). Full analysis in `docs/GH353_INVESTIGATION.md`. + +**TL;DR:** Users in sandboxed environments (Codex) get stuck with "Database out of sync" errors because: +1. Running daemon has cached metadata +2. `bd import` updates database but daemon never sees it +3. Sandbox can't signal/kill the daemon +4. User is stuck in infinite loop + +## Task: Implement Phase 1 Solutions + +Implement three quick fixes that give users escape hatches: + +### 1. Add `--force` flag to `bd import` +**File:** `cmd/bd/import.go` + +**What to do:** +- Add `--force` flag to importCmd.Flags() (around line 692) +- When `--force` is true, ALWAYS update metadata (lines 310-346) even if `created == 0 && updated == 0` +- Print message: "Metadata updated (database already in sync with JSONL)" +- Ensure `TouchDatabaseFile()` is called to update mtime + +**Why:** Allows users to manually force metadata sync when stuck + +### 2. 
Add `--allow-stale` global flag +**File:** `cmd/bd/main.go` + +**What to do:** +- Add global var: `allowStale bool` +- Add to rootCmd.PersistentFlags(): `--allow-stale` (around line 111) +- Description: "Allow operations on potentially stale data (skip staleness check)" + +**File:** `cmd/bd/staleness.go` + +**What to do:** +- At top of `ensureDatabaseFresh()` function (line 20), add: + ```go + if allowStale { + fmt.Fprintf(os.Stderr, "⚠️ Staleness check skipped (--allow-stale), data may be out of sync\n") + return nil + } + ``` + +**Why:** Emergency escape hatch when staleness check blocks operations + +### 3. Improve error message in staleness.go +**File:** `cmd/bd/staleness.go` + +**What to do:** +- Update the error message (lines 41-50) to add sandbox guidance: + ```go + return fmt.Errorf( + "Database out of sync with JSONL. Run 'bd import' first.\n\n"+ + "The JSONL file has been updated (e.g., after 'git pull') but the database\n"+ + "hasn't been imported yet. This would cause you to see stale/incomplete data.\n\n"+ + "To fix:\n"+ + " bd import -i .beads/beads.jsonl # Import JSONL updates to database\n\n"+ + "If in a sandboxed environment (e.g., Codex) where daemon can't be stopped:\n"+ + " bd --sandbox ready # Use direct mode (no daemon)\n"+ + " bd import --force # Force metadata update\n"+ + " bd ready --allow-stale # Skip staleness check (use with caution)\n\n"+ + "Or use daemon mode (auto-imports on every operation):\n"+ + " bd daemon start\n"+ + " bd # Will auto-import before executing", + ) + ``` + +**Why:** Guides users to the right solution based on their environment + +## Testing Checklist + +After implementation: + +- [ ] `bd import --force -i .beads/beads.jsonl` updates metadata even with 0 changes +- [ ] `bd import --force` without `-i` flag shows appropriate error (needs input file) +- [ ] `bd ready --allow-stale` bypasses staleness check and shows warning +- [ ] Error message displays correctly and includes sandbox guidance +- [ ] `--sandbox` mode 
still works as before +- [ ] Flags appear in `bd --help` and `bd import --help` + +## Quick Start Commands + +```bash +# 1. Review the investigation +cat docs/GH353_INVESTIGATION.md + +# 2. Check current import.go implementation +grep -A 5 "func init()" cmd/bd/import.go + +# 3. Check current staleness.go +head -60 cmd/bd/staleness.go + +# 4. Run existing tests to establish baseline +go test ./cmd/bd/... -run TestImport +go test ./cmd/bd/... -run TestStaleness + +# 5. Implement changes (see sections above) + +# 6. Test manually +bd import --help | grep force +bd --help | grep allow-stale +``` + +## Expected Outcome + +Users stuck in Codex sandbox can: +1. Run `bd import --force -i .beads/beads.jsonl` to fix metadata +2. Run `bd --sandbox ready` to use direct mode +3. Run `bd ready --allow-stale` as last resort +4. See helpful error message explaining their options + +## References + +- **Investigation:** `docs/GH353_INVESTIGATION.md` +- **Issue:** https://github.com/steveyegge/beads/issues/353 +- **Key files:** + - `cmd/bd/import.go` (import command) + - `cmd/bd/staleness.go` (staleness check) + - `cmd/bd/main.go` (global flags) + +## Estimated Time +~1-2 hours for implementation + testing + +--- + +**Ready to implement?** Start with adding the flags, then update the error message, then test thoroughly. diff --git a/docs/HASH_ID_DESIGN.md b/docs/HASH_ID_DESIGN.md deleted file mode 100644 index b5d1d6eb..00000000 --- a/docs/HASH_ID_DESIGN.md +++ /dev/null @@ -1,329 +0,0 @@ -# Hash-Based ID Generation Design - -**Status:** Implemented (bd-166) -**Version:** 2.0 -**Last Updated:** 2025-10-30 - -## Overview - -bd v2.0 replaces sequential auto-increment IDs (bd-1, bd-2) with content-hash based IDs (bd-af78e9a2) and hierarchical sequential children (bd-af78e9a2.1, .2, .3). - -This eliminates ID collisions in distributed workflows while maintaining human-friendly IDs for related work. 
- -## ID Format - -### Top-Level IDs (Hash-Based) -``` -Format: {prefix}-{6-8-char-hex} (progressive on collision) -Examples: - bd-a3f2dd (6 chars, common case ~97%) - bd-a3f2dda (7 chars, rare collision ~3%) - bd-a3f2dda8 (8 chars, very rare double collision) -``` - -- **Prefix:** Configurable (bd, ticket, bug, etc.) -- **Hash:** First 6 characters of SHA256 hash (extends to 7-8 on collision) -- **Total length:** 9-11 chars for "bd-" prefix - -### Hierarchical Child IDs (Sequential) -``` -Format: {parent-id}.{child-number} -Examples: - bd-a3f2dd.1 (depth 1, 6-char parent) - bd-a3f2dda.1.2 (depth 2, 7-char parent on collision) - bd-a3f2dd.1.2.3 (depth 3, max depth) -``` - -- **Max depth:** 3 levels (prevents over-decomposition) -- **Max breadth:** Unlimited (tested up to 347 children) -- **Max ID length:** ~17 chars at depth 3 (6-char parent + .N.N.N) - -## Hash Generation Algorithm - -```go -func GenerateHashID(prefix, title, description string, created time.Time, workspaceID string) string { - h := sha256.New() - h.Write([]byte(title)) - h.Write([]byte(description)) - h.Write([]byte(created.Format(time.RFC3339Nano))) - h.Write([]byte(workspaceID)) - hash := hex.EncodeToString(h.Sum(nil)) - return fmt.Sprintf("%s-%s", prefix, hash[:8]) -} -``` - -### Hash Inputs - -1. **Title** - Primary identifier for the issue -2. **Description** - Additional context for uniqueness -3. **Created timestamp** - RFC3339Nano format for nanosecond precision -4. 
**Workspace ID** - Prevents collisions across databases/teams - -### Design Decisions - -**Why include timestamp?** -- Ensures different issues with identical title+description get unique IDs -- Nanosecond precision makes simultaneous creation unlikely - -**Why include workspace ID?** -- Prevents collisions when merging databases from different teams -- Can be hostname, UUID, or team identifier - -**Why NOT include priority/type?** -- These fields are mutable and shouldn't affect identity -- Changing priority shouldn't change the issue ID - -## Content Hash (Collision Detection) - -Separate from ID generation, bd uses content hashing for collision detection during import. See `internal/storage/sqlite/collision.go:hashIssueContent()`. - -### Content Hash Fields - -The content hash includes ALL semantically meaningful fields: -- title, description, status, priority, issue_type -- assignee, design, acceptance_criteria, notes -- **external_ref** ⚠️ (important: see below) - -### External Ref in Content Hash - -**IMPORTANT:** `external_ref` is included in the content hash. This has subtle implications: - -``` -Local issue (no external_ref) → content hash A -Same issue + external_ref → content hash B (different!) -``` - -**Why include external_ref?** -- Linkage to external systems (Jira, GitHub, Linear) is semantically meaningful -- Changing external_ref represents a real content change -- Ensures external system changes are tracked properly - -**Implications:** -1. **Rename detection** won't match issues before/after adding external_ref -2. **Collision detection** treats external_ref changes as updates -3. **Idempotent import** requires identical external_ref -4. **Import by external_ref** still works (checked before content hash) - -**Example scenario:** -```bash -# 1. Create local issue -bd create "Fix auth bug" -p 1 -# → ID: bd-a3f2dd, content_hash: abc123 - -# 2. 
Link to Jira -bd update bd-a3f2dd --external-ref JIRA-456 -# → ID: bd-a3f2dd (same), content_hash: def789 (changed!) - -# 3. Re-import from Jira -bd import -i jira-export.jsonl -# → Matches by external_ref first (JIRA-456) -# → Content hash different, triggers update -# → Idempotent on subsequent imports -``` - -**Design rationale:** External system linkage is tracked as substantive content, not just metadata. This ensures proper audit trails and collision resolution. - -**Why 6 chars (with progressive extension)?** -- 6 chars (24 bits) = ~16 million possible IDs -- Progressive collision handling: extend to 7-8 chars only when needed -- Optimizes for common case: 97% get short, readable 6-char IDs -- Rare collisions get slightly longer but still reasonable IDs -- Inspired by Git's abbreviated commit SHAs - -## Collision Analysis - -### Birthday Paradox Probability - -For 6-character hex IDs (24-bit space = 2^24 = 16,777,216): - -| # Issues | 6-char Collision | 7-char Collision | 8-char Collision | -|----------|------------------|------------------|------------------| -| 100 | ~0.03% | ~0.002% | ~0.0001% | -| 1,000 | 2.94% | 0.19% | 0.01% | -| 10,000 | 94.9% | 17.0% | 1.16% | - -**Formula:** P(collision) ≈ 1 - e^(-n²/2N) - -**Progressive Strategy:** Start with 6 chars. On INSERT collision, try 7 chars from same hash. On second collision, try 8 chars. This means ~97% of IDs in a 1,000 issue database stay at 6 chars. 
- -### Real-World Risk Assessment - -**Low Risk (<10,000 issues):** -- Single team projects: ~1% chance over lifetime -- Mitigation: Workspace ID prevents cross-team collisions -- Fallback: If collision detected, append counter (bd-af78e9a2-2) - -**Medium Risk (10,000-50,000 issues):** -- Large enterprise projects -- Recommendation: Monitor collision rate -- Consider 16-char IDs in v3 if collisions occur - -**High Risk (>50,000 issues):** -- Multi-team platforms with shared database -- Recommendation: Use 16-char IDs (64 bits) for 2^64 space -- Implementation: Change hash[:8] to hash[:16] - -### Collision Detection - -The database schema enforces uniqueness via PRIMARY KEY constraint. If a hash collision occurs: - -1. INSERT fails with UNIQUE constraint violation -2. Client detects error and retries with modified input -3. Options: - - Append counter to description: "Fix auth (2)" - - Wait 1ns and regenerate (different timestamp) - - Use 16-char hash mode - -## Performance - -**Benchmark Results (Apple M1 Max):** -``` -BenchmarkGenerateHashID-10 3758022 317.4 ns/op -BenchmarkGenerateChildID-10 19689157 60.96 ns/op -``` - -- Hash ID generation: **~317ns** (well under 1μs requirement) ✅ -- Child ID generation: **~61ns** (trivial string concat) -- No performance concerns for interactive CLI use - -## Comparison to Sequential IDs - -| Aspect | Sequential (v1) | Hash-Based (v2) | -|--------|----------------|-----------------| -| Collision risk | HIGH (offline work) | NONE (top-level) | -| ID length | 5-8 chars | 9-11 chars (avg ~9) | -| Predictability | Predictable (bd-1, bd-2) | Unpredictable | -| Offline-first | ❌ Requires coordination | ✅ Fully offline | -| Merge conflicts | ❌ Same ID, different content | ✅ Different IDs | -| Human-friendly | ✅ Easy to remember | ⚠️ Harder to remember | -| Code complexity | ~2,100 LOC collision resolution | <100 LOC | - -## CLI Usage - -### Prefix Handling - -**Storage:** Always includes prefix (bd-a3f2dd) -**CLI Input:** Prefix 
optional (both bd-a3f2dd AND a3f2dd accepted) -**CLI Output:** Always shows prefix (copy-paste clarity) -**External refs:** Always use prefix (git commits, docs, Slack) - -```bash -# All of these work (prefix optional in input): -bd show a3f2dd -bd show bd-a3f2dd -bd show a3f2dd.1 -bd show bd-a3f2dd.1.2 - -# Output always shows prefix: -bd-a3f2dd [epic] Auth System - Status: open - ... -``` - -### Git-Style Prefix Matching - -Like Git commit SHAs, bd accepts abbreviated IDs: - -```bash -bd show af78 # Matches bd-af78e9a2 if unique -bd show af7 # ERROR: ambiguous (matches bd-af78e9a2 and bd-af78e9a2.1) -``` - -## Migration Strategy - -### Database Migration - -```bash -# Preview migration -bd migrate --hash-ids --dry-run - -# Execute migration -bd migrate --hash-ids - -# What it does: -# 1. Create child_counters table -# 2. For each existing issue: -# - Generate hash ID from content -# - Update all references in dependencies -# - Update all text mentions in descriptions/notes -# 3. Drop issue_counters table -# 4. Update config to hash_id_mode=true -``` - -### Backward Compatibility - -- Sequential IDs continue working in v1.x -- Hash IDs are opt-in until v2.0 -- Migration is one-way (no rollback) -- Export to JSONL preserves both old and new IDs during transition - -## Workspace ID Generation - -**Recommended approach:** -1. **First run:** Generate UUID and store in `config` table -2. **Subsequent runs:** Reuse stored workspace ID -3. 
**Collision:** If two databases have same workspace ID, collisions possible but rare - -**Alternative approaches:** -- Hostname: Simple but not unique (multiple DBs on same machine) -- Git remote URL: Requires git repository -- Manual config: User sets team identifier (e.g., "team-auth") - -**Implementation:** -```go -func (s *SQLiteStorage) getWorkspaceID(ctx context.Context) (string, error) { - var id string - err := s.db.QueryRowContext(ctx, - `SELECT value FROM config WHERE key = ?`, - "workspace_id").Scan(&id) - if err == sql.ErrNoRows { - // Generate new UUID - id = uuid.New().String() - _, err = s.db.ExecContext(ctx, - `INSERT INTO config (key, value) VALUES (?, ?)`, - "workspace_id", id) - } - return id, err -} -``` - -## Future Considerations - -### 16-Character Hash IDs (v3.0) - -If collision rates become problematic: - -```go -// Change from: -return fmt.Sprintf("%s-%s", prefix, hash[:8]) - -// To: -return fmt.Sprintf("%s-%s", prefix, hash[:16]) - -// Example: bd-af78e9a2c4d5e6f7 -``` - -**Tradeoffs:** -- ✅ Collision probability: ~0% even at 100M issues -- ❌ Longer IDs: 19 chars vs 11 chars -- ❌ Less human-friendly - -### Custom Hash Algorithms - -For specialized use cases: -- BLAKE3: Faster than SHA256 (not needed for interactive CLI) -- xxHash: Non-cryptographic but faster (collision resistance?) -- MurmurHash: Used by Jira (consider for compatibility) - -## References - -- **Epic:** bd-165 (Hash-based IDs with hierarchical children) -- **Implementation:** internal/types/id_generator.go -- **Tests:** internal/types/id_generator_test.go -- **Related:** bd-168 (CreateIssue integration), bd-169 (JSONL format) - -## Summary - -Hash-based IDs eliminate distributed ID collision problems at the cost of slightly longer, less memorable IDs. Hierarchical children provide human-friendly sequential IDs within naturally-coordinated contexts (epic ownership). 
- -This design enables true offline-first workflows and eliminates ~2,100 lines of complex collision resolution code. diff --git a/docs/TEST_OPTIMIZATION.md b/docs/TEST_OPTIMIZATION.md deleted file mode 100644 index ab93c8c5..00000000 --- a/docs/TEST_OPTIMIZATION.md +++ /dev/null @@ -1,85 +0,0 @@ -# Test Suite Optimization - November 2025 - -## Problem -Test suite was timing out after 5+ minutes, making development workflow painful. - -## Root Cause -Slow integration tests were running during normal `go test ./...`: -- **Daemon tests**: 7 files with git operations and time.Sleep calls -- **Multi-clone convergence tests**: 2 tests creating multiple git repos -- **Concurrent import test**: 30-second timeout for deadlock detection - -## Solution -Tagged slow integration tests with `//go:build integration` so they're excluded from normal runs: - -### Files moved to integration-only: -1. `cmd/bd/daemon_test.go` (862 lines, 15 tests) -2. `cmd/bd/daemon_sync_branch_test.go` (1235 lines, 11 tests) -3. `cmd/bd/daemon_autoimport_test.go` (408 lines, 2 tests) -4. `cmd/bd/daemon_watcher_test.go` (7 tests) -5. `cmd/bd/daemon_watcher_platform_test.go` -6. `cmd/bd/daemon_lock_test.go` -7. `cmd/bd/git_sync_test.go` -8. `beads_hash_multiclone_test.go` (already tagged) -9. `internal/importer/importer_integration_test.go` (concurrent test) - -### Fix for build error: -- Added `const windowsOS = "windows"` to `test_helpers_test.go` (was in daemon_test.go) - -## Results - -### Before: -``` -$ go test ./... -> 300 seconds (timeout) -``` - -### After: -``` -$ go test ./... -real 0m1.668s ✅ -user 0m2.075s -sys 0m1.586s -``` - -**99.4% faster!** From 5+ minutes to under 2 seconds. - -## Running Integration Tests - -### Normal development (fast): -```bash -go test ./... -``` - -### Full test suite including integration (slow): -```bash -go test -tags=integration ./... -``` - -### CI/CD: -```yaml -# Fast feedback on PRs -- run: go test ./... 
- -# Full suite on merge to main -- run: go test -tags=integration ./... -``` - -## Benefits -1. ✅ Fast feedback loop for developers (<2s vs 5+ min) -2. ✅ Agents won't timeout on test runs -3. ✅ Integration tests still available when needed -4. ✅ CI can run both fast and comprehensive tests -5. ✅ No tests deleted - just separated by speed - -## What Tests Remain in Fast Suite? -- All unit tests (~300+ tests) -- Quick integration tests (<100ms each) -- In-memory database tests -- Logic/validation tests -- Fast import/export tests - -## Notes -- Integration tests still have `testing.Short()` checks for double safety -- The `integration` build tag is opt-in (must explicitly request with `-tags=integration`) -- All slow git/daemon operations are now integration-only diff --git a/docs/contributor-workflow-analysis.md b/docs/contributor-workflow-analysis.md deleted file mode 100644 index fa3f29c0..00000000 --- a/docs/contributor-workflow-analysis.md +++ /dev/null @@ -1,1380 +0,0 @@ -# Beads Contributor Workflow Analysis - -**Date**: 2025-11-03 -**Context**: Design discussion on how to handle beads issues in PR/OSS contribution workflows - -## The Problem (from #207) - -When contributing to OSS projects with beads installed: -- Git hooks automatically commit contributor's personal planning to PRs -- Contributor's experimental musings pollute the upstream project's issue tracker -- No clear ownership/permission model for external contributors -- Difficult to keep beads changes out of commits - -**Core tension**: Beads is great for team planning (shared namespace), but breaks down for OSS contributions (hierarchical gatekeeping). - -## Key Insights from Discussion - -### Beads as "Moving Frontier" - -Beads is not a traditional issue tracker. 
It captures the **active working set** - the sliding window of issues currently under attention: - -- Work moves fast with AI agents (10x-50x acceleration) -- Completed work fades quickly (95% never revisited, should be pruned aggressively) -- Future work is mostly blocked (small frontier of ready tasks) -- The frontier is bounded by team size (dozens to hundreds of issues, not thousands) - -**Design principle**: Beads should focus on the "what's next" cloud, not long-term planning or historical archive. - -### The Git Ledger is Fundamental - -Beads achieves reliability despite being unreliable (merge conflicts, sync issues, data staleness) through: - -**A. Git is the ledger and immutable backstop for forensics** -**B. AI is the ultimate arbiter and problem-solver when things go wrong** - -Any solution that removes the git ledger (e.g., gitignored contributor files) breaks this model entirely. - -### Requirements for Contributors - -Contributors need: -- Git-backed persistence (multi-clone sync, forensics, AI repair) -- Isolated planning space (don't pollute upstream) -- Ability to propose selected issues upstream -- Support for multiple workers across multiple clones of the same repo - -## Proposed Solutions - -### Idea 1: Fork-Aware Hooks + Two-File System - -**Structure**: -``` -# Upstream repo -.beads/ - beads.jsonl # Canonical frontier (committed) - .gitignore # Ignores local.jsonl - -# Contributor's fork -.beads/ - beads.jsonl # Synced from upstream (read-only) - local.jsonl # Contributor planning (committed to fork) - beads.db # Hydrated from both -``` - -**Detection**: Check for `upstream` remote to distinguish fork from canonical repo - -**Workflow**: -```bash -# In fork -$ bd add "Experiment" # → local.jsonl (committed to fork) -$ bd sync # → Pulls upstream's beads.jsonl -$ bd show # → Shows both -$ bd propose bd-a3f8e9 # → Moves issue to beads.jsonl for PR -``` - -**Pros**: -- Git ledger preserved (local.jsonl committed to fork) -- Multi-clone sync works 
-- Upstream .gitignore prevents pollution - -**Cons**: -- Fork detection doesn't help teams using branches (most common workflow) -- Two files to manage -- Requires discipline to use `bd propose` - -### Idea 2: Ownership Metadata + Smart PR Filtering - -**Structure**: -```jsonl -{"id":"bd-123","owner":"upstream","title":"Canonical issue",...} -{"id":"bd-456","owner":"stevey","title":"My planning",...} -``` - -**Workflow**: -```bash -$ bd add "Experiment" # → Creates with owner="stevey" -$ bd propose bd-456 # → Changes owner to "upstream" -$ bd clean-pr # → Filters commit to only upstream-owned issues -$ git push # → PR contains only proposed issues -``` - -**Pros**: -- Single file (simpler) -- Works with any git workflow (branch, fork, etc) -- Git ledger fully preserved - -**Cons**: -- Requires discipline to run `bd clean-pr` -- Clean commit is awkward (temporarily removing data) -- Merge conflicts if upstream and contributor both modify beads.jsonl - -### Idea 3: Branch-Scoped Databases - -Track which issues belong to which branch, filter at PR time. - -**Implementation**: Similar to #2 but uses labels/metadata to track branch instead of owner. - -**Challenge**: Complex with multiple feature branches, requires tracking branch scope. 
- -### Idea 4: Separate Planning Repo (Most Isolated) - -**Structure**: -```bash -# Main project repos (many) -~/projects/beads/.beads/beads.jsonl -~/projects/foo/.beads/beads.jsonl - -# Single planning repo (one) -~/.beads-planning/.beads/beads.jsonl - -# Configuration links them -~/projects/beads/.beads/config.toml: - planning_repo = "~/.beads-planning" -``` - -**Workflow**: -```bash -$ cd ~/projects/beads -$ bd add "My idea" # → Commits to ~/.beads-planning/ -$ bd show # → Shows beads canonical + my planning -$ bd propose bd-a3f8e9 # → Exports to beads repo for PR -``` - -**Pros**: -- Complete isolation (separate git histories, zero PR pollution risk) -- Git ledger fully preserved (both repos tracked) -- Multi-clone works perfectly (clone both repos) -- No special filtering/detection needed -- **Scales better**: One planning repo for all projects - -**Cons**: -- Two repos to manage -- Less obvious for new users (where's my planning?) - -## Analysis: Fork vs Clone vs Branch - -**Clone**: Local copy of a repo (`git clone `) -- `origin` remote points to source -- Push directly to origin (if you have write access) - -**Fork**: Server-side copy on GitHub -- For contributors without write access -- `origin` → your fork, `upstream` → original repo -- Push to fork, then PR from fork → upstream - -**Branch**: Feature branches in same repo -- Most common for teams with write access -- Push to same repo, PR from branch → main - -**Key insight**: Branches are universal, forks are only for external contributors. Most teams work on branches in a shared repo. - -## Current Thinking: Idea 4 is Cleanest - -After analysis, **separate planning repo (#4)** is likely the best solution because: - -1. **Only solution that truly prevents PR pollution** (separate git histories) -2. **Git ledger fully preserved** (both repos tracked) -3. **Multi-clone works perfectly** (just clone both) -4. **No complex filtering/detection needed** (simple config) -5. 
**Better scaling**: One planning repo across all projects you contribute to - -The "managing two repos" concern is actually an advantage: your planning is centralized and project-agnostic. - -## Open Questions - -### About the Workflow - -1. **Where does PR pollution actually happen?** - - Scenario A: Feature branch → upstream/main includes all beads changes from that branch? - - Scenario B: Something else? - -2. **Multi-clone usage pattern**: - - Multiple clones on different machines? - - All push/pull to same remote? - - Workers coordinate via git sync? - - PRs created from feature branches? - -### About Implementation - -1. **Proposed issue IDs**: When moving issue from planning → canonical, keep same ID? (Hash-based IDs are globally unique) - -2. **Upstream acceptance sync**: If upstream accepts/modifies a proposal, how to sync back to contributor? - - `bd sync` detects accepted proposals - - Moves from planning repo to project's canonical beads.jsonl - -3. **Multiple projects**: One planning repo for all projects you contribute to, or one per project? - -4. **Backwards compatibility**: Single-user projects unchanged (single beads.jsonl) - -5. **Discovery**: How do users discover this feature? Auto-detect and prompt? - -## Next Steps - -Need to clarify: -1. User's actual multi-clone workflow (understand the real use case) -2. Where exactly PR pollution occurs (branch vs fork workflow) -3. Which solution best fits the "git ledger + multi-clone" requirements -4. 
Whether centralized planning repo (#4) or per-project isolation (#1/#2) is preferred - -## Design Principles to Preserve - -From the conversation, these are non-negotiable: - -- **Git as ledger**: Everything must be git-tracked for forensics and AI repair -- **Moving frontier**: Focus on active work, aggressively prune completed work -- **Multi-clone sync**: Workers across clones must coordinate via git -- **Small databases**: Keep beads.jsonl small enough for agents to read (<25k per repo, see below) -- **Simple defaults**: Don't break single-user workflows -- **Explicit over implicit**: Clear boundaries between personal and canonical - -### JSONL Size Bounds with Multi-Repo - -**Critical clarification**: The <25k limit applies **per-repo**, not to total hydrated size. - -#### The Rule - -**Per-repo limit**: Each individual JSONL file should stay <25k (roughly 100-200 issues depending on metadata). - -**Why per-repo, not total**: -1. **Git operations**: Each repo is independently versioned. Git performance depends on per-file size, not aggregate. -2. **AI readability**: Agents read JSONLs for forensics/repair. Reading one 20k file is easy; reading the union of 10 files is still manageable. -3. **Bounded growth**: Total size naturally bounded by number of repos (typically N=1-3, rarely >10). -4. **Pruning granularity**: Completed work is pruned per-repo, keeping each repo's frontier small. - -#### Example Scenarios - -| Primary | Planning | Team Shared | Total Hydrated | Valid? | -|---------|----------|-------------|----------------|--------| -| 20k | - | - | 20k | ✅ Single-repo, well under limit | -| 20k | 15k | - | 35k | ✅ Each repo <25k (per-repo rule) | -| 20k | 15k | 18k | 53k | ✅ Each repo <25k (per-repo rule) | -| 30k | 15k | - | 45k | ❌ Primary exceeds 25k | -| 20k | 28k | - | 48k | ❌ Planning exceeds 25k | - -#### Rationale: Why 25k? - -**Agent context limits**: AI agents have finite context windows. 
A 25k JSONL file is: -- ~100-200 issues with metadata -- ~500-1000 lines of JSON -- Comfortably fits in GPT-4 context (128k tokens) -- Small enough to read/parse in <500ms - -**Moving frontier principle**: Beads tracks **active work**, not historical archive. With aggressive pruning: -- Completed issues get compacted/archived -- Blocked work stays dormant -- Only ready + in-progress issues are "hot" -- Typical frontier: 50-100 issues per repo - -#### Monitoring Size with Multi-Repo - -**Per-repo monitoring**: -```bash -# Check each repo's JSONL size -$ wc -c .beads/beads.jsonl -20480 .beads/beads.jsonl - -$ wc -c ~/.beads-planning/beads.jsonl -15360 ~/.beads-planning/beads.jsonl - -# Total hydrated size (informational, not a hard limit) -$ expr 20480 + 15360 -35840 -``` - -**Automated check**: -```go -// Check all configured repos -for _, repo := range cfg.Repos.All() { - jsonlPath := filepath.Join(repo, "beads.jsonl") - size, _ := getFileSize(jsonlPath) - if size > 25*1024 { // 25k - log.Warnf("Repo %s exceeds 25k limit: %d bytes", repo, size) - } -} -``` - -#### Pruning Strategy with Multi-Repo - -Each repo should be pruned independently: - -```bash -# Prune completed work from primary repo -$ bd compact --repo . --older-than 30d - -# Prune experimental planning repo -$ bd compact --repo ~/.beads-planning --older-than 7d - -# Shared team planning (longer retention) -$ bd compact --repo ~/team-shared/.beads --older-than 90d -``` - -Different repos can have different retention policies based on their role. - -#### Total Size Soft Limit (Guideline Only) - -While per-repo limit is the hard rule, consider total hydrated size for performance: - -**Guideline**: Keep total hydrated size <100k for optimal performance. 
- -**Why 100k total**: -- SQLite hydration: Parsing 100k JSON still fast (<1s) -- Agent queries: Dependency graphs with 300-500 total issues remain tractable -- Memory footprint: In-memory SQLite comfortably handles 500 issues - -**If total exceeds 100k**: -- Not a hard error, but performance may degrade -- Consider pruning completed work more aggressively -- Evaluate if all repos are still needed -- Check if any repos should be archived/removed - -#### Summary - -| Limit Type | Value | Enforcement | -|------------|-------|-------------| -| **Per-repo (hard limit)** | <25k | ⚠️ Warn if exceeded, agents may struggle | -| **Total hydrated (guideline)** | <100k | ℹ️ Informational, affects performance | -| **Typical usage** | 20k-50k total | ✅ Expected range for active development | - -**Bottom line**: Monitor per-repo size (<25k each). Total size naturally bounded by N repos × 25k. - ---- - -# Decision: Separate Repos (Solution #4) - -**Date**: 2025-11-03 (continued discussion) - -## Why Separate Repos - -After consideration, **Solution #4 (Separate Planning Repos)** is the chosen approach: - -### Key Rationale - -1. **Beads as a Separate Channel**: Beads is fundamentally a separate communication channel that happens to use git/VCS for persistence, not a git-centric tool. It should work with any VCS (jujutsu, sapling, mercurial, etc.). - -2. **VCS-Agnostic Design**: Solution #1 (fork detection) is too git-centric and wouldn't work with other version control systems. Separate repos work regardless of VCS. - -3. **Maximum Flexibility**: Supports multiple workflows and personas: - - OSS contributor with personal planning - - Multi-phase development (different beads DBs for different stages) - - Multiple personas (architect, implementer, reviewer) - - Team vs personal planning separation - -4. **Zero PR Pollution Risk**: Completely separate git histories guarantee no accidental pollution of upstream projects. - -5. 
**Proven Pain Point**: Experience shows that accidental bulk commits (100k issues) can be catastrophic and traumatic to recover from. Complete isolation is worth the complexity. - -## Core Architecture Principles - -### 1. Multi-Repo Support (N ≥ 1) - -**Configuration should support N repos, including N=1 for backward compatibility:** - -When N=1 (default), this is the current single-repo workflow - no changes needed. -When N≥2, multiple repos are hydrated together. - -```toml -# .beads/config.toml - -# Default mode: single repo (backwards compatible) -mode = "single" - -# Multi-repo mode -[repos] - # Primary repo: where canonical issues live - primary = "." - - # Additional repos to hydrate into the database - additional = [ - "~/.beads-planning", # Personal planning across all projects - "~/.beads-work/phase1", # Architecting phase - "~/.beads-work/phase2", # Implementation phase - "~/team-shared/.beads", # Shared team planning - ] - -# Routing: where do new issues go? -[routing] - mode = "auto" # auto | explicit - default = "~/.beads-planning" # Default for `bd add` - - # Auto-detection: based on user permissions - [routing.auto] - maintainer = "." # If maintainer, use primary - contributor = "~/.beads-planning" # Otherwise use planning repo -``` - -### 2. Hydration Model - -On `bd show`, `bd list`, etc., the database hydrates from multiple sources: - -``` -beads.db ← [ - ./.beads/beads.jsonl (primary, read-write if maintainer) - ~/.beads-planning/beads.jsonl (personal, read-write) - ~/team-shared/.beads/beads.jsonl (shared, read-write if team member) -] -``` - -**Metadata tracking**: -```jsonl -{ - "id": "bd-a3f8e9", - "title": "Add dark mode", - "source_repo": "~/.beads-planning", # Which repo owns this issue - "visibility": "local", # local | proposed | canonical - ... -} -``` - -### 3. 
Visibility States - -Issues can be in different states of visibility: - -- **local**: Personal planning, only in planning repo -- **proposed**: Exported for upstream consideration (staged for PR) -- **canonical**: In the primary repo (upstream accepted it) - -### 4. VCS-Agnostic Operations - -Beads should not assume git. Core operations: - -- **Sync**: `bd sync` should work with git, jj, hg, sl, etc. -- **Ledger**: Each repo uses whatever VCS it's under (or none) -- **Transport**: Issues move between repos via export/import, not git-specific operations - -## Workflow Examples - -### Use Case 1: OSS Contributor - -```bash -# One-time setup -$ mkdir ~/.beads-planning -$ cd ~/.beads-planning -$ git init && bd init - -# Contributing to upstream project -$ cd ~/projects/some-oss-project -$ bd config --add-repo ~/.beads-planning --routing contributor - -# Work -$ bd add "Explore dark mode implementation" -# → Goes to ~/.beads-planning/beads.jsonl -# → Commits to planning repo (git tracked, forensic trail) - -$ bd show -# → Shows upstream's canonical issues (read-only) -# → Shows my planning issues (read-write) - -$ bd work bd-a3f8e9 -$ bd status bd-a3f8e9 in-progress - -# Ready to propose -$ bd propose bd-a3f8e9 --target upstream -# → Exports issue from planning repo -# → Creates issue in ./beads/beads.jsonl (staged for PR) -# → Marks as visibility="proposed" in planning repo - -$ git add .beads/beads.jsonl -$ git commit -m "Propose: Add dark mode" -$ git push origin feature-branch -# → PR contains only the proposed issue, not all my planning -``` - -### Use Case 2: Multi-Phase Development - -```bash -# Setup phases -$ mkdir -p ~/.beads-work/{architecture,implementation,testing} -$ for dir in ~/.beads-work/*; do (cd $dir && git init && bd init); done - -# Configure project -$ cd ~/my-big-project -$ bd config --add-repo ~/.beads-work/architecture -$ bd config --add-repo ~/.beads-work/implementation -$ bd config --add-repo ~/.beads-work/testing - -# Architecture phase -$ 
bd add "Design authentication system" --repo ~/.beads-work/architecture -$ bd show --repo ~/.beads-work/architecture -# → Only architecture issues - -# Implementation phase (later) -$ bd add "Implement JWT validation" --repo ~/.beads-work/implementation - -# View all phases -$ bd show -# → Shows all issues from all configured repos -``` - -### Use Case 3: Multiple Contributors on Same Project - -```bash -# Team member Alice (maintainer) -$ cd ~/project -$ bd add "Fix bug in parser" -# → Goes to ./.beads/beads.jsonl (she's maintainer) -# → Commits to project repo - -# Team member Bob (contributor) -$ cd ~/project -$ bd add "Explore performance optimization" -# → Goes to ~/.beads-planning/beads.jsonl (he's contributor) -# → Does NOT pollute project repo - -$ bd show -# → Sees Alice's canonical issue -# → Sees his own planning - -$ bd propose bd-xyz -# → Proposes to Alice's canonical repo -``` - -## Implementation Outline - -### Phase 1: Core Multi-Repo Support - -**Commands**: -```bash -bd config --add-repo <path> # Add a repo to hydration -bd config --remove-repo <path> # Remove a repo -bd config --list-repos # Show all configured repos -bd config --routing <mode> # Set routing: single|auto|explicit -``` - -**Config schema**: -```toml -[repos] -primary = "." -additional = ["path1", "path2", ...] 
- -[routing] -default = "path" # Where `bd add` goes by default -mode = "auto" # auto | explicit -``` - -**Database changes**: -- Add `source_repo` field to issues -- Hydration layer reads from multiple JSONLs -- Writes go to correct JSONL based on source_repo - -### Phase 2: Proposal Flow - -**Commands**: -```bash -bd propose <id> [--target <repo>] # Move issue to target repo -bd withdraw <id> # Un-propose (move back) -bd accept <id> # Maintainer accepts proposal -``` - -**States**: -- `visibility: local` → Personal planning -- `visibility: proposed` → Staged for PR -- `visibility: canonical` → Accepted by upstream - -### Phase 3: Routing Rules - -**Auto-detection**: -- Detect if user is maintainer (git config, permissions) -- Auto-route to primary vs planning repo - -**Config-based routing** (no new schema fields): -```toml -[routing] -mode = "auto" # auto | explicit -default = "~/.beads-planning" # Fallback for contributors - -# Auto-detection rules -[routing.auto] -maintainer = "." # If user is maintainer, use primary repo -contributor = "~/.beads-planning" # Otherwise use planning repo -``` - -**Explicit routing** via CLI flag: -```bash -# Override auto-detection for specific issues -bd add "Design system" --repo ~/.beads-work/architecture -``` - -**Discovered issue inheritance**: -- Issues with parent_id automatically inherit parent's source_repo -- Keeps related work co-located - -### Phase 4: VCS-Agnostic Sync - -**Sync operations**: -- Detect VCS type per repo (git, jj, hg, sl) -- Use appropriate sync commands -- Fall back to manual sync if no VCS - -**Example**: -```bash -$ bd sync -# Auto-detects: -# - . is git → runs git pull -# - ~/.beads-planning is jj → runs jj git fetch && jj rebase -# - ~/other is hg → runs hg pull && hg update -``` - -## Migration Path - -### Existing Users (Single Repo) - -No changes required. 
Current workflow continues to work: - -```bash -$ bd add "Task" -# → .beads/beads.jsonl (as before) -``` - -### Library Consumers (Go/TypeScript) - -**Critical for projects like VC that use beads as a library.** - -#### Backward Compatibility (No Changes Required) - -Your existing code continues to work unchanged. The storage layer automatically reads `.beads/config.toml` if present: - -```go -// Before multi-repo (v0.17.3) -store, err := beadsLib.NewSQLiteStorage(".beads/vc.db") - -// After multi-repo (v0.18.0+) - EXACT SAME CODE -store, err := beadsLib.NewSQLiteStorage(".beads/vc.db") -// If .beads/config.toml exists, additional repos are auto-hydrated -// If .beads/config.toml doesn't exist, single-repo mode (backward compatible) -``` - -**What happens automatically**: -1. Storage layer checks for `.beads/config.toml` -2. If found: Reads `repos.additional`, hydrates from all configured repos -3. If not found: Single-repo mode (current behavior) -4. Your code doesn't need to know which mode is active - -#### Explicit Multi-Repo Configuration (Optional) - -If you need to override config.toml or configure repos programmatically: - -```go -// Explicit multi-repo configuration -cfg := beadsLib.Config{ - Primary: ".beads/vc.db", - Additional: []string{ - filepath.ExpandUser("~/.beads-planning"), - filepath.ExpandUser("~/team-shared/.beads"), - }, -} -store, err := beadsLib.NewStorageWithConfig(cfg) -``` - -**When to use explicit configuration**: -- Testing: Override config for test isolation -- Dynamic repos: Add repos based on runtime conditions -- No config file: Programmatic setup without `.beads/config.toml` - -#### When to Use Multi-Repo vs Single-Repo - -**Single-repo (default, recommended for most library consumers)**: -```go -// VC executor managing its own database -store, err := beadsLib.NewSQLiteStorage(".beads/vc.db") -// Stays single-repo by default, no config.toml needed -``` - -**Multi-repo (opt-in for specific use cases)**: -- **Team planning**: VC 
executor needs to see team-wide issues from shared repo -- **Multi-phase dev**: Different repos for architecture, implementation, testing phases -- **Personal planning**: User wants to track personal experiments separate from VC's canonical DB - -**Example: VC with team planning**: -```toml -# .beads/config.toml -[repos] -primary = "." -additional = ["~/team-shared/.beads"] - -[routing] -default = "." # VC-generated issues go to primary -``` - -```go -// VC executor code (unchanged) -store, err := beadsLib.NewSQLiteStorage(".beads/vc.db") - -// GetReadyWork() now returns issues from: -// - .beads/vc.db (VC-generated issues) -// - ~/team-shared/.beads (team planning) -ready, err := store.GetReadyWork(ctx) -``` - -#### Migration Checklist for Library Consumers - -1. **Test with config.toml**: Create `.beads/config.toml`, verify auto-hydration works -2. **Verify performance**: Ensure multi-repo hydration meets your latency requirements (see Performance section) -3. **Update exclusive locks**: If using locks, decide if you need per-repo or all-repo locking (see Exclusive Lock Protocol section) -4. **Review routing**: Ensure auto-generated issues (e.g., VC's `discovered:blocker`) go to correct repo -5. **Test backward compat**: Verify code works with and without config.toml - -#### API Compatibility Matrix - -| API Call | v0.17.3 (single-repo) | v0.18.0+ (multi-repo) | Breaking? | -|----------|----------------------|----------------------|-----------| -| `NewSQLiteStorage(path)` | ✅ Single repo | ✅ Auto-detects config | ❌ No | -| `GetReadyWork()` | ✅ Returns from single DB | ✅ Returns from all repos | ❌ No | -| `CreateIssue()` | ✅ Writes to single DB | ✅ Writes to primary (or routing config) | ❌ No | -| `UpdateIssue()` | ✅ Updates in single DB | ✅ Updates in source repo | ❌ No | -| Exclusive locks | ✅ Locks single DB | ✅ Locks per-repo | ❌ No | - -**Summary**: Zero breaking changes. Multi-repo is transparent to library consumers. 
- -### Opting Into Multi-Repo (CLI Users) - -```bash -# Create planning repo -$ mkdir ~/.beads-planning && cd ~/.beads-planning -$ git init && bd init - -# Link to project -$ cd ~/my-project -$ bd config --add-repo ~/.beads-planning -$ bd config --routing auto # Auto-detect maintainer vs contributor - -# Optionally migrate existing issues -$ bd migrate --move-to ~/.beads-planning --filter "author=me" -``` - -### Teams Adopting Beads - -```bash -# Maintainer sets up project -$ cd ~/team-project -$ bd init -$ git add .beads/ && git commit -m "Initialize beads" - -# Contributors clone and configure -$ git clone team-project -$ cd team-project -$ mkdir ~/.beads-planning && cd ~/.beads-planning -$ git init && bd init -$ cd ~/team-project -$ bd config --add-repo ~/.beads-planning --routing contributor -``` - -### Self-Hosting Projects (VC, Internal Tools, Pet Projects) - -**Important**: The multi-repo design is primarily for **OSS contributors** making PRs to upstream projects. Self-hosting projects have different needs. - -#### What is Self-Hosting? 

Projects that use beads to build themselves:
- **VC (VibeCoder)**: Uses beads to track development of VC itself
- **Internal tools**: Company tools that track their own roadmap
- **Pet projects**: Personal projects with beads-based planning

**Key difference from OSS contribution**:
- No upstream/downstream distinction (you ARE the project)
- Direct commit access (no PR workflow)
- Often have automated executors/agents
- Bootstrap/early phase stability matters

#### Default Recommendation: Stay Single-Repo

**For most self-hosting projects, single-repo is the right choice:**

```bash
# Simple, stable, proven
$ cd ~/my-project
$ bd init
$ bd create "Task" -p 1
# → .beads/beads.jsonl (committed to project repo)
```

**Why single-repo for self-hosting**:
- ✅ **Simpler**: No config, no routing decisions, no multi-repo complexity
- ✅ **Proven**: Current architecture, battle-tested
- ✅ **Sufficient**: All issues live with the project they describe
- ✅ **Stable**: No hydration overhead, no cross-repo coordination

#### When to Adopt Multi-Repo

Multi-repo makes sense for self-hosting projects only in specific scenarios:

**Scenario 1: Team Planning Separation**

Your project has multiple developers with different permission levels:

```toml
# .beads/config.toml
[repos]
primary = "."                           # Canonical project issues (maintainers only)
additional = ["~/team-shared/.beads"]   # Team planning (all contributors)
```

**Scenario 2: Multi-Phase Development**

Your project uses distinct phases (architecture → implementation → testing):

```toml
# .beads/config.toml
[repos]
primary = "."                           # Current active work
additional = [
  "~/.beads-work/architecture",         # Design decisions
  "~/.beads-work/implementation",       # Implementation backlog
]
```

**Scenario 3: Experimental Work Isolation**

You want to keep experimental ideas separate from canonical roadmap:

```toml
# .beads/config.toml
[repos]
primary = "."                           # Committed roadmap
additional = ["~/.beads-experiments"]   # Experimental ideas
```

#### Automated Executors with Multi-Repo

**Critical for projects like VC with automated agents.**

**Default behavior (recommended)**:
```go
// Executor sees ONLY primary repo (canonical work)
store, err := beadsLib.NewSQLiteStorage(".beads/vc.db")
// No config.toml = single-repo mode
ready, err := store.GetReadyWork(ctx) // Only canonical issues
```

**With multi-repo (opt-in)**:
```toml
# .beads/config.toml
[repos]
primary = "."
additional = ["~/team-shared/.beads"]

[routing]
default = "." # Executor-created issues stay in primary
```

```go
// Executor code (unchanged)
store, err := beadsLib.NewSQLiteStorage(".beads/vc.db")
// Auto-reads config.toml, hydrates from both repos
ready, err := store.GetReadyWork(ctx)
// Returns issues from primary + team-shared

// When executor creates discovered issues:
discovered := &Issue{Title: "Found blocker", ...}
store.CreateIssue(discovered)
// → Goes to primary repo (routing.default = ".")
```

**Recommendation for executors**: Stay single-repo unless you have a clear team coordination need.

#### Bootstrap Phase Considerations

**If your project is in early/bootstrap phase (like VC), extra caution:**

1. **Prioritize stability**: Multi-repo adds complexity. Delay until proven need.
2. **Test thoroughly**: If adopting multi-repo, test with small repos first.
3. **Monitor performance**: Ensure executor polling loops stay sub-second (see Performance section).
4. **Plan rollback**: Keep single-repo workflow working so you can revert if needed.

**Bootstrap-phase checklist**:
- [ ] Do you have multiple developers with different permissions? → Maybe multi-repo
- [ ] Do you have team planning separate from executor roadmap? → Maybe multi-repo
- [ ] Are you solo or small team with unified planning? → Stay single-repo
- [ ] Is executor stability critical right now? → Stay single-repo
- [ ] Can you afford multi-repo testing/debugging time? → If no, stay single-repo

#### Migration Path for Self-Hosting Projects

**From single-repo to multi-repo (when ready)**:

```bash
# Step 1: Create planning repo
$ mkdir ~/.beads-planning && cd ~/.beads-planning
$ git init && bd init

# Step 2: Configure multi-repo (test mode)
$ cd ~/my-project
$ bd config --add-repo ~/.beads-planning --routing auto

# Step 3: Test with small workload
$ bd create "Test issue" --repo ~/.beads-planning
$ bd show   # Verify hydration works
$ bd ready  # Verify queries work

# Step 4: Verify executor compatibility
# - Run executor with multi-repo config
# - Check GetReadyWork() latency (<100ms)
# - Verify discovered issues route correctly

# Step 5: Migrate planning issues (optional)
$ bd migrate --move-to ~/.beads-planning --filter "label=experimental"
```

**Rollback (if needed)**:
```bash
# Remove config.toml to revert to single-repo
$ rm .beads/config.toml
$ bd show   # Back to single-repo mode
```

#### Summary: Self-Hosting Decision Tree

```
Is your project self-hosting? (building itself with beads)
├─ YES
│  ├─ Solo developer or unified team?
│  │  └─ Stay single-repo (simple, stable)
│  ├─ Multiple developers, different permissions?
│  │  └─ Consider multi-repo (team planning separation)
│  ├─ Multi-phase development (arch → impl → test)?
│  │  └─ Consider multi-repo (phase isolation)
│  ├─ Bootstrap/early phase?
│  │  └─ Stay single-repo (stability > flexibility)
│  └─ Automated executor?
│     └─ Stay single-repo unless team coordination needed
└─ NO (OSS contributor)
   └─ Use multi-repo (planning repo separate from upstream)
```

**Bottom line for self-hosting**: Default to single-repo. Only adopt multi-repo when you have a proven, specific need.

## Design Decisions (Resolved)

### 1. Namespace Collisions: **Option B (Global Uniqueness)**

**Decision**: Use globally unique hash-based IDs that include timestamp + random component.

**Rationale** (from VC feedback):
- Option C (allow collisions) breaks dependency references: `bd dep add bd-a3f8e9 bd-b7c2d1` becomes ambiguous
- Need to support cross-repo dependencies without repo-scoped namespacing
- Hash should be: `hash(title + description + timestamp_ms + random_4bytes)`
- Collision probability: ~1 in 10^12 (acceptable)

### 2. Cross-Repo Dependencies: **Yes, Fully Supported**

**Decision**: Dependencies work transparently across all repos.

**Implementation**:
- Hydrated database contains all issues from all repos
- Dependencies stored by ID only (no repo qualifier needed)
- `bd ready` checks dependency graph across all repos
- Writes route back to correct JSONL via `source_repo` metadata

### 3. Routing Mechanism: **Config-Based, No Schema Changes**

**Decision**: Use config-based routing + explicit `--repo` flag. No new schema fields.

**Rationale**:
- `IssueType` already exists and is used semantically (bug, feature, task, epic, chore)
- Labels are used semantically by VC (`discovered:blocker`, `no-auto-claim`)
- Routing is a storage concern, not issue metadata
- Simpler: auto-detect maintainer vs contributor from config
- Discovered issues inherit parent's `source_repo` automatically

### 4. Performance: **Smart Caching with File Mtime Tracking**

**Decision**: SQLite DB is the cache, JSONLs are source of truth.

**Implementation** (sketch; `jsonlPaths`, `rehydrate`, and `queryReadyIssues` are illustrative helpers):
```go
type MultiRepoStorage struct {
	repos      []RepoConfig
	db         *sql.DB
	repoMtimes map[string]time.Time // Track file modification times
}

func (s *MultiRepoStorage) GetReadyWork(ctx context.Context) ([]Issue, error) {
	// Fast path: check if ANY JSONL changed
	needSync := false
	for repo, jsonlPath := range s.jsonlPaths() {
		info, err := os.Stat(jsonlPath)
		if err != nil {
			return nil, fmt.Errorf("stat %s: %w", jsonlPath, err)
		}
		if info.ModTime().After(s.repoMtimes[repo]) {
			needSync = true
			s.repoMtimes[repo] = info.ModTime()
		}
	}

	// Only re-hydrate if something changed
	if needSync {
		if err := s.rehydrate(ctx); err != nil { // Expensive but rare
			return nil, err
		}
	}

	// Query is fast (in-memory SQLite)
	return s.queryReadyIssues(ctx) // SELECT ... FROM issues WHERE ...
}
```

**Rationale**: VC's polling loop (every 5-10 seconds) requires sub-second queries. File stat is microseconds, re-parsing only when needed.

#### Performance Benchmarks and Targets

**Critical for library consumers (VC) with automated polling.**

##### Performance Targets

Based on VC's polling loop requirements (every 5-10 seconds):

| Operation | Target | Rationale |
|-----------|--------|-----------|
| **File stat** (per repo) | <1ms | Checking mtime of N JSONLs must be negligible |
| **Hydration** (full re-parse) | <500ms | Only happens when JSONL changes, rare in polling loop |
| **Query** (from cached DB) | <10ms | Common case: no JSONL changes, pure SQLite query |
| **Total GetReadyWork()** | <100ms | VC's hard requirement for responsive executor |

##### Scale Testing Targets

Test at multiple repo counts to ensure scaling:

| Repo Count | File Stat Total | Hydration (worst case) | Query (cached) | Total (cached) |
|------------|-----------------|------------------------|----------------|----------------|
| **N=1** (baseline) | <1ms | <200ms | <5ms | <10ms |
| **N=3** (typical) | <3ms | <500ms | <10ms | <20ms |
| **N=10** (edge case) | <10ms | <2s | <50ms | <100ms |

**Assumptions**:
- JSONL size: <25k per repo (see Design Principles)
- SQLite: In-memory mode (`:memory:` or `file::memory:?cache=shared`)
- Cached case: No JSONL changes since last hydration (99% of polling loops)

##### Benchmark Suite (To Be Implemented)

```go
// benchmark/multi_repo_test.go

func BenchmarkFileStatOverhead(b *testing.B) {
	// Test: Stat N JSONL files
	// Target: <1ms per repo
}

func BenchmarkHydrationN1(b *testing.B) {
	// Test: Full hydration from 1 JSONL (20k file)
	// Target: <200ms
}

func BenchmarkHydrationN3(b *testing.B) {
	// Test: Full hydration from 3 JSONLs (20k each)
	// Target: <500ms
}

func BenchmarkHydrationN10(b *testing.B) {
	// Test: Full hydration from 10 JSONLs (20k each)
	// Target: <2s
}

func BenchmarkQueryCached(b *testing.B) {
	// Test: GetReadyWork() with no JSONL changes
	// Target: <10ms
}

func BenchmarkGetReadyWorkN3(b *testing.B) {
	// Test: Realistic polling loop (3 repos, cached)
	// Target: <20ms total
}
```

##### Performance Optimization Notes

If benchmarks fail to meet targets, optimization strategies:

1. **Parallel file stats**: Use goroutines to stat N JSONLs concurrently
2. **Incremental hydration**: Only re-parse changed repos, merge into DB
3. **Smarter caching**: Hash-based cache invalidation (mtime + file size)
4. **SQLite tuning**: `PRAGMA synchronous = OFF`, `PRAGMA journal_mode = MEMORY`
5. **Lazy hydration**: Defer hydration until first query after mtime change

##### Status

**Benchmarks**: ⏳ Not implemented yet (tracked in bd-wta)
**Targets**: ✅ Documented above
**Validation**: ⏳ Pending implementation

**Next steps**:
1. Implement benchmark suite in `benchmark/multi_repo_test.go`
2. Run benchmarks on realistic workloads (VC-sized DBs)
3. Document results in this section
4. File optimization issues if targets not met

### 5. Visibility Field: **Optional, Backward Compatible**

**Decision**: Add `visibility` as optional field, defaults to "canonical" if missing.

**Schema**:
```go
type Issue struct {
	// ... existing fields ...
	Visibility *string `json:"visibility,omitempty"` // nil = canonical
}
```

**States**:
- `local`: Personal planning only
- `proposed`: Staged for upstream PR
- `canonical`: Accepted by upstream (or default for existing issues)

**Orthogonality**: Visibility and Status are independent:
- `status: in_progress, visibility: local` → Working on personal planning
- `status: open, visibility: proposed` → Proposed to upstream, awaiting review

### 6. Library API Stability: **Transparent Hydration**

**Decision**: Hybrid approach - transparent by default, explicit opt-in available.

**Backward Compatible**:
```go
// Existing code keeps working - reads config.toml automatically
store, err := beadsLib.NewSQLiteStorage(".beads/vc.db")
```

**Explicit Override**:
```go
// Library consumers can override config
cfg := beadsLib.Config{
	Primary:    ".beads/vc.db",
	Additional: []string{"~/.beads-planning"},
}
store, err := beadsLib.NewStorageWithConfig(cfg)
```

### 7. ACID Guarantees: **Per-Repo File Locking**

**Decision**: Use file-based locks per JSONL, atomic within single repo.

**Implementation** (sketch; `flock`, `readJSONL`, and `writeJSONL` are illustrative helpers):
```go
func (s *Storage) UpdateIssue(issue Issue) error {
	sourceRepo := issue.SourceRepo

	// Lock that repo's JSONL (flock-style advisory lock)
	lock, err := flock(sourceRepo + "/beads.jsonl.lock")
	if err != nil {
		return err
	}
	defer lock.Unlock()

	// Read-modify-write
	issues, err := s.readJSONL(sourceRepo)
	if err != nil {
		return err
	}
	issues.Update(issue)
	if err := s.writeJSONL(sourceRepo, issues); err != nil {
		return err
	}

	// Update in-memory DB
	return s.db.Update(issue)
}
```

**Limitation**: Cross-repo transactions are NOT atomic (acceptable, rare use case).

#### Compatibility with Exclusive Lock Protocol

The per-repo file locking (Decision #7) is **fully compatible** with the existing exclusive lock protocol (see [EXCLUSIVE_LOCK.md](../EXCLUSIVE_LOCK.md)).

**How they work together**:

1. **Exclusive locks are daemon-level**: The `.beads/.exclusive-lock` prevents the bd daemon from operating on a specific database
2. **File locks are operation-level**: Per-JSONL file locks (`flock`) ensure atomic read-modify-write for individual operations
3. **Different scopes, complementary purposes**:
   - Exclusive lock: "This entire database is off-limits to the daemon"
   - File lock: "This specific JSONL is being modified right now"

**Multi-repo behavior**:

With multi-repo configuration, each repo can have its own exclusive lock:

```bash
# VC executor locks its primary database
.beads/.exclusive-lock              # Locks primary repo operations

# Planning repo can be locked independently
~/.beads-planning/.exclusive-lock   # Locks planning repo operations
```

**When both are active**:
- If primary repo is locked: Daemon skips all operations on primary, but can still sync planning repo
- If planning repo is locked: Daemon skips planning repo, but can still sync primary
- If both locked: Daemon skips entire multi-repo workspace

**No migration needed for library consumers**:

Existing VC code (v0.17.3+) using exclusive locks will continue to work:
```go
// VC's existing lock acquisition
lock, err := types.NewExclusiveLock("vc-executor", "1.0.0")
lockPath := filepath.Join(".beads", ".exclusive-lock")
os.WriteFile(lockPath, data, 0644) // data: the serialized lock

// Works the same with multi-repo:
// - Locks .beads/ (primary repo)
// - Daemon skips primary, can still sync ~/.beads-planning if configured
```

**Atomic multi-repo locking**:

If a library consumer needs to lock **all** repos atomically:

```go
// Lock all configured repos
home, _ := os.UserHomeDir() // filepath has no ExpandUser; expand ~ explicitly
repos := []string{".beads", filepath.Join(home, ".beads-planning")}
for _, repo := range repos {
	lockPath := filepath.Join(repo, ".exclusive-lock")
	os.WriteFile(lockPath, lockData, 0644) // lockData: serialized lock metadata
}
defer func() {
	for _, repo := range repos {
		os.Remove(filepath.Join(repo, ".exclusive-lock"))
	}
}()

// Now daemon skips all repos until locks released
```

**Summary**: No breaking changes. Exclusive locks work per-repo in multi-repo configs, preventing daemon interference at repo granularity.

## Key Learnings from VC Feedback

The VC project (VibeCoder) provided critical feedback as a real downstream consumer that uses beads as a library. Key insights:

### 1. Two Consumer Models

Beads has two distinct consumer types:
- **CLI users**: Use `bd` commands directly
- **Library consumers**: Use `beadsLib` in Go/TypeScript/etc. (like VC)

Multi-repo must work transparently for both.

### 2. Performance is Critical for Automation

VC's executor polls `GetReadyWork()` every 5-10 seconds. Multi-repo hydration must:
- Use smart caching (file mtime tracking)
- Avoid re-parsing JSONLs on every query
- Keep queries sub-second (ideally <100ms)

### 3. Special Labels Must Work Across Repos

VC uses semantic labels that must work regardless of repo:
- `discovered:blocker` - Auto-generated blocker issues (priority boost)
- `discovered:related` - Auto-generated related work
- `no-auto-claim` - Prevent executor from claiming
- `baseline-failure` - Self-healing baseline failures

These are **semantic labels**, not routing labels. Don't overload labels for routing.

### 4. Discovered Issues Routing

When VC's analysis phase auto-creates issues with `discovered:blocker` label, they should:
- Inherit parent's `source_repo` automatically
- Stay co-located with related work
- Not require manual routing decisions

### 5. Library API Stability is Non-Negotiable

VC's code uses `beadsLib.NewSQLiteStorage()`. Must not break. Solution:
- Read `.beads/config.toml` automatically (transparent)
- Provide `NewStorageWithConfig()` for explicit override
- Hydration happens at storage layer, invisible to library consumers

## Remaining Open Questions

1. **Sync semantics**: When upstream accepts a proposed issue and modifies it, how to sync back?
   - Option A: Mark as "accepted" in planning repo, keep both copies
   - Option B: Delete from planning repo (it's now canonical)
   - Option C: Keep in planning repo but mark as read-only mirror

2. **Discovery**: How do users learn about this feature?
   - Auto-prompt when detecting fork/contributor status?
   - Docs + examples?
   - `bd init --contributor` wizard?

3. **Metadata fields**: Should `source_repo` be exposed in JSON export, or keep it internal to storage layer?

4. **Proposed issue lifecycle**: What happens to proposed issues after PR is merged/rejected?
   - Auto-delete from planning repo?
   - Mark as "accepted" or "rejected"?
   - Manual cleanup via `bd withdraw`?

## Success Metrics

How we'll know this works:

1. **Zero pollution**: No contributor planning issues accidentally merged upstream
2. **Multi-clone sync**: Workers on different machines see consistent state (via VCS sync)
3. **Flexibility**: Users can configure for their workflow (personas, phases, etc.)
4. **Backwards compat**: Existing single-repo users unaffected
5. **VCS-agnostic**: Works with git, jj, hg, sl, or no VCS

## Next Actions

Suggested epics/issues to create (can be done in follow-up session):

1. **Epic: Multi-repo hydration layer**
   - Design schema for source_repo metadata
   - Implement config parsing for repos.additional
   - Build hydration logic (read from N JSONLs)
   - Build write routing (write to correct JSONL)

2. **Epic: Proposal workflow**
   - Implement `bd propose` command
   - Implement `bd withdraw` command
   - Implement `bd accept` command (maintainer only)
   - Design visibility state machine

3. **Epic: Auto-routing**
   - Detect maintainer vs contributor status
   - Implement routing rules (label, priority, custom)
   - Make `bd add` route to correct repo

4. **Epic: VCS-agnostic sync**
   - Detect VCS type per repo
   - Implement sync adapters (git, jj, hg, sl)
   - Handle mixed-VCS multi-repo configs

5. **Epic: Migration and onboarding**
   - Write migration guide
   - Implement `bd migrate` command
   - Create init wizards for common scenarios
   - Update documentation

---

## Summary and Next Steps

This document represents the design evolution for multi-repo support in beads, driven by:

1. **Original problem** (GitHub #207): Contributors' personal planning pollutes upstream PRs
2. **Core insight**: Beads is a separate communication channel that happens to use VCS
3. **VC feedback**: Real-world library consumer with specific performance and API stability needs

### Final Architecture

**Solution #4 (Separate Repos)** with these refinements:

- **N ≥ 1 repos**: Single repo (N=1) is default, multi-repo is opt-in
- **VCS-agnostic**: Works with git, jj, hg, sapling, or no VCS
- **Config-based routing**: No schema changes, auto-detect maintainer vs contributor
- **Smart caching**: File mtime tracking, SQLite DB as cache layer
- **Transparent hydration**: Library API remains stable, config-driven
- **Global namespace**: Hash-based IDs with timestamp + random for uniqueness
- **Cross-repo dependencies**: Fully supported, transparent to users
- **Discovered issues**: Inherit parent's source_repo automatically

### Why This Design Wins

1. **Zero PR pollution**: Separate git histories = impossible to accidentally merge planning
2. **Git ledger preserved**: All repos are VCS-tracked, full forensic capability
3. **Maximum flexibility**: Supports OSS contributors, multi-phase dev, multi-persona workflows
4. **Backward compatible**: Existing single-repo users unchanged
5. **Performance**: Sub-second queries even with polling loops
6. **Library-friendly**: Transparent to downstream consumers like VC

### Related Documents

- Original issue: GitHub #207
- VC feedback: `./vc-feedback-on-multi-repo.md`
- Implementation tracking: TBD (epics to be created)

### Status

**Design**: ✅ Complete (pending resolution of open questions)
**Implementation**: ⏳ Not started
**Target**: TBD

Last updated: 2025-11-03

diff --git a/docs/import-bug-analysis-bd-3xq.md b/docs/import-bug-analysis-bd-3xq.md
deleted file mode 100644
index e7d05109..00000000
--- a/docs/import-bug-analysis-bd-3xq.md
+++ /dev/null
@@ -1,517 +0,0 @@

# bd-3xq: Import Failure on Missing Parent Issues - Deep Analysis

**Issue ID**: bd-3xq
**Analysis Date**: 2025-11-04
**Status**: P0 Bug

---

## Executive Summary

The beads import process fails atomically when the JSONL file references deleted parent issues, blocking all imports. This is caused by overly strict parent validation in two critical code paths. The root issue is a **design tension between referential integrity and operational flexibility**.

**Key Finding**: The current implementation prioritizes database integrity over forward-compatibility, making normal operations like `bd delete` potentially destructive to future imports.

---

## Problem Deep Dive

### The Failure Scenario

1. User deletes old/obsolete issues via `bd delete` for hygiene ✓ (valid operation)
2. Issues remain in git history but are removed from database ✓ (expected)
3. JSONL file in git contains child issues whose parents were deleted ✗ (orphaned references)
4. Auto-import fails completely: `parent issue bd-1f4086c5 does not exist` ✗
5. Database becomes stuck - **296 issues in DB, newer data in JSONL cannot sync** ✗

### Technical Root Cause

Parent validation occurs in **two critical locations**:

#### 1. **`internal/storage/sqlite/ids.go:189-202`** - In `EnsureIDs()`

```go
// For hierarchical IDs (bd-a3f8e9.1), validate parent exists
if strings.Contains(issues[i].ID, ".") {
	// Extract parent ID (everything before the last dot)
	lastDot := strings.LastIndex(issues[i].ID, ".")
	parentID := issues[i].ID[:lastDot]

	var parentCount int
	err := conn.QueryRowContext(ctx, `SELECT COUNT(*) FROM issues WHERE id = ?`, parentID).Scan(&parentCount)
	if err != nil {
		return fmt.Errorf("failed to check parent existence: %w", err)
	}
	if parentCount == 0 {
		return fmt.Errorf("parent issue %s does not exist", parentID) // ⚠️ BLOCKS ENTIRE IMPORT
	}
}
```

#### 2. **`internal/storage/sqlite/sqlite.go:182-196`** - In `CreateIssue()`

```go
// For hierarchical IDs (bd-a3f8e9.1), validate parent exists
if strings.Contains(issue.ID, ".") {
	// Extract parent ID (everything before the last dot)
	lastDot := strings.LastIndex(issue.ID, ".")
	parentID := issue.ID[:lastDot]

	var parentCount int
	err = conn.QueryRowContext(ctx, `SELECT COUNT(*) FROM issues WHERE id = ?`, parentID).Scan(&parentCount)
	if err != nil {
		return fmt.Errorf("failed to check parent existence: %w", err)
	}
	if parentCount == 0 {
		return fmt.Errorf("parent issue %s does not exist", parentID) // ⚠️ BLOCKS CREATION
	}
}
```

**Analysis**: Both functions perform identical validation, creating a redundant but reinforced barrier. This is defensive programming taken too far - it prevents valid evolution scenarios.
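
The duplicated check in both paths hinges on one rule: the immediate parent of a hierarchical ID is everything before the last dot. A minimal sketch of that rule as a shared helper (`parentIDOf` is hypothetical, not an existing function in the codebase), useful if the two validation sites are ever consolidated:

```go
package main

import (
	"fmt"
	"strings"
)

// parentIDOf returns the immediate parent of a hierarchical issue ID
// (everything before the last dot), mirroring the logic duplicated in
// EnsureIDs() and CreateIssue(). ok is false for top-level IDs.
func parentIDOf(id string) (parent string, ok bool) {
	lastDot := strings.LastIndex(id, ".")
	if lastDot < 0 {
		return "", false
	}
	return id[:lastDot], true
}

func main() {
	for _, id := range []string{"bd-abc123", "bd-abc123.1", "bd-abc.1.2"} {
		parent, ok := parentIDOf(id)
		fmt.Printf("%s parent=%q ok=%v\n", id, parent, ok)
	}
}
```

Note that for `bd-abc.1.2` the helper yields `bd-abc.1`, not the top-level `bd-abc` - a distinction that matters again in the migration queries later in this document.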

---

## Critical Insight: The Import Ordering Bug

### Hidden Problem in `importer.go:534-546`

The `upsertIssues()` function has a **latent bug** that compounds the parent validation issue:

```go
// Batch create all new issues
if len(newIssues) > 0 {
	if err := sqliteStore.CreateIssues(ctx, newIssues, "import"); err != nil {
		return fmt.Errorf("error creating issues: %w", err)
	}
	result.Created += len(newIssues)
}
```

**The Problem**: `newIssues` is **not sorted by hierarchy depth** before batch creation!

If the import includes:
- `bd-abc123` (parent)
- `bd-abc123.1` (child)

And they happen to be ordered `[child, parent]` in the slice, the import will fail even though **both parent and child are present** in the batch.

**Why This Matters**: Even if we relax parent validation to allow missing parents, we still need proper topological sorting to handle parent-child pairs in the same batch.

---

## Design Analysis: Three Competing Forces

### 1. **Referential Integrity** (Current Priority)
- **Goal**: Prevent orphaned children in the database
- **Benefit**: Clean, consistent data structure
- **Cost**: Blocks valid operations, makes deletion risky

### 2. **Operational Flexibility** (Sacrificed)
- **Goal**: Allow normal database maintenance (deletions, pruning)
- **Benefit**: Database hygiene, reduced clutter
- **Cost**: Currently incompatible with strict integrity

### 3. **Multi-Repo Sync** (Broken)
- **Goal**: Share issues across clones with different histories
- **Benefit**: Collaboration, distributed workflow
- **Cost**: Different deletion states across clones break imports

**Current State**: Force 1 wins at the expense of Forces 2 and 3.

---

## Solution Space Analysis

### Option 1: **Strict Validation with Import-Time Parent Creation** ⭐

**Approach**: Keep strict validation but auto-resurrect deleted parents during import.

**How It Works**:
1. When importing child with missing parent, check git history
2. If parent found in JSONL history, resurrect it as a **tombstone**
3. Tombstone: status=`deleted`, minimal metadata, preserved for structure
4. Child import succeeds with valid parent reference

**Pros**:
- ✅ Maintains referential integrity
- ✅ Allows forward-rolling imports
- ✅ Preserves dependency tree structure
- ✅ Minimal code changes

**Cons**:
- ⚠️ Database accumulates tombstones (but they're marked deleted)
- ⚠️ Requires git history access (already available)
- ⚠️ Slight complexity increase

**Code Changes Required**:
- Modify `EnsureIDs()` and `CreateIssue()` to accept a "resurrect" mode
- Add `TryResurrectParent(ctx, parentID)` function
- Parse JSONL history to find deleted parent
- Create parent with `status="deleted"` and flag `is_tombstone=true`

**Risk Level**: **Low** - Backwards compatible, preserves semantics

---

### Option 2: **Relaxed Validation - Skip Orphans**

**Approach**: Log warning and skip orphaned children during import.

**How It Works**:
1. Remove `if parentCount == 0` error return
2. Replace with: `log.Warnf("Skipping orphaned issue %s (parent %s not found)", childID, parentID)`
3. Continue import with other issues
4. Report skipped issues at end

**Pros**:
- ✅ Simplest implementation
- ✅ Unblocks imports immediately
- ✅ No data corruption

**Cons**:
- ❌ Silently loses data (orphaned issues)
- ❌ Hard to notice what was skipped
- ❌ Breaks user expectations (import should import everything)

**Risk Level**: **Medium** - Data loss risk

---

### Option 3: **Relaxed Validation - Allow Orphans**

**Approach**: Import orphaned children without parent validation.

**How It Works**:
1. Remove parent existence check entirely
2. Allow `bd-abc123.1` to exist without `bd-abc123`
3. UI/CLI queries handle missing parents gracefully

**Pros**:
- ✅ Maximum flexibility
- ✅ Simple code change
- ✅ Unblocks all scenarios

**Cons**:
- ❌ Breaks dependency tree integrity
- ❌ UI/CLI must handle orphans everywhere
- ❌ Hierarchical ID semantics become meaningless
- ❌ Risk of cascading failures in tree operations

**Risk Level**: **High** - Semantic corruption

---

### Option 4: **Convert Hierarchical to Top-Level**

**Approach**: When parent missing, flatten child ID to top-level.

**How It Works**:
1. Detect orphaned child: `bd-abc123.1`
2. Convert to top-level: `bd-abc123-1` (dot → dash)
3. Import as independent issue
4. Log transformation

**Pros**:
- ✅ Preserves all issues
- ✅ Maintains uniqueness
- ✅ No data loss

**Cons**:
- ❌ Changes issue IDs (breaks references)
- ❌ Loses hierarchical relationship
- ❌ Confusing for users

**Risk Level**: **Medium** - ID stability risk

---

### Option 5: **Two-Pass Import with Topological Sort** ⭐⭐

**Approach**: Sort issues by hierarchy depth before batch creation.

**How It Works**:
1. **Pre-process phase**: Separate issues into depth buckets
   - Depth 0: `bd-abc123` (no dots)
   - Depth 1: `bd-abc123.1` (one dot)
   - Depth 2: `bd-abc123.1.2` (two dots)
2. **Import phase**: Create in depth order (0 → 1 → 2)
3. **Parent resolution**: For missing parents, try:
   - Option A: Resurrect from JSONL (Option 1)
   - Option B: Skip with warning (Option 2)
   - Option C: Create placeholder parent

**Pros**:
- ✅ Fixes latent import ordering bug
- ✅ Handles parent-child pairs in same batch
- ✅ Can combine with other options (1, 2, or 3)
- ✅ More robust import pipeline

**Cons**:
- ⚠️ Requires refactoring `upsertIssues()`
- ⚠️ Slight performance overhead (sorting)

**Code Changes Required**:
```go
// In upsertIssues() before batch creation:

// Sort newIssues by hierarchy depth to ensure parents are created first
sort.Slice(newIssues, func(i, j int) bool {
	depthI := strings.Count(newIssues[i].ID, ".")
	depthJ := strings.Count(newIssues[j].ID, ".")
	if depthI != depthJ {
		return depthI < depthJ // Shallower first
	}
	return newIssues[i].ID < newIssues[j].ID // Stable sort
})

// Then batch create by depth level
for depth := 0; depth <= 3; depth++ { // Max depth 3
	var batchForDepth []*types.Issue
	for _, issue := range newIssues {
		if strings.Count(issue.ID, ".") == depth {
			batchForDepth = append(batchForDepth, issue)
		}
	}
	if len(batchForDepth) > 0 {
		if err := sqliteStore.CreateIssues(ctx, batchForDepth, "import"); err != nil {
			return fmt.Errorf("error creating depth-%d issues: %w", depth, err)
		}
		result.Created += len(batchForDepth)
	}
}
```

**Risk Level**: **Low** - Fixes existing bug, improves robustness

---

## Recommended Solution: **Hybrid Approach** 🎯

**Combine Options 1 + 5**: Two-pass import with parent resurrection.

### Implementation Plan

#### Phase 1: Fix Import Ordering (Option 5)
1. Refactor `upsertIssues()` to sort by depth
2. Add depth-based batch creation
3. Add tests for parent-child pairs in same batch

#### Phase 2: Add Parent Resurrection (Option 1)
1. Create `TryResurrectParent(ctx, parentID)` function
2. Modify `EnsureIDs()` to call resurrection before validation
3. Add `is_tombstone` flag to schema (optional)
4. Log resurrected parents for transparency

#### Phase 3: Make Configurable
1. Add config option: `import.orphan_handling`
   - `strict`: Current behavior (fail on missing parent)
   - `resurrect`: Auto-resurrect from JSONL (default)
   - `skip`: Skip orphaned issues with warning
   - `allow`: Allow orphans (relaxed mode)

### Benefits of Hybrid Approach
- ✅ Fixes latent ordering bug (prevents future issues)
- ✅ Handles deleted parents gracefully
- ✅ Maintains referential integrity
- ✅ Provides user control via config
- ✅ Backwards compatible (strict mode available)
- ✅ Enables multi-repo workflows

---

## Edge Cases to Consider

### 1. **Parent Deleted in Multiple Levels**
**Scenario**: `bd-abc.1.2` exists but both `bd-abc` and `bd-abc.1` are deleted.

**Resolution**: Recursive resurrection - resurrect entire chain.

---

### 2. **Parent Never Existed in JSONL**
**Scenario**: JSONL corruption or manual ID manipulation.

**Resolution**:
- If `resurrect` mode: Skip with error (can't resurrect what doesn't exist)
- If `skip` mode: Skip orphan
- If `allow` mode: Import anyway (dangerous)

---

### 3. **Concurrent Import from Different Clones**
**Scenario**: Two clones import same JSONL with missing parents simultaneously.

**Resolution**: Resurrection is idempotent - second clone sees parent already exists (created by first clone). No conflict.

---

### 4. **Parent Deleted After Child Import**
**Scenario**: Import creates `bd-abc.1`, then user deletes `bd-abc`.

**Resolution**: Foreign key constraint prevents deletion (if enabled). If disabled, creates orphan in DB.

**Recommendation**: Add `ON DELETE CASCADE` or `ON DELETE RESTRICT` to child_counters table.
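
The four modes proposed in Phase 3 reduce to a small dispatch at the point where a missing parent is detected. A hedged sketch of that dispatch (`OrphanMode` and `handleOrphan` are illustrative names, not existing APIs in beads):

```go
package main

import "fmt"

// OrphanMode mirrors the proposed import.orphan_handling config values.
type OrphanMode string

const (
	OrphanStrict    OrphanMode = "strict"    // fail the import (current behavior)
	OrphanResurrect OrphanMode = "resurrect" // recreate parent as a tombstone
	OrphanSkip      OrphanMode = "skip"      // drop the child with a warning
	OrphanAllow     OrphanMode = "allow"     // import the child without its parent
)

// handleOrphan decides what happens when childID references a missing parent.
// resurrect is the hook that would recreate the parent from JSONL history.
// imported reports whether the child should still be created.
func handleOrphan(mode OrphanMode, childID, parentID string, resurrect func(string) error) (imported bool, err error) {
	switch mode {
	case OrphanStrict:
		return false, fmt.Errorf("parent issue %s does not exist", parentID)
	case OrphanResurrect:
		if err := resurrect(parentID); err != nil {
			return false, err
		}
		return true, nil
	case OrphanSkip:
		fmt.Printf("warning: skipping orphaned issue %s (parent %s not found)\n", childID, parentID)
		return false, nil
	case OrphanAllow:
		return true, nil
	default:
		return false, fmt.Errorf("unknown orphan_handling mode %q", mode)
	}
}

func main() {
	ok, err := handleOrphan(OrphanResurrect, "bd-abc.1", "bd-abc",
		func(id string) error { return nil }) // stub resurrection always succeeds
	fmt.Println(ok, err)
}
```

Keeping the decision in one function means both validation sites (`EnsureIDs()` and `CreateIssue()`) can share it, and `strict` remains the exact current behavior for backwards compatibility.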
- ---- - -## Schema Considerations - -### Current Schema (`schema.go`) - -```sql -CREATE TABLE IF NOT EXISTS child_counters ( - parent_id TEXT PRIMARY KEY, - next_counter INTEGER NOT NULL DEFAULT 1, - FOREIGN KEY(parent_id) REFERENCES issues(id) -); -``` - -**Issue**: No `ON DELETE` clause - undefined behavior when parent deleted. - -### Recommended Schema Change - -```sql -CREATE TABLE IF NOT EXISTS child_counters ( - parent_id TEXT PRIMARY KEY, - next_counter INTEGER NOT NULL DEFAULT 1, - FOREIGN KEY(parent_id) REFERENCES issues(id) ON DELETE CASCADE -); -``` - -**Reason**: When parent deleted, child counter should also be deleted. If parent is resurrected, counter gets recreated from scratch. - ---- - -## Performance Impact Analysis - -### Current Import (Broken) -- Time: O(n) where n = number of issues -- Fails on first orphan - -### Two-Pass Import (Option 5) -- Sorting: O(n log n) -- Depth-based batching: O(n × d) where d = max depth (3) -- **Total**: O(n log n) - negligible for typical datasets (<10k issues) - -### Parent Resurrection (Option 1) -- JSONL parse: Already done -- Parent lookup: O(1) hash map lookup -- Resurrection: O(1) single insert -- **Total**: O(1) per orphan - minimal overhead - -**Conclusion**: Performance impact is negligible (<5% overhead for typical imports). - ---- - -## Testing Strategy - -### Unit Tests Required - -1. **Test Import Ordering** - - Import `[child, parent]` - should succeed - - Import `[parent.1.2, parent, parent.1]` - should sort correctly - -2. **Test Parent Resurrection** - - Import child with deleted parent - should resurrect - - Import child with never-existed parent - should fail gracefully - -3. **Test Config Modes** - - Test `strict`, `resurrect`, `skip`, `allow` modes - - Verify error messages and logging - -4. 
**Test Edge Cases** - - Multi-level deletion (`bd-abc.1.2` with `bd-abc` and `bd-abc.1` deleted) - - Concurrent imports with same orphans - - JSONL corruption scenarios - -### Integration Tests Required - -1. **Multi-Repo Sync** - - Clone A deletes issue - - Clone B imports Clone A's JSONL - - Verify: Clone B handles deletion gracefully - -2. **Round-Trip Fidelity** - - Export → Delete parent → Import → Verify structure - ---- - -## Code Files Affected - -### Must Modify -1. `internal/importer/importer.go:534-546` - Add topological sort -2. `internal/storage/sqlite/ids.go:189-202` - Add resurrection option -3. `internal/storage/sqlite/sqlite.go:182-196` - Add resurrection option - -### Should Modify -4. `internal/storage/sqlite/schema.go:35-49` - Add `ON DELETE CASCADE` -5. `internal/types/types.go` - Add `IsTombstone bool` field (optional) - -### New Files Needed -6. `internal/storage/sqlite/resurrection.go` - Parent resurrection logic -7. `internal/importer/sort.go` - Topological sort utilities - ---- - -## Migration Path - -### For Existing Databases - -**Problem**: Databases might already have orphaned children (if foreign keys were disabled during development). - -**Solution**: Add migration to detect and fix orphans: - -```sql --- Find orphaned children -SELECT id FROM issues -WHERE id LIKE '%.%' -AND substr(id, 1, instr(id || '.', '.') - 1) NOT IN (SELECT id FROM issues); - --- Option A: Delete orphans -DELETE FROM issues WHERE id IN (...); - --- Option B: Convert to top-level -UPDATE issues SET id = replace(id, '.', '-') WHERE id IN (...); -``` - -**Recommendation**: Run detection query, log results, let user decide action. - ---- - -## Conclusion - -**bd-3xq reveals a fundamental design flaw**: The system prioritizes database integrity over operational flexibility, making normal operations (deletion) risky for future imports. - -**The hybrid solution (Options 1 + 5) is strongly recommended** because it: -1. 
Fixes the latent import ordering bug that affects everyone -2. Enables graceful handling of deleted parents -3. Maintains referential integrity through resurrection -4. Provides configuration options for different use cases -5. Enables multi-repo workflows (bd-4ms) -6. Has minimal performance impact -7. Is backwards compatible - -**Estimated Implementation Time**: -- Phase 1 (sorting): 4-6 hours -- Phase 2 (resurrection): 6-8 hours -- Phase 3 (config): 2-3 hours -- Testing: 8-10 hours -- **Total**: 2-3 days for complete solution - -**Priority**: P0 - Blocks multi-repo work (bd-4ms) and makes bd-delete risky - ---- - -## References - -- **bd-3xq**: This issue -- **bd-4ms**: Multi-repo support (blocked by this issue) -- **bd-a101**: Separate branch workflow (blocked by this issue) -- **bd-8e05**: Hash-based ID migration (related context) -- **bd-95**: Content hash computation (resurrection uses this) - ---- - -*Analysis completed: 2025-11-04* -*Analyzed by: Claude (Sonnet 4.5)* diff --git a/docs/vc-feedback-on-multi-repo.md b/docs/vc-feedback-on-multi-repo.md deleted file mode 100644 index 37c2921c..00000000 --- a/docs/vc-feedback-on-multi-repo.md +++ /dev/null @@ -1,536 +0,0 @@ -# VC Feedback on Multi-Repo Contributor Workflow - -**Date**: 2025-11-03 -**Context**: Response to `docs/contributor-workflow-analysis.md` -**From**: VC Team (AI-supervised issue workflow system) - -## Executive Summary - -**Overall Assessment**: The multi-repo design is **sound and well-thought-out**. VC can adopt it post-bootstrap with minimal disruption. - -**Key Concerns**: -1. **Library API stability** - Must remain transparent to library consumers -2. **Cross-repo dependency resolution** - Critical for VC's blocker-first prioritization -3. **Performance** - Hydration caching needed for VC's polling loop -4. **Namespace collisions** - Recommend Option B (global uniqueness) - -**Current Status**: VC uses Beads v0.17.7 as a library, single-repo model, bootstrap phase (pre-contributors). 
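Several of the critical questions below (Q3, and the claiming scenarios in section 5) hinge on the executor's atomic claim. As background, the guarantee VC relies on can be sketched as a compare-and-set; this in-memory model only illustrates the semantics and is not the beads storage layer:

```go
package main

import (
	"fmt"
	"sync"
)

// store models the compare-and-set semantics of
// UPDATE issues SET status='in_progress' WHERE id=? AND status='open'.
type store struct {
	mu     sync.Mutex
	status map[string]string
}

// claim atomically transitions an open issue to in_progress.
// It returns false if another executor already claimed the issue.
func (s *store) claim(id, executor string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.status[id] != "open" {
		return false
	}
	s.status[id] = "in_progress:" + executor
	return true
}

func main() {
	s := &store{status: map[string]string{"vc-300": "open"}}

	// Two executors race to claim the same issue.
	var wg sync.WaitGroup
	wins := make(chan string, 2)
	for _, name := range []string{"A", "B"} {
		wg.Add(1)
		go func(e string) {
			defer wg.Done()
			if s.claim("vc-300", e) {
				wins <- e
			}
		}(name)
	}
	wg.Wait()
	close(wins)

	count := 0
	for range wins {
		count++
	}
	fmt.Println(count) // exactly one executor wins the claim
}
```

Whatever hydration design Beads chooses, multi-repo writes need to preserve this exact guarantee.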
- ---- - -## 1. VC's Context & Usage Patterns - -### How VC Uses Beads - -**Architecture**: -- Beads as library: `beadsLib.NewSQLiteStorage(".beads/vc.db")` -- Extension model: VC adds tables (`vc_mission_state`, `vc_agent_events`) -- Single repo: `.beads/vc.db` + `.beads/issues.jsonl` -- Heavy use of ~20 library methods (GetIssue, CreateIssue, GetReadyWork, etc.) - -**Key Workflows**: -1. **Blocker-first prioritization** - `GetReadyWork()` sorts by discovered:blocker label first -2. **Atomic claiming** - `UPDATE issues SET status='in_progress' WHERE status='open'` -3. **Auto-discovery** - AI analysis creates issues with `discovered:blocker` and `discovered:related` labels -4. **Self-healing** - Enters "degraded mode" when `baseline-failure` issues exist -5. **Executor exclusion** - `no-auto-claim` label prevents auto-claiming - -**Performance Profile**: -- Polling loop: `GetReadyWork()` called every 5-10 seconds -- Need sub-second response times -- Cannot afford to re-read N JSONL files on every query - ---- - -## 2. Impact Assessment - -### Short-Term (Bootstrap Phase): ✅ MINIMAL - -- Multi-repo is opt-in with backwards-compatible defaults -- VC continues with single `.beads/vc.db` and `.beads/issues.jsonl` -- No changes needed during bootstrap - -### Medium-Term (Post-Bootstrap): ⚠️ LOW-MEDIUM - -**Potential use cases**: -- **Testing isolation**: Separate repo for experimental executor features -- **Multi-contributor**: External contributors use `~/.beads-planning/` - -**Concerns**: -- Cross-repo dependency resolution must work transparently -- Atomic claiming must preserve ACID guarantees -- Performance impact of multi-repo hydration - -### Long-Term (Self-Hosting): ✅ BENEFICIAL - -- Natural fit for VC's multi-contributor future -- Prevents PR pollution from contributor planning -- Aligns with VC's goal of becoming self-hosting - ---- - -## 3. Critical Design Questions - -### Q1. 
Library API Stability ⚠️ CRITICAL - -**Question**: Is this a library API change or pure CLI feature? - -**Context**: VC uses `beadsLib.NewSQLiteStorage()` and expects a single JSONL file. - -**What we need to know**: -- Does `NewSQLiteStorage()` API change? -- Is hydration transparent at library level? -- Or is multi-repo purely a `bd` CLI feature? - -**Recommendation**: -```go -// Backwards-compatible: continue to work with no changes -store, err := beadsLib.NewSQLiteStorage(".beads/vc.db") - -// Multi-repo should be configured externally (.beads/config.toml) -// and hydrated transparently by the storage layer - -// If API must change, provide opt-in: -cfg := beadsLib.Config{ - Primary: ".beads/vc.db", - Additional: []string{"~/.beads-planning"}, -} -store, err := beadsLib.NewStorageWithConfig(cfg) -``` - ---- - -### Q2. Cross-Repo Dependencies ⚠️ CRITICAL - -**Question**: How does `GetReadyWork()` handle cross-repo dependencies? - -**Context**: VC's executor relies on dependency graph to find ready work. - -**Example scenario**: -``` -canonical repo (.beads/vc.db): - vc-100 (open, P0) - ready work - -planning repo (~/.beads-planning): - vc-101 (open, P1, discovered:blocker) - ready work - vc-102 (open, P2) depends on vc-100 ← cross-repo dependency - -Expected results: - GetReadyWork() returns [vc-101, vc-100] ← blocker-first, then priority - (excludes vc-102 - blocked by vc-100) -``` - -**What we need**: -- Hydration layer builds unified dependency graph across all repos -- `GetReadyWork()` respects cross-repo dependencies -- Performance acceptable for frequent polling - -**Recommendation**: Document cross-repo dependency behavior clearly and provide test cases. - ---- - -### Q3. Atomic Operations Across Repos ⚠️ CRITICAL - -**Question**: Are writes atomic when multiple repos are hydrated? - -**Context**: VC's executor uses atomic claiming: -```sql --- Must be atomic even if issue comes from planning repo -UPDATE issues SET status = 'in_progress', executor_id = ? 
-WHERE id = ? AND status = 'open' -``` - -**What we need to know**: -- If multiple repos hydrate into single `.beads/vc.db`, are writes atomic? -- How does hydration layer route writes back to correct JSONL? -- Are there race conditions between multiple processes? - -**Recommendation**: Preserve ACID guarantees. Writes to hydrated database should be transparently routed to correct JSONL with transactional semantics. - ---- - -### Q4. Visibility States vs Issue Status ⚠️ MEDIUM - -**Question**: Are visibility and status orthogonal? - -**Context**: VC uses `status: open | in_progress | closed` extensively. - -**From document**: -```jsonl -{ - "status": "open", // ← VC's current field - "visibility": "local", // ← New field proposed - ... -} -``` - -**What we need to know**: -- Can an issue be `status: in_progress` and `visibility: local`? -- Does `GetReadyWork()` filter by visibility? -- Is this a breaking schema change? - -**Recommendation**: Clarify orthogonality and provide migration guide. - ---- - -### Q5. Performance - Hydration on Every Query? ⚠️ CRITICAL - -**Question**: Does library-level hydration happen on every `GetReadyWork()` call? - -**Context**: VC's executor polls every 5-10 seconds. 
 - -**Performance requirement**: -```go -// Executor polling loop -for { - // Must be < 1 second, ideally < 100ms - readyWork, err := store.GetReadyWork(ctx, filter) - if err != nil { - continue - } - if len(readyWork) > 0 { - claimIssue(readyWork[0]) - } - time.Sleep(5 * time.Second) -} -``` - -**Recommendation**: Implement smart caching: -```go -type MultiRepoStorage struct { - repos []RepoConfig - cache *HydratedCache - lastSync map[string]time.Time -} - -func (s *MultiRepoStorage) GetReadyWork(ctx context.Context) ([]Issue, error) { - // Check if any repo has changed since last sync - for _, repo := range s.repos { - if fileModTime(repo.JSONLPath).After(s.lastSync[repo.Path]) { - s.rehydrate(repo) // ← Only re-read changed repos - } - } - - // Query from cached hydrated database (fast) - return s.cache.GetReadyWork(ctx) -} -``` - -**Rationale**: Cannot afford to re-parse N JSONL files every 5 seconds. - ---- - -## 4. Design Feedback & Recommendations - -### F1. Namespace Collisions ✅ VOTE FOR OPTION B - -**From document's open question**: -> 1. **Namespace collisions**: If two repos both have `bd-a3f8e9`, how to handle? -> - Option A: Hash includes repo path -> - Option B: Global uniqueness (hash includes timestamp + random) ← **VC PREFERS THIS** -> - Option C: Allow collisions, use source_repo to disambiguate - -**Rationale**: -- VC uses `vc-` prefix, Beads uses `bd-` prefix -- Hash-based IDs should be globally unique -- Avoids complexity of repo-scoped namespaces -- Simpler for cross-repo dependencies -- **Concern with Option C**: How does `bd dep add vc-123 vc-456` know which repo's `vc-123` is meant? - -**Recommendation**: **Option B** (global uniqueness). Include timestamp + random in hash. - ---- - -### F2. 
Routing Labels vs Semantic Labels ⚠️ IMPORTANT - -**From document**: -```toml -[routing.rules.label] -label = "architecture" -target = "~/.beads-work/architecture" -``` - -**Concern**: VC uses labels for semantic meaning, not routing: -- `discovered:blocker` - auto-generated blocker issues -- `discovered:related` - auto-generated related work -- `no-auto-claim` - prevent executor from claiming -- `baseline-failure` - self-healing baseline failures - -**Problem**: If Beads uses labels for routing, this conflicts with VC's semantic labels. - -**Recommendation**: Use separate mechanism for routing: -```toml -[routing.rules] - # Option 1: Use tags instead of labels - [[routing.rules.tag]] - tag = "architecture" - target = "~/.beads-work/architecture" - - # Option 2: Use issue type - [[routing.rules.type]] - type = "design" - target = "~/.beads-work/architecture" - - # Option 3: Use explicit category/phase field - [[routing.rules.phase]] - phase = "architecture" - target = "~/.beads-work/architecture" -``` - -**Rationale**: Don't overload labels - they're already a general-purpose tagging mechanism. - ---- - -### F3. Proposal Workflow - Dependency Handling ⚠️ MEDIUM - -**Question**: What happens to dependencies when an issue moves repos? - -**Scenario**: -``` -planning repo: - vc-100 "Explore feature" - vc-101 "Document findings" (depends on vc-100) - -Proposal workflow: - bd propose vc-100 # ← Move to canonical - -Result: - canonical repo: - vc-100 "Explore feature" - - planning repo: - vc-101 "Document findings" (depends on vc-100) ← Cross-repo dep now! -``` - -**Recommendation**: Document this behavior clearly: -- Dependencies survive across repos (stored by ID) -- `bd ready` checks cross-repo dependencies -- Provide command: `bd dep tree --all-repos` to visualize -- Consider warning when `bd propose` creates cross-repo deps - ---- - -### F4. 
Discovered Issues Routing ⚠️ MEDIUM - -**Context**: VC's analysis phase auto-creates issues with labels: -- `discovered:blocker` -- `discovered:related` - -**Question**: Which repo do discovered issues go to? - -**Options**: -1. **Same repo as parent issue** ← **VC PREFERS THIS** -2. **Always canonical** -3. **Configurable routing** - -**Rationale for Option 1**: -- Discovered issues are part of work breakdown -- Should stay with parent issue -- Avoids fragmenting related work across repos - -**Example**: -``` -planning repo: - vc-100 "Explore feature" (status: in_progress) - -Analysis phase discovers: - vc-101 "Fix edge case" (discovered:blocker, parent: vc-100) - -Expected: vc-101 goes to planning repo (same as vc-100) -``` - ---- - -### F5. Self-Healing Across Repos ⚠️ LOW - -**Context**: VC has special behavior for `baseline-failure` label: -- Enters "degraded mode" -- Only works on baseline-failure issues until fixed - -**Question**: How does this interact with multi-repo? - -**Scenario**: -``` -canonical repo: - vc-300 (baseline-failure) - tests failing - -planning repo: - vc-301 (baseline-failure) - build failing - -Expected: Executor sees both, enters degraded mode, works on either -``` - -**Recommendation**: Degraded mode should check ALL repos for baseline-failure labels. - ---- - -## 5. 
Test Scenarios VC Needs to Work - -### Scenario 1: Cross-Repo Blocker-First Prioritization - -``` -canonical repo: - vc-100 (open, P0, no labels) - regular work - -planning repo: - vc-101 (open, P3, discovered:blocker) - blocker work - -Expected: GetReadyWork() returns [vc-101, vc-100] - (blocker-first, even though vc-101 is P3 in planning repo) -``` - -### Scenario 2: Cross-Repo Dependencies - -``` -canonical repo: - vc-200 (open, P0) - -planning repo: - vc-201 (open, P0) depends on vc-200 - -Expected: GetReadyWork() returns [vc-200] - (vc-201 is blocked by vc-200) -``` - -### Scenario 3: Atomic Claiming - -``` -planning repo: - vc-300 (open, P0) - -Executor A: Claims vc-300 -Executor B: Tries to claim vc-300 concurrently - -Expected: Only one executor succeeds (ACID guarantee) - Write routes back to planning repo's JSONL -``` - -### Scenario 4: No-Auto-Claim Across Repos - -``` -canonical repo: - vc-400 (open, P0, no-auto-claim) - -planning repo: - vc-401 (open, P0, no-auto-claim) - -Expected: GetReadyWork() excludes both - (no-auto-claim works regardless of repo or visibility) -``` - -### Scenario 5: Baseline Failure Degraded Mode - -``` -canonical repo: - vc-500 (open, P0, baseline-failure) - vc-501 (open, P0) - regular work - -planning repo: - vc-502 (open, P0) - regular work - -Expected: Executor enters degraded mode - Only works on vc-500 (ignores vc-501 and vc-502) -``` - ---- - -## 6. Documentation Requests - -### For Library Consumers (VC's Needs) - -1. **Migration guide**: How to adopt multi-repo for existing single-repo projects -2. **API stability guarantees**: What will/won't break in future versions -3. **Cross-repo dependency semantics**: Detailed behavior and examples -4. **Performance characteristics**: Hydration cost, caching strategy, optimization tips -5. **Schema changes**: Backward compatibility for visibility field - -### For Multi-Repo Users - -6. **Cross-repo workflow examples**: Contributor, multi-phase, multi-persona scenarios -7. 
**Proposal workflow**: What happens to dependencies, labels, metadata when proposing -8. **Troubleshooting**: Common issues (namespace collisions, sync conflicts, performance) -9. **Best practices**: When to use multi-repo vs single-repo, repo organization patterns - ---- - -## 7. Open Questions for Beads Team - -### Priority 1 - CRITICAL: -1. Is this a breaking change to storage library API? -2. How does cross-repo dependency resolution work at library level? -3. What's the hydration performance model for frequent queries? -4. Are atomic operations preserved across multi-repo? - -### Priority 2 - IMPORTANT: -5. Which namespace collision strategy will you choose? (VC votes Option B) -6. How will routing interact with semantic labels? -7. What's the migration path for library consumers? - -### Priority 3 - NICE TO HAVE: -8. How will discovered issues routing work? -9. How will special labels (baseline-failure, no-auto-claim) work across repos? -10. Will there be performance monitoring/profiling tools for multi-repo setups? - ---- - -## 8. VC's Roadmap for Multi-Repo Adoption - -### Phase 1: Bootstrap (Current) -- ✅ Stick with single repo (`.beads/vc.db`, `.beads/issues.jsonl`) -- ✅ Monitor Beads releases for API changes -- ✅ No code changes needed unless API breaks - -### Phase 2: Post-Bootstrap Testing -- 📋 Evaluate multi-repo for isolated executor testing -- 📋 Test cross-repo scenarios (dependencies, claiming, performance) -- 📋 Validate blocker-first prioritization across repos - -### Phase 3: Self-Hosting with Contributors -- 📋 Adopt multi-repo for contributor workflow -- 📋 Contributors use `~/.beads-planning/` -- 📋 Canonical issues stay in `.beads/issues.jsonl` -- 📋 Executor handles both transparently - ---- - -## 9. Summary & Recommendations - -### For Beads Team: - -**High Priority**: -1. ✅ **Solution #4 (Separate Repos) is correct** - VCS-agnostic, clean isolation -2. ⚠️ **Library API must remain stable** - Transparent hydration for existing consumers -3. 
⚠️ **Cross-repo dependencies are critical** - Must work transparently in GetReadyWork() -4. ⚠️ **Performance matters** - Smart caching needed for polling loops -5. ✅ **Choose Option B for namespaces** - Global uniqueness (timestamp + random) - -**Medium Priority**: -6. ⚠️ **Don't overload labels for routing** - Use separate mechanism (tags/types/phases) -7. ⚠️ **Document cross-repo dependency behavior** - Especially in proposal workflow -8. ⚠️ **Provide migration guide** - For library consumers adopting multi-repo - -**Design is fundamentally sound**. VC can adopt post-bootstrap with minimal changes IF library API remains stable. - -### For VC Team: - -**Short-term**: No action needed. Continue single-repo development. - -**Medium-term**: Create tracking issues: -- Monitor Beads multi-repo feature development -- Evaluate adoption post-bootstrap -- Test cross-repo scenarios with executor - -**Long-term**: Adopt for contributor workflow when self-hosting. - ---- - -## 10. Contact & Follow-Up - -**VC Project**: https://github.com/steveyegge/vc -**Current Beads Version**: v0.17.7 -**VC's Bootstrap Status**: Phase 1 (building core executor) - -**Questions for Beads team?** Feel free to ping VC maintainer or open an issue on VC repo for clarification. - -**Test scenarios needed?** VC can provide more detailed test cases for cross-repo scenarios. - ---- - -**Thank you for the thorough design doc!** This is exactly the kind of forward-thinking design discussion that helps downstream consumers prepare for changes. 🙏