BD Daemon Architecture for Concurrent Access
Problem Statement
Multiple AI agents running concurrently (via beads-mcp) cause:
- SQLite write corruption: Counter stuck, UNIQUE constraint failures
- Git index.lock contention: All agents auto-export → all try to commit simultaneously
- Data loss risk: Concurrent SQLite writers without coordination
- Poor performance: Redundant exports, 4x git operations for same changes
Current Architecture (Broken)
Agent 1 → beads-mcp 1 → bd CLI → SQLite DB (direct write)
Agent 2 → beads-mcp 2 → bd CLI → SQLite DB (direct write) ← RACE CONDITIONS
Agent 3 → beads-mcp 3 → bd CLI → SQLite DB (direct write)
Agent 4 → beads-mcp 4 → bd CLI → SQLite DB (direct write)
↓
4x concurrent git export/commit
Proposed Architecture (Daemon-Mediated)
Agent 1 → beads-mcp 1 → bd client ──┐
Agent 2 → beads-mcp 2 → bd client ──┼──> bd daemon → SQLite DB
Agent 3 → beads-mcp 3 → bd client ──┤ (single writer) ↓
Agent 4 → beads-mcp 4 → bd client ──┘ git export
(batched,
serialized)
Key Changes
- bd daemon becomes mandatory for multi-agent scenarios
- All bd commands become RPC clients when daemon is running
- Daemon owns SQLite - single writer, no races
- Daemon batches git operations - one export cycle per interval
- Unix socket IPC - simple, fast, local-only
Implementation Plan
Phase 1: RPC Infrastructure
New files:
- `internal/rpc/protocol.go` - Request/response types
- `internal/rpc/server.go` - Unix socket server in daemon
- `internal/rpc/client.go` - Client detection & dispatch
Operations to support:
```go
type Request struct {
	Operation string          // "create", "update", "list", "close", etc.
	Args      json.RawMessage // Operation-specific args
}

type Response struct {
	Success bool
	Data    json.RawMessage // Operation result
	Error   string
}
```
Socket location: ~/.beads/bd.sock or .beads/bd.sock (per-repo)
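For the per-repo case, the socket path could be derived from the database location. `SocketPath` below is a hypothetical helper, not the real resolution logic:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// SocketPath derives a per-repo socket next to the SQLite database,
// so each repository gets its own daemon. Assumed naming; the actual
// resolution may differ.
func SocketPath(dbPath string) string {
	return filepath.Join(filepath.Dir(dbPath), "bd.sock")
}

func main() {
	fmt.Println(SocketPath("/repo/.beads/beads.db"))
}
```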
Phase 2: Client Auto-Detection
bd command behavior:
- Check if daemon socket exists & responsive
- If yes: Send RPC request, print response
- If no: Run command directly (backward compatible)
Example:
```go
func main() {
	if client := rpc.TryConnect(); client != nil {
		// Daemon is running - use RPC
		resp := client.Execute(cmd, args)
		fmt.Println(resp)
		return
	}
	// No daemon - run directly (current behavior)
	executeLocally(cmd, args)
}
```
Phase 3: Daemon SQLite Ownership
Daemon startup:
- Open SQLite connection (exclusive)
- Start RPC server on Unix socket
- Start git sync loop (existing functionality)
- Process RPC requests serially
Git operations:
- Batch exports every 5 seconds (not per-operation)
- Single commit with all changes
- Prevent concurrent git operations entirely
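The batching behavior above can be sketched as a small coalescing loop. `ExportBatcher` and its method names are invented for illustration; the real daemon would run the git export and commit where the comment indicates:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// ExportBatcher coalesces many write operations into one git export per
// interval, so N writes cost one commit instead of N.
type ExportBatcher struct {
	mu      sync.Mutex
	pending int
	exports int // export cycles actually run
}

// MarkDirty records that a write happened; no git work yet.
func (b *ExportBatcher) MarkDirty() {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.pending++
}

// Flush runs at most one export covering all pending changes.
func (b *ExportBatcher) Flush() {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.pending == 0 {
		return // nothing changed this interval
	}
	// Real code: export issues, single git add + git commit here.
	b.exports++
	b.pending = 0
}

// Run flushes on a fixed interval until stop is closed.
func (b *ExportBatcher) Run(interval time.Duration, stop <-chan struct{}) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			b.Flush()
		case <-stop:
			b.Flush() // final flush on shutdown
			return
		}
	}
}

func main() {
	b := &ExportBatcher{}
	for i := 0; i < 100; i++ {
		b.MarkDirty() // 100 writes...
	}
	b.Flush()
	fmt.Println(b.exports) // ...one export
}
```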
Phase 4: Atomic Operations
ID generation:
```go
// In daemon process only
func (d *Daemon) generateID(prefix string) (string, error) {
	d.mu.Lock()
	defer d.mu.Unlock()
	// No races - daemon is single writer
	return d.storage.NextID(prefix)
}
```
Transaction support:
// RPC can request multi-operation transactions
type BatchRequest struct {
Operations []Request
Atomic bool // All-or-nothing
}
Migration Strategy
Stage 1: Opt-In (v0.10.0)
- Daemon RPC code implemented
- bd commands detect daemon, fall back to direct
- Users can run `bd daemon start` for multi-agent scenarios
- No breaking changes - direct mode still works
Stage 2: Recommended (v0.11.0)
- Document multi-agent workflow requires daemon
- MCP server README says "start daemon for concurrent agents"
- Detection warning: "Multiple bd processes detected, consider using daemon"
Stage 3: Required for Multi-Agent (v1.0.0)
- bd detects concurrent access patterns
- Refuses to run without daemon if lock contention detected
- Error: "Concurrent access detected. Start daemon: `bd daemon start`"
Benefits
- ✅ No SQLite corruption - single writer
- ✅ No git lock contention - batched, serialized operations
- ✅ Atomic ID generation - no counter corruption
- ✅ Better performance - fewer redundant exports
- ✅ Backward compatible - graceful fallback to direct mode
- ✅ Simple protocol - Unix sockets, JSON payloads
Trade-offs
⚠️ Daemon must be running for multi-agent workflows
⚠️ One more process to manage (bd daemon start/stop)
⚠️ Complexity - RPC layer adds code & maintenance
⚠️ Single point of failure - if daemon crashes, all agents blocked
Open Questions
- Per-repo or global daemon?
  - Per-repo: `.beads/bd.sock` (supports multiple repos)
  - Global: `~/.beads/bd.sock` (simpler, but only one repo at a time)
  - Recommendation: Per-repo, use the `--db` path to determine socket location
- Daemon crash recovery?
  - Client auto-starts daemon if socket missing?
  - Or require manual `bd daemon start`?
  - Recommendation: Auto-start with exponential backoff
- Concurrent read optimization?
  - Reads could bypass the daemon (SQLite supports concurrent readers)
  - But complex: need to detect read-only vs read-write commands
  - Recommendation: Start simple, all ops through daemon
- Transaction API for clients?
  - MCP tools often do multi-step operations
  - Would benefit from `BEGIN`/`COMMIT`-style transactions
  - Recommendation: Phase 4 feature, not MVP
Success Metrics
- ✅ 4 concurrent agents can run without errors
- ✅ No UNIQUE constraint failures on ID generation
- ✅ No git index.lock errors
- ✅ SQLite counter stays in sync with actual issues
- ✅ Graceful fallback when daemon not running
Related Issues
- bd-668: Git index.lock contention (root cause)
- bd-670: ID generation retry on UNIQUE constraint
- bd-654: Concurrent tmp file collisions (already fixed)
- bd-477: Phase 1 daemon command (git sync only - now expanded)
- bd-279: Tests for concurrent scenarios
- bd-271: Epic for multi-device support
Next Steps
- Ultrathink: Validate this design with user
- File epic: Create bd-??? for daemon RPC architecture
- Break down work: Phase 1 subtasks (protocol, server, client)
- Start implementation: Begin with protocol.go
Phase 4: Atomic Operations and Stress Testing (COMPLETED - bd-114)
Status: ✅ Complete
Implementation:
- Batch/transaction API for multi-step operations
- Request timeout and cancellation support
- Connection management optimization
- Comprehensive stress tests (4-10 concurrent agents)
- Performance benchmarks vs direct mode
Results:
- Daemon mode is 2x faster than direct mode
- Zero ID collisions in 1000+ concurrent creates
- All acceptance criteria validated
- Full test coverage with stress tests
Documentation: See DAEMON_STRESS_TEST.md for details.
Files Added:
- `internal/rpc/stress_test.go` - Stress tests with 4-10 agents
- `internal/rpc/bench_test.go` - Performance benchmarks
- `DAEMON_STRESS_TEST.md` - Full documentation
Files Modified:
- `internal/rpc/protocol.go` - Added OpBatch and batch types
- `internal/rpc/server.go` - Implemented batch handler
- `internal/rpc/client.go` - Added timeout support and Batch method