beads/DAEMON_DESIGN.md
Steve Yegge 872f203c57 Add RPC infrastructure and updated database
- RPC Phase 1: Protocol, server, client implementation
- Updated renumber.go with proper text reference updates (3-phase approach)
- Clean database exported: 344 issues (bd-1 to bd-344)
- Added DAEMON_DESIGN.md documentation
- Updated go.mod/go.sum for RPC dependencies

Amp-Thread-ID: https://ampcode.com/threads/T-456af77c-8b7f-4004-9027-c37b95e10ea5
Co-authored-by: Amp <amp@ampcode.com>
2025-10-16 20:36:23 -07:00

# BD Daemon Architecture for Concurrent Access
## Problem Statement
Multiple AI agents running concurrently (via beads-mcp) cause:
- **SQLite write corruption**: the ID counter gets stuck and inserts fail with UNIQUE constraint errors
- **Git index.lock contention**: All agents auto-export → all try to commit simultaneously
- **Data loss risk**: Concurrent SQLite writers without coordination
- **Poor performance**: redundant exports, with 4 agents repeating the same git operations for the same changes
## Current Architecture (Broken)
```
Agent 1 → beads-mcp 1 → bd CLI → SQLite DB (direct write)
Agent 2 → beads-mcp 2 → bd CLI → SQLite DB (direct write) ← RACE CONDITIONS
Agent 3 → beads-mcp 3 → bd CLI → SQLite DB (direct write)
Agent 4 → beads-mcp 4 → bd CLI → SQLite DB (direct write)
4x concurrent git export/commit
```
## Proposed Architecture (Daemon-Mediated)
```
Agent 1 → beads-mcp 1 → bd client ──┐
Agent 2 → beads-mcp 2 → bd client ──┼──> bd daemon → SQLite DB
Agent 3 → beads-mcp 3 → bd client ──┤    (single writer) ↓
Agent 4 → beads-mcp 4 → bd client ──┘                 git export
                                                      (batched,
                                                      serialized)
```
### Key Changes
1. **bd daemon becomes mandatory** for multi-agent scenarios
2. **All bd commands become RPC clients** when daemon is running
3. **Daemon owns SQLite** - single writer, no races
4. **Daemon batches git operations** - one export cycle per interval
5. **Unix socket IPC** - simple, fast, local-only
## Implementation Plan
### Phase 1: RPC Infrastructure
**New files:**
- `internal/rpc/protocol.go` - Request/response types
- `internal/rpc/server.go` - Unix socket server in daemon
- `internal/rpc/client.go` - Client detection & dispatch
**Operations to support:**
```go
type Request struct {
	Operation string          // "create", "update", "list", "close", etc.
	Args      json.RawMessage // Operation-specific args
}

type Response struct {
	Success bool
	Data    json.RawMessage // Operation result
	Error   string
}
```
**Socket location:** `~/.beads/bd.sock` (global) or `.beads/bd.sock` (per-repo)
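
To make the envelope concrete, here is a minimal sketch of how a request might travel over the socket. It assumes newline-delimited JSON framing and lowercase JSON field names, neither of which this design specifies; `CreateArgs`, `WriteRequest`, `ReadRequest`, and `RoundTrip` are hypothetical names, and an in-memory buffer stands in for the Unix socket.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// Request and Response mirror the types above. The JSON tags are an
// assumption; the design does not specify field names on the wire.
type Request struct {
	Operation string          `json:"operation"`
	Args      json.RawMessage `json:"args"`
}

type Response struct {
	Success bool            `json:"success"`
	Data    json.RawMessage `json:"data,omitempty"`
	Error   string          `json:"error,omitempty"`
}

// CreateArgs is a hypothetical payload for the "create" operation.
type CreateArgs struct {
	Title string `json:"title"`
}

// WriteRequest wraps args in a Request and writes it as one line of JSON.
func WriteRequest(w io.Writer, op string, args interface{}) error {
	raw, err := json.Marshal(args)
	if err != nil {
		return err
	}
	return json.NewEncoder(w).Encode(Request{Operation: op, Args: raw})
}

// ReadRequest decodes the next Request from r.
func ReadRequest(r io.Reader) (Request, error) {
	var req Request
	err := json.NewDecoder(r).Decode(&req)
	return req, err
}

// RoundTrip sends a request through an in-memory buffer (standing in for
// the socket) and returns what the server side would see.
func RoundTrip(op, title string) (string, string, error) {
	var wire bytes.Buffer
	if err := WriteRequest(&wire, op, CreateArgs{Title: title}); err != nil {
		return "", "", err
	}
	req, err := ReadRequest(&wire)
	if err != nil {
		return "", "", err
	}
	var args CreateArgs
	if err := json.Unmarshal(req.Args, &args); err != nil {
		return "", "", err
	}
	return req.Operation, args.Title, nil
}

func main() {
	op, title, err := RoundTrip("create", "Fix index.lock contention")
	if err != nil {
		panic(err)
	}
	fmt.Println(op, title)
}
```

Keeping `Args` as `json.RawMessage` lets the server pick the concrete argument type only after it has read `Operation`.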
### Phase 2: Client Auto-Detection
**bd command behavior:**
1. Check whether the daemon socket exists and is responsive
2. If yes: send an RPC request and print the response
3. If no: run the command directly (backward compatible)
**Example:**
```go
func main() {
	if client := rpc.TryConnect(); client != nil {
		// Daemon is running - use RPC
		resp := client.Execute(cmd, args)
		fmt.Println(resp)
		return
	}

	// No daemon - run directly (current behavior)
	executeLocally(cmd, args)
}
```
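
One possible shape for the detection step is sketched below. It is an assumption, not the planned `rpc.TryConnect`: this version takes an explicit socket path and timeout, and treats "responsive" as "something accepts the connection". A real check would probably also send a ping request. `Client` and `Mode` are hypothetical names.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"path/filepath"
	"time"
)

// Client is a hypothetical wrapper around a daemon connection.
type Client struct {
	conn net.Conn
}

// Close releases the underlying socket connection.
func (c *Client) Close() error { return c.conn.Close() }

// TryConnect returns a client if something accepts connections on
// socketPath within timeout, and nil otherwise.
func TryConnect(socketPath string, timeout time.Duration) *Client {
	conn, err := net.DialTimeout("unix", socketPath, timeout)
	if err != nil {
		// Missing socket, stale socket file, or unresponsive daemon.
		return nil
	}
	return &Client{conn: conn}
}

// Mode reports which path a bd command would take for this socket.
func Mode(socketPath string) string {
	if c := TryConnect(socketPath, 200*time.Millisecond); c != nil {
		defer c.Close()
		return "rpc"
	}
	return "direct"
}

func main() {
	dir, err := os.MkdirTemp("", "bd-sketch")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	sock := filepath.Join(dir, "bd.sock")

	fmt.Println(Mode(sock)) // no daemon is listening yet

	// Stand in for the daemon by listening on the socket.
	ln, err := net.Listen("unix", sock)
	if err != nil {
		panic(err)
	}
	defer ln.Close()
	fmt.Println(Mode(sock)) // a listener now owns the socket
}
```

Dialing instead of only checking that the file exists also covers the case where a crashed daemon left a stale socket file behind.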
### Phase 3: Daemon SQLite Ownership
**Daemon startup:**
1. Open SQLite connection (exclusive)
2. Start RPC server on Unix socket
3. Start git sync loop (existing functionality)
4. Process RPC requests serially
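
Step 4 is what makes the daemon a single writer. A minimal sketch of that idea, assuming one goroutine owns all state and connection handlers hand it work over a channel (`Server`, `Submit`, `Stop`, and `RunAgents` are hypothetical names, and a slice stands in for the SQLite database):

```go
package main

import (
	"fmt"
	"sync"
)

// job is one queued operation plus a channel for its result.
type job struct {
	op    string
	reply chan string
}

// Server funnels every operation through a single goroutine.
type Server struct {
	jobs chan job
	done chan struct{}
	log  []string // touched only by loop, so it needs no lock
}

// NewServer starts the single processing goroutine.
func NewServer() *Server {
	s := &Server{jobs: make(chan job), done: make(chan struct{})}
	go s.loop()
	return s
}

// loop processes jobs one at a time, in arrival order.
func (s *Server) loop() {
	defer close(s.done)
	for j := range s.jobs {
		s.log = append(s.log, j.op)
		j.reply <- fmt.Sprintf("ok #%d: %s", len(s.log), j.op)
	}
}

// Submit may be called from many connection handlers at once.
func (s *Server) Submit(op string) string {
	reply := make(chan string, 1)
	s.jobs <- job{op: op, reply: reply}
	return <-reply
}

// Stop closes the queue, waits for the loop to drain, and returns the
// number of operations processed.
func (s *Server) Stop() int {
	close(s.jobs)
	<-s.done
	return len(s.log)
}

// RunAgents simulates concurrent agents that each submit opsEach operations.
func RunAgents(agents, opsEach int) int {
	s := NewServer()
	var wg sync.WaitGroup
	for a := 1; a <= agents; a++ {
		wg.Add(1)
		go func(agent int) {
			defer wg.Done()
			for i := 0; i < opsEach; i++ {
				s.Submit(fmt.Sprintf("agent %d: create", agent))
			}
		}(a)
	}
	wg.Wait()
	return s.Stop()
}

func main() {
	fmt.Println(RunAgents(4, 25)) // prints 100: no operation is lost
}
```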
**Git operations:**
- Batch exports every 5 seconds (not per-operation)
- Single commit with all changes
- Prevent concurrent git operations entirely
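
The batching rule can be sketched as a counter of pending writes that is flushed on a timer. This is an illustration under assumptions: `Exporter`, `RecordWrite`, `Flush`, and `Run` are hypothetical names, and the `export` callback stands in for the real export and commit step.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// Exporter tracks how many writes have happened since the last export.
type Exporter struct {
	mu      sync.Mutex
	pending int
}

// RecordWrite is called after each successful database write.
func (e *Exporter) RecordWrite() {
	e.mu.Lock()
	e.pending++
	e.mu.Unlock()
}

// Flush runs at most one export covering all pending writes and returns
// how many writes it covered. Holding the lock during export also keeps
// two exports from overlapping.
func (e *Exporter) Flush(export func(changes int) error) (int, error) {
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.pending == 0 {
		return 0, nil // nothing changed: skip the export entirely
	}
	n := e.pending
	if err := export(n); err != nil {
		return 0, err // keep pending so the next tick retries
	}
	e.pending = 0
	return n, nil
}

// Run flushes once per interval until ctx is cancelled, then flushes a
// final time so that shutdown does not drop changes.
func (e *Exporter) Run(ctx context.Context, interval time.Duration, export func(int) error) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			_, _ = e.Flush(export)
			return
		case <-t.C:
			_, _ = e.Flush(export)
		}
	}
}

func main() {
	e := &Exporter{}
	for i := 0; i < 4; i++ {
		e.RecordWrite() // four agents each change something
	}
	export := func(changes int) error {
		fmt.Printf("one export + commit covering %d changes\n", changes)
		return nil
	}
	e.Flush(export) // one export for all four writes
	e.Flush(export) // nothing pending, so no export runs
}
```

In the daemon, `Run` would be started with a 5 second interval. Because `Flush` holds the lock while exporting, writers wait during an export; a production version might swap the counter out first and export outside the lock.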
### Phase 4: Atomic Operations
**ID generation:**
```go
// In daemon process only
func (d *Daemon) generateID(prefix string) (string, error) {
	d.mu.Lock()
	defer d.mu.Unlock()

	// No races - daemon is single writer
	return d.storage.NextID(prefix)
}
```
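
A runnable version of the same idea, useful as a starting point for a concurrency test. The in-memory `memStorage` is a hypothetical stand-in for the SQLite-backed storage, and `NewDaemon` and `UniqueIDs` are illustrative helpers, not existing code.

```go
package main

import (
	"fmt"
	"sync"
)

// memStorage is a hypothetical in-memory stand-in for the real storage
// layer. It is deliberately not safe for concurrent use on its own.
type memStorage struct {
	counters map[string]int
}

// NextID increments the per-prefix counter and formats the next ID.
func (s *memStorage) NextID(prefix string) (string, error) {
	s.counters[prefix]++
	return fmt.Sprintf("%s-%d", prefix, s.counters[prefix]), nil
}

// Daemon serializes access to storage with a mutex.
type Daemon struct {
	mu      sync.Mutex
	storage *memStorage
}

// NewDaemon returns a daemon with empty counters.
func NewDaemon() *Daemon {
	return &Daemon{storage: &memStorage{counters: make(map[string]int)}}
}

// generateID hands out the next ID while holding the daemon lock.
func (d *Daemon) generateID(prefix string) (string, error) {
	d.mu.Lock()
	defer d.mu.Unlock()
	return d.storage.NextID(prefix)
}

// UniqueIDs requests n IDs from n goroutines at once and returns how
// many distinct IDs came back.
func UniqueIDs(d *Daemon, n int) int {
	ids := make(chan string, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if id, err := d.generateID("bd"); err == nil {
				ids <- id
			}
		}()
	}
	wg.Wait()
	close(ids)

	seen := make(map[string]bool)
	for id := range ids {
		seen[id] = true
	}
	return len(seen)
}

func main() {
	fmt.Println(UniqueIDs(NewDaemon(), 100)) // prints 100: no duplicates
}
```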
**Transaction support:**
```go
// RPC can request multi-operation transactions
type BatchRequest struct {
	Operations []Request
	Atomic     bool // All-or-nothing
}
```
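
The all-or-nothing semantics of `Atomic` can be illustrated with an in-memory store. This is only a sketch: the daemon would presumably wrap the batch in a SQLite transaction instead of copying a map, and `Store`, `ApplyBatch`, `op`, and the two operations handled here are hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

type Request struct {
	Operation string
	Args      json.RawMessage
}

type BatchRequest struct {
	Operations []Request
	Atomic     bool
}

// Store is a hypothetical in-memory database: issue ID -> status.
type Store map[string]string

// apply executes one operation; Args is assumed to be a JSON string ID.
func (s Store) apply(req Request) error {
	var id string
	if err := json.Unmarshal(req.Args, &id); err != nil {
		return err
	}
	switch req.Operation {
	case "create":
		if _, exists := s[id]; exists {
			return fmt.Errorf("issue %s already exists", id)
		}
		s[id] = "open"
	case "close":
		if _, exists := s[id]; !exists {
			return fmt.Errorf("issue %s not found", id)
		}
		s[id] = "closed"
	default:
		return errors.New("unknown operation: " + req.Operation)
	}
	return nil
}

// ApplyBatch runs the operations in order and stops at the first error.
// If b.Atomic is set, the store is restored to its state before the batch.
func ApplyBatch(s Store, b BatchRequest) error {
	snapshot := make(Store, len(s))
	for k, v := range s {
		snapshot[k] = v
	}
	for _, req := range b.Operations {
		if err := s.apply(req); err != nil {
			if b.Atomic {
				for k := range s {
					delete(s, k)
				}
				for k, v := range snapshot {
					s[k] = v
				}
			}
			return err
		}
	}
	return nil
}

// op builds a Request whose Args is the JSON-encoded issue ID.
func op(name, id string) Request {
	raw, _ := json.Marshal(id)
	return Request{Operation: name, Args: raw}
}

func main() {
	s := Store{}
	err := ApplyBatch(s, BatchRequest{
		Atomic:     true,
		Operations: []Request{op("create", "bd-1"), op("close", "bd-2")},
	})
	// The second operation fails, so the create is rolled back too.
	fmt.Println(err, len(s))
}
```

With `Atomic` set to false, the same batch would leave `bd-1` created and report only the failing operation.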
## Migration Strategy
### Stage 1: Opt-In (v0.10.0)
- Daemon RPC code implemented
- bd commands detect the daemon and fall back to direct mode when it is not running
- Users can `bd daemon start` for multi-agent scenarios
- **No breaking changes** - direct mode still works
### Stage 2: Recommended (v0.11.0)
- Document that multi-agent workflows require the daemon
- MCP server README says "start daemon for concurrent agents"
- Detection warning: "Multiple bd processes detected, consider using daemon"
### Stage 3: Required for Multi-Agent (v1.0.0)
- bd detects concurrent access patterns
- Refuses to run without daemon if lock contention detected
- Error: "Concurrent access detected. Start daemon: `bd daemon start`"
## Benefits
- **No SQLite corruption** - single writer
- **No git lock contention** - batched, serialized operations
- **Atomic ID generation** - no counter corruption
- **Better performance** - fewer redundant exports
- **Backward compatible** - graceful fallback to direct mode
- **Simple protocol** - Unix sockets, JSON payloads
## Trade-offs
- ⚠️ **Daemon must be running** for multi-agent workflows
- ⚠️ **One more process** to manage (`bd daemon start/stop`)
- ⚠️ **Complexity** - RPC layer adds code & maintenance
- ⚠️ **Single point of failure** - if daemon crashes, all agents blocked
## Open Questions
1. **Per-repo or global daemon?**
   - Per-repo: `.beads/bd.sock` (supports multiple repos)
   - Global: `~/.beads/bd.sock` (simpler, but only one repo at a time)
   - **Recommendation:** Per-repo, use `--db` path to determine socket location
2. **Daemon crash recovery?**
   - Client auto-starts daemon if socket missing?
   - Or require manual `bd daemon start`?
   - **Recommendation:** Auto-start with exponential backoff
3. **Concurrent read optimization?**
   - Reads could bypass daemon (SQLite supports concurrent readers)
   - But complex: need to detect read-only vs read-write commands
   - **Recommendation:** Start simple, all ops through daemon
4. **Transaction API for clients?**
   - MCP tools often do multi-step operations
   - Would benefit from `BEGIN/COMMIT` style transactions
   - **Recommendation:** Phase 4 feature, not MVP
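
For question 2, the recommended auto-start with exponential backoff might look roughly like the sketch below. The base delay, cap, and attempt count are arbitrary assumptions, and `Backoff`, `Schedule`, and `Retry` are hypothetical helpers. In a real client, `try` would dial the socket and launch the daemon on failure.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Backoff returns the delay before retry number attempt (0-based):
// base, 2*base, 4*base, ... capped at max.
func Backoff(attempt int, base, max time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt && d < max; i++ {
		d *= 2
	}
	if d > max {
		d = max
	}
	return d
}

// Schedule lists the first n delays for an assumed 100ms base and 2s cap.
func Schedule(n int) []time.Duration {
	delays := make([]time.Duration, n)
	for i := range delays {
		delays[i] = Backoff(i, 100*time.Millisecond, 2*time.Second)
	}
	return delays
}

// Retry calls try up to attempts times, sleeping with exponential backoff
// between failures. The sleep function is injected so tests can avoid
// real waiting; production code would pass time.Sleep.
func Retry(attempts int, base, max time.Duration, sleep func(time.Duration), try func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = try(); err == nil {
			return nil
		}
		if i < attempts-1 {
			sleep(Backoff(i, base, max))
		}
	}
	return fmt.Errorf("daemon unreachable after %d attempts: %w", attempts, err)
}

func main() {
	tries := 0
	var slept []time.Duration

	// Simulate a daemon that becomes reachable on the third attempt.
	err := Retry(5, 100*time.Millisecond, 2*time.Second,
		func(d time.Duration) { slept = append(slept, d) }, // record instead of sleeping
		func() error {
			tries++
			if tries < 3 {
				return errors.New("socket missing")
			}
			return nil
		})
	fmt.Println(err, tries, slept)
}
```

Capping the delay keeps a client from waiting ever longer when the daemon genuinely cannot start, and the bounded attempt count turns that case into a clear error instead of a hang.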
## Success Metrics
- ✅ 4 concurrent agents can run without errors
- ✅ No UNIQUE constraint failures on ID generation
- ✅ No git index.lock errors
- ✅ SQLite counter stays in sync with actual issues
- ✅ Graceful fallback when daemon not running
## Related Issues
- bd-668: Git index.lock contention (root cause)
- bd-670: ID generation retry on UNIQUE constraint
- bd-654: Concurrent tmp file collisions (already fixed)
- bd-477: Phase 1 daemon command (git sync only - now expanded)
- bd-279: Tests for concurrent scenarios
- bd-271: Epic for multi-device support
## Next Steps
1. **Ultrathink**: Validate this design with user
2. **File epic**: Create bd-??? for daemon RPC architecture
3. **Break down work**: Phase 1 subtasks (protocol, server, client)
4. **Start implementation**: Begin with protocol.go