# Event-Driven Daemon Architecture

**Status:** Design Proposal
**Author:** AI Assistant
**Date:** 2025-10-28
**Context:** Post-cache removal, per-project daemon model established

## Executive Summary

Replace the current 5-second polling sync loop with an event-driven architecture that reacts instantly to changes. This eliminates stale-data issues while reducing CPU usage and improving the user experience.

**Key metrics:**

- Latency improvement: 5000ms → <500ms
- CPU reduction: ~60% (no polling)
- Code complexity: +300 LOC (event handling), but cleaner semantics
- User impact: instant feedback, no stale-cache pain

## Problem Statement

### Current Architecture Issues

**Polling-based sync** (`cmd/bd/daemon.go:1010-1120`):

```go
ticker := time.NewTicker(5 * time.Second)
for {
	select {
	case <-ticker.C:
		doSync() // Export, pull, import, push
	}
}
```

**Pain points:**

1. **Stale data window**: Changes are invisible for up to 5 seconds
2. **CPU waste**: The daemon wakes every 5s even if nothing changed
3. **Unnecessary work**: The sync cycle runs even when no mutations occurred
4. **Cache confusion**: (Now removed) Cache staleness compounded the delay

### What Cache Removal Enables

The recent cache removal (Oct 27-28, 964 LOC removed) creates ideal conditions for an event-driven architecture:

✅ **One daemon = one database**: No cache eviction, no cross-workspace confusion
✅ **Simpler state**: Daemon state is just `s.storage`, no cache maps
✅ **Clear ownership**: Each daemon owns exactly one JSONL + SQLite pair
✅ **No invalidation complexity**: Events can directly trigger actions

## Proposed Architecture

### High-Level Flow

```
┌─────────────────────────────────────────────────────────┐
│                   Event-Driven Daemon                   │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Event Sources              Event Handler               │
│  ┌──────────────┐           ┌──────────────┐            │
│  │ FS Watcher   │──────────→│              │            │
│  │ (JSONL file) │           │  Debouncer   │            │
│  └──────────────┘           │  (500ms)     │            │
│                             └──────┬───────┘            │
│  ┌──────────────┐                  │                    │
│  │ RPC Mutation │─────────────────→│                    │
│  │ Events       │                  │                    │
│  └──────────────┘                  ↓                    │
│                             ┌──────────────┐            │
│  ┌──────────────┐           │ Sync Action  │            │
│  │ Git Hooks    │──────────→│  - Export    │            │
│  │ (optional)   │           │  - Import    │            │
│  └──────────────┘           └──────────────┘            │
│                                                         │
└─────────────────────────────────────────────────────────┘
```
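To make the flow above concrete before the component-by-component breakdown, here is a minimal wiring sketch. It is illustrative only: `NewFileWatcher`, `NewDebouncer`, `autoImportIfNewer`, and `exportToJSONL` are the helpers defined (or proposed) later in this document, and the function name itself is hypothetical.

```go
// Sketch only: file changes feed a debounced import, RPC mutations
// feed a debounced export. All helper names are defined below.
func startEventDriven(ctx context.Context, jsonlPath string,
	mutations <-chan rpc.MutationEvent, log daemonLogger) error {

	// File changes (git pull, manual edits) → debounced import
	watcher, err := NewFileWatcher(jsonlPath, autoImportIfNewer)
	if err != nil {
		return err // caller can fall back to polling
	}
	defer watcher.Close()
	watcher.Start(ctx, log)

	// RPC mutations (bd create/update/...) → debounced export
	exportDebouncer := NewDebouncer(500*time.Millisecond, exportToJSONL)
	defer exportDebouncer.Cancel()

	for {
		select {
		case <-mutations:
			exportDebouncer.Trigger()
		case <-ctx.Done():
			return nil
		}
	}
}
```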
### Components

#### 1. File System Watcher

**Purpose:** Detect JSONL changes from external sources (git pull, manual edits)

**Implementation:**

```go
// cmd/bd/daemon_watcher.go (new file)
package main

import (
	"context"
	"path/filepath"
	"time"

	"github.com/fsnotify/fsnotify"
)

type FileWatcher struct {
	watcher   *fsnotify.Watcher
	debouncer *Debouncer
	jsonlPath string
}

func NewFileWatcher(jsonlPath string, onChanged func()) (*FileWatcher, error) {
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		return nil, err
	}

	fw := &FileWatcher{
		watcher:   watcher,
		jsonlPath: jsonlPath,
		debouncer: NewDebouncer(500*time.Millisecond, onChanged),
	}

	// Watch the JSONL file (the rename edge case below argues for
	// watching its parent directory instead)
	if err := watcher.Add(jsonlPath); err != nil {
		watcher.Close()
		return nil, err
	}

	// Also watch .git/refs/heads for branch changes
	gitRefsPath := filepath.Join(filepath.Dir(jsonlPath), "..", ".git", "refs", "heads")
	_ = watcher.Add(gitRefsPath) // Best effort

	return fw, nil
}

func (fw *FileWatcher) Start(ctx context.Context, log daemonLogger) {
	go func() {
		for {
			select {
			case event, ok := <-fw.watcher.Events:
				if !ok {
					return
				}
				// Only care about writes to the JSONL or ref changes
				if event.Name == fw.jsonlPath && event.Op&fsnotify.Write != 0 {
					log.log("File change detected: %s", event.Name)
					fw.debouncer.Trigger()
				} else if event.Op&fsnotify.Write != 0 {
					log.log("Git ref change detected: %s", event.Name)
					fw.debouncer.Trigger()
				}
			case err, ok := <-fw.watcher.Errors:
				if !ok {
					return
				}
				log.log("Watcher error: %v", err)
			case <-ctx.Done():
				return
			}
		}
	}()
}

func (fw *FileWatcher) Close() error {
	return fw.watcher.Close()
}
```

**Platform support:**

- **Linux**: inotify (built into fsnotify)
- **macOS**: kqueue (fsnotify's macOS backend; it does not use FSEvents)
- **Windows**: ReadDirectoryChangesW (built into fsnotify)

**Edge cases handled:**

- File rename (git atomic write via temp file): Watch the directory, not just the file
- Event storm (rapid git writes): Debouncer batches into a single action
- Watcher failure: Fall back to polling with a warning

#### 2. Debouncer

**Purpose:** Batch rapid events into a single action

**Implementation:**

```go
// cmd/bd/daemon_debouncer.go (new file)
package main

import (
	"sync"
	"time"
)

type Debouncer struct {
	mu       sync.Mutex
	timer    *time.Timer
	duration time.Duration
	action   func()
}

func NewDebouncer(duration time.Duration, action func()) *Debouncer {
	return &Debouncer{
		duration: duration,
		action:   action,
	}
}

func (d *Debouncer) Trigger() {
	d.mu.Lock()
	defer d.mu.Unlock()

	if d.timer != nil {
		d.timer.Stop()
	}
	d.timer = time.AfterFunc(d.duration, func() {
		d.action()
		d.mu.Lock()
		d.timer = nil
		d.mu.Unlock()
	})
}

func (d *Debouncer) Cancel() {
	d.mu.Lock()
	defer d.mu.Unlock()

	if d.timer != nil {
		d.timer.Stop()
		d.timer = nil
	}
}
```

**Tuning:**

- Default: 500ms (balance between responsiveness and batching)
- Configurable via the `BEADS_DEBOUNCE_MS` env var (see the sketch below)
- Could use adaptive timing based on event frequency
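A small sketch of how the configurable duration could be resolved at startup. `BEADS_DEBOUNCE_MS` is the variable proposed by this design; the helper name and fallback behavior are illustrative.

```go
import (
	"os"
	"strconv"
	"time"
)

// debounceDuration reads BEADS_DEBOUNCE_MS, keeping the 500ms
// default on missing or malformed values.
func debounceDuration() time.Duration {
	const fallback = 500 * time.Millisecond
	raw := os.Getenv("BEADS_DEBOUNCE_MS")
	if raw == "" {
		return fallback
	}
	ms, err := strconv.Atoi(raw)
	if err != nil || ms <= 0 {
		return fallback // malformed value: keep the safe default
	}
	return time.Duration(ms) * time.Millisecond
}
```

The daemon would then construct its debouncers with `NewDebouncer(debounceDuration(), action)` instead of hard-coding 500ms.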
#### 3. RPC Mutation Events

**Purpose:** Trigger export immediately after DB changes (not up to 5s later)

**Implementation:**

```go
// internal/rpc/server.go (modifications)

type Server struct {
	// ... existing fields
	mutationChan chan MutationEvent
}

type MutationEvent struct {
	Type      string    // "create", "update", "delete"
	IssueID   string    // e.g., "bd-42"
	Timestamp time.Time
}

func (s *Server) CreateIssue(req *CreateRequest) (*Issue, error) {
	issue, err := s.storage.CreateIssue(req)
	if err != nil {
		return nil, err
	}

	// Notify the mutation channel without blocking the RPC path
	select {
	case s.mutationChan <- MutationEvent{
		Type:      "create",
		IssueID:   issue.ID,
		Timestamp: time.Now(),
	}:
	default:
		// Channel full, event dropped (sync will happen eventually)
	}

	return issue, nil
}

// Similar for UpdateIssue, DeleteIssue, AddComment, etc.
```

**Handler in daemon:**

```go
// cmd/bd/daemon.go (modification)
func handleMutationEvents(ctx context.Context, events <-chan rpc.MutationEvent,
	debouncer *Debouncer, log daemonLogger) {
	go func() {
		for {
			select {
			case event := <-events:
				log.log("Mutation detected: %s %s", event.Type, event.IssueID)
				debouncer.Trigger() // Schedule export
			case <-ctx.Done():
				return
			}
		}
	}()
}
```

#### 4. Git Hook Integration (Optional)

**Purpose:** Explicit notifications from git operations

**Implementation:**

```bash
#!/bin/bash
# .git/hooks/post-merge (installed by bd init --quiet)
# Notify daemon of merge completion
if command -v bd &> /dev/null; then
    bd daemon-event import-needed &
fi
```

```go
// cmd/bd/daemon_event.go (new file)
package main

import (
	"context"
	"fmt"
	"os"
	// rpc client import follows the repo's module path
)

// Called by git hooks to notify the daemon
func handleDaemonEvent() {
	if len(os.Args) < 3 {
		fmt.Fprintln(os.Stderr, "Usage: bd daemon-event <event-type>")
		os.Exit(1)
	}

	eventType := os.Args[2]
	socketPath := getSocketPath()
	client := rpc.NewClient(socketPath)
	ctx := context.Background()

	switch eventType {
	case "import-needed":
		// Git hook says "JSONL changed, please import"
		if err := client.TriggerImport(ctx); err != nil {
			// Ignore error - daemon might not be running
			os.Exit(0)
		}
	case "export-needed":
		if err := client.TriggerExport(ctx); err != nil {
			os.Exit(0)
		}
	default:
		fmt.Fprintf(os.Stderr, "Unknown event type: %s\n", eventType)
		os.Exit(1)
	}
}
```

**Note:** Git hooks are an **optional enhancement**, not required. The file watcher is the primary mechanism.
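The hook path above calls `client.TriggerImport` / `client.TriggerExport`, which do not exist in today's RPC surface. A minimal server-side sketch of what they could look like, assuming the daemon hands its debouncers to the RPC server at startup (the `importDebouncer` / `exportDebouncer` fields are hypothetical):

```go
// internal/rpc/server.go (sketch): TriggerImport/TriggerExport are
// proposed additions, not existing API.
func (s *Server) TriggerImport(ctx context.Context) error {
	// Reuse the same debounced path as the file watcher, so a hook
	// firing alongside an fsnotify event still yields one import.
	s.importDebouncer.Trigger()
	return nil
}

func (s *Server) TriggerExport(ctx context.Context) error {
	s.exportDebouncer.Trigger()
	return nil
}
```

Routing hooks through the same debouncers means duplicate notifications (hook + fsnotify) collapse into a single action.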
### Complete Daemon Loop

**Current implementation** (`cmd/bd/daemon.go:1123-1161`):

```go
func runEventLoop(ctx context.Context, cancel context.CancelFunc,
	ticker *time.Ticker, doSync func(), server *rpc.Server,
	serverErrChan chan error, log daemonLogger) {
	for {
		select {
		case <-ticker.C: // ← Every 5 seconds
			doSync()
		case sig := <-sigChan:
			// ... shutdown
		}
	}
}
```

**Proposed implementation:**

```go
// cmd/bd/daemon_event_loop.go (new file)

func runEventDrivenLoop(ctx context.Context, cancel context.CancelFunc,
	server *rpc.Server, serverErrChan chan error, watcher *FileWatcher,
	mutationChan <-chan rpc.MutationEvent, log daemonLogger) {

	sigChan := make(chan os.Signal, 1)
	signal.Notify(sigChan, daemonSignals...)
	defer signal.Stop(sigChan)

	// Debounced sync actions
	exportDebouncer := NewDebouncer(500*time.Millisecond, func() {
		log.log("Export triggered by mutation events")
		exportToJSONL()
	})
	// importDebouncer backs the watcher's onChanged callback
	// (wired at construction); declared here so shutdown can cancel it
	importDebouncer := NewDebouncer(500*time.Millisecond, func() {
		log.log("Import triggered by file change")
		autoImportIfNewer()
	})

	// Start file watcher (triggers import)
	watcher.Start(ctx, log)

	// Start mutation handler (triggers export)
	handleMutationEvents(ctx, mutationChan, exportDebouncer, log)

	// Optional: periodic health check (every 60s, not a sync)
	healthTicker := time.NewTicker(60 * time.Second)
	defer healthTicker.Stop()

	for {
		select {
		case <-healthTicker.C:
			// Periodic health check (validate DB, check disk space, etc.);
			// store is the daemon's storage handle, omitted from this
			// sketch's signature
			checkDaemonHealth(ctx, store, log)
		case sig := <-sigChan:
			if isReloadSignal(sig) {
				log.log("Received reload signal, ignoring")
				continue
			}
			log.log("Received signal %v, shutting down...", sig)
			cancel()
			if err := server.Stop(); err != nil {
				log.log("Error stopping server: %v", err)
			}
			return
		case <-ctx.Done():
			log.log("Context canceled, shutting down")
			watcher.Close()
			if err := server.Stop(); err != nil {
				log.log("Error stopping server: %v", err)
			}
			return
		case err := <-serverErrChan:
			log.log("RPC server failed: %v", err)
			cancel()
			watcher.Close()
			return
		}
	}
}
```

## Migration Strategy

### Phase 1: Parallel Implementation (2-3 weeks)

**Goal:** Event-driven as opt-in alongside polling

**Changes:**

1. Add the `fsnotify` dependency to `go.mod`
2. Create new files:
   - `cmd/bd/daemon_watcher.go` (~150 LOC)
   - `cmd/bd/daemon_debouncer.go` (~60 LOC)
   - `cmd/bd/daemon_event_loop.go` (~200 LOC)
3. Add the `BEADS_DAEMON_MODE=events` flag to enable it (see the startup sketch at the end of this phase)
4. Keep the existing `runEventLoop` as a fallback

**Testing:**

- Unit tests for the debouncer
- Integration tests for the file watcher
- Stress test with an event storm (rapid git operations)
- Test on Linux, macOS, Windows

**Rollout:**

- Default: `BEADS_DAEMON_MODE=poll` (current behavior)
- Opt-in: `BEADS_DAEMON_MODE=events` (new behavior)
- Documentation: Add to AGENTS.md
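A sketch of the Phase 1 mode switch at daemon startup. The surrounding variables (`ticker`, `doSync`, `mutationChan`, `jsonlPath`) follow the signatures shown above; the wiring itself is illustrative, not the final implementation.

```go
// cmd/bd/daemon.go (sketch): choose the loop based on BEADS_DAEMON_MODE.
switch os.Getenv("BEADS_DAEMON_MODE") {
case "events":
	watcher, err := NewFileWatcher(jsonlPath, autoImportIfNewer)
	if err != nil {
		// Watcher unavailable: warn and fall back to polling
		log.log("WARNING: file watcher unavailable (%v), using polling", err)
		runEventLoop(ctx, cancel, ticker, doSync, server, serverErrChan, log)
		return
	}
	runEventDrivenLoop(ctx, cancel, server, serverErrChan, watcher, mutationChan, log)
default: // "poll" or unset: current behavior during Phase 1
	runEventLoop(ctx, cancel, ticker, doSync, server, serverErrChan, log)
}
```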
### Phase 2: Battle Testing (4-6 weeks)

**Goal:** Real-world validation with dogfooding

**Metrics to track:**

- CPU usage (before/after comparison)
- Latency (time from mutation to JSONL update)
- Memory usage (fsnotify overhead)
- Event storm handling (git pull with 100+ file changes)
- Edge case frequency (watcher failures, debounce races)

**Success criteria:**

- CPU usage <40% of polling mode
- Latency <500ms (vs 5000ms in polling)
- Zero data loss or corruption
- Zero daemon crashes from event handling

**Issue tracking:**

- Create a `bd-XXX: Event-driven daemon stabilization` issue
- Track bugs as sub-issues
- Weekly review of metrics

### Phase 3: Default Switchover (1 week)

**Goal:** Make event-driven the default

**Changes:**

1. Flip the default: `BEADS_DAEMON_MODE=events`
2. Keep polling as a fallback: `BEADS_DAEMON_MODE=poll`
3. Update documentation
4. Add a release note

**Communication:**

- Blog post: "Beads daemon now event-driven"
- Changelog entry with before/after metrics
- Migration guide for users who hit issues

### Phase 4: Deprecation (6+ months later)

**Goal:** Remove polling mode entirely

**Changes:**

1. Remove `runEventLoop` and its ticker
2. Remove the `BEADS_DAEMON_MODE` flag
3. Simplify daemon startup code

**Only if:**

- Event-driven has been stable for 6+ months
- No unresolved critical issues
- Community feedback is positive

## Performance Analysis

### CPU Usage

**Current (polling):**

```
Every 5 seconds:
- Wake daemon
- Check git status
- Check JSONL hash
- Check dirty flags
- Sleep

Estimated: ~5-10% CPU (depends on repo size)
```

**Event-driven:**

```
Daemon sleeps until:
- File system event (rare)
- RPC mutation (user-triggered)
- Signal

Estimated: ~1-2% CPU (mostly idle)
```

**Savings:** ~60-80% CPU reduction

### Latency

**Current (polling):**

```
User runs: bd create "Fix bug"
→ RPC call
→ DB write
→ (wait up to 5s)
→ Export
→ Git commit

Average: 2.5s delay
Worst: 5s delay
```

**Event-driven:**

```
User runs: bd create "Fix bug"
→ RPC call
→ DB write
→ Mutation event
→ (500ms debounce)
→ Export
→ Git commit

Average: 250ms delay
Worst: 500ms delay
```

**Improvement:** 5-10x faster

### Memory Usage

**fsnotify overhead:**

- Linux (inotify): ~1-2 MB per watched directory
- macOS (kqueue): ~500 KB per watched directory
- Windows: ~1 MB per watched directory

With 1 JSONL + 1 git refs directory, that is ~2-4 MB — negligible compared to SQLite's own page cache (10-50 MB for a typical database).

## Edge Cases & Error Handling

### 1. File Watcher Failure

**Scenario:** `inotify` limit exceeded (Linux), a permissions issue, or a filesystem that doesn't support watching

**Detection:**

```go
watcher, err := fsnotify.NewWatcher()
if err != nil {
	log.log("WARNING: File watcher unavailable (%v), falling back to polling", err)
	useFallbackPolling = true
}
```

**Fallback:** Automatic switch to 5s polling with a warning, as sketched below.
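A sketch of that fallback path. `doSync` is the existing sync closure from the polling loop; the interval mirrors `BEADS_POLL_INTERVAL`'s default, and the function name is illustrative.

```go
import (
	"context"
	"time"
)

// runFallbackPolling keeps the daemon functional when fsnotify is
// unavailable (inotify limits, network filesystems, etc.).
func runFallbackPolling(ctx context.Context, doSync func(), log daemonLogger) {
	log.log("WARNING: file watcher unavailable, polling every 5s")
	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()

	for {
		select {
		case <-ticker.C:
			doSync()
		case <-ctx.Done():
			return
		}
	}
}
```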
### 2. Event Storm

**Scenario:** A git pull modifies the JSONL 50 times in rapid succession

**Mitigation:** The debouncer batches everything into a single action after a 500ms quiet period

**Stress test:**

```bash
# Simulate an event storm
for i in {1..100}; do
	echo '{"id":"bd-'$i'"}' >> beads.jsonl
done
# Should trigger exactly 1 import after 500ms of quiet
```

### 3. Watcher Detached from File

**Scenario:** The JSONL is replaced by `git checkout` (different inode)

**Detection:** fsnotify sends a `RENAME` or `REMOVE` event

**Recovery:**

```go
case event.Op&fsnotify.Remove != 0:
	log.log("JSONL removed, re-establishing watch")
	_ = watcher.Remove(jsonlPath) // watch may already be gone
	time.Sleep(100 * time.Millisecond)
	_ = watcher.Add(jsonlPath)
```

### 4. Debounce Race Condition

**Scenario:** Event A triggers the debounce, event B arrives during the wait, and the action fires for A before B is seen

**Solution:** The debouncer restarts its timer on each trigger (standard debounce behavior)

**Test:**

```go
import (
	"sync/atomic"
	"testing"
	"time"

	"github.com/stretchr/testify/assert"
)

func TestDebouncerBatchesMultipleEvents(t *testing.T) {
	var callCount atomic.Int32 // atomic: the action fires on a timer goroutine
	d := NewDebouncer(100*time.Millisecond, func() { callCount.Add(1) })

	d.Trigger()                        // t=0ms
	time.Sleep(50 * time.Millisecond)
	d.Trigger()                        // t=50ms (resets timer)
	time.Sleep(50 * time.Millisecond)
	d.Trigger()                        // t=100ms (resets timer)
	time.Sleep(150 * time.Millisecond) // timer fires at t=200ms

	assert.Equal(t, int32(1), callCount.Load()) // 1 action despite 3 triggers
}
```

### 5. Daemon Restart During Debounce

**Scenario:** The daemon receives SIGTERM while a debouncer is waiting

**Solution:** Cancel the debouncers on shutdown

```go
func (d *Debouncer) Cancel() {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.timer != nil {
		d.timer.Stop()
	}
}

// In the shutdown handler
defer exportDebouncer.Cancel()
defer importDebouncer.Cancel()
```

## Configuration

### Environment Variables

```bash
# Enable event-driven mode (default: events after Phase 3)
BEADS_DAEMON_MODE=events

# Debounce duration in milliseconds (default: 500)
BEADS_DEBOUNCE_MS=500

# Fall back to polling if the watcher fails (default: true)
BEADS_WATCHER_FALLBACK=true

# Polling interval if the fallback is used (default: 5s)
BEADS_POLL_INTERVAL=5s
```

### Daemon Status

**New command:** `bd daemon status --verbose`

```bash
$ bd daemon status --verbose
Daemon running: yes
PID: 12345
Mode: event-driven
Uptime: 3h 42m
Last sync: 2s ago

Event statistics:
  File changes: 23
  Mutations: 156
  Exports: 12 (batched from 156 mutations)
  Imports: 4 (batched from 23 file changes)

Watcher status: active
  Watching: /Users/steve/beads/.beads/beads.jsonl
  Events received: 23
  Errors: 0
```
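The statistics shown above imply counters in the daemon. A minimal in-memory sketch (per open question 3 below, these reset on restart); the type and field names are illustrative:

```go
import "sync/atomic"

// daemonStats backs `bd daemon status --verbose`. In-memory only:
// counters reset when the daemon restarts.
type daemonStats struct {
	fileChanges atomic.Int64 // raw fsnotify events
	mutations   atomic.Int64 // raw RPC mutation events
	exports     atomic.Int64 // debounced export actions
	imports     atomic.Int64 // debounced import actions
}

func (st *daemonStats) onFileChange() { st.fileChanges.Add(1) }
func (st *daemonStats) onMutation()   { st.mutations.Add(1) }
```

Because exports and imports are incremented once per debounced batch rather than per event, the ratio in the sample output (12 exports batched from 156 mutations) falls out naturally.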
## What This Doesn't Solve

Event-driven architecture improves **responsiveness** but doesn't eliminate **repair cycles** caused by:

1. **Git merge conflicts** - Still need manual/AI resolution
2. **Semantic duplication** - Still need deduplication logic
3. **Test pollution** - Still need better isolation (separate issue)
4. **Worktree confusion** - Still need per-worktree branch tracking (separate design)

**These require separate solutions** (see the repair commands design doc).

## Success Metrics

### Must-Have (P0)

- ✅ Zero data loss or corruption
- ✅ Zero regressions in sync reliability
- ✅ Works on Linux, macOS, Windows

### Should-Have (P1)

- ✅ Latency <500ms (vs 5000ms today)
- ✅ CPU usage <40% of polling mode
- ✅ Graceful fallback to polling if the watcher fails

### Nice-to-Have (P2)

- ✅ Configurable debounce timing
- ✅ Detailed event statistics in `bd daemon status`
- ✅ Real-time dashboard of events (debug mode)

## Implementation Checklist

### Code Changes

- [ ] Add `fsnotify` to `go.mod`
- [ ] Create `cmd/bd/daemon_watcher.go`
- [ ] Create `cmd/bd/daemon_debouncer.go`
- [ ] Create `cmd/bd/daemon_event_loop.go`
- [ ] Modify `internal/rpc/server.go` (add mutation channel)
- [ ] Add `BEADS_DAEMON_MODE` flag handling
- [ ] Add fallback to polling on watcher failure

### Tests

- [ ] Unit tests for Debouncer
- [ ] Unit tests for FileWatcher
- [ ] Integration test: mutation → export latency
- [ ] Integration test: file change → import latency
- [ ] Stress test: event storm (100+ rapid changes)
- [ ] Platform tests: Linux, macOS, Windows
- [ ] Edge case test: watcher failure recovery
- [ ] Edge case test: file inode change (git checkout)

### Documentation

- [ ] Update AGENTS.md (event-driven mode)
- [ ] Add `docs/architecture/event_driven.md` (this doc)
- [ ] Update `bd daemon --help` (add --mode flag)
- [ ] Add troubleshooting guide (watcher failures)
- [ ] Write migration guide (for users hitting issues)

### Rollout

- [ ] Phase 1: Parallel implementation (opt-in)
- [ ] Phase 2: Dogfooding (beads repo itself)
- [ ] Phase 3: Default switchover
- [ ] Phase 4: Announce in release notes

## Open Questions

1. **Should git hooks be required or optional?**
   - Recommendation: Optional (the file watcher is sufficient)

2. **What debounce duration is optimal?**
   - Recommendation: 500ms default, configurable
   - Could use adaptive timing based on event frequency

3. **Should we track event statistics permanently?**
   - Recommendation: In-memory only (reset on daemon restart)
   - Could add `bd daemon stats --export` for debugging

4. **What happens if fsnotify doesn't support the filesystem?**
   - Recommendation: Automatic fallback to polling with a warning

5. **Should mutation events be buffered or dropped if the channel is full?**
   - Recommendation: Buffered (size 100), then drop oldest
   - Worst case: Export delayed by 500ms, but no data loss

## Conclusion

Event-driven architecture is a **natural evolution** after cache removal:

- ✅ Eliminates stale data issues
- ✅ Reduces CPU usage significantly
- ✅ Improves user experience with instant feedback
- ✅ Builds on the simplified per-project daemon model

**Recommended:** Proceed with Phase 1 implementation, targeting a 2-3 week timeline for an opt-in release.