Steve Yegge
2025-11-24 01:25:22 -08:00
10 changed files with 524 additions and 79 deletions

File diff suppressed because one or more lines are too long


@@ -114,6 +114,10 @@ func outputPrimeContext(mcpMode bool) error {
func outputMCPContext() error {
context := `# Beads Issue Tracker Active
# 🚨 SESSION CLOSE PROTOCOL 🚨
Before saying "done": git status → git add → bd sync → git commit → bd sync → git push
## Core Rules
- Track ALL work in beads (no TodoWrite tool, no markdown TODOs)
- Use bd MCP tools (mcp__plugin_beads_beads__*), not TodoWrite or markdown
@@ -131,6 +135,21 @@ func outputCLIContext() error {
> **Context Recovery**: Run ` + "`bd prime`" + ` after compaction, clear, or new session
> Hooks auto-call this in Claude Code when .beads/ detected
# 🚨 SESSION CLOSE PROTOCOL 🚨
**CRITICAL**: Before saying "done" or "complete", you MUST run this checklist:
` + "```" + `
[ ] 1. git status (check what changed)
[ ] 2. git add <files> (stage code changes)
[ ] 3. bd sync (commit beads changes)
[ ] 4. git commit -m "..." (commit code)
[ ] 5. bd sync (commit any new beads changes)
[ ] 6. git push (push to remote)
` + "```" + `
**NEVER skip this.** Work is not done until pushed.
## Core Rules
- Track ALL work in beads (no TodoWrite tool, no markdown TODOs)
- Use ` + "`bd create`" + ` to create issues, not TodoWrite tool


@@ -58,9 +58,16 @@ This is more explicit than 'bd update --status open' and emits a Reopened event.
fmt.Fprintf(os.Stderr, "Error reopening %s: %v\n", id, err)
continue
}
// TODO(bd-r46): Add reason as a comment once RPC supports AddComment
// Add reason as a comment if provided
if reason != "" {
fmt.Fprintf(os.Stderr, "Warning: reason not supported in daemon mode yet\n")
commentArgs := &rpc.CommentAddArgs{
ID: id,
Author: actor,
Text: reason,
}
if _, err := daemonClient.AddComment(commentArgs); err != nil {
fmt.Fprintf(os.Stderr, "Warning: failed to add comment to %s: %v\n", id, err)
}
}
if jsonOutput {
var issue types.Issue


@@ -192,6 +192,150 @@ Hash-based comparison (not mtime) prevents git pull false positives (issue bd-84
- **Memory overhead:** One goroutine + minimal channel buffers per command execution
- **Flush latency:** Debounce duration + JSONL write time (typically <100ms for incremental)
## Blocked Issues Cache (bd-5qim)
### Problem Statement
The `bd ready` command originally computed blocked issues using a recursive CTE on every query. On a 10K issue database, each query took ~752ms, making the command feel sluggish and impractical for large projects.
### Solution: Materialized Cache Table
The `blocked_issues_cache` table materializes the blocking computation, storing issue IDs for all currently blocked issues. Queries now use a simple `NOT EXISTS` check against this cache, completing in ~29ms (25x speedup).
### Architecture
```
┌─────────────────────────────────────────────────────────┐
│ GetReadyWork Query │
│ │
│ SELECT ... FROM issues WHERE status IN (...) │
│ AND NOT EXISTS ( │
│ SELECT 1 FROM blocked_issues_cache │
│ WHERE issue_id = issues.id │
│ ) │
│ │
│ Performance: 29ms (was 752ms with recursive CTE) │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ Cache Invalidation Triggers │
│ │
│ 1. AddDependency (blocks/parent-child only) │
│ 2. RemoveDependency (blocks/parent-child only) │
│ 3. UpdateIssue (on any status change) │
│ 4. CloseIssue (changes status to closed) │
│ │
│ NOT triggered by: related, discovered-from deps │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ Cache Rebuild Process │
│ │
│ 1. DELETE FROM blocked_issues_cache │
│ 2. INSERT INTO blocked_issues_cache │
│ WITH RECURSIVE CTE: │
│ - Find directly blocked issues (blocks deps) │
│ - Propagate to children (parent-child deps) │
│ 3. Happens in same transaction as triggering change │
│ │
│ Performance: <50ms full rebuild on 10K database │
└─────────────────────────────────────────────────────────┘
```
### Blocking Semantics
An issue is blocked if:
1. **Direct blocking**: Has a `blocks` dependency on an open/in_progress/blocked issue
2. **Transitive blocking**: Parent is blocked and issue is connected via `parent-child` dependency
Closed issues never block others. Related and discovered-from dependencies don't affect blocking.
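
The two rules above can be sketched in memory as follows. This is a minimal illustration, not the SQL the cache actually runs: the `Issue` and `Dep` types, their field names, the direction of the dependency fields, and the `blockedSet` helper are all assumptions made for the example.

```go
package main

import "fmt"

// Issue and Dep are simplified, hypothetical stand-ins for the real types.
type Issue struct {
	ID     string
	Status string // "open", "in_progress", "blocked", or "closed"
}

type Dep struct {
	IssueID     string // the dependent issue (or the child); assumed direction
	DependsOnID string // the blocker (or the parent); assumed direction
	Type        string // "blocks", "parent-child", "related", "discovered-from"
}

// maxPasses mirrors the documented depth limit of 50.
const maxPasses = 50

// blockedSet returns the IDs of all blocked issues under the rules above.
func blockedSet(issues []Issue, deps []Dep) map[string]bool {
	status := make(map[string]string, len(issues))
	for _, is := range issues {
		status[is.ID] = is.Status
	}
	// Only blockers in one of these states count; closed issues never block.
	active := map[string]bool{"open": true, "in_progress": true, "blocked": true}

	blocked := make(map[string]bool)

	// Rule 1: direct blocking via "blocks" dependencies.
	for _, d := range deps {
		if d.Type == "blocks" && active[status[d.DependsOnID]] {
			blocked[d.IssueID] = true
		}
	}

	// Rule 2: propagate from blocked parents to children. Each pass moves
	// the blocked state down at least one level; stop when nothing changes
	// or after maxPasses passes.
	for pass := 0; pass < maxPasses; pass++ {
		changed := false
		for _, d := range deps {
			if d.Type == "parent-child" && blocked[d.DependsOnID] && !blocked[d.IssueID] {
				blocked[d.IssueID] = true
				changed = true
			}
		}
		if !changed {
			break
		}
	}
	return blocked
}

func main() {
	issues := []Issue{{"A", "open"}, {"B", "open"}, {"C", "open"}, {"D", "open"}}
	deps := []Dep{
		{"B", "A", "blocks"},       // B is blocked by open issue A
		{"C", "B", "parent-child"}, // C is a child of blocked B
		{"D", "A", "related"},      // related deps do not block
	}
	blocked := blockedSet(issues, deps)
	fmt.Println(blocked["B"], blocked["C"], blocked["D"])
}
```

Related and discovered-from dependencies are ignored by both loops, which matches the statement that they do not affect blocking.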
### Cache Invalidation Strategy
**Full rebuild on every change**
Instead of incremental updates, the cache is completely rebuilt (DELETE + INSERT) on any triggering change. This approach is chosen because:
- Rebuild is fast (<50ms even on 10K issues) due to optimized CTE
- Simpler implementation with no risk of partial/stale updates
- Dependency changes are rare compared to reads
- Guarantees consistency - cache matches database state exactly
**Transaction safety**
All cache operations happen within the same transaction as the triggering change:
- Uses the caller's transaction if one is provided; otherwise runs on the direct db connection
- Cache can never be in an inconsistent state visible to queries
- Foreign key CASCADE ensures cache entries deleted when issues are deleted
**Selective invalidation**
Only `blocks` and `parent-child` dependencies trigger rebuilds since they affect blocking semantics. Related and discovered-from dependencies don't trigger invalidation, avoiding unnecessary work.
### Performance Characteristics
**Query performance (GetReadyWork):**
- Before cache: ~752ms (recursive CTE)
- With cache: ~29ms (NOT EXISTS)
- Speedup: 25x
**Write overhead:**
- Cache rebuild: <50ms
- Only triggered on dependency/status changes (rare operations)
- Trade-off: slower writes for much faster reads
### Edge Cases
1. **Parent-child transitive blocking**
- Children of blocked parents are automatically marked as blocked
- Propagates through nested hierarchies up to a safety limit of 50 levels
2. **Multiple blockers**
- Issue blocked by multiple open issues stays blocked until all are closed
- DISTINCT in CTE ensures issue appears once in cache
3. **Status changes**
- Closing a blocker removes all blocked descendants from cache
- Reopening a blocker adds them back
4. **Dependency removal**
- Removing last blocker unblocks the issue
- Removing parent-child link unblocks orphaned subtree
5. **Foreign key cascades**
- Cache entries automatically deleted when issue is deleted
- No manual cleanup needed
### Testing
Comprehensive test coverage in `blocked_cache_test.go`:
- Cache invalidation on dependency add/remove
- Cache updates on status changes
- Multiple blockers
- Deep hierarchies
- Transitive blocking via parent-child
- Related dependencies (should NOT affect cache)
Run tests: `go test -v ./internal/storage/sqlite -run TestCache`
### Implementation Files
- `internal/storage/sqlite/blocked_cache.go` - Cache rebuild and invalidation
- `internal/storage/sqlite/ready.go` - Uses cache in GetReadyWork queries
- `internal/storage/sqlite/dependencies.go` - Invalidates on dep changes
- `internal/storage/sqlite/queries.go` - Invalidates on status changes
- `internal/storage/sqlite/migrations/015_blocked_issues_cache.go` - Schema and initial population
### Future Optimizations
If rebuild becomes a bottleneck in very large databases (>100K issues):
- Consider incremental updates for specific dependency types
- Add indexes to dependencies table for CTE performance
- Implement dirty tracking to avoid rebuilds when cache is unchanged
However, current performance is excellent for realistic workloads.
## Future Improvements
Potential enhancements for multi-agent scenarios:


@@ -144,7 +144,8 @@ func (s *SQLiteStorage) CreateIssuesWithOptions(ctx context.Context, issues []*t
}
defer func() { _ = conn.Close() }()
if _, err := conn.ExecContext(ctx, "BEGIN IMMEDIATE"); err != nil {
// Use retry logic with exponential backoff to handle SQLITE_BUSY under concurrent load (bd-ola6)
if err := beginImmediateWithRetry(ctx, conn, 5, 10*time.Millisecond); err != nil {
return fmt.Errorf("failed to begin immediate transaction: %w", err)
}


@@ -1,3 +1,89 @@
// Package sqlite provides the blocked_issues_cache optimization for GetReadyWork performance.
//
// # Performance Impact
//
// GetReadyWork originally used a recursive CTE to compute blocked issues on every query,
// taking ~752ms on a 10K issue database. With the cache, queries complete in ~29ms
// (25x speedup) by using a simple NOT EXISTS check against the materialized cache table.
//
// # Cache Architecture
//
// The blocked_issues_cache table stores issue_id values for all issues that are currently
// blocked. An issue is blocked if:
// - It has a 'blocks' dependency on an open/in_progress/blocked issue (direct blocking)
// - Its parent is blocked and it's connected via 'parent-child' dependency (transitive blocking)
//
// The cache is maintained automatically by invalidating and rebuilding whenever:
// - A 'blocks' or 'parent-child' dependency is added or removed
// - Any issue's status changes (affects whether it blocks others)
// - An issue is closed (closed issues don't block others)
//
// Related and discovered-from dependencies do NOT trigger cache invalidation since they
// don't affect blocking semantics.
//
// # Cache Invalidation Strategy
//
// On any triggering change, the entire cache is rebuilt from scratch (DELETE + INSERT).
// This full-rebuild approach is chosen because:
// - Rebuild is fast (<50ms even on 10K databases) due to optimized CTE logic
// - Simpler implementation than incremental updates
// - Dependency changes are rare compared to reads
// - Guarantees consistency - no risk of partial/stale updates
//
// The rebuild happens within the same transaction as the triggering change, ensuring
// atomicity and consistency. The cache can never be in an inconsistent state visible
// to queries.
//
// # Transaction Safety
//
// All cache operations support both transaction and direct database execution:
// - rebuildBlockedCache accepts optional *sql.Tx parameter
// - If tx != nil, uses transaction; otherwise uses direct db connection
// - Cache invalidation during CreateIssue/UpdateIssue/AddDependency happens in their tx
// - Ensures cache is always consistent with the database state
//
// # Performance Characteristics
//
// Query performance (GetReadyWork):
// - Before cache: ~752ms (recursive CTE on 10K issues)
// - With cache: ~29ms (NOT EXISTS check)
// - Speedup: 25x
//
// Write overhead:
// - Cache rebuild: <50ms (full DELETE + INSERT)
// - Only triggered on dependency/status changes (rare operations)
// - Trade-off: slower writes for much faster reads
//
// # Edge Cases Handled
//
// 1. Parent-child transitive blocking:
// - Children of blocked parents are automatically marked as blocked
// - Propagates through nested hierarchies up to a safety limit of 50 levels
//
// 2. Multiple blockers:
// - Issue blocked by multiple open issues stays blocked until all are closed
// - DISTINCT in CTE ensures issue appears once in cache
//
// 3. Status changes:
// - Closing a blocker removes all blocked descendants from cache
// - Reopening a blocker adds them back
//
// 4. Dependency removal:
// - Removing last blocker unblocks the issue
// - Removing parent-child link unblocks orphaned subtree
//
// 5. Foreign key cascades:
// - Cache entries automatically deleted when issue is deleted (ON DELETE CASCADE)
// - No manual cleanup needed
//
// # Future Optimizations
//
// If rebuild becomes a bottleneck in very large databases (>100K issues):
// - Consider incremental updates for specific dependency types
// - Add indexes to dependencies table for CTE performance
// - Implement dirty tracking to avoid rebuilds when cache is unchanged
//
// However, current performance is excellent for realistic workloads.
package sqlite
import (


@@ -58,7 +58,9 @@ func (s *SQLiteStorage) CreateIssue(ctx context.Context, issue *types.Issue, act
//
// We use raw Exec instead of BeginTx because database/sql doesn't support transaction
// modes in BeginTx, and modernc.org/sqlite's BeginTx always uses DEFERRED mode.
if _, err := conn.ExecContext(ctx, "BEGIN IMMEDIATE"); err != nil {
//
// Use retry logic with exponential backoff to handle SQLITE_BUSY under concurrent load (bd-ola6)
if err := beginImmediateWithRetry(ctx, conn, 5, 10*time.Millisecond); err != nil {
return fmt.Errorf("failed to begin immediate transaction: %w", err)
}


@@ -82,7 +82,16 @@ func (s *SQLiteStorage) GetReadyWork(ctx context.Context, filter types.WorkFilte
orderBySQL := buildOrderByClause(sortPolicy)
// Use blocked_issues_cache for performance (bd-5qim)
// Cache is maintained by invalidateBlockedCache() called on dependency/status changes
// This optimization replaces the recursive CTE that computed blocked issues on every query.
// Performance improvement: 752ms → 29ms on 10K issues (25x speedup).
//
// The cache is automatically maintained by invalidateBlockedCache() which is called:
// - When adding/removing 'blocks' or 'parent-child' dependencies
// - When any issue status changes
// - When closing any issue
//
// Cache rebuild is fast (<50ms) and happens within the same transaction as the
// triggering change, ensuring consistency. See blocked_cache.go for full details.
// #nosec G201 - safe SQL with controlled formatting
query := fmt.Sprintf(`
SELECT i.id, i.content_hash, i.title, i.description, i.design, i.acceptance_criteria, i.notes,


@@ -4,6 +4,7 @@ import (
"context"
"database/sql"
"strings"
"time"
)
// QueryContext exposes the underlying database QueryContext method for advanced queries
@@ -62,4 +63,73 @@ func IsForeignKeyConstraintError(err error) bool {
strings.Contains(errStr, "foreign key constraint failed")
}
// IsBusyError checks if an error is a database busy/locked error
func IsBusyError(err error) bool {
if err == nil {
return false
}
errStr := err.Error()
return strings.Contains(errStr, "database is locked") ||
strings.Contains(errStr, "SQLITE_BUSY")
}
// beginImmediateWithRetry starts an IMMEDIATE transaction with exponential backoff retry
// on SQLITE_BUSY errors. This addresses bd-ola6: under concurrent write load, BEGIN IMMEDIATE
// can fail with SQLITE_BUSY, so we retry with exponential backoff instead of failing immediately.
//
// Parameters:
// - ctx: context for cancellation checking
// - conn: dedicated database connection (must use same connection for entire transaction)
// - maxRetries: maximum number of retry attempts (default: 5)
// - initialDelay: initial backoff delay (default: 10ms)
//
// Returns error if:
// - Context is cancelled
// - BEGIN IMMEDIATE fails with non-busy error
// - All retries exhausted with SQLITE_BUSY
func beginImmediateWithRetry(ctx context.Context, conn *sql.Conn, maxRetries int, initialDelay time.Duration) error {
if maxRetries <= 0 {
maxRetries = 5
}
if initialDelay <= 0 {
initialDelay = 10 * time.Millisecond
}
var lastErr error
delay := initialDelay
for attempt := 0; attempt <= maxRetries; attempt++ {
// Check context cancellation before each attempt
if err := ctx.Err(); err != nil {
return err
}
// Attempt to begin transaction
_, err := conn.ExecContext(ctx, "BEGIN IMMEDIATE")
if err == nil {
return nil // Success
}
lastErr = err
// If not a busy error, fail immediately
if !IsBusyError(err) {
return err
}
// On last attempt, don't sleep
if attempt == maxRetries {
break
}
// Exponential backoff: sleep before retry
select {
case <-time.After(delay):
delay *= 2 // Double the delay for next attempt
case <-ctx.Done():
return ctx.Err()
}
}
return lastErr // Return the last SQLITE_BUSY error after exhausting retries
}


@@ -208,3 +208,110 @@ func TestQueryContext(t *testing.T) {
t.Error("Expected only one row")
}
}
func TestIsBusyError(t *testing.T) {
tests := []struct {
name string
err error
expected bool
}{
{
name: "nil error",
err: nil,
expected: false,
},
{
name: "database is locked",
err: errors.New("database is locked"),
expected: true,
},
{
name: "SQLITE_BUSY",
err: errors.New("SQLITE_BUSY"),
expected: true,
},
{
name: "SQLITE_BUSY with context",
err: errors.New("failed to begin: SQLITE_BUSY: database is locked"),
expected: true,
},
{
name: "other error",
err: errors.New("some other database error"),
expected: false,
},
{
name: "UNIQUE constraint error",
err: errors.New("UNIQUE constraint failed: issues.id"),
expected: false,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
result := IsBusyError(tt.err)
if result != tt.expected {
t.Errorf("IsBusyError(%v) = %v, want %v", tt.err, result, tt.expected)
}
})
}
}
func TestBeginImmediateWithRetry(t *testing.T) {
ctx := context.Background()
store := newTestStore(t, t.TempDir()+"/test.db")
defer store.Close()
t.Run("successful on first try", func(t *testing.T) {
conn, err := store.db.Conn(ctx)
if err != nil {
t.Fatalf("Failed to acquire connection: %v", err)
}
defer conn.Close()
err = beginImmediateWithRetry(ctx, conn, 5, 10)
if err != nil {
t.Errorf("beginImmediateWithRetry failed: %v", err)
}
// Rollback to clean up
_, _ = conn.ExecContext(context.Background(), "ROLLBACK")
})
t.Run("context cancellation", func(t *testing.T) {
conn, err := store.db.Conn(ctx)
if err != nil {
t.Fatalf("Failed to acquire connection: %v", err)
}
defer conn.Close()
cancelCtx, cancel := context.WithCancel(ctx)
cancel() // Cancel immediately
err = beginImmediateWithRetry(cancelCtx, conn, 5, 10)
if err == nil {
t.Error("Expected context cancellation error, got nil")
_, _ = conn.ExecContext(context.Background(), "ROLLBACK")
}
if !errors.Is(err, context.Canceled) {
t.Errorf("Expected context.Canceled, got %v", err)
}
})
t.Run("defaults for invalid parameters", func(t *testing.T) {
conn, err := store.db.Conn(ctx)
if err != nil {
t.Fatalf("Failed to acquire connection: %v", err)
}
defer conn.Close()
// Should use defaults (5 retries, 10ms delay) when passed invalid values
err = beginImmediateWithRetry(ctx, conn, 0, 0)
if err != nil {
t.Errorf("beginImmediateWithRetry with invalid params failed: %v", err)
}
// Rollback to clean up
_, _ = conn.ExecContext(context.Background(), "ROLLBACK")
})
}