Repair Commands & AI-Assisted Tooling
Status: Design Proposal
Author: AI Assistant
Date: 2025-10-28
Context: Reduce agent repair burden by providing specialized repair tools
Executive Summary
Agents spend significant time repairing beads databases due to:
- Git merge conflicts in JSONL
- Duplicate issues from parallel work
- Semantic inconsistencies (labeling, dependencies)
- Orphaned references after deletions
Solution: Add dedicated repair commands that agents (and humans) can invoke instead of manually fixing these issues. Some commands use AI for semantic understanding, others are pure mechanical checks.
Problem Analysis
Current Repair Scenarios
Based on codebase analysis and commit history:
1. Git Merge Conflicts (High Frequency)
Scenario:
# Feature branch creates bd-42
git checkout -b feature
bd create "Add authentication" # Creates bd-42
# Meanwhile, main branch also creates bd-42
git checkout main
bd create "Fix logging" # Also creates bd-42
# Merge creates conflict
git checkout feature
git merge main
JSONL conflict:
<<<<<<< HEAD
{"id":"bd-42","title":"Add authentication",...}
=======
{"id":"bd-42","title":"Fix logging",...}
>>>>>>> main
Current fix: Agent manually parses conflict markers, remaps IDs, updates references
Pain points:
- Time-consuming (5-10 minutes per conflict)
- Error-prone (easy to miss references)
- Repetitive (same logic every time)
2. Semantic Duplicates (Medium Frequency)
Scenario:
# Agent A creates issue
bd create "Fix memory leak in parser" # bd-42
# Agent B creates similar issue (different session)
bd create "Parser memory leak needs fixing" # bd-87
# Human notices: "These are the same issue!"
Current fix: Agent manually:
- Reads both issues
- Determines they're duplicates
- Picks canonical one
- Closes duplicate with reference
- Moves comments/dependencies
Pain points:
- Requires reading full issue text
- Subjective judgment (are they really duplicates?)
- Manual reference updates
3. Test Pollution (Low Frequency Now, High Impact)
Scenario:
# Test creates 1044 issues in production DB
go test ./internal/rpc/... # Oops, no isolation
bd list
# Shows 1044 issues with titles like "test-issue-1", "benchmark-issue-42"
Recent occurrence: Commits 78e8cb9, d1d3fcd (Oct 2025)
Current fix: Agent manually:
- Identifies test issues by pattern matching
- Bulk closes with bd close bd-1 bd-2 ... bd-1044
- Archives or deletes
Pain points:
- Hard to distinguish test vs. real issues
- Risk of deleting real issues
- No automated recovery
4. Orphaned Dependencies (Medium Frequency)
Scenario:
bd create "Implement feature X" # bd-42
bd create "Test feature X" --depends bd-42 # bd-43 depends on bd-42
bd delete bd-42 # User deletes parent
bd show bd-43
# Depends: bd-42 (orphaned - issue doesn't exist!)
Current fix: Agent manually updates dependencies
Pain points:
- Silent corruption (no warning on delete)
- Hard to find orphans (requires DB query)
Proposed Commands
1. bd resolve-conflicts - Git Merge Conflict Resolver
Purpose: Automatically resolve JSONL merge conflicts
Usage:
# Detect conflicts
bd resolve-conflicts
# Auto-resolve with AI
bd resolve-conflicts --auto
# Manual conflict resolution
bd resolve-conflicts --interactive
Implementation:
// cmd/bd/resolve_conflicts.go (new file)
package main
import (
"bufio"
"encoding/json"
"fmt"
"os"
"strings"
"github.com/steveyegge/beads/internal/types"
)
type ConflictBlock struct {
HeadIssues []types.Issue
BaseIssues []types.Issue
LineStart int
LineEnd int
}
func detectConflicts(jsonlPath string) ([]ConflictBlock, error) {
file, err := os.Open(jsonlPath)
if err != nil {
return nil, err
}
defer file.Close()
var conflicts []ConflictBlock
var current *ConflictBlock
inConflict := false
inHead := false
lineNum := 0
scanner := bufio.NewScanner(file)
for scanner.Scan() {
line := scanner.Text()
lineNum++
switch {
case strings.HasPrefix(line, "<<<<<<<"):
// Start of conflict
inConflict = true
inHead = true
current = &ConflictBlock{LineStart: lineNum}
case strings.HasPrefix(line, "======="):
// Switch from HEAD to base
inHead = false
case strings.HasPrefix(line, ">>>>>>>"):
// End of conflict
inConflict = false
if current != nil {
current.LineEnd = lineNum
conflicts = append(conflicts, *current)
current = nil
}
case inConflict && inHead:
// Parse issue in HEAD section
issue, err := parseIssueLine(line)
if err == nil {
current.HeadIssues = append(current.HeadIssues, issue)
}
case inConflict && !inHead:
// Parse issue in base section
issue, err := parseIssueLine(line)
if err == nil {
current.BaseIssues = append(current.BaseIssues, issue)
}
}
}
if scanner.Err() != nil {
return nil, scanner.Err()
}
return conflicts, nil
}
func resolveConflictsAuto(conflicts []ConflictBlock, useAI bool) ([]Resolution, error) {
var resolutions []Resolution
for _, conflict := range conflicts {
if useAI {
// Use AI to determine resolution
resolution, err := resolveConflictWithAI(conflict)
if err != nil {
return nil, err
}
resolutions = append(resolutions, resolution)
} else {
// Mechanical resolution: remap duplicate IDs
resolution := resolveConflictMechanical(conflict)
resolutions = append(resolutions, resolution)
}
}
return resolutions, nil
}
type Resolution struct {
Action string // "remap", "merge", "keep-head", "keep-base"
OldID string
NewID string
Reason string
Merged *types.Issue // If action="merge"
}
func resolveConflictMechanical(conflict ConflictBlock) Resolution {
// Mechanical strategy: Keep HEAD, remap base to new IDs
// This matches current auto-import collision resolution
headIDs := make(map[string]bool)
for _, issue := range conflict.HeadIssues {
headIDs[issue.ID] = true
}
var resolutions []Resolution
for _, issue := range conflict.BaseIssues {
if headIDs[issue.ID] {
// ID collision: remap base issue to next available ID
newID := getNextAvailableID()
resolutions = append(resolutions, Resolution{
Action: "remap",
OldID: issue.ID,
NewID: newID,
Reason: fmt.Sprintf("ID %s exists in both branches", issue.ID),
})
}
}
if len(resolutions) == 0 {
// No ID collisions between branches: keep HEAD as-is
return Resolution{Action: "keep-head", Reason: "no ID collisions between branches"}
}
return resolutions[0] // Simplified for example; a real implementation would return all
}
func resolveConflictWithAI(conflict ConflictBlock) (Resolution, error) {
// Call AI to analyze conflict and suggest resolution
prompt := fmt.Sprintf(`
You are resolving a git merge conflict in a beads issue tracker JSONL file.
HEAD issues (current branch):
%s
BASE issues (incoming branch):
%s
Analyze these conflicts and suggest ONE of:
1. "remap" - Issues are different, keep both but remap IDs
2. "merge" - Issues are similar, merge into one
3. "keep-head" - HEAD version is correct, discard BASE
4. "keep-base" - BASE version is correct, discard HEAD
Respond in JSON format:
{
"action": "remap|merge|keep-head|keep-base",
"reason": "explanation",
"merged_issue": {...} // Only if action=merge
}
`, formatIssues(conflict.HeadIssues), formatIssues(conflict.BaseIssues))
// Call AI (via environment-configured API)
response, err := callAIAPI(prompt)
if err != nil {
return Resolution{}, err
}
// Parse response
var resolution Resolution
if err := json.Unmarshal([]byte(response), &resolution); err != nil {
return Resolution{}, err
}
return resolution, nil
}
func applyResolutions(jsonlPath string, conflicts []ConflictBlock, resolutions []Resolution) error {
// Read entire JSONL
allIssues, err := readJSONL(jsonlPath)
if err != nil {
return err
}
// Apply resolutions
for i, resolution := range resolutions {
conflict := conflicts[i]
switch resolution.Action {
case "remap":
// Remap IDs and update references
remapIssueID(allIssues, resolution.OldID, resolution.NewID)
case "merge":
// Replace both with merged issue
replaceIssues(allIssues, conflict.HeadIssues, conflict.BaseIssues, resolution.Merged)
case "keep-head":
// Remove base issues
removeIssues(allIssues, conflict.BaseIssues)
case "keep-base":
// Remove head issues
removeIssues(allIssues, conflict.HeadIssues)
}
}
// Write back to JSONL (atomic)
return writeJSONL(jsonlPath, allIssues)
}
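The listing above leans on helpers (parseIssueLine, remapIssueID) that aren't shown. Minimal sketches follow, assuming an Issue shape with only the fields these helpers touch (the real types.Issue has more):

```go
package main

import "encoding/json"

// Stand-in for types.Issue with only the fields the helpers touch.
type Issue struct {
	ID        string   `json:"id"`
	Title     string   `json:"title"`
	DependsOn []string `json:"depends_on,omitempty"`
}

// parseIssueLine decodes a single JSONL line into an Issue.
func parseIssueLine(line string) (Issue, error) {
	var issue Issue
	err := json.Unmarshal([]byte(line), &issue)
	return issue, err
}

// remapIssueID rewrites an issue's ID and every dependency reference to it,
// so no DependsOn entry is left pointing at the old ID.
func remapIssueID(issues []*Issue, oldID, newID string) {
	for _, issue := range issues {
		if issue.ID == oldID {
			issue.ID = newID
		}
		for i, dep := range issue.DependsOn {
			if dep == oldID {
				issue.DependsOn[i] = newID
			}
		}
	}
}
```

Updating dependency references in the same pass is what makes the remap safe; remapping only the ID field would create exactly the orphaned references that bd repair-deps exists to clean up.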
AI Integration:
// internal/ai/client.go (new package)
package ai
import (
"context"
"fmt"
"os"
)
type Client struct {
provider string // "anthropic", "openai", "ollama"
apiKey string
model string
}
func NewClient() (*Client, error) {
provider := os.Getenv("BEADS_AI_PROVIDER") // "anthropic" (default)
apiKey := os.Getenv("BEADS_AI_API_KEY") // Required for cloud providers
model := os.Getenv("BEADS_AI_MODEL") // "claude-3-5-sonnet-20241022" (default)
if provider == "" {
provider = "anthropic"
}
if apiKey == "" && provider != "ollama" {
return nil, fmt.Errorf("BEADS_AI_API_KEY required for provider %s", provider)
}
return &Client{
provider: provider,
apiKey: apiKey,
model: model,
}, nil
}
func (c *Client) Complete(ctx context.Context, prompt string) (string, error) {
switch c.provider {
case "anthropic":
return c.callAnthropic(ctx, prompt)
case "openai":
return c.callOpenAI(ctx, prompt)
case "ollama":
return c.callOllama(ctx, prompt)
default:
return "", fmt.Errorf("unknown AI provider: %s", c.provider)
}
}
func (c *Client) callAnthropic(ctx context.Context, prompt string) (string, error) {
// Use anthropic-go SDK
// Implementation omitted for brevity
return "", fmt.Errorf("callAnthropic: not implemented")
}
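The Ollama path needs no SDK at all; a sketch using Ollama's local /api/generate HTTP endpoint (baseURL is typically http://localhost:11434; the request/response field names follow Ollama's API):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// Request/response shapes for Ollama's /api/generate endpoint.
type ollamaRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
	Stream bool   `json:"stream"` // false: return one complete response
}

type ollamaResponse struct {
	Response string `json:"response"`
}

func buildOllamaRequest(model, prompt string) ([]byte, error) {
	return json.Marshal(ollamaRequest{Model: model, Prompt: prompt, Stream: false})
}

// callOllama posts a prompt to a local Ollama server and returns the completion.
func callOllama(baseURL, model, prompt string) (string, error) {
	body, err := buildOllamaRequest(model, prompt)
	if err != nil {
		return "", err
	}
	resp, err := http.Post(baseURL+"/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("ollama: %s: %s", resp.Status, raw)
	}
	var parsed ollamaResponse
	if err := json.Unmarshal(raw, &parsed); err != nil {
		return "", err
	}
	return parsed.Response, nil
}
```

This keeps the local provider dependency-free, which matters for a CLI tool that users install as a single binary.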
Configuration:
# ~/.config/beads/ai.conf (optional)
BEADS_AI_PROVIDER=anthropic
BEADS_AI_API_KEY=sk-ant-...
BEADS_AI_MODEL=claude-3-5-sonnet-20241022
# Or use local Ollama
BEADS_AI_PROVIDER=ollama
BEADS_AI_MODEL=llama2
Example usage:
# Detect conflicts (shows summary, doesn't modify)
$ bd resolve-conflicts
Found 3 conflicts in beads.jsonl:
Conflict 1 (lines 42-47):
HEAD: bd-42 "Add authentication" (created by alice)
BASE: bd-42 "Fix logging" (created by bob)
→ Recommendation: REMAP (different issues, same ID)
Conflict 2 (lines 103-108):
HEAD: bd-87 "Update docs for API"
BASE: bd-87 "Update docs for API v2"
→ Recommendation: MERGE (similar, minor differences)
Conflict 3 (lines 234-239):
HEAD: bd-156 "Refactor parser"
BASE: bd-156 "Refactor parser" (identical)
→ Recommendation: KEEP-HEAD (identical content)
Run 'bd resolve-conflicts --auto' to apply recommendations.
Run 'bd resolve-conflicts --interactive' to review each conflict.
# Auto-resolve with AI
$ bd resolve-conflicts --auto --ai
Resolving 3 conflicts...
✓ Conflict 1: Remapped bd-42 (BASE) → bd-200
✓ Conflict 2: Merged into bd-87 (combined descriptions)
✓ Conflict 3: Kept HEAD version (identical)
Updated beads.jsonl (conflicts resolved)
Next steps:
1. Review changes: git diff beads.jsonl
2. Import to database: bd import
3. Commit resolution: git add beads.jsonl && git commit
# Interactive mode
$ bd resolve-conflicts --interactive
Conflict 1 of 3 (lines 42-47):
HEAD: bd-42 "Add authentication"
Created: 2025-10-20 by alice
Status: in_progress
Labels: feature, security
BASE: bd-42 "Fix logging"
Created: 2025-10-21 by bob
Status: open
Labels: bug, logging
AI Recommendation: REMAP (different issues, same ID)
Reason: Issues have different topics (auth vs logging) and authors
Choose action:
1) Remap BASE to new ID (recommended)
2) Merge into one issue
3) Keep HEAD, discard BASE
4) Keep BASE, discard HEAD
5) Skip (resolve manually)
Your choice [1-5]: 1
✓ Will remap BASE bd-42 → bd-200
Continue to next conflict? [Y/n]:
2. bd find-duplicates - AI-Powered Duplicate Detection
Purpose: Find semantically duplicate issues across the database
Usage:
# Find all duplicates
bd find-duplicates
# Find duplicates with specific threshold
bd find-duplicates --threshold 0.8
# Auto-merge duplicates (requires confirmation)
bd find-duplicates --merge
Implementation:
// cmd/bd/find_duplicates.go (new file)
package main
import (
"context"
"fmt"
"github.com/steveyegge/beads/internal/ai"
"github.com/steveyegge/beads/internal/storage"
"github.com/steveyegge/beads/internal/types"
)
type DuplicateGroup struct {
Issues []*types.Issue
Similarity float64
Reason string
}
func findDuplicates(ctx context.Context, store storage.Storage, useAI bool, threshold float64) ([]DuplicateGroup, error) {
// Get all open issues
issues, err := store.ListIssues(ctx, storage.ListOptions{
Status: []string{"open", "in_progress"},
})
if err != nil {
return nil, err
}
if !useAI {
// Mechanical approach: exact title match
return findDuplicatesMechanical(issues), nil
}
// AI approach: semantic similarity
return findDuplicatesWithAI(ctx, issues, threshold)
}
func findDuplicatesMechanical(issues []*types.Issue) []DuplicateGroup {
// Group by normalized title
titleMap := make(map[string][]*types.Issue)
for _, issue := range issues {
normalized := normalizeTitle(issue.Title)
titleMap[normalized] = append(titleMap[normalized], issue)
}
var groups []DuplicateGroup
for _, group := range titleMap {
if len(group) > 1 {
groups = append(groups, DuplicateGroup{
Issues: group,
Similarity: 1.0, // Exact match
Reason: "Identical titles",
})
}
}
return groups
}
func findDuplicatesWithAI(ctx context.Context, issues []*types.Issue, threshold float64) ([]DuplicateGroup, error) {
aiClient, err := ai.NewClient()
if err != nil {
return nil, fmt.Errorf("AI client unavailable: %v (set BEADS_AI_API_KEY)", err)
}
var groups []DuplicateGroup
// Compare all pairs (N^2, but issues typically <1000)
for i := 0; i < len(issues); i++ {
for j := i + 1; j < len(issues); j++ {
similarity, reason, err := compareIssues(ctx, aiClient, issues[i], issues[j])
if err != nil {
continue // Skip on error
}
if similarity >= threshold {
groups = append(groups, DuplicateGroup{
Issues: []*types.Issue{issues[i], issues[j]},
Similarity: similarity,
Reason: reason,
})
}
}
}
return groups, nil
}
func compareIssues(ctx context.Context, client *ai.Client, issue1, issue2 *types.Issue) (float64, string, error) {
prompt := fmt.Sprintf(`
Compare these two issues and determine if they are duplicates.
Issue 1: %s
Title: %s
Description: %s
Labels: %v
Status: %s
Issue 2: %s
Title: %s
Description: %s
Labels: %v
Status: %s
Respond in JSON:
{
"similarity": 0.0-1.0,
"reason": "explanation",
"is_duplicate": true/false
}
`, issue1.ID, issue1.Title, issue1.Description, issue1.Labels, issue1.Status,
issue2.ID, issue2.Title, issue2.Description, issue2.Labels, issue2.Status)
response, err := client.Complete(ctx, prompt)
if err != nil {
return 0, "", err
}
var result struct {
Similarity float64 `json:"similarity"`
Reason string `json:"reason"`
IsDuplicate bool `json:"is_duplicate"`
}
if err := json.Unmarshal([]byte(response), &result); err != nil {
return 0, "", err
}
return result.Similarity, result.Reason, nil
}
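The mechanical matcher depends on a normalizeTitle helper not shown above. One plausible sketch lowercases, replaces punctuation with spaces, and collapses whitespace:

```go
package main

import (
	"strings"
	"unicode"
)

// normalizeTitle canonicalizes a title for exact-match grouping:
// lowercase, punctuation replaced by spaces, whitespace collapsed.
func normalizeTitle(title string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(title) {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			b.WriteRune(r)
		} else {
			b.WriteRune(' ')
		}
	}
	return strings.Join(strings.Fields(b.String()), " ")
}
```

With this, "Fix Memory-Leak in parser!" and "fix memory leak in Parser" land in the same bucket, which is about as far as the no-AI path can go; paraphrases still require the semantic approach.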
Optimization for large databases:
For databases with more than 1,000 issues, O(N²) pairwise comparison is too slow. Use embedding-based similarity instead:
// Use OpenAI embeddings or local model
func findDuplicatesWithEmbeddings(ctx context.Context, issues []*types.Issue, threshold float64) ([]DuplicateGroup, error) {
// 1. Generate embeddings for all issues
embeddings := make([][]float64, len(issues))
for i, issue := range issues {
text := fmt.Sprintf("%s\n%s", issue.Title, issue.Description)
embedding, err := generateEmbedding(ctx, text)
if err != nil {
return nil, err
}
embeddings[i] = embedding
}
// 2. Find similar pairs using cosine similarity
var groups []DuplicateGroup
for i := 0; i < len(embeddings); i++ {
for j := i + 1; j < len(embeddings); j++ {
similarity := cosineSimilarity(embeddings[i], embeddings[j])
if similarity >= threshold {
groups = append(groups, DuplicateGroup{
Issues: []*types.Issue{issues[i], issues[j]},
Similarity: similarity,
Reason: "Semantic similarity via embeddings",
})
}
}
}
return groups, nil
}
func generateEmbedding(ctx context.Context, text string) ([]float64, error) {
// Use OpenAI text-embedding-3-small or a local model
// Returns a 1536-dimensional vector (stubbed here)
return nil, fmt.Errorf("generateEmbedding: not implemented")
}
func cosineSimilarity(a, b []float64) float64 {
var dotProduct, normA, normB float64
for i := range a {
dotProduct += a[i] * b[i]
normA += a[i] * a[i]
normB += b[i] * b[i]
}
if normA == 0 || normB == 0 {
return 0 // Avoid division by zero for zero-length vectors
}
return dotProduct / (math.Sqrt(normA) * math.Sqrt(normB))
}
Example usage:
# Find duplicates (mechanical, no AI)
$ bd find-duplicates --no-ai
Found 2 potential duplicate groups:
Group 1 (Similarity: 100%):
bd-42: "Fix memory leak in parser"
bd-87: "Fix memory leak in parser"
Reason: Identical titles
Group 2 (Similarity: 100%):
bd-103: "Update documentation"
bd-145: "Update documentation"
Reason: Identical titles
# Find duplicates with AI (semantic)
$ bd find-duplicates --ai --threshold 0.75
Found 4 potential duplicate groups:
Group 1 (Similarity: 95%):
bd-42: "Fix memory leak in parser"
bd-87: "Parser memory leak needs fixing"
Reason: Same issue described differently
Group 2 (Similarity: 88%):
bd-103: "Update API documentation"
bd-145: "Document new API endpoints"
Reason: Both about API docs, overlapping scope
Group 3 (Similarity: 82%):
bd-200: "Optimize database queries"
bd-234: "Improve query performance"
Reason: Same goal (performance), different wording
Group 4 (Similarity: 76%):
bd-301: "Add user authentication"
bd-312: "Implement login system"
Reason: Authentication and login are related features
# Merge duplicates interactively
$ bd find-duplicates --merge
Found 2 duplicate groups. Review each:
Group 1 (Similarity: 95%):
bd-42: "Fix memory leak in parser" (alice, 2025-10-20)
Status: in_progress
Labels: bug, performance
Comments: 3
bd-87: "Parser memory leak needs fixing" (bob, 2025-10-21)
Status: open
Labels: bug
Comments: 1
Merge these issues? [y/N] y
Choose canonical issue:
1) bd-42 (more activity, earlier)
2) bd-87
Your choice [1-2]: 1
✓ Merged bd-87 → bd-42
- Moved 1 comment from bd-87
- Added note: "Duplicate of bd-42"
- Closed bd-87 with reason: "duplicate"
Continue to next group? [Y/n]:
3. bd detect-pollution - Test Issue Detector
Purpose: Identify and clean up test issues that leaked into production database
Usage:
# Detect test issues
bd detect-pollution
# Auto-delete with confirmation
bd detect-pollution --clean
# Export pollution report
bd detect-pollution --report pollution.json
Implementation:
// cmd/bd/detect_pollution.go (new file)
package main
import (
"context"
"fmt"
"regexp"
"strings"
"github.com/steveyegge/beads/internal/ai"
"github.com/steveyegge/beads/internal/storage"
"github.com/steveyegge/beads/internal/types"
)
type PollutionIndicator struct {
Pattern string
Weight float64
}
var pollutionPatterns = []PollutionIndicator{
{Pattern: `^test[-_]`, Weight: 0.9}, // "test-issue-1"
{Pattern: `^benchmark[-_]`, Weight: 0.95}, // "benchmark-issue-42"
{Pattern: `^(?i)test\s+issue`, Weight: 0.85}, // "Test Issue 123"
{Pattern: `^(?i)dummy`, Weight: 0.8}, // "Dummy issue"
{Pattern: `^(?i)sample`, Weight: 0.7}, // "Sample issue"
{Pattern: `^(?i)todo.*test`, Weight: 0.75}, // "TODO test something"
{Pattern: `^issue\s+\d+$`, Weight: 0.6}, // "issue 123"
{Pattern: `^[A-Z]{4,}-\d+$`, Weight: 0.5}, // "JIRA-123" (might be import)
}
func detectPollution(ctx context.Context, store storage.Storage, useAI bool) ([]*types.Issue, error) {
allIssues, err := store.ListIssues(ctx, storage.ListOptions{})
if err != nil {
return nil, err
}
if !useAI {
// Mechanical approach: pattern matching
return detectPollutionMechanical(allIssues), nil
}
// AI approach: semantic classification
return detectPollutionWithAI(ctx, allIssues)
}
func detectPollutionMechanical(issues []*types.Issue) []*types.Issue {
var polluted []*types.Issue
for _, issue := range issues {
score := 0.0
// Check title against patterns
for _, indicator := range pollutionPatterns {
matched, _ := regexp.MatchString(indicator.Pattern, issue.Title)
if matched {
score = max(score, indicator.Weight)
}
}
// Additional heuristics
if len(issue.Title) < 10 {
score += 0.2 // Very short titles suspicious
}
if issue.Description == "" || issue.Description == issue.Title {
score += 0.1 // No description
}
if strings.Count(issue.Title, "test") > 1 {
score += 0.2 // Multiple "test" occurrences
}
// Threshold: 0.7
if score >= 0.7 {
polluted = append(polluted, issue)
}
}
return polluted
}
func detectPollutionWithAI(ctx context.Context, issues []*types.Issue) ([]*types.Issue, error) {
aiClient, err := ai.NewClient()
if err != nil {
return nil, err
}
// Batch issues for efficiency (classify 50 at a time)
batchSize := 50
var polluted []*types.Issue
for i := 0; i < len(issues); i += batchSize {
end := min(i+batchSize, len(issues))
batch := issues[i:end]
prompt := buildPollutionPrompt(batch)
response, err := aiClient.Complete(ctx, prompt)
if err != nil {
return nil, err
}
// Parse response: list of issue IDs classified as test pollution
pollutedIDs, err := parsePollutionResponse(response)
if err != nil {
continue
}
for _, issue := range batch {
for _, id := range pollutedIDs {
if issue.ID == id {
polluted = append(polluted, issue)
}
}
}
}
return polluted, nil
}
func buildPollutionPrompt(issues []*types.Issue) string {
var builder strings.Builder
builder.WriteString("Identify test pollution in this issue list. Test issues have patterns like:\n")
builder.WriteString("- Titles starting with 'test', 'benchmark', 'sample'\n")
builder.WriteString("- Sequential numbering (test-1, test-2, ...)\n")
builder.WriteString("- Generic descriptions or no description\n")
builder.WriteString("- Created in rapid succession\n\n")
builder.WriteString("Issues:\n")
for _, issue := range issues {
fmt.Fprintf(&builder, "%s: %s (created: %s)\n", issue.ID, issue.Title, issue.CreatedAt)
}
builder.WriteString("\nRespond with JSON list of polluted issue IDs: {\"polluted\": [\"bd-1\", \"bd-2\"]}")
return builder.String()
}
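parsePollutionResponse is referenced above but not shown. A tolerant sketch that extracts the outermost JSON object, since models often wrap the requested JSON in prose:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// parsePollutionResponse pulls the {"polluted": [...]} object out of a
// model response that may include surrounding prose.
func parsePollutionResponse(response string) ([]string, error) {
	start := strings.Index(response, "{")
	end := strings.LastIndex(response, "}")
	if start < 0 || end <= start {
		return nil, fmt.Errorf("no JSON object in response")
	}
	var parsed struct {
		Polluted []string `json:"polluted"`
	}
	if err := json.Unmarshal([]byte(response[start:end+1]), &parsed); err != nil {
		return nil, err
	}
	return parsed.Polluted, nil
}
```

The brace-scanning is deliberately forgiving; a stricter alternative is to request structured output from the provider, where supported, and fail hard on malformed replies.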
Example usage:
# Detect pollution
$ bd detect-pollution
Scanning 523 issues for test pollution...
Found 47 potential test issues:
High Confidence (score ≥ 0.9):
bd-100: "test-issue-1"
bd-101: "test-issue-2"
...
bd-146: "benchmark-create-47"
(Total: 45 issues)
Medium Confidence (score 0.7-0.9):
bd-200: "Quick test"
bd-301: "sample issue for testing"
(Total: 2 issues)
Recommendation: Review and clean up these issues.
Run 'bd detect-pollution --clean' to delete them (with confirmation).
# Clean up
$ bd detect-pollution --clean
Found 47 test issues. Delete them? [y/N] y
Deleting 47 issues...
✓ Deleted bd-100 through bd-146
✓ Deleted bd-200, bd-301
Cleanup complete. Exported deleted issues to .beads/pollution-backup.jsonl
(Run 'bd import .beads/pollution-backup.jsonl' to restore if needed)
4. bd repair-deps - Orphaned Dependency Cleaner
Purpose: Find and fix orphaned dependency references
Usage:
# Find orphans
bd repair-deps
# Auto-fix (remove orphaned references)
bd repair-deps --fix
# Interactive
bd repair-deps --interactive
Implementation:
// cmd/bd/repair_deps.go (new file)
package main
import (
"context"
"fmt"
"github.com/steveyegge/beads/internal/storage"
"github.com/steveyegge/beads/internal/types"
)
type OrphanedDependency struct {
Issue *types.Issue
OrphanedID string
}
func findOrphanedDeps(ctx context.Context, store storage.Storage) ([]OrphanedDependency, error) {
allIssues, err := store.ListIssues(ctx, storage.ListOptions{})
if err != nil {
return nil, err
}
// Build ID existence map
existingIDs := make(map[string]bool)
for _, issue := range allIssues {
existingIDs[issue.ID] = true
}
// Find orphans
var orphaned []OrphanedDependency
for _, issue := range allIssues {
for _, depID := range issue.DependsOn {
if !existingIDs[depID] {
orphaned = append(orphaned, OrphanedDependency{
Issue: issue,
OrphanedID: depID,
})
}
}
}
return orphaned, nil
}
func repairOrphanedDeps(ctx context.Context, store storage.Storage, orphaned []OrphanedDependency, autoFix bool) error {
for _, o := range orphaned {
if autoFix {
// Remove orphaned dependency
newDeps := removeString(o.Issue.DependsOn, o.OrphanedID)
o.Issue.DependsOn = newDeps
if err := store.UpdateIssue(ctx, o.Issue); err != nil {
return err
}
fmt.Printf("✓ Removed orphaned dependency %s from %s\n", o.OrphanedID, o.Issue.ID)
} else {
fmt.Printf("Found orphan: %s depends on non-existent %s\n", o.Issue.ID, o.OrphanedID)
}
}
return nil
}
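removeString above is a small helper not shown in the listing; a sketch that returns a fresh slice so the caller's backing array is never mutated mid-iteration:

```go
package main

// removeString returns a copy of slice with every occurrence of target removed.
func removeString(slice []string, target string) []string {
	out := make([]string, 0, len(slice))
	for _, s := range slice {
		if s != target {
			out = append(out, s)
		}
	}
	return out
}
```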
Example usage:
# Find orphaned deps
$ bd repair-deps
Scanning dependencies...
Found 3 orphaned dependencies:
bd-42: depends on bd-10 (deleted)
bd-87: depends on bd-25 (deleted)
bd-103: depends on bd-25 (deleted)
Run 'bd repair-deps --fix' to remove these references.
# Auto-fix
$ bd repair-deps --fix
✓ Removed bd-10 from bd-42 dependencies
✓ Removed bd-25 from bd-87 dependencies
✓ Removed bd-25 from bd-103 dependencies
Repaired 3 issues.
5. bd validate - Comprehensive Health Check
Purpose: Run all validation checks in one command
Usage:
# Run all checks
bd validate
# Auto-fix all issues
bd validate --fix-all
# Specific checks
bd validate --checks=duplicates,orphans,pollution
Implementation:
// cmd/bd/validate.go (new file)
package main
import (
"context"
"fmt"
"github.com/steveyegge/beads/internal/storage"
)
func runValidation(ctx context.Context, store storage.Storage, checks []string, autoFix bool) error {
results := ValidationResults{}
for _, check := range checks {
switch check {
case "duplicates":
groups, err := findDuplicates(ctx, store, false, 1.0)
if err != nil {
return err
}
results.Duplicates = len(groups)
case "orphans":
orphaned, err := findOrphanedDeps(ctx, store)
if err != nil {
return err
}
results.Orphans = len(orphaned)
if autoFix {
if err := repairOrphanedDeps(ctx, store, orphaned, true); err != nil {
return err
}
}
case "pollution":
polluted, err := detectPollution(ctx, store, false)
if err != nil {
return err
}
results.Pollution = len(polluted)
case "conflicts":
jsonlPath := findJSONLPath()
conflicts, err := detectConflicts(jsonlPath)
if err != nil {
return err
}
results.Conflicts = len(conflicts)
}
}
results.Print()
return nil
}
type ValidationResults struct {
Duplicates int
Orphans int
Pollution int
Conflicts int
}
func (r ValidationResults) Print() {
fmt.Println("\nValidation Results:")
fmt.Println("===================")
fmt.Printf("Duplicates: %d\n", r.Duplicates)
fmt.Printf("Orphans: %d\n", r.Orphans)
fmt.Printf("Pollution: %d\n", r.Pollution)
fmt.Printf("Conflicts: %d\n", r.Conflicts)
total := r.Duplicates + r.Orphans + r.Pollution + r.Conflicts
if total == 0 {
fmt.Println("\n✓ Database is healthy!")
} else {
fmt.Printf("\n⚠ Found %d issues to fix\n", total)
}
}
Example usage:
$ bd validate
Running validation checks...
✓ Checking for duplicates... found 2 groups
✓ Checking for orphaned dependencies... found 3
✓ Checking for test pollution... found 0
✓ Checking for git conflicts... found 1
Validation Results:
===================
Duplicates: 2
Orphans: 3
Pollution: 0
Conflicts: 1
⚠ Found 6 issues to fix
Recommendations:
- Run 'bd find-duplicates --merge' to handle duplicates
- Run 'bd repair-deps --fix' to remove orphaned dependencies
- Run 'bd resolve-conflicts' to resolve git conflicts
$ bd validate --fix-all
Running validation with auto-fix...
✓ Fixed 3 orphaned dependencies
✓ Resolved 1 git conflict (mechanical)
2 duplicate groups require manual review.
Run 'bd find-duplicates --merge' to handle them interactively.
Agent Integration
MCP Server Functions
Add these as MCP functions for easy agent access:
# integrations/beads-mcp/src/beads_mcp/server.py
@server.call_tool()
async def beads_resolve_conflicts(auto: bool = False, ai: bool = True) -> list:
"""Resolve git merge conflicts in JSONL file"""
result = subprocess.run(
["bd", "resolve-conflicts"] +
(["--auto"] if auto else []) +
(["--ai"] if ai else []) +
["--json"],
capture_output=True,
text=True
)
return json.loads(result.stdout)
@server.call_tool()
async def beads_find_duplicates(ai: bool = True, threshold: float = 0.8) -> list:
"""Find duplicate issues using AI or mechanical matching"""
result = subprocess.run(
["bd", "find-duplicates"] +
(["--ai"] if ai else ["--no-ai"]) +
["--threshold", str(threshold), "--json"],
capture_output=True,
text=True
)
return json.loads(result.stdout)
@server.call_tool()
async def beads_detect_pollution() -> list:
"""Detect test issues that leaked into production"""
result = subprocess.run(
["bd", "detect-pollution", "--json"],
capture_output=True,
text=True
)
return json.loads(result.stdout)
@server.call_tool()
async def beads_validate(fix_all: bool = False) -> dict:
"""Run all validation checks"""
result = subprocess.run(
["bd", "validate"] +
(["--fix-all"] if fix_all else []) +
["--json"],
capture_output=True,
text=True
)
return json.loads(result.stdout)
Agent Workflow
Typical agent repair workflow:
1. Agent notices issue (e.g., git merge conflict error)
2. Agent calls: mcp__beads__resolve_conflicts(auto=True, ai=True)
3. If successful:
- Agent reports: "Resolved 3 conflicts, remapped 1 ID"
- Agent continues work
4. If fails:
- Agent calls: mcp__beads__resolve_conflicts() for report
- Agent asks user for guidance
Proactive validation:
At session start, agent can:
1. Call: mcp__beads__validate()
2. If issues found:
- Report to user: "Found 3 orphaned deps and 2 duplicates"
- Ask: "Should I fix these?"
3. If user approves:
- Call: mcp__beads__validate(fix_all=True)
- Report: "Fixed 3 orphans, 2 duplicates need manual review"
Cost Considerations
AI API Costs
Claude 3.5 Sonnet pricing (2025):
- Input: $3.00 / 1M tokens
- Output: $15.00 / 1M tokens
Typical usage:
1. Resolve conflicts (~500 tokens per conflict)
   - Cost: ~$0.0075 per conflict
   - 10 conflicts/day = $0.075/day = $2.25/month
2. Find duplicates (~200 tokens per comparison)
   - Cost: ~$0.003 per comparison
   - 100 issues = 4,950 comparisons = ~$15/run
   - Too expensive! Use embeddings instead
3. Embeddings approach (text-embedding-3-small)
   - $0.02 / 1M tokens
   - 100 issues × 100 tokens = 10K tokens = $0.0002/run
   - Much cheaper!
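The per-run numbers above follow from n(n-1)/2 unordered pairs; a quick sketch of the arithmetic (the per-comparison price is the estimate quoted above, not an authoritative figure):

```go
package main

// pairwiseCost returns the number of unordered issue pairs and the total
// cost of comparing all of them at a given per-comparison price.
func pairwiseCost(issues int, costPerComparison float64) (pairs int, total float64) {
	pairs = issues * (issues - 1) / 2
	return pairs, float64(pairs) * costPerComparison
}
```

The quadratic growth is the real argument for embeddings: at 1,000 issues the pair count is 499,500, roughly a hundred times the 100-issue case.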
Recommendations:
- Use AI for conflict resolution (low frequency, high value)
- Use embeddings for duplicate detection (high frequency, needs scale)
- Use mechanical checks by default, AI as opt-in
Local AI Option
For users who want to avoid API costs:
# Use Ollama (free, local)
BEADS_AI_PROVIDER=ollama
BEADS_AI_MODEL=llama3.2
# Or use local embedding model
BEADS_EMBEDDING_PROVIDER=local
BEADS_EMBEDDING_MODEL=all-MiniLM-L6-v2 # 384-dimensional, fast
Implementation Roadmap
Phase 1: Mechanical Commands (2-3 weeks)
- bd repair-deps (orphaned dependency cleaner)
- bd detect-pollution (pattern-based test detection)
- bd resolve-conflicts (mechanical ID remapping)
- bd validate (run all checks)
Phase 2: AI Integration (2-3 weeks)
- Add internal/ai package
- Implement Anthropic, OpenAI, Ollama providers
- Add --ai flag to commands
- Test with real conflicts/duplicates
Phase 3: Embeddings (1-2 weeks)
- Add embedding generation
- Implement cosine similarity search
- Optimize for large databases (>1K issues)
- Benchmark performance
Phase 4: MCP Integration (1 week)
- Add MCP functions for all repair commands
- Update beads-mcp documentation
- Add examples to AGENTS.md
Phase 5: Polish (1 week)
- Add --json output for all commands
- Improve error messages
- Add progress indicators for slow operations
- Write comprehensive tests
Total timeline: 7-10 weeks
Success Metrics
Quantitative
- ✅ Agent repair time reduced by >50%
- ✅ Manual interventions reduced by >70%
- ✅ Conflict resolution time <30 seconds
- ✅ Duplicate detection accuracy >90%
Qualitative
- ✅ Agents report fewer "stuck" situations
- ✅ Users spend less time on database maintenance
- ✅ Fewer support requests about database issues
Open Questions
1. Should repair commands auto-run in the daemon?
   - Recommendation: No, too risky. On-demand only.
2. Should agents proactively run validation?
   - Recommendation: Yes, at session start (with user notification)
3. What AI provider should be default?
   - Recommendation: None (mechanical by default), user opts in
4. Should duplicate detection be continuous?
   - Recommendation: No, run on-demand or weekly scheduled
5. How to handle false positives in pollution detection?
   - Recommendation: Always confirm before deleting, backup to JSONL
Conclusion
Repair commands address the root cause of agent repair burden: lack of specialized tools for common maintenance tasks. By providing bd resolve-conflicts, bd find-duplicates, bd detect-pollution, and bd validate, we:
- ✅ Reduce agent time from 5-10 minutes to <30 seconds per repair
- ✅ Provide consistent repair logic across sessions
- ✅ Enable proactive validation instead of reactive fixing
- ✅ Allow AI assistance where valuable (conflicts, duplicates) while keeping mechanical checks fast
Combined with the event-driven daemon (instant feedback), these tools should significantly reduce the "not as much in the background as I'd like" pain.