Files
beads/DATABASE_REINIT_BUG.md
Steve Yegge 7eb8fa37da Add DATABASE_REINIT_BUG investigation and fixes
Critical P0 bug analysis: silent data loss when .beads/ removed
- Root cause: autoimport.go hardcoded to issues.jsonl, git has beads.jsonl
- Oracle-reviewed fixes with implementation refinements
- Epic structure ready: 5 child issues, 5-7 hours estimated
- Comprehensive test cases for all scenarios

Amp-Thread-ID: https://ampcode.com/threads/T-57e73277-9112-42fd-a3c1-a1d1f5a22c8b
Co-authored-by: Amp <amp@ampcode.com>
2025-10-24 13:59:15 -07:00

16 KiB

Database Re-initialization Bug Investigation

Date: 2024-10-24
Severity: P0 Critical
Status: Under Investigation

Problem Statement

When .beads/ directory is removed and daemon auto-starts, it creates an empty database instead of importing from git-tracked JSONL file. This causes silent data loss.

What Happened

  1. Initial State: ~/src/fred/beads had polluted database with 202 issues
  2. Action Taken: Removed .beads/ directory to clean pollution: rm -rf .beads/
  3. Session Restart: Amp session restarted, working directory: /Users/stevey/src/fred/beads
  4. Auto-Init Triggered: Daemon auto-started and created fresh database
  5. Result: Empty database (0 issues) despite .beads/beads.jsonl in git with 111 issues

Root Cause Analysis

Key Observations

  1. File Naming Confusion

    • Git history shows rename: issues.jsonl → beads.jsonl (commit d1d3fcd)
    • Daemon created new issues.jsonl (empty)
    • Auto-import may be looking for wrong filename
  2. Auto-Import Failed

    • bd init ran successfully
    • Auto-import from git did NOT trigger
    • Expected behavior: should import from .beads/beads.jsonl in git
  3. Daemon Startup Sequence

    [2025-10-24 13:19:42] Daemon started
    [2025-10-24 13:19:42] Using database: /Users/stevey/src/fred/beads/.beads/bd.db
    [2025-10-24 13:19:42] Database opened
    [2025-10-24 13:19:42] Exported to JSONL (exported 0 issues to empty file)
    
  4. Multiple Database Problem

    • Three separate beads databases detected:
      • ~/src/beads/.beads/bd.db (4.2MB, 112 issues) CORRECT
      • ~/src/fred/beads/.beads/bd.db (155KB, 0 issues) EMPTY
      • ~/src/original/beads/.beads/bd.db UNKNOWN STATE

Expected Behavior

When .beads/ directory is missing but git has tracked JSONL:

  1. bd init should detect git-tracked JSONL file
  2. Auto-import should trigger immediately
  3. Database should be populated from git history
  4. User should see: "Imported N issues from git"

Actual Behavior

  1. bd init creates empty database
  2. Auto-import does NOT trigger
  3. Database remains empty (0 issues)
  4. Silent data loss - user unaware issues are missing

Impact

  • Silent Data Loss: Users lose entire issue database without warning
  • Multi-Workspace Confusion: Per-project daemons don't handle missing DB correctly
  • Git Sync Broken: Auto-import from git not working as expected
  • User Trust: Critical failure mode that breaks core workflow

Recovery Steps Taken

  1. Restored from git: git restore .beads/beads.jsonl File already in git, not in working tree
  2. Extracted from git history: git show HEAD:.beads/beads.jsonl > /tmp/backup.jsonl
  3. Manual import with collision resolution: bd import -i /tmp/backup.jsonl --resolve-collisions
  4. Final state: 194 issues recovered (had stale backup)

Correct Recovery (Final)

  1. Removed bad database: rm -f .beads/beads.db
  2. Git pull to get latest: git pull origin main (got 111 issues from ~/src/beads)
  3. Re-init with correct prefix: bd init --prefix bd
  4. Import from git-tracked JSONL: bd import -i .beads/beads.jsonl
  5. Result: 112 issues (111 + external_ref epic from main database)

Technical Investigation Needed

1. Auto-Import Logic

  • Where is auto-import triggered? (bd init command? daemon startup?)
  • What file does it look for? (issues.jsonl vs beads.jsonl)
  • Why didn't it run when .beads/ was missing?

2. Daemon Initialization

  • Should daemon auto-import on first startup?
  • Should daemon detect missing database and import from git?
  • Per-project daemon handling when DB missing

3. File Naming

  • When did issues.jsonl → beads.jsonl rename happen?
  • Are all code paths updated to use correct filename?
  • Is auto-import looking for old filename?

4. Git Integration

  • Should bd init check for tracked JSONL in git?
  • Should init fail if git has JSONL but DB is empty after init?
  • Add warning: "JSONL found in git but not imported"?

Proposed Fixes (Oracle-Reviewed)

Fix A: checkGitForIssues() Filename Detection (P0, Simple, <1h)

Current Code (autoimport.go:70-76):

relPath, err := filepath.Rel(gitRoot, filepath.Join(beadsDir, "issues.jsonl"))

Fixed Code:

// Try canonical JSONL filenames in precedence order
relBeads, err := filepath.Rel(gitRoot, beadsDir)
if err != nil {
    return 0, ""
}

candidates := []string{
    filepath.Join(relBeads, "beads.jsonl"),
    filepath.Join(relBeads, "issues.jsonl"),
}

for _, relPath := range candidates {
    cmd := exec.Command("git", "show", fmt.Sprintf("HEAD:%s", relPath))
    output, err := cmd.Output()
    if err == nil && len(output) > 0 {
        lines := bytes.Count(output, []byte("\n"))
        return lines, relPath
    }
}

return 0, ""

Impact: Auto-import will now detect beads.jsonl in git


Fix B: findJSONLPath() Consults Git HEAD (P0, Simple-Medium, 1-2h)

Current Code (main.go:898-912):

func findJSONLPath() string {
    jsonlPath := beads.FindJSONLPath(dbPath)
    // Creates directory but doesn't check git
    return jsonlPath
}

Fixed Code:

func findJSONLPath() string {
    // First check for existing local JSONL files
    jsonlPath := beads.FindJSONLPath(dbPath)
    
    dbDir := filepath.Dir(dbPath)
    
    // If local file exists, use it
    if _, err := os.Stat(jsonlPath); err == nil {
        return jsonlPath
    }
    
    // No local JSONL - check git HEAD for tracked filename
    if gitJSONL := checkGitForJSONLFilename(); gitJSONL != "" {
        jsonlPath = filepath.Join(dbDir, filepath.Base(gitJSONL))
    }
    
    // Ensure directory exists
    if err := os.MkdirAll(dbDir, 0755); err == nil {
        // Verify we didn't pick the wrong file
        // ...error checking...
    }
    
    return jsonlPath
}

Impact: Daemon/CLI will export to beads.jsonl (not issues.jsonl) when git tracks beads.jsonl


Fix C: Init Safety Check (P0, Simple, <1h)

Location: cmd/bd/init.go after line 150

Add After Import Attempt:

// Safety check: verify import succeeded
stats, err := store.GetStatistics(ctx)
if err == nil && stats.TotalIssues == 0 {
    // DB empty after init - check if git has issues we failed to import
    recheck, _ := checkGitForIssues()
    if recheck > 0 {
        fmt.Fprintf(os.Stderr, "\n❌ ERROR: Database empty but git has %d issues!\n", recheck)
        fmt.Fprintf(os.Stderr, "Auto-import failed. Manual recovery:\n")
        fmt.Fprintf(os.Stderr, "  git show HEAD:%s | bd import -i /dev/stdin\n", jsonlPath)
        fmt.Fprintf(os.Stderr, "Or:\n")
        fmt.Fprintf(os.Stderr, "  bd import -i %s\n", jsonlPath)
        os.Exit(1)
    }
}

Impact: Prevents silent data loss by failing loudly with recovery instructions


Fix D: Daemon Startup Import (P1, Simple, <1h)

Location: cmd/bd/daemon.go after DB open (around line 914)

Add After Database Open:

// Check for empty DB with issues in git
ctx := context.Background()
stats, err := store.GetStatistics(ctx)
if err == nil && stats.TotalIssues == 0 {
    issueCount, jsonlPath := checkGitForIssues()
    if issueCount > 0 {
        log(fmt.Sprintf("Empty database but git has %d issues, importing...", issueCount))
        if err := importFromGit(ctx, dbPath, store, jsonlPath); err != nil {
            log(fmt.Sprintf("Warning: startup import failed: %v", err))
        } else {
            log(fmt.Sprintf("Successfully imported %d issues from git", issueCount))
        }
    }
}

Impact: Daemon auto-recovers from empty DB on startup

Medium Term (P1)

  1. Multiple database warning (bd-112)

    • Detect multiple .beads/ in workspace hierarchy
    • Warn user on startup
    • Prevent accidental database pollution
  2. Better error messages

    • bd init: "Warning: found beads.jsonl in git with N issues"
    • bd stats: "Warning: database empty but git has tracked JSONL"
    • Guide user to recovery path

Implementation Refinements (Critical)

Fix B Missing Helper Function: The oracle's Fix B pseudocode calls checkGitForJSONLFilename() which doesn't exist. Need to add:

// checkGitForJSONLFilename returns just the filename from git HEAD check
func checkGitForJSONLFilename() string {
    _, relPath := checkGitForIssues()
    if relPath == "" {
        return ""
    }
    return filepath.Base(relPath)
}

Alternative Simpler Approach for Fix B: Instead of making findJSONLPath() git-aware, ensure import immediately exports to local file:

// In cmd/bd/init.go after successful importFromGit (line 148):
if err := importFromGit(ctx, initDBPath, store, jsonlPath); err != nil {
    // ...error handling...
} else {
    // CRITICAL: Immediately export to local to prevent daemon race
    localPath := filepath.Join(".beads", filepath.Base(jsonlPath))
    if err := exportToJSONL(ctx, store, localPath); err != nil {
        fmt.Fprintf(os.Stderr, "Warning: failed to export after import: %v\n", err)
    }
    fmt.Fprintf(os.Stderr, "✓ Successfully imported %d issues from git.\n\n", issueCount)
}

Race Condition Warning: After rm -rf .beads/, there's a timing window:

  1. bd init runs, imports from git's beads.jsonl
  2. Import schedules auto-flush (5-second debounce)
  3. Daemon auto-starts before flush completes
  4. Daemon calls findJSONLPath() → no local file yet → creates wrong issues.jsonl

Solution: Import must immediately create local JSONL (no debounce) to win the race.

Revised Priority:

  • Fix A: P0 - Blocks everything, enables git detection
  • Fix C: P0 - Prevents silent failures, critical safety net
  • Fix B: P0 - Prevents wrong file creation (OR immediate export)
  • Fix D: P1 - Nice recovery but redundant if A+B+C work

Precedence Rules (All Fixes)

When checking git HEAD:

  1. First try .beads/beads.jsonl
  2. Then try .beads/issues.jsonl
  3. Ignore non-canonical names (archive.jsonl, backup.jsonl, etc.)

When multiple local JSONL files exist:

  • Use existing beads.FindJSONLPath() glob behavior (first match)
  • This preserves backward compatibility

Long Term (P2)

  1. Unified JSONL naming

    • Standardize on one filename (recommend beads.jsonl)
    • Migration path for old issues.jsonl
    • Update all code paths consistently
    • Optional: Store chosen JSONL filename in DB metadata
  2. Git-aware init PARTIALLY DONE

    • bd init should be git-aware EXISTS (commit 7f82708)
    • Detect tracked JSONL and import automatically BROKEN (wrong filename)
    • Make this the default happy path WILL BE FIXED by Fix A

Implementation Plan (Epic Structure)

Epic: Fix database reinitialization data loss bug

Child Issues (in dependency order):

  1. Fix A: checkGitForIssues() filename detection (P0, <1h)

    • Update autoimport.go:70-96 to try beads.jsonl then issues.jsonl
    • Test: verify detects both filenames in git
    • Blocks: Fix C (needs working detection)
  2. Fix B-Alt: Immediate export after import (P0, <1h)

    • In init.go after importFromGit(), immediately call exportToJSONL()
    • Prevents daemon race condition
    • Simpler than making findJSONLPath() git-aware
    • Test: verify local JSONL created with correct filename
  3. Fix C: Init safety check (P0, <1h)

    • Add post-init verification in init.go
    • Error and exit if DB empty but git has issues
    • Depends: Fix A (uses checkGitForIssues)
    • Test: verify fails loudly when import fails
  4. Fix D: Daemon startup import (P1, <1h)

    • Add empty-DB check on daemon startup
    • Auto-import if git has issues
    • Depends: Fix A (uses checkGitForIssues)
    • Test: verify daemon recovers from empty DB
  5. Integration tests (P0, 1-2h)

    • Test fresh clone scenario
    • Test rm -rf .beads/ scenario
    • Test daemon race condition (start daemon immediately after init)
    • Test both beads.jsonl and issues.jsonl in git

Estimated Total: 5-7 hours

  • bd-112: Warn when multiple beads databases detected (filed in ~/src/beads)
  • GH #142: External_ref import feature (not directly related but shows import complexity)
  • Commit d1d3fcd: Renamed issues.jsonl → beads.jsonl
  • Commit 7f82708: "Fix bd init to auto-import issues from git on fresh clone"

Test Cases Needed

  1. Fresh Clone Scenario

    git clone repo
    cd repo
    bd init
    # Should auto-import from .beads/beads.jsonl
    # Should create local .beads/beads.jsonl immediately
    bd stats --json | jq '.total_issues'  # Should match git count
    
  2. Database Removal Scenario (Primary Bug)

    rm -rf .beads/
    bd init
    # Should detect git-tracked JSONL and import
    bd stats --json | jq '.total_issues'  # Should be >0, not 0
    ls .beads/*.jsonl  # Should be beads.jsonl, NOT issues.jsonl
    
  3. Race Condition Scenario (Daemon Startup)

    rm -rf .beads/
    bd init &  # Start init in background
    sleep 0.1
    bd ready   # Triggers daemon auto-start
    wait
    # Daemon should NOT create issues.jsonl
    # Should use beads.jsonl from git
    ls .beads/*.jsonl
    
  4. Legacy Filename Support (issues.jsonl)

    # Git has .beads/issues.jsonl (not beads.jsonl)
    rm -rf .beads/
    bd init
    # Should still import correctly
    ls .beads/*.jsonl  # Should be issues.jsonl (matches git)
    
  5. Multiple Workspace Scenario

    # Two separate clones
    ~/src/beads/        # database 1
    ~/src/fred/beads/   # database 2
    # Each should maintain separate state correctly
    # Each should use correct JSONL filename from its own git
    
  6. Daemon Restart Scenario

    bd daemon --stop
    rm .beads/bd.db
    bd daemon  # auto-start
    # Should import from git on startup
    bd stats --json | jq '.total_issues'  # Should be >0
    
  7. Init Safety Check Scenario

    # Simulate import failure
    rm -rf .beads/
    chmod 000 .beads  # Prevent creation
    bd init 2>&1 | grep ERROR
    # Should fail with clear error, not silent success
    

Root Cause Analysis - CONFIRMED

Primary Bug: Hardcoded Filename in checkGitForIssues()

File: cmd/bd/autoimport.go:76 Problem: Hardcoded to "issues.jsonl" but git tracks "beads.jsonl"

// Line 76 - HARDCODED FILENAME
relPath, err := filepath.Rel(gitRoot, filepath.Join(beadsDir, "issues.jsonl"))

Secondary Bug: Daemon Creates Wrong JSONL File

File: cmd/bd/main.go:findJSONLPath(), beads.go:FindJSONLPath() Problem: When no local JSONL exists, defaults to "issues.jsonl" without checking git HEAD

Code Flow:

  1. FindJSONLPath() globs for *.jsonl in .beads/ (line 137)
  2. If none found, defaults to "issues.jsonl" (line 144)
  3. Daemon exports to empty issues.jsonl, ignoring beads.jsonl in git

Why Auto-Import Failed

  1. bd init called checkGitForIssues() → looked for HEAD:.beads/issues.jsonl
  2. Git only has HEAD:.beads/beads.jsonl → check returned 0 issues
  3. No import triggered, DB stayed empty
  4. Daemon started, called findJSONLPath() → found no local JSONL
  5. Defaulted to issues.jsonl, exported 0 issues to empty file
  6. Silent data loss complete

Questions for Investigation

  1. Why did auto-import not trigger after bd init?
    • ANSWERED: checkGitForIssues() hardcoded to issues.jsonl, git has beads.jsonl
  2. Is there auto-import code that's not being called?
    • ANSWERED: Auto-import code ran but found 0 issues due to wrong filename
  3. When should daemon vs CLI handle import?
    • ANSWERED: Both should handle; daemon on startup if DB empty + git has JSONL
  4. Should we enforce single JSONL filename across codebase?
    • ANSWERED: Support both with precedence: beads.jsonl > issues.jsonl
  5. How do we prevent this silent data loss in future?
    • ANSWERED: See proposed fixes below

Severity Justification: P0

This is a critical data loss bug:

  • Silent failure (no error, no warning)
  • Complete data loss (0 issues after 202)
  • Core workflow broken (init + auto-import)
  • Multi-workspace scenarios broken
  • User cannot recover without manual intervention
  • Breaks trust in beads reliability

Recommendation: Investigate and fix immediately before 1.0 release.