Files

Steve Yegge f76f150997 Add import bug analysis for bd-3xq

Amp-Thread-ID: https://ampcode.com/threads/T-14d3a206-aeac-4499-8ae9-47f3715e18fa
Co-authored-by: Amp <amp@ampcode.com>

2025-11-04 12:25:53 -08:00

16 KiB

Raw Blame History

bd-3xq: Import Failure on Missing Parent Issues - Deep Analysis

Issue ID: bd-3xq Analysis Date: 2025-11-04 Status: P0 Bug

Executive Summary

The beads import process fails atomically when the JSONL file references deleted parent issues, blocking all imports. This is caused by overly strict parent validation in two critical code paths. The root issue is a design tension between referential integrity and operational flexibility.

Key Finding: The current implementation prioritizes database integrity over forward-compatibility, making normal operations like bd-delete potentially destructive to future imports.

Problem Deep Dive

The Failure Scenario

User deletes old/obsolete issues via bd-delete for hygiene ✓ (valid operation)
Issues remain in git history but are removed from database ✓ (expected)
JSONL file in git contains child issues whose parents were deleted ✗ (orphaned references)
Auto-import fails completely: parent issue bd-1f4086c5 does not exist ✗
Database becomes stuck - 296 issues in DB, newer data in JSONL cannot sync ✗

Technical Root Cause

Parent validation occurs in two critical locations:

1. `internal/storage/sqlite/ids.go:189-202` - In `EnsureIDs()`

// For hierarchical IDs (bd-a3f8e9.1), validate parent exists
if strings.Contains(issues[i].ID, ".") {
    // Extract parent ID (everything before the last dot)
    lastDot := strings.LastIndex(issues[i].ID, ".")
    parentID := issues[i].ID[:lastDot]

    var parentCount int
    err := conn.QueryRowContext(ctx, `SELECT COUNT(*) FROM issues WHERE id = ?`, parentID).Scan(&parentCount)
    if err != nil {
        return fmt.Errorf("failed to check parent existence: %w", err)
    }
    if parentCount == 0 {
        return fmt.Errorf("parent issue %s does not exist", parentID)  // ⚠️ BLOCKS ENTIRE IMPORT
    }
}

2. `internal/storage/sqlite/sqlite.go:182-196` - In `CreateIssue()`

// For hierarchical IDs (bd-a3f8e9.1), validate parent exists
if strings.Contains(issue.ID, ".") {
    // Extract parent ID (everything before the last dot)
    lastDot := strings.LastIndex(issue.ID, ".")
    parentID := issue.ID[:lastDot]

    var parentCount int
    err = conn.QueryRowContext(ctx, `SELECT COUNT(*) FROM issues WHERE id = ?`, parentID).Scan(&parentCount)
    if err != nil {
        return fmt.Errorf("failed to check parent existence: %w", err)
    }
    if parentCount == 0 {
        return fmt.Errorf("parent issue %s does not exist", parentID)  // ⚠️ BLOCKS CREATION
    }
}

Analysis: Both functions perform identical validation, creating a redundant but reinforced barrier. This is defensive programming taken too far - it prevents valid evolution scenarios.

Critical Insight: The Import Ordering Bug

Hidden Problem in `importer.go:534-546`

The upsertIssues() function has a latent bug that compounds the parent validation issue:

// Batch create all new issues
if len(newIssues) > 0 {
    if err := sqliteStore.CreateIssues(ctx, newIssues, "import"); err != nil {
        return fmt.Errorf("error creating issues: %w", err)
    }
    result.Created += len(newIssues)
}

The Problem: newIssues is not sorted by hierarchy depth before batch creation!

If the import includes:

bd-abc123 (parent)
bd-abc123.1 (child)

And they happen to be ordered [child, parent] in the slice, the import will fail even though both parent and child are present in the batch.

Why This Matters: Even if we relax parent validation to allow missing parents, we still need proper topological sorting to handle parent-child pairs in the same batch.

Design Analysis: Three Competing Forces

1. Referential Integrity (Current Priority)

Goal: Prevent orphaned children in the database
Benefit: Clean, consistent data structure
Cost: Blocks valid operations, makes deletion risky

2. Operational Flexibility (Sacrificed)

Goal: Allow normal database maintenance (deletions, pruning)
Benefit: Database hygiene, reduced clutter
Cost: Currently incompatible with strict integrity

3. Multi-Repo Sync (Broken)

Goal: Share issues across clones with different histories
Benefit: Collaboration, distributed workflow
Cost: Different deletion states across clones break imports

Current State: Force 1 wins at the expense of Forces 2 and 3.

Solution Space Analysis

Option 1: Strict Validation with Import-Time Parent Creation ⭐

Approach: Keep strict validation but auto-resurrect deleted parents during import.

How It Works:

When importing child with missing parent, check git history
If parent found in JSONL history, resurrect it as a tombstone
Tombstone: status=deleted, minimal metadata, preserved for structure
Child import succeeds with valid parent reference

Pros:

✅ Maintains referential integrity
✅ Allows forward-rolling imports
✅ Preserves dependency tree structure
✅ Minimal code changes

Cons:

⚠️ Database accumulates tombstones (but they're marked deleted)
⚠️ Requires git history access (already available)
⚠️ Slight complexity increase

Code Changes Required:

Modify EnsureIDs() and CreateIssue() to accept a "resurrect" mode
Add TryResurrectParent(ctx, parentID) function
Parse JSONL history to find deleted parent
Create parent with status="deleted" and flag is_tombstone=true

Risk Level: Low - Backwards compatible, preserves semantics

Option 2: Relaxed Validation - Skip Orphans

Approach: Log warning and skip orphaned children during import.

How It Works:

Remove if parentCount == 0 error return
Replace with: log.Warnf("Skipping orphaned issue %s (parent %s not found)", childID, parentID)
Continue import with other issues
Report skipped issues at end

Pros:

✅ Simplest implementation
✅ Unblocks imports immediately
✅ No data corruption

Cons:

❌ Silently loses data (orphaned issues)
❌ Hard to notice what was skipped
❌ Breaks user expectations (import should import everything)

Risk Level: Medium - Data loss risk

Option 3: Relaxed Validation - Allow Orphans

Approach: Import orphaned children without parent validation.

How It Works:

Remove parent existence check entirely
Allow bd-abc123.1 to exist without bd-abc123
UI/CLI queries handle missing parents gracefully

Pros:

✅ Maximum flexibility
✅ Simple code change
✅ Unblocks all scenarios

Cons:

❌ Breaks dependency tree integrity
❌ UI/CLI must handle orphans everywhere
❌ Hierarchical ID semantics become meaningless
❌ Risk of cascading failures in tree operations

Risk Level: High - Semantic corruption

Option 4: Convert Hierarchical to Top-Level

Approach: When parent missing, flatten child ID to top-level.

How It Works:

Detect orphaned child: bd-abc123.1
Convert to top-level: bd-abc123-1 (dot → dash)
Import as independent issue
Log transformation

Pros:

✅ Preserves all issues
✅ Maintains uniqueness
✅ No data loss

Cons:

❌ Changes issue IDs (breaks references)
❌ Loses hierarchical relationship
❌ Confusing for users

Risk Level: Medium - ID stability risk

Option 5: Two-Pass Import with Topological Sort ⭐⭐

Approach: Sort issues by hierarchy depth before batch creation.

How It Works:

Pre-process phase: Separate issues into depth buckets
- Depth 0: bd-abc123 (no dots)
- Depth 1: bd-abc123.1 (one dot)
- Depth 2: bd-abc123.1.2 (two dots)
Import phase: Create in depth order (0 → 1 → 2)
Parent resolution: For missing parents, try:
- Option A: Resurrect from JSONL (Option 1)
- Option B: Skip with warning (Option 2)
- Option C: Create placeholder parent

Pros:

✅ Fixes latent import ordering bug
✅ Handles parent-child pairs in same batch
✅ Can combine with other options (1, 2, or 3)
✅ More robust import pipeline

Cons:

⚠️ Requires refactoring upsertIssues()
⚠️ Slight performance overhead (sorting)

Code Changes Required:

// In upsertIssues() before batch creation:

// Sort newIssues by hierarchy depth to ensure parents are created first
sort.Slice(newIssues, func(i, j int) bool {
    depthI := strings.Count(newIssues[i].ID, ".")
    depthJ := strings.Count(newIssues[j].ID, ".")
    if depthI != depthJ {
        return depthI < depthJ  // Shallower first
    }
    return newIssues[i].ID < newIssues[j].ID  // Stable sort
})

// Then batch create by depth level
for depth := 0; depth <= 3; depth++ {  // Max depth 3
    var batchForDepth []*types.Issue
    for _, issue := range newIssues {
        if strings.Count(issue.ID, ".") == depth {
            batchForDepth = append(batchForDepth, issue)
        }
    }
    if len(batchForDepth) > 0 {
        if err := sqliteStore.CreateIssues(ctx, batchForDepth, "import"); err != nil {
            return fmt.Errorf("error creating depth-%d issues: %w", depth, err)
        }
        result.Created += len(batchForDepth)
    }
}

Risk Level: Low - Fixes existing bug, improves robustness

Edge Cases to Consider

1. Parent Deleted in Multiple Levels

Scenario: bd-abc.1.2 exists but both bd-abc and bd-abc.1 are deleted.

Resolution: Recursive resurrection - resurrect entire chain.

2. Parent Never Existed in JSONL

Scenario: JSONL corruption or manual ID manipulation.

Resolution:

If resurrect mode: Skip with error (can't resurrect what doesn't exist)
If skip mode: Skip orphan
If allow mode: Import anyway (dangerous)

3. Concurrent Import from Different Clones

Scenario: Two clones import same JSONL with missing parents simultaneously.

Resolution: Resurrection is idempotent - second clone sees parent already exists (created by first clone). No conflict.

4. Parent Deleted After Child Import

Scenario: Import creates bd-abc.1, then user deletes bd-abc.

Resolution: Foreign key constraint prevents deletion (if enabled). If disabled, creates orphan in DB.

Recommendation: Add ON DELETE CASCADE or ON DELETE RESTRICT to child_counters table.

Schema Considerations

Current Schema (`schema.go`)

CREATE TABLE IF NOT EXISTS child_counters (
    parent_id TEXT PRIMARY KEY,
    next_counter INTEGER NOT NULL DEFAULT 1,
    FOREIGN KEY(parent_id) REFERENCES issues(id)
);

Issue: No ON DELETE clause - undefined behavior when parent deleted.

Recommended Schema Change

CREATE TABLE IF NOT EXISTS child_counters (
    parent_id TEXT PRIMARY KEY,
    next_counter INTEGER NOT NULL DEFAULT 1,
    FOREIGN KEY(parent_id) REFERENCES issues(id) ON DELETE CASCADE
);

Reason: When parent deleted, child counter should also be deleted. If parent is resurrected, counter gets recreated from scratch.

Performance Impact Analysis

Current Import (Broken)

Time: O(n) where n = number of issues
Fails on first orphan

Two-Pass Import (Option 5)

Sorting: O(n log n)
Depth-based batching: O(n × d) where d = max depth (3)
Total: O(n log n) - negligible for typical datasets (<10k issues)

Parent Resurrection (Option 1)

JSONL parse: Already done
Parent lookup: O(1) hash map lookup
Resurrection: O(1) single insert
Total: O(1) per orphan - minimal overhead

Conclusion: Performance impact is negligible (<5% overhead for typical imports).

Testing Strategy

Unit Tests Required

Test Import Ordering
- Import [child, parent] - should succeed
- Import [parent.1.2, parent, parent.1] - should sort correctly
Test Parent Resurrection
- Import child with deleted parent - should resurrect
- Import child with never-existed parent - should fail gracefully
Test Config Modes
- Test strict, resurrect, skip, allow modes
- Verify error messages and logging
Test Edge Cases
- Multi-level deletion (bd-abc.1.2 with bd-abc and bd-abc.1 deleted)
- Concurrent imports with same orphans
- JSONL corruption scenarios

Integration Tests Required

Multi-Repo Sync
- Clone A deletes issue
- Clone B imports Clone A's JSONL
- Verify: Clone B handles deletion gracefully
Round-Trip Fidelity
- Export → Delete parent → Import → Verify structure

Code Files Affected

Must Modify

internal/importer/importer.go:534-546 - Add topological sort
internal/storage/sqlite/ids.go:189-202 - Add resurrection option
internal/storage/sqlite/sqlite.go:182-196 - Add resurrection option

Should Modify

internal/storage/sqlite/schema.go:35-49 - Add ON DELETE CASCADE
internal/types/types.go - Add IsTombstone bool field (optional)

New Files Needed

internal/storage/sqlite/resurrection.go - Parent resurrection logic
internal/importer/sort.go - Topological sort utilities

Migration Path

For Existing Databases

Problem: Databases might already have orphaned children (if foreign keys were disabled during development).

Solution: Add migration to detect and fix orphans:

-- Find orphaned children
SELECT id FROM issues
WHERE id LIKE '%.%'
AND substr(id, 1, instr(id || '.', '.') - 1) NOT IN (SELECT id FROM issues);

-- Option A: Delete orphans
DELETE FROM issues WHERE id IN (...);

-- Option B: Convert to top-level
UPDATE issues SET id = replace(id, '.', '-') WHERE id IN (...);

Recommendation: Run detection query, log results, let user decide action.

Conclusion

bd-3xq reveals a fundamental design flaw: The system prioritizes database integrity over operational flexibility, making normal operations (deletion) risky for future imports.

The hybrid solution (Options 1 + 5) is strongly recommended because it:

Fixes the latent import ordering bug that affects everyone
Enables graceful handling of deleted parents
Maintains referential integrity through resurrection
Provides configuration options for different use cases
Enables multi-repo workflows (bd-4ms)
Has minimal performance impact
Is backwards compatible

Estimated Implementation Time:

Phase 1 (sorting): 4-6 hours
Phase 2 (resurrection): 6-8 hours
Phase 3 (config): 2-3 hours
Testing: 8-10 hours
Total: 2-3 days for complete solution

Priority: P0 - Blocks multi-repo work (bd-4ms) and makes bd-delete risky

References

bd-3xq: This issue
bd-4ms: Multi-repo support (blocked by this issue)
bd-a101: Separate branch workflow (blocked by this issue)
bd-8e05: Hash-based ID migration (related context)
bd-95: Content hash computation (resurrection uses this)

Analysis completed: 2025-11-04 Analyzed by: Claude (Sonnet 4.5)

16 KiB Raw Blame History Unescape Escape

bd-3xq: Import Failure on Missing Parent Issues - Deep Analysis

Executive Summary

Problem Deep Dive

The Failure Scenario

Technical Root Cause

1. internal/storage/sqlite/ids.go:189-202 - In EnsureIDs()

2. internal/storage/sqlite/sqlite.go:182-196 - In CreateIssue()

Critical Insight: The Import Ordering Bug

Hidden Problem in importer.go:534-546

Design Analysis: Three Competing Forces

1. Referential Integrity (Current Priority)

2. Operational Flexibility (Sacrificed)

3. Multi-Repo Sync (Broken)

Solution Space Analysis

Option 1: Strict Validation with Import-Time Parent Creation ⭐

Option 2: Relaxed Validation - Skip Orphans

Option 3: Relaxed Validation - Allow Orphans

Option 4: Convert Hierarchical to Top-Level

Option 5: Two-Pass Import with Topological Sort ⭐⭐

Recommended Solution: Hybrid Approach 🎯

Implementation Plan

Phase 1: Fix Import Ordering (Option 5)

Phase 2: Add Parent Resurrection (Option 1)

Phase 3: Make Configurable

Benefits of Hybrid Approach

Edge Cases to Consider

1. Parent Deleted in Multiple Levels

2. Parent Never Existed in JSONL

3. Concurrent Import from Different Clones

4. Parent Deleted After Child Import

Schema Considerations

Current Schema (schema.go)

Recommended Schema Change

Performance Impact Analysis

Current Import (Broken)

Two-Pass Import (Option 5)

Parent Resurrection (Option 1)

Testing Strategy

Unit Tests Required

Integration Tests Required

Code Files Affected

Must Modify

Should Modify

New Files Needed

Migration Path

For Existing Databases

Conclusion

References

16 KiB

Raw Blame History

1. `internal/storage/sqlite/ids.go:189-202` - In `EnsureIDs()`

2. `internal/storage/sqlite/sqlite.go:182-196` - In `CreateIssue()`

Hidden Problem in `importer.go:534-546`

Current Schema (`schema.go`)