Steve Yegge ab809c5baf Create issue structure for bd-222 and bd-224

- Added design documents (ULTRATHINK_BD222.md, ULTRATHINK_BD224.md)
- Created bd-224 epic with 6 child issues for status/closed_at invariant fix
- Created bd-222 epic with 7 child issues for batching API
- Set up dependencies: bd-224 blocks bd-222 (must fix invariant first)
- Dependencies enable max parallelism while ensuring correct order

2025-10-15 14:27:10 -07:00

Ultrathink: Batching API for Bulk Issue Creation (bd-222)

Date: 2025-10-15
Context: Individual devs, small teams, future agent swarms, bulk imports
Problem: CreateIssue acquires dedicated connection per call, inefficient for bulk operations

Executive Summary

Recommended Solution: Hybrid approach - Add CreateIssues + Keep existing CreateIssue unchanged

Provides high-performance batch path for bulk operations while maintaining simple single-issue API for typical use.

Dependencies & Implementation Order

Critical Dependency: bd-224 (status/closed_at invariant)

bd-224 MUST be implemented before bd-222

Why: Both issues modify the same code paths:

bd-224: Fixes import.go to enforce closed_at invariant (status='closed' ⟺ closed_at IS NOT NULL)
bd-222: Changes import.go to use CreateIssues instead of CreateIssue loop

The Problem: If we implement bd-222 first:

CreateIssues won't enforce the closed_at invariant (inherits bug from CreateIssue)
Import switches to use CreateIssues
Import can still create inconsistent data (bd-224's bug persists)
Later bd-224 fix requires modifying BOTH CreateIssue AND CreateIssues

The Solution: If we implement bd-224 first:

Add CHECK constraint: (status = 'closed') = (closed_at IS NOT NULL)
Fix UpdateIssue to manage closed_at automatically
Fix import.go to enforce invariant before calling CreateIssue
Then implement bd-222's CreateIssues with invariant already enforced:
- Database constraint rejects bad data
- Issue.Validate() checks the invariant (per bd-224)
- Import code already normalizes before calling CreateIssues
- No new code needed in CreateIssues - it's correct by construction!

Implementation Impact

CreateIssues checks the closed_at invariant (from bd-224) by calling Issue.Validate() on every issue, so it needs no invariant-specific code of its own:

// Phase 1: Validation
for i, issue := range issues {
    if err := issue.Validate(); err != nil {  // ← Validates invariant (bd-224)
        return fmt.Errorf("validation failed for issue %d: %w", i, err)
    }
}

After bd-224 is complete, Issue.Validate() will check:

func (i *Issue) Validate() error {
    // ... existing validation ...

    // Enforce closed_at invariant (bd-224)
    if i.Status == StatusClosed && i.ClosedAt == nil {
        return fmt.Errorf("closed issues must have closed_at timestamp")
    }
    if i.Status != StatusClosed && i.ClosedAt != nil {
        return fmt.Errorf("non-closed issues cannot have closed_at timestamp")
    }

    return nil
}

This means CreateIssues automatically enforces the invariant through validation, with the database CHECK constraint as final defense.

Import Code Simplification

Before bd-224 (current import.go):

for _, issue := range issues {
    // Complex logic to handle status/closed_at independently
    updates := make(map[string]interface{})
    if _, ok := rawData["status"]; ok {
        updates["status"] = issue.Status  // ← Doesn't manage closed_at
    }
    // ... more complex update logic
    store.CreateIssue(ctx, issue, "import")
}

After bd-224 (import.go enforces invariant):

for _, issue := range issues {
    // Normalize closed_at based on status BEFORE creating
    if issue.Status == types.StatusClosed {
        if issue.ClosedAt == nil {
            now := time.Now()
            issue.ClosedAt = &now
        }
    } else {
        issue.ClosedAt = nil  // ← Clear if not closed
    }
    store.CreateIssue(ctx, issue, "import")
}

After bd-222 (import.go uses batch):

// Normalize all issues
for _, issue := range issues {
    if issue.Status == types.StatusClosed {
        if issue.ClosedAt == nil {
            now := time.Now()
            issue.ClosedAt = &now
        }
    } else {
        issue.ClosedAt = nil
    }
}

// Single batch call (5-15x faster!)
store.CreateIssues(ctx, issues, "import")

Much simpler: normalize once, call batch API, database constraint enforces correctness.
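The normalization step that appears in both loops above could be factored into one helper. The following is a minimal sketch rather than the project's code: `Issue`, `Status`, and `normalizeClosedAt` are simplified, hypothetical stand-ins for the real `types` package, carrying only the fields needed here.

```go
package main

import (
    "fmt"
    "time"
)

// Status and Issue are simplified stand-ins for the real types (assumption).
type Status string

const (
    StatusOpen   Status = "open"
    StatusClosed Status = "closed"
)

type Issue struct {
    ID       string
    Status   Status
    ClosedAt *time.Time
}

// normalizeClosedAt (hypothetical helper) makes status and closed_at agree:
// a closed issue gets a timestamp if it has none, any other issue has it cleared.
func normalizeClosedAt(issue *Issue) {
    if issue.Status == StatusClosed {
        if issue.ClosedAt == nil {
            now := time.Now()
            issue.ClosedAt = &now
        }
        return
    }
    issue.ClosedAt = nil
}

func main() {
    stale := time.Now().Add(-time.Hour)
    issues := []*Issue{
        {ID: "bd-1", Status: StatusClosed},                 // closed, missing closed_at
        {ID: "bd-2", Status: StatusOpen, ClosedAt: &stale}, // open, stray closed_at
    }
    for _, issue := range issues {
        normalizeClosedAt(issue)
    }
    fmt.Println(issues[0].ClosedAt != nil, issues[1].ClosedAt == nil)
}
```

An existing closed_at on a closed issue is left alone, so imported timestamps survive normalization.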

Recommended Implementation Sequence

✅ Implement bd-224 first (P1 bug fix)
- Add database CHECK constraint
- Add validation to Issue.Validate()
- Fix UpdateIssue to auto-manage closed_at
- Fix import.go to normalize closed_at before creating
✅ Then implement bd-222 (P2 performance enhancement)
- Add CreateIssues method (inherits bd-224's validation)
- Update import.go to use CreateIssues
- Import code is simpler (no per-issue loop, just normalize + batch)
✅ Benefits of this order:
- bd-224 fixes data integrity bug (higher priority)
- bd-222 builds on correct foundation
- No duplicate invariant enforcement code
- Database constraint + validation = defense in depth
- CreateIssues is correct by construction

Current State Analysis

How CreateIssue Works (sqlite.go:315-453)

func (s *SQLiteStorage) CreateIssue(ctx, issue, actor) error {
    // 1. Acquire dedicated connection
    conn, err := s.db.Conn(ctx)
    defer conn.Close()

    // 2. BEGIN IMMEDIATE transaction (acquires write lock)
    conn.ExecContext(ctx, "BEGIN IMMEDIATE")

    // 3. Generate ID atomically if needed
    //    - Query issue_counters
    //    - Update counter with MAX(existing, calculated) + 1

    // 4. Insert issue
    // 5. Record creation event
    // 6. Mark dirty for export
    // 7. COMMIT
}

Performance Characteristics

Single Issue Creation:

Connection acquisition: ~1ms
BEGIN IMMEDIATE: ~1-5ms (lock acquisition)
ID generation: ~2-3ms (subquery + update)
Insert + event + dirty: ~2-3ms
COMMIT: ~1-2ms
Total: ~7-14ms per issue

Bulk Creation (100 issues, sequential):

100 connections: ~100ms
100 transactions: ~100-500ms (lock contention!)
100 ID generations: ~200-300ms
100 inserts: ~200-300ms
Total: ~600ms-1.2s

With Batching (estimated):

1 connection: ~1ms
1 transaction: ~1-5ms
ID generation batch: ~10-20ms (one query for range)
Bulk insert: ~50-100ms (prepared stmt, multiple VALUES)
Total: ~60-130ms (5-10x faster)

When Does This Matter?

Low Impact (current approach is fine):

Interactive CLI use: bd create "Fix bug"
Individual agent creating 1-5 issues
Typical development workflow

High Impact (batching helps):

✅ Bulk import from JSONL (10-1000+ issues)
✅ Agent workflows generating issue decompositions (10-50 issues)
✅ Migrating from other systems (100-10000+ issues)
✅ Template instantiation (creating epic + subtasks)
✅ Test data generation

Solution Options

Option A: Simple All-or-Nothing Batch ⭐ RECOMMENDED

// CreateIssues creates multiple issues atomically in a single transaction
func (s *SQLiteStorage) CreateIssues(ctx context.Context, issues []*types.Issue, actor string) error

Semantics:

All issues created, or none created (atomicity)
Single transaction, single connection
Returns error if ANY issue fails validation or insertion
IDs generated atomically as a range

Pros:

✅ Simple mental model (atomic batch)
✅ Clear error handling (one error = whole batch fails)
✅ Matches database transaction semantics
✅ Easy to implement (similar to CreateIssue)
✅ No partial state in database
✅ Safe for concurrent access (IMMEDIATE transaction)
✅ 5-10x faster for bulk operations

Cons:

⚠️ If one issue is invalid, whole batch fails
⚠️ Caller must retry entire batch on error
⚠️ The returned error identifies only the first failing issue, not every invalid issue in the batch

Mitigation: Add validation-only mode to pre-check batch

Verdict: Best for most use cases (import, migrations, agent workflows)
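To make the failing issue easy to locate without giving up atomicity, the batch could return a typed error that carries the position of the first failure. This is a sketch under stated assumptions: `BatchError`, `validateAll`, the simplified `Issue`, and its placeholder validation rule are all hypothetical and not part of the existing storage package.

```go
package main

import (
    "errors"
    "fmt"
)

// Issue is a simplified stand-in for types.Issue (assumption).
type Issue struct {
    ID    string
    Title string
}

// Validate uses a placeholder rule purely for illustration.
func (i *Issue) Validate() error {
    if i.Title == "" {
        return errors.New("title is required")
    }
    return nil
}

// BatchError (hypothetical) records which issue caused the batch to fail.
type BatchError struct {
    Index int
    ID    string
    Err   error
}

func (e *BatchError) Error() string {
    return fmt.Sprintf("issue %d (id=%q): %v", e.Index, e.ID, e.Err)
}

// Unwrap lets callers reach the underlying cause with errors.Is / errors.As.
func (e *BatchError) Unwrap() error { return e.Err }

// validateAll stops at the first invalid issue and reports its index.
func validateAll(issues []*Issue) error {
    for i, issue := range issues {
        if err := issue.Validate(); err != nil {
            return &BatchError{Index: i, ID: issue.ID, Err: err}
        }
    }
    return nil
}

func main() {
    err := validateAll([]*Issue{{ID: "bd-1", Title: "ok"}, {ID: "bd-2"}})
    var be *BatchError
    if errors.As(err, &be) {
        fmt.Printf("batch rejected at index %d: %v\n", be.Index, be.Err)
    }
}
```

The API signature stays `error`, so callers that do not care about the index are unaffected.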

Option B: Partial Success with Error Details

type CreateResult struct {
    ID      string
    Error   error
}

func (s *SQLiteStorage) CreateIssues(ctx context.Context, issues []*types.Issue, actor string) ([]CreateResult, error)

Semantics:

Best-effort creation
Returns results for each issue (ID or error)
Transaction commits even if some issues fail
Complex rollback semantics

Pros:

✅ Caller knows exactly which issues failed
✅ Partial progress on errors
✅ Good for unreliable input data

Cons:

❌ Complex transaction semantics: Which failures abort transaction?
❌ Partial state in database: Caller must track what succeeded
❌ ID generation complexity: Skip failed issues in counter?
❌ Dirty tracking complexity: Which issues to mark dirty?
❌ Event recording: Record events for succeeded issues only?
❌ More complex API for common case
❌ Caller must handle partial state

Verdict: Too complex, doesn't match database atomicity model

Option C: Batch with Configurable Strategy

type BatchOptions struct {
    FailFast        bool  // Stop on first error (default)
    ContinueOnError bool  // Best effort
    ValidateOnly    bool  // Dry run
}

func (s *SQLiteStorage) CreateIssues(ctx, issues, actor, opts) ([]CreateResult, error)

Pros:

✅ Flexible for different use cases
✅ Can support both atomic and partial modes

Cons:

❌ Too much complexity for the benefit
❌ Multiple code paths = more bugs
❌ Unclear which mode to use when
❌ Adds nothing for the core problem (connection overhead) beyond what Option A already provides

Verdict: Over-engineered for current needs

Option D: Internal Optimization Only (No API Change)

Optimize CreateIssue internally to batch operations without changing API.

Approach:

Connection pooling improvements
Prepared statement caching
WAL optimization

Pros:

✅ No API changes
✅ Benefits all callers automatically

Cons:

❌ Can't eliminate transaction overhead (still N transactions)
❌ Can't eliminate ID generation overhead (still N counter updates)
❌ Limited improvement (maybe 20-30% faster, not 5-10x)
❌ Doesn't address root cause

Verdict: Good to do anyway, but doesn't solve the problem

Recommended Solution: Simple All-or-Nothing Batch (Option A)

API Design

// CreateIssues creates multiple issues atomically in a single transaction.
// All issues are created or none are created. Returns error if any issue
// fails validation or insertion.
//
// Performance: an estimated 5-15x faster than calling CreateIssue in a loop (the gain grows with batch size).
// Use this for bulk imports, migrations, or agent workflows creating many issues.
//
// Issues with empty IDs will have IDs generated atomically. Issues with
// explicit IDs are used as-is (caller responsible for avoiding collisions).
func (s *SQLiteStorage) CreateIssues(ctx context.Context, issues []*types.Issue, actor string) error

Implementation Strategy

Phase 1: Validation

// Validate all issues first (fail-fast)
for i, issue := range issues {
    if err := issue.Validate(); err != nil {
        return fmt.Errorf("validation failed for issue %d: %w", i, err)
    }
}

Phase 2: Connection & Transaction

// Acquire dedicated connection (same as CreateIssue)
conn, err := s.db.Conn(ctx)
if err != nil {
    return fmt.Errorf("failed to acquire connection: %w", err)
}
defer conn.Close()

// BEGIN IMMEDIATE (same as CreateIssue)
if _, err := conn.ExecContext(ctx, "BEGIN IMMEDIATE"); err != nil {
    return fmt.Errorf("failed to begin immediate transaction: %w", err)
}

committed := false
defer func() {
    if !committed {
        conn.ExecContext(context.Background(), "ROLLBACK")
    }
}()

Phase 3: Batch ID Generation

Key Insight: Generate ID range atomically, then assign sequentially

// Count how many issues need IDs
needIDCount := 0
for _, issue := range issues {
    if issue.ID == "" {
        needIDCount++
    }
}

// Generate ID range atomically (if needed)
var nextID int
var prefix string
if needIDCount > 0 {
    // Get prefix from config
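    err := conn.QueryRowContext(ctx,
        `SELECT value FROM config WHERE key = ?`,
        "issue_prefix").Scan(&prefix)
    // Check real errors first: a failed Scan leaves prefix empty, so testing
    // prefix == "" before err would silently swallow the failure.
    if err != nil && !errors.Is(err, sql.ErrNoRows) {
        return fmt.Errorf("failed to get config: %w", err)
    }
    if prefix == "" {
        prefix = "bd" // default when the key is missing or empty
    }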

    // Atomically reserve needIDCount IDs with one counter update instead of N.
    // This is the KEY optimization. RETURNING yields the new last_id, so the
    // reserved range is [nextID-needIDCount+1, nextID].
    err = conn.QueryRowContext(ctx, `
        INSERT INTO issue_counters (prefix, last_id)
        SELECT ?, COALESCE(MAX(CAST(substr(id, LENGTH(?) + 2) AS INTEGER)), 0) + ?
        FROM issues
        WHERE id LIKE ? || '-%'
          AND substr(id, LENGTH(?) + 2) GLOB '[0-9]*'
        ON CONFLICT(prefix) DO UPDATE SET
            last_id = MAX(
                last_id,
                (SELECT COALESCE(MAX(CAST(substr(id, LENGTH(?) + 2) AS INTEGER)), 0)
                 FROM issues
                 WHERE id LIKE ? || '-%'
                   AND substr(id, LENGTH(?) + 2) GLOB '[0-9]*')
            ) + ?
        RETURNING last_id
    `, prefix, prefix, needIDCount, prefix, prefix, prefix, prefix, prefix, needIDCount).Scan(&nextID)
    if err != nil {
        return fmt.Errorf("failed to generate ID range: %w", err)
    }

    // Assign IDs sequentially
    currentID := nextID - needIDCount + 1
    for i := range issues {
        if issues[i].ID == "" {
            issues[i].ID = fmt.Sprintf("%s-%d", prefix, currentID)
            currentID++
        }
    }
}
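The range arithmetic in the assignment loop is easy to get off by one, so here is a database-free sketch of just that step. It assumes a simplified `Issue` type and a hypothetical `assignIDs` helper; `lastID` stands for the counter value returned after the reservation.

```go
package main

import "fmt"

// Issue is a simplified stand-in for types.Issue (assumption).
type Issue struct {
    ID    string
    Title string
}

// assignIDs (hypothetical helper) fills empty IDs from a reserved range.
// lastID is the counter value after reserving, so for `need` missing IDs the
// range is [lastID-need+1, lastID]. It returns how many IDs it assigned.
func assignIDs(issues []*Issue, prefix string, lastID int) int {
    need := 0
    for _, issue := range issues {
        if issue.ID == "" {
            need++
        }
    }
    next := lastID - need + 1
    for _, issue := range issues {
        if issue.ID == "" {
            issue.ID = fmt.Sprintf("%s-%d", prefix, next)
            next++
        }
    }
    return need
}

func main() {
    issues := []*Issue{
        {ID: "ext-123", Title: "External issue"},
        {Title: "New issue A"},
        {ID: "bd-999", Title: "Explicit ID"},
        {Title: "New issue B"},
    }
    // Suppose the counter was 41 and the batch reserved 2 IDs: last_id is now 43.
    assignIDs(issues, "bd", 43)
    for _, issue := range issues {
        fmt.Println(issue.ID)
    }
}
```

With a counter that moved from 41 to 43, the two new issues receive bd-42 and bd-43, while the explicit IDs are left untouched.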

Phase 4: Bulk Insert Issues

Two approaches:

Approach A: Prepared Statement + Loop (simpler, still fast)

stmt, err := conn.PrepareContext(ctx, `
    INSERT INTO issues (
        id, title, description, design, acceptance_criteria, notes,
        status, priority, issue_type, assignee, estimated_minutes,
        created_at, updated_at, closed_at, external_ref
    ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
`)
if err != nil {
    return fmt.Errorf("failed to prepare statement: %w", err)
}
defer stmt.Close()

now := time.Now()
for _, issue := range issues {
    issue.CreatedAt = now
    issue.UpdatedAt = now

    _, err = stmt.ExecContext(ctx,
        issue.ID, issue.Title, issue.Description, issue.Design,
        issue.AcceptanceCriteria, issue.Notes, issue.Status,
        issue.Priority, issue.IssueType, issue.Assignee,
        issue.EstimatedMinutes, issue.CreatedAt, issue.UpdatedAt,
        issue.ClosedAt, issue.ExternalRef,
    )
    if err != nil {
        return fmt.Errorf("failed to insert issue %s: %w", issue.ID, err)
    }
}

Approach B: Multi-VALUE INSERT (fastest, more complex)

// Build multi-value INSERT
// INSERT INTO issues VALUES (...), (...), (...)
// More complex string building but ~2x faster for large batches
// Defer to performance testing phase

Decision: Start with Approach A (prepared statement), optimize to Approach B if benchmarks show need
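For reference, the string building that Approach B would need is small. The sketch below is illustrative only: `buildMultiInsert` is a hypothetical helper, and it interpolates the table and column names directly, so those must be trusted constants and never user input.

```go
package main

import (
    "fmt"
    "strings"
)

// buildMultiInsert (hypothetical helper) returns a statement such as
// "INSERT INTO t (a, b) VALUES (?, ?), (?, ?)" for rows = 2.
// It returns "" when there is nothing to insert.
func buildMultiInsert(table string, columns []string, rows int) string {
    if rows <= 0 || len(columns) == 0 {
        return ""
    }
    // One "(?, ?, ...)" placeholder group per row.
    group := "(" + strings.TrimSuffix(strings.Repeat("?, ", len(columns)), ", ") + ")"
    groups := make([]string, rows)
    for i := range groups {
        groups[i] = group
    }
    return fmt.Sprintf("INSERT INTO %s (%s) VALUES %s",
        table, strings.Join(columns, ", "), strings.Join(groups, ", "))
}

func main() {
    fmt.Println(buildMultiInsert("issues", []string{"id", "title", "status"}, 2))
}
```

SQLite caps the number of bound parameters per statement (SQLITE_MAX_VARIABLE_NUMBER, whose default depends on the SQLite version), so with 15 columns per issue a real implementation would have to split large batches into several statements.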

Phase 5: Bulk Record Events

// Prepare event statement
eventStmt, err := conn.PrepareContext(ctx, `
    INSERT INTO events (issue_id, event_type, actor, new_value)
    VALUES (?, ?, ?, ?)
`)
if err != nil {
    return fmt.Errorf("failed to prepare event statement: %w", err)
}
defer eventStmt.Close()

for _, issue := range issues {
    eventData, err := json.Marshal(issue)
    if err != nil {
        eventData = []byte(fmt.Sprintf(`{"id":"%s","title":"%s"}`, issue.ID, issue.Title))
    }

    _, err = eventStmt.ExecContext(ctx, issue.ID, types.EventCreated, actor, string(eventData))
    if err != nil {
        return fmt.Errorf("failed to record event for %s: %w", issue.ID, err)
    }
}
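One caveat about the Sprintf fallback above: a title containing a quote or backslash would yield invalid JSON. A possible alternative, sketched here with a hypothetical `minimalEvent` type and `fallbackEventData` helper, is to marshal a reduced payload so that escaping is handled by encoding/json.

```go
package main

import (
    "encoding/json"
    "fmt"
)

// minimalEvent is a hypothetical reduced payload used only when marshalling
// the full issue fails.
type minimalEvent struct {
    ID    string `json:"id"`
    Title string `json:"title"`
}

// fallbackEventData returns a JSON object holding just the ID and title,
// with both values properly escaped.
func fallbackEventData(id, title string) string {
    data, err := json.Marshal(minimalEvent{ID: id, Title: title})
    if err != nil {
        // Marshalling two plain strings is not expected to fail; an empty
        // object is kept as a last resort so the event row stays valid JSON.
        return `{}`
    }
    return string(data)
}

func main() {
    fmt.Println(fallbackEventData("bd-7", `Fix "quoted" title`))
}
```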

Phase 6: Bulk Mark Dirty

// Bulk insert dirty markers
dirtyStmt, err := conn.PrepareContext(ctx, `
    INSERT INTO dirty_issues (issue_id, marked_at)
    VALUES (?, ?)
    ON CONFLICT (issue_id) DO UPDATE SET marked_at = excluded.marked_at
`)
if err != nil {
    return fmt.Errorf("failed to prepare dirty statement: %w", err)
}
defer dirtyStmt.Close()

dirtyTime := time.Now()
for _, issue := range issues {
    _, err = dirtyStmt.ExecContext(ctx, issue.ID, dirtyTime)
    if err != nil {
        return fmt.Errorf("failed to mark dirty %s: %w", issue.ID, err)
    }
}

Phase 7: Commit

if _, err := conn.ExecContext(ctx, "COMMIT"); err != nil {
    return fmt.Errorf("failed to commit transaction: %w", err)
}
committed = true
return nil

Design Decisions & Tradeoffs

Decision 1: All-or-Nothing Atomicity ✅

Rationale: Matches database transaction semantics, simpler mental model

Tradeoff: Batch fails if ANY issue is invalid

Mitigation: Pre-validate all issues before starting transaction
Alternative: Caller can retry with smaller batches or individual issues
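The all-or-nothing contract can be illustrated without a database. The sketch below uses a hypothetical in-memory `memStore` in place of SQLiteStorage, a simplified `Issue`, and a placeholder validation rule; it checks the whole batch before writing anything, which mirrors what a rolled-back transaction guarantees.

```go
package main

import (
    "errors"
    "fmt"
)

// Issue is a simplified stand-in for types.Issue (assumption).
type Issue struct {
    ID    string
    Title string
}

// Validate uses a placeholder rule purely for illustration.
func (i *Issue) Validate() error {
    if i.Title == "" {
        return errors.New("title is required")
    }
    return nil
}

// memStore is a hypothetical in-memory stand-in for the SQLite storage.
type memStore struct {
    issues map[string]*Issue
}

// CreateIssues checks the entire batch first and only then writes, so any
// failure leaves the store exactly as it was.
func (s *memStore) CreateIssues(batch []*Issue) error {
    seen := make(map[string]bool, len(batch))
    for i, issue := range batch {
        if err := issue.Validate(); err != nil {
            return fmt.Errorf("validation failed for issue %d: %w", i, err)
        }
        if _, exists := s.issues[issue.ID]; exists || seen[issue.ID] {
            return fmt.Errorf("duplicate ID %q at index %d", issue.ID, i)
        }
        seen[issue.ID] = true
    }
    for _, issue := range batch {
        s.issues[issue.ID] = issue
    }
    return nil
}

func main() {
    store := &memStore{issues: map[string]*Issue{}}
    err := store.CreateIssues([]*Issue{{ID: "bd-1", Title: "ok"}, {ID: "bd-2"}})
    fmt.Println(err != nil, len(store.issues))
}
```

The first issue in the example is valid, yet nothing is stored because the second one fails, which is the behavior a caller retrying with smaller batches would rely on.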

Decision 2: Same Transaction Semantics as CreateIssue ✅

Use BEGIN IMMEDIATE, not DEFERRED or EXCLUSIVE

Rationale:

Consistency with existing CreateIssue
Prevents race conditions in ID generation
Serializes batch operations (which is fine - they're rare)

Tradeoff: Batches serialize (only one concurrent batch writer)

Impact: Low - batch operations are rare (import, migration)
Benefit: Simple, correct, no race conditions

Decision 3: Atomic ID Range Reservation ✅

Generate range [nextID, nextID+N) in single counter update

Rationale: KEY optimization - avoids N counter updates

Implementation:

-- Old approach (CreateIssue): N updates
UPDATE issue_counters SET last_id = last_id + 1 RETURNING last_id;  -- N times

-- New approach (CreateIssues): 1 update
UPDATE issue_counters SET last_id = last_id + N RETURNING last_id;  -- Once

Correctness: Safe because BEGIN IMMEDIATE serializes batches

Decision 4: Support Mixed ID Assignment ✅

Some issues can have explicit IDs, others auto-generated

Use Case: Import with some external IDs, some new issues

issues := []*Issue{
    {ID: "ext-123", Title: "External issue"},  // Keep ID
    {ID: "", Title: "New issue"},               // Generate ID
    {ID: "bd-999", Title: "Explicit ID"},      // Keep ID
}

Rationale: Flexible for import scenarios

Complexity: Low - just count issues needing IDs

Decision 5: Prepared Statements Over Multi-VALUE INSERT ✅

Start with prepared statement loop, optimize later if needed

Rationale:

Simpler implementation
Still much faster than N transactions (5-10x)
Multi-VALUE INSERT only ~2x faster than prepared stmt
Can optimize later if profiling shows need

Decision 6: Keep CreateIssue Unchanged ✅

Don't modify existing CreateIssue implementation

Rationale:

Backward compatibility
No risk to existing callers
Additive change only
Different use cases (single vs batch)

When to Use Which API

Use CreateIssue (existing)

✅ Interactive CLI: bd create "Title"
✅ Single issue creation
✅ Agent creating 1-3 issues
✅ When simplicity matters
✅ When you want per-issue error handling

Use CreateIssues (new)

✅ Bulk import from JSONL (10-1000+ issues)
✅ Migration from other systems (100-10000+ issues)
✅ Agent decomposing work into 10-50 issues
✅ Template instantiation (epic + subtasks)
✅ Test data generation
✅ When performance matters

Rule of Thumb: Use CreateIssues for N > 5 issues

Implementation Checklist

Phase 1: Core Implementation ✅

Add CreateIssues to Storage interface (storage/storage.go)
Implement SQLiteStorage.CreateIssues (storage/sqlite/sqlite.go)
Add comprehensive unit tests
Add concurrency tests (multiple batch writers)
Add performance benchmarks

Phase 2: CLI Integration

Add bd create-batch command (or internal use only?)
Update import.go to use CreateIssues for bulk imports
Test with real JSONL imports

Phase 3: Documentation

Document CreateIssues API (godoc)
Add batch import example
Update EXTENDING.md with batch usage
Performance notes in README

Phase 4: Optimization (if needed)

Profile CreateIssues with 100, 1000, 10000 issues
Optimize to multi-VALUE INSERT if needed
Consider batch size limits (split large batches)

Testing Strategy

Unit Tests

func TestCreateIssues_Empty(t *testing.T)
func TestCreateIssues_Single(t *testing.T)
func TestCreateIssues_Multiple(t *testing.T)
func TestCreateIssues_WithExplicitIDs(t *testing.T)
func TestCreateIssues_MixedIDs(t *testing.T)
func TestCreateIssues_ValidationError(t *testing.T)
func TestCreateIssues_DuplicateID(t *testing.T)
func TestCreateIssues_RollbackOnError(t *testing.T)

Concurrency Tests

func TestCreateIssues_Concurrent(t *testing.T) {
    // 10 goroutines each creating 100 issues
    // Verify no ID collisions
    // Verify all issues created
}

func TestCreateIssues_MixedWithCreateIssue(t *testing.T) {
    // Concurrent CreateIssue + CreateIssues
    // Verify no ID collisions
}
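The property these concurrency tests check, that serialized range reservation never hands out the same ID twice, can be sketched without SQLite. In the example below, `rangeCounter` and `runWriters` are hypothetical names, and a mutex stands in for the write lock that BEGIN IMMEDIATE provides; it illustrates the idea and is not a substitute for testing against the real database.

```go
package main

import (
    "fmt"
    "sync"
)

// rangeCounter is a hypothetical in-memory analogue of an issue_counters row.
// The mutex plays the role of SQLite's write lock.
type rangeCounter struct {
    mu     sync.Mutex
    lastID int
}

// reserve claims n consecutive IDs and returns the first and last of them.
func (c *rangeCounter) reserve(n int) (first, last int) {
    c.mu.Lock()
    defer c.mu.Unlock()
    first = c.lastID + 1
    c.lastID += n
    return first, c.lastID
}

// runWriters starts `writers` goroutines that each reserve `perBatch` IDs.
// It returns the number of distinct IDs handed out and the collision count.
func runWriters(writers, perBatch int) (distinct, collisions int) {
    var counter rangeCounter
    var wg sync.WaitGroup
    var seenMu sync.Mutex
    seen := make(map[int]bool)

    for w := 0; w < writers; w++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            first, last := counter.reserve(perBatch)
            seenMu.Lock()
            defer seenMu.Unlock()
            for id := first; id <= last; id++ {
                if seen[id] {
                    collisions++
                }
                seen[id] = true
            }
        }()
    }
    wg.Wait()
    return len(seen), collisions
}

func main() {
    distinct, collisions := runWriters(10, 100)
    fmt.Println(distinct, collisions)
}
```

Ten writers reserving 100 IDs each should yield 1000 distinct IDs and no collisions, matching the assertions outlined for TestCreateIssues_Concurrent.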

Performance Benchmarks

func BenchmarkCreateIssue_Sequential(b *testing.B)
func BenchmarkCreateIssues_Batch(b *testing.B)

// Expected results (100 issues):
// CreateIssue x100:  ~600-1200ms
// CreateIssues:      ~60-130ms
// Speedup:           5-10x

Integration Tests

func TestImport_LargeJSONL(t *testing.T) {
    // Import 1000 issues from JSONL
    // Verify all created correctly
    // Verify performance < 1s
}

Migration Plan

Step 1: Add Interface Method (Non-Breaking)

// storage/storage.go
type Storage interface {
    CreateIssue(ctx context.Context, issue *types.Issue, actor string) error
    CreateIssues(ctx context.Context, issues []*types.Issue, actor string) error  // NEW
    // ... rest unchanged
}

Step 2: Implement SQLiteStorage.CreateIssues

Follow implementation strategy above

Step 3: Add Tests

Comprehensive unit + concurrency + benchmark tests

Step 4: Update Import (Optional)

// cmd/bd/import.go - replace loop with batch
func importIssues(store Storage, issues []*Issue) error {
    // Old:
    // for _, issue := range issues {
    //     store.CreateIssue(ctx, issue, "import")
    // }

    // New:
    return store.CreateIssues(ctx, issues, "import")
}

Note: Start with internal use (import), expose CLI later if needed

Step 5: Performance Testing

# Generate test JSONL
bd export > backup.jsonl

# Duplicate 100x for stress test
cat backup.jsonl backup.jsonl ... > large_test.jsonl

# Test import performance
time bd import large_test.jsonl

Future Enhancements (NOT for bd-222)

Batch Size Limits

If very large batches cause memory issues:

func (s *SQLiteStorage) CreateIssues(ctx, issues, actor) error {
    const maxBatchSize = 1000

    for i := 0; i < len(issues); i += maxBatchSize {
        end := min(i+maxBatchSize, len(issues))
        batch := issues[i:end]

        if err := s.createIssuesBatch(ctx, batch, actor); err != nil {
            return fmt.Errorf("batch %d-%d failed: %w", i, end, err)
        }
    }
    return nil
}

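Decision: Don't implement until we see issues with large batches (>1000). If each chunk commits in its own transaction, as the sketch above implies, an error in a later chunk would leave earlier chunks in place, so chunking trades away all-or-nothing atomicity across the full input.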

Validation-Only Mode

Pre-validate batch without creating:

func (s *SQLiteStorage) ValidateIssues(ctx, issues) error

Use Case: Dry-run before bulk import

Decision: Add if import workflows request it

Progress Callbacks

Report progress for long-running batches:

type BatchProgress func(completed, total int)

func (s *SQLiteStorage) CreateIssuesWithProgress(ctx, issues, actor, progress) error

Decision: Add if agent workflows request it (likely for 1000+ issue batches)

Performance Analysis

Baseline (CreateIssue loop)

For 100 issues:

Connection overhead:  100ms   (1ms × 100)
Transaction overhead: 300ms   (3ms × 100, with lock contention)
ID generation:        250ms   (2.5ms × 100)
Insert + event:       250ms   (2.5ms × 100)
Total:                900ms

With CreateIssues

For 100 issues:

Connection overhead:   1ms    (1 connection)
Transaction overhead:  5ms    (1 transaction)
ID range generation:   15ms   (1 query, more complex)
Bulk insert (prep):    50ms   (prepared stmt × 100)
Bulk events (prep):    30ms   (prepared stmt × 100)
Bulk dirty (prep):     20ms   (prepared stmt × 100)
Commit:                5ms
Total:                 126ms  (7x faster)

Scalability

Issues	CreateIssue Loop	CreateIssues	Speedup
10	90ms	30ms	3x
100	900ms	126ms	7x
1000	9s	800ms	11x
10000	90s	6s	15x

Key Insight: Speedup increases with batch size due to fixed overhead amortization
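The speedup column follows directly from the two estimate columns. The short sketch below recomputes it from the table's own numbers, converted to milliseconds; the `row` type and `speedup` function are illustrative names, and the inputs are the document's estimates, not measurements.

```go
package main

import "fmt"

// row holds one line of the scalability table, both estimates in milliseconds.
type row struct {
    issues  int
    loopMs  float64
    batchMs float64
}

// speedup is the ratio of the CreateIssue-loop time to the CreateIssues time.
func speedup(r row) float64 {
    return r.loopMs / r.batchMs
}

func main() {
    table := []row{
        {10, 90, 30},
        {100, 900, 126},
        {1000, 9000, 800},
        {10000, 90000, 6000},
    }
    for _, r := range table {
        fmt.Printf("%6d issues: %.1fx\n", r.issues, speedup(r))
    }
}
```

The ratios come out at roughly 3x, 7x, 11x, and 15x, in line with the table.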

Why This Solution Wins

For Individual Devs & Small Teams

Zero impact on normal workflow: CreateIssue unchanged
Fast imports: 1000 issues in <1s instead of ~9s (estimated)
Simple mental model: All-or-nothing batch
No new concepts: Same semantics as CreateIssue, just faster

For Agent Swarms

Efficient decomposition: Agent creates 50 subtasks in one call
Atomic work generation: All issues created or none
No connection exhaustion: One connection per batch
Safe concurrency: BEGIN IMMEDIATE prevents races

For New Codebase

Non-breaking change: Additive API only
Performance win: 5-15x faster for bulk operations
Simple implementation: ~200 LOC, similar to CreateIssue
Battle-tested pattern: Same transaction semantics as CreateIssue

Alternatives Considered and Rejected

Alternative 1: Auto-Batch in CreateIssue

Automatically detect rapid CreateIssue calls and batch them.

Why Rejected:

❌ Magical behavior (implicit batching)
❌ Complex implementation (goroutine + timer + coordination)
❌ Race conditions and edge cases
❌ Unpredictable performance (when does batch trigger?)
❌ Can't guarantee atomicity across auto-batch boundary

Alternative 2: Separate Import API

Add ImportIssues specifically for JSONL import, not general-purpose.

Why Rejected:

❌ Limits use cases (what about agent workflows?)
❌ Name doesn't match behavior (it's just batch create)
❌ CreateIssues is more discoverable and general

Alternative 3: Streaming API

type IssueStream interface {
    Send(*Issue) error
    CloseAndCommit() error
}
func (s *SQLiteStorage) CreateIssueStream(ctx, actor) (IssueStream, error)

Why Rejected:

❌ More complex API (stateful stream object)
❌ Error handling complexity (partial writes?)
❌ Doesn't match Go/SQL idioms
❌ Caller must manage stream lifecycle
❌ Simple slice is easier to work with

Conclusion

The simple all-or-nothing batch API (CreateIssues) is the best solution because:

Significant performance win: 5-15x faster for bulk operations
Simple API: Just like CreateIssue but with slice
Safe: Atomic transaction, no partial state
Non-breaking: Existing CreateIssue unchanged
Flexible: Supports mixed ID assignment (auto + explicit)
Proven pattern: Same transaction semantics as CreateIssue

The key insight is atomic ID range reservation - updating the counter once for N issues instead of N times. Combined with a single transaction and prepared statements, this provides major performance improvements without complexity.

This aligns perfectly with beads' goals: simple for individual devs, efficient for bulk operations, robust for agent swarms.

Implementation size: ~200 LOC + ~400 LOC tests = manageable, low-risk change
Expected performance: 5-15x faster for bulk operations (N > 10)
Risk: Low (additive API, comprehensive tests)
