Steve Yegge 15afb5ad17 Implement JSONL export/import and shift to text-first architecture

This is a fundamental architectural shift from binary SQLite to JSONL as
the source of truth for git workflows.

## New Features

- `bd export --format=jsonl` - Export issues to JSON Lines format
- `bd import` - Import issues from JSONL (create new, update existing)
- `--skip-existing` flag for import to only create new issues

## Architecture Change

**Before:** Binary SQLite database committed to git
**After:** JSONL text files as source of truth, SQLite as ephemeral cache

Benefits:
- Git-friendly text format with clean diffs
- AI-resolvable merge conflicts (append-only is 95% conflict-free)
- Human-readable issue tracking in git
- No binary merge conflicts

## Documentation

- Updated README with JSONL-first workflow and git hooks
- Added TEXT_FORMATS.md analyzing JSONL vs CSV vs binary
- Updated GIT_WORKFLOW.md with historical context
- .gitignore now excludes *.db, includes .beads/*.jsonl

## Implementation Details

- Export sorts issues by ID for consistent diffs
- Import handles both creates and updates atomically
- Proper handling of pointer fields (EstimatedMinutes)
- All tests passing

## Breaking Changes

- Database files (*.db) should now be gitignored
- Use export/import workflow for git collaboration
- Git hooks recommended for automation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-10-12 01:17:50 -07:00


Git Workflow for bd Databases

Note: This document contains historical analysis of binary SQLite workflows. The current recommended approach is JSONL-first (see README.md). This document is kept for reference and to explain the design decisions.

TL;DR

Current Recommendation (2025): Use JSONL text format as source of truth. See README.md for the current workflow.

Historical Analysis Below: This documents the binary SQLite approach and why we moved to JSONL.

The Problem

SQLite databases are binary files. Git cannot automatically merge them like text files.

$ git merge feature-branch
warning: Cannot merge binary files: .beads/myapp.db (HEAD vs. feature-branch)
CONFLICT (content): Merge conflict in .beads/myapp.db

When two developers create issues concurrently and try to merge:

Git detects a conflict
You must choose "ours" or "theirs" (lose one side's changes)
OR manually export/import data (tedious)
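
Why git treats the database as binary can be approximated in a few lines. This is a rough sketch that assumes git's usual heuristic (a NUL byte within the first 8000 bytes marks a file as binary). `looks_binary` is a hypothetical helper, not a git or bd command. SQLite files begin with the header string `SQLite format 3` followed by a NUL byte, so they always trip this check, while a JSONL export does not.

```shell
# Hypothetical helper: succeeds when the file contains a NUL byte in its
# first 8000 bytes, which approximates git's binary-file detection.
looks_binary() {
    local total without_nul
    # Byte count of the inspected prefix, with and without NUL bytes.
    total=$(( $(head -c 8000 "$1" | wc -c) ))
    without_nul=$(( $(head -c 8000 "$1" | tr -d '\000' | wc -c) ))
    [ "$total" -ne "$without_nul" ]
}
```

Usage: `looks_binary .beads/myapp.db && echo "git will not text-merge this file"`.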

Solution 1: Binary in Git with Protocol (Recommended for Small Teams)

Works for: 2-10 developers, <500 issues, low-medium velocity

The Protocol

One person owns the database per branch
Pull before creating issues
Push immediately after creating issues
Use short-lived feature branches

Workflow

# Developer A
git pull origin main
bd create "Fix navbar bug" -p 1
git add .beads/myapp.db
git commit -m "Add issue: Fix navbar bug"
git push origin main

# Developer B (right after A pushes)
git pull origin main  # Gets A's changes first
bd create "Add dark mode" -p 2
git add .beads/myapp.db
git commit -m "Add issue: Add dark mode"
git push origin main  # No conflict!

Handling Conflicts

If you DO get a conflict:

# Option 1: Take remote (lose your local changes)
git checkout --theirs .beads/myapp.db
bd list  # Verify what you got
git add .beads/myapp.db  # Mark the conflict as resolved
git commit

# Option 2: Export your changes, take theirs, reimport
bd list --json > my-issues.json
git checkout --theirs .beads/myapp.db
# Manually recreate your issues
bd create "My issue that got lost"
git add .beads/myapp.db && git commit

# Option 3: Union merge with custom script (see below)

Pros

✅ Simple: No infrastructure needed
✅ Fast: SQLite is incredibly fast
✅ Offline-first: Works without network
✅ Atomic: Database transactions guarantee consistency
✅ Rich queries: Full SQL power

Cons

❌ Binary conflicts require manual resolution
❌ Diffs are opaque (can't see changes in git diff)
❌ Database size grows over time (but SQLite VACUUM helps)
❌ Git LFS might be needed for large projects (>100MB)

Size Analysis

Empty database: 80KB
100 issues: ~120KB (adds ~400 bytes per issue)
1000 issues: ~500KB
10,000 issues: ~5MB

Recommendation: Use binary in git up to ~500 issues or 5MB.
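
The size figures above follow a simple linear model: roughly 80 KB for an empty database plus about 400 bytes per issue. The sketch below encodes that model and a size check against the 5MB guideline. Both function names are hypothetical, and 1 KB is treated as 1000 bytes for simplicity.

```shell
# Rough estimate in KB: 80 KB base + ~400 bytes per issue (figures from this document).
estimate_db_kb() {
    local issues=$1
    echo $(( 80 + issues * 400 / 1000 ))
}

# Succeeds while the file is at or under the limit in bytes
# (default 5000000, i.e. the ~5MB guideline).
db_within_limit() {
    local size
    size=$(( $(wc -c < "$1") ))
    [ "$size" -le "${2:-5000000}" ]
}

estimate_db_kb 100     # prints 120
estimate_db_kb 1000    # prints 480
estimate_db_kb 10000   # prints 4080
```

The model lands slightly under the rounded figures quoted above (480 KB vs ~500KB, about 4.1 MB vs ~5MB), which is close enough for a rule of thumb.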

Solution 2: Text Export Format (Recommended for Medium Teams)

Works for: 5-50 developers, any number of issues

Implementation

Create bd export and bd import commands:

# Export to text format (JSON Lines or SQL)
bd export > .beads/myapp.jsonl

# Import from text
bd import < .beads/myapp.jsonl

Workflow

# Before committing
bd export > .beads/myapp.jsonl
git add .beads/myapp.jsonl
git commit -m "Add issues"

# After pulling
bd import < .beads/myapp.jsonl

Advanced: Keep Both

.beads/
├── myapp.db          # Binary database (in .gitignore)
├── myapp.jsonl       # Text export (in git)
└── sync.sh           # Script to sync between formats
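
One possible shape for `sync.sh`, as a minimal sketch rather than anything shipped with bd: compare modification times and run whichever direction is stale. The `BD`, `DB`, and `JSONL` variables and the `sync_beads` function are assumptions introduced here. Only `bd export` and `bd import` with stdin/stdout redirection come from this document.

```shell
# Hypothetical knobs; override them from the environment if your layout differs.
BD=${BD:-bd}
DB=${DB:-.beads/myapp.db}
JSONL=${JSONL:-.beads/myapp.jsonl}

sync_beads() {
    if [ ! -e "$JSONL" ] || [ "$DB" -nt "$JSONL" ]; then
        # Database changed last: refresh the text export. Writing to a temp
        # file first means a failed export never truncates the tracked file.
        "$BD" export > "$JSONL.tmp" && mv "$JSONL.tmp" "$JSONL" && echo "exported"
    elif [ "$JSONL" -nt "$DB" ]; then
        # Text file changed last (for example after git pull): load it.
        "$BD" import < "$JSONL" && echo "imported"
    else
        echo "in sync"
    fi
}
```

Because an import rewrites the database, the next run may export again. With export output sorted by ID, that repeat export should produce an identical file and no diff.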

Pros

✅ Git can merge text files
✅ Diffs are readable
✅ Conflicts are easier to resolve
✅ Scales to any team size

Cons

❌ Requires discipline (must export before commit)
❌ Slower (export/import overhead)
❌ Two sources of truth (can get out of sync)
❌ Merge conflicts still happen (but mergeable)

Solution 3: Shared Database Server (Enterprise)

Works for: 10+ developers, high velocity, need real-time sync

Options

PostgreSQL Backend (future bd feature)

export BD_DATABASE=postgresql://host/db
bd create "Issue"  # Goes to shared Postgres

SQLite on Shared Filesystem

export BD_DATABASE=/mnt/shared/myapp.db
bd create "Issue"  # Caution: SQLite's WAL mode does not work over network filesystems; all writers must be on the same host

bd Server Mode (future feature)

bd serve --port 8080  # Run bd as HTTP API
bd --remote=http://localhost:8080 create "Issue"

Pros

✅ True concurrent access
✅ No merge conflicts
✅ Real-time updates
✅ Centralized audit trail

Cons

❌ Requires infrastructure
❌ Not offline-first
❌ More complex
❌ Needs authentication/authorization

Solution 4: Hybrid - Short-Lived Branches

Works for: Any team size, best of both worlds

Strategy

main branch: Contains source of truth database
Feature branches: Don't commit database changes
Issue creation: Only on main branch

# Working on feature
git checkout -b feature-dark-mode
# ... make code changes ...
git commit -m "Implement dark mode"

# Need to create issue? Switch to main first
git checkout main
git pull
bd create "Bug found in dark mode"
git add .beads/myapp.db
git commit -m "Add issue"
git push

git checkout feature-dark-mode
# Continue working
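
The "database only on main" rule can be enforced rather than remembered. The following is a sketch of a guard that could be called from a pre-commit hook. `check_db_commit` is a hypothetical function that takes the branch name as an argument and the list of staged paths on stdin.

```shell
# Fails when a *.db file is staged on any branch other than main.
check_db_commit() {
    local branch=$1
    # grep reads the staged file names from stdin.
    if [ "$branch" != "main" ] && grep -q '\.db$'; then
        echo "refusing to commit database changes on $branch" >&2
        return 1
    fi
}
```

In a hook it might be wired up as `git diff --cached --name-only | check_db_commit "$(git rev-parse --abbrev-ref HEAD)"`.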

Pros

✅ No database merge conflicts (database only on main)
✅ Simple mental model
✅ Works with existing git workflows

Cons

❌ Issues not tied to feature branches
❌ Requires discipline

Recommended Approach by Team Size

Solo Developer

Binary in git - Just commit it. Conflicts are unlikely as long as you work on one branch and one machine at a time.

2-5 Developers (Startup)

Binary in git with protocol - Pull before creating issues, push immediately.

5-20 Developers (Growing Team)

Text export format - Export to JSON Lines, commit that. Binary in .gitignore.

20+ Developers (Enterprise)

Shared database - PostgreSQL backend or bd server mode.
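
The tiers above can be written down as a small lookup. This is only an illustration: `recommend_workflow` is a hypothetical helper, and because the ranges overlap at 5 and 20, the sketch assumes 5 developers still use the protocol and 20 developers move to a shared database.

```shell
# Maps a developer count to the approach recommended in this section.
recommend_workflow() {
    local devs=$1
    if [ "$devs" -le 1 ]; then
        echo "binary in git"
    elif [ "$devs" -le 5 ]; then
        echo "binary in git with protocol"
    elif [ "$devs" -lt 20 ]; then
        echo "text export"
    else
        echo "shared database"
    fi
}
```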

Scaling Analysis

How far can binary-in-git scale?

Experiment: Simulate concurrent developers

# 10 developers each creating 10 issues
# If they all pull at same time, create issues, push sequentially:
# - Developer 1: pushes successfully
# - Developer 2: pulls, gets conflict, resolves, pushes
# - Developer 3: pulls, gets conflict, resolves, pushes
# ...
# Result: 9/10 developers hit conflicts

# If they coordinate (pull, create, push immediately):
# - Success rate: ~80-90% (depends on timing)
# - Failed pushes just retry after pull

# Conclusion: Works up to ~10 concurrent developers with retry logic

Rule of Thumb:

1-5 devs: 95% conflict-free with protocol
5-10 devs: 80% conflict-free, need retry automation
10+ devs: <50% conflict-free, text export recommended
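
The "retry automation" mentioned above could be as small as a generic retry wrapper. This is a sketch under assumptions: `retry` and the `RETRY_DELAY` variable are hypothetical names, and the wrapped command is responsible for the pull and any conflict resolution before pushing again.

```shell
# retry MAX CMD...: run CMD until it succeeds, at most MAX times.
retry() {
    local max=$1 attempt=1
    shift
    until "$@"; do
        # Give up once the allowed number of attempts has been used.
        [ "$attempt" -ge "$max" ] && return 1
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-1}"
    done
    return 0
}
```

For example, `retry 3 sh -c 'git pull --rebase origin main && git push origin main'` retries the pull-then-push cycle up to three times. It only helps when the pull itself succeeds, so a real conflict in the binary database still needs manual resolution.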

Git LFS

For very large projects (>10,000 issues, >5MB database):

# .gitattributes
*.db filter=lfs diff=lfs merge=lfs -text

git lfs track "*.db"
git add .gitattributes
git commit -m "Track SQLite with LFS"

Pros

✅ Keeps git repo small
✅ Handles large binaries efficiently

Cons

❌ Requires Git LFS setup
❌ Still can't merge binaries
❌ LFS storage costs money (GitHub/GitLab)

Custom Merge Driver

For advanced users, create a custom git merge driver:

# .gitattributes
*.db merge=bd-merge

# .git/config
[merge "bd-merge"]
    name = bd database merger
    driver = bd-merge-tool %O %A %B %P

Where bd-merge-tool is a script that:

Exports both databases to JSON
Merges JSON (using git's text merge)
Imports merged JSON to database
Handles conflicts intelligently (e.g., keep both issues if IDs differ)

This could be a future bd feature:

bd merge-databases base.db ours.db theirs.db > merged.db
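
The merge step in the middle of that pipeline can be sketched with standard tools. The example assumes each exported line is a JSON object containing `"id":"..."` with no whitespace around the colon, and it resolves same-ID conflicts by letting the last file win, which is a simplification of "handle conflicts intelligently". `merge_jsonl` is a hypothetical helper, not a bd command.

```shell
# Union-merge JSONL files by issue ID. Issues with different IDs are all
# kept; when the same ID appears more than once, the later line wins.
merge_jsonl() {
    awk '
        match($0, /"id":"[^"]*"/) {
            # Strip the 6-character prefix ("id":") and the closing quote.
            id = substr($0, RSTART + 6, RLENGTH - 7)
            if (!(id in latest)) order[++n] = id
            latest[id] = $0
        }
        END { for (i = 1; i <= n; i++) print latest[order[i]] }
    ' "$@"
}
```

Usage: `merge_jsonl ours.jsonl theirs.jsonl > merged.jsonl`. Output keeps the order in which IDs were first seen.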

For the beads Project Itself

Recommendation: Binary in git with protocol

Rationale:

Small team (1-2 primary developers)
Low-medium velocity (~10-50 issues total)
Want dogfooding (eat our own dog food)
Want simplicity (no export/import overhead)
Database will stay small (<1MB)

Protocol for beads Contributors

Pull before creating issues
```
git pull origin main
```

Create issue

bd create "Add PostgreSQL backend" -p 2 -t feature

Commit and push immediately

git add .beads/bd.db
git commit -m "Add issue: PostgreSQL backend"
git push origin main

If push fails (someone beat you)

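git pull --rebase origin main
# Resolve the conflict by taking the upstream database. During a rebase the
# sides are swapped: --ours is the upstream branch, --theirs is your own commit.
git checkout --ours .beads/bd.db
# Recreate your issue
bd create "Add PostgreSQL backend" -p 2 -t feature
git add .beads/bd.db
git rebase --continue
git push origin main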

For feature branches
- Don't commit database changes
- Create issues on main branch only
- Reference issue IDs in commits: git commit -m "Implement bd-42"
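
Referencing IDs in commit messages pays off when they can be pulled back out. A small sketch, assuming issue IDs always look like `bd-` followed by digits; `extract_issue_ids` is a hypothetical helper.

```shell
# Reads text on stdin and prints each distinct issue ID once.
extract_issue_ids() {
    grep -oE 'bd-[0-9]+' | sort -u
}
```

For example, `git log --format=%s main..feature-dark-mode | extract_issue_ids` lists the issues a feature branch mentions.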

Future Enhancements

bd export/import (Priority: Medium)

# JSON Lines format (one issue per line)
bd export --format=jsonl > issues.jsonl
bd import < issues.jsonl

# SQL format (full dump)
bd export --format=sql > issues.sql
bd import < issues.sql

# Delta export (only changes since last)
bd export --since=2025-10-01 > delta.jsonl

bd sync (Priority: High)

Automatic export before git commit:

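#!/bin/bash
# .git/hooks/pre-commit (make it executable with chmod +x)
# compgen -G succeeds when the glob matches at least one file, so this
# also works when several databases exist ([ -f .beads/*.db ] would fail).
if compgen -G ".beads/*.db" > /dev/null; then
    bd export > .beads/issues.jsonl
    git add .beads/issues.jsonl
fi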

bd merge-databases (Priority: Low)

bd merge-databases --ours=.beads/bd.db --theirs=/tmp/bd.db --output=merged.db
# Intelligently merges:
# - Same issue ID, different fields: prompt user
# - Different issue IDs: keep both
# - Conflicting dependencies: resolve automatically

Conclusion

For beads itself: Binary in git works great. Just commit .beads/bd.db.

For bd users:

Small teams: Binary in git with simple protocol
Medium teams: Text export format
Large teams: Shared database server

The key insight: SQLite is amazing for local storage, but git wasn't designed for binary merges. Accept this tradeoff and use the right solution for your team size.

Document in README: Add a "Git Workflow" section explaining binary vs text approaches and when to use each.
