Multi-Repo Hydration Layer
This document describes the implementation of Task 3 from the multi-repo support feature (bd-307): the hydration layer that loads issues from multiple JSONL files into a unified SQLite database.
Overview
The hydration layer enables beads to aggregate issues from multiple repositories into a single database for unified querying and analysis. It uses file modification time (mtime) caching to optimize performance by only reimporting files that have changed.
Architecture
1. Database Schema
Table: repo_mtimes
CREATE TABLE repo_mtimes (
    repo_path TEXT PRIMARY KEY,    -- Absolute path to repository root
    jsonl_path TEXT NOT NULL,      -- Absolute path to .beads/issues.jsonl
    mtime_ns INTEGER NOT NULL,     -- Modification time in nanoseconds
    last_checked DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
);
This table records the last known modification time of each repository's JSONL file so that hydration can skip files that have not changed since the previous import.
2. Configuration
Multi-repo mode is configured via internal/config/config.go:
# .beads/config.yaml
repos:
  primary: /path/to/primary/repo   # Canonical source (optional)
  additional:                      # Additional repos to hydrate from
    - ~/projects/repo1
    - ~/projects/repo2
- Primary repo (.): Issues from this repo are marked with source_repo = "."
- Additional repos: Issues are marked with their relative path as source_repo
3. Implementation Files
New Files:
- internal/storage/sqlite/multirepo.go - Core hydration logic
- internal/storage/sqlite/multirepo_test.go - Test coverage
- docs/MULTI_REPO_HYDRATION.md - This document
Modified Files:
- internal/storage/sqlite/schema.go - Added repo_mtimes table
- internal/storage/sqlite/migrations/013_repo_mtimes_table.go - Migration for repo_mtimes table
- internal/storage/sqlite/sqlite.go - Integrated hydration into storage initialization
- internal/storage/sqlite/ready.go - Added source_repo to all SELECT queries
- internal/storage/sqlite/labels.go - Added source_repo to SELECT query
- internal/storage/sqlite/migrations_test.go - Added migration tests
Key Functions
HydrateFromMultiRepo(ctx context.Context) (map[string]int, error)
Main entry point for multi-repo hydration. Called automatically during sqlite.New().
Behavior:
- Returns nil, nil if not in multi-repo mode (single-repo operation)
- Processes primary repo first (if configured)
- Then processes each additional repo
- Returns a map of source_repo -> issue count for imported issues
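The ordering and return contract described above can be sketched roughly as follows. This is an illustration only: repoConfig, hydrateAll, and the hydrateOne callback are hypothetical names, and treating "no primary and no additional repos" as single-repo mode is an assumption rather than something the source confirms.

```go
package main

// repoConfig is a hypothetical stand-in for the multi-repo section of the
// config; the real type in internal/config may look different.
type repoConfig struct {
	Primary    string
	Additional []string
}

// hydrateAll mirrors the documented behavior: a nil result in single-repo
// mode, the primary repo first (labelled "."), then each additional repo in
// order. hydrateOne stands in for hydrateFromRepo and returns the number of
// imported issues.
func hydrateAll(cfg *repoConfig, hydrateOne func(repoPath, sourceRepo string) (int, error)) (map[string]int, error) {
	// Assumption: an empty repos section means single-repo mode.
	if cfg == nil || (cfg.Primary == "" && len(cfg.Additional) == 0) {
		return nil, nil
	}
	counts := make(map[string]int)
	if cfg.Primary != "" {
		n, err := hydrateOne(cfg.Primary, ".")
		if err != nil {
			return nil, err
		}
		counts["."] = n
	}
	for _, repo := range cfg.Additional {
		n, err := hydrateOne(repo, repo)
		if err != nil {
			return nil, err
		}
		counts[repo] = n
	}
	return counts, nil
}
```

Passing the per-repo step as a callback keeps the sketch free of database code; in the real implementation that step is hydrateFromRepo.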
hydrateFromRepo(ctx, repoPath, sourceRepo string) (int, error)
Handles hydration for a single repository.
Steps:
- Resolves absolute path to repo and JSONL file
- Checks file existence (skips if missing)
- Compares current mtime with cached mtime
- Skips import if mtime unchanged (optimization)
- Imports issues if file changed or no cache exists
- Updates mtime cache after successful import
importJSONLFile(ctx, jsonlPath, sourceRepo string) (int, error)
Parses a JSONL file and imports all issues into the database.
Features:
- Handles large files (10MB max line size)
- Skips empty lines and comments (#)
- Sets source_repo field on all imported issues
- Computes content_hash if missing
- Uses transactions for atomicity
- Imports dependencies, labels, and comments
upsertIssueInTx(ctx, tx, issue *types.Issue) error
Inserts or updates an issue within a transaction.
Smart Update Logic:
- Checks if issue exists by ID
- If new: inserts issue
- If exists: compares content_hash and only updates if changed
- Imports associated dependencies, labels, and comments
- Uses INSERT OR IGNORE for dependencies/labels to avoid duplicates
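The insert/update/skip decision can be isolated from the SQL as in the sketch below. decideUpsert and the upsertAction constants are hypothetical names, and the map stands in for a lookup of the stored content_hash inside the transaction.

```go
package main

// upsertAction names the three outcomes of the documented update logic.
type upsertAction int

const (
	actionInsert upsertAction = iota // ID not present yet
	actionUpdate                     // ID present, content_hash differs
	actionSkip                       // ID present, content_hash identical
)

// decideUpsert is a hypothetical helper. existing maps issue ID to the
// stored content_hash, standing in for a query within the transaction.
func decideUpsert(existing map[string]string, id, contentHash string) upsertAction {
	stored, found := existing[id]
	switch {
	case !found:
		return actionInsert
	case stored != contentHash:
		return actionUpdate
	default:
		return actionSkip
	}
}
```

Skipping rows whose hash is unchanged keeps a reimport of a touched but otherwise identical file from rewriting every issue.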
expandTilde(path string) (string, error)
Utility function to expand ~ and ~/ paths to absolute home directory paths.
Mtime Caching
The hydration layer uses file modification time (mtime) as a cache key to avoid unnecessary reimports.
Cache Logic:
- First hydration: No cache exists → import file
- Subsequent hydrations: Compare mtimes
  - If mtime_current == mtime_cached → skip import (fast path)
  - If mtime_current != mtime_cached → reimport (file changed)
- After successful import: Update cache with new mtime
Benefits:
- Performance: Avoids parsing/importing unchanged JSONL files
- Correctness: Detects external changes via filesystem metadata
- Simplicity: No need for content hashing or git integration
Limitations:
- Relies on filesystem mtime accuracy
- Won't detect changes if mtime is manually reset
- mtime precision depends on the filesystem (nanoseconds on ext4 and APFS, 100 ns on NTFS, 2 seconds on FAT)
Source Repo Tracking
Each issue has a source_repo field that identifies which repository it came from:
- Primary repo: source_repo = "."
- Additional repos: source_repo = <relative_path> (e.g., ~/projects/repo1)
This enables:
- Filtering issues by source repository
- Understanding issue provenance in multi-repo setups
- Future features like repo-specific permissions or workflows
Database Schema:
ALTER TABLE issues ADD COLUMN source_repo TEXT DEFAULT '.';
CREATE INDEX idx_issues_source_repo ON issues(source_repo);
Testing
Comprehensive test coverage in internal/storage/sqlite/multirepo_test.go:
Test Cases
- TestExpandTilde
  - Verifies tilde expansion for various path formats
- TestHydrateFromMultiRepo/single-repo_mode_returns_nil
  - Confirms nil return when not in multi-repo mode
- TestHydrateFromMultiRepo/hydrates_from_primary_repo
  - Validates primary repo import
  - Checks source_repo = "." is set correctly
- TestHydrateFromMultiRepo/uses_mtime_caching_to_skip_unchanged_files
  - First hydration: imports 1 issue
  - Second hydration: imports 0 issues (cached)
  - Proves mtime cache optimization works
- TestHydrateFromMultiRepo/imports_additional_repos
  - Creates primary + additional repo
  - Verifies both are imported
  - Checks source_repo fields are distinct
- TestImportJSONLFile/imports_issues_with_dependencies_and_labels
  - Tests JSONL parsing with complex data
  - Validates dependencies and labels are imported
  - Confirms relational data integrity
- TestMigrateRepoMtimesTable
  - Verifies migration creates table correctly
  - Confirms migration is idempotent
Running Tests
# Run the multi-repo hydration tests
go test -v ./internal/storage/sqlite -run TestHydrateFromMultiRepo
# Run specific test
go test -v ./internal/storage/sqlite -run TestExpandTilde
# Run all sqlite tests
go test ./internal/storage/sqlite
Integration
Automatic Hydration
Hydration happens automatically during storage initialization:
// internal/storage/sqlite/sqlite.go
func New(path string) (*SQLiteStorage, error) {
	// ... schema initialization ...
	storage := &SQLiteStorage{db: db, dbPath: absPath}
	// Skip for in-memory databases (used in tests)
	if path != ":memory:" {
		_, err := storage.HydrateFromMultiRepo(ctx)
		if err != nil {
			return nil, fmt.Errorf("failed to hydrate from multi-repo: %w", err)
		}
	}
	return storage, nil
}
Configuration Example
.beads/config.yaml:
repos:
  primary: /Users/alice/work/main-project
  additional:
    - ~/work/library-a
    - ~/work/library-b
    - /opt/shared/common-issues
Resulting database:
- Issues from /Users/alice/work/main-project → source_repo = "."
- Issues from ~/work/library-a → source_repo = "~/work/library-a"
- Issues from ~/work/library-b → source_repo = "~/work/library-b"
- Issues from /opt/shared/common-issues → source_repo = "/opt/shared/common-issues"
Migration
The repo_mtimes table is created via the standard migration system:
// internal/storage/sqlite/migrations/013_repo_mtimes_table.go
func MigrateRepoMtimesTable(db *sql.DB) error {
	// Check if table exists
	var tableName string
	err := db.QueryRow(`
		SELECT name FROM sqlite_master
		WHERE type='table' AND name='repo_mtimes'
	`).Scan(&tableName)
	if err == sql.ErrNoRows {
		// Create table + index
		_, err := db.Exec(`
			CREATE TABLE repo_mtimes (...);
			CREATE INDEX idx_repo_mtimes_checked ON repo_mtimes(last_checked);
		`)
		return err
	}
	if err != nil {
		return err // The lookup itself failed; do not report success
	}
	return nil // Already exists
}
Migration is idempotent: Safe to run multiple times, won't error on existing table.
Future Enhancements
- Incremental Sync: Instead of full reimport, use git hashes or checksums to sync only changed issues
- Conflict Resolution: Handle cases where same issue ID exists in multiple repos with different content
- Selective Hydration: Allow users to specify which repos to hydrate (CLI flag or config)
- Background Refresh: Periodically check for JSONL changes without blocking CLI operations
- Repository Metadata: Track repo URL, branch, last commit hash for better provenance
Performance Considerations
Mtime Cache Hit (fast path):
- 1 SQL query per repo (check cached mtime)
- No file read or parse if mtime matches (only a stat call to get the current mtime)
- Typical latency: <1ms per repo
Mtime Cache Miss (import path):
- 1 SQL query (check cache)
- 1 file read (parse JSONL)
- N SQL inserts/updates (where N = issue count)
- 1 SQL update (cache mtime)
- Typical latency: 10-100ms for 100 issues
Optimization Tips:
- Place frequently-changing repos in primary position
- Use .beads/config.yaml instead of env vars (faster viper access)
- Limit additional repos to ~10 for reasonable startup time
Troubleshooting
Hydration not working?
- Check config: bd config list should show repos.primary or repos.additional
- Verify JSONL exists: ls -la /path/to/repo/.beads/issues.jsonl
- Check logs: Set BD_DEBUG=1 to see hydration debug output
Issues not updating?
- Mtime cache might be stale
- Force refresh by deleting cache: DELETE FROM repo_mtimes WHERE repo_path = '/path/to/repo'
- Or touch the JSONL file: touch /path/to/repo/.beads/issues.jsonl
Performance issues?
- Check repo count: SELECT COUNT(*) FROM repo_mtimes
- Measure hydration time with BD_DEBUG=1
- Consider reducing additional repos if startup is slow
See Also
- CONFIG.md - Configuration system documentation
- EXTENDING.md - Database schema extension guide
- bd-307 - Original multi-repo feature request