21 Commits

Author SHA1 Message Date
Ryan
690c73fc31 Performance Improvements (#319)
* feat: add performance testing framework foundation

Implements foundation for comprehensive performance testing and user
diagnostics for beads databases at 10K-20K scale.

Components added:
- Fixture generator (internal/testutil/fixtures/) for realistic test data
  * LargeSQLite/XLargeSQLite: 10K/20K issues with epic hierarchies
  * LargeFromJSONL/XLargeFromJSONL: test JSONL import path
  * Realistic cross-linked dependencies, labels, assignees
  * Reproducible with seeded RNG

- User diagnostics (bd doctor --perf) for field performance data
  * Collects platform info (OS, arch, Go/SQLite versions)
  * Measures key operation timings (ready, list, show, search)
  * Generates CPU profiles for bug reports
  * Clean separation in cmd/bd/doctor/perf.go

Test data characteristics:
- 10% epics, 30% features, 60% tasks
- 4-level hierarchies (Epic → Feature → Task → Subtask)
- 20% cross-epic blocking dependencies
- Realistic status/priority/label distributions

Supports bd-l954 (Performance Testing Framework epic)
Closes bd-6ed8, bd-q59i

* perf: optimize GetReadyWork with compound index (20x speedup)

Add compound index on dependencies(depends_on_id, type, issue_id) to
eliminate performance bottleneck in GetReadyWork recursive CTE query.

Performance improvements (10K issue database):
- GetReadyWork: 752ms → 36.6ms (20.5x faster)
- Target: <50ms ✓ ACHIEVED
- 20K database: ~1500ms → 79.4ms (19x faster)

Benchmark infrastructure enhancements:
- Add dataset caching in /tmp/beads-bench-cache/ to avoid regenerating
  10K-20K issues on every benchmark run (first run: ~2min, subsequent: <5s)
- Add progress logging during fixture generation (shows 10%, 20%... completion)
- Add database size logging (17.5 MB for 10K, 35.1 MB for 20K)
- Document rationale for only benchmarking large datasets (>10K issues)
- Add CPU/trace profiling with --profile flag for performance debugging

Schema changes:
- internal/storage/sqlite/schema.go: Add idx_dependencies_depends_on_type_issue

New files:
- internal/storage/sqlite/bench_helpers_test.go: Reusable benchmark setup with caching
- internal/storage/sqlite/sqlite_bench_test.go: Comprehensive benchmarks for critical operations
- Makefile: Convenient benchmark execution (make bench-quick, make bench)

Related:
- Resolves bd-5qim (optimize GetReadyWork performance)
- Builds on bd-6ed8 (fixture generator), bd-q59i (bd doctor --perf)

* perf: add WASM compilation cache to eliminate cold-start overhead

Configure wazero compilation cache for ncruces/go-sqlite3 to avoid
~220ms JIT compilation on every process start.

Cache configuration:
- Location: ~/.cache/beads/wasm/ (platform-specific via os.UserCacheDir)
- Automatic version management: wazero keys entries by its version
- Fallback: in-memory cache if directory creation fails
- No cleanup needed: old versions are harmless (~5-10MB each)

Performance impact:
- First run: ~220ms (populate cache)
- Subsequent runs: ~20ms (load from cache)
- Savings: ~200ms per cold start

Cache invalidation:
- Automatic when wazero version changes (upgrades use new cache dir)
- Manual cleanup: rm -rf ~/.cache/beads/wasm/ (safe to delete anytime)

This complements daemon mode:
- Daemon mode: eliminates startup cost by keeping process alive
- WASM cache: reduces startup cost for one-off commands or daemon restarts

Changes:
- internal/storage/sqlite/sqlite.go: Add init() with cache setup

* refactor: improve maintainability of performance testing code

Extract common patterns and eliminate duplication across benchmarks, fixture generation, and performance diagnostics. Replace magic numbers with explicit configuration to improve readability and make it easier to tune test parameters.

* docs: clarify profiling behavior and add missing documentation

Add explanatory comments for profiling setup to clarify why --profile
forces direct mode (captures actual database operations instead of RPC
overhead) and document the stopCPUProfile function's role in flushing
profile data to disk. Also fix gosec G104 linter warning by explicitly
ignoring Close() error during cleanup.

* fix: prevent bench-quick from running indefinitely

Added //go:build bench tags and skipped timeout-prone benchmarks to
prevent make bench-quick from running for hours.

Changes:
- Add //go:build bench tag to cycle_bench_test.go and compact_bench_test.go
- Skip Dense graph benchmarks (documented to time out, >120s)
- Fix compact benchmark prefix: bd- → bd (validation expects prefix without trailing dash)

Before: make bench-quick ran for 3.5+ hours (12,699s) before manual interrupt
After: make bench-quick completes in ~25 seconds

The Dense graph benchmarks are known to time out and represent rare edge
cases that don't need optimization for typical workflows.
2025-11-15 12:46:13 -08:00
Steve Yegge
8be792a460 Fix external_ref migration failure on old databases
The schema initialization was trying to create an index on the external_ref
column before the migration that adds the column runs. This caused 'no such
column: external_ref' errors when opening very old databases (pre-0.17.5).

Solution: Move the index creation into the migration that adds the column.

Fixes #284

Amp-Thread-ID: https://ampcode.com/threads/T-2744d5a7-168f-4ef6-bcab-926db846de20
Co-authored-by: Amp <amp@ampcode.com>
2025-11-10 10:50:39 -08:00
Steve Yegge
05529fe4c0 Implement multi-repo hydration layer with mtime caching (bd-307)
- Add repo_mtimes table to track JSONL file modification times
- Implement HydrateFromMultiRepo() with mtime-based skip optimization
- Support tilde expansion for repo paths in config
- Add source_repo column via migration (not in base schema)
- Fix schema to allow migration on existing databases
- Comprehensive test coverage for hydration logic
- Resurrect missing parent issues bd-cb64c226 and bd-cbed9619

Implementation:
- internal/storage/sqlite/multirepo.go - Core hydration logic
- internal/storage/sqlite/multirepo_test.go - Test coverage
- docs/MULTI_REPO_HYDRATION.md - Documentation

Schema changes:
- source_repo column added via migration only (not base schema)
- repo_mtimes table for mtime caching
- All SELECT queries updated to include source_repo

Database recovery:
- Restored from 17 to 285 issues
- Created placeholder parents for orphaned hierarchical children

Amp-Thread-ID: https://ampcode.com/threads/T-faa1339a-14b2-426c-8e18-aa8be6f5cde6
Co-authored-by: Amp <amp@ampcode.com>
2025-11-04 23:12:41 -08:00
Steve Yegge
55c722a3e3 Implement external_ref as primary matching key for import updates (bd-1022)
- Add GetIssueByExternalRef() query function to storage interface and implementations
- Update DetectCollisions() to prioritize external_ref matching over ID matching
- Modify upsertIssues() to handle external_ref matches in import logic
- Add index on external_ref column for performance
- Add comprehensive tests for external_ref matching in both collision detection and import
- Enables re-syncing from external systems (Jira, GitHub, Linear) without duplicates
- Protects local issues (no external_ref) from being overwritten
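The lookup order described above (external_ref first, then ID) can be sketched as follows. The `Issue` type is reduced and `matchExisting` is a hypothetical name. The sketch covers only the matching priority and does not model how the import protects local issues from being overwritten.

```go
package main

// Issue is a reduced, hypothetical issue record for illustration.
type Issue struct {
	ID          string
	ExternalRef *string // nil for purely local issues
	Title       string
}

// matchExisting returns the index of the existing issue that an incoming
// issue should update. A match on external_ref takes priority over a match
// on ID. Falling back to ID matching is an assumption of this sketch.
func matchExisting(incoming Issue, existing []Issue) (int, bool) {
	if incoming.ExternalRef != nil && *incoming.ExternalRef != "" {
		for i, e := range existing {
			if e.ExternalRef != nil && *e.ExternalRef == *incoming.ExternalRef {
				return i, true
			}
		}
	}
	for i, e := range existing {
		if e.ID == incoming.ID {
			return i, true
		}
	}
	return -1, false
}
```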
2025-11-02 15:28:09 -08:00
Steve Yegge
e3afecca37 Remove sequential ID code path (bd-aa744b)
- Removed nextSequentialID() and getIDMode() functions
- Removed issue_counters table from schema
- Made SyncAllCounters() a no-op for backward compatibility
- Simplified ID generation to hash-only (adaptive length)
- Removed id_mode config setting
- Removed sequential ID tests and migration code
- Updated CONFIG.md and AGENTS.md to remove sequential ID references

Follow-up bd-2a70 will remove obsolete test files and the renumber command.
2025-10-30 21:51:39 -07:00
Steve Yegge
2b05ec65f8 Implement 6-char progressive hash IDs (bd-166, bd-167)
- Hash ID generation now returns full 64-char SHA256
- Progressive collision handling: 6→7→8 chars on INSERT failure
- Added child_counters table for hierarchical IDs
- Updated all docs to reflect 6-char design
- Collision math: 97% of 1K issues stay at 6 chars
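A minimal sketch of the 6→7→8 progression, assuming hex-encoded SHA-256 and an ID of the form prefix, dash, hash prefix. The `taken` map stands in for a uniqueness failure on INSERT, and `fullHash` and `assignID` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
)

// fullHash returns the full 64-character hex SHA-256 of the given content.
func fullHash(content string) string {
	sum := sha256.Sum256([]byte(content))
	return hex.EncodeToString(sum[:])
}

// assignID tries progressively longer prefixes of the hash (6, then 7, then
// 8 characters) and claims the first one that is not already taken. The
// taken map is an in-memory stand-in for a failed INSERT on a unique column.
func assignID(prefix, content string, taken map[string]bool) (string, error) {
	h := fullHash(content)
	for n := 6; n <= 8; n++ {
		id := prefix + "-" + h[:n]
		if !taken[id] {
			taken[id] = true
			return id, nil
		}
	}
	return "", errors.New("hash prefixes of length 6 to 8 are all taken")
}
```

Because each longer ID extends the shorter one, a collision only lengthens the new issue's ID and leaves existing IDs untouched.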

Next: Implement progressive retry logic in CreateIssue (bd-168)
Amp-Thread-ID: https://ampcode.com/threads/T-9931c1b7-c989-47a1-8e6a-a04469bd937d
Co-authored-by: Amp <amp@ampcode.com>
2025-10-30 14:04:03 -07:00
Steve Yegge
d9eb273e15 Complete bd-95: Add content-addressable identity (ContentHash field)
2025-10-28 18:57:16 -07:00
Steve Yegge
a898df6915 WIP: bd-164 timestamp-only export deduplication (~80% complete)
Implemented content hash-based deduplication to skip exports when only
timestamps changed. Core logic complete, needs export_hashes table wiring.

Completed:
- Added computeIssueContentHash() excluding timestamps
- Created shouldSkipExport() logic
- Updated export loop to skip timestamp-only changes
- Added hash.go with content hashing
- Extended Storage interface

Remaining:
- Complete export_hashes table migration
- Add SetExportHash/GetExportHash to interface
- Revert content_hash from dirty_issues approach
- Wire up hash persistence in export
- Testing

See bd-164 notes for details.

Amp-Thread-ID: https://ampcode.com/threads/T-d70657d1-4433-4f7e-b10a-3fccf8bf17fb
Co-authored-by: Amp <amp@ampcode.com>
2025-10-26 20:29:10 -07:00
Steve Yegge
a28d4fe4c7 Add comments feature (bd-162)
- Add comments table to SQLite schema
- Add Comment type to internal/types
- Implement AddIssueComment and GetIssueComments in storage layer
- Update JSONL export/import to include comments
- Add comments to 'bd show' output
- Create 'bd comments' CLI command structure
- Fix UpdateIssueID to update comments table and defer FK checks
- Add GetIssueComments/AddIssueComment to Storage interface

Note: CLI command needs daemon RPC support (tracked in bd-163)
Amp-Thread-ID: https://ampcode.com/threads/T-ece10dd1-cf64-48ff-9adb-dd304d0bcb25
Co-authored-by: Amp <amp@ampcode.com>
2025-10-19 18:28:41 -07:00
Steve Yegge
65f59e6b01 Add compacted_at_commit field and git commit capture during compaction
- Add compacted_at_commit field to Issue type (bd-405)
- Add database schema and migration for new field
- Create GetCurrentCommitHash() helper function
- Update ApplyCompaction to store git commit hash (bd-395)
- Update compaction calls to capture current commit
- Update tests to verify commit hash storage
- All tests passing

Amp-Thread-ID: https://ampcode.com/threads/T-5518cccb-7fc9-4dcd-ba5a-e22cd10e45d7
Co-authored-by: Amp <amp@ampcode.com>
2025-10-16 17:43:38 -07:00
Steve Yegge
5f6aac5fb1 Implement snapshot creation and restoration for compaction (bd-256)
- Add compaction_snapshots table to schema with proper indexes
- Implement CreateSnapshot, RestoreFromSnapshot, GetSnapshots functions
- Use UTC timestamps throughout
- RestoreFromSnapshot uses transactions with optimistic concurrency control
- Add validation for levels and issue_id matching
- Prevent race conditions with compaction_level guard
- Create bd-268 to explore lightweight SQL alternatives

Amp-Thread-ID: https://ampcode.com/threads/T-3bdd0d6b-9212-4e4e-b22d-f658949df7a9
Co-authored-by: Amp <amp@ampcode.com>
2025-10-15 23:20:21 -07:00
Steve Yegge
1c5a4a9c70 Add compaction schema and candidate identification
- Added compaction columns to issues table (compaction_level, compacted_at, original_size)
- Created issue_snapshots table for snapshot storage before compaction
- Added compaction configuration with opt-in flag (compaction_enabled=false by default)
- Implemented GetTier1Candidates and GetTier2Candidates queries
- Added CheckEligibility validation function
- Comprehensive tests for all compaction queries
- Idempotent migrations for existing databases

Closes bd-252, bd-253, bd-254

Amp-Thread-ID: https://ampcode.com/threads/T-c4d7acd1-c161-4b80-9d80-a0691e8fa87b
Co-authored-by: Amp <amp@ampcode.com>
2025-10-15 22:26:11 -07:00
Steve Yegge
d2b50e6cdc Add closed_at timestamp tracking to issues
- Add closed_at field to Issue type with JSON marshaling
- Implement closed_at timestamp in SQLite storage layer
- Update import/export to handle closed_at field
- Add comprehensive tests for closed_at functionality
- Maintain backward compatibility with existing databases

Amp-Thread-ID: https://ampcode.com/threads/T-f3a7799b-f91e-4432-a690-aae0aed364b3
Co-authored-by: Amp <amp@ampcode.com>
2025-10-15 14:52:29 -07:00
Steve Yegge
4479bc41e6 fix: Update ready_issues VIEW to use hierarchical blocking
The ready_issues VIEW was using old logic that didn't propagate blocking
through parent-child hierarchies. This caused inconsistency with the
GetReadyWork() function for users querying via sqlite3 CLI.
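As an in-memory illustration of hierarchical blocking (the commit itself uses a recursive CTE in SQL), the sketch below treats an issue as blocked when it has an open blocker or when its parent is blocked. The `Issue` type, the status strings, and that exact propagation rule are assumptions of the sketch.

```go
package main

// Issue is a reduced, hypothetical issue record for illustration.
type Issue struct {
	ID        string
	Status    string   // "open" or "closed" in this sketch
	Parent    string   // empty when the issue has no parent
	BlockedBy []string // IDs of issues that block this one
}

// readyIssues returns, in input order, the IDs of open issues that are not
// blocked directly and do not inherit a block from any ancestor.
func readyIssues(issues []Issue) []string {
	byID := make(map[string]Issue, len(issues))
	for _, is := range issues {
		byID[is.ID] = is
	}
	memo := map[string]bool{}
	var blocked func(id string) bool
	blocked = func(id string) bool {
		if v, ok := memo[id]; ok {
			return v
		}
		memo[id] = false // guards against a malformed parent cycle
		is, ok := byID[id]
		if !ok {
			return false
		}
		result := false
		for _, b := range is.BlockedBy {
			if dep, ok := byID[b]; ok && dep.Status != "closed" {
				result = true
				break
			}
		}
		if !result && is.Parent != "" {
			// Blocking propagates down the parent-child hierarchy.
			result = blocked(is.Parent)
		}
		memo[id] = result
		return result
	}
	var ready []string
	for _, is := range issues {
		if is.Status == "open" && !blocked(is.ID) {
			ready = append(ready, is.ID)
		}
	}
	return ready
}
```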

Changes:
- Updated VIEW to use same recursive CTE as GetReadyWork()
- Added test to verify VIEW and function produce identical results
- No migration needed (CREATE VIEW IF NOT EXISTS handles recreation)

The VIEW is documented in WORKFLOW.md for direct SQL queries and is now
consistent with the function-based API.

Resolves: bd-60

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-14 13:07:30 -07:00
Steve Yegge
1dd3109489 perf: Add composite index on dependencies(depends_on_id, type)
The hierarchical blocking query recursively joins on dependencies with
a type filter. Without a composite index, SQLite must scan all
dependencies for a given depends_on_id and filter by type afterward.

With 10k+ issues and many dependencies per issue, this could cause
noticeable slowdowns in ready work calculations.

Changes:
- Added idx_dependencies_depends_on_type composite index to schema
- Added automatic migration for existing databases
- Index creation is silent and requires no user intervention

The recursive CTE now efficiently seeks (depends_on_id, type) pairs
directly instead of post-filtering.

Resolves: bd-59

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-14 13:02:22 -07:00
Steve Yegge
2bd0f11698 feat: Add metadata table for internal state storage
The other agent added a metadata table for storing internal state
like import hashes. This is separate from the config table which
is for user-facing configuration.

🤖 Generated by other agent
2025-10-14 02:51:15 -07:00
Steve Yegge
e6be7dd3e8 feat: Add external_ref field for linking to external issue trackers
Add nullable external_ref TEXT field to link bd issues with external
systems like GitHub Issues, Jira, etc. Includes automatic schema
migration for backward compatibility.

Changes:
- Added external_ref column to issues table with feature-based migration
- Updated Issue struct with ExternalRef *string field
- Added --external-ref flag to bd create and bd update commands
- Updated all SQL queries across the codebase to include external_ref:
  - GetIssue, CreateIssue, UpdateIssue, SearchIssues
  - GetDependencies, GetDependents, GetDependencyTree
  - GetReadyWork, GetBlockedIssues, GetIssuesByLabel
- Added external_ref handling in import/export logic
- Follows existing patterns for nullable fields (sql.NullString)

This enables tracking relationships between bd issues and external
systems without requiring changes to existing databases or JSONL files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-14 02:43:10 -07:00
v4rgas
20e3235435 fix: replace in-memory ID counter with atomic database counter
Replace the in-memory nextID counter with an atomic database-backed
counter using the issue_counters table. This fixes race conditions
when multiple processes create issues concurrently.

Changes:
- Add issue_counters table with atomic INSERT...ON CONFLICT pattern
- Remove in-memory nextID field and sync.Mutex from SQLiteStorage
- Implement getNextIDForPrefix() for atomic ID generation
- Update CreateIssue() to use database counter instead of memory
- Update RemapCollisions() to use database counter for collision resolution
- Clean up old planning and bug documentation files

Fixes the multi-process ID generation race condition tested in
cmd/bd/race_test.go.
2025-10-14 01:18:50 -07:00
Steve Yegge
bafb2801c5 Implement incremental JSONL export with dirty issue tracking
Optimize auto-flush by tracking which issues have changed instead of
exporting the entire database on every flush. For large projects with
1000+ issues, this provides significant performance improvements.

Changes:
- Add dirty_issues table to schema with issue_id and marked_at columns
- Implement dirty tracking functions in new dirty.go file:
  * MarkIssueDirty() - Mark single issue as needing export
  * MarkIssuesDirty() - Batch mark multiple issues efficiently
  * GetDirtyIssues() - Query which issues need export
  * ClearDirtyIssues() - Clear tracking after successful export
  * GetDirtyIssueCount() - Monitor dirty issue count
- Update all CRUD operations to mark affected issues as dirty:
  * CreateIssue, UpdateIssue, DeleteIssue
  * AddDependency, RemoveDependency (marks both issues)
  * AddLabel, RemoveLabel, AddEvent
- Modify export to support incremental mode:
  * Add --incremental flag to export only dirty issues
  * Used by auto-flush for performance
  * Full export still available without flag
- Add Storage interface methods for dirty tracking
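The mark, export, clear cycle can be sketched with an in-memory tracker. This is a stand-in for the dirty_issues table, not the commit's implementation; the method names mirror those listed above, while the receiver type, the `ids` argument to `ClearDirtyIssues`, and the sorted return order are assumptions.

```go
package main

import (
	"sort"
	"sync"
	"time"
)

// dirtyTracker is an in-memory stand-in for a dirty_issues table with
// issue_id and marked_at columns.
type dirtyTracker struct {
	mu     sync.Mutex
	marked map[string]time.Time
}

func newDirtyTracker() *dirtyTracker {
	return &dirtyTracker{marked: make(map[string]time.Time)}
}

// MarkIssueDirty records that an issue needs to be exported.
func (d *dirtyTracker) MarkIssueDirty(id string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.marked[id] = time.Now().UTC()
}

// GetDirtyIssues returns the IDs that need export, sorted for stable output.
func (d *dirtyTracker) GetDirtyIssues() []string {
	d.mu.Lock()
	defer d.mu.Unlock()
	ids := make([]string, 0, len(d.marked))
	for id := range d.marked {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

// ClearDirtyIssues removes only the given IDs, so issues marked while an
// export was in progress stay dirty for the next flush.
func (d *dirtyTracker) ClearDirtyIssues(ids []string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	for _, id := range ids {
		delete(d.marked, id)
	}
}
```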

Performance impact: With incremental export, large databases only write
changed issues instead of regenerating entire JSONL file on every
auto-flush.

Closes bd-39

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-14 00:17:23 -07:00
Steve Yegge
15afb5ad17 Implement JSONL export/import and shift to text-first architecture
This is a fundamental architectural shift from binary SQLite to JSONL as
the source of truth for git workflows.

## New Features

- `bd export --format=jsonl` - Export issues to JSON Lines format
- `bd import` - Import issues from JSONL (create new, update existing)
- `--skip-existing` flag for import to only create new issues

## Architecture Change

**Before:** Binary SQLite database committed to git
**After:** JSONL text files as source of truth, SQLite as ephemeral cache

Benefits:
- Git-friendly text format with clean diffs
- AI-resolvable merge conflicts (append-only is 95% conflict-free)
- Human-readable issue tracking in git
- No binary merge conflicts

## Documentation

- Updated README with JSONL-first workflow and git hooks
- Added TEXT_FORMATS.md analyzing JSONL vs CSV vs binary
- Updated GIT_WORKFLOW.md with historical context
- .gitignore now excludes *.db, includes .beads/*.jsonl

## Implementation Details

- Export sorts issues by ID for consistent diffs
- Import handles both creates and updates atomically
- Proper handling of pointer fields (EstimatedMinutes)
- All tests passing
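A minimal sketch of an ID-sorted JSONL export, including an optional pointer field. The `Issue` shape and JSON field names here are assumptions for illustration, not the project's actual schema.

```go
package main

import (
	"bytes"
	"encoding/json"
	"sort"
)

// Issue is a reduced, hypothetical issue record. EstimatedMinutes is a
// pointer so that "not set" can be told apart from zero.
type Issue struct {
	ID               string `json:"id"`
	Title            string `json:"title"`
	EstimatedMinutes *int   `json:"estimated_minutes,omitempty"`
}

// exportJSONL writes one JSON object per line, ordered by ID so that
// repeated exports of the same data produce identical text and clean diffs.
func exportJSONL(issues []Issue) ([]byte, error) {
	sorted := append([]Issue(nil), issues...) // leave the caller's slice untouched
	sort.Slice(sorted, func(a, b int) bool { return sorted[a].ID < sorted[b].ID })

	var buf bytes.Buffer
	enc := json.NewEncoder(&buf) // Encode appends a newline after each value
	for _, is := range sorted {
		if err := enc.Encode(is); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}
```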

## Breaking Changes

- Database files (*.db) should now be gitignored
- Use export/import workflow for git collaboration
- Git hooks recommended for automation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-12 01:17:50 -07:00
Steve Yegge
704515125d Initial commit: Beads issue tracker with security fixes
Core features:
- Dependency-aware issue tracking with SQLite backend
- Ready work detection (issues with no open blockers)
- Dependency tree visualization
- Cycle detection and prevention
- Full audit trail
- CLI with colored output

Security and correctness fixes applied:
- Fixed SQL injection vulnerability in UpdateIssue (whitelisted fields)
- Fixed race condition in ID generation (added mutex)
- Fixed cycle detection to return full paths (not just issue IDs)
- Added cycle prevention in AddDependency (validates before commit)
- Added comprehensive input validation (priority, status, types, etc.)
- Fixed N+1 query in GetBlockedIssues (using GROUP_CONCAT)
- Improved query building in GetReadyWork (proper string joining)
- Fixed P0 priority filter bug (using Changed() instead of value check)

All critical and major issues from code review have been addressed.

🤖 Generated with Claude Code
2025-10-11 20:07:36 -07:00