diff --git a/COMPACTION_DESIGN.md b/COMPACTION_DESIGN.md deleted file mode 100644 index f3c06689..00000000 --- a/COMPACTION_DESIGN.md +++ /dev/null @@ -1,1654 +0,0 @@ -# Issue Database Compaction Design - -**Status:** Design Phase -**Created:** 2025-10-15 -**Target:** Beads v1.1 - -## Executive Summary - -Add intelligent database compaction to beads that uses Claude Haiku to semantically compress old, closed issues. This keeps the database lightweight and agent-friendly while preserving essential context about past work. The design philosophy: **most work is throwaway, and forensic value decays exponentially with time**. - -### Key Metrics -- **Space savings:** 70-95% reduction in text volume for old issues -- **Cost:** ~$1.10 per 1,000 issues compacted (Haiku pricing) -- **Safety:** Full snapshot system with restore capability -- **Performance:** Batch processing with parallel workers - ---- - -## Motivation - -### The Problem - -Beads databases grow indefinitely: -- Issues accumulate detailed `description`, `design`, `notes`, `acceptance_criteria` fields -- Events table logs every change forever -- Old closed issues (especially those with all dependents closed) rarely need full detail -- Agent context windows work better with concise, relevant information - -### Why This Matters - -1. **Agent Efficiency:** Smaller databases → faster queries → clearer agent thinking -2. **Context Management:** Agents benefit from summaries of old work, not verbose details -3. **Git Performance:** Smaller JSONL exports → faster git operations -4. **Pragmatic Philosophy:** Beads is agent memory, not a historical archive -5. 
**Forensic Decay:** Need for detail decreases exponentially after closure - -### What We Keep - -- Issue ID and title (always) -- Semantic summary of what was done and why -- Key architectural decisions -- Closure outcome -- Full git history in JSONL commits (ultimate backup) -- Restore capability via snapshots - ---- - -## Technical Design - -### Architecture Overview - -``` -┌─────────────────────────────────────────────────────────────────┐ -│ Compaction Pipeline │ -├─────────────────────────────────────────────────────────────────┤ -│ │ -│ 1. Candidate Identification │ -│ ↓ │ -│ • Query closed issues meeting time + dependency criteria │ -│ • Check dependency depth (recursive CTE) │ -│ • Calculate size/savings estimates │ -│ │ -│ 2. Snapshot Creation │ -│ ↓ │ -│ • Store original content in issue_snapshots table │ -│ • Calculate content hash for verification │ -│ • Enable restore capability │ -│ │ -│ 3. Haiku Summarization │ -│ ↓ │ -│ • Batch process with worker pool (5 parallel) │ -│ • Different prompts for Tier 1 vs Tier 2 │ -│ • Handle API errors gracefully │ -│ │ -│ 4. Issue Update │ -│ ↓ │ -│ • Replace verbose fields with summary │ -│ • Set compaction_level and compacted_at │ -│ • Record event (EventCompacted) │ -│ • Mark dirty for JSONL export │ -│ │ -│ 5. Optional: Events Pruning │ -│ ↓ │ -│ • Keep only created/closed events for Tier 2 │ -│ • Archive detailed event history to snapshots │ -│ │ -└─────────────────────────────────────────────────────────────────┘ -``` - -### Compaction Tiers - -#### **Tier 1: Standard Compaction** - -**Eligibility:** -- `status = 'closed'` -- `closed_at >= 30 days ago` (configurable: `compact_tier1_days`) -- All issues that depend on this one (via `blocks` or `parent-child`) are closed -- Dependency check depth: 2 levels (configurable: `compact_tier1_dep_levels`) - -**Process:** -1. Snapshot original content -2. Send to Haiku with 300-word summarization prompt -3. Store summary in `description` -4. 
Clear `design`, `notes`, `acceptance_criteria` -5. Set `compaction_level = 1` -6. Keep all events - -**Output Format (Haiku prompt):** -``` -**Summary:** [2-3 sentences: problem, solution, outcome] -**Key Decisions:** [bullet points of non-obvious choices] -**Resolution:** [how it was closed] -``` - -**Expected Reduction:** 70-85% of original text size - -#### **Tier 2: Aggressive Compaction** - -**Eligibility:** -- Already at `compaction_level = 1` -- `closed_at >= 90 days ago` (configurable: `compact_tier2_days`) -- All dependencies (all 4 types) up to 5 levels deep are closed -- One of: - - ≥100 git commits since `closed_at` (configurable: `compact_tier2_commits`) - - ≥500 new issues created since closure - - Manual override with `--force` - -**Process:** -1. Snapshot Tier 1 content -2. Send to Haiku with 150-word ultra-compression prompt -3. Store single paragraph in `description` -4. Clear all other text fields -5. Set `compaction_level = 2` -6. Prune events: keep only `created` and `closed`, move rest to snapshot - -**Output Format (Haiku prompt):** -``` -Single paragraph (≤150 words): -- What was built/fixed -- Why it mattered -- Lasting architectural impact (if any) -``` - -**Expected Reduction:** 90-95% of original text size - ---- - -### Schema Changes - -#### New Columns on `issues` Table - -```sql -ALTER TABLE issues ADD COLUMN compaction_level INTEGER DEFAULT 0; -ALTER TABLE issues ADD COLUMN compacted_at DATETIME; -ALTER TABLE issues ADD COLUMN original_size INTEGER; -- bytes before compaction -``` - -#### New Table: `issue_snapshots` - -```sql -CREATE TABLE IF NOT EXISTS issue_snapshots ( - id INTEGER PRIMARY KEY AUTOINCREMENT, - issue_id TEXT NOT NULL, - snapshot_time DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP, - compaction_level INTEGER NOT NULL, -- 1 or 2 - original_size INTEGER NOT NULL, - compressed_size INTEGER NOT NULL, - -- JSON blob with original content - original_content TEXT NOT NULL, - -- Optional: compressed events for Tier 2 - 
archived_events TEXT, - FOREIGN KEY (issue_id) REFERENCES issues(id) ON DELETE CASCADE -); - -CREATE INDEX IF NOT EXISTS idx_snapshots_issue ON issue_snapshots(issue_id); -CREATE INDEX IF NOT EXISTS idx_snapshots_level ON issue_snapshots(compaction_level); -``` - -**Snapshot JSON Structure:** -```json -{ - "description": "original description text...", - "design": "original design text...", - "notes": "original notes text...", - "acceptance_criteria": "original criteria text...", - "title": "original title", - "hash": "sha256:abc123..." -} -``` - -#### New Config Keys - -```sql -INSERT INTO config (key, value) VALUES - -- Tier 1 settings - ('compact_tier1_days', '30'), - ('compact_tier1_dep_levels', '2'), - - -- Tier 2 settings - ('compact_tier2_days', '90'), - ('compact_tier2_dep_levels', '5'), - ('compact_tier2_commits', '100'), - ('compact_tier2_new_issues', '500'), - - -- API settings - ('anthropic_api_key', ''), -- Falls back to ANTHROPIC_API_KEY env var - ('compact_model', 'claude-3-5-haiku-20241022'), - - -- Performance settings - ('compact_batch_size', '50'), - ('compact_parallel_workers', '5'), - - -- Safety settings - ('auto_compact_enabled', 'false'), -- Opt-in - ('compact_events_enabled', 'false'), -- Events pruning (Tier 2) - - -- Display settings - ('compact_show_savings', 'true'); -- Show size reduction in output -``` - -#### New Event Type - -```go -const ( - // ... existing event types ... - EventCompacted EventType = "compacted" -) -``` - ---- - -### Haiku Integration - -#### Prompt Templates - -**Tier 1 Prompt:** -``` -Summarize this closed software issue. Preserve key decisions, implementation approach, and outcome. Max 300 words. 
- -Title: {{.Title}} -Type: {{.IssueType}} -Priority: {{.Priority}} - -Description: -{{.Description}} - -Design Notes: -{{.Design}} - -Implementation Notes: -{{.Notes}} - -Acceptance Criteria: -{{.AcceptanceCriteria}} - -Output format: -**Summary:** [2-3 sentences: what problem, what solution, what outcome] -**Key Decisions:** [bullet points of non-obvious choices] -**Resolution:** [how it was closed] -``` - -**Tier 2 Prompt:** -``` -Ultra-compress this old closed issue to ≤150 words. Focus on lasting architectural impact. - -Title: {{.Title}} -Original Summary (already compressed): -{{.Description}} - -Output a single paragraph covering: -- What was built/fixed -- Why it mattered -- Lasting impact (if any) - -If there's no lasting impact, just state what was done and that it's resolved. -``` - -#### API Client Structure - -```go -package compact - -import ( - "github.com/anthropics/anthropic-sdk-go" -) - -type HaikuClient struct { - client *anthropic.Client - model string -} - -func NewHaikuClient(apiKey string) *HaikuClient { - return &HaikuClient{ - client: anthropic.NewClient(apiKey), - model: anthropic.ModelClaude_3_5_Haiku_20241022, - } -} - -func (h *HaikuClient) Summarize(ctx context.Context, prompt string, maxTokens int) (string, error) -``` - -#### Error Handling - -- **Rate limits:** Exponential backoff with jitter -- **API errors:** Log and skip issue (don't fail entire batch) -- **Network failures:** Retry up to 3 times -- **Invalid responses:** Fall back to truncation with warning -- **Context length:** Truncate input if needed (rare, but possible) - ---- - -### Dependency Checking - -Reuse existing recursive CTE logic from `ready_issues` view, adapted for compaction: - -```sql --- Check if issue and N levels of dependents are all closed -WITH RECURSIVE dependent_tree AS ( - -- Base case: the candidate issue - SELECT id, status, 0 as depth - FROM issues - WHERE id = ? 
- - UNION ALL - - -- Recursive case: issues that depend on this one - SELECT i.id, i.status, dt.depth + 1 - FROM dependent_tree dt - JOIN dependencies d ON d.depends_on_id = dt.id - JOIN issues i ON i.id = d.issue_id - WHERE d.type IN ('blocks', 'parent-child') -- Only blocking deps matter - AND dt.depth < ? -- Max depth parameter -) -SELECT CASE - WHEN COUNT(*) = SUM(CASE WHEN status = 'closed' THEN 1 ELSE 0 END) - THEN 1 ELSE 0 END as all_closed -FROM dependent_tree; -``` - -**Performance:** This query is O(N) where N is the number of dependents. With proper indexes, should be <10ms per issue. - ---- - -### Git Integration (Optional) - -For "project time" measurement via commit counting: - -```go -func getCommitsSince(closedAt time.Time) (int, error) { - cmd := exec.Command("git", "rev-list", "--count", - fmt.Sprintf("--since=%s", closedAt.Format(time.RFC3339)), - "HEAD") - output, err := cmd.Output() - if err != nil { - return 0, err - } - return strconv.Atoi(strings.TrimSpace(string(output))) -} -``` - -**Fallback:** If git unavailable or not in a repo, use issue counter delta: -```sql -SELECT last_id FROM issue_counters WHERE prefix = ? 
--- Store at close time, compare at compaction time -``` - ---- - -### CLI Commands - -#### `bd compact` - Main Command - -```bash -bd compact [flags] - -Flags: - --dry-run Show what would be compacted without doing it - --tier int Compaction tier (1 or 2, default: 1) - --all Process all eligible issues (default: preview only) - --id string Compact specific issue by ID - --force Bypass eligibility checks (with --id) - --batch-size int Issues per batch (default: from config) - --workers int Parallel workers (default: from config) - --json JSON output for agents - -Examples: - bd compact --dry-run # Preview Tier 1 candidates - bd compact --dry-run --tier 2 # Preview Tier 2 candidates - bd compact --all # Compact all Tier 1 candidates - bd compact --tier 2 --all # Compact all Tier 2 candidates - bd compact --id bd-42 # Compact specific issue - bd compact --id bd-42 --force # Force compact even if recent -``` - -#### `bd compact --restore` - Restore Compacted Issues - -```bash -bd compact --restore [flags] - -Flags: - --level int Restore to specific snapshot level (default: latest) - --json JSON output - -Examples: - bd compact --restore bd-42 # Restore bd-42 from latest snapshot - bd compact --restore bd-42 --level 1 # Restore from Tier 1 snapshot -``` - -#### `bd compact --stats` - Compaction Statistics - -```bash -bd compact --stats [flags] - -Flags: - --json JSON output - -Example output: - === Compaction Statistics === - - Total Issues: 1,247 - Compacted (Tier 1): 342 (27.4%) - Compacted (Tier 2): 89 (7.1%) - - Database Size: 2.3 MB - Estimated Uncompacted Size: 8.7 MB - Space Savings: 6.4 MB (73.6%) - - Candidates: - Tier 1: 47 issues (est. 320 KB → 64 KB) - Tier 2: 15 issues (est. 
180 KB → 18 KB) - - Estimated Compaction Cost: $0.04 (Haiku) -``` - ---- - -### Output Examples - -#### Dry Run Output - -``` -$ bd compact --dry-run - -=== Tier 1 Compaction Preview === -Eligibility: Closed ≥30 days, 2 levels of dependents closed - -bd-42: Fix authentication bug (P1, bug) - Closed: 45 days ago - Size: 2,341 bytes → ~468 bytes (80% reduction) - Dependents: bd-43, bd-44 (both closed) - -bd-57: Add login form (P2, feature) - Closed: 38 days ago - Size: 1,823 bytes → ~365 bytes (80% reduction) - Dependents: (none) - -bd-58: Refactor auth middleware (P2, task) - Closed: 35 days ago - Size: 3,102 bytes → ~620 bytes (80% reduction) - Dependents: bd-59 (closed) - -... (44 more issues) - -Total: 47 issues -Estimated reduction: 87.3 KB → 17.5 KB (80%) -Estimated cost: $0.03 (Haiku API) - -Run with --all to compact these issues. -``` - -#### Compaction Progress - -``` -$ bd compact --all - -Compacting 47 issues (Tier 1)... - -Creating snapshots... [████████████████████] 47/47 -Calling Haiku API... [████████████████████] 47/47 (12s, $0.027) -Updating issues... [████████████████████] 47/47 - -✓ Successfully compacted 47 issues - Size reduction: 87.3 KB → 18.2 KB (79.1%) - API cost: $0.027 - Time: 14.3s - -Compacted issues will be exported to .beads/issues.jsonl -``` - -#### Show Compacted Issue - -``` -$ bd show bd-42 - -bd-42: Fix authentication bug [CLOSED] 🗜️ - -Status: closed (compacted L1) -Priority: 1 (High) -Type: bug -Closed: 2025-08-31 (45 days ago) -Compacted: 2025-10-15 (saved 1,873 bytes) - -**Summary:** Fixed race condition in JWT token refresh logic causing intermittent -401 errors under high load. Implemented mutex-based locking around token refresh -operations. All users can now stay authenticated reliably during concurrent requests. 
- -**Key Decisions:** -- Used sync.RWMutex instead of channels for simpler reasoning about lock state -- Added exponential backoff to token refresh to prevent thundering herd -- Preserved existing token format for backward compatibility with mobile clients - -**Resolution:** Deployed to production on Aug 31, monitored for 2 weeks with zero -401 errors. Closed after confirming fix with load testing. - ---- -💾 Restore: bd compact --restore bd-42 -📊 Original size: 2,341 bytes | Compressed: 468 bytes (80% reduction) -``` - ---- - -### Safety Mechanisms - -1. **Snapshot-First:** Always create snapshot before modifying issue -2. **Restore Capability:** Full restore from snapshots with `--restore` -3. **Opt-In Auto-Compaction:** Disabled by default (`auto_compact_enabled = false`) -4. **Dry-Run Required:** Preview before committing with `--dry-run` -5. **Git Backup:** JSONL exports preserve full history in git commits -6. **Audit Trail:** `EventCompacted` records what was done and when -7. **Size Verification:** Track original_size and compressed_size for validation -8. **Idempotent:** Re-running compaction on already-compacted issues is safe (no-op) -9. **Graceful Degradation:** API failures don't corrupt data, just skip issues -10. **Reversible:** Restore is always available, even after git push - ---- - -### Testing Strategy - -#### Unit Tests - -1. **Candidate Identification:** - - Issues meeting time criteria - - Dependency depth checking - - Mixed status dependents (some closed, some open) - - Edge case: circular dependencies - -2. **Haiku Client:** - - Mock API responses - - Rate limit handling - - Error recovery - - Prompt rendering - -3. **Snapshot Management:** - - Create snapshot - - Restore from snapshot - - Multiple snapshots per issue (Tier 1 → Tier 2) - - Snapshot integrity (hash verification) - -4. **Size Calculation:** - - Accurate byte counting - - UTF-8 handling - - Empty fields - -#### Integration Tests - -1. 
**End-to-End Compaction:** - - Create test issues - - Age them (mock timestamps) - - Run compaction - - Verify summaries - - Restore and verify - -2. **Batch Processing:** - - Large batches (100+ issues) - - Parallel worker coordination - - Error handling mid-batch - -3. **JSONL Export:** - - Compacted issues export correctly - - Import preserves compaction_level - - Round-trip fidelity - -4. **CLI Commands:** - - All flag combinations - - JSON output parsing - - Error messages - -#### Manual Testing Checklist - -- [ ] Dry-run shows accurate candidates -- [ ] Compaction reduces size as expected -- [ ] Haiku summaries are high quality -- [ ] Restore returns exact original content -- [ ] Stats command shows correct numbers -- [ ] Auto-compaction respects config -- [ ] Git workflow (commit → pull → auto-compact) -- [ ] Multi-machine workflow with compaction -- [ ] API key handling (env var vs config) -- [ ] Rate limit handling under load - ---- - -### Performance Considerations - -#### Scalability - -**Small databases (<1,000 issues):** -- Full scan acceptable -- Compact all eligible in one run -- <1 minute total time - -**Medium databases (1,000-10,000 issues):** -- Batch processing required -- Progress reporting essential -- 5-10 minutes total time - -**Large databases (>10,000 issues):** -- Incremental compaction (process N per run) -- Consider scheduled background job -- 30-60 minutes total time - -#### Optimization Strategies - -1. **Index Usage:** - - `idx_issues_status` - filter closed issues - - `idx_dependencies_depends_on` - dependency traversal - - `idx_snapshots_issue` - restore lookups - -2. **Batch Sizing:** - - Default 50 issues per batch - - Configurable via `compact_batch_size` - - Trade-off: larger batches = fewer commits, more RAM - -3. **Parallel Workers:** - - Default 5 parallel Haiku calls - - Configurable via `compact_parallel_workers` - - Respects Haiku rate limits - -4. 
**Query Optimization:** - - Use prepared statements for snapshots - - Reuse dependency check query - - Avoid N+1 queries in batch operations - ---- - -### Cost Analysis - -#### Haiku Pricing (as of 2025-10-15) - -- Input: $0.25 per million tokens (~$0.0003 per 1K tokens) -- Output: $1.25 per million tokens (~$0.0013 per 1K tokens) - -#### Per-Issue Estimates - -**Tier 1:** -- Input: ~1,000 tokens (full issue content) -- Output: ~400 tokens (summary) -- Cost: ~$0.0008 per issue - -**Tier 2:** -- Input: ~500 tokens (Tier 1 summary) -- Output: ~200 tokens (ultra-compressed) -- Cost: ~$0.0003 per issue - -#### Batch Costs - -| Issues | Tier 1 Cost | Tier 2 Cost | Total | -|--------|-------------|-------------|-------| -| 100 | $0.08 | $0.03 | $0.11 | -| 500 | $0.40 | $0.15 | $0.55 | -| 1,000 | $0.80 | $0.30 | $1.10 | -| 5,000 | $4.00 | $1.50 | $5.50 | - -**Monthly budget (typical project):** -- ~50-100 new issues closed per month -- ~30-60 days later, eligible for Tier 1 -- Monthly cost: $0.04 - $0.08 (negligible) - ---- - -### Configuration Examples - -#### Conservative Setup (manual only) - -```bash -bd init -bd config set compact_tier1_days 60 -bd config set compact_tier2_days 180 -bd config set auto_compact_enabled false - -# Run manually when needed -bd compact --dry-run -bd compact --all # after review -``` - -#### Aggressive Setup (auto-compact) - -```bash -bd config set compact_tier1_days 14 -bd config set compact_tier2_days 45 -bd config set auto_compact_enabled true -bd config set compact_batch_size 100 - -# Auto-compacts on bd stats, bd export -bd stats # triggers compaction if candidates exist -``` - -#### Development Setup (fast feedback) - -```bash -bd config set compact_tier1_days 1 -bd config set compact_tier2_days 3 -bd config set compact_tier1_dep_levels 1 -bd config set compact_tier2_dep_levels 2 - -# Test compaction on recently closed issues -``` - ---- - -### Future Enhancements - -#### Phase 2 (Post-MVP) - -1. 
**Local Model Support:** - - Use Ollama for zero-cost summarization - - Fallback chain: Haiku → Ollama → truncation - -2. **Custom Prompts:** - - User-defined summarization prompts - - Per-project templates - - Domain-specific summaries (e.g., "focus on API changes") - -3. **Selective Preservation:** - - Mark issues as "do not compact" - - Preserve certain labels (e.g., `architecture`, `security`) - - Field-level preservation (e.g., keep design notes, compress others) - -4. **Analytics:** - - Compaction effectiveness over time - - Cost tracking per run - - Quality feedback (user ratings of summaries) - -5. **Smart Scheduling:** - - Auto-detect optimal compaction times - - Avoid compaction during active development - - Weekend/off-hours processing - -6. **Multi-Tier Expansion:** - - Tier 3: Archive to separate file - - Tier 4: Delete (with backup) - - Configurable tier chain - -#### Phase 3 (Advanced) - -1. **Distributed Compaction:** - - Coordinate across multiple machines - - Avoid duplicate work in team settings - - Lock mechanism for compaction jobs - -2. **Incremental Summarization:** - - Re-summarize if issue reopened - - Preserve history of summaries - - Version tracking for prompts - -3. **Search Integration:** - - Full-text search includes summaries - - Boost compacted issues in search results - - Semantic search using embeddings - ---- - -## Implementation Issues - -The following issues should be created in beads to track this work. They're designed to be implemented in dependency order. - ---- - -### Epic: Issue Database Compaction - -**Issue ID:** (auto-generated) -**Title:** Epic: Add intelligent database compaction with Claude Haiku -**Type:** epic -**Priority:** 2 -**Description:** - -Implement multi-tier database compaction using Claude Haiku to semantically compress old, closed issues. This keeps the database lightweight and agent-friendly while preserving essential context. 
- -**Goals:** -- 70-95% space reduction for eligible issues -- Full restore capability via snapshots -- Opt-in with dry-run safety -- ~$1 per 1,000 issues compacted - -**Acceptance Criteria:** -- [ ] Schema migration with snapshots table -- [ ] Haiku integration for summarization -- [ ] Two-tier compaction (30d, 90d) -- [ ] CLI with dry-run, restore, stats -- [ ] Full test coverage -- [ ] Documentation complete - -**Dependencies:** None (this is the epic) - ---- - -### Issue 1: Add compaction schema and migrations - -**Type:** task -**Priority:** 1 -**Dependencies:** Blocks all other compaction work - -**Description:** - -Add database schema support for issue compaction tracking and snapshot storage. - -**Design:** - -Add three columns to `issues` table: -- `compaction_level INTEGER DEFAULT 0` - 0=original, 1=tier1, 2=tier2 -- `compacted_at DATETIME` - when last compacted -- `original_size INTEGER` - bytes before first compaction - -Create `issue_snapshots` table: -```sql -CREATE TABLE issue_snapshots ( - id INTEGER PRIMARY KEY AUTOINCREMENT, - issue_id TEXT NOT NULL, - snapshot_time DATETIME NOT NULL, - compaction_level INTEGER NOT NULL, - original_size INTEGER NOT NULL, - compressed_size INTEGER NOT NULL, - original_content TEXT NOT NULL, -- JSON blob - archived_events TEXT, - FOREIGN KEY (issue_id) REFERENCES issues(id) ON DELETE CASCADE -); -``` - -Add indexes: -- `idx_snapshots_issue` on `issue_id` -- `idx_snapshots_level` on `compaction_level` - -Add migration functions in `internal/storage/sqlite/sqlite.go`: -- `migrateCompactionColumns(db *sql.DB) error` -- `migrateSnapshotsTable(db *sql.DB) error` - -**Acceptance Criteria:** -- [ ] Existing databases migrate automatically -- [ ] New databases include columns by default -- [ ] Migration is idempotent (safe to run multiple times) -- [ ] No data loss during migration -- [ ] Tests verify migration on fresh and existing DBs - -**Estimated Time:** 4 hours - ---- - -### Issue 2: Add compaction configuration keys 
- -**Type:** task -**Priority:** 1 -**Dependencies:** Blocks compaction logic - -**Description:** - -Add configuration keys for compaction behavior with sensible defaults. - -**Implementation:** - -Add to `internal/storage/sqlite/schema.go` initial config: -```sql -INSERT OR IGNORE INTO config (key, value) VALUES - ('compact_tier1_days', '30'), - ('compact_tier1_dep_levels', '2'), - ('compact_tier2_days', '90'), - ('compact_tier2_dep_levels', '5'), - ('compact_tier2_commits', '100'), - ('compact_model', 'claude-3-5-haiku-20241022'), - ('compact_batch_size', '50'), - ('compact_parallel_workers', '5'), - ('auto_compact_enabled', 'false'); -``` - -Add helper functions in `internal/storage/` or `cmd/bd/`: -```go -func getCompactConfig(ctx context.Context, store Storage) (*CompactConfig, error) -``` - -**Acceptance Criteria:** -- [ ] Config keys created on init -- [ ] Existing DBs get defaults on migration -- [ ] `bd config get/set` works with all keys -- [ ] Type validation (days=int, enabled=bool) -- [ ] Documentation in README.md - -**Estimated Time:** 2 hours - ---- - -### Issue 3: Implement candidate identification queries - -**Type:** task -**Priority:** 1 -**Dependencies:** Needs schema and config - -**Description:** - -Write SQL queries to identify issues eligible for Tier 1 and Tier 2 compaction. - -**Design:** - -Create `internal/storage/sqlite/compact.go` with: - -```go -type CompactionCandidate struct { - IssueID string - ClosedAt time.Time - OriginalSize int - EstimatedSize int - DependentCount int -} - -func (s *SQLiteStorage) GetTier1Candidates(ctx context.Context) ([]*CompactionCandidate, error) -func (s *SQLiteStorage) GetTier2Candidates(ctx context.Context) ([]*CompactionCandidate, error) -func (s *SQLiteStorage) CheckEligibility(ctx context.Context, issueID string, tier int) (bool, string, error) -``` - -Implement recursive dependency checking using CTE similar to `ready_issues` view. 
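
The semantics of that recursive check can be mirrored in plain Go for unit testing. A minimal sketch, with illustrative names (`issue`, `dep`, `allDependentsClosed` are hypothetical; the real implementation runs the recursive CTE against SQLite):

```go
package main

// issue and dep are toy in-memory stand-ins for rows in the issues and
// dependencies tables.
type issue struct {
	id     string
	closed bool
}

type dep struct {
	issueID     string // the issue that depends on...
	dependsOnID string // ...this one
	depType     string // "blocks", "parent-child", etc.
}

// allDependentsClosed walks issues that depend on id, directly or
// transitively, up to maxDepth levels and reports whether every one of
// them (including id itself) is closed — the same predicate the CTE's
// COUNT(*) = SUM(closed) comparison computes.
func allDependentsClosed(issues map[string]issue, deps []dep, id string, maxDepth int) bool {
	frontier := []string{id}
	seen := map[string]bool{id: true}
	for depth := 0; depth <= maxDepth; depth++ {
		var next []string
		for _, cur := range frontier {
			if !issues[cur].closed {
				return false
			}
			if depth == maxDepth {
				continue // matches the dt.depth < ? cutoff
			}
			for _, d := range deps {
				// Only blocking dependency types gate compaction.
				if d.dependsOnID == cur && !seen[d.issueID] &&
					(d.depType == "blocks" || d.depType == "parent-child") {
					seen[d.issueID] = true
					next = append(next, d.issueID)
				}
			}
		}
		frontier = next
	}
	return true
}
```

This toy version makes the depth-cutoff behavior easy to test: an open dependent beyond `maxDepth` does not block eligibility, exactly as in the SQL.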
- -**Acceptance Criteria:** -- [ ] Tier 1 query filters by days and dependency depth -- [ ] Tier 2 query includes commit/issue count checks -- [ ] Dependency checking handles circular deps gracefully -- [ ] Performance: <100ms for 10,000 issue database -- [ ] Tests cover edge cases (no deps, circular deps, mixed status) - -**Estimated Time:** 6 hours - ---- - -### Issue 4: Create Haiku client and prompt templates - -**Type:** task -**Priority:** 1 -**Dependencies:** None (can work in parallel) - -**Description:** - -Implement Claude Haiku API client with template-based prompts for Tier 1 and Tier 2 summarization. - -**Implementation:** - -Create `internal/compact/haiku.go`: - -```go -type HaikuClient struct { - client *anthropic.Client - model string -} - -func NewHaikuClient(apiKey string) (*HaikuClient, error) -func (h *HaikuClient) SummarizeTier1(ctx context.Context, issue *types.Issue) (string, error) -func (h *HaikuClient) SummarizeTier2(ctx context.Context, issue *types.Issue) (string, error) -``` - -Use text/template for prompt rendering. - -Add error handling: -- Rate limit retry with exponential backoff -- Network errors: 3 retries -- Invalid responses: return error, don't corrupt data - -**Acceptance Criteria:** -- [ ] API key from env var or config (env takes precedence) -- [ ] Prompts render correctly with template -- [ ] Rate limiting handled gracefully -- [ ] Mock tests for API calls -- [ ] Real integration test (optional, requires API key) - -**Estimated Time:** 6 hours - ---- - -### Issue 5: Implement snapshot creation and restoration - -**Type:** task -**Priority:** 1 -**Dependencies:** Needs schema changes - -**Description:** - -Implement snapshot creation before compaction and restoration capability. 
- -**Implementation:** - -Add to `internal/storage/sqlite/compact.go`: - -```go -type Snapshot struct { - ID int64 - IssueID string - SnapshotTime time.Time - CompactionLevel int - OriginalSize int - CompressedSize int - OriginalContent string // JSON - ArchivedEvents string // JSON, nullable -} - -func (s *SQLiteStorage) CreateSnapshot(ctx context.Context, issue *types.Issue, level int) error -func (s *SQLiteStorage) RestoreFromSnapshot(ctx context.Context, issueID string, level int) error -func (s *SQLiteStorage) GetSnapshots(ctx context.Context, issueID string) ([]*Snapshot, error) -``` - -Snapshot JSON structure: -```json -{ - "description": "...", - "design": "...", - "notes": "...", - "acceptance_criteria": "...", - "title": "..." -} -``` - -**Acceptance Criteria:** -- [ ] Snapshot created atomically with compaction -- [ ] Restore returns exact original content -- [ ] Multiple snapshots per issue supported (Tier 1 → Tier 2) -- [ ] JSON encoding handles special characters -- [ ] Size calculation is accurate (UTF-8 bytes) - -**Estimated Time:** 5 hours - ---- - -### Issue 6: Implement Tier 1 compaction logic - -**Type:** task -**Priority:** 1 -**Dependencies:** Needs Haiku client, snapshots, candidate queries - -**Description:** - -Implement the core Tier 1 compaction process: snapshot → summarize → update. - -**Implementation:** - -Add to `internal/compact/compactor.go`: - -```go -type Compactor struct { - store storage.Storage - haiku *HaikuClient - config *CompactConfig -} - -func New(store storage.Storage, apiKey string, config *CompactConfig) (*Compactor, error) -func (c *Compactor) CompactTier1(ctx context.Context, issueID string) error -func (c *Compactor) CompactTier1Batch(ctx context.Context, issueIDs []string) error -``` - -Process: -1. Verify eligibility -2. Calculate original size -3. Create snapshot -4. Call Haiku for summary -5. Update issue (description = summary, clear design/notes/criteria) -6. Set compaction_level = 1, compacted_at = now -7. 
Record EventCompacted -8. Mark dirty for export - -**Acceptance Criteria:** -- [ ] Single issue compaction works end-to-end -- [ ] Batch processing with parallel workers -- [ ] Errors don't corrupt database (transaction rollback) -- [ ] EventCompacted includes size savings -- [ ] Dry-run mode (identify + size estimate only) - -**Estimated Time:** 8 hours - ---- - -### Issue 7: Implement Tier 2 compaction logic - -**Type:** task -**Priority:** 2 -**Dependencies:** Needs Tier 1 working - -**Description:** - -Implement Tier 2 ultra-compression: more aggressive summarization and optional event pruning. - -**Implementation:** - -Add to `internal/compact/compactor.go`: - -```go -func (c *Compactor) CompactTier2(ctx context.Context, issueID string) error -func (c *Compactor) CompactTier2Batch(ctx context.Context, issueIDs []string) error -``` - -Process: -1. Verify issue is at compaction_level = 1 -2. Check Tier 2 eligibility (days, deps, commits/issues) -3. Create Tier 2 snapshot -4. Call Haiku with ultra-compression prompt -5. Update issue (description = single paragraph, clear all else) -6. Set compaction_level = 2 -7. Optionally prune events (keep created/closed, archive rest) - -**Acceptance Criteria:** -- [ ] Requires existing Tier 1 compaction -- [ ] Git commit counting works (with fallback) -- [ ] Events optionally pruned (config: compact_events_enabled) -- [ ] Archived events stored in snapshot -- [ ] Size reduction 90-95% - -**Estimated Time:** 6 hours - ---- - -### Issue 8: Add `bd compact` CLI command - -**Type:** task -**Priority:** 1 -**Dependencies:** Needs Tier 1 compaction logic - -**Description:** - -Implement the `bd compact` command with dry-run, batch processing, and progress reporting. 
- -**Implementation:** - -Create `cmd/bd/compact.go`: - -```go -var compactCmd = &cobra.Command{ - Use: "compact", - Short: "Compact old closed issues to save space", - Long: `...`, -} - -var ( - compactDryRun bool - compactTier int - compactAll bool - compactID string - compactForce bool - compactBatchSize int - compactWorkers int -) - -func init() { - compactCmd.Flags().BoolVar(&compactDryRun, "dry-run", false, "Preview without compacting") - compactCmd.Flags().IntVar(&compactTier, "tier", 1, "Compaction tier (1 or 2)") - compactCmd.Flags().BoolVar(&compactAll, "all", false, "Process all candidates") - compactCmd.Flags().StringVar(&compactID, "id", "", "Compact specific issue") - compactCmd.Flags().BoolVar(&compactForce, "force", false, "Force compact (bypass checks)") - // ... more flags -} -``` - -**Output:** -- Dry-run: Table of candidates with size estimates -- Actual run: Progress bar with batch updates -- Summary: Count, size saved, cost, time - -**Acceptance Criteria:** -- [ ] `--dry-run` shows accurate preview -- [ ] `--all` processes all candidates -- [ ] `--id` compacts single issue -- [ ] `--force` bypasses eligibility checks (with --id) -- [ ] Progress bar for batches -- [ ] JSON output with `--json` -- [ ] Exit code: 0=success, 1=error - -**Estimated Time:** 6 hours - ---- - -### Issue 9: Add `bd compact --restore` functionality - -**Type:** task -**Priority:** 2 -**Dependencies:** Needs snapshots and CLI - -**Description:** - -Implement restore command to undo compaction from snapshots. - -**Implementation:** - -Add to `cmd/bd/compact.go`: - -```go -var compactRestore string - -compactCmd.Flags().StringVar(&compactRestore, "restore", "", "Restore issue from snapshot") -``` - -Process: -1. Load snapshot for issue -2. Parse JSON content -3. Update issue with original content -4. Set compaction_level = 0, compacted_at = NULL -5. Record EventRestored -6. 
Mark dirty - -**Acceptance Criteria:** -- [ ] Restores exact original content -- [ ] Handles multiple snapshots (prompt user or use latest) -- [ ] `--level` flag to choose snapshot -- [ ] Updates compaction_level correctly -- [ ] Exports restored content to JSONL - -**Estimated Time:** 4 hours - ---- - -### Issue 10: Add `bd compact --stats` command - -**Type:** task -**Priority:** 2 -**Dependencies:** Needs compaction working - -**Description:** - -Add statistics command showing compaction status and potential savings. - -**Implementation:** - -```go -var compactStats bool - -compactCmd.Flags().BoolVar(&compactStats, "stats", false, "Show compaction statistics") -``` - -Output: -- Total issues, by compaction level -- Current DB size vs estimated uncompacted size -- Space savings (MB and %) -- Candidates for each tier with estimates -- Estimated API cost - -**Acceptance Criteria:** -- [ ] Accurate counts by compaction_level -- [ ] Size calculations include all text fields -- [ ] Shows candidates with eligibility reasons -- [ ] Cost estimation based on Haiku pricing -- [ ] JSON output supported - -**Estimated Time:** 4 hours - ---- - -### Issue 11: Add EventCompacted to event system - -**Type:** task -**Priority:** 2 -**Dependencies:** Needs schema changes - -**Description:** - -Add new event type for tracking compaction in audit trail. - -**Implementation:** - -1. Add to `internal/types/types.go`: -```go -const EventCompacted EventType = "compacted" -``` - -2. Record event during compaction: -```go -eventData := map[string]interface{}{ - "tier": tier, - "original_size": originalSize, - "compressed_size": compressedSize, - "reduction_pct": (1 - float64(compressedSize)/float64(originalSize)) * 100, -} -``` - -3. 
Show in `bd show` output: -``` -Events: - 2025-10-15: compacted (tier 1, saved 1.8KB, 80%) - 2025-08-31: closed by alice - 2025-08-20: created by alice -``` - -**Acceptance Criteria:** -- [ ] Event includes tier and size info -- [ ] Shows in event history -- [ ] Exports to JSONL -- [ ] `bd show` displays compaction marker - -**Estimated Time:** 3 hours - ---- - -### Issue 12: Add compaction indicator to `bd show` - -**Type:** task -**Priority:** 2 -**Dependencies:** Needs compaction working - -**Description:** - -Update `bd show` command to display compaction status prominently. - -**Implementation:** - -Add to issue display: -``` -bd-42: Fix authentication bug [CLOSED] 🗜️ - -Status: closed (compacted L1) -... - ---- -💾 Restore: bd compact --restore bd-42 -📊 Original: 2,341 bytes | Compressed: 468 bytes (80% reduction) -🗜️ Compacted: 2025-10-15 (Tier 1) -``` - -Show different emoji for tiers: -- Tier 1: 🗜️ -- Tier 2: 📦 - -**Acceptance Criteria:** -- [ ] Compaction status visible in title -- [ ] Footer shows size savings -- [ ] Restore command shown -- [ ] Works with `--json` output - -**Estimated Time:** 2 hours - ---- - -### Issue 13: Write compaction tests - -**Type:** task -**Priority:** 1 -**Dependencies:** Needs all compaction logic - -**Description:** - -Comprehensive test suite for compaction functionality. - -**Test Coverage:** - -1. **Candidate Identification:** - - Eligibility by time - - Dependency depth checking - - Mixed status dependents - - Edge cases (no deps, circular) - -2. **Snapshots:** - - Create and restore - - Multiple snapshots per issue - - Content integrity - -3. **Tier 1 Compaction:** - - Single issue - - Batch processing - - Error handling - -4. **Tier 2 Compaction:** - - Requires Tier 1 - - Events pruning - - Commit counting fallback - -5. **CLI:** - - All flag combinations - - Dry-run accuracy - - JSON output - -6. 
**Integration:** - - End-to-end flow - - JSONL export/import - - Restore verification - -**Acceptance Criteria:** -- [ ] Test coverage >80% -- [ ] All edge cases covered -- [ ] Mock Haiku API in tests -- [ ] Integration tests pass -- [ ] `go test ./...` passes - -**Estimated Time:** 8 hours - ---- - -### Issue 14: Add compaction documentation - -**Type:** task -**Priority:** 2 -**Dependencies:** All features complete - -**Description:** - -Document compaction feature in README and create COMPACTION.md guide. - -**Content:** - -Update README.md: -- Add to Features section -- CLI examples -- Configuration guide - -Create COMPACTION.md: -- How compaction works -- When to use each tier -- Cost analysis -- Safety mechanisms -- Troubleshooting - -Create examples/compaction/: -- Example workflow -- Cron job setup -- Auto-compaction script - -**Acceptance Criteria:** -- [ ] README.md updated -- [ ] COMPACTION.md comprehensive -- [ ] Examples work as documented -- [ ] Screenshots/examples included -- [ ] API key setup documented - -**Estimated Time:** 4 hours - ---- - -### Issue 15: Optional: Implement auto-compaction - -**Type:** task -**Priority:** 3 (nice-to-have) -**Dependencies:** Needs all compaction working - -**Description:** - -Implement automatic compaction triggered by certain operations when enabled via config. - -**Implementation:** - -Trigger points (when `auto_compact_enabled = true`): -1. `bd stats` - check and compact if candidates exist -2. `bd export` - before exporting -3. 
Background timer (optional, via daemon)
-
-Add:
-```go
-func (s *SQLiteStorage) AutoCompact(ctx context.Context) error {
-    enabled, _ := s.GetConfig(ctx, "auto_compact_enabled")
-    if enabled != "true" {
-        return nil
-    }
-
-    // Run Tier 1 compaction on all candidates.
-    // Limit to batch_size to avoid long-running operations.
-    return nil // TODO: wire up candidate query + Compactor.CompactTier1Batch
-}
-```
-
-**Acceptance Criteria:**
-- [ ] Respects auto_compact_enabled config
-- [ ] Limits batch size to avoid blocking
-- [ ] Logs compaction activity
-- [ ] Can be disabled per-command with `--no-auto-compact`
-
-**Estimated Time:** 4 hours
-
----
-
-### Issue 16: Optional: Add git commit counting
-
-**Type:** task
-**Priority:** 3 (nice-to-have)
-**Dependencies:** Needs Tier 2 logic
-
-**Description:**
-
-Implement git commit counting for "project time" measurement as an alternative to calendar time.
-
-**Implementation:**
-
-```go
-func getCommitsSince(closedAt time.Time) (int, error) {
-    cmd := exec.Command("git", "rev-list", "--count",
-        fmt.Sprintf("--since=%s", closedAt.Format(time.RFC3339)), "HEAD")
-    output, err := cmd.Output()
-    if err != nil {
-        return 0, err // Not in a git repo, or git not available
-    }
-    return strconv.Atoi(strings.TrimSpace(string(output)))
-}
-```
-
-Fall back to the issue counter delta if git is unavailable.
- -**Acceptance Criteria:** -- [ ] Counts commits since closed_at -- [ ] Handles git not available gracefully -- [ ] Fallback to issue counter works -- [ ] Configurable via compact_tier2_commits -- [ ] Tested with real git repo - -**Estimated Time:** 3 hours - ---- - -## Success Metrics - -### Technical Metrics - -- [ ] 70-85% size reduction for Tier 1 -- [ ] 90-95% size reduction for Tier 2 -- [ ] <100ms candidate identification query -- [ ] <2s per issue compaction (Haiku latency) -- [ ] Zero data loss (all restore tests pass) - -### Quality Metrics - -- [ ] Haiku summaries preserve key information -- [ ] Developers can understand compacted issues -- [ ] Restore returns exact original content -- [ ] No corruption in multi-machine workflows - -### Operational Metrics - -- [ ] Cost: <$1.50 per 1,000 issues -- [ ] Dry-run accuracy: 95%+ estimate correctness -- [ ] Error rate: <1% API failures (with retry) -- [ ] User adoption: Docs clear, examples work - ---- - -## Rollout Plan - -### Phase 1: Alpha (Internal Testing) - -1. Merge compaction feature to main -2. Test on beads' own database -3. Verify JSONL export/import -4. Validate Haiku summaries -5. Fix any critical bugs - -### Phase 2: Beta (Opt-In) - -1. Announce in README (opt-in, experimental) -2. Gather feedback from early adopters -3. Iterate on prompt templates -4. Add telemetry (optional, with consent) - -### Phase 3: Stable (Default Disabled) - -1. Mark feature as stable -2. Keep auto_compact_enabled = false by default -3. Encourage manual `bd compact --dry-run` first -4. Document in quickstart guide - -### Phase 4: Mature (Consider Auto-Enable) - -1. After 6+ months of stability -2. Consider auto-compaction for new users -3. 
Provide migration guide for disabling - ---- - -## Risks and Mitigations - -| Risk | Impact | Likelihood | Mitigation | -|------|--------|------------|------------| -| Haiku summaries lose critical info | High | Medium | Manual review in dry-run, restore capability, improve prompts | -| API rate limits during batch | Medium | Medium | Exponential backoff, respect rate limits, batch sizing | -| JSONL merge conflicts increase | Medium | Low | Compaction is deterministic per issue, git handles well | -| Users accidentally compress important issues | High | Low | Dry-run required, restore available, snapshots permanent | -| Cost higher than expected | Low | Low | Dry-run shows estimates, configurable batch sizes | -| Schema migration fails | High | Very Low | Idempotent migrations, tested on existing DBs | - ---- - -## Open Questions - -1. **Should compaction be reversible forever, or expire snapshots?** - - Recommendation: Keep snapshots indefinitely (disk is cheap) - -2. **Should we compress snapshots themselves (gzip)?** - - Recommendation: Not in MVP, add if storage becomes issue - -3. **Should tier selection be automatic or manual?** - - Recommendation: Manual in MVP, auto-tier in future - -4. **How to handle issues compacted on one machine but not another?** - - Answer: JSONL export includes compaction_level, imports preserve it - -5. **Should we support custom models (Sonnet, Opus)?** - - Recommendation: Haiku only in MVP, add later if needed - ---- - -## Appendix: Example Workflow - -### Typical Monthly Compaction - -```bash -# 1. Check what's eligible -$ bd compact --dry-run -=== Tier 1 Candidates === -42 issues eligible (closed >30 days, deps closed) -Est. reduction: 127 KB → 25 KB (80%) -Est. cost: $0.03 - -# 2. Review candidates manually -$ bd list --status closed --json | jq 'map(select(.compaction_level == 0))' - -# 3. Compact Tier 1 -$ bd compact --all -✓ Compacted 42 issues in 18s ($0.03) - -# 4. 
Check Tier 2 candidates (optional) -$ bd compact --dry-run --tier 2 -=== Tier 2 Candidates === -8 issues eligible (closed >90 days, 100+ commits since) -Est. reduction: 45 KB → 4 KB (91%) -Est. cost: $0.01 - -# 5. Compact Tier 2 -$ bd compact --all --tier 2 -✓ Compacted 8 issues in 6s ($0.01) - -# 6. Export and commit -$ bd export -o .beads/issues.jsonl -$ git add .beads/issues.jsonl -$ git commit -m "Compact 50 old issues (saved 143 KB)" -$ git push - -# 7. View stats -$ bd compact --stats -Total Space Saved: 143 KB (82% reduction) -Database Size: 2.1 MB (down from 2.3 MB) -``` - ---- - -## Conclusion - -This design provides a comprehensive, safe, and cost-effective way to keep beads databases lightweight while preserving essential context. The two-tier approach balances aggressiveness with safety, and the snapshot system ensures full reversibility. - -The use of Claude Haiku for semantic compression is key - it preserves meaning rather than just truncating text. At ~$1 per 1,000 issues, the cost is negligible for the value provided. - -Implementation is straightforward with clear phases and well-defined issues. The MVP (Tier 1 only) can be delivered in ~40 hours of work, with Tier 2 and enhancements following incrementally. - -This aligns perfectly with beads' philosophy: **pragmatic, agent-focused, and evolutionarily designed**. diff --git a/COMPACTION_SUMMARY.md b/COMPACTION_SUMMARY.md deleted file mode 100644 index 614427c9..00000000 --- a/COMPACTION_SUMMARY.md +++ /dev/null @@ -1,285 +0,0 @@ -# Issue Compaction - Quick Reference - -**Status:** Design Complete, Ready for Implementation -**Target:** Beads v1.1 -**Estimated Effort:** 40-60 hours (MVP) - -## What Is This? - -Add intelligent database compaction to beads using Claude Haiku to semantically compress old, closed issues. Keep your database lightweight while preserving the meaning of past work. - -## Why? 
- -- **Agent Efficiency:** Smaller DBs → faster queries → clearer thinking -- **Context Management:** Agents need summaries, not verbose details -- **Pragmatic:** Most work is throwaway; forensic value decays exponentially -- **Git-Friendly:** Smaller JSONL exports - -## How It Works - -### Two-Tier Compression - -**Tier 1: Standard (30 days)** -- Closed ≥30 days, all dependents closed (2 levels deep) -- Haiku summarizes to ~300 words -- Keeps: title, summary with key decisions -- Clears: verbose design/notes/criteria -- **Result:** 70-85% space reduction - -**Tier 2: Aggressive (90 days)** -- Already Tier 1 compressed -- Closed ≥90 days, deep dependencies closed (5 levels) -- ≥100 git commits since closure (measures "project time") -- Ultra-compress to single ≤150 word paragraph -- Optionally prunes old events -- **Result:** 90-95% space reduction - -### Safety First - -- **Snapshots:** Full original content saved before compaction -- **Restore:** `bd compact --restore ` undoes compaction -- **Dry-Run:** `bd compact --dry-run` previews without changing anything -- **Git Backup:** JSONL commits preserve full history -- **Opt-In:** Disabled by default (`auto_compact_enabled = false`) - -## CLI - -```bash -# Preview candidates -bd compact --dry-run - -# Compact Tier 1 -bd compact --all - -# Compact Tier 2 -bd compact --tier 2 --all - -# Compact specific issue -bd compact --id bd-42 - -# Restore compacted issue -bd compact --restore bd-42 - -# Show statistics -bd compact --stats -``` - -## Cost - -**Haiku Pricing:** -- ~$0.0008 per issue (Tier 1) -- ~$0.0003 per issue (Tier 2) -- **~$1.10 per 1,000 issues total** - -Negligible for typical usage (~$0.05/month for active project). 
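Given the per-issue prices above, the dry-run cost estimate reduces to simple arithmetic (a sketch; the constants restate the assumed pricing, not measured values):

```go
package main

import "fmt"

// Assumed per-issue API cost from the Haiku pricing above.
const (
	tier1CostUSD = 0.0008
	tier2CostUSD = 0.0003
)

// estimateCostUSD projects API spend for a compaction run.
func estimateCostUSD(tier1Issues, tier2Issues int) float64 {
	return float64(tier1Issues)*tier1CostUSD + float64(tier2Issues)*tier2CostUSD
}

func main() {
	// 1,000 issues through both tiers matches the quoted ~$1.10 per 1,000.
	fmt.Printf("$%.2f\n", estimateCostUSD(1000, 1000)) // $1.10
}
```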
- -## Example Output - -### Before Compaction -``` -bd-42: Fix authentication bug [CLOSED] - -Description: (800 words about the problem) - -Design: (1,200 words of implementation notes) - -Notes: (500 words of testing/deployment details) - -Acceptance Criteria: (300 words of requirements) - -Total: 2,341 bytes -``` - -### After Tier 1 Compaction -``` -bd-42: Fix authentication bug [CLOSED] 🗜️ - -**Summary:** Fixed race condition in JWT token refresh logic causing -intermittent 401 errors. Implemented mutex-based locking. All users -can now stay authenticated reliably. - -**Key Decisions:** -- Used sync.RWMutex for simpler reasoning -- Added exponential backoff to prevent thundering herd -- Preserved token format for backward compatibility - -**Resolution:** Deployed Aug 31, zero 401s after 2 weeks monitoring. - ---- -💾 Restore: bd compact --restore bd-42 -📊 Original: 2,341 bytes | Compressed: 468 bytes (80% reduction) - -Total: 468 bytes (80% reduction) -``` - -## Implementation Plan - -### Phase 1: Foundation (16 hours) -1. Schema changes (compaction columns, snapshots table) -2. Config keys -3. Candidate identification queries -4. Migration testing - -### Phase 2: Core Compaction (20 hours) -5. Haiku client with prompts -6. Snapshot creation/restoration -7. Tier 1 compaction logic -8. CLI command (`bd compact`) - -### Phase 3: Advanced Features (12 hours) -9. Tier 2 compaction -10. Restore functionality -11. Stats command -12. Event tracking - -### Phase 4: Polish (12 hours) -13. Comprehensive tests -14. Documentation (README, COMPACTION.md) -15. 
Examples and workflows - -**Total MVP:** ~60 hours - -### Optional Enhancements (Phase 5+) -- Auto-compaction triggers -- Git commit counting -- Local model support (Ollama) -- Custom prompt templates - -## Architecture - -``` -Issue (closed 45 days ago) - ↓ -Eligibility Check - ↓ (passes: all deps closed, >30 days) -Create Snapshot - ↓ -Call Haiku API - ↓ (returns: structured summary) -Update Issue - ↓ -Record Event - ↓ -Export to JSONL -``` - -## Key Files - -**New Files:** -- `internal/compact/haiku.go` - Haiku API client -- `internal/compact/compactor.go` - Core compaction logic -- `internal/storage/sqlite/compact.go` - Candidate queries -- `cmd/bd/compact.go` - CLI command - -**Modified Files:** -- `internal/storage/sqlite/schema.go` - Add snapshots table -- `internal/storage/sqlite/sqlite.go` - Add migrations -- `internal/types/types.go` - Add EventCompacted -- `cmd/bd/show.go` - Display compaction status - -## Schema Changes - -```sql --- Add to issues table -ALTER TABLE issues ADD COLUMN compaction_level INTEGER DEFAULT 0; -ALTER TABLE issues ADD COLUMN compacted_at DATETIME; -ALTER TABLE issues ADD COLUMN original_size INTEGER; - --- New table -CREATE TABLE issue_snapshots ( - id INTEGER PRIMARY KEY AUTOINCREMENT, - issue_id TEXT NOT NULL, - snapshot_time DATETIME NOT NULL, - compaction_level INTEGER NOT NULL, - original_size INTEGER NOT NULL, - compressed_size INTEGER NOT NULL, - original_content TEXT NOT NULL, -- JSON - archived_events TEXT, - FOREIGN KEY (issue_id) REFERENCES issues(id) ON DELETE CASCADE -); -``` - -## Configuration - -```sql -INSERT INTO config (key, value) VALUES - ('compact_tier1_days', '30'), - ('compact_tier1_dep_levels', '2'), - ('compact_tier2_days', '90'), - ('compact_tier2_dep_levels', '5'), - ('compact_tier2_commits', '100'), - ('compact_model', 'claude-3-5-haiku-20241022'), - ('compact_batch_size', '50'), - ('compact_parallel_workers', '5'), - ('auto_compact_enabled', 'false'); -``` - -## Haiku Prompts - -### Tier 1 (300 
words) -``` -Summarize this closed software issue. Preserve key decisions, -implementation approach, and outcome. Max 300 words. - -Title: {{.Title}} -Type: {{.IssueType}} -Priority: {{.Priority}} - -Description: {{.Description}} -Design Notes: {{.Design}} -Implementation Notes: {{.Notes}} -Acceptance Criteria: {{.AcceptanceCriteria}} - -Output format: -**Summary:** [2-3 sentences: problem, solution, outcome] -**Key Decisions:** [bullet points of non-obvious choices] -**Resolution:** [how it was closed] -``` - -### Tier 2 (150 words) -``` -Ultra-compress this old closed issue to ≤150 words. -Focus on lasting architectural impact. - -Title: {{.Title}} -Original Summary: {{.Description}} - -Output a single paragraph covering: -- What was built/fixed -- Why it mattered -- Lasting impact (if any) -``` - -## Success Metrics - -- **Space:** 70-85% reduction (Tier 1), 90-95% (Tier 2) -- **Quality:** Summaries preserve essential context -- **Safety:** 100% restore success rate -- **Performance:** <100ms candidate query, ~2s per issue (Haiku) -- **Cost:** <$1.50 per 1,000 issues - -## Risks & Mitigations - -| Risk | Mitigation | -|------|------------| -| Info loss in summaries | Dry-run review, restore capability, prompt tuning | -| API rate limits | Exponential backoff, respect limits, batch sizing | -| Accidental compression | Dry-run required, snapshots permanent | -| JSONL conflicts | Compaction is deterministic, git handles well | - -## Next Steps - -1. **Review design:** `COMPACTION_DESIGN.md` (comprehensive spec) -2. **File issues:** `bd create -f compaction-issues.md` (16 issues) -3. **Start implementation:** Begin with schema + config (Issues 1-2) -4. **Iterate:** Build MVP (Tier 1 only), then enhance - -## Questions? 
- -- Full design: `COMPACTION_DESIGN.md` -- Issue breakdown: `compaction-issues.md` -- Reference implementation in Phase 1-4 above - ---- - -**Ready to implement in another session!** diff --git a/compaction-issues.md b/compaction-issues.md deleted file mode 100644 index 8ab20124..00000000 --- a/compaction-issues.md +++ /dev/null @@ -1,779 +0,0 @@ -# Compaction Feature Issues - -This file contains all issues for the database compaction feature, ready to import with: -```bash -bd create -f compaction-issues.md -``` - ---- - -## Epic: Add intelligent database compaction with Claude Haiku - -### Type -epic - -### Priority -2 - -### Description - -Implement multi-tier database compaction using Claude Haiku to semantically compress old, closed issues. This keeps the database lightweight and agent-friendly while preserving essential context. - -Goals: -- 70-95% space reduction for eligible issues -- Full restore capability via snapshots -- Opt-in with dry-run safety -- ~$1 per 1,000 issues compacted - -### Acceptance Criteria -- Schema migration with snapshots table -- Haiku integration for summarization -- Two-tier compaction (30d, 90d) -- CLI with dry-run, restore, stats -- Full test coverage -- Documentation complete - -### Labels -compaction, epic, haiku, v1.1 - ---- - -## Add compaction schema and migrations - -### Type -task - -### Priority -1 - -### Description - -Add database schema support for issue compaction tracking and snapshot storage. 
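One way to keep this migration idempotent is to diff the desired columns against what the `issues` table already has and emit only the missing `ALTER TABLE` statements (a sketch; `missingColumnDDL` is a hypothetical helper, and real code would read the existing columns from `PRAGMA table_info(issues)`):

```go
package main

import "fmt"

// compactionColumns maps each new column to the DDL that adds it.
var compactionColumns = map[string]string{
	"compaction_level": "ALTER TABLE issues ADD COLUMN compaction_level INTEGER DEFAULT 0",
	"compacted_at":     "ALTER TABLE issues ADD COLUMN compacted_at DATETIME",
	"original_size":    "ALTER TABLE issues ADD COLUMN original_size INTEGER",
}

// missingColumnDDL returns ALTER statements only for columns not yet present,
// so re-running the migration is a no-op.
func missingColumnDDL(existing map[string]bool) []string {
	var stmts []string
	for col, ddl := range compactionColumns {
		if !existing[col] {
			stmts = append(stmts, ddl)
		}
	}
	return stmts
}

func main() {
	// Fresh pre-compaction database: all three columns need adding.
	fmt.Println(len(missingColumnDDL(map[string]bool{"id": true, "title": true})))
	// Already-migrated database: nothing to do.
	fmt.Println(len(missingColumnDDL(map[string]bool{
		"compaction_level": true, "compacted_at": true, "original_size": true,
	})))
}
```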
- -### Design - -Add three columns to `issues` table: -- `compaction_level INTEGER DEFAULT 0` - 0=original, 1=tier1, 2=tier2 -- `compacted_at DATETIME` - when last compacted -- `original_size INTEGER` - bytes before first compaction - -Create `issue_snapshots` table: -```sql -CREATE TABLE issue_snapshots ( - id INTEGER PRIMARY KEY AUTOINCREMENT, - issue_id TEXT NOT NULL, - snapshot_time DATETIME NOT NULL, - compaction_level INTEGER NOT NULL, - original_size INTEGER NOT NULL, - compressed_size INTEGER NOT NULL, - original_content TEXT NOT NULL, -- JSON blob - archived_events TEXT, - FOREIGN KEY (issue_id) REFERENCES issues(id) ON DELETE CASCADE -); -``` - -Add indexes: -- `idx_snapshots_issue` on `issue_id` -- `idx_snapshots_level` on `compaction_level` - -Add migration functions in `internal/storage/sqlite/sqlite.go`: -- `migrateCompactionColumns(db *sql.DB) error` -- `migrateSnapshotsTable(db *sql.DB) error` - -### Acceptance Criteria -- Existing databases migrate automatically -- New databases include columns by default -- Migration is idempotent (safe to run multiple times) -- No data loss during migration -- Tests verify migration on fresh and existing DBs - -### Labels -compaction, schema, migration, database - ---- - -## Add compaction configuration keys - -### Type -task - -### Priority -1 - -### Description - -Add configuration keys for compaction behavior with sensible defaults. - -### Design - -Add to `internal/storage/sqlite/schema.go` initial config: -```sql -INSERT OR IGNORE INTO config (key, value) VALUES - ('compact_tier1_days', '30'), - ('compact_tier1_dep_levels', '2'), - ('compact_tier2_days', '90'), - ('compact_tier2_dep_levels', '5'), - ('compact_tier2_commits', '100'), - ('compact_model', 'claude-3-5-haiku-20241022'), - ('compact_batch_size', '50'), - ('compact_parallel_workers', '5'), - ('auto_compact_enabled', 'false'); -``` - -Add helper functions for loading config into typed struct. 
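The typed struct and loader mentioned above might look like this (a sketch: the field names and `loadCompactConfig` are illustrative, assuming config values arrive as strings, with defaults mirroring the values above):

```go
package main

import (
	"fmt"
	"strconv"
)

// CompactConfig is a typed view of the compact_* config keys.
type CompactConfig struct {
	Tier1Days   int
	Tier2Days   int
	BatchSize   int
	Workers     int
	Model       string
	AutoEnabled bool
}

// loadCompactConfig parses string key/value rows from the config table,
// falling back to the documented defaults for missing or invalid values.
func loadCompactConfig(kv map[string]string) CompactConfig {
	atoi := func(key string, def int) int {
		if n, err := strconv.Atoi(kv[key]); err == nil {
			return n
		}
		return def
	}
	model := kv["compact_model"]
	if model == "" {
		model = "claude-3-5-haiku-20241022"
	}
	return CompactConfig{
		Tier1Days:   atoi("compact_tier1_days", 30),
		Tier2Days:   atoi("compact_tier2_days", 90),
		BatchSize:   atoi("compact_batch_size", 50),
		Workers:     atoi("compact_parallel_workers", 5),
		Model:       model,
		AutoEnabled: kv["auto_compact_enabled"] == "true",
	}
}

func main() {
	cfg := loadCompactConfig(map[string]string{"compact_tier1_days": "45"})
	fmt.Println(cfg.Tier1Days, cfg.Tier2Days, cfg.AutoEnabled) // 45 90 false
}
```

Centralizing defaults here keeps `bd config get/set` validation and the compactor in agreement.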
- -### Acceptance Criteria -- Config keys created on init -- Existing DBs get defaults on migration -- `bd config get/set` works with all keys -- Type validation (days=int, enabled=bool) -- Documentation in README.md - -### Labels -compaction, config, configuration - ---- - -## Implement candidate identification queries - -### Type -task - -### Priority -1 - -### Description - -Write SQL queries to identify issues eligible for Tier 1 and Tier 2 compaction based on closure time and dependency status. - -### Design - -Create `internal/storage/sqlite/compact.go` with: - -```go -type CompactionCandidate struct { - IssueID string - ClosedAt time.Time - OriginalSize int - EstimatedSize int - DependentCount int -} - -func (s *SQLiteStorage) GetTier1Candidates(ctx context.Context) ([]*CompactionCandidate, error) -func (s *SQLiteStorage) GetTier2Candidates(ctx context.Context) ([]*CompactionCandidate, error) -func (s *SQLiteStorage) CheckEligibility(ctx context.Context, issueID string, tier int) (bool, string, error) -``` - -Use recursive CTE for dependency depth checking (similar to ready_issues view). - -### Acceptance Criteria -- Tier 1 query filters by days and dependency depth -- Tier 2 query includes commit/issue count checks -- Dependency checking handles circular deps gracefully -- Performance: <100ms for 10,000 issue database -- Tests cover edge cases (no deps, circular deps, mixed status) - -### Labels -compaction, sql, query, dependencies - ---- - -## Create Haiku client and prompt templates - -### Type -task - -### Priority -1 - -### Description - -Implement Claude Haiku API client with template-based prompts for Tier 1 and Tier 2 summarization. 
- -### Design - -Create `internal/compact/haiku.go`: - -```go -type HaikuClient struct { - client *anthropic.Client - model string -} - -func NewHaikuClient(apiKey string) (*HaikuClient, error) -func (h *HaikuClient) SummarizeTier1(ctx context.Context, issue *types.Issue) (string, error) -func (h *HaikuClient) SummarizeTier2(ctx context.Context, issue *types.Issue) (string, error) -``` - -Use text/template for prompt rendering. - -Tier 1 output format: -``` -**Summary:** [2-3 sentences] -**Key Decisions:** [bullet points] -**Resolution:** [outcome] -``` - -Tier 2 output format: -``` -Single paragraph ≤150 words covering what was built, why it mattered, lasting impact. -``` - -### Acceptance Criteria -- API key from env var or config (env takes precedence) -- Prompts render correctly with templates -- Rate limiting handled gracefully (exponential backoff) -- Network errors retry up to 3 times -- Mock tests for API calls - -### Labels -compaction, haiku, api, llm - ---- - -## Implement snapshot creation and restoration - -### Type -task - -### Priority -1 - -### Description - -Implement snapshot creation before compaction and restoration capability to undo compaction. - -### Design - -Add to `internal/storage/sqlite/compact.go`: - -```go -func (s *SQLiteStorage) CreateSnapshot(ctx context.Context, issue *types.Issue, level int) error -func (s *SQLiteStorage) RestoreFromSnapshot(ctx context.Context, issueID string, level int) error -func (s *SQLiteStorage) GetSnapshots(ctx context.Context, issueID string) ([]*Snapshot, error) -``` - -Snapshot JSON structure: -```json -{ - "description": "...", - "design": "...", - "notes": "...", - "acceptance_criteria": "...", - "title": "..." 
-} -``` - -### Acceptance Criteria -- Snapshot created atomically with compaction -- Restore returns exact original content -- Multiple snapshots per issue supported (Tier 1 → Tier 2) -- JSON encoding handles UTF-8 and special characters -- Size calculation is accurate (UTF-8 bytes) - -### Labels -compaction, snapshot, restore, safety - ---- - -## Implement Tier 1 compaction logic - -### Type -task - -### Priority -1 - -### Description - -Implement the core Tier 1 compaction process: snapshot → summarize → update. - -### Design - -Add to `internal/compact/compactor.go`: - -```go -type Compactor struct { - store storage.Storage - haiku *HaikuClient - config *CompactConfig -} - -func New(store storage.Storage, apiKey string, config *CompactConfig) (*Compactor, error) -func (c *Compactor) CompactTier1(ctx context.Context, issueID string) error -func (c *Compactor) CompactTier1Batch(ctx context.Context, issueIDs []string) error -``` - -Process: -1. Verify eligibility -2. Calculate original size -3. Create snapshot -4. Call Haiku for summary -5. Update issue (description=summary, clear design/notes/criteria) -6. Set compaction_level=1, compacted_at=now, original_size -7. Record EventCompacted -8. Mark dirty for export - -### Acceptance Criteria -- Single issue compaction works end-to-end -- Batch processing with parallel workers (5 concurrent) -- Errors don't corrupt database (transaction rollback) -- EventCompacted includes size savings -- Dry-run mode (identify + size estimate only, no API calls) - -### Labels -compaction, tier1, core-logic - ---- - -## Implement Tier 2 compaction logic - -### Type -task - -### Priority -2 - -### Description - -Implement Tier 2 ultra-compression: more aggressive summarization and optional event pruning. 
- -### Design - -Add to `internal/compact/compactor.go`: - -```go -func (c *Compactor) CompactTier2(ctx context.Context, issueID string) error -func (c *Compactor) CompactTier2Batch(ctx context.Context, issueIDs []string) error -``` - -Process: -1. Verify issue is at compaction_level = 1 -2. Check Tier 2 eligibility (days, deps, commits/issues) -3. Create Tier 2 snapshot -4. Call Haiku with ultra-compression prompt -5. Update issue (description = single paragraph, clear all other fields) -6. Set compaction_level = 2 -7. Optionally prune events (keep created/closed, archive rest to snapshot) - -### Acceptance Criteria -- Requires existing Tier 1 compaction -- Git commit counting works (with fallback to issue counter) -- Events optionally pruned (config: compact_events_enabled) -- Archived events stored in snapshot JSON -- Size reduction 90-95% - -### Labels -compaction, tier2, advanced - ---- - -## Add `bd compact` CLI command - -### Type -task - -### Priority -1 - -### Description - -Implement the `bd compact` command with dry-run, batch processing, and progress reporting. 
- -### Design - -Create `cmd/bd/compact.go`: - -```go -var compactCmd = &cobra.Command{ - Use: "compact", - Short: "Compact old closed issues to save space", -} - -Flags: - --dry-run Preview without compacting - --tier int Compaction tier (1 or 2, default: 1) - --all Process all candidates - --id string Compact specific issue - --force Force compact (bypass checks, requires --id) - --batch-size int Issues per batch - --workers int Parallel workers - --json JSON output -``` - -### Acceptance Criteria -- `--dry-run` shows accurate preview with size estimates -- `--all` processes all candidates -- `--id` compacts single issue -- `--force` bypasses eligibility checks (only with --id) -- Progress bar for batches (e.g., [████████] 47/47) -- JSON output with `--json` -- Exit codes: 0=success, 1=error -- Shows summary: count, size saved, cost, time - -### Labels -compaction, cli, command - ---- - -## Add `bd compact --restore` functionality - -### Type -task - -### Priority -2 - -### Description - -Implement restore command to undo compaction from snapshots. - -### Design - -Add to `cmd/bd/compact.go`: - -```go -var compactRestore string - -compactCmd.Flags().StringVar(&compactRestore, "restore", "", "Restore issue from snapshot") -``` - -Process: -1. Load snapshot for issue -2. Parse JSON content -3. Update issue with original content -4. Set compaction_level = 0, compacted_at = NULL, original_size = NULL -5. Record event (EventRestored or EventUpdated) -6. Mark dirty for export - -### Acceptance Criteria -- Restores exact original content -- Handles multiple snapshots (use latest by default) -- `--level` flag to choose specific snapshot -- Updates compaction_level correctly -- Exports restored content to JSONL -- Shows before/after in output - -### Labels -compaction, restore, cli - ---- - -## Add `bd compact --stats` command - -### Type -task - -### Priority -2 - -### Description - -Add statistics command showing compaction status and potential savings. 
- -### Design - -```go -var compactStats bool - -compactCmd.Flags().BoolVar(&compactStats, "stats", false, "Show compaction statistics") -``` - -Output: -- Total issues, by compaction level (0, 1, 2) -- Current DB size vs estimated uncompacted size -- Space savings (KB/MB and %) -- Candidates for each tier with size estimates -- Estimated API cost (Haiku pricing) - -### Acceptance Criteria -- Accurate counts by compaction_level -- Size calculations include all text fields (UTF-8 bytes) -- Shows candidates with eligibility reasons -- Cost estimation based on current Haiku pricing -- JSON output supported -- Clear, readable table format - -### Labels -compaction, stats, reporting - ---- - -## Add EventCompacted to event system - -### Type -task - -### Priority -2 - -### Description - -Add new event type for tracking compaction in audit trail. - -### Design - -1. Add to `internal/types/types.go`: -```go -const EventCompacted EventType = "compacted" -``` - -2. Record event during compaction: -```go -eventData := map[string]interface{}{ - "tier": tier, - "original_size": originalSize, - "compressed_size": compressedSize, - "reduction_pct": (1 - float64(compressedSize)/float64(originalSize)) * 100, -} -``` - -3. Update event display in `bd show`. - -### Acceptance Criteria -- Event includes tier, original_size, compressed_size, reduction_pct -- Shows in event history (`bd events `) -- Exports to JSONL correctly -- `bd show` displays compaction status and marker - -### Labels -compaction, events, audit - ---- - -## Add compaction indicator to `bd show` - -### Type -task - -### Priority -2 - -### Description - -Update `bd show` command to display compaction status prominently. - -### Design - -Add to issue display: -``` -bd-42: Fix authentication bug [CLOSED] 🗜️ - -Status: closed (compacted L1) -... 
- ---- -💾 Restore: bd compact --restore bd-42 -📊 Original: 2,341 bytes | Compressed: 468 bytes (80% reduction) -🗜️ Compacted: 2025-10-15 (Tier 1) -``` - -Emoji indicators: -- Tier 1: 🗜️ -- Tier 2: 📦 - -### Acceptance Criteria -- Compaction status visible in title line -- Footer shows size savings when compacted -- Restore command shown for compacted issues -- Works with `--json` output (includes compaction fields) -- Emoji optional (controlled by config or terminal detection) - -### Labels -compaction, ui, display - ---- - -## Write compaction tests - -### Type -task - -### Priority -1 - -### Description - -Comprehensive test suite for compaction functionality. - -### Design - -Test coverage: - -1. **Candidate Identification:** - - Eligibility by time - - Dependency depth checking - - Mixed status dependents - - Edge cases (no deps, circular deps) - -2. **Snapshots:** - - Create and restore - - Multiple snapshots per issue - - Content integrity (UTF-8, special chars) - -3. **Tier 1 Compaction:** - - Single issue compaction - - Batch processing - - Error handling (API failures) - -4. **Tier 2 Compaction:** - - Requires Tier 1 - - Events pruning - - Commit counting fallback - -5. **CLI:** - - All flag combinations - - Dry-run accuracy - - JSON output parsing - -6. **Integration:** - - End-to-end flow - - JSONL export/import - - Restore verification - -### Acceptance Criteria -- Test coverage >80% -- All edge cases covered -- Mock Haiku API in tests (no real API calls) -- Integration tests pass -- `go test ./...` passes -- Benchmarks for performance-critical paths - -### Labels -compaction, testing, quality - ---- - -## Add compaction documentation - -### Type -task - -### Priority -2 - -### Description - -Document compaction feature in README and create detailed COMPACTION.md guide. 
- -### Design - -**Update README.md:** -- Add to Features section -- CLI examples (dry-run, compact, restore, stats) -- Configuration guide -- Cost analysis - -**Create COMPACTION.md:** -- How compaction works (architecture overview) -- When to use each tier -- Detailed cost analysis with examples -- Safety mechanisms (snapshots, restore, dry-run) -- Troubleshooting guide -- FAQ - -**Create examples/compaction/:** -- `workflow.sh` - Example monthly compaction workflow -- `cron-compact.sh` - Cron job setup -- `auto-compact.sh` - Auto-compaction script - -### Acceptance Criteria -- README.md updated with compaction section -- COMPACTION.md comprehensive and clear -- Examples work as documented (tested) -- Screenshots or ASCII examples included -- API key setup documented (env var vs config) -- Covers common questions and issues - -### Labels -compaction, docs, documentation, examples - ---- - -## Optional: Implement auto-compaction - -### Type -task - -### Priority -3 - -### Description - -Implement automatic compaction triggered by certain operations when enabled via config. - -### Design - -Trigger points (when `auto_compact_enabled = true`): -1. `bd stats` - check and compact if candidates exist -2. `bd export` - before exporting -3. 
Configurable: on any read operation after N candidates accumulate

Add:
```go
func (s *SQLiteStorage) AutoCompact(ctx context.Context) error {
	enabled, _ := s.GetConfig(ctx, "auto_compact_enabled")
	if enabled != "true" {
		return nil
	}

	// Run Tier 1 compaction on all candidates.
	// Limit to batch_size to avoid long operations.
	// Log activity for transparency.
	return nil
}
```

### Acceptance Criteria
- Respects auto_compact_enabled config (default: false)
- Limits batch size to avoid blocking operations
- Logs compaction activity (visible with --verbose)
- Can be disabled per-command with `--no-auto-compact` flag
- Only compacts Tier 1 (Tier 2 remains manual)
- Doesn't run more than once per hour (rate limiting)

### Labels
compaction, automation, optional, v1.2

---

## Optional: Add git commit counting

### Type
task

### Priority
3

### Description

Implement git commit counting for "project time" measurement as an alternative to calendar time for Tier 2 eligibility.

### Design

```go
import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
	"time"
)

func getCommitsSince(closedAt time.Time) (int, error) {
	cmd := exec.Command("git", "rev-list", "--count",
		fmt.Sprintf("--since=%s", closedAt.Format(time.RFC3339)), "HEAD")
	output, err := cmd.Output()
	if err != nil {
		return 0, err // Not in a git repo, or git not available
	}
	return strconv.Atoi(strings.TrimSpace(string(output)))
}
```

Fallback strategies:
1. Git commit count (preferred)
2. Issue counter delta (store counter at close time, compare later)
3. Pure time-based (90 days)

### Acceptance Criteria
- Counts commits since closed_at timestamp
- Handles git not available gracefully (falls back)
- Fallback to issue counter delta works
- Configurable via compact_tier2_commits config key
- Tested with real git repo
- Works in non-git environments

### Labels
compaction, git, optional, tier2