Issue Database Compaction Design

Status: Design Phase
Created: 2025-10-15
Target: Beads v1.1

Executive Summary

Add intelligent database compaction to beads that uses Claude Haiku to semantically compress old, closed issues. This keeps the database lightweight and agent-friendly while preserving essential context about past work. The design philosophy: most work is throwaway, and forensic value decays exponentially with time.

Key Metrics

  • Space savings: 70-95% reduction in text volume for old issues
  • Cost: ~$1.10 per 1,000 issues compacted (Haiku pricing)
  • Safety: Full snapshot system with restore capability
  • Performance: Batch processing with parallel workers

Motivation

The Problem

Beads databases grow indefinitely:

  • Issues accumulate detailed description, design, notes, acceptance_criteria fields
  • Events table logs every change forever
  • Old closed issues (especially those with all dependents closed) rarely need full detail
  • Agent context windows work better with concise, relevant information

Why This Matters

  1. Agent Efficiency: Smaller databases → faster queries → clearer agent thinking
  2. Context Management: Agents benefit from summaries of old work, not verbose details
  3. Git Performance: Smaller JSONL exports → faster git operations
  4. Pragmatic Philosophy: Beads is agent memory, not a historical archive
  5. Forensic Decay: Need for detail decreases exponentially after closure

What We Keep

  • Issue ID and title (always)
  • Semantic summary of what was done and why
  • Key architectural decisions
  • Closure outcome
  • Full git history in JSONL commits (ultimate backup)
  • Restore capability via snapshots

Technical Design

Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│                      Compaction Pipeline                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                   │
│  1. Candidate Identification                                     │
│     ↓                                                             │
│     • Query closed issues meeting time + dependency criteria     │
│     • Check dependency depth (recursive CTE)                     │
│     • Calculate size/savings estimates                           │
│                                                                   │
│  2. Snapshot Creation                                            │
│     ↓                                                             │
│     • Store original content in issue_snapshots table            │
│     • Calculate content hash for verification                    │
│     • Enable restore capability                                  │
│                                                                   │
│  3. Haiku Summarization                                          │
│     ↓                                                             │
│     • Batch process with worker pool (5 parallel)                │
│     • Different prompts for Tier 1 vs Tier 2                     │
│     • Handle API errors gracefully                               │
│                                                                   │
│  4. Issue Update                                                 │
│     ↓                                                             │
│     • Replace verbose fields with summary                        │
│     • Set compaction_level and compacted_at                      │
│     • Record event (EventCompacted)                              │
│     • Mark dirty for JSONL export                                │
│                                                                   │
│  5. Optional: Events Pruning                                     │
│     ↓                                                             │
│     • Keep only created/closed events for Tier 2                 │
│     • Archive detailed event history to snapshots                │
│                                                                   │
└─────────────────────────────────────────────────────────────────┘

Compaction Tiers

Tier 1: Standard Compaction

Eligibility:

  • status = 'closed'
  • closed_at is at least 30 days in the past (configurable: compact_tier1_days)
  • All issues that depend on this one (via blocks or parent-child) are closed
  • Dependency check depth: 2 levels (configurable: compact_tier1_dep_levels)

Process:

  1. Snapshot original content
  2. Send to Haiku with 300-word summarization prompt
  3. Store summary in description
  4. Clear design, notes, acceptance_criteria
  5. Set compaction_level = 1
  6. Keep all events

Output Format (Haiku prompt):

**Summary:** [2-3 sentences: problem, solution, outcome]
**Key Decisions:** [bullet points of non-obvious choices]
**Resolution:** [how it was closed]

Expected Reduction: 70-85% of original text size

Tier 2: Aggressive Compaction

Eligibility:

  • Already at compaction_level = 1
  • closed_at is at least 90 days in the past (configurable: compact_tier2_days)
  • All dependencies (all 4 types) up to 5 levels deep are closed
  • One of:
    • ≥100 git commits since closed_at (configurable: compact_tier2_commits)
    • ≥500 new issues created since closure
    • Manual override with --force

Process:

  1. Snapshot Tier 1 content
  2. Send to Haiku with 150-word ultra-compression prompt
  3. Store single paragraph in description
  4. Clear all other text fields
  5. Set compaction_level = 2
  6. Prune events: keep only created and closed, move rest to snapshot

Output Format (Haiku prompt):

Single paragraph (≤150 words):
- What was built/fixed
- Why it mattered
- Lasting architectural impact (if any)

Expected Reduction: 90-95% of original text size


Schema Changes

New Columns on issues Table

ALTER TABLE issues ADD COLUMN compaction_level INTEGER DEFAULT 0;
ALTER TABLE issues ADD COLUMN compacted_at DATETIME;
ALTER TABLE issues ADD COLUMN original_size INTEGER; -- bytes before compaction

New Table: issue_snapshots

CREATE TABLE IF NOT EXISTS issue_snapshots (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    issue_id TEXT NOT NULL,
    snapshot_time DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    compaction_level INTEGER NOT NULL, -- 1 or 2
    original_size INTEGER NOT NULL,
    compressed_size INTEGER NOT NULL,
    -- JSON blob with original content
    original_content TEXT NOT NULL,
    -- Optional: compressed events for Tier 2
    archived_events TEXT,
    FOREIGN KEY (issue_id) REFERENCES issues(id) ON DELETE CASCADE
);

CREATE INDEX IF NOT EXISTS idx_snapshots_issue ON issue_snapshots(issue_id);
CREATE INDEX IF NOT EXISTS idx_snapshots_level ON issue_snapshots(compaction_level);

Snapshot JSON Structure:

{
  "description": "original description text...",
  "design": "original design text...",
  "notes": "original notes text...",
  "acceptance_criteria": "original criteria text...",
  "title": "original title",
  "hash": "sha256:abc123..."
}
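The `hash` field above can be derived from the canonical JSON encoding of the original fields. A minimal sketch (the `snapshotContent` type and `contentHash` helper are illustrative names, not final API):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// snapshotContent mirrors the JSON blob stored in issue_snapshots.
type snapshotContent struct {
	Description        string `json:"description"`
	Design             string `json:"design"`
	Notes              string `json:"notes"`
	AcceptanceCriteria string `json:"acceptance_criteria"`
	Title              string `json:"title"`
}

// contentHash produces the "sha256:..." verification hash over the
// canonical JSON encoding of the original fields.
func contentHash(c snapshotContent) (string, error) {
	b, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return "sha256:" + hex.EncodeToString(sum[:]), nil
}

func main() {
	h, _ := contentHash(snapshotContent{Title: "Fix authentication bug"})
	fmt.Println(h)
}
```

Because `json.Marshal` emits struct fields in declaration order, the encoding is deterministic, so the hash can be recomputed at restore time to verify snapshot integrity.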

New Config Keys

INSERT INTO config (key, value) VALUES
    -- Tier 1 settings
    ('compact_tier1_days', '30'),
    ('compact_tier1_dep_levels', '2'),

    -- Tier 2 settings
    ('compact_tier2_days', '90'),
    ('compact_tier2_dep_levels', '5'),
    ('compact_tier2_commits', '100'),
    ('compact_tier2_new_issues', '500'),

    -- API settings
    ('anthropic_api_key', ''),  -- Falls back to ANTHROPIC_API_KEY env var
    ('compact_model', 'claude-3-5-haiku-20241022'),

    -- Performance settings
    ('compact_batch_size', '50'),
    ('compact_parallel_workers', '5'),

    -- Safety settings
    ('auto_compact_enabled', 'false'),  -- Opt-in
    ('compact_events_enabled', 'false'), -- Events pruning (Tier 2)

    -- Display settings
    ('compact_show_savings', 'true');  -- Show size reduction in output
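Since the config table stores everything as strings, the loader needs type validation plus the documented env-var precedence for the API key. A hedged sketch of what `getCompactConfig` might do internally (struct and function names are illustrative):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// CompactConfig holds typed compaction settings parsed from the
// string values in the config table.
type CompactConfig struct {
	Tier1Days   int
	Tier2Days   int
	BatchSize   int
	AutoEnabled bool
	APIKey      string
}

// parseCompactConfig validates raw config values, letting the
// ANTHROPIC_API_KEY environment variable override the stored key.
func parseCompactConfig(raw map[string]string) (*CompactConfig, error) {
	cfg := &CompactConfig{}
	var err error
	if cfg.Tier1Days, err = strconv.Atoi(raw["compact_tier1_days"]); err != nil {
		return nil, fmt.Errorf("compact_tier1_days: %w", err)
	}
	if cfg.Tier2Days, err = strconv.Atoi(raw["compact_tier2_days"]); err != nil {
		return nil, fmt.Errorf("compact_tier2_days: %w", err)
	}
	if cfg.BatchSize, err = strconv.Atoi(raw["compact_batch_size"]); err != nil {
		return nil, fmt.Errorf("compact_batch_size: %w", err)
	}
	if cfg.AutoEnabled, err = strconv.ParseBool(raw["auto_compact_enabled"]); err != nil {
		return nil, fmt.Errorf("auto_compact_enabled: %w", err)
	}
	cfg.APIKey = raw["anthropic_api_key"]
	if env := os.Getenv("ANTHROPIC_API_KEY"); env != "" {
		cfg.APIKey = env // env var takes precedence over config table
	}
	return cfg, nil
}

func main() {
	cfg, err := parseCompactConfig(map[string]string{
		"compact_tier1_days":   "30",
		"compact_tier2_days":   "90",
		"compact_batch_size":   "50",
		"auto_compact_enabled": "false",
	})
	fmt.Println(cfg, err)
}
```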

New Event Type

const (
    // ... existing event types ...
    EventCompacted EventType = "compacted"
)

Haiku Integration

Prompt Templates

Tier 1 Prompt:

Summarize this closed software issue. Preserve key decisions, implementation approach, and outcome. Max 300 words.

Title: {{.Title}}
Type: {{.IssueType}}
Priority: {{.Priority}}

Description:
{{.Description}}

Design Notes:
{{.Design}}

Implementation Notes:
{{.Notes}}

Acceptance Criteria:
{{.AcceptanceCriteria}}

Output format:
**Summary:** [2-3 sentences: what problem, what solution, what outcome]
**Key Decisions:** [bullet points of non-obvious choices]
**Resolution:** [how it was closed]

Tier 2 Prompt:

Ultra-compress this old closed issue to ≤150 words. Focus on lasting architectural impact.

Title: {{.Title}}
Original Summary (already compressed):
{{.Description}}

Output a single paragraph covering:
- What was built/fixed
- Why it mattered
- Lasting impact (if any)

If there's no lasting impact, just state what was done and that it's resolved.
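The `{{.Title}}`-style placeholders above are Go `text/template` syntax; rendering them against an issue is straightforward. A sketch of the rendering step, assuming a `promptIssue` carrier struct (names illustrative, abbreviated Tier 1 template shown):

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// promptIssue carries the fields the prompt templates reference.
type promptIssue struct {
	Title, IssueType, Priority                     string
	Description, Design, Notes, AcceptanceCriteria string
}

const tier1Tmpl = `Summarize this closed software issue. Preserve key decisions, implementation approach, and outcome. Max 300 words.

Title: {{.Title}}
Type: {{.IssueType}}
Priority: {{.Priority}}

Description:
{{.Description}}
`

// renderPrompt fills a prompt template with issue fields.
func renderPrompt(tmpl string, issue promptIssue) (string, error) {
	t, err := template.New("prompt").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, issue); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, _ := renderPrompt(tier1Tmpl, promptIssue{
		Title: "Fix authentication bug", IssueType: "bug", Priority: "1",
	})
	fmt.Println(out)
}
```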

API Client Structure

package compact

import (
    "context"

    "github.com/anthropics/anthropic-sdk-go"
    "github.com/anthropics/anthropic-sdk-go/option"
)

type HaikuClient struct {
    client anthropic.Client
    model  anthropic.Model
}

func NewHaikuClient(apiKey string) *HaikuClient {
    return &HaikuClient{
        client: anthropic.NewClient(option.WithAPIKey(apiKey)),
        model:  anthropic.ModelClaude3_5Haiku20241022,
    }
}

func (h *HaikuClient) Summarize(ctx context.Context, prompt string, maxTokens int) (string, error)

Error Handling

  • Rate limits: Exponential backoff with jitter
  • API errors: Log and skip issue (don't fail entire batch)
  • Network failures: Retry up to 3 times
  • Invalid responses: Fall back to truncation with warning
  • Context length: Truncate input if needed (rare, but possible)

Dependency Checking

Reuse existing recursive CTE logic from ready_issues view, adapted for compaction:

-- Check if issue and N levels of dependents are all closed
WITH RECURSIVE dependent_tree AS (
    -- Base case: the candidate issue
    SELECT id, status, 0 as depth
    FROM issues
    WHERE id = ?

    UNION ALL

    -- Recursive case: issues that depend on this one
    SELECT i.id, i.status, dt.depth + 1
    FROM dependent_tree dt
    JOIN dependencies d ON d.depends_on_id = dt.id
    JOIN issues i ON i.id = d.issue_id
    WHERE d.type IN ('blocks', 'parent-child')  -- Only blocking deps matter
      AND dt.depth < ?  -- Max depth parameter
)
SELECT CASE
    WHEN COUNT(*) = SUM(CASE WHEN status = 'closed' THEN 1 ELSE 0 END)
    THEN 1 ELSE 0 END as all_closed
FROM dependent_tree;

Performance: This query is O(N) where N is the number of dependents. With proper indexes, should be <10ms per issue.
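The CTE's logic can also be unit-tested without a database by running the same traversal over an in-memory dependency graph. A hedged Go sketch of the equivalent check (types and names are illustrative, not the storage-layer API):

```go
package main

import "fmt"

// graph holds issue statuses and reverse dependency edges:
// dependents[x] lists issues that depend on x via blocking types.
type graph struct {
	status     map[string]string
	dependents map[string][]string
}

// allDependentsClosed walks dependents breadth-first up to maxDepth
// levels and reports whether the issue and every reachable dependent
// are closed. Visited tracking makes circular dependencies terminate.
func allDependentsClosed(g graph, id string, maxDepth int) bool {
	visited := map[string]bool{id: true}
	frontier := []string{id}
	for depth := 0; depth <= maxDepth; depth++ {
		var next []string
		for _, cur := range frontier {
			if g.status[cur] != "closed" {
				return false
			}
			if depth == maxDepth {
				continue // don't expand past the depth limit
			}
			for _, dep := range g.dependents[cur] {
				if !visited[dep] {
					visited[dep] = true
					next = append(next, dep)
				}
			}
		}
		frontier = next
	}
	return true
}

func main() {
	g := graph{
		status:     map[string]string{"bd-42": "closed", "bd-43": "closed", "bd-44": "open"},
		dependents: map[string][]string{"bd-42": {"bd-43", "bd-44"}},
	}
	fmt.Println(allDependentsClosed(g, "bd-42", 2)) // bd-44 is still open
}
```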


Git Integration (Optional)

For "project time" measurement via commit counting:

package compact

import (
    "fmt"
    "os/exec"
    "strconv"
    "strings"
    "time"
)

// getCommitsSince counts commits made after the issue closed, as a
// proxy for elapsed "project time".
func getCommitsSince(closedAt time.Time) (int, error) {
    cmd := exec.Command("git", "rev-list", "--count",
        fmt.Sprintf("--since=%s", closedAt.Format(time.RFC3339)),
        "HEAD")
    output, err := cmd.Output()
    if err != nil {
        return 0, err
    }
    return strconv.Atoi(strings.TrimSpace(string(output)))
}

Fallback: If git unavailable or not in a repo, use issue counter delta:

SELECT last_id FROM issue_counters WHERE prefix = ?
-- Store at close time, compare at compaction time

CLI Commands

bd compact - Main Command

bd compact [flags]

Flags:
  --dry-run              Show what would be compacted without doing it
  --tier int            Compaction tier (1 or 2, default: 1)
  --all                 Process all eligible issues (default: preview only)
  --id string           Compact specific issue by ID
  --force               Bypass eligibility checks (with --id)
  --batch-size int      Issues per batch (default: from config)
  --workers int         Parallel workers (default: from config)
  --json                JSON output for agents

Examples:
  bd compact --dry-run                 # Preview Tier 1 candidates
  bd compact --dry-run --tier 2        # Preview Tier 2 candidates
  bd compact --all                     # Compact all Tier 1 candidates
  bd compact --tier 2 --all            # Compact all Tier 2 candidates
  bd compact --id bd-42                # Compact specific issue
  bd compact --id bd-42 --force        # Force compact even if recent

bd compact --restore - Restore Compacted Issues

bd compact --restore <issue-id> [flags]

Flags:
  --level int           Restore to specific snapshot level (default: latest)
  --json                JSON output

Examples:
  bd compact --restore bd-42           # Restore bd-42 from latest snapshot
  bd compact --restore bd-42 --level 1 # Restore from Tier 1 snapshot

bd compact --stats - Compaction Statistics

bd compact --stats [flags]

Flags:
  --json                JSON output

Example output:
  === Compaction Statistics ===

  Total Issues: 1,247
  Compacted (Tier 1): 342 (27.4%)
  Compacted (Tier 2): 89 (7.1%)

  Database Size: 2.3 MB
  Estimated Uncompacted Size: 8.7 MB
  Space Savings: 6.4 MB (73.6%)

  Candidates:
    Tier 1: 47 issues (est. 320 KB → 64 KB)
    Tier 2: 15 issues (est. 180 KB → 18 KB)

  Estimated Compaction Cost: $0.04 (Haiku)

Output Examples

Dry Run Output

$ bd compact --dry-run

=== Tier 1 Compaction Preview ===
Eligibility: Closed ≥30 days, 2 levels of dependents closed

bd-42: Fix authentication bug (P1, bug)
  Closed: 45 days ago
  Size: 2,341 bytes → ~468 bytes (80% reduction)
  Dependents: bd-43, bd-44 (both closed)

bd-57: Add login form (P2, feature)
  Closed: 38 days ago
  Size: 1,823 bytes → ~365 bytes (80% reduction)
  Dependents: (none)

bd-58: Refactor auth middleware (P2, task)
  Closed: 35 days ago
  Size: 3,102 bytes → ~620 bytes (80% reduction)
  Dependents: bd-59 (closed)

... (44 more issues)

Total: 47 issues
Estimated reduction: 87.3 KB → 17.5 KB (80%)
Estimated cost: $0.03 (Haiku API)

Run with --all to compact these issues.

Compaction Progress

$ bd compact --all

Compacting 47 issues (Tier 1)...

Creating snapshots... [████████████████████] 47/47
Calling Haiku API...   [████████████████████] 47/47 (12s, $0.027)
Updating issues...     [████████████████████] 47/47

✓ Successfully compacted 47 issues
  Size reduction: 87.3 KB → 18.2 KB (79.1%)
  API cost: $0.027
  Time: 14.3s

Compacted issues will be exported to .beads/issues.jsonl

Show Compacted Issue

$ bd show bd-42

bd-42: Fix authentication bug [CLOSED] 🗜️

Status: closed (compacted L1)
Priority: 1 (High)
Type: bug
Closed: 2025-08-31 (45 days ago)
Compacted: 2025-10-15 (saved 1,873 bytes)

**Summary:** Fixed race condition in JWT token refresh logic causing intermittent
401 errors under high load. Implemented mutex-based locking around token refresh
operations. All users can now stay authenticated reliably during concurrent requests.

**Key Decisions:**
- Used sync.RWMutex instead of channels for simpler reasoning about lock state
- Added exponential backoff to token refresh to prevent thundering herd
- Preserved existing token format for backward compatibility with mobile clients

**Resolution:** Deployed to production on Aug 31, monitored for 2 weeks with zero
401 errors. Closed after confirming fix with load testing.

---
💾 Restore: bd compact --restore bd-42
📊 Original size: 2,341 bytes | Compressed: 468 bytes (80% reduction)

Safety Mechanisms

  1. Snapshot-First: Always create snapshot before modifying issue
  2. Restore Capability: Full restore from snapshots with --restore
  3. Opt-In Auto-Compaction: Disabled by default (auto_compact_enabled = false)
  4. Dry-Run Required: Preview before committing with --dry-run
  5. Git Backup: JSONL exports preserve full history in git commits
  6. Audit Trail: EventCompacted records what was done and when
  7. Size Verification: Track original_size and compressed_size for validation
  8. Idempotent: Re-running compaction on already-compacted issues is safe (no-op)
  9. Graceful Degradation: API failures don't corrupt data, just skip issues
  10. Reversible: Restore is always available, even after git push

Testing Strategy

Unit Tests

  1. Candidate Identification:

    • Issues meeting time criteria
    • Dependency depth checking
    • Mixed status dependents (some closed, some open)
    • Edge case: circular dependencies
  2. Haiku Client:

    • Mock API responses
    • Rate limit handling
    • Error recovery
    • Prompt rendering
  3. Snapshot Management:

    • Create snapshot
    • Restore from snapshot
    • Multiple snapshots per issue (Tier 1 → Tier 2)
    • Snapshot integrity (hash verification)
  4. Size Calculation:

    • Accurate byte counting
    • UTF-8 handling
    • Empty fields
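The size-calculation tests hinge on one subtlety: `len()` on a Go string counts UTF-8 bytes, not runes, which is exactly what the `original_size` column should record. A minimal sketch (helper name is illustrative):

```go
package main

import "fmt"

// issueTextSize sums the UTF-8 byte length of the compactable text
// fields; len() on a Go string already counts bytes, not runes.
func issueTextSize(description, design, notes, criteria string) int {
	return len(description) + len(design) + len(notes) + len(criteria)
}

func main() {
	// "é" encodes to two UTF-8 bytes, so this reports 6, not 5.
	fmt.Println(issueTextSize("héllo", "", "", ""))
}
```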

Integration Tests

  1. End-to-End Compaction:

    • Create test issues
    • Age them (mock timestamps)
    • Run compaction
    • Verify summaries
    • Restore and verify
  2. Batch Processing:

    • Large batches (100+ issues)
    • Parallel worker coordination
    • Error handling mid-batch
  3. JSONL Export:

    • Compacted issues export correctly
    • Import preserves compaction_level
    • Round-trip fidelity
  4. CLI Commands:

    • All flag combinations
    • JSON output parsing
    • Error messages

Manual Testing Checklist

  • Dry-run shows accurate candidates
  • Compaction reduces size as expected
  • Haiku summaries are high quality
  • Restore returns exact original content
  • Stats command shows correct numbers
  • Auto-compaction respects config
  • Git workflow (commit → pull → auto-compact)
  • Multi-machine workflow with compaction
  • API key handling (env var vs config)
  • Rate limit handling under load

Performance Considerations

Scalability

Small databases (<1,000 issues):

  • Full scan acceptable
  • Compact all eligible in one run
  • <1 minute total time

Medium databases (1,000-10,000 issues):

  • Batch processing required
  • Progress reporting essential
  • 5-10 minutes total time

Large databases (>10,000 issues):

  • Incremental compaction (process N per run)
  • Consider scheduled background job
  • 30-60 minutes total time

Optimization Strategies

  1. Index Usage:

    • idx_issues_status - filter closed issues
    • idx_dependencies_depends_on - dependency traversal
    • idx_snapshots_issue - restore lookups
  2. Batch Sizing:

    • Default 50 issues per batch
    • Configurable via compact_batch_size
    • Trade-off: larger batches = fewer commits, more RAM
  3. Parallel Workers:

    • Default 5 parallel Haiku calls
    • Configurable via compact_parallel_workers
    • Respects Haiku rate limits
  4. Query Optimization:

    • Use prepared statements for snapshots
    • Reuse dependency check query
    • Avoid N+1 queries in batch operations

Cost Analysis

Haiku Pricing (as of 2025-10-15)

  • Input: $0.25 per million tokens ($0.00025 per 1K tokens)
  • Output: $1.25 per million tokens ($0.00125 per 1K tokens)

Per-Issue Estimates

Tier 1:

  • Input: ~1,000 tokens (full issue content)
  • Output: ~400 tokens (summary)
  • Cost: ~$0.0008 per issue

Tier 2:

  • Input: ~500 tokens (Tier 1 summary)
  • Output: ~200 tokens (ultra-compressed)
  • Cost: ~$0.0003 per issue
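These per-issue figures follow directly from the token prices above. A sketch of the estimator the stats command could use (names illustrative):

```go
package main

import "fmt"

// Haiku prices in dollars per million tokens, from the table above.
const (
	inputPerMTok  = 0.25
	outputPerMTok = 1.25
)

// estimateCost returns the estimated API cost in dollars for one
// summarization call.
func estimateCost(inputTokens, outputTokens int) float64 {
	return float64(inputTokens)*inputPerMTok/1e6 +
		float64(outputTokens)*outputPerMTok/1e6
}

func main() {
	fmt.Printf("Tier 1: $%.4f\n", estimateCost(1000, 400)) // ~$0.0008
	fmt.Printf("Tier 2: $%.4f\n", estimateCost(500, 200))
}
```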

Batch Costs

Issues   Tier 1 Cost   Tier 2 Cost   Total
100      $0.08         $0.03         $0.11
500      $0.40         $0.15         $0.55
1,000    $0.80         $0.30         $1.10
5,000    $4.00         $1.50         $5.50

Monthly budget (typical project):

  • ~50-100 new issues closed per month
  • ~30-60 days later, eligible for Tier 1
  • Monthly cost: $0.04 - $0.08 (negligible)

Configuration Examples

Conservative Setup (manual only)

bd init
bd config set compact_tier1_days 60
bd config set compact_tier2_days 180
bd config set auto_compact_enabled false

# Run manually when needed
bd compact --dry-run
bd compact --all  # after review

Aggressive Setup (auto-compact)

bd config set compact_tier1_days 14
bd config set compact_tier2_days 45
bd config set auto_compact_enabled true
bd config set compact_batch_size 100

# Auto-compacts on bd stats, bd export
bd stats  # triggers compaction if candidates exist

Development Setup (fast feedback)

bd config set compact_tier1_days 1
bd config set compact_tier2_days 3
bd config set compact_tier1_dep_levels 1
bd config set compact_tier2_dep_levels 2

# Test compaction on recently closed issues

Future Enhancements

Phase 2 (Post-MVP)

  1. Local Model Support:

    • Use Ollama for zero-cost summarization
    • Fallback chain: Haiku → Ollama → truncation
  2. Custom Prompts:

    • User-defined summarization prompts
    • Per-project templates
    • Domain-specific summaries (e.g., "focus on API changes")
  3. Selective Preservation:

    • Mark issues as "do not compact"
    • Preserve certain labels (e.g., architecture, security)
    • Field-level preservation (e.g., keep design notes, compress others)
  4. Analytics:

    • Compaction effectiveness over time
    • Cost tracking per run
    • Quality feedback (user ratings of summaries)
  5. Smart Scheduling:

    • Auto-detect optimal compaction times
    • Avoid compaction during active development
    • Weekend/off-hours processing
  6. Multi-Tier Expansion:

    • Tier 3: Archive to separate file
    • Tier 4: Delete (with backup)
    • Configurable tier chain

Phase 3 (Advanced)

  1. Distributed Compaction:

    • Coordinate across multiple machines
    • Avoid duplicate work in team settings
    • Lock mechanism for compaction jobs
  2. Incremental Summarization:

    • Re-summarize if issue reopened
    • Preserve history of summaries
    • Version tracking for prompts
  3. Search Integration:

    • Full-text search includes summaries
    • Boost compacted issues in search results
    • Semantic search using embeddings

Implementation Issues

The following issues should be created in beads to track this work. They're designed to be implemented in dependency order.


Epic: Issue Database Compaction

Issue ID: (auto-generated)
Title: Epic: Add intelligent database compaction with Claude Haiku
Type: epic
Priority: 2
Description:

Implement multi-tier database compaction using Claude Haiku to semantically compress old, closed issues. This keeps the database lightweight and agent-friendly while preserving essential context.

Goals:

  • 70-95% space reduction for eligible issues
  • Full restore capability via snapshots
  • Opt-in with dry-run safety
  • ~$1 per 1,000 issues compacted

Acceptance Criteria:

  • Schema migration with snapshots table
  • Haiku integration for summarization
  • Two-tier compaction (30d, 90d)
  • CLI with dry-run, restore, stats
  • Full test coverage
  • Documentation complete

Dependencies: None (this is the epic)


Issue 1: Add compaction schema and migrations

Type: task
Priority: 1
Dependencies: Blocks all other compaction work

Description:

Add database schema support for issue compaction tracking and snapshot storage.

Design:

Add three columns to issues table:

  • compaction_level INTEGER DEFAULT 0 - 0=original, 1=tier1, 2=tier2
  • compacted_at DATETIME - when last compacted
  • original_size INTEGER - bytes before first compaction

Create issue_snapshots table:

CREATE TABLE issue_snapshots (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    issue_id TEXT NOT NULL,
    snapshot_time DATETIME NOT NULL,
    compaction_level INTEGER NOT NULL,
    original_size INTEGER NOT NULL,
    compressed_size INTEGER NOT NULL,
    original_content TEXT NOT NULL,  -- JSON blob
    archived_events TEXT,
    FOREIGN KEY (issue_id) REFERENCES issues(id) ON DELETE CASCADE
);

Add indexes:

  • idx_snapshots_issue on issue_id
  • idx_snapshots_level on compaction_level

Add migration functions in internal/storage/sqlite/sqlite.go:

  • migrateCompactionColumns(db *sql.DB) error
  • migrateSnapshotsTable(db *sql.DB) error

Acceptance Criteria:

  • Existing databases migrate automatically
  • New databases include columns by default
  • Migration is idempotent (safe to run multiple times)
  • No data loss during migration
  • Tests verify migration on fresh and existing DBs

Estimated Time: 4 hours


Issue 2: Add compaction configuration keys

Type: task
Priority: 1
Dependencies: Blocks compaction logic

Description:

Add configuration keys for compaction behavior with sensible defaults.

Implementation:

Add to internal/storage/sqlite/schema.go initial config:

INSERT OR IGNORE INTO config (key, value) VALUES
    ('compact_tier1_days', '30'),
    ('compact_tier1_dep_levels', '2'),
    ('compact_tier2_days', '90'),
    ('compact_tier2_dep_levels', '5'),
    ('compact_tier2_commits', '100'),
    ('compact_model', 'claude-3-5-haiku-20241022'),
    ('compact_batch_size', '50'),
    ('compact_parallel_workers', '5'),
    ('auto_compact_enabled', 'false');

Add helper functions in internal/storage/ or cmd/bd/:

func getCompactConfig(ctx context.Context, store Storage) (*CompactConfig, error)

Acceptance Criteria:

  • Config keys created on init
  • Existing DBs get defaults on migration
  • bd config get/set works with all keys
  • Type validation (days=int, enabled=bool)
  • Documentation in README.md

Estimated Time: 2 hours


Issue 3: Implement candidate identification queries

Type: task
Priority: 1
Dependencies: Needs schema and config

Description:

Write SQL queries to identify issues eligible for Tier 1 and Tier 2 compaction.

Design:

Create internal/storage/sqlite/compact.go with:

type CompactionCandidate struct {
    IssueID        string
    ClosedAt       time.Time
    OriginalSize   int
    EstimatedSize  int
    DependentCount int
}

func (s *SQLiteStorage) GetTier1Candidates(ctx context.Context) ([]*CompactionCandidate, error)
func (s *SQLiteStorage) GetTier2Candidates(ctx context.Context) ([]*CompactionCandidate, error)
func (s *SQLiteStorage) CheckEligibility(ctx context.Context, issueID string, tier int) (bool, string, error)

Implement recursive dependency checking using CTE similar to ready_issues view.

Acceptance Criteria:

  • Tier 1 query filters by days and dependency depth
  • Tier 2 query includes commit/issue count checks
  • Dependency checking handles circular deps gracefully
  • Performance: <100ms for 10,000 issue database
  • Tests cover edge cases (no deps, circular deps, mixed status)

Estimated Time: 6 hours


Issue 4: Create Haiku client and prompt templates

Type: task
Priority: 1
Dependencies: None (can work in parallel)

Description:

Implement Claude Haiku API client with template-based prompts for Tier 1 and Tier 2 summarization.

Implementation:

Create internal/compact/haiku.go:

type HaikuClient struct {
    client *anthropic.Client
    model  string
}

func NewHaikuClient(apiKey string) (*HaikuClient, error)
func (h *HaikuClient) SummarizeTier1(ctx context.Context, issue *types.Issue) (string, error)
func (h *HaikuClient) SummarizeTier2(ctx context.Context, issue *types.Issue) (string, error)

Use text/template for prompt rendering.

Add error handling:

  • Rate limit retry with exponential backoff
  • Network errors: 3 retries
  • Invalid responses: return error, don't corrupt data

Acceptance Criteria:

  • API key from env var or config (env takes precedence)
  • Prompts render correctly with template
  • Rate limiting handled gracefully
  • Mock tests for API calls
  • Real integration test (optional, requires API key)

Estimated Time: 6 hours


Issue 5: Implement snapshot creation and restoration

Type: task
Priority: 1
Dependencies: Needs schema changes

Description:

Implement snapshot creation before compaction and restoration capability.

Implementation:

Add to internal/storage/sqlite/compact.go:

type Snapshot struct {
    ID              int64
    IssueID         string
    SnapshotTime    time.Time
    CompactionLevel int
    OriginalSize    int
    CompressedSize  int
    OriginalContent string  // JSON
    ArchivedEvents  string  // JSON, nullable
}

func (s *SQLiteStorage) CreateSnapshot(ctx context.Context, issue *types.Issue, level int) error
func (s *SQLiteStorage) RestoreFromSnapshot(ctx context.Context, issueID string, level int) error
func (s *SQLiteStorage) GetSnapshots(ctx context.Context, issueID string) ([]*Snapshot, error)

Snapshot JSON structure:

{
  "description": "...",
  "design": "...",
  "notes": "...",
  "acceptance_criteria": "...",
  "title": "..."
}

Acceptance Criteria:

  • Snapshot created atomically with compaction
  • Restore returns exact original content
  • Multiple snapshots per issue supported (Tier 1 → Tier 2)
  • JSON encoding handles special characters
  • Size calculation is accurate (UTF-8 bytes)

Estimated Time: 5 hours


Issue 6: Implement Tier 1 compaction logic

Type: task
Priority: 1
Dependencies: Needs Haiku client, snapshots, candidate queries

Description:

Implement the core Tier 1 compaction process: snapshot → summarize → update.

Implementation:

Add to internal/compact/compactor.go:

type Compactor struct {
    store  storage.Storage
    haiku  *HaikuClient
    config *CompactConfig
}

func New(store storage.Storage, apiKey string, config *CompactConfig) (*Compactor, error)
func (c *Compactor) CompactTier1(ctx context.Context, issueID string) error
func (c *Compactor) CompactTier1Batch(ctx context.Context, issueIDs []string) error

Process:

  1. Verify eligibility
  2. Calculate original size
  3. Create snapshot
  4. Call Haiku for summary
  5. Update issue (description = summary, clear design/notes/criteria)
  6. Set compaction_level = 1, compacted_at = now
  7. Record EventCompacted
  8. Mark dirty for export

Acceptance Criteria:

  • Single issue compaction works end-to-end
  • Batch processing with parallel workers
  • Errors don't corrupt database (transaction rollback)
  • EventCompacted includes size savings
  • Dry-run mode (identify + size estimate only)

Estimated Time: 8 hours


Issue 7: Implement Tier 2 compaction logic

Type: task
Priority: 2
Dependencies: Needs Tier 1 working

Description:

Implement Tier 2 ultra-compression: more aggressive summarization and optional event pruning.

Implementation:

Add to internal/compact/compactor.go:

func (c *Compactor) CompactTier2(ctx context.Context, issueID string) error
func (c *Compactor) CompactTier2Batch(ctx context.Context, issueIDs []string) error

Process:

  1. Verify issue is at compaction_level = 1
  2. Check Tier 2 eligibility (days, deps, commits/issues)
  3. Create Tier 2 snapshot
  4. Call Haiku with ultra-compression prompt
  5. Update issue (description = single paragraph, clear all else)
  6. Set compaction_level = 2
  7. Optionally prune events (keep created/closed, archive rest)

Acceptance Criteria:

  • Requires existing Tier 1 compaction
  • Git commit counting works (with fallback)
  • Events optionally pruned (config: compact_events_enabled)
  • Archived events stored in snapshot
  • Size reduction 90-95%

Estimated Time: 6 hours


Issue 8: Add bd compact CLI command

Type: task
Priority: 1
Dependencies: Needs Tier 1 compaction logic

Description:

Implement the bd compact command with dry-run, batch processing, and progress reporting.

Implementation:

Create cmd/bd/compact.go:

var compactCmd = &cobra.Command{
    Use:   "compact",
    Short: "Compact old closed issues to save space",
    Long:  `...`,
}

var (
    compactDryRun    bool
    compactTier      int
    compactAll       bool
    compactID        string
    compactForce     bool
    compactBatchSize int
    compactWorkers   int
)

func init() {
    compactCmd.Flags().BoolVar(&compactDryRun, "dry-run", false, "Preview without compacting")
    compactCmd.Flags().IntVar(&compactTier, "tier", 1, "Compaction tier (1 or 2)")
    compactCmd.Flags().BoolVar(&compactAll, "all", false, "Process all candidates")
    compactCmd.Flags().StringVar(&compactID, "id", "", "Compact specific issue")
    compactCmd.Flags().BoolVar(&compactForce, "force", false, "Force compact (bypass checks)")
    // ... more flags
}

Output:

  • Dry-run: Table of candidates with size estimates
  • Actual run: Progress bar with batch updates
  • Summary: Count, size saved, cost, time

Acceptance Criteria:

  • --dry-run shows accurate preview
  • --all processes all candidates
  • --id compacts single issue
  • --force bypasses eligibility checks (with --id)
  • Progress bar for batches
  • JSON output with --json
  • Exit code: 0=success, 1=error

Estimated Time: 6 hours


Issue 9: Add bd compact --restore functionality

Type: task Priority: 2 Dependencies: Needs snapshots and CLI

Description:

Implement restore command to undo compaction from snapshots.

Implementation:

Add to cmd/bd/compact.go:

var compactRestore string

compactCmd.Flags().StringVar(&compactRestore, "restore", "", "Restore issue from snapshot")

Process:

  1. Load snapshot for issue
  2. Parse JSON content
  3. Update issue with original content
  4. Set compaction_level = 0, compacted_at = NULL
  5. Record EventRestored
  6. Mark dirty
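The core guarantee of steps 1-3 is that parsing the snapshot's JSON reproduces the original fields exactly. A minimal, self-contained sketch of that round trip (the struct name and field set are illustrative, not the real snapshot schema):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// snapshotContent sketches the text fields a snapshot preserves.
type snapshotContent struct {
	Description        string `json:"description"`
	Design             string `json:"design"`
	Notes              string `json:"notes"`
	AcceptanceCriteria string `json:"acceptance_criteria"`
}

// roundTrip marshals the original fields to the snapshot's JSON form and
// parses them back, as the restore path would before updating the issue.
func roundTrip(orig snapshotContent) snapshotContent {
	blob, err := json.Marshal(orig)
	if err != nil {
		panic(err)
	}
	var restored snapshotContent
	if err := json.Unmarshal(blob, &restored); err != nil {
		panic(err)
	}
	return restored
}

func main() {
	orig := snapshotContent{Description: "Fix authentication bug", Design: "Use refresh tokens"}
	fmt.Println(roundTrip(orig) == orig) // restore must be byte-exact
}
```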

Acceptance Criteria:

  • Restores exact original content
  • Handles multiple snapshots (prompt user or use latest)
  • --level flag to choose snapshot
  • Updates compaction_level correctly
  • Exports restored content to JSONL

Estimated Time: 4 hours


Issue 10: Add bd compact --stats command

Type: task Priority: 2 Dependencies: Needs compaction working

Description:

Add statistics command showing compaction status and potential savings.

Implementation:

var compactStats bool

compactCmd.Flags().BoolVar(&compactStats, "stats", false, "Show compaction statistics")

Output:

  • Total issues, by compaction level
  • Current DB size vs estimated uncompacted size
  • Space savings (MB and %)
  • Candidates for each tier with estimates
  • Estimated API cost
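The savings and cost arithmetic behind this output can be sketched as follows. `estimate` is a hypothetical helper, and the per-issue cost uses the ~$1.10 per 1,000 issues figure assumed in this design's key metrics.

```go
package main

import "fmt"

// estimate derives the --stats figures: the savings percentage relative to
// the estimated uncompacted size, and the projected Haiku cost for the
// remaining candidates at an assumed $1.10 per 1,000 issues.
func estimate(currentBytes, savedBytes int64, candidates int) (pct float64, costUSD float64) {
	uncompacted := currentBytes + savedBytes
	pct = float64(savedBytes) / float64(uncompacted) * 100
	costUSD = float64(candidates) * 1.10 / 1000
	return
}

func main() {
	pct, cost := estimate(25_000, 102_000, 42)
	fmt.Printf("saved %.0f%%, est. cost $%.2f\n", pct, cost)
}
```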

Acceptance Criteria:

  • Accurate counts by compaction_level
  • Size calculations include all text fields
  • Shows candidates with eligibility reasons
  • Cost estimation based on Haiku pricing
  • JSON output supported

Estimated Time: 4 hours


Issue 11: Add EventCompacted to event system

Type: task Priority: 2 Dependencies: Needs schema changes

Description:

Add new event type for tracking compaction in audit trail.

Implementation:

  1. Add to internal/types/types.go:

const EventCompacted EventType = "compacted"

  2. Record event during compaction:

eventData := map[string]interface{}{
    "tier": tier,
    "original_size": originalSize,
    "compressed_size": compressedSize,
    "reduction_pct": (1 - float64(compressedSize)/float64(originalSize)) * 100,
}

  3. Show in bd show output:

Events:
  2025-10-15: compacted (tier 1, saved 1.8KB, 80%)
  2025-08-31: closed by alice
  2025-08-20: created by alice

Acceptance Criteria:

  • Event includes tier and size info
  • Shows in event history
  • Exports to JSONL
  • bd show displays compaction marker

Estimated Time: 3 hours


Issue 12: Add compaction indicator to bd show

Type: task Priority: 2 Dependencies: Needs compaction working

Description:

Update bd show command to display compaction status prominently.

Implementation:

Add to issue display:

bd-42: Fix authentication bug [CLOSED] 🗜️

Status: closed (compacted L1)
...

---
💾 Restore: bd compact --restore bd-42
📊 Original: 2,341 bytes | Compressed: 468 bytes (80% reduction)
🗜️ Compacted: 2025-10-15 (Tier 1)

Show different emoji for tiers:

  • Tier 1: 🗜️
  • Tier 2: 📦

Acceptance Criteria:

  • Compaction status visible in title
  • Footer shows size savings
  • Restore command shown
  • Works with --json output

Estimated Time: 2 hours


Issue 13: Write compaction tests

Type: task Priority: 1 Dependencies: Needs all compaction logic

Description:

Comprehensive test suite for compaction functionality.

Test Coverage:

  1. Candidate Identification:

    • Eligibility by time
    • Dependency depth checking
    • Mixed status dependents
    • Edge cases (no deps, circular)
  2. Snapshots:

    • Create and restore
    • Multiple snapshots per issue
    • Content integrity
  3. Tier 1 Compaction:

    • Single issue
    • Batch processing
    • Error handling
  4. Tier 2 Compaction:

    • Requires Tier 1
    • Events pruning
    • Commit counting fallback
  5. CLI:

    • All flag combinations
    • Dry-run accuracy
    • JSON output
  6. Integration:

    • End-to-end flow
    • JSONL export/import
    • Restore verification

Acceptance Criteria:

  • Test coverage >80%
  • All edge cases covered
  • Mock Haiku API in tests
  • Integration tests pass
  • go test ./... passes
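Mocking the Haiku API can be done with a throwaway HTTP server. A minimal sketch, assuming the compaction client accepts a configurable base URL (the handler, function name, and response shape are illustrative, not the real Anthropic API schema):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchMockSummary stands up a local server that plays the role of the
// Haiku endpoint and returns its canned response. In the real tests the
// compaction client's base URL would be pointed at srv.URL instead of
// the production endpoint.
func fetchMockSummary() string {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `{"summary":"Fixed auth bug by rotating refresh tokens."}`)
	}))
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	return string(body)
}

func main() {
	fmt.Println(fetchMockSummary())
}
```

This keeps the test suite hermetic: no API key, no network, and deterministic summaries for asserting Tier 1 output.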

Estimated Time: 8 hours


Issue 14: Add compaction documentation

Type: task Priority: 2 Dependencies: All features complete

Description:

Document compaction feature in README and create COMPACTION.md guide.

Content:

Update README.md:

  • Add to Features section
  • CLI examples
  • Configuration guide

Create COMPACTION.md:

  • How compaction works
  • When to use each tier
  • Cost analysis
  • Safety mechanisms
  • Troubleshooting

Create examples/compaction/:

  • Example workflow
  • Cron job setup
  • Auto-compaction script

Acceptance Criteria:

  • README.md updated
  • COMPACTION.md comprehensive
  • Examples work as documented
  • Screenshots/examples included
  • API key setup documented

Estimated Time: 4 hours


Issue 15: Optional: Implement auto-compaction

Type: task Priority: 3 (nice-to-have) Dependencies: Needs all compaction working

Description:

Implement automatic compaction triggered by certain operations when enabled via config.

Implementation:

Trigger points (when auto_compact_enabled = true):

  1. bd stats - check and compact if candidates exist
  2. bd export - before exporting
  3. Background timer (optional, via daemon)

Add:

func (s *SQLiteStorage) AutoCompact(ctx context.Context) error {
    enabled, _ := s.GetConfig(ctx, "auto_compact_enabled")
    if enabled != "true" {
        return nil
    }

    // Run Tier 1 compaction on all candidates,
    // limited to batch_size to avoid long operations.
    return nil // TODO: invoke Tier 1 batch compaction here
}

Acceptance Criteria:

  • Respects auto_compact_enabled config
  • Limits batch size to avoid blocking
  • Logs compaction activity
  • Can be disabled per-command with --no-auto-compact

Estimated Time: 4 hours


Issue 16: Optional: Add git commit counting

Type: task Priority: 3 (nice-to-have) Dependencies: Needs Tier 2 logic

Description:

Implement git commit counting for "project time" measurement as alternative to calendar time.

Implementation:

func getCommitsSince(closedAt time.Time) (int, error) {
    cmd := exec.Command("git", "rev-list", "--count",
        fmt.Sprintf("--since=%s", closedAt.Format(time.RFC3339)), "HEAD")
    output, err := cmd.Output()
    if err != nil {
        return 0, err  // Not in git repo or git not available
    }
    return strconv.Atoi(strings.TrimSpace(string(output)))
}

Fallback to issue counter delta if git unavailable.
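The fallback can be sketched as a pure helper: "project time" becomes the difference between the highest issue number now and the highest issue number when the candidate was closed. The names issueDelta and issueNumber are hypothetical, and the prefix-N ID format follows the bd-42 style used in this document.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// issueNumber extracts the numeric suffix from an ID like "bd-42".
func issueNumber(id string) (int, error) {
	i := strings.LastIndex(id, "-")
	if i < 0 {
		return 0, fmt.Errorf("malformed issue ID %q", id)
	}
	return strconv.Atoi(id[i+1:])
}

// issueDelta measures project activity since close as the number of
// issues created between idAtClose and latestID.
func issueDelta(latestID, idAtClose string) (int, error) {
	n1, err := issueNumber(latestID)
	if err != nil {
		return 0, err
	}
	n0, err := issueNumber(idAtClose)
	if err != nil {
		return 0, err
	}
	return n1 - n0, nil
}

func main() {
	delta, _ := issueDelta("bd-187", "bd-42")
	fmt.Println(delta) // 145 issues created since close
}
```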

Acceptance Criteria:

  • Counts commits since closed_at
  • Handles git not available gracefully
  • Fallback to issue counter works
  • Configurable via compact_tier2_commits
  • Tested with real git repo

Estimated Time: 3 hours


Success Metrics

Technical Metrics

  • 70-85% size reduction for Tier 1
  • 90-95% size reduction for Tier 2
  • <100ms candidate identification query
  • <2s per issue compaction (Haiku latency)
  • Zero data loss (all restore tests pass)

Quality Metrics

  • Haiku summaries preserve key information
  • Developers can understand compacted issues
  • Restore returns exact original content
  • No corruption in multi-machine workflows

Operational Metrics

  • Cost: <$1.50 per 1,000 issues
  • Dry-run accuracy: 95%+ estimate correctness
  • Error rate: <1% API failures (with retry)
  • User adoption: Docs clear, examples work

Rollout Plan

Phase 1: Alpha (Internal Testing)

  1. Merge compaction feature to main
  2. Test on beads' own database
  3. Verify JSONL export/import
  4. Validate Haiku summaries
  5. Fix any critical bugs

Phase 2: Beta (Opt-In)

  1. Announce in README (opt-in, experimental)
  2. Gather feedback from early adopters
  3. Iterate on prompt templates
  4. Add telemetry (optional, with consent)

Phase 3: Stable (Default Disabled)

  1. Mark feature as stable
  2. Keep auto_compact_enabled = false by default
  3. Encourage manual bd compact --dry-run first
  4. Document in quickstart guide

Phase 4: Mature (Consider Auto-Enable)

  1. After 6+ months of stability
  2. Consider auto-compaction for new users
  3. Provide migration guide for disabling

Risks and Mitigations

| Risk | Impact | Likelihood | Mitigation |
|------|--------|------------|------------|
| Haiku summaries lose critical info | High | Medium | Manual review in dry-run, restore capability, improve prompts |
| API rate limits during batch | Medium | Medium | Exponential backoff, respect rate limits, batch sizing |
| JSONL merge conflicts increase | Medium | Low | Compaction is deterministic per issue; git handles it well |
| Users accidentally compact important issues | High | Low | Dry-run required, restore available, snapshots permanent |
| Cost higher than expected | Low | Low | Dry-run shows estimates, configurable batch sizes |
| Schema migration fails | High | Very Low | Idempotent migrations, tested on existing DBs |

Open Questions

  1. Should compaction be reversible forever, or expire snapshots?

    • Recommendation: Keep snapshots indefinitely (disk is cheap)
  2. Should we compress snapshots themselves (gzip)?

    • Recommendation: Not in MVP, add if storage becomes issue
  3. Should tier selection be automatic or manual?

    • Recommendation: Manual in MVP, auto-tier in future
  4. How to handle issues compacted on one machine but not another?

    • Answer: JSONL export includes compaction_level, imports preserve it
  5. Should we support custom models (Sonnet, Opus)?

    • Recommendation: Haiku only in MVP, add later if needed

Appendix: Example Workflow

Typical Monthly Compaction

# 1. Check what's eligible
$ bd compact --dry-run
=== Tier 1 Candidates ===
42 issues eligible (closed >30 days, deps closed)
Est. reduction: 127 KB → 25 KB (80%)
Est. cost: $0.03

# 2. Review candidates manually
$ bd list --status closed --json | jq 'map(select(.compaction_level == 0))'

# 3. Compact Tier 1
$ bd compact --all
✓ Compacted 42 issues in 18s ($0.03)

# 4. Check Tier 2 candidates (optional)
$ bd compact --dry-run --tier 2
=== Tier 2 Candidates ===
8 issues eligible (closed >90 days, 100+ commits since)
Est. reduction: 45 KB → 4 KB (91%)
Est. cost: $0.01

# 5. Compact Tier 2
$ bd compact --all --tier 2
✓ Compacted 8 issues in 6s ($0.01)

# 6. Export and commit
$ bd export -o .beads/issues.jsonl
$ git add .beads/issues.jsonl
$ git commit -m "Compact 50 old issues (saved 143 KB)"
$ git push

# 7. View stats
$ bd compact --stats
Total Space Saved: 143 KB (82% reduction)
Database Size: 2.1 MB (down from 2.3 MB)

Conclusion

This design provides a comprehensive, safe, and cost-effective way to keep beads databases lightweight while preserving essential context. The two-tier approach balances aggressiveness with safety, and the snapshot system ensures full reversibility.

The use of Claude Haiku for semantic compression is the key idea: it preserves meaning rather than merely truncating text. At ~$1 per 1,000 issues, the cost is negligible for the value provided.

Implementation is straightforward with clear phases and well-defined issues. The MVP (Tier 1 only) can be delivered in ~40 hours of work, with Tier 2 and enhancements following incrementally.

This aligns perfectly with beads' philosophy: pragmatic, agent-focused, and evolutionarily designed.