Issue Database Compaction Design

Status: Design Phase
Created: 2025-10-15
Target: Beads v1.1

Executive Summary

Add intelligent database compaction to beads that uses Claude Haiku to semantically compress old, closed issues. This keeps the database lightweight and agent-friendly while preserving essential context about past work. The design philosophy: most work is throwaway, and forensic value decays exponentially with time.

Key Metrics

  • Space savings: 70-95% reduction in text volume for old issues
  • Cost: ~$1.10 per 1,000 issues compacted (Haiku pricing)
  • Safety: Full snapshot system with restore capability
  • Performance: Batch processing with parallel workers

Motivation

The Problem

Beads databases grow indefinitely:

  • Issues accumulate detailed description, design, notes, acceptance_criteria fields
  • Events table logs every change forever
  • Old closed issues (especially those with all dependents closed) rarely need full detail
  • Agent context windows work better with concise, relevant information

Why This Matters

  1. Agent Efficiency: Smaller databases → faster queries → clearer agent thinking
  2. Context Management: Agents benefit from summaries of old work, not verbose details
  3. Git Performance: Smaller JSONL exports → faster git operations
  4. Pragmatic Philosophy: Beads is agent memory, not a historical archive
  5. Forensic Decay: Need for detail decreases exponentially after closure

What We Keep

  • Issue ID and title (always)
  • Semantic summary of what was done and why
  • Key architectural decisions
  • Closure outcome
  • Full git history in JSONL commits (ultimate backup)
  • Restore capability via snapshots

Technical Design

Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│                      Compaction Pipeline                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                   │
│  1. Candidate Identification                                     │
│     ↓                                                             │
│     • Query closed issues meeting time + dependency criteria     │
│     • Check dependency depth (recursive CTE)                     │
│     • Calculate size/savings estimates                           │
│                                                                   │
│  2. Snapshot Creation                                            │
│     ↓                                                             │
│     • Store original content in issue_snapshots table            │
│     • Calculate content hash for verification                    │
│     • Enable restore capability                                  │
│                                                                   │
│  3. Haiku Summarization                                          │
│     ↓                                                             │
│     • Batch process with worker pool (5 parallel)                │
│     • Different prompts for Tier 1 vs Tier 2                     │
│     • Handle API errors gracefully                               │
│                                                                   │
│  4. Issue Update                                                 │
│     ↓                                                             │
│     • Replace verbose fields with summary                        │
│     • Set compaction_level and compacted_at                      │
│     • Record event (EventCompacted)                              │
│     • Mark dirty for JSONL export                                │
│                                                                   │
│  5. Optional: Events Pruning                                     │
│     ↓                                                             │
│     • Keep only created/closed events for Tier 2                 │
│     • Archive detailed event history to snapshots                │
│                                                                   │
└─────────────────────────────────────────────────────────────────┘

Compaction Tiers

Tier 1: Standard Compaction

Eligibility:

  • status = 'closed'
  • closed_at is at least 30 days in the past (configurable: compact_tier1_days)
  • All issues that depend on this one (via blocks or parent-child) are closed
  • Dependency check depth: 2 levels (configurable: compact_tier1_dep_levels)

Process:

  1. Snapshot original content
  2. Send to Haiku with 300-word summarization prompt
  3. Store summary in description
  4. Clear design, notes, acceptance_criteria
  5. Set compaction_level = 1
  6. Keep all events

Output Format (Haiku prompt):

**Summary:** [2-3 sentences: problem, solution, outcome]
**Key Decisions:** [bullet points of non-obvious choices]
**Resolution:** [how it was closed]

Expected Reduction: 70-85% of original text size

Tier 2: Aggressive Compaction

Eligibility:

  • Already at compaction_level = 1
  • closed_at is at least 90 days in the past (configurable: compact_tier2_days)
  • All dependencies (all 4 types) up to 5 levels deep are closed
  • One of:
    • ≥100 git commits since closed_at (configurable: compact_tier2_commits)
    • ≥500 new issues created since closure
    • Manual override with --force

Process:

  1. Snapshot Tier 1 content
  2. Send to Haiku with 150-word ultra-compression prompt
  3. Store single paragraph in description
  4. Clear all other text fields
  5. Set compaction_level = 2
  6. Prune events: keep only created and closed, move rest to snapshot

Output Format (Haiku prompt):

Single paragraph (≤150 words):
- What was built/fixed
- Why it mattered
- Lasting architectural impact (if any)

Expected Reduction: 90-95% of original text size


Schema Changes

New Columns on issues Table

ALTER TABLE issues ADD COLUMN compaction_level INTEGER DEFAULT 0;
ALTER TABLE issues ADD COLUMN compacted_at DATETIME;
ALTER TABLE issues ADD COLUMN original_size INTEGER; -- bytes before compaction

New Table: issue_snapshots

CREATE TABLE IF NOT EXISTS issue_snapshots (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    issue_id TEXT NOT NULL,
    snapshot_time DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    compaction_level INTEGER NOT NULL, -- 1 or 2
    original_size INTEGER NOT NULL,
    compressed_size INTEGER NOT NULL,
    -- JSON blob with original content
    original_content TEXT NOT NULL,
    -- Optional: compressed events for Tier 2
    archived_events TEXT,
    FOREIGN KEY (issue_id) REFERENCES issues(id) ON DELETE CASCADE
);

CREATE INDEX IF NOT EXISTS idx_snapshots_issue ON issue_snapshots(issue_id);
CREATE INDEX IF NOT EXISTS idx_snapshots_level ON issue_snapshots(compaction_level);

Snapshot JSON Structure:

{
  "description": "original description text...",
  "design": "original design text...",
  "notes": "original notes text...",
  "acceptance_criteria": "original criteria text...",
  "title": "original title",
  "hash": "sha256:abc123..."
}
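The `hash` field above can be derived from the canonical JSON encoding of the original fields. A minimal sketch (the `snapshotContent` type and `contentHash` helper are illustrative names, not final API):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// snapshotContent mirrors the JSON blob stored in issue_snapshots.
type snapshotContent struct {
	Description        string `json:"description"`
	Design             string `json:"design"`
	Notes              string `json:"notes"`
	AcceptanceCriteria string `json:"acceptance_criteria"`
	Title              string `json:"title"`
}

// contentHash produces the "sha256:..." verification hash over the
// canonical JSON encoding of the original fields.
func contentHash(c snapshotContent) (string, error) {
	b, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return "sha256:" + hex.EncodeToString(sum[:]), nil
}

func main() {
	h, _ := contentHash(snapshotContent{Title: "Fix authentication bug"})
	fmt.Println(h)
}
```

Because `json.Marshal` emits struct fields in declaration order, the encoding is deterministic, so the hash can be recomputed at restore time to verify snapshot integrity.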

New Config Keys

INSERT INTO config (key, value) VALUES
    -- Tier 1 settings
    ('compact_tier1_days', '30'),
    ('compact_tier1_dep_levels', '2'),

    -- Tier 2 settings
    ('compact_tier2_days', '90'),
    ('compact_tier2_dep_levels', '5'),
    ('compact_tier2_commits', '100'),
    ('compact_tier2_new_issues', '500'),

    -- API settings
    ('anthropic_api_key', ''),  -- Falls back to ANTHROPIC_API_KEY env var
    ('compact_model', 'claude-3-5-haiku-20241022'),

    -- Performance settings
    ('compact_batch_size', '50'),
    ('compact_parallel_workers', '5'),

    -- Safety settings
    ('auto_compact_enabled', 'false'),  -- Opt-in
    ('compact_events_enabled', 'false'), -- Events pruning (Tier 2)

    -- Display settings
    ('compact_show_savings', 'true');  -- Show size reduction in output
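Since the config table stores everything as strings, the loader needs type validation plus the documented env-var precedence for the API key. A hedged sketch of what `getCompactConfig` might do internally (struct and function names are illustrative):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// CompactConfig holds typed compaction settings parsed from the
// string values in the config table.
type CompactConfig struct {
	Tier1Days   int
	Tier2Days   int
	BatchSize   int
	AutoEnabled bool
	APIKey      string
}

// parseCompactConfig validates raw config values, letting the
// ANTHROPIC_API_KEY environment variable override the stored key.
func parseCompactConfig(raw map[string]string) (*CompactConfig, error) {
	cfg := &CompactConfig{}
	var err error
	if cfg.Tier1Days, err = strconv.Atoi(raw["compact_tier1_days"]); err != nil {
		return nil, fmt.Errorf("compact_tier1_days: %w", err)
	}
	if cfg.Tier2Days, err = strconv.Atoi(raw["compact_tier2_days"]); err != nil {
		return nil, fmt.Errorf("compact_tier2_days: %w", err)
	}
	if cfg.BatchSize, err = strconv.Atoi(raw["compact_batch_size"]); err != nil {
		return nil, fmt.Errorf("compact_batch_size: %w", err)
	}
	if cfg.AutoEnabled, err = strconv.ParseBool(raw["auto_compact_enabled"]); err != nil {
		return nil, fmt.Errorf("auto_compact_enabled: %w", err)
	}
	cfg.APIKey = raw["anthropic_api_key"]
	if env := os.Getenv("ANTHROPIC_API_KEY"); env != "" {
		cfg.APIKey = env // env var takes precedence over config table
	}
	return cfg, nil
}

func main() {
	cfg, err := parseCompactConfig(map[string]string{
		"compact_tier1_days":   "30",
		"compact_tier2_days":   "90",
		"compact_batch_size":   "50",
		"auto_compact_enabled": "false",
	})
	fmt.Println(cfg, err)
}
```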

New Event Type

const (
    // ... existing event types ...
    EventCompacted EventType = "compacted"
)

Haiku Integration

Prompt Templates

Tier 1 Prompt:

Summarize this closed software issue. Preserve key decisions, implementation approach, and outcome. Max 300 words.

Title: {{.Title}}
Type: {{.IssueType}}
Priority: {{.Priority}}

Description:
{{.Description}}

Design Notes:
{{.Design}}

Implementation Notes:
{{.Notes}}

Acceptance Criteria:
{{.AcceptanceCriteria}}

Output format:
**Summary:** [2-3 sentences: what problem, what solution, what outcome]
**Key Decisions:** [bullet points of non-obvious choices]
**Resolution:** [how it was closed]

Tier 2 Prompt:

Ultra-compress this old closed issue to ≤150 words. Focus on lasting architectural impact.

Title: {{.Title}}
Original Summary (already compressed):
{{.Description}}

Output a single paragraph covering:
- What was built/fixed
- Why it mattered
- Lasting impact (if any)

If there's no lasting impact, just state what was done and that it's resolved.
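The `{{.Title}}`-style placeholders above are Go `text/template` syntax; rendering them against an issue is straightforward. A sketch of the rendering step, assuming a `promptIssue` carrier struct (names illustrative, abbreviated Tier 1 template shown):

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// promptIssue carries the fields the prompt templates reference.
type promptIssue struct {
	Title, IssueType, Priority                     string
	Description, Design, Notes, AcceptanceCriteria string
}

const tier1Tmpl = `Summarize this closed software issue. Preserve key decisions, implementation approach, and outcome. Max 300 words.

Title: {{.Title}}
Type: {{.IssueType}}
Priority: {{.Priority}}

Description:
{{.Description}}
`

// renderPrompt fills a prompt template with issue fields.
func renderPrompt(tmpl string, issue promptIssue) (string, error) {
	t, err := template.New("prompt").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, issue); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, _ := renderPrompt(tier1Tmpl, promptIssue{
		Title: "Fix authentication bug", IssueType: "bug", Priority: "1",
	})
	fmt.Println(out)
}
```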

API Client Structure

package compact

import (
    "context"

    "github.com/anthropics/anthropic-sdk-go"
    "github.com/anthropics/anthropic-sdk-go/option"
)

type HaikuClient struct {
    client anthropic.Client
    model  anthropic.Model
}

func NewHaikuClient(apiKey string) *HaikuClient {
    return &HaikuClient{
        client: anthropic.NewClient(option.WithAPIKey(apiKey)),
        model:  anthropic.ModelClaude3_5Haiku20241022,
    }
}

func (h *HaikuClient) Summarize(ctx context.Context, prompt string, maxTokens int) (string, error)

Error Handling

  • Rate limits: Exponential backoff with jitter
  • API errors: Log and skip issue (don't fail entire batch)
  • Network failures: Retry up to 3 times
  • Invalid responses: Fall back to truncation with warning
  • Context length: Truncate input if needed (rare, but possible)

Dependency Checking

Reuse existing recursive CTE logic from ready_issues view, adapted for compaction:

-- Check if issue and N levels of dependents are all closed
WITH RECURSIVE dependent_tree AS (
    -- Base case: the candidate issue
    SELECT id, status, 0 as depth
    FROM issues
    WHERE id = ?

    UNION ALL

    -- Recursive case: issues that depend on this one
    SELECT i.id, i.status, dt.depth + 1
    FROM dependent_tree dt
    JOIN dependencies d ON d.depends_on_id = dt.id
    JOIN issues i ON i.id = d.issue_id
    WHERE d.type IN ('blocks', 'parent-child')  -- Only blocking deps matter
      AND dt.depth < ?  -- Max depth parameter
)
SELECT CASE
    WHEN COUNT(*) = SUM(CASE WHEN status = 'closed' THEN 1 ELSE 0 END)
    THEN 1 ELSE 0 END as all_closed
FROM dependent_tree;

Performance: This query is O(N) where N is the number of dependents. With proper indexes, should be <10ms per issue.
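The CTE's logic can also be unit-tested without a database by running the same traversal over an in-memory dependency graph. A hedged Go sketch of the equivalent check (types and names are illustrative, not the storage-layer API):

```go
package main

import "fmt"

// graph holds issue statuses and reverse dependency edges:
// dependents[x] lists issues that depend on x via blocking types.
type graph struct {
	status     map[string]string
	dependents map[string][]string
}

// allDependentsClosed walks dependents breadth-first up to maxDepth
// levels and reports whether the issue and every reachable dependent
// are closed. Visited tracking makes circular dependencies terminate.
func allDependentsClosed(g graph, id string, maxDepth int) bool {
	visited := map[string]bool{id: true}
	frontier := []string{id}
	for depth := 0; depth <= maxDepth; depth++ {
		var next []string
		for _, cur := range frontier {
			if g.status[cur] != "closed" {
				return false
			}
			if depth == maxDepth {
				continue // don't expand past the depth limit
			}
			for _, dep := range g.dependents[cur] {
				if !visited[dep] {
					visited[dep] = true
					next = append(next, dep)
				}
			}
		}
		frontier = next
	}
	return true
}

func main() {
	g := graph{
		status:     map[string]string{"bd-42": "closed", "bd-43": "closed", "bd-44": "open"},
		dependents: map[string][]string{"bd-42": {"bd-43", "bd-44"}},
	}
	fmt.Println(allDependentsClosed(g, "bd-42", 2)) // bd-44 is still open
}
```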


Git Integration (Optional)

For "project time" measurement via commit counting:

package compact

import (
    "fmt"
    "os/exec"
    "strconv"
    "strings"
    "time"
)

// getCommitsSince counts commits made after the issue closed, as a
// proxy for elapsed "project time".
func getCommitsSince(closedAt time.Time) (int, error) {
    cmd := exec.Command("git", "rev-list", "--count",
        fmt.Sprintf("--since=%s", closedAt.Format(time.RFC3339)),
        "HEAD")
    output, err := cmd.Output()
    if err != nil {
        return 0, err
    }
    return strconv.Atoi(strings.TrimSpace(string(output)))
}

Fallback: If git unavailable or not in a repo, use issue counter delta:

SELECT last_id FROM issue_counters WHERE prefix = ?
-- Store at close time, compare at compaction time

CLI Commands

bd compact - Main Command

bd compact [flags]

Flags:
  --dry-run              Show what would be compacted without doing it
  --tier int            Compaction tier (1 or 2, default: 1)
  --all                 Process all eligible issues (default: preview only)
  --id string           Compact specific issue by ID
  --force               Bypass eligibility checks (with --id)
  --batch-size int      Issues per batch (default: from config)
  --workers int         Parallel workers (default: from config)
  --json                JSON output for agents

Examples:
  bd compact --dry-run                 # Preview Tier 1 candidates
  bd compact --dry-run --tier 2        # Preview Tier 2 candidates
  bd compact --all                     # Compact all Tier 1 candidates
  bd compact --tier 2 --all            # Compact all Tier 2 candidates
  bd compact --id bd-42                # Compact specific issue
  bd compact --id bd-42 --force        # Force compact even if recent

bd compact --restore - Restore Compacted Issues

bd compact --restore <issue-id> [flags]

Flags:
  --level int           Restore to specific snapshot level (default: latest)
  --json                JSON output

Examples:
  bd compact --restore bd-42           # Restore bd-42 from latest snapshot
  bd compact --restore bd-42 --level 1 # Restore from Tier 1 snapshot

bd compact --stats - Compaction Statistics

bd compact --stats [flags]

Flags:
  --json                JSON output

Example output:
  === Compaction Statistics ===

  Total Issues: 1,247
  Compacted (Tier 1): 342 (27.4%)
  Compacted (Tier 2): 89 (7.1%)

  Database Size: 2.3 MB
  Estimated Uncompacted Size: 8.7 MB
  Space Savings: 6.4 MB (73.6%)

  Candidates:
    Tier 1: 47 issues (est. 320 KB → 64 KB)
    Tier 2: 15 issues (est. 180 KB → 18 KB)

  Estimated Compaction Cost: $0.04 (Haiku)

Output Examples

Dry Run Output

$ bd compact --dry-run

=== Tier 1 Compaction Preview ===
Eligibility: Closed ≥30 days, 2 levels of dependents closed

bd-42: Fix authentication bug (P1, bug)
  Closed: 45 days ago
  Size: 2,341 bytes → ~468 bytes (80% reduction)
  Dependents: bd-43, bd-44 (both closed)

bd-57: Add login form (P2, feature)
  Closed: 38 days ago
  Size: 1,823 bytes → ~365 bytes (80% reduction)
  Dependents: (none)

bd-58: Refactor auth middleware (P2, task)
  Closed: 35 days ago
  Size: 3,102 bytes → ~620 bytes (80% reduction)
  Dependents: bd-59 (closed)

... (44 more issues)

Total: 47 issues
Estimated reduction: 87.3 KB → 17.5 KB (80%)
Estimated cost: $0.03 (Haiku API)

Run with --all to compact these issues.

Compaction Progress

$ bd compact --all

Compacting 47 issues (Tier 1)...

Creating snapshots... [████████████████████] 47/47
Calling Haiku API...   [████████████████████] 47/47 (12s, $0.027)
Updating issues...     [████████████████████] 47/47

✓ Successfully compacted 47 issues
  Size reduction: 87.3 KB → 18.2 KB (79.1%)
  API cost: $0.027
  Time: 14.3s

Compacted issues will be exported to .beads/issues.jsonl

Show Compacted Issue

$ bd show bd-42

bd-42: Fix authentication bug [CLOSED] 🗜️

Status: closed (compacted L1)
Priority: 1 (High)
Type: bug
Closed: 2025-08-31 (45 days ago)
Compacted: 2025-10-15 (saved 1,873 bytes)

**Summary:** Fixed race condition in JWT token refresh logic causing intermittent
401 errors under high load. Implemented mutex-based locking around token refresh
operations. All users can now stay authenticated reliably during concurrent requests.

**Key Decisions:**
- Used sync.RWMutex instead of channels for simpler reasoning about lock state
- Added exponential backoff to token refresh to prevent thundering herd
- Preserved existing token format for backward compatibility with mobile clients

**Resolution:** Deployed to production on Aug 31, monitored for 2 weeks with zero
401 errors. Closed after confirming fix with load testing.

---
💾 Restore: bd compact --restore bd-42
📊 Original size: 2,341 bytes | Compressed: 468 bytes (80% reduction)

Safety Mechanisms

  1. Snapshot-First: Always create snapshot before modifying issue
  2. Restore Capability: Full restore from snapshots with --restore
  3. Opt-In Auto-Compaction: Disabled by default (auto_compact_enabled = false)
  4. Dry-Run Required: Preview before committing with --dry-run
  5. Git Backup: JSONL exports preserve full history in git commits
  6. Audit Trail: EventCompacted records what was done and when
  7. Size Verification: Track original_size and compressed_size for validation
  8. Idempotent: Re-running compaction on already-compacted issues is safe (no-op)
  9. Graceful Degradation: API failures don't corrupt data, just skip issues
  10. Reversible: Restore is always available, even after git push

Testing Strategy

Unit Tests

  1. Candidate Identification:

    • Issues meeting time criteria
    • Dependency depth checking
    • Mixed status dependents (some closed, some open)
    • Edge case: circular dependencies
  2. Haiku Client:

    • Mock API responses
    • Rate limit handling
    • Error recovery
    • Prompt rendering
  3. Snapshot Management:

    • Create snapshot
    • Restore from snapshot
    • Multiple snapshots per issue (Tier 1 → Tier 2)
    • Snapshot integrity (hash verification)
  4. Size Calculation:

    • Accurate byte counting
    • UTF-8 handling
    • Empty fields
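The size-calculation tests hinge on one subtlety: `len()` on a Go string counts UTF-8 bytes, not runes, which is exactly what the `original_size` column should record. A minimal sketch (helper name is illustrative):

```go
package main

import "fmt"

// issueTextSize sums the UTF-8 byte length of the compactable text
// fields; len() on a Go string already counts bytes, not runes.
func issueTextSize(description, design, notes, criteria string) int {
	return len(description) + len(design) + len(notes) + len(criteria)
}

func main() {
	// "é" encodes to two UTF-8 bytes, so this reports 6, not 5.
	fmt.Println(issueTextSize("héllo", "", "", ""))
}
```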

Integration Tests

  1. End-to-End Compaction:

    • Create test issues
    • Age them (mock timestamps)
    • Run compaction
    • Verify summaries
    • Restore and verify
  2. Batch Processing:

    • Large batches (100+ issues)
    • Parallel worker coordination
    • Error handling mid-batch
  3. JSONL Export:

    • Compacted issues export correctly
    • Import preserves compaction_level
    • Round-trip fidelity
  4. CLI Commands:

    • All flag combinations
    • JSON output parsing
    • Error messages

Manual Testing Checklist

  • Dry-run shows accurate candidates
  • Compaction reduces size as expected
  • Haiku summaries are high quality
  • Restore returns exact original content
  • Stats command shows correct numbers
  • Auto-compaction respects config
  • Git workflow (commit → pull → auto-compact)
  • Multi-machine workflow with compaction
  • API key handling (env var vs config)
  • Rate limit handling under load

Performance Considerations

Scalability

Small databases (<1,000 issues):

  • Full scan acceptable
  • Compact all eligible in one run
  • <1 minute total time

Medium databases (1,000-10,000 issues):

  • Batch processing required
  • Progress reporting essential
  • 5-10 minutes total time

Large databases (>10,000 issues):

  • Incremental compaction (process N per run)
  • Consider scheduled background job
  • 30-60 minutes total time

Optimization Strategies

  1. Index Usage:

    • idx_issues_status - filter closed issues
    • idx_dependencies_depends_on - dependency traversal
    • idx_snapshots_issue - restore lookups
  2. Batch Sizing:

    • Default 50 issues per batch
    • Configurable via compact_batch_size
    • Trade-off: larger batches = fewer commits, more RAM
  3. Parallel Workers:

    • Default 5 parallel Haiku calls
    • Configurable via compact_parallel_workers
    • Respects Haiku rate limits
  4. Query Optimization:

    • Use prepared statements for snapshots
    • Reuse dependency check query
    • Avoid N+1 queries in batch operations

Cost Analysis

Haiku Pricing (as of 2025-10-15)

  • Input: $0.25 per million tokens ($0.00025 per 1K tokens)
  • Output: $1.25 per million tokens ($0.00125 per 1K tokens)

Per-Issue Estimates

Tier 1:

  • Input: ~1,000 tokens (full issue content)
  • Output: ~400 tokens (summary)
  • Cost: ~$0.0008 per issue

Tier 2:

  • Input: ~500 tokens (Tier 1 summary)
  • Output: ~200 tokens (ultra-compressed)
  • Cost: ~$0.0003 per issue
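These per-issue figures follow directly from the token prices above. A sketch of the estimator the stats command could use (names illustrative):

```go
package main

import "fmt"

// Haiku prices in dollars per million tokens, from the table above.
const (
	inputPerMTok  = 0.25
	outputPerMTok = 1.25
)

// estimateCost returns the estimated API cost in dollars for one
// summarization call.
func estimateCost(inputTokens, outputTokens int) float64 {
	return float64(inputTokens)*inputPerMTok/1e6 +
		float64(outputTokens)*outputPerMTok/1e6
}

func main() {
	fmt.Printf("Tier 1: $%.4f\n", estimateCost(1000, 400)) // ~$0.0008
	fmt.Printf("Tier 2: $%.4f\n", estimateCost(500, 200))
}
```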

Batch Costs

Issues   Tier 1 Cost   Tier 2 Cost   Total
100      $0.08         $0.03         $0.11
500      $0.40         $0.15         $0.55
1,000    $0.80         $0.30         $1.10
5,000    $4.00         $1.50         $5.50

Monthly budget (typical project):

  • ~50-100 new issues closed per month
  • ~30-60 days later, eligible for Tier 1
  • Monthly cost: $0.04 - $0.08 (negligible)

Configuration Examples

Conservative Setup (manual only)

bd init
bd config set compact_tier1_days 60
bd config set compact_tier2_days 180
bd config set auto_compact_enabled false

# Run manually when needed
bd compact --dry-run
bd compact --all  # after review

Aggressive Setup (auto-compact)

bd config set compact_tier1_days 14
bd config set compact_tier2_days 45
bd config set auto_compact_enabled true
bd config set compact_batch_size 100

# Auto-compacts on bd stats, bd export
bd stats  # triggers compaction if candidates exist

Development Setup (fast feedback)

bd config set compact_tier1_days 1
bd config set compact_tier2_days 3
bd config set compact_tier1_dep_levels 1
bd config set compact_tier2_dep_levels 2

# Test compaction on recently closed issues

Future Enhancements

Phase 2 (Post-MVP)

  1. Local Model Support:

    • Use Ollama for zero-cost summarization
    • Fallback chain: Haiku → Ollama → truncation
  2. Custom Prompts:

    • User-defined summarization prompts
    • Per-project templates
    • Domain-specific summaries (e.g., "focus on API changes")
  3. Selective Preservation:

    • Mark issues as "do not compact"
    • Preserve certain labels (e.g., architecture, security)
    • Field-level preservation (e.g., keep design notes, compress others)
  4. Analytics:

    • Compaction effectiveness over time
    • Cost tracking per run
    • Quality feedback (user ratings of summaries)
  5. Smart Scheduling:

    • Auto-detect optimal compaction times
    • Avoid compaction during active development
    • Weekend/off-hours processing
  6. Multi-Tier Expansion:

    • Tier 3: Archive to separate file
    • Tier 4: Delete (with backup)
    • Configurable tier chain

Phase 3 (Advanced)

  1. Distributed Compaction:

    • Coordinate across multiple machines
    • Avoid duplicate work in team settings
    • Lock mechanism for compaction jobs
  2. Incremental Summarization:

    • Re-summarize if issue reopened
    • Preserve history of summaries
    • Version tracking for prompts
  3. Search Integration:

    • Full-text search includes summaries
    • Boost compacted issues in search results
    • Semantic search using embeddings

Implementation Issues

The following issues should be created in beads to track this work. They're designed to be implemented in dependency order.


Epic: Issue Database Compaction

Issue ID: (auto-generated)
Title: Epic: Add intelligent database compaction with Claude Haiku
Type: epic
Priority: 2
Description:

Implement multi-tier database compaction using Claude Haiku to semantically compress old, closed issues. This keeps the database lightweight and agent-friendly while preserving essential context.

Goals:

  • 70-95% space reduction for eligible issues
  • Full restore capability via snapshots
  • Opt-in with dry-run safety
  • ~$1 per 1,000 issues compacted

Acceptance Criteria:

  • Schema migration with snapshots table
  • Haiku integration for summarization
  • Two-tier compaction (30d, 90d)
  • CLI with dry-run, restore, stats
  • Full test coverage
  • Documentation complete

Dependencies: None (this is the epic)


Issue 1: Add compaction schema and migrations

Type: task
Priority: 1
Dependencies: Blocks all other compaction work

Description:

Add database schema support for issue compaction tracking and snapshot storage.

Design:

Add three columns to issues table:

  • compaction_level INTEGER DEFAULT 0 - 0=original, 1=tier1, 2=tier2
  • compacted_at DATETIME - when last compacted
  • original_size INTEGER - bytes before first compaction

Create issue_snapshots table:

CREATE TABLE issue_snapshots (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    issue_id TEXT NOT NULL,
    snapshot_time DATETIME NOT NULL,
    compaction_level INTEGER NOT NULL,
    original_size INTEGER NOT NULL,
    compressed_size INTEGER NOT NULL,
    original_content TEXT NOT NULL,  -- JSON blob
    archived_events TEXT,
    FOREIGN KEY (issue_id) REFERENCES issues(id) ON DELETE CASCADE
);

Add indexes:

  • idx_snapshots_issue on issue_id
  • idx_snapshots_level on compaction_level

Add migration functions in internal/storage/sqlite/sqlite.go:

  • migrateCompactionColumns(db *sql.DB) error
  • migrateSnapshotsTable(db *sql.DB) error

Acceptance Criteria:

  • Existing databases migrate automatically
  • New databases include columns by default
  • Migration is idempotent (safe to run multiple times)
  • No data loss during migration
  • Tests verify migration on fresh and existing DBs

Estimated Time: 4 hours


Issue 2: Add compaction configuration keys

Type: task
Priority: 1
Dependencies: Blocks compaction logic

Description:

Add configuration keys for compaction behavior with sensible defaults.

Implementation:

Add to internal/storage/sqlite/schema.go initial config:

INSERT OR IGNORE INTO config (key, value) VALUES
    ('compact_tier1_days', '30'),
    ('compact_tier1_dep_levels', '2'),
    ('compact_tier2_days', '90'),
    ('compact_tier2_dep_levels', '5'),
    ('compact_tier2_commits', '100'),
    ('compact_model', 'claude-3-5-haiku-20241022'),
    ('compact_batch_size', '50'),
    ('compact_parallel_workers', '5'),
    ('auto_compact_enabled', 'false');

Add helper functions in internal/storage/ or cmd/bd/:

func getCompactConfig(ctx context.Context, store Storage) (*CompactConfig, error)

Acceptance Criteria:

  • Config keys created on init
  • Existing DBs get defaults on migration
  • bd config get/set works with all keys
  • Type validation (days=int, enabled=bool)
  • Documentation in README.md

Estimated Time: 2 hours


Issue 3: Implement candidate identification queries

Type: task
Priority: 1
Dependencies: Needs schema and config

Description:

Write SQL queries to identify issues eligible for Tier 1 and Tier 2 compaction.

Design:

Create internal/storage/sqlite/compact.go with:

type CompactionCandidate struct {
    IssueID        string
    ClosedAt       time.Time
    OriginalSize   int
    EstimatedSize  int
    DependentCount int
}

func (s *SQLiteStorage) GetTier1Candidates(ctx context.Context) ([]*CompactionCandidate, error)
func (s *SQLiteStorage) GetTier2Candidates(ctx context.Context) ([]*CompactionCandidate, error)
func (s *SQLiteStorage) CheckEligibility(ctx context.Context, issueID string, tier int) (bool, string, error)

Implement recursive dependency checking using CTE similar to ready_issues view.

Acceptance Criteria:

  • Tier 1 query filters by days and dependency depth
  • Tier 2 query includes commit/issue count checks
  • Dependency checking handles circular deps gracefully
  • Performance: <100ms for 10,000 issue database
  • Tests cover edge cases (no deps, circular deps, mixed status)

Estimated Time: 6 hours


Issue 4: Create Haiku client and prompt templates

Type: task
Priority: 1
Dependencies: None (can work in parallel)

Description:

Implement Claude Haiku API client with template-based prompts for Tier 1 and Tier 2 summarization.

Implementation:

Create internal/compact/haiku.go:

type HaikuClient struct {
    client *anthropic.Client
    model  string
}

func NewHaikuClient(apiKey string) (*HaikuClient, error)
func (h *HaikuClient) SummarizeTier1(ctx context.Context, issue *types.Issue) (string, error)
func (h *HaikuClient) SummarizeTier2(ctx context.Context, issue *types.Issue) (string, error)

Use text/template for prompt rendering.

Add error handling:

  • Rate limit retry with exponential backoff
  • Network errors: 3 retries
  • Invalid responses: return error, don't corrupt data

Acceptance Criteria:

  • API key from env var or config (env takes precedence)
  • Prompts render correctly with template
  • Rate limiting handled gracefully
  • Mock tests for API calls
  • Real integration test (optional, requires API key)

Estimated Time: 6 hours


Issue 5: Implement snapshot creation and restoration

Type: task
Priority: 1
Dependencies: Needs schema changes

Description:

Implement snapshot creation before compaction and restoration capability.

Implementation:

Add to internal/storage/sqlite/compact.go:

type Snapshot struct {
    ID              int64
    IssueID         string
    SnapshotTime    time.Time
    CompactionLevel int
    OriginalSize    int
    CompressedSize  int
    OriginalContent string  // JSON
    ArchivedEvents  string  // JSON, nullable
}

func (s *SQLiteStorage) CreateSnapshot(ctx context.Context, issue *types.Issue, level int) error
func (s *SQLiteStorage) RestoreFromSnapshot(ctx context.Context, issueID string, level int) error
func (s *SQLiteStorage) GetSnapshots(ctx context.Context, issueID string) ([]*Snapshot, error)

Snapshot JSON structure:

{
  "description": "...",
  "design": "...",
  "notes": "...",
  "acceptance_criteria": "...",
  "title": "..."
}

Acceptance Criteria:

  • Snapshot created atomically with compaction
  • Restore returns exact original content
  • Multiple snapshots per issue supported (Tier 1 → Tier 2)
  • JSON encoding handles special characters
  • Size calculation is accurate (UTF-8 bytes)

Estimated Time: 5 hours


Issue 6: Implement Tier 1 compaction logic

Type: task
Priority: 1
Dependencies: Needs Haiku client, snapshots, candidate queries

Description:

Implement the core Tier 1 compaction process: snapshot → summarize → update.

Implementation:

Add to internal/compact/compactor.go:

type Compactor struct {
    store  storage.Storage
    haiku  *HaikuClient
    config *CompactConfig
}

func New(store storage.Storage, apiKey string, config *CompactConfig) (*Compactor, error)
func (c *Compactor) CompactTier1(ctx context.Context, issueID string) error
func (c *Compactor) CompactTier1Batch(ctx context.Context, issueIDs []string) error

Process:

  1. Verify eligibility
  2. Calculate original size
  3. Create snapshot
  4. Call Haiku for summary
  5. Update issue (description = summary, clear design/notes/criteria)
  6. Set compaction_level = 1, compacted_at = now
  7. Record EventCompacted
  8. Mark dirty for export

Acceptance Criteria:

  • Single issue compaction works end-to-end
  • Batch processing with parallel workers
  • Errors don't corrupt database (transaction rollback)
  • EventCompacted includes size savings
  • Dry-run mode (identify + size estimate only)

Estimated Time: 8 hours


Issue 7: Implement Tier 2 compaction logic

Type: task
Priority: 2
Dependencies: Needs Tier 1 working

Description:

Implement Tier 2 ultra-compression: more aggressive summarization and optional event pruning.

Implementation:

Add to internal/compact/compactor.go:

func (c *Compactor) CompactTier2(ctx context.Context, issueID string) error
func (c *Compactor) CompactTier2Batch(ctx context.Context, issueIDs []string) error

Process:

  1. Verify issue is at compaction_level = 1
  2. Check Tier 2 eligibility (days, deps, commits/issues)
  3. Create Tier 2 snapshot
  4. Call Haiku with ultra-compression prompt
  5. Update issue (description = single paragraph, clear all else)
  6. Set compaction_level = 2
  7. Optionally prune events (keep created/closed, archive rest)

Acceptance Criteria:

  • Requires existing Tier 1 compaction
  • Git commit counting works (with fallback)
  • Events optionally pruned (config: compact_events_enabled)
  • Archived events stored in snapshot
  • Size reduction 90-95%

Estimated Time: 6 hours


Issue 8: Add bd compact CLI command

Type: task
Priority: 1
Dependencies: Needs Tier 1 compaction logic

Description:

Implement the bd compact command with dry-run, batch processing, and progress reporting.

Implementation:

Create cmd/bd/compact.go:

var compactCmd = &cobra.Command{
    Use:   "compact",
    Short: "Compact old closed issues to save space",
    Long:  `...`,
}

var (
    compactDryRun    bool
    compactTier      int
    compactAll       bool
    compactID        string
    compactForce     bool
    compactBatchSize int
    compactWorkers   int
)

func init() {
    compactCmd.Flags().BoolVar(&compactDryRun, "dry-run", false, "Preview without compacting")
    compactCmd.Flags().IntVar(&compactTier, "tier", 1, "Compaction tier (1 or 2)")
    compactCmd.Flags().BoolVar(&compactAll, "all", false, "Process all candidates")
    compactCmd.Flags().StringVar(&compactID, "id", "", "Compact specific issue")
    compactCmd.Flags().BoolVar(&compactForce, "force", false, "Force compact (bypass checks)")
    // ... more flags
}

Output:

  • Dry-run: Table of candidates with size estimates
  • Actual run: Progress bar with batch updates
  • Summary: Count, size saved, cost, time

Acceptance Criteria:

  • --dry-run shows accurate preview
  • --all processes all candidates
  • --id compacts single issue
  • --force bypasses eligibility checks (with --id)
  • Progress bar for batches
  • JSON output with --json
  • Exit code: 0=success, 1=error

Estimated Time: 6 hours


Issue 9: Add bd compact --restore functionality

Type: task Priority: 2 Dependencies: Needs snapshots and CLI

Description:

Implement restore command to undo compaction from snapshots.

Implementation:

Add to cmd/bd/compact.go:

var compactRestore string

compactCmd.Flags().StringVar(&compactRestore, "restore", "", "Restore issue from snapshot")

Process:

  1. Load snapshot for issue
  2. Parse JSON content
  3. Update issue with original content
  4. Set compaction_level = 0, compacted_at = NULL
  5. Record EventRestored
  6. Mark dirty
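The core guarantee of steps 1-3 is that parsing the snapshot's JSON reproduces the original fields exactly. A minimal, self-contained sketch of that round trip (the struct name and field set are illustrative, not the real snapshot schema):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// snapshotContent sketches the text fields a snapshot preserves.
type snapshotContent struct {
	Description        string `json:"description"`
	Design             string `json:"design"`
	Notes              string `json:"notes"`
	AcceptanceCriteria string `json:"acceptance_criteria"`
}

// roundTrip marshals the original fields to the snapshot's JSON form and
// parses them back, as the restore path would before updating the issue.
func roundTrip(orig snapshotContent) snapshotContent {
	blob, err := json.Marshal(orig)
	if err != nil {
		panic(err)
	}
	var restored snapshotContent
	if err := json.Unmarshal(blob, &restored); err != nil {
		panic(err)
	}
	return restored
}

func main() {
	orig := snapshotContent{Description: "Fix authentication bug", Design: "Use refresh tokens"}
	fmt.Println(roundTrip(orig) == orig) // restore must be byte-exact
}
```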

Acceptance Criteria:

  • Restores exact original content
  • Handles multiple snapshots (prompt user or use latest)
  • --level flag to choose snapshot
  • Updates compaction_level correctly
  • Exports restored content to JSONL

Estimated Time: 4 hours


Issue 10: Add bd compact --stats command

Type: task Priority: 2 Dependencies: Needs compaction working

Description:

Add statistics command showing compaction status and potential savings.

Implementation:

var compactStats bool

compactCmd.Flags().BoolVar(&compactStats, "stats", false, "Show compaction statistics")

Output:

  • Total issues, by compaction level
  • Current DB size vs estimated uncompacted size
  • Space savings (MB and %)
  • Candidates for each tier with estimates
  • Estimated API cost
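The savings and cost arithmetic behind this output can be sketched as follows. `estimate` is a hypothetical helper, and the per-issue cost uses the ~$1.10 per 1,000 issues figure assumed in this design's key metrics.

```go
package main

import "fmt"

// estimate derives the --stats figures: the savings percentage relative to
// the estimated uncompacted size, and the projected Haiku cost for the
// remaining candidates at an assumed $1.10 per 1,000 issues.
func estimate(currentBytes, savedBytes int64, candidates int) (pct float64, costUSD float64) {
	uncompacted := currentBytes + savedBytes
	pct = float64(savedBytes) / float64(uncompacted) * 100
	costUSD = float64(candidates) * 1.10 / 1000
	return
}

func main() {
	pct, cost := estimate(25_000, 102_000, 42)
	fmt.Printf("saved %.0f%%, est. cost $%.2f\n", pct, cost)
}
```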

Acceptance Criteria:

  • Accurate counts by compaction_level
  • Size calculations include all text fields
  • Shows candidates with eligibility reasons
  • Cost estimation based on Haiku pricing
  • JSON output supported

Estimated Time: 4 hours


Issue 11: Add EventCompacted to event system

Type: task Priority: 2 Dependencies: Needs schema changes

Description:

Add new event type for tracking compaction in audit trail.

Implementation:

  1. Add to internal/types/types.go:

const EventCompacted EventType = "compacted"

  2. Record event during compaction:

eventData := map[string]interface{}{
    "tier": tier,
    "original_size": originalSize,
    "compressed_size": compressedSize,
    "reduction_pct": (1 - float64(compressedSize)/float64(originalSize)) * 100,
}

  3. Show in bd show output:

Events:
  2025-10-15: compacted (tier 1, saved 1.8KB, 80%)
  2025-08-31: closed by alice
  2025-08-20: created by alice

Acceptance Criteria:

  • Event includes tier and size info
  • Shows in event history
  • Exports to JSONL
  • bd show displays compaction marker

Estimated Time: 3 hours


Issue 12: Add compaction indicator to bd show

Type: task Priority: 2 Dependencies: Needs compaction working

Description:

Update bd show command to display compaction status prominently.

Implementation:

Add to issue display:

bd-42: Fix authentication bug [CLOSED] 🗜️

Status: closed (compacted L1)
...

---
💾 Restore: bd compact --restore bd-42
📊 Original: 2,341 bytes | Compressed: 468 bytes (80% reduction)
🗜️ Compacted: 2025-10-15 (Tier 1)

Show different emoji for tiers:

  • Tier 1: 🗜️
  • Tier 2: 📦

Acceptance Criteria:

  • Compaction status visible in title
  • Footer shows size savings
  • Restore command shown
  • Works with --json output

Estimated Time: 2 hours


Issue 13: Write compaction tests

Type: task Priority: 1 Dependencies: Needs all compaction logic

Description:

Comprehensive test suite for compaction functionality.

Test Coverage:

  1. Candidate Identification:

    • Eligibility by time
    • Dependency depth checking
    • Mixed status dependents
    • Edge cases (no deps, circular)
  2. Snapshots:

    • Create and restore
    • Multiple snapshots per issue
    • Content integrity
  3. Tier 1 Compaction:

    • Single issue
    • Batch processing
    • Error handling
  4. Tier 2 Compaction:

    • Requires Tier 1
    • Events pruning
    • Commit counting fallback
  5. CLI:

    • All flag combinations
    • Dry-run accuracy
    • JSON output
  6. Integration:

    • End-to-end flow
    • JSONL export/import
    • Restore verification

Acceptance Criteria:

  • Test coverage >80%
  • All edge cases covered
  • Mock Haiku API in tests
  • Integration tests pass
  • go test ./... passes
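Mocking the Haiku API can be done with a throwaway HTTP server. A minimal sketch, assuming the compaction client accepts a configurable base URL (the handler, function name, and response shape are illustrative, not the real Anthropic API schema):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchMockSummary stands up a local server that plays the role of the
// Haiku endpoint and returns its canned response. In the real tests the
// compaction client's base URL would be pointed at srv.URL instead of
// the production endpoint.
func fetchMockSummary() string {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `{"summary":"Fixed auth bug by rotating refresh tokens."}`)
	}))
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	return string(body)
}

func main() {
	fmt.Println(fetchMockSummary())
}
```

This keeps the test suite hermetic: no API key, no network, and deterministic summaries for asserting Tier 1 output.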

Estimated Time: 8 hours


Issue 14: Add compaction documentation

Type: task Priority: 2 Dependencies: All features complete

Description:

Document compaction feature in README and create COMPACTION.md guide.

Content:

Update README.md:

  • Add to Features section
  • CLI examples
  • Configuration guide

Create COMPACTION.md:

  • How compaction works
  • When to use each tier
  • Cost analysis
  • Safety mechanisms
  • Troubleshooting

Create examples/compaction/:

  • Example workflow
  • Cron job setup
  • Auto-compaction script

Acceptance Criteria:

  • README.md updated
  • COMPACTION.md comprehensive
  • Examples work as documented
  • Screenshots/examples included
  • API key setup documented

Estimated Time: 4 hours


Issue 15: Optional: Implement auto-compaction

Type: task Priority: 3 (nice-to-have) Dependencies: Needs all compaction working

Description:

Implement automatic compaction triggered by certain operations when enabled via config.

Implementation:

Trigger points (when auto_compact_enabled = true):

  1. bd stats - check and compact if candidates exist
  2. bd export - before exporting
  3. Background timer (optional, via daemon)

Add:

func (s *SQLiteStorage) AutoCompact(ctx context.Context) error {
    enabled, _ := s.GetConfig(ctx, "auto_compact_enabled")
    if enabled != "true" {
        return nil
    }

    // Run Tier 1 compaction on all candidates,
    // limited to batch_size to avoid long operations.
    return nil // TODO: invoke Tier 1 batch compaction here
}

Acceptance Criteria:

  • Respects auto_compact_enabled config
  • Limits batch size to avoid blocking
  • Logs compaction activity
  • Can be disabled per-command with --no-auto-compact

Estimated Time: 4 hours


Issue 16: Optional: Add git commit counting

Type: task Priority: 3 (nice-to-have) Dependencies: Needs Tier 2 logic

Description:

Implement git commit counting for "project time" measurement as alternative to calendar time.

Implementation:

func getCommitsSince(closedAt time.Time) (int, error) {
    cmd := exec.Command("git", "rev-list", "--count",
        fmt.Sprintf("--since=%s", closedAt.Format(time.RFC3339)), "HEAD")
    output, err := cmd.Output()
    if err != nil {
        return 0, err  // Not in git repo or git not available
    }
    return strconv.Atoi(strings.TrimSpace(string(output)))
}

Fallback to issue counter delta if git unavailable.
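The fallback can be sketched as a pure helper: "project time" becomes the difference between the highest issue number now and the highest issue number when the candidate was closed. The names issueDelta and issueNumber are hypothetical, and the prefix-N ID format follows the bd-42 style used in this document.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// issueNumber extracts the numeric suffix from an ID like "bd-42".
func issueNumber(id string) (int, error) {
	i := strings.LastIndex(id, "-")
	if i < 0 {
		return 0, fmt.Errorf("malformed issue ID %q", id)
	}
	return strconv.Atoi(id[i+1:])
}

// issueDelta measures project activity since close as the number of
// issues created between idAtClose and latestID.
func issueDelta(latestID, idAtClose string) (int, error) {
	n1, err := issueNumber(latestID)
	if err != nil {
		return 0, err
	}
	n0, err := issueNumber(idAtClose)
	if err != nil {
		return 0, err
	}
	return n1 - n0, nil
}

func main() {
	delta, _ := issueDelta("bd-187", "bd-42")
	fmt.Println(delta) // 145 issues created since close
}
```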

Acceptance Criteria:

  • Counts commits since closed_at
  • Handles git not available gracefully
  • Fallback to issue counter works
  • Configurable via compact_tier2_commits
  • Tested with real git repo

Estimated Time: 3 hours


Success Metrics

Technical Metrics

  • 70-85% size reduction for Tier 1
  • 90-95% size reduction for Tier 2
  • <100ms candidate identification query
  • <2s per issue compaction (Haiku latency)
  • Zero data loss (all restore tests pass)

Quality Metrics

  • Haiku summaries preserve key information
  • Developers can understand compacted issues
  • Restore returns exact original content
  • No corruption in multi-machine workflows

Operational Metrics

  • Cost: <$1.50 per 1,000 issues
  • Dry-run accuracy: 95%+ estimate correctness
  • Error rate: <1% API failures (with retry)
  • User adoption: Docs clear, examples work

Rollout Plan

Phase 1: Alpha (Internal Testing)

  1. Merge compaction feature to main
  2. Test on beads' own database
  3. Verify JSONL export/import
  4. Validate Haiku summaries
  5. Fix any critical bugs

Phase 2: Beta (Opt-In)

  1. Announce in README (opt-in, experimental)
  2. Gather feedback from early adopters
  3. Iterate on prompt templates
  4. Add telemetry (optional, with consent)

Phase 3: Stable (Default Disabled)

  1. Mark feature as stable
  2. Keep auto_compact_enabled = false by default
  3. Encourage manual bd compact --dry-run first
  4. Document in quickstart guide

Phase 4: Mature (Consider Auto-Enable)

  1. After 6+ months of stability
  2. Consider auto-compaction for new users
  3. Provide migration guide for disabling

Risks and Mitigations

| Risk | Impact | Likelihood | Mitigation |
|------|--------|------------|------------|
| Haiku summaries lose critical info | High | Medium | Manual review in dry-run, restore capability, improve prompts |
| API rate limits during batch | Medium | Medium | Exponential backoff, respect rate limits, batch sizing |
| JSONL merge conflicts increase | Medium | Low | Compaction is deterministic per issue; git handles it well |
| Users accidentally compact important issues | High | Low | Dry-run required, restore available, snapshots permanent |
| Cost higher than expected | Low | Low | Dry-run shows estimates, configurable batch sizes |
| Schema migration fails | High | Very Low | Idempotent migrations, tested on existing DBs |

Open Questions

  1. Should compaction be reversible forever, or expire snapshots?

    • Recommendation: Keep snapshots indefinitely (disk is cheap)
  2. Should we compress snapshots themselves (gzip)?

    • Recommendation: Not in MVP, add if storage becomes issue
  3. Should tier selection be automatic or manual?

    • Recommendation: Manual in MVP, auto-tier in future
  4. How to handle issues compacted on one machine but not another?

    • Answer: JSONL export includes compaction_level, imports preserve it
  5. Should we support custom models (Sonnet, Opus)?

    • Recommendation: Haiku only in MVP, add later if needed

Appendix: Example Workflow

Typical Monthly Compaction

# 1. Check what's eligible
$ bd compact --dry-run
=== Tier 1 Candidates ===
42 issues eligible (closed >30 days, deps closed)
Est. reduction: 127 KB → 25 KB (80%)
Est. cost: $0.03

# 2. Review candidates manually
$ bd list --status closed --json | jq 'map(select(.compaction_level == 0))'

# 3. Compact Tier 1
$ bd compact --all
✓ Compacted 42 issues in 18s ($0.03)

# 4. Check Tier 2 candidates (optional)
$ bd compact --dry-run --tier 2
=== Tier 2 Candidates ===
8 issues eligible (closed >90 days, 100+ commits since)
Est. reduction: 45 KB → 4 KB (91%)
Est. cost: $0.01

# 5. Compact Tier 2
$ bd compact --all --tier 2
✓ Compacted 8 issues in 6s ($0.01)

# 6. Export and commit
$ bd export -o .beads/issues.jsonl
$ git add .beads/issues.jsonl
$ git commit -m "Compact 50 old issues (saved 143 KB)"
$ git push

# 7. View stats
$ bd compact --stats
Total Space Saved: 143 KB (82% reduction)
Database Size: 2.1 MB (down from 2.3 MB)

Conclusion

This design provides a comprehensive, safe, and cost-effective way to keep beads databases lightweight while preserving essential context. The two-tier approach balances aggressiveness with safety, and the snapshot system ensures full reversibility.

The use of Claude Haiku for semantic compression is the key idea: it preserves meaning rather than merely truncating text. At ~$1 per 1,000 issues, the cost is negligible for the value provided.

Implementation is straightforward with clear phases and well-defined issues. The MVP (Tier 1 only) can be delivered in ~40 hours of work, with Tier 2 and enhancements following incrementally.

This aligns perfectly with beads' philosophy: pragmatic, agent-focused, and evolutionarily designed.