# Issue Database Compaction Design
**Status:** Design Phase
**Created:** 2025-10-15
**Target:** Beads v1.1
## Executive Summary
Add intelligent database compaction to beads that uses Claude Haiku to semantically compress old, closed issues. This keeps the database lightweight and agent-friendly while preserving essential context about past work. The design philosophy: **most work is throwaway, and forensic value decays exponentially with time**.
### Key Metrics
- **Space savings:** 70-95% reduction in text volume for old issues
- **Cost:** ~$1.10 per 1,000 issues compacted (Haiku pricing)
- **Safety:** Full snapshot system with restore capability
- **Performance:** Batch processing with parallel workers
---
## Motivation
### The Problem
Beads databases grow indefinitely:
- Issues accumulate detailed `description`, `design`, `notes`, `acceptance_criteria` fields
- Events table logs every change forever
- Old closed issues (especially those with all dependents closed) rarely need full detail
- Agent context windows work better with concise, relevant information
### Why This Matters
1. **Agent Efficiency:** Smaller databases → faster queries → clearer agent thinking
2. **Context Management:** Agents benefit from summaries of old work, not verbose details
3. **Git Performance:** Smaller JSONL exports → faster git operations
4. **Pragmatic Philosophy:** Beads is agent memory, not a historical archive
5. **Forensic Decay:** Need for detail decreases exponentially after closure
### What We Keep
- Issue ID and title (always)
- Semantic summary of what was done and why
- Key architectural decisions
- Closure outcome
- Full git history in JSONL commits (ultimate backup)
- Restore capability via snapshots
---
## Technical Design
### Architecture Overview
```
┌────────────────────────────────────────────────────────────────┐
│                      Compaction Pipeline                       │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  1. Candidate Identification                                   │
│     ↓                                                          │
│     • Query closed issues meeting time + dependency criteria   │
│     • Check dependency depth (recursive CTE)                   │
│     • Calculate size/savings estimates                         │
│                                                                │
│  2. Snapshot Creation                                          │
│     ↓                                                          │
│     • Store original content in issue_snapshots table          │
│     • Calculate content hash for verification                  │
│     • Enable restore capability                                │
│                                                                │
│  3. Haiku Summarization                                        │
│     ↓                                                          │
│     • Batch process with worker pool (5 parallel)              │
│     • Different prompts for Tier 1 vs Tier 2                   │
│     • Handle API errors gracefully                             │
│                                                                │
│  4. Issue Update                                               │
│     ↓                                                          │
│     • Replace verbose fields with summary                      │
│     • Set compaction_level and compacted_at                    │
│     • Record event (EventCompacted)                            │
│     • Mark dirty for JSONL export                              │
│                                                                │
│  5. Optional: Events Pruning                                   │
│     ↓                                                          │
│     • Keep only created/closed events for Tier 2               │
│     • Archive detailed event history to snapshots              │
│                                                                │
└────────────────────────────────────────────────────────────────┘
```
### Compaction Tiers
#### **Tier 1: Standard Compaction**
**Eligibility:**
- `status = 'closed'`
- Closed at least 30 days ago, i.e. `closed_at <= datetime('now', '-30 days')` (configurable: `compact_tier1_days`)
- All issues that depend on this one (via `blocks` or `parent-child`) are closed
- Dependency check depth: 2 levels (configurable: `compact_tier1_dep_levels`)
**Process:**
1. Snapshot original content
2. Send to Haiku with 300-word summarization prompt
3. Store summary in `description`
4. Clear `design`, `notes`, `acceptance_criteria`
5. Set `compaction_level = 1`
6. Keep all events
**Output Format (Haiku prompt):**
```
**Summary:** [2-3 sentences: problem, solution, outcome]
**Key Decisions:** [bullet points of non-obvious choices]
**Resolution:** [how it was closed]
```
**Expected Reduction:** 70-85% of original text size
#### **Tier 2: Aggressive Compaction**
**Eligibility:**
- Already at `compaction_level = 1`
- Closed at least 90 days ago, i.e. `closed_at <= datetime('now', '-90 days')` (configurable: `compact_tier2_days`)
- All dependencies (all 4 types) up to 5 levels deep are closed
- One of:
- ≥100 git commits since `closed_at` (configurable: `compact_tier2_commits`)
- ≥500 new issues created since closure (configurable: `compact_tier2_new_issues`)
- Manual override with `--force`
**Process:**
1. Snapshot Tier 1 content
2. Send to Haiku with 150-word ultra-compression prompt
3. Store single paragraph in `description`
4. Clear all other text fields
5. Set `compaction_level = 2`
6. Prune events: keep only `created` and `closed`, move rest to snapshot
**Output Format (Haiku prompt):**
```
Single paragraph (≤150 words):
- What was built/fixed
- Why it mattered
- Lasting architectural impact (if any)
```
**Expected Reduction:** 90-95% of original text size
---
### Schema Changes
#### New Columns on `issues` Table
```sql
ALTER TABLE issues ADD COLUMN compaction_level INTEGER DEFAULT 0;
ALTER TABLE issues ADD COLUMN compacted_at DATETIME;
ALTER TABLE issues ADD COLUMN original_size INTEGER; -- bytes before compaction
```
#### New Table: `issue_snapshots`
```sql
CREATE TABLE IF NOT EXISTS issue_snapshots (
  id               INTEGER PRIMARY KEY AUTOINCREMENT,
  issue_id         TEXT NOT NULL,
  snapshot_time    DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  compaction_level INTEGER NOT NULL, -- 1 or 2
  original_size    INTEGER NOT NULL,
  compressed_size  INTEGER NOT NULL,
  -- JSON blob with original content
  original_content TEXT NOT NULL,
  -- Optional: compressed events for Tier 2
  archived_events  TEXT,
  FOREIGN KEY (issue_id) REFERENCES issues(id) ON DELETE CASCADE
);

CREATE INDEX IF NOT EXISTS idx_snapshots_issue ON issue_snapshots(issue_id);
CREATE INDEX IF NOT EXISTS idx_snapshots_level ON issue_snapshots(compaction_level);
```
**Snapshot JSON Structure:**
```json
{
  "description": "original description text...",
  "design": "original design text...",
  "notes": "original notes text...",
  "acceptance_criteria": "original criteria text...",
  "title": "original title",
  "hash": "sha256:abc123..."
}
```
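Step 2 of the pipeline calculates a content hash before compaction so a restore can be verified against the snapshot. A minimal sketch of how that hash could be computed over the JSON structure above; the Go struct and function names here are illustrative, not the final API:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// snapshotContent mirrors the snapshot JSON structure above.
type snapshotContent struct {
	Description        string `json:"description"`
	Design             string `json:"design"`
	Notes              string `json:"notes"`
	AcceptanceCriteria string `json:"acceptance_criteria"`
	Title              string `json:"title"`
}

// contentHash returns a "sha256:..." digest of the JSON encoding.
// Struct marshaling gives a deterministic field order, so the hash
// is stable across runs and can verify snapshot integrity on restore.
func contentHash(c snapshotContent) (string, error) {
	data, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(data)
	return "sha256:" + hex.EncodeToString(sum[:]), nil
}

func main() {
	h, _ := contentHash(snapshotContent{Title: "Fix authentication bug"})
	fmt.Println(h[:7]) // prints "sha256:"
}
```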
#### New Config Keys
```sql
INSERT OR IGNORE INTO config (key, value) VALUES
  -- Tier 1 settings
  ('compact_tier1_days', '30'),
  ('compact_tier1_dep_levels', '2'),
  -- Tier 2 settings
  ('compact_tier2_days', '90'),
  ('compact_tier2_dep_levels', '5'),
  ('compact_tier2_commits', '100'),
  ('compact_tier2_new_issues', '500'),
  -- API settings
  ('anthropic_api_key', ''), -- Falls back to ANTHROPIC_API_KEY env var
  ('compact_model', 'claude-3-5-haiku-20241022'),
  -- Performance settings
  ('compact_batch_size', '50'),
  ('compact_parallel_workers', '5'),
  -- Safety settings
  ('auto_compact_enabled', 'false'), -- Opt-in
  ('compact_events_enabled', 'false'), -- Events pruning (Tier 2)
  -- Display settings
  ('compact_show_savings', 'true'); -- Show size reduction in output
```
#### New Event Type
```go
const (
	// ... existing event types ...
	EventCompacted EventType = "compacted"
)
```
---
### Haiku Integration
#### Prompt Templates
**Tier 1 Prompt:**
```
Summarize this closed software issue. Preserve key decisions, implementation approach, and outcome. Max 300 words.
Title: {{.Title}}
Type: {{.IssueType}}
Priority: {{.Priority}}
Description:
{{.Description}}
Design Notes:
{{.Design}}
Implementation Notes:
{{.Notes}}
Acceptance Criteria:
{{.AcceptanceCriteria}}
Output format:
**Summary:** [2-3 sentences: what problem, what solution, what outcome]
**Key Decisions:** [bullet points of non-obvious choices]
**Resolution:** [how it was closed]
```
**Tier 2 Prompt:**
```
Ultra-compress this old closed issue to ≤150 words. Focus on lasting architectural impact.
Title: {{.Title}}
Original Summary (already compressed):
{{.Description}}
Output a single paragraph covering:
- What was built/fixed
- Why it mattered
- Lasting impact (if any)
If there's no lasting impact, just state what was done and that it's resolved.
```
#### API Client Structure
```go
package compact

import (
	"context"

	"github.com/anthropics/anthropic-sdk-go"
)

type HaikuClient struct {
	client *anthropic.Client
	model  string
}

func NewHaikuClient(apiKey string) *HaikuClient {
	return &HaikuClient{
		client: anthropic.NewClient(apiKey),
		model:  anthropic.ModelClaude_3_5_Haiku_20241022,
	}
}

func (h *HaikuClient) Summarize(ctx context.Context, prompt string, maxTokens int) (string, error)
```
#### Error Handling
- **Rate limits:** Exponential backoff with jitter
- **API errors:** Log and skip issue (don't fail entire batch)
- **Network failures:** Retry up to 3 times
- **Invalid responses:** Fall back to truncation with warning
- **Context length:** Truncate input if needed (rare, but possible)
---
### Dependency Checking
Reuse existing recursive CTE logic from `ready_issues` view, adapted for compaction:
```sql
-- Check if issue and N levels of dependents are all closed
WITH RECURSIVE dependent_tree AS (
  -- Base case: the candidate issue
  SELECT id, status, 0 AS depth
  FROM issues
  WHERE id = ?

  UNION ALL

  -- Recursive case: issues that depend on this one
  SELECT i.id, i.status, dt.depth + 1
  FROM dependent_tree dt
  JOIN dependencies d ON d.depends_on_id = dt.id
  JOIN issues i ON i.id = d.issue_id
  WHERE d.type IN ('blocks', 'parent-child')  -- Only blocking deps matter
    AND dt.depth < ?                          -- Max depth parameter
)
SELECT CASE
  WHEN COUNT(*) = SUM(CASE WHEN status = 'closed' THEN 1 ELSE 0 END)
  THEN 1 ELSE 0
END AS all_closed
FROM dependent_tree;
```
**Performance:** The query is roughly O(N), where N is the number of dependents reachable within the depth limit. With the dependency indexes in place, it should run in under 10ms per issue.
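The traversal semantics of the CTE can be illustrated with an in-memory equivalent. This is a pure-Go sketch of the same closure check, not the production query path; types and names are illustrative:

```go
package main

import "fmt"

type issue struct {
	status     string
	dependents []string // issues depending on this one via blocks/parent-child
}

// allDependentsClosed mirrors the recursive CTE: the issue is eligible
// only if it and every dependent reachable within maxDepth levels are
// closed. The visited set guards against circular dependencies.
func allDependentsClosed(issues map[string]issue, id string, maxDepth int) bool {
	visited := map[string]bool{}
	frontier := []string{id}
	for depth := 0; depth <= maxDepth && len(frontier) > 0; depth++ {
		var next []string
		for _, cur := range frontier {
			if visited[cur] {
				continue
			}
			visited[cur] = true
			iss, ok := issues[cur]
			if !ok || iss.status != "closed" {
				return false // unknown or open dependent blocks compaction
			}
			next = append(next, iss.dependents...)
		}
		frontier = next
	}
	return true
}

func main() {
	issues := map[string]issue{
		"bd-42": {status: "closed", dependents: []string{"bd-43"}},
		"bd-43": {status: "closed"},
	}
	fmt.Println(allDependentsClosed(issues, "bd-42", 2)) // true
}
```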
---
### Git Integration (Optional)
For "project time" measurement via commit counting:
```go
import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
	"time"
)

func getCommitsSince(closedAt time.Time) (int, error) {
	cmd := exec.Command("git", "rev-list", "--count",
		fmt.Sprintf("--since=%s", closedAt.Format(time.RFC3339)),
		"HEAD")
	output, err := cmd.Output()
	if err != nil {
		return 0, err
	}
	return strconv.Atoi(strings.TrimSpace(string(output)))
}
```
**Fallback:** If git is unavailable or the project is not in a repository, use the issue counter delta instead:
```sql
SELECT last_id FROM issue_counters WHERE prefix = ?
-- Store at close time, compare at compaction time
```
---
### CLI Commands
#### `bd compact` - Main Command
```bash
bd compact [flags]
Flags:
  --dry-run         Show what would be compacted without doing it
  --tier int        Compaction tier (1 or 2, default: 1)
  --all             Process all eligible issues (default: preview only)
  --id string       Compact specific issue by ID
  --force           Bypass eligibility checks (with --id)
  --batch-size int  Issues per batch (default: from config)
  --workers int     Parallel workers (default: from config)
  --json            JSON output for agents

Examples:
  bd compact --dry-run            # Preview Tier 1 candidates
  bd compact --dry-run --tier 2   # Preview Tier 2 candidates
  bd compact --all                # Compact all Tier 1 candidates
  bd compact --tier 2 --all       # Compact all Tier 2 candidates
  bd compact --id bd-42           # Compact specific issue
  bd compact --id bd-42 --force   # Force compact even if recent
```
#### `bd compact --restore` - Restore Compacted Issues
```bash
bd compact --restore <issue-id> [flags]
Flags:
  --level int  Restore to specific snapshot level (default: latest)
  --json       JSON output

Examples:
  bd compact --restore bd-42            # Restore bd-42 from latest snapshot
  bd compact --restore bd-42 --level 1  # Restore from Tier 1 snapshot
```
#### `bd compact --stats` - Compaction Statistics
```bash
bd compact --stats [flags]
Flags:
  --json  JSON output

Example output:

  === Compaction Statistics ===

  Total Issues:        1,247
  Compacted (Tier 1):    342 (27.4%)
  Compacted (Tier 2):     89 (7.1%)

  Database Size:               2.3 MB
  Estimated Uncompacted Size:  8.7 MB
  Space Savings:               6.4 MB (73.6%)

  Candidates:
    Tier 1: 47 issues (est. 320 KB → 64 KB)
    Tier 2: 15 issues (est. 180 KB → 18 KB)

  Estimated Compaction Cost: $0.04 (Haiku)
```
---
### Output Examples
#### Dry Run Output
```
$ bd compact --dry-run
=== Tier 1 Compaction Preview ===
Eligibility: Closed ≥30 days, 2 levels of dependents closed

bd-42: Fix authentication bug (P1, bug)
  Closed: 45 days ago
  Size: 2,341 bytes → ~468 bytes (80% reduction)
  Dependents: bd-43, bd-44 (both closed)

bd-57: Add login form (P2, feature)
  Closed: 38 days ago
  Size: 1,823 bytes → ~365 bytes (80% reduction)
  Dependents: (none)

bd-58: Refactor auth middleware (P2, task)
  Closed: 35 days ago
  Size: 3,102 bytes → ~620 bytes (80% reduction)
  Dependents: bd-59 (closed)

... (44 more issues)

Total: 47 issues
Estimated reduction: 87.3 KB → 17.5 KB (80%)
Estimated cost: $0.03 (Haiku API)

Run with --all to compact these issues.
```
#### Compaction Progress
```
$ bd compact --all
Compacting 47 issues (Tier 1)...

Creating snapshots...  [████████████████████] 47/47
Calling Haiku API...   [████████████████████] 47/47 (12s, $0.027)
Updating issues...     [████████████████████] 47/47

✓ Successfully compacted 47 issues
  Size reduction: 87.3 KB → 18.2 KB (79.1%)
  API cost: $0.027
  Time: 14.3s

Compacted issues will be exported to .beads/issues.jsonl
```
#### Show Compacted Issue
```
$ bd show bd-42
bd-42: Fix authentication bug [CLOSED] 🗜️
Status: closed (compacted L1)
Priority: 1 (High)
Type: bug
Closed: 2025-08-31 (45 days ago)
Compacted: 2025-10-15 (saved 1,873 bytes)
**Summary:** Fixed race condition in JWT token refresh logic causing intermittent
401 errors under high load. Implemented mutex-based locking around token refresh
operations. All users can now stay authenticated reliably during concurrent requests.
**Key Decisions:**
- Used sync.RWMutex instead of channels for simpler reasoning about lock state
- Added exponential backoff to token refresh to prevent thundering herd
- Preserved existing token format for backward compatibility with mobile clients
**Resolution:** Deployed to production on Aug 31, monitored for 2 weeks with zero
401 errors. Closed after confirming fix with load testing.
---
💾 Restore: bd compact --restore bd-42
📊 Original size: 2,341 bytes | Compressed: 468 bytes (80% reduction)
```
---
### Safety Mechanisms
1. **Snapshot-First:** Always create snapshot before modifying issue
2. **Restore Capability:** Full restore from snapshots with `--restore`
3. **Opt-In Auto-Compaction:** Disabled by default (`auto_compact_enabled = false`)
4. **Dry-Run Required:** Preview before committing with `--dry-run`
5. **Git Backup:** JSONL exports preserve full history in git commits
6. **Audit Trail:** `EventCompacted` records what was done and when
7. **Size Verification:** Track original_size and compressed_size for validation
8. **Idempotent:** Re-running compaction on already-compacted issues is safe (no-op)
9. **Graceful Degradation:** API failures don't corrupt data, just skip issues
10. **Reversible:** Restore is always available, even after git push
---
### Testing Strategy
#### Unit Tests
1. **Candidate Identification:**
- Issues meeting time criteria
- Dependency depth checking
- Mixed status dependents (some closed, some open)
- Edge case: circular dependencies
2. **Haiku Client:**
- Mock API responses
- Rate limit handling
- Error recovery
- Prompt rendering
3. **Snapshot Management:**
- Create snapshot
- Restore from snapshot
- Multiple snapshots per issue (Tier 1 → Tier 2)
- Snapshot integrity (hash verification)
4. **Size Calculation:**
- Accurate byte counting
- UTF-8 handling
- Empty fields
#### Integration Tests
1. **End-to-End Compaction:**
- Create test issues
- Age them (mock timestamps)
- Run compaction
- Verify summaries
- Restore and verify
2. **Batch Processing:**
- Large batches (100+ issues)
- Parallel worker coordination
- Error handling mid-batch
3. **JSONL Export:**
- Compacted issues export correctly
- Import preserves compaction_level
- Round-trip fidelity
4. **CLI Commands:**
- All flag combinations
- JSON output parsing
- Error messages
#### Manual Testing Checklist
- [ ] Dry-run shows accurate candidates
- [ ] Compaction reduces size as expected
- [ ] Haiku summaries are high quality
- [ ] Restore returns exact original content
- [ ] Stats command shows correct numbers
- [ ] Auto-compaction respects config
- [ ] Git workflow (commit → pull → auto-compact)
- [ ] Multi-machine workflow with compaction
- [ ] API key handling (env var vs config)
- [ ] Rate limit handling under load
---
### Performance Considerations
#### Scalability
**Small databases (<1,000 issues):**
- Full scan acceptable
- Compact all eligible in one run
- <1 minute total time
**Medium databases (1,000-10,000 issues):**
- Batch processing required
- Progress reporting essential
- 5-10 minutes total time
**Large databases (>10,000 issues):**
- Incremental compaction (process N per run)
- Consider scheduled background job
- 30-60 minutes total time
#### Optimization Strategies
1. **Index Usage:**
- `idx_issues_status` - filter closed issues
- `idx_dependencies_depends_on` - dependency traversal
- `idx_snapshots_issue` - restore lookups
2. **Batch Sizing:**
- Default 50 issues per batch
- Configurable via `compact_batch_size`
- Trade-off: larger batches = fewer commits, more RAM
3. **Parallel Workers:**
- Default 5 parallel Haiku calls
- Configurable via `compact_parallel_workers`
- Respects Haiku rate limits
4. **Query Optimization:**
- Use prepared statements for snapshots
- Reuse dependency check query
- Avoid N+1 queries in batch operations
---
### Cost Analysis
#### Haiku Pricing (as of 2025-10-15)
- Input: $0.25 per million tokens (~$0.0003 per 1K tokens)
- Output: $1.25 per million tokens (~$0.0013 per 1K tokens)
#### Per-Issue Estimates
**Tier 1:**
- Input: ~1,000 tokens (full issue content)
- Output: ~400 tokens (summary)
- Cost: ~$0.0008 per issue
**Tier 2:**
- Input: ~500 tokens (Tier 1 summary)
- Output: ~200 tokens (ultra-compressed)
- Cost: ~$0.0003 per issue
#### Batch Costs
| Issues | Tier 1 Cost | Tier 2 Cost | Total |
|--------|-------------|-------------|-------|
| 100 | $0.08 | $0.03 | $0.11 |
| 500 | $0.40 | $0.15 | $0.55 |
| 1,000 | $0.80 | $0.30 | $1.10 |
| 5,000 | $4.00 | $1.50 | $5.50 |
**Monthly budget (typical project):**
- ~50-100 new issues closed per month
- ~30-60 days later, eligible for Tier 1
- Monthly cost: $0.04 - $0.08 (negligible)
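The per-issue arithmetic above can be captured in a small helper. Note that exact arithmetic gives slightly lower totals than the rounded per-issue figures used in the batch table (e.g. $0.75 rather than $0.80 for 1,000 Tier 1 issues); the token averages remain rough estimates, not measured values:

```go
package main

import "fmt"

// Haiku prices in dollars per million tokens (from the table above).
const (
	inputPerM  = 0.25
	outputPerM = 1.25
)

// estimateCost returns the dollar cost for n issues given average
// input and output token counts per issue.
func estimateCost(n, inTokens, outTokens int) float64 {
	perIssue := float64(inTokens)*inputPerM/1e6 + float64(outTokens)*outputPerM/1e6
	return float64(n) * perIssue
}

func main() {
	fmt.Printf("Tier 1, 1000 issues: $%.2f\n", estimateCost(1000, 1000, 400))
	fmt.Printf("Tier 2, 1000 issues: $%.2f\n", estimateCost(1000, 500, 200))
}
```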
---
### Configuration Examples
#### Conservative Setup (manual only)
```bash
bd init
bd config set compact_tier1_days 60
bd config set compact_tier2_days 180
bd config set auto_compact_enabled false
# Run manually when needed
bd compact --dry-run
bd compact --all # after review
```
#### Aggressive Setup (auto-compact)
```bash
bd config set compact_tier1_days 14
bd config set compact_tier2_days 45
bd config set auto_compact_enabled true
bd config set compact_batch_size 100
# Auto-compacts on bd stats, bd export
bd stats # triggers compaction if candidates exist
```
#### Development Setup (fast feedback)
```bash
bd config set compact_tier1_days 1
bd config set compact_tier2_days 3
bd config set compact_tier1_dep_levels 1
bd config set compact_tier2_dep_levels 2
# Test compaction on recently closed issues
```
---
### Future Enhancements
#### Phase 2 (Post-MVP)
1. **Local Model Support:**
- Use Ollama for zero-cost summarization
- Fallback chain: Haiku → Ollama → truncation
2. **Custom Prompts:**
- User-defined summarization prompts
- Per-project templates
- Domain-specific summaries (e.g., "focus on API changes")
3. **Selective Preservation:**
- Mark issues as "do not compact"
- Preserve certain labels (e.g., `architecture`, `security`)
- Field-level preservation (e.g., keep design notes, compress others)
4. **Analytics:**
- Compaction effectiveness over time
- Cost tracking per run
- Quality feedback (user ratings of summaries)
5. **Smart Scheduling:**
- Auto-detect optimal compaction times
- Avoid compaction during active development
- Weekend/off-hours processing
6. **Multi-Tier Expansion:**
- Tier 3: Archive to separate file
- Tier 4: Delete (with backup)
- Configurable tier chain
#### Phase 3 (Advanced)
1. **Distributed Compaction:**
- Coordinate across multiple machines
- Avoid duplicate work in team settings
- Lock mechanism for compaction jobs
2. **Incremental Summarization:**
- Re-summarize if issue reopened
- Preserve history of summaries
- Version tracking for prompts
3. **Search Integration:**
- Full-text search includes summaries
- Boost compacted issues in search results
- Semantic search using embeddings
---
## Implementation Issues
The following issues should be created in beads to track this work. They're designed to be implemented in dependency order.
---
### Epic: Issue Database Compaction
**Issue ID:** (auto-generated)
**Title:** Epic: Add intelligent database compaction with Claude Haiku
**Type:** epic
**Priority:** 2
**Description:**
Implement multi-tier database compaction using Claude Haiku to semantically compress old, closed issues. This keeps the database lightweight and agent-friendly while preserving essential context.
**Goals:**
- 70-95% space reduction for eligible issues
- Full restore capability via snapshots
- Opt-in with dry-run safety
- ~$1 per 1,000 issues compacted
**Acceptance Criteria:**
- [ ] Schema migration with snapshots table
- [ ] Haiku integration for summarization
- [ ] Two-tier compaction (30d, 90d)
- [ ] CLI with dry-run, restore, stats
- [ ] Full test coverage
- [ ] Documentation complete
**Dependencies:** None (this is the epic)
---
### Issue 1: Add compaction schema and migrations
**Type:** task
**Priority:** 1
**Dependencies:** Blocks all other compaction work
**Description:**
Add database schema support for issue compaction tracking and snapshot storage.
**Design:**
Add three columns to `issues` table:
- `compaction_level INTEGER DEFAULT 0` - 0=original, 1=tier1, 2=tier2
- `compacted_at DATETIME` - when last compacted
- `original_size INTEGER` - bytes before first compaction
Create `issue_snapshots` table:
```sql
CREATE TABLE issue_snapshots (
  id               INTEGER PRIMARY KEY AUTOINCREMENT,
  issue_id         TEXT NOT NULL,
  snapshot_time    DATETIME NOT NULL,
  compaction_level INTEGER NOT NULL,
  original_size    INTEGER NOT NULL,
  compressed_size  INTEGER NOT NULL,
  original_content TEXT NOT NULL, -- JSON blob
  archived_events  TEXT,
  FOREIGN KEY (issue_id) REFERENCES issues(id) ON DELETE CASCADE
);
```
Add indexes:
- `idx_snapshots_issue` on `issue_id`
- `idx_snapshots_level` on `compaction_level`
Add migration functions in `internal/storage/sqlite/sqlite.go`:
- `migrateCompactionColumns(db *sql.DB) error`
- `migrateSnapshotsTable(db *sql.DB) error`
**Acceptance Criteria:**
- [ ] Existing databases migrate automatically
- [ ] New databases include columns by default
- [ ] Migration is idempotent (safe to run multiple times)
- [ ] No data loss during migration
- [ ] Tests verify migration on fresh and existing DBs
**Estimated Time:** 4 hours
---
### Issue 2: Add compaction configuration keys
**Type:** task
**Priority:** 1
**Dependencies:** Blocks compaction logic
**Description:**
Add configuration keys for compaction behavior with sensible defaults.
**Implementation:**
Add to `internal/storage/sqlite/schema.go` initial config:
```sql
INSERT OR IGNORE INTO config (key, value) VALUES
  ('compact_tier1_days', '30'),
  ('compact_tier1_dep_levels', '2'),
  ('compact_tier2_days', '90'),
  ('compact_tier2_dep_levels', '5'),
  ('compact_tier2_commits', '100'),
  ('compact_model', 'claude-3-5-haiku-20241022'),
  ('compact_batch_size', '50'),
  ('compact_parallel_workers', '5'),
  ('auto_compact_enabled', 'false');
```
Add helper functions in `internal/storage/` or `cmd/bd/`:
```go
func getCompactConfig(ctx context.Context, store Storage) (*CompactConfig, error)
```
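A sketch of what the parsing inside `getCompactConfig` might look like, operating on raw key/value strings as stored in the config table. The `CompactConfig` field set here is an assumption based on the keys above:

```go
package main

import (
	"fmt"
	"strconv"
)

// CompactConfig is a typed view of the compaction config keys.
type CompactConfig struct {
	Tier1Days       int
	Tier2Days       int
	BatchSize       int
	ParallelWorkers int
	AutoEnabled     bool
}

// parseCompactConfig converts raw config strings into typed values,
// keeping the documented defaults for any missing keys and rejecting
// values of the wrong type (days=int, enabled=bool).
func parseCompactConfig(raw map[string]string) (*CompactConfig, error) {
	cfg := &CompactConfig{Tier1Days: 30, Tier2Days: 90, BatchSize: 50, ParallelWorkers: 5}
	intKeys := map[string]*int{
		"compact_tier1_days":       &cfg.Tier1Days,
		"compact_tier2_days":       &cfg.Tier2Days,
		"compact_batch_size":       &cfg.BatchSize,
		"compact_parallel_workers": &cfg.ParallelWorkers,
	}
	for key, dst := range intKeys {
		if v, ok := raw[key]; ok {
			n, err := strconv.Atoi(v)
			if err != nil {
				return nil, fmt.Errorf("config %s: %w", key, err)
			}
			*dst = n
		}
	}
	if v, ok := raw["auto_compact_enabled"]; ok {
		b, err := strconv.ParseBool(v)
		if err != nil {
			return nil, fmt.Errorf("config auto_compact_enabled: %w", err)
		}
		cfg.AutoEnabled = b
	}
	return cfg, nil
}

func main() {
	cfg, _ := parseCompactConfig(map[string]string{"compact_tier1_days": "60"})
	fmt.Println(cfg.Tier1Days, cfg.AutoEnabled) // overridden value, default bool
}
```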
**Acceptance Criteria:**
- [ ] Config keys created on init
- [ ] Existing DBs get defaults on migration
- [ ] `bd config get/set` works with all keys
- [ ] Type validation (days=int, enabled=bool)
- [ ] Documentation in README.md
**Estimated Time:** 2 hours
---
### Issue 3: Implement candidate identification queries
**Type:** task
**Priority:** 1
**Dependencies:** Needs schema and config
**Description:**
Write SQL queries to identify issues eligible for Tier 1 and Tier 2 compaction.
**Design:**
Create `internal/storage/sqlite/compact.go` with:
```go
type CompactionCandidate struct {
	IssueID        string
	ClosedAt       time.Time
	OriginalSize   int
	EstimatedSize  int
	DependentCount int
}

func (s *SQLiteStorage) GetTier1Candidates(ctx context.Context) ([]*CompactionCandidate, error)
func (s *SQLiteStorage) GetTier2Candidates(ctx context.Context) ([]*CompactionCandidate, error)
func (s *SQLiteStorage) CheckEligibility(ctx context.Context, issueID string, tier int) (bool, string, error)
```
Implement recursive dependency checking using CTE similar to `ready_issues` view.
**Acceptance Criteria:**
- [ ] Tier 1 query filters by days and dependency depth
- [ ] Tier 2 query includes commit/issue count checks
- [ ] Dependency checking handles circular deps gracefully
- [ ] Performance: <100ms for 10,000 issue database
- [ ] Tests cover edge cases (no deps, circular deps, mixed status)
**Estimated Time:** 6 hours
---
### Issue 4: Create Haiku client and prompt templates
**Type:** task
**Priority:** 1
**Dependencies:** None (can work in parallel)
**Description:**
Implement Claude Haiku API client with template-based prompts for Tier 1 and Tier 2 summarization.
**Implementation:**
Create `internal/compact/haiku.go`:
```go
type HaikuClient struct {
	client *anthropic.Client
	model  string
}

func NewHaikuClient(apiKey string) (*HaikuClient, error)
func (h *HaikuClient) SummarizeTier1(ctx context.Context, issue *types.Issue) (string, error)
func (h *HaikuClient) SummarizeTier2(ctx context.Context, issue *types.Issue) (string, error)
```
Use text/template for prompt rendering.
Add error handling:
- Rate limit retry with exponential backoff
- Network errors: 3 retries
- Invalid responses: return error, don't corrupt data
**Acceptance Criteria:**
- [ ] API key from env var or config (env takes precedence)
- [ ] Prompts render correctly with template
- [ ] Rate limiting handled gracefully
- [ ] Mock tests for API calls
- [ ] Real integration test (optional, requires API key)
**Estimated Time:** 6 hours
---
### Issue 5: Implement snapshot creation and restoration
**Type:** task
**Priority:** 1
**Dependencies:** Needs schema changes
**Description:**
Implement snapshot creation before compaction and restoration capability.
**Implementation:**
Add to `internal/storage/sqlite/compact.go`:
```go
type Snapshot struct {
	ID              int64
	IssueID         string
	SnapshotTime    time.Time
	CompactionLevel int
	OriginalSize    int
	CompressedSize  int
	OriginalContent string // JSON
	ArchivedEvents  string // JSON, nullable
}

func (s *SQLiteStorage) CreateSnapshot(ctx context.Context, issue *types.Issue, level int) error
func (s *SQLiteStorage) RestoreFromSnapshot(ctx context.Context, issueID string, level int) error
func (s *SQLiteStorage) GetSnapshots(ctx context.Context, issueID string) ([]*Snapshot, error)
```
Snapshot JSON structure:
```json
{
  "description": "...",
  "design": "...",
  "notes": "...",
  "acceptance_criteria": "...",
  "title": "..."
}
```
**Acceptance Criteria:**
- [ ] Snapshot created atomically with compaction
- [ ] Restore returns exact original content
- [ ] Multiple snapshots per issue supported (Tier 1 → Tier 2)
- [ ] JSON encoding handles special characters
- [ ] Size calculation is accurate (UTF-8 bytes)
**Estimated Time:** 5 hours
---
### Issue 6: Implement Tier 1 compaction logic
**Type:** task
**Priority:** 1
**Dependencies:** Needs Haiku client, snapshots, candidate queries
**Description:**
Implement the core Tier 1 compaction process: snapshot → summarize → update.
**Implementation:**
Add to `internal/compact/compactor.go`:
```go
type Compactor struct {
	store  storage.Storage
	haiku  *HaikuClient
	config *CompactConfig
}

func New(store storage.Storage, apiKey string, config *CompactConfig) (*Compactor, error)
func (c *Compactor) CompactTier1(ctx context.Context, issueID string) error
func (c *Compactor) CompactTier1Batch(ctx context.Context, issueIDs []string) error
```
Process:
1. Verify eligibility
2. Calculate original size
3. Create snapshot
4. Call Haiku for summary
5. Update issue (description = summary, clear design/notes/criteria)
6. Set compaction_level = 1, compacted_at = now
7. Record EventCompacted
8. Mark dirty for export
**Acceptance Criteria:**
- [ ] Single issue compaction works end-to-end
- [ ] Batch processing with parallel workers
- [ ] Errors don't corrupt database (transaction rollback)
- [ ] EventCompacted includes size savings
- [ ] Dry-run mode (identify + size estimate only)
**Estimated Time:** 8 hours
---
### Issue 7: Implement Tier 2 compaction logic
**Type:** task
**Priority:** 2
**Dependencies:** Needs Tier 1 working
**Description:**
Implement Tier 2 ultra-compression: more aggressive summarization and optional event pruning.
**Implementation:**
Add to `internal/compact/compactor.go`:
```go
func (c *Compactor) CompactTier2(ctx context.Context, issueID string) error
func (c *Compactor) CompactTier2Batch(ctx context.Context, issueIDs []string) error
```
Process:
1. Verify issue is at compaction_level = 1
2. Check Tier 2 eligibility (days, deps, commits/issues)
3. Create Tier 2 snapshot
4. Call Haiku with ultra-compression prompt
5. Update issue (description = single paragraph, clear all else)
6. Set compaction_level = 2
7. Optionally prune events (keep created/closed, archive rest)
**Acceptance Criteria:**
- [ ] Requires existing Tier 1 compaction
- [ ] Git commit counting works (with fallback)
- [ ] Events optionally pruned (config: compact_events_enabled)
- [ ] Archived events stored in snapshot
- [ ] Size reduction 90-95%
**Estimated Time:** 6 hours
---
### Issue 8: Add `bd compact` CLI command
**Type:** task
**Priority:** 1
**Dependencies:** Needs Tier 1 compaction logic
**Description:**
Implement the `bd compact` command with dry-run, batch processing, and progress reporting.
**Implementation:**
Create `cmd/bd/compact.go`:
```go
var compactCmd = &cobra.Command{
	Use:   "compact",
	Short: "Compact old closed issues to save space",
	Long:  `...`,
}

var (
	compactDryRun    bool
	compactTier      int
	compactAll       bool
	compactID        string
	compactForce     bool
	compactBatchSize int
	compactWorkers   int
)

func init() {
	compactCmd.Flags().BoolVar(&compactDryRun, "dry-run", false, "Preview without compacting")
	compactCmd.Flags().IntVar(&compactTier, "tier", 1, "Compaction tier (1 or 2)")
	compactCmd.Flags().BoolVar(&compactAll, "all", false, "Process all candidates")
	compactCmd.Flags().StringVar(&compactID, "id", "", "Compact specific issue")
	compactCmd.Flags().BoolVar(&compactForce, "force", false, "Force compact (bypass checks)")
	// ... more flags
}
```
**Output:**
- Dry-run: Table of candidates with size estimates
- Actual run: Progress bar with batch updates
- Summary: Count, size saved, cost, time
**Acceptance Criteria:**
- [ ] `--dry-run` shows accurate preview
- [ ] `--all` processes all candidates
- [ ] `--id` compacts single issue
- [ ] `--force` bypasses eligibility checks (with --id)
- [ ] Progress bar for batches
- [ ] JSON output with `--json`
- [ ] Exit code: 0=success, 1=error
**Estimated Time:** 6 hours
---
### Issue 9: Add `bd compact --restore` functionality
**Type:** task
**Priority:** 2
**Dependencies:** Needs snapshots and CLI
**Description:**
Implement restore command to undo compaction from snapshots.
**Implementation:**
Add to `cmd/bd/compact.go`:
```go
var compactRestore string
compactCmd.Flags().StringVar(&compactRestore, "restore", "", "Restore issue from snapshot")
```
Process:
1. Load snapshot for issue
2. Parse JSON content
3. Update issue with original content
4. Set compaction_level = 0, compacted_at = NULL
5. Record EventRestored
6. Mark dirty
**Acceptance Criteria:**
- [ ] Restores exact original content
- [ ] Handles multiple snapshots (prompt user or use latest)
- [ ] `--level` flag to choose snapshot
- [ ] Updates compaction_level correctly
- [ ] Exports restored content to JSONL
**Estimated Time:** 4 hours
---
### Issue 10: Add `bd compact --stats` command
**Type:** task
**Priority:** 2
**Dependencies:** Needs compaction working
**Description:**
Add statistics command showing compaction status and potential savings.
**Implementation:**
```go
var compactStats bool
compactCmd.Flags().BoolVar(&compactStats, "stats", false, "Show compaction statistics")
```
Output:
- Total issues, by compaction level
- Current DB size vs estimated uncompacted size
- Space savings (MB and %)
- Candidates for each tier with estimates
- Estimated API cost
**Acceptance Criteria:**
- [ ] Accurate counts by compaction_level
- [ ] Size calculations include all text fields
- [ ] Shows candidates with eligibility reasons
- [ ] Cost estimation based on Haiku pricing
- [ ] JSON output supported
**Estimated Time:** 4 hours
---
### Issue 11: Add EventCompacted to event system
**Type:** task
**Priority:** 2
**Dependencies:** Needs schema changes
**Description:**
Add new event type for tracking compaction in audit trail.
**Implementation:**
1. Add to `internal/types/types.go`:
```go
const EventCompacted EventType = "compacted"
```
2. Record event during compaction:
```go
eventData := map[string]interface{}{
"tier": tier,
"original_size": originalSize,
"compressed_size": compressedSize,
"reduction_pct": (1 - float64(compressedSize)/float64(originalSize)) * 100,
}
```
3. Show in `bd show` output:
```
Events:
2025-10-15: compacted (tier 1, saved 1.8KB, 80%)
2025-08-31: closed by alice
2025-08-20: created by alice
```
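The event line shown above can be rendered directly from the recorded event data; a minimal sketch (hypothetical helper name, sizes in bytes):

```go
package main

import "fmt"

// formatCompactedEvent renders the event-history line shown by bd show
// from the tier and size fields recorded in the compaction event.
func formatCompactedEvent(tier, originalSize, compressedSize int) string {
	saved := float64(originalSize-compressedSize) / 1024
	pct := (1 - float64(compressedSize)/float64(originalSize)) * 100
	return fmt.Sprintf("compacted (tier %d, saved %.1fKB, %.0f%%)", tier, saved, pct)
}
```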
**Acceptance Criteria:**
- [ ] Event includes tier and size info
- [ ] Shows in event history
- [ ] Exports to JSONL
- [ ] `bd show` displays compaction marker
**Estimated Time:** 3 hours
---
### Issue 12: Add compaction indicator to `bd show`
**Type:** task
**Priority:** 2
**Dependencies:** Needs compaction working
**Description:**
Update `bd show` command to display compaction status prominently.
**Implementation:**
Add to issue display:
```
bd-42: Fix authentication bug [CLOSED] 🗜️
Status: closed (compacted L1)
...
---
💾 Restore: bd compact --restore bd-42
📊 Original: 2,341 bytes | Compressed: 468 bytes (80% reduction)
🗜️ Compacted: 2025-10-15 (Tier 1)
```
Show different emoji for tiers:
- Tier 1: 🗜️
- Tier 2: 📦
**Acceptance Criteria:**
- [ ] Compaction status visible in title
- [ ] Footer shows size savings
- [ ] Restore command shown
- [ ] Works with `--json` output
**Estimated Time:** 2 hours
---
### Issue 13: Write compaction tests
**Type:** task
**Priority:** 1
**Dependencies:** Needs all compaction logic
**Description:**
Comprehensive test suite for compaction functionality.
**Test Coverage:**
1. **Candidate Identification:**
- Eligibility by time
- Dependency depth checking
- Mixed status dependents
- Edge cases (no deps, circular)
2. **Snapshots:**
- Create and restore
- Multiple snapshots per issue
- Content integrity
3. **Tier 1 Compaction:**
- Single issue
- Batch processing
- Error handling
4. **Tier 2 Compaction:**
- Requires Tier 1
- Events pruning
- Commit counting fallback
5. **CLI:**
- All flag combinations
- Dry-run accuracy
- JSON output
6. **Integration:**
- End-to-end flow
- JSONL export/import
- Restore verification
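For the "Mock Haiku API" criterion, one approach is to give the summarizer an injectable base URL and point it at an `httptest` server in tests. A minimal sketch with hypothetical names (`newMockHaiku`, `summarize`) and a simplified request/response shape, not the real Anthropic API schema:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newMockHaiku returns a test server that answers every request with a
// canned summary, standing in for the real API during tests.
func newMockHaiku(summary string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		json.NewEncoder(w).Encode(map[string]string{"summary": summary})
	}))
}

// summarize posts issue text to baseURL and returns the summary field.
// Injecting baseURL is what makes the compactor testable offline.
func summarize(baseURL, text string) (string, error) {
	payload, err := json.Marshal(map[string]string{"text": text})
	if err != nil {
		return "", err
	}
	resp, err := http.Post(baseURL, "application/json", bytes.NewReader(payload))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("summarize: unexpected status %d", resp.StatusCode)
	}
	var out struct {
		Summary string `json:"summary"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.Summary, nil
}
```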
**Acceptance Criteria:**
- [ ] Test coverage >80%
- [ ] All edge cases covered
- [ ] Mock Haiku API in tests
- [ ] Integration tests pass
- [ ] `go test ./...` passes
**Estimated Time:** 8 hours
---
### Issue 14: Add compaction documentation
**Type:** task
**Priority:** 2
**Dependencies:** All features complete
**Description:**
Document compaction feature in README and create COMPACTION.md guide.
**Content:**
Update README.md:
- Add to Features section
- CLI examples
- Configuration guide
Create COMPACTION.md:
- How compaction works
- When to use each tier
- Cost analysis
- Safety mechanisms
- Troubleshooting
Create examples/compaction/:
- Example workflow
- Cron job setup
- Auto-compaction script
**Acceptance Criteria:**
- [ ] README.md updated
- [ ] COMPACTION.md comprehensive
- [ ] Examples work as documented
- [ ] Screenshots/examples included
- [ ] API key setup documented
**Estimated Time:** 4 hours
---
### Issue 15: Optional: Implement auto-compaction
**Type:** task
**Priority:** 3 (nice-to-have)
**Dependencies:** Needs all compaction working
**Description:**
Implement automatic compaction triggered by certain operations when enabled via config.
**Implementation:**
Trigger points (when `auto_compact_enabled = true`):
1. `bd stats` - check and compact if candidates exist
2. `bd export` - before exporting
3. Background timer (optional, via daemon)
Add:
```go
func (s *SQLiteStorage) AutoCompact(ctx context.Context) error {
enabled, _ := s.GetConfig(ctx, "auto_compact_enabled")
if enabled != "true" {
return nil
}
// Run Tier 1 compaction on all candidates
// Limit to batch_size to avoid long operations
}
```
**Acceptance Criteria:**
- [ ] Respects auto_compact_enabled config
- [ ] Limits batch size to avoid blocking
- [ ] Logs compaction activity
- [ ] Can be disabled per-command with `--no-auto-compact`
**Estimated Time:** 4 hours
---
### Issue 16: Optional: Add git commit counting
**Type:** task
**Priority:** 3 (nice-to-have)
**Dependencies:** Needs Tier 2 logic
**Description:**
Implement git commit counting for "project time" measurement as an alternative to calendar time.
**Implementation:**
```go
import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
	"time"
)

// getCommitsSince counts commits on HEAD since the issue closed,
// as a proxy for "project time".
func getCommitsSince(closedAt time.Time) (int, error) {
	cmd := exec.Command("git", "rev-list", "--count",
		fmt.Sprintf("--since=%s", closedAt.Format(time.RFC3339)), "HEAD")
	output, err := cmd.Output()
	if err != nil {
		return 0, err // not in a git repo, or git not installed
	}
	return strconv.Atoi(strings.TrimSpace(string(output)))
}
```
Fall back to the issue counter delta if git is unavailable.
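The counter-delta fallback might look like the sketch below, assuming IDs of the form `bd-42` where the numeric suffix is the issue counter:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// counterOf extracts the numeric part of an issue ID like "bd-42"
// (assumed ID format).
func counterOf(id string) (int, error) {
	i := strings.LastIndexByte(id, '-')
	if i < 0 {
		return 0, fmt.Errorf("malformed issue id %q", id)
	}
	return strconv.Atoi(id[i+1:])
}

// issueCounterDelta approximates "commits since" when git is unavailable:
// the number of issues minted between the closed issue and the newest one.
func issueCounterDelta(closedID, latestID string) (int, error) {
	closed, err := counterOf(closedID)
	if err != nil {
		return 0, err
	}
	latest, err := counterOf(latestID)
	if err != nil {
		return 0, err
	}
	return latest - closed, nil
}
```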
**Acceptance Criteria:**
- [ ] Counts commits since closed_at
- [ ] Handles git not available gracefully
- [ ] Fallback to issue counter works
- [ ] Configurable via compact_tier2_commits
- [ ] Tested with real git repo
**Estimated Time:** 3 hours
---
## Success Metrics
### Technical Metrics
- [ ] 70-85% size reduction for Tier 1
- [ ] 90-95% size reduction for Tier 2
- [ ] <100ms candidate identification query
- [ ] <2s per issue compaction (Haiku latency)
- [ ] Zero data loss (all restore tests pass)
### Quality Metrics
- [ ] Haiku summaries preserve key information
- [ ] Developers can understand compacted issues
- [ ] Restore returns exact original content
- [ ] No corruption in multi-machine workflows
### Operational Metrics
- [ ] Cost: <$1.50 per 1,000 issues
- [ ] Dry-run accuracy: size and cost estimates correct to within ~5%
- [ ] Error rate: <1% API failures (with retry)
- [ ] User adoption: Docs clear, examples work
---
## Rollout Plan
### Phase 1: Alpha (Internal Testing)
1. Merge compaction feature to main
2. Test on beads' own database
3. Verify JSONL export/import
4. Validate Haiku summaries
5. Fix any critical bugs
### Phase 2: Beta (Opt-In)
1. Announce in README (opt-in, experimental)
2. Gather feedback from early adopters
3. Iterate on prompt templates
4. Add telemetry (optional, with consent)
### Phase 3: Stable (Default Disabled)
1. Mark feature as stable
2. Keep auto_compact_enabled = false by default
3. Encourage manual `bd compact --dry-run` first
4. Document in quickstart guide
### Phase 4: Mature (Consider Auto-Enable)
1. After 6+ months of stability
2. Consider auto-compaction for new users
3. Provide migration guide for disabling
---
## Risks and Mitigations
| Risk | Impact | Likelihood | Mitigation |
|------|--------|------------|------------|
| Haiku summaries lose critical info | High | Medium | Manual review in dry-run, restore capability, improve prompts |
| API rate limits during batch | Medium | Medium | Exponential backoff, respect rate limits, batch sizing |
| JSONL merge conflicts increase | Medium | Low | Compaction is deterministic per issue, so git merges it cleanly |
| Users accidentally compress important issues | High | Low | Dry-run required, restore available, snapshots permanent |
| Cost higher than expected | Low | Low | Dry-run shows estimates, configurable batch sizes |
| Schema migration fails | High | Very Low | Idempotent migrations, tested on existing DBs |
---
## Open Questions
1. **Should compaction be reversible forever, or expire snapshots?**
- Recommendation: Keep snapshots indefinitely (disk is cheap)
2. **Should we compress snapshots themselves (gzip)?**
- Recommendation: Not in MVP, add if storage becomes issue
3. **Should tier selection be automatic or manual?**
- Recommendation: Manual in MVP, auto-tier in future
4. **How to handle issues compacted on one machine but not another?**
- Answer: JSONL export includes compaction_level, imports preserve it
5. **Should we support custom models (Sonnet, Opus)?**
- Recommendation: Haiku only in MVP, add later if needed
---
## Appendix: Example Workflow
### Typical Monthly Compaction
```bash
# 1. Check what's eligible
$ bd compact --dry-run
=== Tier 1 Candidates ===
42 issues eligible (closed >30 days, deps closed)
Est. reduction: 127 KB → 25 KB (80%)
Est. cost: $0.03
# 2. Review candidates manually
$ bd list --status closed --json | jq 'map(select(.compaction_level == 0))'
# 3. Compact Tier 1
$ bd compact --all
✓ Compacted 42 issues in 18s ($0.03)
# 4. Check Tier 2 candidates (optional)
$ bd compact --dry-run --tier 2
=== Tier 2 Candidates ===
8 issues eligible (closed >90 days, 100+ commits since)
Est. reduction: 45 KB → 4 KB (91%)
Est. cost: $0.01
# 5. Compact Tier 2
$ bd compact --all --tier 2
✓ Compacted 8 issues in 6s ($0.01)
# 6. Export and commit
$ bd export -o .beads/issues.jsonl
$ git add .beads/issues.jsonl
$ git commit -m "Compact 50 old issues (saved 143 KB)"
$ git push
# 7. View stats
$ bd compact --stats
Total Space Saved: 143 KB (82% reduction)
Database Size: 2.1 MB (down from 2.3 MB)
```
---
## Conclusion
This design provides a comprehensive, safe, and cost-effective way to keep beads databases lightweight while preserving essential context. The two-tier approach balances aggressiveness with safety, and the snapshot system ensures full reversibility.
The use of Claude Haiku for semantic compression is key - it preserves meaning rather than just truncating text. At ~$1 per 1,000 issues, the cost is negligible for the value provided.
Implementation is straightforward with clear phases and well-defined issues. The MVP (Tier 1 only) can be delivered in ~40 hours of work, with Tier 2 and enhancements following incrementally.
This aligns perfectly with beads' philosophy: **pragmatic, agent-focused, and evolutionarily designed**.