# Issue Database Compaction Design
**Status:** Design Phase
**Created:** 2025-10-15
**Target:** Beads v1.1
## Executive Summary
Add intelligent database compaction to beads that uses Claude Haiku to semantically compress old, closed issues. This keeps the database lightweight and agent-friendly while preserving essential context about past work. The design philosophy: **most work is throwaway, and forensic value decays exponentially with time**.
### Key Metrics
- **Space savings:** 70-95% reduction in text volume for old issues
- **Cost:** ~$1.10 per 1,000 issues compacted (Haiku pricing)
- **Safety:** Full snapshot system with restore capability
- **Performance:** Batch processing with parallel workers
---
## Motivation
### The Problem
Beads databases grow indefinitely:
- Issues accumulate detailed `description`, `design`, `notes`, `acceptance_criteria` fields
- Events table logs every change forever
- Old closed issues (especially those with all dependents closed) rarely need full detail
- Agent context windows work better with concise, relevant information
### Why This Matters
1. **Agent Efficiency:** Smaller databases → faster queries → clearer agent thinking
2. **Context Management:** Agents benefit from summaries of old work, not verbose details
3. **Git Performance:** Smaller JSONL exports → faster git operations
4. **Pragmatic Philosophy:** Beads is agent memory, not a historical archive
5. **Forensic Decay:** Need for detail decreases exponentially after closure
### What We Keep
- Issue ID and title (always)
- Semantic summary of what was done and why
- Key architectural decisions
- Closure outcome
- Full git history in JSONL commits (ultimate backup)
- Restore capability via snapshots
---
## Technical Design
### Architecture Overview
```
┌────────────────────────────────────────────────────────────────┐
│                      Compaction Pipeline                       │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  1. Candidate Identification                                   │
│     ↓                                                          │
│     • Query closed issues meeting time + dependency criteria   │
│     • Check dependency depth (recursive CTE)                   │
│     • Calculate size/savings estimates                         │
│                                                                │
│  2. Snapshot Creation                                          │
│     ↓                                                          │
│     • Store original content in issue_snapshots table          │
│     • Calculate content hash for verification                  │
│     • Enable restore capability                                │
│                                                                │
│  3. Haiku Summarization                                        │
│     ↓                                                          │
│     • Batch process with worker pool (5 parallel)              │
│     • Different prompts for Tier 1 vs Tier 2                   │
│     • Handle API errors gracefully                             │
│                                                                │
│  4. Issue Update                                               │
│     ↓                                                          │
│     • Replace verbose fields with summary                      │
│     • Set compaction_level and compacted_at                    │
│     • Record event (EventCompacted)                            │
│     • Mark dirty for JSONL export                              │
│                                                                │
│  5. Optional: Events Pruning                                   │
│     ↓                                                          │
│     • Keep only created/closed events for Tier 2               │
│     • Archive detailed event history to snapshots              │
│                                                                │
└────────────────────────────────────────────────────────────────┘
```
### Compaction Tiers
#### **Tier 1: Standard Compaction**
**Eligibility:**
- `status = 'closed'`
- Closed at least 30 days ago, i.e. `closed_at <= datetime('now', '-30 days')` (configurable: `compact_tier1_days`)
- All issues that depend on this one (via `blocks` or `parent-child`) are closed
- Dependency check depth: 2 levels (configurable: `compact_tier1_dep_levels`)
**Process:**
1. Snapshot original content
2. Send to Haiku with 300-word summarization prompt
3. Store summary in `description`
4. Clear `design`, `notes`, `acceptance_criteria`
5. Set `compaction_level = 1`
6. Keep all events
**Output Format (Haiku prompt):**
```
**Summary:** [2-3 sentences: problem, solution, outcome]
**Key Decisions:** [bullet points of non-obvious choices]
**Resolution:** [how it was closed]
```
**Expected Reduction:** 70-85% of original text size
#### **Tier 2: Aggressive Compaction**
**Eligibility:**
- Already at `compaction_level = 1`
- Closed at least 90 days ago, i.e. `closed_at <= datetime('now', '-90 days')` (configurable: `compact_tier2_days`)
- All dependencies (all 4 types) up to 5 levels deep are closed
- One of:
- ≥100 git commits since `closed_at` (configurable: `compact_tier2_commits`)
- ≥500 new issues created since closure (configurable: `compact_tier2_new_issues`)
- Manual override with `--force`
**Process:**
1. Snapshot Tier 1 content
2. Send to Haiku with 150-word ultra-compression prompt
3. Store single paragraph in `description`
4. Clear all other text fields
5. Set `compaction_level = 2`
6. Prune events: keep only `created` and `closed`, move rest to snapshot
**Output Format (Haiku prompt):**
```
Single paragraph (≤150 words):
- What was built/fixed
- Why it mattered
- Lasting architectural impact (if any)
```
**Expected Reduction:** 90-95% of original text size
---
### Schema Changes
#### New Columns on `issues` Table
```sql
ALTER TABLE issues ADD COLUMN compaction_level INTEGER DEFAULT 0;
ALTER TABLE issues ADD COLUMN compacted_at DATETIME;
ALTER TABLE issues ADD COLUMN original_size INTEGER; -- bytes before compaction
```
#### New Table: `issue_snapshots`
```sql
CREATE TABLE IF NOT EXISTS issue_snapshots (
  id               INTEGER PRIMARY KEY AUTOINCREMENT,
  issue_id         TEXT NOT NULL,
  snapshot_time    DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  compaction_level INTEGER NOT NULL, -- 1 or 2
  original_size    INTEGER NOT NULL,
  compressed_size  INTEGER NOT NULL,
  -- JSON blob with original content
  original_content TEXT NOT NULL,
  -- Optional: compressed events for Tier 2
  archived_events  TEXT,
  FOREIGN KEY (issue_id) REFERENCES issues(id) ON DELETE CASCADE
);

CREATE INDEX IF NOT EXISTS idx_snapshots_issue ON issue_snapshots(issue_id);
CREATE INDEX IF NOT EXISTS idx_snapshots_level ON issue_snapshots(compaction_level);
```
**Snapshot JSON Structure:**
```json
{
  "description": "original description text...",
  "design": "original design text...",
  "notes": "original notes text...",
  "acceptance_criteria": "original criteria text...",
  "title": "original title",
  "hash": "sha256:abc123..."
}
```
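Step 2 of the pipeline calculates a content hash before compaction so a restore can be verified against the snapshot. A minimal sketch of how that hash could be computed over the JSON structure above; the Go struct and function names here are illustrative, not the final API:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// snapshotContent mirrors the snapshot JSON structure above.
type snapshotContent struct {
	Description        string `json:"description"`
	Design             string `json:"design"`
	Notes              string `json:"notes"`
	AcceptanceCriteria string `json:"acceptance_criteria"`
	Title              string `json:"title"`
}

// contentHash returns a "sha256:..." digest of the JSON encoding.
// Struct marshaling gives a deterministic field order, so the hash
// is stable across runs and can verify snapshot integrity on restore.
func contentHash(c snapshotContent) (string, error) {
	data, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(data)
	return "sha256:" + hex.EncodeToString(sum[:]), nil
}

func main() {
	h, _ := contentHash(snapshotContent{Title: "Fix authentication bug"})
	fmt.Println(h[:7]) // prints "sha256:"
}
```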
#### New Config Keys
```sql
INSERT OR IGNORE INTO config (key, value) VALUES
  -- Tier 1 settings
  ('compact_tier1_days', '30'),
  ('compact_tier1_dep_levels', '2'),
  -- Tier 2 settings
  ('compact_tier2_days', '90'),
  ('compact_tier2_dep_levels', '5'),
  ('compact_tier2_commits', '100'),
  ('compact_tier2_new_issues', '500'),
  -- API settings
  ('anthropic_api_key', ''), -- Falls back to ANTHROPIC_API_KEY env var
  ('compact_model', 'claude-3-5-haiku-20241022'),
  -- Performance settings
  ('compact_batch_size', '50'),
  ('compact_parallel_workers', '5'),
  -- Safety settings
  ('auto_compact_enabled', 'false'), -- Opt-in
  ('compact_events_enabled', 'false'), -- Events pruning (Tier 2)
  -- Display settings
  ('compact_show_savings', 'true'); -- Show size reduction in output
```
#### New Event Type
```go
const (
	// ... existing event types ...
	EventCompacted EventType = "compacted"
)
```
---
### Haiku Integration
#### Prompt Templates
**Tier 1 Prompt:**
```
Summarize this closed software issue. Preserve key decisions, implementation approach, and outcome. Max 300 words.
Title: {{.Title}}
Type: {{.IssueType}}
Priority: {{.Priority}}
Description:
{{.Description}}
Design Notes:
{{.Design}}
Implementation Notes:
{{.Notes}}
Acceptance Criteria:
{{.AcceptanceCriteria}}
Output format:
**Summary:** [2-3 sentences: what problem, what solution, what outcome]
**Key Decisions:** [bullet points of non-obvious choices]
**Resolution:** [how it was closed]
```
**Tier 2 Prompt:**
```
Ultra-compress this old closed issue to ≤150 words. Focus on lasting architectural impact.
Title: {{.Title}}
Original Summary (already compressed):
{{.Description}}
Output a single paragraph covering:
- What was built/fixed
- Why it mattered
- Lasting impact (if any)
If there's no lasting impact, just state what was done and that it's resolved.
```
#### API Client Structure
```go
package compact

import (
	"context"

	"github.com/anthropics/anthropic-sdk-go"
)

type HaikuClient struct {
	client *anthropic.Client
	model  string
}

func NewHaikuClient(apiKey string) *HaikuClient {
	return &HaikuClient{
		client: anthropic.NewClient(apiKey),
		model:  anthropic.ModelClaude_3_5_Haiku_20241022,
	}
}

func (h *HaikuClient) Summarize(ctx context.Context, prompt string, maxTokens int) (string, error)
```
#### Error Handling
- **Rate limits:** Exponential backoff with jitter
- **API errors:** Log and skip issue (don't fail entire batch)
- **Network failures:** Retry up to 3 times
- **Invalid responses:** Fall back to truncation with warning
- **Context length:** Truncate input if needed (rare, but possible)
---
### Dependency Checking
Reuse existing recursive CTE logic from `ready_issues` view, adapted for compaction:
```sql
-- Check if issue and N levels of dependents are all closed
WITH RECURSIVE dependent_tree AS (
  -- Base case: the candidate issue
  SELECT id, status, 0 AS depth
  FROM issues
  WHERE id = ?

  UNION ALL

  -- Recursive case: issues that depend on this one
  SELECT i.id, i.status, dt.depth + 1
  FROM dependent_tree dt
  JOIN dependencies d ON d.depends_on_id = dt.id
  JOIN issues i ON i.id = d.issue_id
  WHERE d.type IN ('blocks', 'parent-child')  -- Only blocking deps matter
    AND dt.depth < ?                          -- Max depth parameter
)
SELECT CASE
  WHEN COUNT(*) = SUM(CASE WHEN status = 'closed' THEN 1 ELSE 0 END)
  THEN 1 ELSE 0
END AS all_closed
FROM dependent_tree;
```
**Performance:** The query is roughly O(N), where N is the number of dependents reachable within the depth limit. With the dependency indexes in place, it should run in under 10ms per issue.
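The traversal semantics of the CTE can be illustrated with an in-memory equivalent. This is a pure-Go sketch of the same closure check, not the production query path; types and names are illustrative:

```go
package main

import "fmt"

type issue struct {
	status     string
	dependents []string // issues depending on this one via blocks/parent-child
}

// allDependentsClosed mirrors the recursive CTE: the issue is eligible
// only if it and every dependent reachable within maxDepth levels are
// closed. The visited set guards against circular dependencies.
func allDependentsClosed(issues map[string]issue, id string, maxDepth int) bool {
	visited := map[string]bool{}
	frontier := []string{id}
	for depth := 0; depth <= maxDepth && len(frontier) > 0; depth++ {
		var next []string
		for _, cur := range frontier {
			if visited[cur] {
				continue
			}
			visited[cur] = true
			iss, ok := issues[cur]
			if !ok || iss.status != "closed" {
				return false // unknown or open dependent blocks compaction
			}
			next = append(next, iss.dependents...)
		}
		frontier = next
	}
	return true
}

func main() {
	issues := map[string]issue{
		"bd-42": {status: "closed", dependents: []string{"bd-43"}},
		"bd-43": {status: "closed"},
	}
	fmt.Println(allDependentsClosed(issues, "bd-42", 2)) // true
}
```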
---
### Git Integration (Optional)
For "project time" measurement via commit counting:
```go
import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
	"time"
)

func getCommitsSince(closedAt time.Time) (int, error) {
	cmd := exec.Command("git", "rev-list", "--count",
		fmt.Sprintf("--since=%s", closedAt.Format(time.RFC3339)),
		"HEAD")
	output, err := cmd.Output()
	if err != nil {
		return 0, err
	}
	return strconv.Atoi(strings.TrimSpace(string(output)))
}
```
**Fallback:** If git is unavailable or the project is not in a repository, use the issue counter delta instead:
```sql
SELECT last_id FROM issue_counters WHERE prefix = ?
-- Store at close time, compare at compaction time
```
---
### CLI Commands
#### `bd compact` - Main Command
```bash
bd compact [flags]
Flags:
  --dry-run         Show what would be compacted without doing it
  --tier int        Compaction tier (1 or 2, default: 1)
  --all             Process all eligible issues (default: preview only)
  --id string       Compact specific issue by ID
  --force           Bypass eligibility checks (with --id)
  --batch-size int  Issues per batch (default: from config)
  --workers int     Parallel workers (default: from config)
  --json            JSON output for agents

Examples:
  bd compact --dry-run            # Preview Tier 1 candidates
  bd compact --dry-run --tier 2   # Preview Tier 2 candidates
  bd compact --all                # Compact all Tier 1 candidates
  bd compact --tier 2 --all       # Compact all Tier 2 candidates
  bd compact --id bd-42           # Compact specific issue
  bd compact --id bd-42 --force   # Force compact even if recent
```
#### `bd compact --restore` - Restore Compacted Issues
```bash
bd compact --restore <issue-id> [flags]
Flags:
  --level int  Restore to specific snapshot level (default: latest)
  --json       JSON output

Examples:
  bd compact --restore bd-42            # Restore bd-42 from latest snapshot
  bd compact --restore bd-42 --level 1  # Restore from Tier 1 snapshot
```
#### `bd compact --stats` - Compaction Statistics
```bash
bd compact --stats [flags]
Flags:
  --json  JSON output

Example output:

  === Compaction Statistics ===

  Total Issues:        1,247
  Compacted (Tier 1):    342 (27.4%)
  Compacted (Tier 2):     89 (7.1%)

  Database Size:               2.3 MB
  Estimated Uncompacted Size:  8.7 MB
  Space Savings:               6.4 MB (73.6%)

  Candidates:
    Tier 1: 47 issues (est. 320 KB → 64 KB)
    Tier 2: 15 issues (est. 180 KB → 18 KB)

  Estimated Compaction Cost: $0.04 (Haiku)
```
---
### Output Examples
#### Dry Run Output
```
$ bd compact --dry-run
=== Tier 1 Compaction Preview ===
Eligibility: Closed ≥30 days, 2 levels of dependents closed

bd-42: Fix authentication bug (P1, bug)
  Closed: 45 days ago
  Size: 2,341 bytes → ~468 bytes (80% reduction)
  Dependents: bd-43, bd-44 (both closed)

bd-57: Add login form (P2, feature)
  Closed: 38 days ago
  Size: 1,823 bytes → ~365 bytes (80% reduction)
  Dependents: (none)

bd-58: Refactor auth middleware (P2, task)
  Closed: 35 days ago
  Size: 3,102 bytes → ~620 bytes (80% reduction)
  Dependents: bd-59 (closed)

... (44 more issues)

Total: 47 issues
Estimated reduction: 87.3 KB → 17.5 KB (80%)
Estimated cost: $0.03 (Haiku API)

Run with --all to compact these issues.
```
#### Compaction Progress
```
$ bd compact --all
Compacting 47 issues (Tier 1)...

Creating snapshots...  [████████████████████] 47/47
Calling Haiku API...   [████████████████████] 47/47 (12s, $0.027)
Updating issues...     [████████████████████] 47/47

✓ Successfully compacted 47 issues
  Size reduction: 87.3 KB → 18.2 KB (79.1%)
  API cost: $0.027
  Time: 14.3s

Compacted issues will be exported to .beads/issues.jsonl
```
#### Show Compacted Issue
```
$ bd show bd-42
bd-42: Fix authentication bug [CLOSED] 🗜️
Status: closed (compacted L1)
Priority: 1 (High)
Type: bug
Closed: 2025-08-31 (45 days ago)
Compacted: 2025-10-15 (saved 1,873 bytes)
**Summary:** Fixed race condition in JWT token refresh logic causing intermittent
401 errors under high load. Implemented mutex-based locking around token refresh
operations. All users can now stay authenticated reliably during concurrent requests.
**Key Decisions:**
- Used sync.RWMutex instead of channels for simpler reasoning about lock state
- Added exponential backoff to token refresh to prevent thundering herd
- Preserved existing token format for backward compatibility with mobile clients
**Resolution:** Deployed to production on Aug 31, monitored for 2 weeks with zero
401 errors. Closed after confirming fix with load testing.
---
💾 Restore: bd compact --restore bd-42
📊 Original size: 2,341 bytes | Compressed: 468 bytes (80% reduction)
```
---
### Safety Mechanisms
1. **Snapshot-First:** Always create snapshot before modifying issue
2. **Restore Capability:** Full restore from snapshots with `--restore`
3. **Opt-In Auto-Compaction:** Disabled by default (`auto_compact_enabled = false`)
4. **Dry-Run Required:** Preview before committing with `--dry-run`
5. **Git Backup:** JSONL exports preserve full history in git commits
6. **Audit Trail:** `EventCompacted` records what was done and when
7. **Size Verification:** Track original_size and compressed_size for validation
8. **Idempotent:** Re-running compaction on already-compacted issues is safe (no-op)
9. **Graceful Degradation:** API failures don't corrupt data, just skip issues
10. **Reversible:** Restore is always available, even after git push
---
### Testing Strategy
#### Unit Tests
1. **Candidate Identification:**
- Issues meeting time criteria
- Dependency depth checking
- Mixed status dependents (some closed, some open)
- Edge case: circular dependencies
2. **Haiku Client:**
- Mock API responses
- Rate limit handling
- Error recovery
- Prompt rendering
3. **Snapshot Management:**
- Create snapshot
- Restore from snapshot
- Multiple snapshots per issue (Tier 1 → Tier 2)
- Snapshot integrity (hash verification)
4. **Size Calculation:**
- Accurate byte counting
- UTF-8 handling
- Empty fields
#### Integration Tests
1. **End-to-End Compaction:**
- Create test issues
- Age them (mock timestamps)
- Run compaction
- Verify summaries
- Restore and verify
2. **Batch Processing:**
- Large batches (100+ issues)
- Parallel worker coordination
- Error handling mid-batch
3. **JSONL Export:**
- Compacted issues export correctly
- Import preserves compaction_level
- Round-trip fidelity
4. **CLI Commands:**
- All flag combinations
- JSON output parsing
- Error messages
#### Manual Testing Checklist
- [ ] Dry-run shows accurate candidates
- [ ] Compaction reduces size as expected
- [ ] Haiku summaries are high quality
- [ ] Restore returns exact original content
- [ ] Stats command shows correct numbers
- [ ] Auto-compaction respects config
- [ ] Git workflow (commit → pull → auto-compact)
- [ ] Multi-machine workflow with compaction
- [ ] API key handling (env var vs config)
- [ ] Rate limit handling under load
---
### Performance Considerations
#### Scalability
**Small databases (<1,000 issues):**
- Full scan acceptable
- Compact all eligible in one run
- <1 minute total time
**Medium databases (1,000-10,000 issues):**
- Batch processing required
- Progress reporting essential
- 5-10 minutes total time
**Large databases (>10,000 issues):**
- Incremental compaction (process N per run)
- Consider scheduled background job
- 30-60 minutes total time
#### Optimization Strategies
1. **Index Usage:**
- `idx_issues_status` - filter closed issues
- `idx_dependencies_depends_on` - dependency traversal
- `idx_snapshots_issue` - restore lookups
2. **Batch Sizing:**
- Default 50 issues per batch
- Configurable via `compact_batch_size`
- Trade-off: larger batches = fewer commits, more RAM
3. **Parallel Workers:**
- Default 5 parallel Haiku calls
- Configurable via `compact_parallel_workers`
- Respects Haiku rate limits
4. **Query Optimization:**
- Use prepared statements for snapshots
- Reuse dependency check query
- Avoid N+1 queries in batch operations
---
### Cost Analysis
#### Haiku Pricing (as of 2025-10-15)
- Input: $0.25 per million tokens (~$0.0003 per 1K tokens)
- Output: $1.25 per million tokens (~$0.0013 per 1K tokens)
#### Per-Issue Estimates
**Tier 1:**
- Input: ~1,000 tokens (full issue content)
- Output: ~400 tokens (summary)
- Cost: ~$0.0008 per issue
**Tier 2:**
- Input: ~500 tokens (Tier 1 summary)
- Output: ~200 tokens (ultra-compressed)
- Cost: ~$0.0003 per issue
#### Batch Costs
| Issues | Tier 1 Cost | Tier 2 Cost | Total |
|--------|-------------|-------------|-------|
| 100 | $0.08 | $0.03 | $0.11 |
| 500 | $0.40 | $0.15 | $0.55 |
| 1,000 | $0.80 | $0.30 | $1.10 |
| 5,000 | $4.00 | $1.50 | $5.50 |
**Monthly budget (typical project):**
- ~50-100 new issues closed per month
- ~30-60 days later, eligible for Tier 1
- Monthly cost: $0.04 - $0.08 (negligible)
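The per-issue arithmetic above can be captured in a small helper. Note that exact arithmetic gives slightly lower totals than the rounded per-issue figures used in the batch table (e.g. $0.75 rather than $0.80 for 1,000 Tier 1 issues); the token averages remain rough estimates, not measured values:

```go
package main

import "fmt"

// Haiku prices in dollars per million tokens (from the table above).
const (
	inputPerM  = 0.25
	outputPerM = 1.25
)

// estimateCost returns the dollar cost for n issues given average
// input and output token counts per issue.
func estimateCost(n, inTokens, outTokens int) float64 {
	perIssue := float64(inTokens)*inputPerM/1e6 + float64(outTokens)*outputPerM/1e6
	return float64(n) * perIssue
}

func main() {
	fmt.Printf("Tier 1, 1000 issues: $%.2f\n", estimateCost(1000, 1000, 400))
	fmt.Printf("Tier 2, 1000 issues: $%.2f\n", estimateCost(1000, 500, 200))
}
```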
---
### Configuration Examples
#### Conservative Setup (manual only)
```bash
bd init
bd config set compact_tier1_days 60
bd config set compact_tier2_days 180
bd config set auto_compact_enabled false
# Run manually when needed
bd compact --dry-run
bd compact --all # after review
```
#### Aggressive Setup (auto-compact)
```bash
bd config set compact_tier1_days 14
bd config set compact_tier2_days 45
bd config set auto_compact_enabled true
bd config set compact_batch_size 100
# Auto-compacts on bd stats, bd export
bd stats # triggers compaction if candidates exist
```
#### Development Setup (fast feedback)
```bash
bd config set compact_tier1_days 1
bd config set compact_tier2_days 3
bd config set compact_tier1_dep_levels 1
bd config set compact_tier2_dep_levels 2
# Test compaction on recently closed issues
```
---
### Future Enhancements
#### Phase 2 (Post-MVP)
1. **Local Model Support:**
- Use Ollama for zero-cost summarization
- Fallback chain: Haiku → Ollama → truncation
2. **Custom Prompts:**
- User-defined summarization prompts
- Per-project templates
- Domain-specific summaries (e.g., "focus on API changes")
3. **Selective Preservation:**
- Mark issues as "do not compact"
- Preserve certain labels (e.g., `architecture`, `security`)
- Field-level preservation (e.g., keep design notes, compress others)
4. **Analytics:**
- Compaction effectiveness over time
- Cost tracking per run
- Quality feedback (user ratings of summaries)
5. **Smart Scheduling:**
- Auto-detect optimal compaction times
- Avoid compaction during active development
- Weekend/off-hours processing
6. **Multi-Tier Expansion:**
- Tier 3: Archive to separate file
- Tier 4: Delete (with backup)
- Configurable tier chain
#### Phase 3 (Advanced)
1. **Distributed Compaction:**
- Coordinate across multiple machines
- Avoid duplicate work in team settings
- Lock mechanism for compaction jobs
2. **Incremental Summarization:**
- Re-summarize if issue reopened
- Preserve history of summaries
- Version tracking for prompts
3. **Search Integration:**
- Full-text search includes summaries
- Boost compacted issues in search results
- Semantic search using embeddings
---
## Implementation Issues
The following issues should be created in beads to track this work. They're designed to be implemented in dependency order.
---
### Epic: Issue Database Compaction
**Issue ID:** (auto-generated)
**Title:** Epic: Add intelligent database compaction with Claude Haiku
**Type:** epic
**Priority:** 2
**Description:**
Implement multi-tier database compaction using Claude Haiku to semantically compress old, closed issues. This keeps the database lightweight and agent-friendly while preserving essential context.
**Goals:**
- 70-95% space reduction for eligible issues
- Full restore capability via snapshots
- Opt-in with dry-run safety
- ~$1 per 1,000 issues compacted
**Acceptance Criteria:**
- [ ] Schema migration with snapshots table
- [ ] Haiku integration for summarization
- [ ] Two-tier compaction (30d, 90d)
- [ ] CLI with dry-run, restore, stats
- [ ] Full test coverage
- [ ] Documentation complete
**Dependencies:** None (this is the epic)
---
### Issue 1: Add compaction schema and migrations
**Type:** task
**Priority:** 1
**Dependencies:** Blocks all other compaction work
**Description:**
Add database schema support for issue compaction tracking and snapshot storage.
**Design:**
Add three columns to `issues` table:
- `compaction_level INTEGER DEFAULT 0` - 0=original, 1=tier1, 2=tier2
- `compacted_at DATETIME` - when last compacted
- `original_size INTEGER` - bytes before first compaction
Create `issue_snapshots` table:
```sql
CREATE TABLE issue_snapshots (
  id               INTEGER PRIMARY KEY AUTOINCREMENT,
  issue_id         TEXT NOT NULL,
  snapshot_time    DATETIME NOT NULL,
  compaction_level INTEGER NOT NULL,
  original_size    INTEGER NOT NULL,
  compressed_size  INTEGER NOT NULL,
  original_content TEXT NOT NULL, -- JSON blob
  archived_events  TEXT,
  FOREIGN KEY (issue_id) REFERENCES issues(id) ON DELETE CASCADE
);
```
Add indexes:
- `idx_snapshots_issue` on `issue_id`
- `idx_snapshots_level` on `compaction_level`
Add migration functions in `internal/storage/sqlite/sqlite.go`:
- `migrateCompactionColumns(db *sql.DB) error`
- `migrateSnapshotsTable(db *sql.DB) error`
**Acceptance Criteria:**
- [ ] Existing databases migrate automatically
- [ ] New databases include columns by default
- [ ] Migration is idempotent (safe to run multiple times)
- [ ] No data loss during migration
- [ ] Tests verify migration on fresh and existing DBs
**Estimated Time:** 4 hours
---
### Issue 2: Add compaction configuration keys
**Type:** task
**Priority:** 1
**Dependencies:** Blocks compaction logic
**Description:**
Add configuration keys for compaction behavior with sensible defaults.
**Implementation:**
Add to `internal/storage/sqlite/schema.go` initial config:
```sql
INSERT OR IGNORE INTO config (key, value) VALUES
  ('compact_tier1_days', '30'),
  ('compact_tier1_dep_levels', '2'),
  ('compact_tier2_days', '90'),
  ('compact_tier2_dep_levels', '5'),
  ('compact_tier2_commits', '100'),
  ('compact_model', 'claude-3-5-haiku-20241022'),
  ('compact_batch_size', '50'),
  ('compact_parallel_workers', '5'),
  ('auto_compact_enabled', 'false');
```
Add helper functions in `internal/storage/` or `cmd/bd/`:
```go
func getCompactConfig(ctx context.Context, store Storage) (*CompactConfig, error)
```
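A sketch of what the parsing inside `getCompactConfig` might look like, operating on raw key/value strings as stored in the config table. The `CompactConfig` field set here is an assumption based on the keys above:

```go
package main

import (
	"fmt"
	"strconv"
)

// CompactConfig is a typed view of the compaction config keys.
type CompactConfig struct {
	Tier1Days       int
	Tier2Days       int
	BatchSize       int
	ParallelWorkers int
	AutoEnabled     bool
}

// parseCompactConfig converts raw config strings into typed values,
// keeping the documented defaults for any missing keys and rejecting
// values of the wrong type (days=int, enabled=bool).
func parseCompactConfig(raw map[string]string) (*CompactConfig, error) {
	cfg := &CompactConfig{Tier1Days: 30, Tier2Days: 90, BatchSize: 50, ParallelWorkers: 5}
	intKeys := map[string]*int{
		"compact_tier1_days":       &cfg.Tier1Days,
		"compact_tier2_days":       &cfg.Tier2Days,
		"compact_batch_size":       &cfg.BatchSize,
		"compact_parallel_workers": &cfg.ParallelWorkers,
	}
	for key, dst := range intKeys {
		if v, ok := raw[key]; ok {
			n, err := strconv.Atoi(v)
			if err != nil {
				return nil, fmt.Errorf("config %s: %w", key, err)
			}
			*dst = n
		}
	}
	if v, ok := raw["auto_compact_enabled"]; ok {
		b, err := strconv.ParseBool(v)
		if err != nil {
			return nil, fmt.Errorf("config auto_compact_enabled: %w", err)
		}
		cfg.AutoEnabled = b
	}
	return cfg, nil
}

func main() {
	cfg, _ := parseCompactConfig(map[string]string{"compact_tier1_days": "60"})
	fmt.Println(cfg.Tier1Days, cfg.AutoEnabled) // overridden value, default bool
}
```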
**Acceptance Criteria:**
- [ ] Config keys created on init
- [ ] Existing DBs get defaults on migration
- [ ] `bd config get/set` works with all keys
- [ ] Type validation (days=int, enabled=bool)
- [ ] Documentation in README.md
**Estimated Time:** 2 hours
---
### Issue 3: Implement candidate identification queries
**Type:** task
**Priority:** 1
**Dependencies:** Needs schema and config
**Description:**
Write SQL queries to identify issues eligible for Tier 1 and Tier 2 compaction.
**Design:**
Create `internal/storage/sqlite/compact.go` with:
```go
type CompactionCandidate struct {
	IssueID        string
	ClosedAt       time.Time
	OriginalSize   int
	EstimatedSize  int
	DependentCount int
}

func (s *SQLiteStorage) GetTier1Candidates(ctx context.Context) ([]*CompactionCandidate, error)
func (s *SQLiteStorage) GetTier2Candidates(ctx context.Context) ([]*CompactionCandidate, error)
func (s *SQLiteStorage) CheckEligibility(ctx context.Context, issueID string, tier int) (bool, string, error)
```
Implement recursive dependency checking using CTE similar to `ready_issues` view.
**Acceptance Criteria:**
- [ ] Tier 1 query filters by days and dependency depth
- [ ] Tier 2 query includes commit/issue count checks
- [ ] Dependency checking handles circular deps gracefully
- [ ] Performance: <100ms for 10,000 issue database
- [ ] Tests cover edge cases (no deps, circular deps, mixed status)
**Estimated Time:** 6 hours
---
### Issue 4: Create Haiku client and prompt templates
**Type:** task
**Priority:** 1
**Dependencies:** None (can work in parallel)
**Description:**
Implement Claude Haiku API client with template-based prompts for Tier 1 and Tier 2 summarization.
**Implementation:**
Create `internal/compact/haiku.go`:
```go
type HaikuClient struct {
	client *anthropic.Client
	model  string
}

func NewHaikuClient(apiKey string) (*HaikuClient, error)
func (h *HaikuClient) SummarizeTier1(ctx context.Context, issue *types.Issue) (string, error)
func (h *HaikuClient) SummarizeTier2(ctx context.Context, issue *types.Issue) (string, error)
```
Use text/template for prompt rendering.
Add error handling:
- Rate limit retry with exponential backoff
- Network errors: 3 retries
- Invalid responses: return error, don't corrupt data
**Acceptance Criteria:**
- [ ] API key from env var or config (env takes precedence)
- [ ] Prompts render correctly with template
- [ ] Rate limiting handled gracefully
- [ ] Mock tests for API calls
- [ ] Real integration test (optional, requires API key)
**Estimated Time:** 6 hours
---
### Issue 5: Implement snapshot creation and restoration
**Type:** task
**Priority:** 1
**Dependencies:** Needs schema changes
**Description:**
Implement snapshot creation before compaction and restoration capability.
**Implementation:**
Add to `internal/storage/sqlite/compact.go`:
```go
type Snapshot struct {
	ID              int64
	IssueID         string
	SnapshotTime    time.Time
	CompactionLevel int
	OriginalSize    int
	CompressedSize  int
	OriginalContent string // JSON
	ArchivedEvents  string // JSON, nullable
}

func (s *SQLiteStorage) CreateSnapshot(ctx context.Context, issue *types.Issue, level int) error
func (s *SQLiteStorage) RestoreFromSnapshot(ctx context.Context, issueID string, level int) error
func (s *SQLiteStorage) GetSnapshots(ctx context.Context, issueID string) ([]*Snapshot, error)
```
Snapshot JSON structure:
```json
{
  "description": "...",
  "design": "...",
  "notes": "...",
  "acceptance_criteria": "...",
  "title": "..."
}
```
**Acceptance Criteria:**
- [ ] Snapshot created atomically with compaction
- [ ] Restore returns exact original content
- [ ] Multiple snapshots per issue supported (Tier 1 → Tier 2)
- [ ] JSON encoding handles special characters
- [ ] Size calculation is accurate (UTF-8 bytes)
**Estimated Time:** 5 hours
---
### Issue 6: Implement Tier 1 compaction logic
**Type:** task
**Priority:** 1
**Dependencies:** Needs Haiku client, snapshots, candidate queries
**Description:**
Implement the core Tier 1 compaction process: snapshot → summarize → update.
**Implementation:**
Add to `internal/compact/compactor.go`:
```go
type Compactor struct {
	store  storage.Storage
	haiku  *HaikuClient
	config *CompactConfig
}

func New(store storage.Storage, apiKey string, config *CompactConfig) (*Compactor, error)
func (c *Compactor) CompactTier1(ctx context.Context, issueID string) error
func (c *Compactor) CompactTier1Batch(ctx context.Context, issueIDs []string) error
```
Process:
1. Verify eligibility
2. Calculate original size
3. Create snapshot
4. Call Haiku for summary
5. Update issue (description = summary, clear design/notes/criteria)
6. Set compaction_level = 1, compacted_at = now
7. Record EventCompacted
8. Mark dirty for export
**Acceptance Criteria:**
- [ ] Single issue compaction works end-to-end
- [ ] Batch processing with parallel workers
- [ ] Errors don't corrupt database (transaction rollback)
- [ ] EventCompacted includes size savings
- [ ] Dry-run mode (identify + size estimate only)
**Estimated Time:** 8 hours
---
### Issue 7: Implement Tier 2 compaction logic
**Type:** task
**Priority:** 2
**Dependencies:** Needs Tier 1 working
**Description:**
Implement Tier 2 ultra-compression: more aggressive summarization and optional event pruning.
**Implementation:**
Add to `internal/compact/compactor.go`:
```go
func (c *Compactor) CompactTier2(ctx context.Context, issueID string) error
func (c *Compactor) CompactTier2Batch(ctx context.Context, issueIDs []string) error
```
Process:
1. Verify issue is at compaction_level = 1
2. Check Tier 2 eligibility (days, deps, commits/issues)
3. Create Tier 2 snapshot
4. Call Haiku with ultra-compression prompt
5. Update issue (description = single paragraph, clear all else)
6. Set compaction_level = 2
7. Optionally prune events (keep created/closed, archive rest)
**Acceptance Criteria:**
- [ ] Requires existing Tier 1 compaction
- [ ] Git commit counting works (with fallback)
- [ ] Events optionally pruned (config: compact_events_enabled)
- [ ] Archived events stored in snapshot
- [ ] Size reduction 90-95%
**Estimated Time:** 6 hours
---
### Issue 8: Add `bd compact` CLI command
**Type:** task
**Priority:** 1
**Dependencies:** Needs Tier 1 compaction logic
**Description:**
Implement the `bd compact` command with dry-run, batch processing, and progress reporting.
**Implementation:**
Create `cmd/bd/compact.go`:
```go
var compactCmd = &cobra.Command{
	Use:   "compact",
	Short: "Compact old closed issues to save space",
	Long:  `...`,
}

var (
	compactDryRun    bool
	compactTier      int
	compactAll       bool
	compactID        string
	compactForce     bool
	compactBatchSize int
	compactWorkers   int
)

func init() {
	compactCmd.Flags().BoolVar(&compactDryRun, "dry-run", false, "Preview without compacting")
	compactCmd.Flags().IntVar(&compactTier, "tier", 1, "Compaction tier (1 or 2)")
	compactCmd.Flags().BoolVar(&compactAll, "all", false, "Process all candidates")
	compactCmd.Flags().StringVar(&compactID, "id", "", "Compact specific issue")
	compactCmd.Flags().BoolVar(&compactForce, "force", false, "Force compact (bypass checks)")
	// ... more flags
}
```
**Output:**
- Dry-run: Table of candidates with size estimates
- Actual run: Progress bar with batch updates
- Summary: Count, size saved, cost, time
**Acceptance Criteria:**
- [ ] `--dry-run` shows accurate preview
- [ ] `--all` processes all candidates
- [ ] `--id` compacts single issue
- [ ] `--force` bypasses eligibility checks (with --id)
- [ ] Progress bar for batches
- [ ] JSON output with `--json`
- [ ] Exit code: 0=success, 1=error
**Estimated Time:** 6 hours
---
### Issue 9: Add `bd compact --restore` functionality
**Type:** task
**Priority:** 2
**Dependencies:** Needs snapshots and CLI
**Description:**
Implement restore command to undo compaction from snapshots.
**Implementation:**
Add to `cmd/bd/compact.go`:
```go
var compactRestore string
compactCmd.Flags().StringVar(&compactRestore, "restore", "", "Restore issue from snapshot")
```
Process:
1. Load snapshot for issue
2. Parse JSON content
3. Update issue with original content
4. Set compaction_level = 0, compacted_at = NULL
5. Record EventRestored
6. Mark dirty
**Acceptance Criteria:**
- [ ] Restores exact original content
- [ ] Handles multiple snapshots (prompt user or use latest)
- [ ] `--level` flag to choose snapshot
- [ ] Updates compaction_level correctly
- [ ] Exports restored content to JSONL
**Estimated Time:** 4 hours
---
### Issue 10: Add `bd compact --stats` command
**Type:** task
**Priority:** 2
**Dependencies:** Needs compaction working
**Description:**
Add statistics command showing compaction status and potential savings.
**Implementation:**
```go
var compactStats bool
compactCmd.Flags().BoolVar(&compactStats, "stats", false, "Show compaction statistics")
```
Output:
- Total issues, by compaction level
- Current DB size vs estimated uncompacted size
- Space savings (MB and %)
- Candidates for each tier with estimates
- Estimated API cost
**Acceptance Criteria:**
- [ ] Accurate counts by compaction_level
- [ ] Size calculations include all text fields
- [ ] Shows candidates with eligibility reasons
- [ ] Cost estimation based on Haiku pricing
- [ ] JSON output supported
**Estimated Time:** 4 hours
---
### Issue 11: Add EventCompacted to event system
**Type:** task
**Priority:** 2
**Dependencies:** Needs schema changes
**Description:**
Add new event type for tracking compaction in audit trail.
**Implementation:**
1. Add to `internal/types/types.go`:
```go
const EventCompacted EventType = "compacted"
```
2. Record event during compaction:
```go
eventData := map[string]interface{}{
"tier": tier,
"original_size": originalSize,
"compressed_size": compressedSize,
"reduction_pct": (1 - float64(compressedSize)/float64(originalSize)) * 100,
}
```
3. Show in `bd show` output:
```
Events:
2025-10-15: compacted (tier 1, saved 1.8KB, 80%)
2025-08-31: closed by alice
2025-08-20: created by alice
```
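The event line shown above can be rendered directly from the recorded event data; a minimal sketch (hypothetical helper name, sizes in bytes):

```go
package main

import "fmt"

// formatCompactedEvent renders the event-history line shown by bd show
// from the tier and size fields recorded in the compaction event.
func formatCompactedEvent(tier, originalSize, compressedSize int) string {
	saved := float64(originalSize-compressedSize) / 1024
	pct := (1 - float64(compressedSize)/float64(originalSize)) * 100
	return fmt.Sprintf("compacted (tier %d, saved %.1fKB, %.0f%%)", tier, saved, pct)
}
```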
**Acceptance Criteria:**
- [ ] Event includes tier and size info
- [ ] Shows in event history
- [ ] Exports to JSONL
- [ ] `bd show` displays compaction marker
**Estimated Time:** 3 hours
---
### Issue 12: Add compaction indicator to `bd show`
**Type:** task
**Priority:** 2
**Dependencies:** Needs compaction working
**Description:**
Update `bd show` command to display compaction status prominently.
**Implementation:**
Add to issue display:
```
bd-42: Fix authentication bug [CLOSED] 🗜️
Status: closed (compacted L1)
...
---
💾 Restore: bd compact --restore bd-42
📊 Original: 2,341 bytes | Compressed: 468 bytes (80% reduction)
🗜️ Compacted: 2025-10-15 (Tier 1)
```
Show different emoji for tiers:
- Tier 1: 🗜️
- Tier 2: 📦
**Acceptance Criteria:**
- [ ] Compaction status visible in title
- [ ] Footer shows size savings
- [ ] Restore command shown
- [ ] Works with `--json` output
**Estimated Time:** 2 hours
---
### Issue 13: Write compaction tests
**Type:** task
**Priority:** 1
**Dependencies:** Needs all compaction logic
**Description:**
Comprehensive test suite for compaction functionality.
**Test Coverage:**
1. **Candidate Identification:**
- Eligibility by time
- Dependency depth checking
- Mixed status dependents
- Edge cases (no deps, circular)
2. **Snapshots:**
- Create and restore
- Multiple snapshots per issue
- Content integrity
3. **Tier 1 Compaction:**
- Single issue
- Batch processing
- Error handling
4. **Tier 2 Compaction:**
- Requires Tier 1
- Events pruning
- Commit counting fallback
5. **CLI:**
- All flag combinations
- Dry-run accuracy
- JSON output
6. **Integration:**
- End-to-end flow
- JSONL export/import
- Restore verification
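For the "Mock Haiku API" criterion, one approach is to give the summarizer an injectable base URL and point it at an `httptest` server in tests. A minimal sketch with hypothetical names (`newMockHaiku`, `summarize`) and a simplified request/response shape, not the real Anthropic API schema:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newMockHaiku returns a test server that answers every request with a
// canned summary, standing in for the real API during tests.
func newMockHaiku(summary string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		json.NewEncoder(w).Encode(map[string]string{"summary": summary})
	}))
}

// summarize posts issue text to baseURL and returns the summary field.
// Injecting baseURL is what makes the compactor testable offline.
func summarize(baseURL, text string) (string, error) {
	payload, err := json.Marshal(map[string]string{"text": text})
	if err != nil {
		return "", err
	}
	resp, err := http.Post(baseURL, "application/json", bytes.NewReader(payload))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("summarize: unexpected status %d", resp.StatusCode)
	}
	var out struct {
		Summary string `json:"summary"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.Summary, nil
}
```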
**Acceptance Criteria:**
- [ ] Test coverage >80%
- [ ] All edge cases covered
- [ ] Mock Haiku API in tests
- [ ] Integration tests pass
- [ ] `go test ./...` passes
**Estimated Time:** 8 hours
---
### Issue 14: Add compaction documentation
**Type:** task
**Priority:** 2
**Dependencies:** All features complete
**Description:**
Document compaction feature in README and create COMPACTION.md guide.
**Content:**
Update README.md:
- Add to Features section
- CLI examples
- Configuration guide
Create COMPACTION.md:
- How compaction works
- When to use each tier
- Cost analysis
- Safety mechanisms
- Troubleshooting
Create examples/compaction/:
- Example workflow
- Cron job setup
- Auto-compaction script
**Acceptance Criteria:**
- [ ] README.md updated
- [ ] COMPACTION.md comprehensive
- [ ] Examples work as documented
- [ ] Screenshots/examples included
- [ ] API key setup documented
**Estimated Time:** 4 hours
---
### Issue 15: Optional: Implement auto-compaction
**Type:** task
**Priority:** 3 (nice-to-have)
**Dependencies:** Needs all compaction working
**Description:**
Implement automatic compaction triggered by certain operations when enabled via config.
**Implementation:**
Trigger points (when `auto_compact_enabled = true`):
1. `bd stats` - check and compact if candidates exist
2. `bd export` - before exporting
3. Background timer (optional, via daemon)
Add:
```go
func (s *SQLiteStorage) AutoCompact(ctx context.Context) error {
enabled, _ := s.GetConfig(ctx, "auto_compact_enabled")
if enabled != "true" {
return nil
}
// Run Tier 1 compaction on all candidates
// Limit to batch_size to avoid long operations
}
```
**Acceptance Criteria:**
- [ ] Respects auto_compact_enabled config
- [ ] Limits batch size to avoid blocking
- [ ] Logs compaction activity
- [ ] Can be disabled per-command with `--no-auto-compact`
**Estimated Time:** 4 hours
---
### Issue 16: Optional: Add git commit counting
**Type:** task
**Priority:** 3 (nice-to-have)
**Dependencies:** Needs Tier 2 logic
**Description:**
Implement git commit counting for "project time" measurement as an alternative to calendar time.
**Implementation:**
```go
import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
	"time"
)

// getCommitsSince counts commits on HEAD since the issue closed,
// as a proxy for "project time".
func getCommitsSince(closedAt time.Time) (int, error) {
	cmd := exec.Command("git", "rev-list", "--count",
		fmt.Sprintf("--since=%s", closedAt.Format(time.RFC3339)), "HEAD")
	output, err := cmd.Output()
	if err != nil {
		return 0, err // not in a git repo, or git not installed
	}
	return strconv.Atoi(strings.TrimSpace(string(output)))
}
```
Fall back to the issue counter delta if git is unavailable.
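The counter-delta fallback might look like the sketch below, assuming IDs of the form `bd-42` where the numeric suffix is the issue counter:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// counterOf extracts the numeric part of an issue ID like "bd-42"
// (assumed ID format).
func counterOf(id string) (int, error) {
	i := strings.LastIndexByte(id, '-')
	if i < 0 {
		return 0, fmt.Errorf("malformed issue id %q", id)
	}
	return strconv.Atoi(id[i+1:])
}

// issueCounterDelta approximates "commits since" when git is unavailable:
// the number of issues minted between the closed issue and the newest one.
func issueCounterDelta(closedID, latestID string) (int, error) {
	closed, err := counterOf(closedID)
	if err != nil {
		return 0, err
	}
	latest, err := counterOf(latestID)
	if err != nil {
		return 0, err
	}
	return latest - closed, nil
}
```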
**Acceptance Criteria:**
- [ ] Counts commits since closed_at
- [ ] Handles git not available gracefully
- [ ] Fallback to issue counter works
- [ ] Configurable via compact_tier2_commits
- [ ] Tested with real git repo
**Estimated Time:** 3 hours
---
## Success Metrics
### Technical Metrics
- [ ] 70-85% size reduction for Tier 1
- [ ] 90-95% size reduction for Tier 2
- [ ] <100ms candidate identification query
- [ ] <2s per issue compaction (Haiku latency)
- [ ] Zero data loss (all restore tests pass)
### Quality Metrics
- [ ] Haiku summaries preserve key information
- [ ] Developers can understand compacted issues
- [ ] Restore returns exact original content
- [ ] No corruption in multi-machine workflows
### Operational Metrics
- [ ] Cost: <$1.50 per 1,000 issues
- [ ] Dry-run accuracy: size and cost estimates correct to within ~5%
- [ ] Error rate: <1% API failures (with retry)
- [ ] User adoption: Docs clear, examples work
---
## Rollout Plan
### Phase 1: Alpha (Internal Testing)
1. Merge compaction feature to main
2. Test on beads' own database
3. Verify JSONL export/import
4. Validate Haiku summaries
5. Fix any critical bugs
### Phase 2: Beta (Opt-In)
1. Announce in README (opt-in, experimental)
2. Gather feedback from early adopters
3. Iterate on prompt templates
4. Add telemetry (optional, with consent)
### Phase 3: Stable (Default Disabled)
1. Mark feature as stable
2. Keep auto_compact_enabled = false by default
3. Encourage manual `bd compact --dry-run` first
4. Document in quickstart guide
### Phase 4: Mature (Consider Auto-Enable)
1. After 6+ months of stability
2. Consider auto-compaction for new users
3. Provide migration guide for disabling
---
## Risks and Mitigations
| Risk | Impact | Likelihood | Mitigation |
|------|--------|------------|------------|
| Haiku summaries lose critical info | High | Medium | Manual review in dry-run, restore capability, improve prompts |
| API rate limits during batch | Medium | Medium | Exponential backoff, respect rate limits, batch sizing |
| JSONL merge conflicts increase | Medium | Low | Compaction is deterministic per issue, so git merges it cleanly |
| Users accidentally compress important issues | High | Low | Dry-run required, restore available, snapshots permanent |
| Cost higher than expected | Low | Low | Dry-run shows estimates, configurable batch sizes |
| Schema migration fails | High | Very Low | Idempotent migrations, tested on existing DBs |
---
## Open Questions
1. **Should compaction be reversible forever, or expire snapshots?**
- Recommendation: Keep snapshots indefinitely (disk is cheap)
2. **Should we compress snapshots themselves (gzip)?**
- Recommendation: Not in MVP, add if storage becomes issue
3. **Should tier selection be automatic or manual?**
- Recommendation: Manual in MVP, auto-tier in future
4. **How to handle issues compacted on one machine but not another?**
- Answer: JSONL export includes compaction_level, imports preserve it
5. **Should we support custom models (Sonnet, Opus)?**
- Recommendation: Haiku only in MVP, add later if needed
---
## Appendix: Example Workflow
### Typical Monthly Compaction
```bash
# 1. Check what's eligible
$ bd compact --dry-run
=== Tier 1 Candidates ===
42 issues eligible (closed >30 days, deps closed)
Est. reduction: 127 KB → 25 KB (80%)
Est. cost: $0.03
# 2. Review candidates manually
$ bd list --status closed --json | jq 'map(select(.compaction_level == 0))'
# 3. Compact Tier 1
$ bd compact --all
✓ Compacted 42 issues in 18s ($0.03)
# 4. Check Tier 2 candidates (optional)
$ bd compact --dry-run --tier 2
=== Tier 2 Candidates ===
8 issues eligible (closed >90 days, 100+ commits since)
Est. reduction: 45 KB → 4 KB (91%)
Est. cost: $0.01
# 5. Compact Tier 2
$ bd compact --all --tier 2
✓ Compacted 8 issues in 6s ($0.01)
# 6. Export and commit
$ bd export -o .beads/issues.jsonl
$ git add .beads/issues.jsonl
$ git commit -m "Compact 50 old issues (saved 143 KB)"
$ git push
# 7. View stats
$ bd compact --stats
Total Space Saved: 143 KB (82% reduction)
Database Size: 2.1 MB (down from 2.3 MB)
```
---
## Conclusion
This design provides a comprehensive, safe, and cost-effective way to keep beads databases lightweight while preserving essential context. The two-tier approach balances aggressiveness with safety, and the snapshot system ensures full reversibility.
The use of Claude Haiku for semantic compression is key - it preserves meaning rather than just truncating text. At ~$1 per 1,000 issues, the cost is negligible for the value provided.
Implementation is straightforward with clear phases and well-defined issues. The MVP (Tier 1 only) can be delivered in ~40 hours of work, with Tier 2 and enhancements following incrementally.
This aligns perfectly with beads' philosophy: **pragmatic, agent-focused, and evolutionarily designed**.