Amp-Thread-ID: https://ampcode.com/threads/T-8535178e-f814-43e7-a8a0-4aea93ef3970 Co-authored-by: Amp <amp@ampcode.com>
1655 lines
45 KiB
Markdown
1655 lines
45 KiB
Markdown
# Issue Database Compaction Design
|
|
|
|
**Status:** Design Phase
|
|
**Created:** 2025-10-15
|
|
**Target:** Beads v1.1
|
|
|
|
## Executive Summary
|
|
|
|
Add intelligent database compaction to beads that uses Claude Haiku to semantically compress old, closed issues. This keeps the database lightweight and agent-friendly while preserving essential context about past work. The design philosophy: **most work is throwaway, and forensic value decays exponentially with time**.
|
|
|
|
### Key Metrics
|
|
- **Space savings:** 70-95% reduction in text volume for old issues
|
|
- **Cost:** ~$1.10 per 1,000 issues compacted (Haiku pricing)
|
|
- **Safety:** Full snapshot system with restore capability
|
|
- **Performance:** Batch processing with parallel workers
|
|
|
|
---
|
|
|
|
## Motivation
|
|
|
|
### The Problem
|
|
|
|
Beads databases grow indefinitely:
|
|
- Issues accumulate detailed `description`, `design`, `notes`, `acceptance_criteria` fields
|
|
- Events table logs every change forever
|
|
- Old closed issues (especially those with all dependents closed) rarely need full detail
|
|
- Agent context windows work better with concise, relevant information
|
|
|
|
### Why This Matters
|
|
|
|
1. **Agent Efficiency:** Smaller databases → faster queries → clearer agent thinking
|
|
2. **Context Management:** Agents benefit from summaries of old work, not verbose details
|
|
3. **Git Performance:** Smaller JSONL exports → faster git operations
|
|
4. **Pragmatic Philosophy:** Beads is agent memory, not a historical archive
|
|
5. **Forensic Decay:** Need for detail decreases exponentially after closure
|
|
|
|
### What We Keep
|
|
|
|
- Issue ID and title (always)
|
|
- Semantic summary of what was done and why
|
|
- Key architectural decisions
|
|
- Closure outcome
|
|
- Full git history in JSONL commits (ultimate backup)
|
|
- Restore capability via snapshots
|
|
|
|
---
|
|
|
|
## Technical Design
|
|
|
|
### Architecture Overview
|
|
|
|
```
|
|
┌─────────────────────────────────────────────────────────────────┐
|
|
│ Compaction Pipeline │
|
|
├─────────────────────────────────────────────────────────────────┤
|
|
│ │
|
|
│ 1. Candidate Identification │
|
|
│ ↓ │
|
|
│ • Query closed issues meeting time + dependency criteria │
|
|
│ • Check dependency depth (recursive CTE) │
|
|
│ • Calculate size/savings estimates │
|
|
│ │
|
|
│ 2. Snapshot Creation │
|
|
│ ↓ │
|
|
│ • Store original content in issue_snapshots table │
|
|
│ • Calculate content hash for verification │
|
|
│ • Enable restore capability │
|
|
│ │
|
|
│ 3. Haiku Summarization │
|
|
│ ↓ │
|
|
│ • Batch process with worker pool (5 parallel) │
|
|
│ • Different prompts for Tier 1 vs Tier 2 │
|
|
│ • Handle API errors gracefully │
|
|
│ │
|
|
│ 4. Issue Update │
|
|
│ ↓ │
|
|
│ • Replace verbose fields with summary │
|
|
│ • Set compaction_level and compacted_at │
|
|
│ • Record event (EventCompacted) │
|
|
│ • Mark dirty for JSONL export │
|
|
│ │
|
|
│ 5. Optional: Events Pruning │
|
|
│ ↓ │
|
|
│ • Keep only created/closed events for Tier 2 │
|
|
│ • Archive detailed event history to snapshots │
|
|
│ │
|
|
└─────────────────────────────────────────────────────────────────┘
|
|
```
|
|
|
|
### Compaction Tiers
|
|
|
|
#### **Tier 1: Standard Compaction**
|
|
|
|
**Eligibility:**
|
|
- `status = 'closed'`
|
|
- `closed_at >= 30 days ago` (configurable: `compact_tier1_days`)
|
|
- All issues that depend on this one (via `blocks` or `parent-child`) are closed
|
|
- Dependency check depth: 2 levels (configurable: `compact_tier1_dep_levels`)
|
|
|
|
**Process:**
|
|
1. Snapshot original content
|
|
2. Send to Haiku with 300-word summarization prompt
|
|
3. Store summary in `description`
|
|
4. Clear `design`, `notes`, `acceptance_criteria`
|
|
5. Set `compaction_level = 1`
|
|
6. Keep all events
|
|
|
|
**Output Format (Haiku prompt):**
|
|
```
|
|
**Summary:** [2-3 sentences: problem, solution, outcome]
|
|
**Key Decisions:** [bullet points of non-obvious choices]
|
|
**Resolution:** [how it was closed]
|
|
```
|
|
|
|
**Expected Reduction:** 70-85% of original text size
|
|
|
|
#### **Tier 2: Aggressive Compaction**
|
|
|
|
**Eligibility:**
|
|
- Already at `compaction_level = 1`
|
|
- `closed_at >= 90 days ago` (configurable: `compact_tier2_days`)
|
|
- All dependencies (all 4 types) up to 5 levels deep are closed
|
|
- One of:
|
|
- ≥100 git commits since `closed_at` (configurable: `compact_tier2_commits`)
|
|
- ≥500 new issues created since closure
|
|
- Manual override with `--force`
|
|
|
|
**Process:**
|
|
1. Snapshot Tier 1 content
|
|
2. Send to Haiku with 150-word ultra-compression prompt
|
|
3. Store single paragraph in `description`
|
|
4. Clear all other text fields
|
|
5. Set `compaction_level = 2`
|
|
6. Prune events: keep only `created` and `closed`, move rest to snapshot
|
|
|
|
**Output Format (Haiku prompt):**
|
|
```
|
|
Single paragraph (≤150 words):
|
|
- What was built/fixed
|
|
- Why it mattered
|
|
- Lasting architectural impact (if any)
|
|
```
|
|
|
|
**Expected Reduction:** 90-95% of original text size
|
|
|
|
---
|
|
|
|
### Schema Changes
|
|
|
|
#### New Columns on `issues` Table
|
|
|
|
```sql
|
|
ALTER TABLE issues ADD COLUMN compaction_level INTEGER DEFAULT 0;
|
|
ALTER TABLE issues ADD COLUMN compacted_at DATETIME;
|
|
ALTER TABLE issues ADD COLUMN original_size INTEGER; -- bytes before compaction
|
|
```
|
|
|
|
#### New Table: `issue_snapshots`
|
|
|
|
```sql
|
|
CREATE TABLE IF NOT EXISTS issue_snapshots (
|
|
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
|
issue_id TEXT NOT NULL,
|
|
snapshot_time DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
|
|
compaction_level INTEGER NOT NULL, -- 1 or 2
|
|
original_size INTEGER NOT NULL,
|
|
compressed_size INTEGER NOT NULL,
|
|
-- JSON blob with original content
|
|
original_content TEXT NOT NULL,
|
|
-- Optional: compressed events for Tier 2
|
|
archived_events TEXT,
|
|
FOREIGN KEY (issue_id) REFERENCES issues(id) ON DELETE CASCADE
|
|
);
|
|
|
|
CREATE INDEX IF NOT EXISTS idx_snapshots_issue ON issue_snapshots(issue_id);
|
|
CREATE INDEX IF NOT EXISTS idx_snapshots_level ON issue_snapshots(compaction_level);
|
|
```
|
|
|
|
**Snapshot JSON Structure:**
|
|
```json
|
|
{
|
|
"description": "original description text...",
|
|
"design": "original design text...",
|
|
"notes": "original notes text...",
|
|
"acceptance_criteria": "original criteria text...",
|
|
"title": "original title",
|
|
"hash": "sha256:abc123..."
|
|
}
|
|
```
|
|
|
|
#### New Config Keys
|
|
|
|
```sql
|
|
INSERT INTO config (key, value) VALUES
|
|
-- Tier 1 settings
|
|
('compact_tier1_days', '30'),
|
|
('compact_tier1_dep_levels', '2'),
|
|
|
|
-- Tier 2 settings
|
|
('compact_tier2_days', '90'),
|
|
('compact_tier2_dep_levels', '5'),
|
|
('compact_tier2_commits', '100'),
|
|
('compact_tier2_new_issues', '500'),
|
|
|
|
-- API settings
|
|
('anthropic_api_key', ''), -- Falls back to ANTHROPIC_API_KEY env var
|
|
('compact_model', 'claude-3-5-haiku-20241022'),
|
|
|
|
-- Performance settings
|
|
('compact_batch_size', '50'),
|
|
('compact_parallel_workers', '5'),
|
|
|
|
-- Safety settings
|
|
('auto_compact_enabled', 'false'), -- Opt-in
|
|
('compact_events_enabled', 'false'), -- Events pruning (Tier 2)
|
|
|
|
-- Display settings
|
|
('compact_show_savings', 'true'); -- Show size reduction in output
|
|
```
|
|
|
|
#### New Event Type
|
|
|
|
```go
|
|
const (
|
|
// ... existing event types ...
|
|
EventCompacted EventType = "compacted"
|
|
)
|
|
```
|
|
|
|
---
|
|
|
|
### Haiku Integration
|
|
|
|
#### Prompt Templates
|
|
|
|
**Tier 1 Prompt:**
|
|
```
|
|
Summarize this closed software issue. Preserve key decisions, implementation approach, and outcome. Max 300 words.
|
|
|
|
Title: {{.Title}}
|
|
Type: {{.IssueType}}
|
|
Priority: {{.Priority}}
|
|
|
|
Description:
|
|
{{.Description}}
|
|
|
|
Design Notes:
|
|
{{.Design}}
|
|
|
|
Implementation Notes:
|
|
{{.Notes}}
|
|
|
|
Acceptance Criteria:
|
|
{{.AcceptanceCriteria}}
|
|
|
|
Output format:
|
|
**Summary:** [2-3 sentences: what problem, what solution, what outcome]
|
|
**Key Decisions:** [bullet points of non-obvious choices]
|
|
**Resolution:** [how it was closed]
|
|
```
|
|
|
|
**Tier 2 Prompt:**
|
|
```
|
|
Ultra-compress this old closed issue to ≤150 words. Focus on lasting architectural impact.
|
|
|
|
Title: {{.Title}}
|
|
Original Summary (already compressed):
|
|
{{.Description}}
|
|
|
|
Output a single paragraph covering:
|
|
- What was built/fixed
|
|
- Why it mattered
|
|
- Lasting impact (if any)
|
|
|
|
If there's no lasting impact, just state what was done and that it's resolved.
|
|
```
|
|
|
|
#### API Client Structure
|
|
|
|
```go
|
|
package compact
|
|
|
|
import (
|
|
"github.com/anthropics/anthropic-sdk-go"
|
|
)
|
|
|
|
type HaikuClient struct {
|
|
client *anthropic.Client
|
|
model string
|
|
}
|
|
|
|
func NewHaikuClient(apiKey string) *HaikuClient {
|
|
return &HaikuClient{
|
|
client: anthropic.NewClient(apiKey),
|
|
model: anthropic.ModelClaude_3_5_Haiku_20241022,
|
|
}
|
|
}
|
|
|
|
func (h *HaikuClient) Summarize(ctx context.Context, prompt string, maxTokens int) (string, error)
|
|
```
|
|
|
|
#### Error Handling
|
|
|
|
- **Rate limits:** Exponential backoff with jitter
|
|
- **API errors:** Log and skip issue (don't fail entire batch)
|
|
- **Network failures:** Retry up to 3 times
|
|
- **Invalid responses:** Fall back to truncation with warning
|
|
- **Context length:** Truncate input if needed (rare, but possible)
|
|
|
|
---
|
|
|
|
### Dependency Checking
|
|
|
|
Reuse existing recursive CTE logic from `ready_issues` view, adapted for compaction:
|
|
|
|
```sql
|
|
-- Check if issue and N levels of dependents are all closed
|
|
WITH RECURSIVE dependent_tree AS (
|
|
-- Base case: the candidate issue
|
|
SELECT id, status, 0 as depth
|
|
FROM issues
|
|
WHERE id = ?
|
|
|
|
UNION ALL
|
|
|
|
-- Recursive case: issues that depend on this one
|
|
SELECT i.id, i.status, dt.depth + 1
|
|
FROM dependent_tree dt
|
|
JOIN dependencies d ON d.depends_on_id = dt.id
|
|
JOIN issues i ON i.id = d.issue_id
|
|
WHERE d.type IN ('blocks', 'parent-child') -- Only blocking deps matter
|
|
AND dt.depth < ? -- Max depth parameter
|
|
)
|
|
SELECT CASE
|
|
WHEN COUNT(*) = SUM(CASE WHEN status = 'closed' THEN 1 ELSE 0 END)
|
|
THEN 1 ELSE 0 END as all_closed
|
|
FROM dependent_tree;
|
|
```
|
|
|
|
**Performance:** This query is O(N) where N is the number of dependents. With proper indexes, should be <10ms per issue.
|
|
|
|
---
|
|
|
|
### Git Integration (Optional)
|
|
|
|
For "project time" measurement via commit counting:
|
|
|
|
```go
|
|
func getCommitsSince(closedAt time.Time) (int, error) {
|
|
cmd := exec.Command("git", "rev-list", "--count",
|
|
fmt.Sprintf("--since=%s", closedAt.Format(time.RFC3339)),
|
|
"HEAD")
|
|
output, err := cmd.Output()
|
|
if err != nil {
|
|
return 0, err
|
|
}
|
|
return strconv.Atoi(strings.TrimSpace(string(output)))
|
|
}
|
|
```
|
|
|
|
**Fallback:** If git unavailable or not in a repo, use issue counter delta:
|
|
```sql
|
|
SELECT last_id FROM issue_counters WHERE prefix = ?
|
|
-- Store at close time, compare at compaction time
|
|
```
|
|
|
|
---
|
|
|
|
### CLI Commands
|
|
|
|
#### `bd compact` - Main Command
|
|
|
|
```bash
|
|
bd compact [flags]
|
|
|
|
Flags:
|
|
--dry-run Show what would be compacted without doing it
|
|
--tier int Compaction tier (1 or 2, default: 1)
|
|
--all Process all eligible issues (default: preview only)
|
|
--id string Compact specific issue by ID
|
|
--force Bypass eligibility checks (with --id)
|
|
--batch-size int Issues per batch (default: from config)
|
|
--workers int Parallel workers (default: from config)
|
|
--json JSON output for agents
|
|
|
|
Examples:
|
|
bd compact --dry-run # Preview Tier 1 candidates
|
|
bd compact --dry-run --tier 2 # Preview Tier 2 candidates
|
|
bd compact --all # Compact all Tier 1 candidates
|
|
bd compact --tier 2 --all # Compact all Tier 2 candidates
|
|
bd compact --id bd-42 # Compact specific issue
|
|
bd compact --id bd-42 --force # Force compact even if recent
|
|
```
|
|
|
|
#### `bd compact --restore` - Restore Compacted Issues
|
|
|
|
```bash
|
|
bd compact --restore <issue-id> [flags]
|
|
|
|
Flags:
|
|
--level int Restore to specific snapshot level (default: latest)
|
|
--json JSON output
|
|
|
|
Examples:
|
|
bd compact --restore bd-42 # Restore bd-42 from latest snapshot
|
|
bd compact --restore bd-42 --level 1 # Restore from Tier 1 snapshot
|
|
```
|
|
|
|
#### `bd compact --stats` - Compaction Statistics
|
|
|
|
```bash
|
|
bd compact --stats [flags]
|
|
|
|
Flags:
|
|
--json JSON output
|
|
|
|
Example output:
|
|
=== Compaction Statistics ===
|
|
|
|
Total Issues: 1,247
|
|
Compacted (Tier 1): 342 (27.4%)
|
|
Compacted (Tier 2): 89 (7.1%)
|
|
|
|
Database Size: 2.3 MB
|
|
Estimated Uncompacted Size: 8.7 MB
|
|
Space Savings: 6.4 MB (73.6%)
|
|
|
|
Candidates:
|
|
Tier 1: 47 issues (est. 320 KB → 64 KB)
|
|
Tier 2: 15 issues (est. 180 KB → 18 KB)
|
|
|
|
Estimated Compaction Cost: $0.04 (Haiku)
|
|
```
|
|
|
|
---
|
|
|
|
### Output Examples
|
|
|
|
#### Dry Run Output
|
|
|
|
```
|
|
$ bd compact --dry-run
|
|
|
|
=== Tier 1 Compaction Preview ===
|
|
Eligibility: Closed ≥30 days, 2 levels of dependents closed
|
|
|
|
bd-42: Fix authentication bug (P1, bug)
|
|
Closed: 45 days ago
|
|
Size: 2,341 bytes → ~468 bytes (80% reduction)
|
|
Dependents: bd-43, bd-44 (both closed)
|
|
|
|
bd-57: Add login form (P2, feature)
|
|
Closed: 38 days ago
|
|
Size: 1,823 bytes → ~365 bytes (80% reduction)
|
|
Dependents: (none)
|
|
|
|
bd-58: Refactor auth middleware (P2, task)
|
|
Closed: 35 days ago
|
|
Size: 3,102 bytes → ~620 bytes (80% reduction)
|
|
Dependents: bd-59 (closed)
|
|
|
|
... (44 more issues)
|
|
|
|
Total: 47 issues
|
|
Estimated reduction: 87.3 KB → 17.5 KB (80%)
|
|
Estimated cost: $0.03 (Haiku API)
|
|
|
|
Run with --all to compact these issues.
|
|
```
|
|
|
|
#### Compaction Progress
|
|
|
|
```
|
|
$ bd compact --all
|
|
|
|
Compacting 47 issues (Tier 1)...
|
|
|
|
Creating snapshots... [████████████████████] 47/47
|
|
Calling Haiku API... [████████████████████] 47/47 (12s, $0.027)
|
|
Updating issues... [████████████████████] 47/47
|
|
|
|
✓ Successfully compacted 47 issues
|
|
Size reduction: 87.3 KB → 18.2 KB (79.1%)
|
|
API cost: $0.027
|
|
Time: 14.3s
|
|
|
|
Compacted issues will be exported to .beads/issues.jsonl
|
|
```
|
|
|
|
#### Show Compacted Issue
|
|
|
|
```
|
|
$ bd show bd-42
|
|
|
|
bd-42: Fix authentication bug [CLOSED] 🗜️
|
|
|
|
Status: closed (compacted L1)
|
|
Priority: 1 (High)
|
|
Type: bug
|
|
Closed: 2025-08-31 (45 days ago)
|
|
Compacted: 2025-10-15 (saved 1,873 bytes)
|
|
|
|
**Summary:** Fixed race condition in JWT token refresh logic causing intermittent
|
|
401 errors under high load. Implemented mutex-based locking around token refresh
|
|
operations. All users can now stay authenticated reliably during concurrent requests.
|
|
|
|
**Key Decisions:**
|
|
- Used sync.RWMutex instead of channels for simpler reasoning about lock state
|
|
- Added exponential backoff to token refresh to prevent thundering herd
|
|
- Preserved existing token format for backward compatibility with mobile clients
|
|
|
|
**Resolution:** Deployed to production on Aug 31, monitored for 2 weeks with zero
|
|
401 errors. Closed after confirming fix with load testing.
|
|
|
|
---
|
|
💾 Restore: bd compact --restore bd-42
|
|
📊 Original size: 2,341 bytes | Compressed: 468 bytes (80% reduction)
|
|
```
|
|
|
|
---
|
|
|
|
### Safety Mechanisms
|
|
|
|
1. **Snapshot-First:** Always create snapshot before modifying issue
|
|
2. **Restore Capability:** Full restore from snapshots with `--restore`
|
|
3. **Opt-In Auto-Compaction:** Disabled by default (`auto_compact_enabled = false`)
|
|
4. **Dry-Run Required:** Preview before committing with `--dry-run`
|
|
5. **Git Backup:** JSONL exports preserve full history in git commits
|
|
6. **Audit Trail:** `EventCompacted` records what was done and when
|
|
7. **Size Verification:** Track original_size and compressed_size for validation
|
|
8. **Idempotent:** Re-running compaction on already-compacted issues is safe (no-op)
|
|
9. **Graceful Degradation:** API failures don't corrupt data, just skip issues
|
|
10. **Reversible:** Restore is always available, even after git push
|
|
|
|
---
|
|
|
|
### Testing Strategy
|
|
|
|
#### Unit Tests
|
|
|
|
1. **Candidate Identification:**
|
|
- Issues meeting time criteria
|
|
- Dependency depth checking
|
|
- Mixed status dependents (some closed, some open)
|
|
- Edge case: circular dependencies
|
|
|
|
2. **Haiku Client:**
|
|
- Mock API responses
|
|
- Rate limit handling
|
|
- Error recovery
|
|
- Prompt rendering
|
|
|
|
3. **Snapshot Management:**
|
|
- Create snapshot
|
|
- Restore from snapshot
|
|
- Multiple snapshots per issue (Tier 1 → Tier 2)
|
|
- Snapshot integrity (hash verification)
|
|
|
|
4. **Size Calculation:**
|
|
- Accurate byte counting
|
|
- UTF-8 handling
|
|
- Empty fields
|
|
|
|
#### Integration Tests
|
|
|
|
1. **End-to-End Compaction:**
|
|
- Create test issues
|
|
- Age them (mock timestamps)
|
|
- Run compaction
|
|
- Verify summaries
|
|
- Restore and verify
|
|
|
|
2. **Batch Processing:**
|
|
- Large batches (100+ issues)
|
|
- Parallel worker coordination
|
|
- Error handling mid-batch
|
|
|
|
3. **JSONL Export:**
|
|
- Compacted issues export correctly
|
|
- Import preserves compaction_level
|
|
- Round-trip fidelity
|
|
|
|
4. **CLI Commands:**
|
|
- All flag combinations
|
|
- JSON output parsing
|
|
- Error messages
|
|
|
|
#### Manual Testing Checklist
|
|
|
|
- [ ] Dry-run shows accurate candidates
|
|
- [ ] Compaction reduces size as expected
|
|
- [ ] Haiku summaries are high quality
|
|
- [ ] Restore returns exact original content
|
|
- [ ] Stats command shows correct numbers
|
|
- [ ] Auto-compaction respects config
|
|
- [ ] Git workflow (commit → pull → auto-compact)
|
|
- [ ] Multi-machine workflow with compaction
|
|
- [ ] API key handling (env var vs config)
|
|
- [ ] Rate limit handling under load
|
|
|
|
---
|
|
|
|
### Performance Considerations
|
|
|
|
#### Scalability
|
|
|
|
**Small databases (<1,000 issues):**
|
|
- Full scan acceptable
|
|
- Compact all eligible in one run
|
|
- <1 minute total time
|
|
|
|
**Medium databases (1,000-10,000 issues):**
|
|
- Batch processing required
|
|
- Progress reporting essential
|
|
- 5-10 minutes total time
|
|
|
|
**Large databases (>10,000 issues):**
|
|
- Incremental compaction (process N per run)
|
|
- Consider scheduled background job
|
|
- 30-60 minutes total time
|
|
|
|
#### Optimization Strategies
|
|
|
|
1. **Index Usage:**
|
|
- `idx_issues_status` - filter closed issues
|
|
- `idx_dependencies_depends_on` - dependency traversal
|
|
- `idx_snapshots_issue` - restore lookups
|
|
|
|
2. **Batch Sizing:**
|
|
- Default 50 issues per batch
|
|
- Configurable via `compact_batch_size`
|
|
- Trade-off: larger batches = fewer commits, more RAM
|
|
|
|
3. **Parallel Workers:**
|
|
- Default 5 parallel Haiku calls
|
|
- Configurable via `compact_parallel_workers`
|
|
- Respects Haiku rate limits
|
|
|
|
4. **Query Optimization:**
|
|
- Use prepared statements for snapshots
|
|
- Reuse dependency check query
|
|
- Avoid N+1 queries in batch operations
|
|
|
|
---
|
|
|
|
### Cost Analysis
|
|
|
|
#### Haiku Pricing (as of 2025-10-15)
|
|
|
|
- Input: $0.25 per million tokens (~$0.0003 per 1K tokens)
|
|
- Output: $1.25 per million tokens (~$0.0013 per 1K tokens)
|
|
|
|
#### Per-Issue Estimates
|
|
|
|
**Tier 1:**
|
|
- Input: ~1,000 tokens (full issue content)
|
|
- Output: ~400 tokens (summary)
|
|
- Cost: ~$0.0008 per issue
|
|
|
|
**Tier 2:**
|
|
- Input: ~500 tokens (Tier 1 summary)
|
|
- Output: ~200 tokens (ultra-compressed)
|
|
- Cost: ~$0.0003 per issue
|
|
|
|
#### Batch Costs
|
|
|
|
| Issues | Tier 1 Cost | Tier 2 Cost | Total |
|
|
|--------|-------------|-------------|-------|
|
|
| 100 | $0.08 | $0.03 | $0.11 |
|
|
| 500 | $0.40 | $0.15 | $0.55 |
|
|
| 1,000 | $0.80 | $0.30 | $1.10 |
|
|
| 5,000 | $4.00 | $1.50 | $5.50 |
|
|
|
|
**Monthly budget (typical project):**
|
|
- ~50-100 new issues closed per month
|
|
- ~30-60 days later, eligible for Tier 1
|
|
- Monthly cost: $0.04 - $0.08 (negligible)
|
|
|
|
---
|
|
|
|
### Configuration Examples
|
|
|
|
#### Conservative Setup (manual only)
|
|
|
|
```bash
|
|
bd init
|
|
bd config set compact_tier1_days 60
|
|
bd config set compact_tier2_days 180
|
|
bd config set auto_compact_enabled false
|
|
|
|
# Run manually when needed
|
|
bd compact --dry-run
|
|
bd compact --all # after review
|
|
```
|
|
|
|
#### Aggressive Setup (auto-compact)
|
|
|
|
```bash
|
|
bd config set compact_tier1_days 14
|
|
bd config set compact_tier2_days 45
|
|
bd config set auto_compact_enabled true
|
|
bd config set compact_batch_size 100
|
|
|
|
# Auto-compacts on bd stats, bd export
|
|
bd stats # triggers compaction if candidates exist
|
|
```
|
|
|
|
#### Development Setup (fast feedback)
|
|
|
|
```bash
|
|
bd config set compact_tier1_days 1
|
|
bd config set compact_tier2_days 3
|
|
bd config set compact_tier1_dep_levels 1
|
|
bd config set compact_tier2_dep_levels 2
|
|
|
|
# Test compaction on recently closed issues
|
|
```
|
|
|
|
---
|
|
|
|
### Future Enhancements
|
|
|
|
#### Phase 2 (Post-MVP)
|
|
|
|
1. **Local Model Support:**
|
|
- Use Ollama for zero-cost summarization
|
|
- Fallback chain: Haiku → Ollama → truncation
|
|
|
|
2. **Custom Prompts:**
|
|
- User-defined summarization prompts
|
|
- Per-project templates
|
|
- Domain-specific summaries (e.g., "focus on API changes")
|
|
|
|
3. **Selective Preservation:**
|
|
- Mark issues as "do not compact"
|
|
- Preserve certain labels (e.g., `architecture`, `security`)
|
|
- Field-level preservation (e.g., keep design notes, compress others)
|
|
|
|
4. **Analytics:**
|
|
- Compaction effectiveness over time
|
|
- Cost tracking per run
|
|
- Quality feedback (user ratings of summaries)
|
|
|
|
5. **Smart Scheduling:**
|
|
- Auto-detect optimal compaction times
|
|
- Avoid compaction during active development
|
|
- Weekend/off-hours processing
|
|
|
|
6. **Multi-Tier Expansion:**
|
|
- Tier 3: Archive to separate file
|
|
- Tier 4: Delete (with backup)
|
|
- Configurable tier chain
|
|
|
|
#### Phase 3 (Advanced)
|
|
|
|
1. **Distributed Compaction:**
|
|
- Coordinate across multiple machines
|
|
- Avoid duplicate work in team settings
|
|
- Lock mechanism for compaction jobs
|
|
|
|
2. **Incremental Summarization:**
|
|
- Re-summarize if issue reopened
|
|
- Preserve history of summaries
|
|
- Version tracking for prompts
|
|
|
|
3. **Search Integration:**
|
|
- Full-text search includes summaries
|
|
- Boost compacted issues in search results
|
|
- Semantic search using embeddings
|
|
|
|
---
|
|
|
|
## Implementation Issues
|
|
|
|
The following issues should be created in beads to track this work. They're designed to be implemented in dependency order.
|
|
|
|
---
|
|
|
|
### Epic: Issue Database Compaction
|
|
|
|
**Issue ID:** (auto-generated)
|
|
**Title:** Epic: Add intelligent database compaction with Claude Haiku
|
|
**Type:** epic
|
|
**Priority:** 2
|
|
**Description:**
|
|
|
|
Implement multi-tier database compaction using Claude Haiku to semantically compress old, closed issues. This keeps the database lightweight and agent-friendly while preserving essential context.
|
|
|
|
**Goals:**
|
|
- 70-95% space reduction for eligible issues
|
|
- Full restore capability via snapshots
|
|
- Opt-in with dry-run safety
|
|
- ~$1 per 1,000 issues compacted
|
|
|
|
**Acceptance Criteria:**
|
|
- [ ] Schema migration with snapshots table
|
|
- [ ] Haiku integration for summarization
|
|
- [ ] Two-tier compaction (30d, 90d)
|
|
- [ ] CLI with dry-run, restore, stats
|
|
- [ ] Full test coverage
|
|
- [ ] Documentation complete
|
|
|
|
**Dependencies:** None (this is the epic)
|
|
|
|
---
|
|
|
|
### Issue 1: Add compaction schema and migrations
|
|
|
|
**Type:** task
|
|
**Priority:** 1
|
|
**Dependencies:** Blocks all other compaction work
|
|
|
|
**Description:**
|
|
|
|
Add database schema support for issue compaction tracking and snapshot storage.
|
|
|
|
**Design:**
|
|
|
|
Add three columns to `issues` table:
|
|
- `compaction_level INTEGER DEFAULT 0` - 0=original, 1=tier1, 2=tier2
|
|
- `compacted_at DATETIME` - when last compacted
|
|
- `original_size INTEGER` - bytes before first compaction
|
|
|
|
Create `issue_snapshots` table:
|
|
```sql
|
|
CREATE TABLE issue_snapshots (
|
|
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
|
issue_id TEXT NOT NULL,
|
|
snapshot_time DATETIME NOT NULL,
|
|
compaction_level INTEGER NOT NULL,
|
|
original_size INTEGER NOT NULL,
|
|
compressed_size INTEGER NOT NULL,
|
|
original_content TEXT NOT NULL, -- JSON blob
|
|
archived_events TEXT,
|
|
FOREIGN KEY (issue_id) REFERENCES issues(id) ON DELETE CASCADE
|
|
);
|
|
```
|
|
|
|
Add indexes:
|
|
- `idx_snapshots_issue` on `issue_id`
|
|
- `idx_snapshots_level` on `compaction_level`
|
|
|
|
Add migration functions in `internal/storage/sqlite/sqlite.go`:
|
|
- `migrateCompactionColumns(db *sql.DB) error`
|
|
- `migrateSnapshotsTable(db *sql.DB) error`
|
|
|
|
**Acceptance Criteria:**
|
|
- [ ] Existing databases migrate automatically
|
|
- [ ] New databases include columns by default
|
|
- [ ] Migration is idempotent (safe to run multiple times)
|
|
- [ ] No data loss during migration
|
|
- [ ] Tests verify migration on fresh and existing DBs
|
|
|
|
**Estimated Time:** 4 hours
|
|
|
|
---
|
|
|
|
### Issue 2: Add compaction configuration keys
|
|
|
|
**Type:** task
|
|
**Priority:** 1
|
|
**Dependencies:** Blocks compaction logic
|
|
|
|
**Description:**
|
|
|
|
Add configuration keys for compaction behavior with sensible defaults.
|
|
|
|
**Implementation:**
|
|
|
|
Add to `internal/storage/sqlite/schema.go` initial config:
|
|
```sql
|
|
INSERT OR IGNORE INTO config (key, value) VALUES
|
|
('compact_tier1_days', '30'),
|
|
('compact_tier1_dep_levels', '2'),
|
|
('compact_tier2_days', '90'),
|
|
('compact_tier2_dep_levels', '5'),
|
|
('compact_tier2_commits', '100'),
|
|
('compact_model', 'claude-3-5-haiku-20241022'),
|
|
('compact_batch_size', '50'),
|
|
('compact_parallel_workers', '5'),
|
|
('auto_compact_enabled', 'false');
|
|
```
|
|
|
|
Add helper functions in `internal/storage/` or `cmd/bd/`:
|
|
```go
|
|
func getCompactConfig(ctx context.Context, store Storage) (*CompactConfig, error)
|
|
```
|
|
|
|
**Acceptance Criteria:**
|
|
- [ ] Config keys created on init
|
|
- [ ] Existing DBs get defaults on migration
|
|
- [ ] `bd config get/set` works with all keys
|
|
- [ ] Type validation (days=int, enabled=bool)
|
|
- [ ] Documentation in README.md
|
|
|
|
**Estimated Time:** 2 hours
|
|
|
|
---
|
|
|
|
### Issue 3: Implement candidate identification queries
|
|
|
|
**Type:** task
|
|
**Priority:** 1
|
|
**Dependencies:** Needs schema and config
|
|
|
|
**Description:**
|
|
|
|
Write SQL queries to identify issues eligible for Tier 1 and Tier 2 compaction.
|
|
|
|
**Design:**
|
|
|
|
Create `internal/storage/sqlite/compact.go` with:
|
|
|
|
```go
|
|
type CompactionCandidate struct {
|
|
IssueID string
|
|
ClosedAt time.Time
|
|
OriginalSize int
|
|
EstimatedSize int
|
|
DependentCount int
|
|
}
|
|
|
|
func (s *SQLiteStorage) GetTier1Candidates(ctx context.Context) ([]*CompactionCandidate, error)
|
|
func (s *SQLiteStorage) GetTier2Candidates(ctx context.Context) ([]*CompactionCandidate, error)
|
|
func (s *SQLiteStorage) CheckEligibility(ctx context.Context, issueID string, tier int) (bool, string, error)
|
|
```
|
|
|
|
Implement recursive dependency checking using CTE similar to `ready_issues` view.
|
|
|
|
**Acceptance Criteria:**
|
|
- [ ] Tier 1 query filters by days and dependency depth
|
|
- [ ] Tier 2 query includes commit/issue count checks
|
|
- [ ] Dependency checking handles circular deps gracefully
|
|
- [ ] Performance: <100ms for 10,000 issue database
|
|
- [ ] Tests cover edge cases (no deps, circular deps, mixed status)
|
|
|
|
**Estimated Time:** 6 hours
|
|
|
|
---
|
|
|
|
### Issue 4: Create Haiku client and prompt templates
|
|
|
|
**Type:** task
|
|
**Priority:** 1
|
|
**Dependencies:** None (can work in parallel)
|
|
|
|
**Description:**
|
|
|
|
Implement Claude Haiku API client with template-based prompts for Tier 1 and Tier 2 summarization.
|
|
|
|
**Implementation:**
|
|
|
|
Create `internal/compact/haiku.go`:
|
|
|
|
```go
|
|
type HaikuClient struct {
|
|
client *anthropic.Client
|
|
model string
|
|
}
|
|
|
|
func NewHaikuClient(apiKey string) (*HaikuClient, error)
|
|
func (h *HaikuClient) SummarizeTier1(ctx context.Context, issue *types.Issue) (string, error)
|
|
func (h *HaikuClient) SummarizeTier2(ctx context.Context, issue *types.Issue) (string, error)
|
|
```
|
|
|
|
Use text/template for prompt rendering.
|
|
|
|
Add error handling:
|
|
- Rate limit retry with exponential backoff
|
|
- Network errors: 3 retries
|
|
- Invalid responses: return error, don't corrupt data
|
|
|
|
**Acceptance Criteria:**
|
|
- [ ] API key from env var or config (env takes precedence)
|
|
- [ ] Prompts render correctly with template
|
|
- [ ] Rate limiting handled gracefully
|
|
- [ ] Mock tests for API calls
|
|
- [ ] Real integration test (optional, requires API key)
|
|
|
|
**Estimated Time:** 6 hours
|
|
|
|
---
|
|
|
|
### Issue 5: Implement snapshot creation and restoration
|
|
|
|
**Type:** task
|
|
**Priority:** 1
|
|
**Dependencies:** Needs schema changes
|
|
|
|
**Description:**
|
|
|
|
Implement snapshot creation before compaction and restoration capability.
|
|
|
|
**Implementation:**
|
|
|
|
Add to `internal/storage/sqlite/compact.go`:
|
|
|
|
```go
|
|
type Snapshot struct {
|
|
ID int64
|
|
IssueID string
|
|
SnapshotTime time.Time
|
|
CompactionLevel int
|
|
OriginalSize int
|
|
CompressedSize int
|
|
OriginalContent string // JSON
|
|
ArchivedEvents string // JSON, nullable
|
|
}
|
|
|
|
func (s *SQLiteStorage) CreateSnapshot(ctx context.Context, issue *types.Issue, level int) error
|
|
func (s *SQLiteStorage) RestoreFromSnapshot(ctx context.Context, issueID string, level int) error
|
|
func (s *SQLiteStorage) GetSnapshots(ctx context.Context, issueID string) ([]*Snapshot, error)
|
|
```
|
|
|
|
Snapshot JSON structure:
|
|
```json
|
|
{
|
|
"description": "...",
|
|
"design": "...",
|
|
"notes": "...",
|
|
"acceptance_criteria": "...",
|
|
"title": "..."
|
|
}
|
|
```
|
|
|
|
**Acceptance Criteria:**
|
|
- [ ] Snapshot created atomically with compaction
|
|
- [ ] Restore returns exact original content
|
|
- [ ] Multiple snapshots per issue supported (Tier 1 → Tier 2)
|
|
- [ ] JSON encoding handles special characters
|
|
- [ ] Size calculation is accurate (UTF-8 bytes)
|
|
|
|
**Estimated Time:** 5 hours
|
|
|
|
---
|
|
|
|
### Issue 6: Implement Tier 1 compaction logic
|
|
|
|
**Type:** task
|
|
**Priority:** 1
|
|
**Dependencies:** Needs Haiku client, snapshots, candidate queries
|
|
|
|
**Description:**
|
|
|
|
Implement the core Tier 1 compaction process: snapshot → summarize → update.
|
|
|
|
**Implementation:**
|
|
|
|
Add to `internal/compact/compactor.go`:
|
|
|
|
```go
|
|
type Compactor struct {
|
|
store storage.Storage
|
|
haiku *HaikuClient
|
|
config *CompactConfig
|
|
}
|
|
|
|
func New(store storage.Storage, apiKey string, config *CompactConfig) (*Compactor, error)
|
|
func (c *Compactor) CompactTier1(ctx context.Context, issueID string) error
|
|
func (c *Compactor) CompactTier1Batch(ctx context.Context, issueIDs []string) error
|
|
```
|
|
|
|
Process:
|
|
1. Verify eligibility
|
|
2. Calculate original size
|
|
3. Create snapshot
|
|
4. Call Haiku for summary
|
|
5. Update issue (description = summary, clear design/notes/criteria)
|
|
6. Set compaction_level = 1, compacted_at = now
|
|
7. Record EventCompacted
|
|
8. Mark dirty for export
|
|
|
|
**Acceptance Criteria:**
|
|
- [ ] Single issue compaction works end-to-end
|
|
- [ ] Batch processing with parallel workers
|
|
- [ ] Errors don't corrupt database (transaction rollback)
|
|
- [ ] EventCompacted includes size savings
|
|
- [ ] Dry-run mode (identify + size estimate only)
|
|
|
|
**Estimated Time:** 8 hours
|
|
|
|
---
|
|
|
|
### Issue 7: Implement Tier 2 compaction logic
|
|
|
|
**Type:** task
|
|
**Priority:** 2
|
|
**Dependencies:** Needs Tier 1 working
|
|
|
|
**Description:**
|
|
|
|
Implement Tier 2 ultra-compression: more aggressive summarization and optional event pruning.
|
|
|
|
**Implementation:**
|
|
|
|
Add to `internal/compact/compactor.go`:
|
|
|
|
```go
|
|
func (c *Compactor) CompactTier2(ctx context.Context, issueID string) error
|
|
func (c *Compactor) CompactTier2Batch(ctx context.Context, issueIDs []string) error
|
|
```
|
|
|
|
Process:
|
|
1. Verify issue is at compaction_level = 1
|
|
2. Check Tier 2 eligibility (days, deps, commits/issues)
|
|
3. Create Tier 2 snapshot
|
|
4. Call Haiku with ultra-compression prompt
|
|
5. Update issue (description = single paragraph, clear all else)
|
|
6. Set compaction_level = 2
|
|
7. Optionally prune events (keep created/closed, archive rest)
|
|
|
|
**Acceptance Criteria:**
|
|
- [ ] Requires existing Tier 1 compaction
|
|
- [ ] Git commit counting works (with fallback)
|
|
- [ ] Events optionally pruned (config: compact_events_enabled)
|
|
- [ ] Archived events stored in snapshot
|
|
- [ ] Size reduction 90-95%
|
|
|
|
**Estimated Time:** 6 hours
|
|
|
|
---
|
|
|
|
### Issue 8: Add `bd compact` CLI command
|
|
|
|
**Type:** task
|
|
**Priority:** 1
|
|
**Dependencies:** Needs Tier 1 compaction logic
|
|
|
|
**Description:**
|
|
|
|
Implement the `bd compact` command with dry-run, batch processing, and progress reporting.
|
|
|
|
**Implementation:**
|
|
|
|
Create `cmd/bd/compact.go`:
|
|
|
|
```go
|
|
var compactCmd = &cobra.Command{
|
|
Use: "compact",
|
|
Short: "Compact old closed issues to save space",
|
|
Long: `...`,
|
|
}
|
|
|
|
var (
|
|
compactDryRun bool
|
|
compactTier int
|
|
compactAll bool
|
|
compactID string
|
|
compactForce bool
|
|
compactBatchSize int
|
|
compactWorkers int
|
|
)
|
|
|
|
func init() {
|
|
compactCmd.Flags().BoolVar(&compactDryRun, "dry-run", false, "Preview without compacting")
|
|
compactCmd.Flags().IntVar(&compactTier, "tier", 1, "Compaction tier (1 or 2)")
|
|
compactCmd.Flags().BoolVar(&compactAll, "all", false, "Process all candidates")
|
|
compactCmd.Flags().StringVar(&compactID, "id", "", "Compact specific issue")
|
|
compactCmd.Flags().BoolVar(&compactForce, "force", false, "Force compact (bypass checks)")
|
|
// ... more flags
|
|
}
|
|
```
|
|
|
|
**Output:**
|
|
- Dry-run: Table of candidates with size estimates
|
|
- Actual run: Progress bar with batch updates
|
|
- Summary: Count, size saved, cost, time
|
|
|
|
**Acceptance Criteria:**
|
|
- [ ] `--dry-run` shows accurate preview
|
|
- [ ] `--all` processes all candidates
|
|
- [ ] `--id` compacts single issue
|
|
- [ ] `--force` bypasses eligibility checks (with --id)
|
|
- [ ] Progress bar for batches
|
|
- [ ] JSON output with `--json`
|
|
- [ ] Exit code: 0=success, 1=error
|
|
|
|
**Estimated Time:** 6 hours
|
|
|
|
---
|
|
|
|
### Issue 9: Add `bd compact --restore` functionality
|
|
|
|
**Type:** task
|
|
**Priority:** 2
|
|
**Dependencies:** Needs snapshots and CLI
|
|
|
|
**Description:**
|
|
|
|
Implement restore command to undo compaction from snapshots.
|
|
|
|
**Implementation:**
|
|
|
|
Add to `cmd/bd/compact.go`:
|
|
|
|
```go
|
|
var compactRestore string
|
|
|
|
compactCmd.Flags().StringVar(&compactRestore, "restore", "", "Restore issue from snapshot")
|
|
```
|
|
|
|
Process:
|
|
1. Load snapshot for issue
|
|
2. Parse JSON content
|
|
3. Update issue with original content
|
|
4. Set compaction_level = 0, compacted_at = NULL
|
|
5. Record EventRestored
|
|
6. Mark dirty
|
|
|
|
**Acceptance Criteria:**
|
|
- [ ] Restores exact original content
|
|
- [ ] Handles multiple snapshots (prompt user or use latest)
|
|
- [ ] `--level` flag to choose snapshot
|
|
- [ ] Updates compaction_level correctly
|
|
- [ ] Exports restored content to JSONL
|
|
|
|
**Estimated Time:** 4 hours
|
|
|
|
---
|
|
|
|
### Issue 10: Add `bd compact --stats` command
|
|
|
|
**Type:** task
|
|
**Priority:** 2
|
|
**Dependencies:** Needs compaction working
|
|
|
|
**Description:**
|
|
|
|
Add statistics command showing compaction status and potential savings.
|
|
|
|
**Implementation:**
|
|
|
|
```go
|
|
var compactStats bool
|
|
|
|
compactCmd.Flags().BoolVar(&compactStats, "stats", false, "Show compaction statistics")
|
|
```
|
|
|
|
Output:
|
|
- Total issues, by compaction level
|
|
- Current DB size vs estimated uncompacted size
|
|
- Space savings (MB and %)
|
|
- Candidates for each tier with estimates
|
|
- Estimated API cost
|
|
|
|
**Acceptance Criteria:**
|
|
- [ ] Accurate counts by compaction_level
|
|
- [ ] Size calculations include all text fields
|
|
- [ ] Shows candidates with eligibility reasons
|
|
- [ ] Cost estimation based on Haiku pricing
|
|
- [ ] JSON output supported
|
|
|
|
**Estimated Time:** 4 hours
|
|
|
|
---
|
|
|
|
### Issue 11: Add EventCompacted to event system
|
|
|
|
**Type:** task
|
|
**Priority:** 2
|
|
**Dependencies:** Needs schema changes
|
|
|
|
**Description:**
|
|
|
|
Add new event type for tracking compaction in audit trail.
|
|
|
|
**Implementation:**
|
|
|
|
1. Add to `internal/types/types.go`:
|
|
```go
|
|
const EventCompacted EventType = "compacted"
|
|
```
|
|
|
|
2. Record event during compaction:
|
|
```go
|
|
eventData := map[string]interface{}{
|
|
"tier": tier,
|
|
"original_size": originalSize,
|
|
"compressed_size": compressedSize,
|
|
"reduction_pct": (1 - float64(compressedSize)/float64(originalSize)) * 100,
|
|
}
|
|
```
|
|
|
|
3. Show in `bd show` output:
|
|
```
|
|
Events:
|
|
2025-10-15: compacted (tier 1, saved 1.8KB, 80%)
|
|
2025-08-31: closed by alice
|
|
2025-08-20: created by alice
|
|
```
|
|
|
|
**Acceptance Criteria:**
|
|
- [ ] Event includes tier and size info
|
|
- [ ] Shows in event history
|
|
- [ ] Exports to JSONL
|
|
- [ ] `bd show` displays compaction marker
|
|
|
|
**Estimated Time:** 3 hours
|
|
|
|
---
|
|
|
|
### Issue 12: Add compaction indicator to `bd show`
|
|
|
|
**Type:** task
|
|
**Priority:** 2
|
|
**Dependencies:** Needs compaction working
|
|
|
|
**Description:**
|
|
|
|
Update `bd show` command to display compaction status prominently.
|
|
|
|
**Implementation:**
|
|
|
|
Add to issue display:
|
|
```
|
|
bd-42: Fix authentication bug [CLOSED] 🗜️
|
|
|
|
Status: closed (compacted L1)
|
|
...
|
|
|
|
---
|
|
💾 Restore: bd compact --restore bd-42
|
|
📊 Original: 2,341 bytes | Compressed: 468 bytes (80% reduction)
|
|
🗜️ Compacted: 2025-10-15 (Tier 1)
|
|
```
|
|
|
|
Show different emoji for tiers:
|
|
- Tier 1: 🗜️
|
|
- Tier 2: 📦
|
|
|
|
**Acceptance Criteria:**
|
|
- [ ] Compaction status visible in title
|
|
- [ ] Footer shows size savings
|
|
- [ ] Restore command shown
|
|
- [ ] Works with `--json` output
|
|
|
|
**Estimated Time:** 2 hours
|
|
|
|
---
|
|
|
|
### Issue 13: Write compaction tests
|
|
|
|
**Type:** task
|
|
**Priority:** 1
|
|
**Dependencies:** Needs all compaction logic
|
|
|
|
**Description:**
|
|
|
|
Comprehensive test suite for compaction functionality.
|
|
|
|
**Test Coverage:**
|
|
|
|
1. **Candidate Identification:**
|
|
- Eligibility by time
|
|
- Dependency depth checking
|
|
- Mixed status dependents
|
|
- Edge cases (no deps, circular)
|
|
|
|
2. **Snapshots:**
|
|
- Create and restore
|
|
- Multiple snapshots per issue
|
|
- Content integrity
|
|
|
|
3. **Tier 1 Compaction:**
|
|
- Single issue
|
|
- Batch processing
|
|
- Error handling
|
|
|
|
4. **Tier 2 Compaction:**
|
|
- Requires Tier 1
|
|
- Events pruning
|
|
- Commit counting fallback
|
|
|
|
5. **CLI:**
|
|
- All flag combinations
|
|
- Dry-run accuracy
|
|
- JSON output
|
|
|
|
6. **Integration:**
|
|
- End-to-end flow
|
|
- JSONL export/import
|
|
- Restore verification
|
|
|
|
**Acceptance Criteria:**
|
|
- [ ] Test coverage >80%
|
|
- [ ] All edge cases covered
|
|
- [ ] Mock Haiku API in tests
|
|
- [ ] Integration tests pass
|
|
- [ ] `go test ./...` passes
|
|
|
|
**Estimated Time:** 8 hours
|
|
|
|
---
|
|
|
|
### Issue 14: Add compaction documentation
|
|
|
|
**Type:** task
|
|
**Priority:** 2
|
|
**Dependencies:** All features complete
|
|
|
|
**Description:**
|
|
|
|
Document compaction feature in README and create COMPACTION.md guide.
|
|
|
|
**Content:**
|
|
|
|
Update README.md:
|
|
- Add to Features section
|
|
- CLI examples
|
|
- Configuration guide
|
|
|
|
Create COMPACTION.md:
|
|
- How compaction works
|
|
- When to use each tier
|
|
- Cost analysis
|
|
- Safety mechanisms
|
|
- Troubleshooting
|
|
|
|
Create examples/compaction/:
|
|
- Example workflow
|
|
- Cron job setup
|
|
- Auto-compaction script
|
|
|
|
**Acceptance Criteria:**
|
|
- [ ] README.md updated
|
|
- [ ] COMPACTION.md comprehensive
|
|
- [ ] Examples work as documented
|
|
- [ ] Screenshots/examples included
|
|
- [ ] API key setup documented
|
|
|
|
**Estimated Time:** 4 hours
|
|
|
|
---
|
|
|
|
### Issue 15: Optional: Implement auto-compaction
|
|
|
|
**Type:** task
|
|
**Priority:** 3 (nice-to-have)
|
|
**Dependencies:** Needs all compaction working
|
|
|
|
**Description:**
|
|
|
|
Implement automatic compaction triggered by certain operations when enabled via config.
|
|
|
|
**Implementation:**
|
|
|
|
Trigger points (when `auto_compact_enabled = true`):
|
|
1. `bd stats` - check and compact if candidates exist
|
|
2. `bd export` - before exporting
|
|
3. Background timer (optional, via daemon)
|
|
|
|
Add:
|
|
```go
|
|
func (s *SQLiteStorage) AutoCompact(ctx context.Context) error {
|
|
enabled, _ := s.GetConfig(ctx, "auto_compact_enabled")
|
|
if enabled != "true" {
|
|
return nil
|
|
}
|
|
|
|
// Run Tier 1 compaction on all candidates
|
|
// Limit to batch_size to avoid long operations
|
|
}
|
|
```
|
|
|
|
**Acceptance Criteria:**
|
|
- [ ] Respects auto_compact_enabled config
|
|
- [ ] Limits batch size to avoid blocking
|
|
- [ ] Logs compaction activity
|
|
- [ ] Can be disabled per-command with `--no-auto-compact`
|
|
|
|
**Estimated Time:** 4 hours
|
|
|
|
---
|
|
|
|
### Issue 16: Optional: Add git commit counting
|
|
|
|
**Type:** task
|
|
**Priority:** 3 (nice-to-have)
|
|
**Dependencies:** Needs Tier 2 logic
|
|
|
|
**Description:**
|
|
|
|
Implement git commit counting for "project time" measurement as alternative to calendar time.
|
|
|
|
**Implementation:**
|
|
|
|
```go
|
|
func getCommitsSince(closedAt time.Time) (int, error) {
|
|
cmd := exec.Command("git", "rev-list", "--count",
|
|
fmt.Sprintf("--since=%s", closedAt.Format(time.RFC3339)), "HEAD")
|
|
output, err := cmd.Output()
|
|
if err != nil {
|
|
return 0, err // Not in git repo or git not available
|
|
}
|
|
return strconv.Atoi(strings.TrimSpace(string(output)))
|
|
}
|
|
```
|
|
|
|
Fallback to issue counter delta if git unavailable.
|
|
|
|
**Acceptance Criteria:**
|
|
- [ ] Counts commits since closed_at
|
|
- [ ] Handles git not available gracefully
|
|
- [ ] Fallback to issue counter works
|
|
- [ ] Configurable via compact_tier2_commits
|
|
- [ ] Tested with real git repo
|
|
|
|
**Estimated Time:** 3 hours
|
|
|
|
---
|
|
|
|
## Success Metrics
|
|
|
|
### Technical Metrics
|
|
|
|
- [ ] 70-85% size reduction for Tier 1
|
|
- [ ] 90-95% size reduction for Tier 2
|
|
- [ ] <100ms candidate identification query
|
|
- [ ] <2s per issue compaction (Haiku latency)
|
|
- [ ] Zero data loss (all restore tests pass)
|
|
|
|
### Quality Metrics
|
|
|
|
- [ ] Haiku summaries preserve key information
|
|
- [ ] Developers can understand compacted issues
|
|
- [ ] Restore returns exact original content
|
|
- [ ] No corruption in multi-machine workflows
|
|
|
|
### Operational Metrics
|
|
|
|
- [ ] Cost: <$1.50 per 1,000 issues
|
|
- [ ] Dry-run accuracy: 95%+ estimate correctness
|
|
- [ ] Error rate: <1% API failures (with retry)
|
|
- [ ] User adoption: Docs clear, examples work
|
|
|
|
---
|
|
|
|
## Rollout Plan
|
|
|
|
### Phase 1: Alpha (Internal Testing)
|
|
|
|
1. Merge compaction feature to main
|
|
2. Test on beads' own database
|
|
3. Verify JSONL export/import
|
|
4. Validate Haiku summaries
|
|
5. Fix any critical bugs
|
|
|
|
### Phase 2: Beta (Opt-In)
|
|
|
|
1. Announce in README (opt-in, experimental)
|
|
2. Gather feedback from early adopters
|
|
3. Iterate on prompt templates
|
|
4. Add telemetry (optional, with consent)
|
|
|
|
### Phase 3: Stable (Default Disabled)
|
|
|
|
1. Mark feature as stable
|
|
2. Keep auto_compact_enabled = false by default
|
|
3. Encourage manual `bd compact --dry-run` first
|
|
4. Document in quickstart guide
|
|
|
|
### Phase 4: Mature (Consider Auto-Enable)
|
|
|
|
1. After 6+ months of stability
|
|
2. Consider auto-compaction for new users
|
|
3. Provide migration guide for disabling
|
|
|
|
---
|
|
|
|
## Risks and Mitigations
|
|
|
|
| Risk | Impact | Likelihood | Mitigation |
|
|
|------|--------|------------|------------|
|
|
| Haiku summaries lose critical info | High | Medium | Manual review in dry-run, restore capability, improve prompts |
|
|
| API rate limits during batch | Medium | Medium | Exponential backoff, respect rate limits, batch sizing |
|
|
| JSONL merge conflicts increase | Medium | Low | Compaction is deterministic per issue, git handles well |
|
|
| Users accidentally compress important issues | High | Low | Dry-run required, restore available, snapshots permanent |
|
|
| Cost higher than expected | Low | Low | Dry-run shows estimates, configurable batch sizes |
|
|
| Schema migration fails | High | Very Low | Idempotent migrations, tested on existing DBs |
|
|
|
|
---
|
|
|
|
## Open Questions
|
|
|
|
1. **Should compaction be reversible forever, or expire snapshots?**
|
|
- Recommendation: Keep snapshots indefinitely (disk is cheap)
|
|
|
|
2. **Should we compress snapshots themselves (gzip)?**
|
|
- Recommendation: Not in MVP, add if storage becomes issue
|
|
|
|
3. **Should tier selection be automatic or manual?**
|
|
- Recommendation: Manual in MVP, auto-tier in future
|
|
|
|
4. **How to handle issues compacted on one machine but not another?**
|
|
- Answer: JSONL export includes compaction_level, imports preserve it
|
|
|
|
5. **Should we support custom models (Sonnet, Opus)?**
|
|
- Recommendation: Haiku only in MVP, add later if needed
|
|
|
|
---
|
|
|
|
## Appendix: Example Workflow
|
|
|
|
### Typical Monthly Compaction
|
|
|
|
```bash
|
|
# 1. Check what's eligible
|
|
$ bd compact --dry-run
|
|
=== Tier 1 Candidates ===
|
|
42 issues eligible (closed >30 days, deps closed)
|
|
Est. reduction: 127 KB → 25 KB (80%)
|
|
Est. cost: $0.03
|
|
|
|
# 2. Review candidates manually
|
|
$ bd list --status closed --json | jq 'map(select(.compaction_level == 0))'
|
|
|
|
# 3. Compact Tier 1
|
|
$ bd compact --all
|
|
✓ Compacted 42 issues in 18s ($0.03)
|
|
|
|
# 4. Check Tier 2 candidates (optional)
|
|
$ bd compact --dry-run --tier 2
|
|
=== Tier 2 Candidates ===
|
|
8 issues eligible (closed >90 days, 100+ commits since)
|
|
Est. reduction: 45 KB → 4 KB (91%)
|
|
Est. cost: $0.01
|
|
|
|
# 5. Compact Tier 2
|
|
$ bd compact --all --tier 2
|
|
✓ Compacted 8 issues in 6s ($0.01)
|
|
|
|
# 6. Export and commit
|
|
$ bd export -o .beads/issues.jsonl
|
|
$ git add .beads/issues.jsonl
|
|
$ git commit -m "Compact 50 old issues (saved 143 KB)"
|
|
$ git push
|
|
|
|
# 7. View stats
|
|
$ bd compact --stats
|
|
Total Space Saved: 143 KB (82% reduction)
|
|
Database Size: 2.1 MB (down from 2.3 MB)
|
|
```
|
|
|
|
---
|
|
|
|
## Conclusion
|
|
|
|
This design provides a comprehensive, safe, and cost-effective way to keep beads databases lightweight while preserving essential context. The two-tier approach balances aggressiveness with safety, and the snapshot system ensures full reversibility.
|
|
|
|
The use of Claude Haiku for semantic compression is key - it preserves meaning rather than just truncating text. At ~$1 per 1,000 issues, the cost is negligible for the value provided.
|
|
|
|
Implementation is straightforward with clear phases and well-defined issues. The MVP (Tier 1 only) can be delivered in ~40 hours of work, with Tier 2 and enhancements following incrementally.
|
|
|
|
This aligns perfectly with beads' philosophy: **pragmatic, agent-focused, and evolutionarily designed**.
|