Add comprehensive compaction documentation

- Updated README.md with Tier 1/2 info, restore command, cost analysis
- Created COMPACTION.md with full guide covering:
  - How compaction works (architecture, two-tier system)
  - CLI reference and examples
  - Eligibility rules and configuration
  - Cost analysis with detailed tables
  - Automation examples (cron, workflows)
  - Safety, recovery, and troubleshooting
  - FAQ and best practices
- Added examples/compaction/ with 3 scripts:
  - workflow.sh: Interactive compaction workflow
  - cron-compact.sh: Automated monthly compaction
  - auto-compact.sh: Smart threshold-based compaction
  - README.md: Examples documentation

Closes bd-265

Amp-Thread-ID: https://ampcode.com/threads/T-8113e88e-1cd0-4a9e-b581-07045a3ed31e
Co-authored-by: Amp <amp@ampcode.com>
481
COMPACTION.md
Normal file
@@ -0,0 +1,481 @@
# Database Compaction Guide

## Overview

Beads compaction is **agentic memory decay**: your database naturally forgets fine-grained details of old work while preserving the essential context agents need. This keeps your database lightweight and fast, even after thousands of issues.

### Key Concepts

- **Semantic compression**: Claude Haiku summarizes issues intelligently, preserving decisions and outcomes
- **Two-tier system**: Gradual decay from full detail → summary → ultra-brief
- **Full recovery**: Snapshots enable complete restoration if needed
- **Safe by design**: Dry-run preview, eligibility checks, snapshot verification

## How It Works

### Tier 1: Semantic Compression (30+ days)

**Target**: Closed issues 30+ days old with no open dependents

**Process**:
1. Check eligibility (closed, 30+ days, no open dependents)
2. Create snapshot (full JSON backup)
3. Send to Claude Haiku for summarization
4. Replace verbose fields with concise summary
5. Store original size for statistics

**Result**: 70-80% space reduction

**Example**:

*Before (856 bytes):*
```
Title: Fix authentication race condition in login flow
Description: Users report intermittent 401 errors during concurrent
login attempts. The issue occurs when multiple requests hit the auth
middleware simultaneously...

Design: [15 lines of implementation details]
Acceptance Criteria: [8 test scenarios]
Notes: [debugging session notes]
```

*After (171 bytes):*
```
Title: Fix authentication race condition in login flow
Description: Fixed race condition in auth middleware causing 401s
during concurrent logins. Added mutex locks and updated tests.
Resolution: Deployed in v1.2.3.
```

### Tier 2: Ultra Compression (90+ days)

**Target**: Tier 1 issues 90+ days old, rarely referenced

**Process**:
1. Verify existing Tier 1 compaction
2. Check reference frequency (git commits, other issues)
3. Create Tier 2 snapshot
4. Ultra-compress to single paragraph
5. Optionally prune events (keep created/closed only)

**Result**: 90-95% space reduction

**Example**:

*After Tier 2 (43 bytes):*
```
Description: Auth race condition fixed, deployed v1.2.3.
```

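Taking the byte counts in the two examples at face value, the stated reduction ranges can be checked with a little arithmetic. The sketch below assumes a helper named `reduction_pct`, which is hypothetical and not part of `bd`:

```bash
#!/bin/bash
# Hypothetical helper (not part of bd): percentage saved when an issue
# shrinks from $1 bytes to $2 bytes, rounded to the nearest whole percent.
reduction_pct() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.0f\n", (before - after) * 100 / before }'
}

reduction_pct 856 171   # Tier 1 example: prints 80
reduction_pct 856 43    # Tier 2 example: prints 95
```

Both results sit at the upper end of the quoted ranges (70-80% for Tier 1, 90-95% for Tier 2).
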
## CLI Reference

### Preview Candidates

```bash
# See what would be compacted
bd compact --dry-run --all

# Check Tier 2 candidates
bd compact --dry-run --all --tier 2

# Preview specific issue
bd compact --dry-run --id bd-42
```

### Compact Issues

```bash
# Compact all eligible issues (Tier 1)
bd compact --all

# Compact specific issue
bd compact --id bd-42

# Force compact (bypass checks - use with caution)
bd compact --id bd-42 --force

# Tier 2 ultra-compression
bd compact --all --tier 2

# Control parallelism
bd compact --all --workers 10 --batch-size 20
```

### Statistics & Monitoring

```bash
# Show compaction stats
bd compact --stats

# Output:
# Total issues: 2,438
# Compacted: 847 (34.7%)
#   Tier 1: 812 issues
#   Tier 2: 35 issues
# Space saved: 1.2 MB (68% reduction)
# Estimated cost: $0.85
```

### Restore from Snapshot

```bash
# Restore compacted issue to original state
bd compact --restore bd-42

# Show the issue to verify
bd show bd-42
```

## Eligibility Rules

### Tier 1 Eligibility

- ✅ Status: `closed`
- ✅ Age: 30+ days since `closed_at`
- ✅ Dependents: No open issues depending on this one
- ✅ Not already compacted

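The four Tier 1 rules can be read as a single predicate. The following is a minimal sketch, not `bd`'s actual implementation: the function name `tier1_eligible` and its argument order are assumptions made for illustration, and timestamps are passed as epoch seconds to keep the date arithmetic simple.

```bash
#!/bin/bash
# Sketch of the Tier 1 rules as a pure function. All names are hypothetical;
# bd performs its own eligibility check internally.
# Args: status, closed_at (epoch s), open dependents, already compacted (0/1), now (epoch s)
tier1_eligible() {
    local status=$1 closed_at=$2 open_dependents=$3 compacted=$4 now=$5
    local min_age=$((30 * 24 * 60 * 60))   # 30 days in seconds

    [ "$status" = "closed" ] || return 1                  # must be closed
    [ $((now - closed_at)) -ge "$min_age" ] || return 1   # closed 30+ days ago
    [ "$open_dependents" -eq 0 ] || return 1              # nothing open depends on it
    [ "$compacted" -eq 0 ] || return 1                    # not already compacted
    return 0
}

# Example: an issue closed 45 days ago with no open dependents
now=$(date +%s)
if tier1_eligible closed "$((now - 45 * 86400))" 0 0 "$now"; then
    echo "eligible for Tier 1"
fi
```

To see the real decision for a given issue, `bd compact --dry-run --id <id>` remains the authoritative check.
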
### Tier 2 Eligibility

- ✅ Already Tier 1 compacted
- ✅ Age: 90+ days since `closed_at`
- ✅ Low reference frequency:
  - Mentioned in <5 git commits in last 90 days, OR
  - Referenced by <3 issues created in last 90 days

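The "low reference frequency" rule combines two counts with OR. The sketch below shows one way to express it; how `bd` gathers these counts internally is not described in this guide, so both function names and the `git log` query are assumptions.

```bash
#!/bin/bash
# Sketch only: how bd counts references internally is not documented here.

# Assumed way to count recent commits mentioning an issue ID (run inside the repo).
# Note: a plain substring match on "bd-42" also matches "bd-420".
count_commit_mentions() {
    git log --since="90 days ago" --oneline --fixed-strings --grep="$1" \
        | wc -l | tr -d ' '
}

# Succeeds when either threshold from the list above is met.
# Args: commit mentions, referencing issues (both counted over the last 90 days)
tier2_low_reference() {
    local commit_mentions=$1 issue_refs=$2
    [ "$commit_mentions" -lt 5 ] || [ "$issue_refs" -lt 3 ]
}

# Usage (inside a repository):
# tier2_low_reference "$(count_commit_mentions bd-42)" 1 && echo "rarely referenced"
```

Because the two conditions are joined with OR, an issue with many recent commit mentions still qualifies if fewer than three recent issues reference it.
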
## Configuration

### API Key Setup

**Option 1: Environment variable (recommended)**

```bash
export ANTHROPIC_API_KEY="sk-ant-..."
```

Add to your shell profile (`~/.zshrc`, `~/.bashrc`, etc.) for persistence.

**Option 2: CI/CD environments**

```yaml
# GitHub Actions
env:
  ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}

# GitLab CI
variables:
  ANTHROPIC_API_KEY: $ANTHROPIC_API_KEY
```

### Parallel Processing

Control performance vs. API rate limits:

```bash
# Default: 5 workers, 10 issues per batch
bd compact --all

# High throughput (watch rate limits!)
bd compact --all --workers 20 --batch-size 50

# Conservative (avoid rate limits)
bd compact --all --workers 2 --batch-size 5
```

## Cost Analysis

### Pricing Basics

Compaction uses Claude Haiku (~$1 per 1M input tokens, ~$5 per 1M output tokens).

Typical issue:
- Input: ~500 tokens (issue content)
- Output: ~100 tokens (summary)
- Cost per issue: ~$0.001 (0.1¢), i.e. about $0.0005 for input plus $0.0005 for output

### Cost Examples

| Issues | Est. Cost | Time (5 workers) |
|--------|-----------|------------------|
| 100    | $0.10     | ~2 minutes       |
| 1,000  | $1.00     | ~20 minutes      |
| 10,000 | $10.00    | ~3 hours         |

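The estimates above follow directly from the per-issue token counts and the approximate prices. A minimal sketch of that arithmetic, assuming a hypothetical helper `estimate_cost` (not a `bd` command) and the typical figures quoted above:

```bash
#!/bin/bash
# Hypothetical helper (not part of bd): estimated cost in USD for compacting
# $1 issues, using the typical token counts and approximate prices above.
estimate_cost() {
    local issues=$1
    awk -v n="$issues" 'BEGIN {
        input_tokens = 500;  output_tokens = 100   # tokens per issue (typical)
        input_rate   = 1.0;  output_rate   = 5.0   # USD per 1M tokens (approximate)
        per_issue = (input_tokens * input_rate + output_tokens * output_rate) / 1000000
        printf "%.2f\n", n * per_issue
    }'
}

for n in 100 1000 10000; do
    printf '%6d issues: $%s\n' "$n" "$(estimate_cost "$n")"
done
```

Actual cost depends on how long your issues are and on current pricing, so treat the output as a rough estimate and compare it with the "Estimated cost" line from `bd compact --stats`.
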
### Monthly Cost Estimate

If you close 50 issues/month and compact monthly:
- **Monthly cost**: $0.05
- **Annual cost**: $0.60

Even large teams (500 issues/month) pay ~$6/year.

### Space Savings

| Database Size | Issues  | After Tier 1  | After Tier 2  |
|---------------|---------|---------------|---------------|
| 10 MB         | 2,000   | 3 MB (-70%)   | 1 MB (-90%)   |
| 100 MB        | 20,000  | 30 MB (-70%)  | 10 MB (-90%)  |
| 1 GB          | 200,000 | 300 MB (-70%) | 100 MB (-90%) |

## Automation

### Monthly Cron Job

```bash
#!/bin/bash
# /etc/cron.monthly/bd-compact.sh

export ANTHROPIC_API_KEY="sk-ant-..."
cd /path/to/your/repo || exit 1

# Compact Tier 1
bd compact --all 2>&1 | tee -a ~/.bd-compact.log

# Commit results (skip the commit when nothing changed)
git add .beads/issues.jsonl issues.db
if ! git diff --cached --quiet; then
    git commit -m "Monthly compaction: $(date +%Y-%m)"
    git push
fi
```

Make executable:
```bash
chmod +x /etc/cron.monthly/bd-compact.sh
```

On Debian and Ubuntu, `run-parts` skips file names that contain a dot, so name the script `bd-compact` (without `.sh`) on those systems.

### Interactive Workflow Script

```bash
#!/bin/bash
# examples/compaction/workflow.sh

# Exit on error
set -e

echo "=== BD Compaction Workflow ==="
echo "Date: $(date)"
echo

# Check API key
if [ -z "$ANTHROPIC_API_KEY" ]; then
    echo "Error: ANTHROPIC_API_KEY not set"
    exit 1
fi

# Preview candidates
echo "--- Preview Tier 1 Candidates ---"
bd compact --dry-run --all

read -p "Proceed with Tier 1 compaction? (y/N) " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]; then
    echo "--- Running Tier 1 Compaction ---"
    bd compact --all
fi

# Preview Tier 2
echo
echo "--- Preview Tier 2 Candidates ---"
bd compact --dry-run --all --tier 2

read -p "Proceed with Tier 2 compaction? (y/N) " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]; then
    echo "--- Running Tier 2 Compaction ---"
    bd compact --all --tier 2
fi

# Show stats
echo
echo "--- Final Statistics ---"
bd compact --stats

echo
echo "=== Compaction Complete ==="
```

### Pre-commit Hook (Automatic)

```bash
#!/bin/bash
# .git/hooks/pre-commit

# Auto-compact before each commit (optional, experimental)
if command -v bd &> /dev/null && [ -n "$ANTHROPIC_API_KEY" ]; then
    # Count eligible issues; fall back to 0 if bd or jq fails
    ELIGIBLE=$(bd compact --dry-run --all --json 2>/dev/null | jq 'length' 2>/dev/null)
    ELIGIBLE=${ELIGIBLE:-0}
    # Only compact if >10 eligible issues
    if [ "$ELIGIBLE" -gt 10 ]; then
        echo "Auto-compacting $ELIGIBLE eligible issues..."
        bd compact --all
    fi
fi
```

## Safety & Recovery

### Snapshots

Every compaction creates a snapshot in the `compaction_snapshots` table:

```sql
CREATE TABLE compaction_snapshots (
    id INTEGER PRIMARY KEY,
    issue_id TEXT NOT NULL,
    tier INTEGER NOT NULL,
    snapshot_data TEXT NOT NULL,  -- Full JSON of original issue
    created_at DATETIME NOT NULL
);
```

### Restore Process

```bash
# Restore single issue
bd compact --restore bd-42

# Verify restoration
bd show bd-42  # Should show original content
```

### Verification

After compaction, verify with:

```bash
# Check compaction stats
bd compact --stats

# Spot-check compacted issues
bd show bd-42

# Verify snapshots exist
sqlite3 issues.db "SELECT COUNT(*) FROM compaction_snapshots;"
```

## Troubleshooting

### "ANTHROPIC_API_KEY not set"

```bash
export ANTHROPIC_API_KEY="sk-ant-..."
# Add to ~/.zshrc or ~/.bashrc for persistence
```

### Rate Limit Errors

Reduce parallelism:
```bash
bd compact --all --workers 2 --batch-size 5
```

Built-in delays between batches are not available yet (future enhancement).

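Until delays are built in, one workaround is to wrap the command in a retry loop with exponential backoff. This is a sketch under assumptions: `retry_with_backoff` is a hypothetical shell function, not a `bd` feature, and it assumes `bd compact` exits non-zero when it hits a rate limit.

```bash
#!/bin/bash
# Hypothetical wrapper (not a bd feature): retry a command, doubling the
# wait after each failed attempt.
# Usage: retry_with_backoff <max_attempts> <base_delay_seconds> <command...>
retry_with_backoff() {
    local max_attempts=$1 delay=$2 attempt=1
    shift 2
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed; retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
        delay=$((delay * 2))
    done
}

# Example (assumes bd exits non-zero on a rate-limit error):
# retry_with_backoff 4 30 bd compact --all --workers 2 --batch-size 5
```

Since already-compacted issues are not eligible again, re-running `bd compact --all` after a partial failure should only pick up the issues that were not processed.
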
### Issue Not Eligible

Check eligibility:
```bash
bd compact --dry-run --id bd-42
```

Force compact (if you know what you're doing):
```bash
bd compact --id bd-42 --force
```

### Restore Failed

Snapshots are stored in SQLite. If restore fails, manually query:

```bash
sqlite3 issues.db "SELECT snapshot_data FROM compaction_snapshots WHERE issue_id='bd-42' ORDER BY created_at DESC LIMIT 1;"
```

## FAQ

### When should I compact?

- **Small projects (<500 issues)**: Rarely needed, maybe annually
- **Medium projects (500-5000 issues)**: Every 3-6 months
- **Large projects (5000+ issues)**: Monthly or quarterly
- **High-velocity teams**: Set up automated monthly compaction

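If you want to script this rule of thumb, for example to print a reminder in a maintenance job, the size bands map onto a small function. This is only a sketch: `recommend_cadence` is a hypothetical name, and treating exactly 5000 issues as "large" is an assumption, since the bands above overlap at that value.

```bash
#!/bin/bash
# Hypothetical helper (not part of bd): map an issue count to the
# compaction cadence suggested in the list above.
recommend_cadence() {
    local issues=$1
    if [ "$issues" -lt 500 ]; then
        echo "annually"
    elif [ "$issues" -lt 5000 ]; then
        echo "every 3-6 months"
    else
        echo "monthly or quarterly"   # assumption: 5000 counts as large
    fi
}

recommend_cadence 2438   # prints: every 3-6 months
```
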
### Can I restore compacted issues?

**Yes!** Full snapshots are stored. Use `bd compact --restore <id>` anytime.

### What happens to dependencies?

Dependencies are preserved. Compaction only affects the issue's text fields (description, design, notes, acceptance criteria).

### Does compaction affect git history?

No. Old versions of issues remain in git history. Compaction only affects the current state in `.beads/issues.jsonl` and `issues.db`.

### Should I commit compacted issues?

**Yes.** Compaction modifies both the database and JSONL. Commit and push:

```bash
git add .beads/issues.jsonl issues.db
git commit -m "Compact old closed issues"
git push
```

### What if my team disagrees on compaction frequency?

Use `bd compact --dry-run` to preview. Discuss the candidates before running. You can always restore if someone needs the original.

### Can I compact open issues?

No. Compaction only works on closed issues to ensure active work retains full detail.

### How does Tier 2 decide "rarely referenced"?

It checks:
1. Git commits mentioning the issue ID in last 90 days
2. Other issues referencing it in descriptions/notes

If references are low (< 5 commits or < 3 issues), it's eligible for Tier 2.

### Does compaction slow down queries?

No. Compaction reduces database size, making queries faster. Agents benefit from smaller context when reading issues.

### Can I customize the summarization prompt?

Not yet, but it's planned (bd-264). The current prompt is optimized for preserving key decisions and outcomes.

## Best Practices

1. **Start with dry-run**: Always preview before compacting
2. **Compact regularly**: Monthly or quarterly depending on project size
3. **Monitor costs**: Use `bd compact --stats` to track estimated cost and space saved
4. **Automate it**: Set up cron jobs for hands-off maintenance
5. **Check snapshots**: Periodically verify snapshots are being created
6. **Commit results**: Always commit and push after compaction
7. **Team communication**: Let the team know before large compaction runs

## Examples

See [examples/compaction/](examples/compaction/) for:
- `workflow.sh` - Interactive compaction workflow
- `cron-compact.sh` - Automated monthly compaction
- `auto-compact.sh` - Smart auto-compaction with thresholds

## Related Documentation

- [README.md](README.md) - Quick start and overview
- [EXTENDING.md](EXTENDING.md) - Database schema and extensions
- [GIT_WORKFLOW.md](GIT_WORKFLOW.md) - Multi-machine collaboration

## Contributing

Found a bug or have ideas for improving compaction? Open an issue or PR!

Common enhancement requests:
- Custom summarization prompts (bd-264)
- Alternative LLM backends (local models)
- Configurable eligibility rules
- Batch restore operations
- Compaction analytics dashboard