Files
beads/COMPACTION.md
Steve Yegge 1cc37d9acf Remove all restore/snapshot references from compaction
- Removed restore command from bd show output
- Updated compact help text (removed snapshot claim)
- Fixed COMPACTION.md (removed 'batch restore' from roadmap)
- All compaction UI now correctly states permanent decay
2025-10-16 01:13:08 -07:00

452 lines
11 KiB
Markdown

# Database Compaction Guide
## Overview
Beads compaction is **agentic memory decay** - your database naturally forgets fine-grained details of old work while preserving the essential context agents need. This keeps your database lightweight and fast, even after thousands of issues.
### Key Concepts
- **Semantic compression**: Claude Haiku summarizes issues intelligently, preserving decisions and outcomes
- **Two-tier system**: Gradual decay from full detail → summary → ultra-brief
- **Permanent decay**: Original content is discarded to save space (not reversible)
- **Safe by design**: Dry-run preview, eligibility checks, git history preserves old versions
## How It Works
### Tier 1: Semantic Compression (30+ days)
**Target**: Closed issues 30+ days old with no open dependents
**Process**:
1. Check eligibility (closed, 30+ days, no blockers)
2. Send to Claude Haiku for summarization
3. Replace verbose fields with concise summary
4. Store original size for statistics
**Result**: 70-80% space reduction
**Example**:
*Before (856 bytes):*
```
Title: Fix authentication race condition in login flow
Description: Users report intermittent 401 errors during concurrent
login attempts. The issue occurs when multiple requests hit the auth
middleware simultaneously...
Design: [15 lines of implementation details]
Acceptance Criteria: [8 test scenarios]
Notes: [debugging session notes]
```
*After (171 bytes):*
```
Title: Fix authentication race condition in login flow
Description: Fixed race condition in auth middleware causing 401s
during concurrent logins. Added mutex locks and updated tests.
Resolution: Deployed in v1.2.3.
```
### Tier 2: Ultra Compression (90+ days)
**Target**: Tier 1 issues 90+ days old, rarely referenced
**Process**:
1. Verify existing Tier 1 compaction
2. Check reference frequency (git commits, other issues)
3. Ultra-compress to single paragraph
4. Optionally prune events (keep created/closed only)
**Result**: 90-95% space reduction
**Example**:
*After Tier 2 (43 bytes):*
```
Description: Auth race condition fixed, deployed v1.2.3.
```
## CLI Reference
### Preview Candidates
```bash
# See what would be compacted
bd compact --dry-run --all
# Check Tier 2 candidates
bd compact --dry-run --all --tier 2
# Preview specific issue
bd compact --dry-run --id bd-42
```
### Compact Issues
```bash
# Compact all eligible issues (Tier 1)
bd compact --all
# Compact specific issue
bd compact --id bd-42
# Force compact (bypass checks - use with caution)
bd compact --id bd-42 --force
# Tier 2 ultra-compression
bd compact --all --tier 2
# Control parallelism
bd compact --all --workers 10 --batch-size 20
```
### Statistics & Monitoring
```bash
# Show compaction stats
bd compact --stats
# Output:
# Total issues: 2,438
# Compacted: 847 (34.7%)
# Tier 1: 812 issues
# Tier 2: 35 issues
# Space saved: 1.2 MB (68% reduction)
# Estimated cost: $0.85
```
## Eligibility Rules
### Tier 1 Eligibility
- ✅ Status: `closed`
- ✅ Age: 30+ days since `closed_at`
- ✅ Dependents: No open issues depending on this one
- ✅ Not already compacted
### Tier 2 Eligibility
- ✅ Already Tier 1 compacted
- ✅ Age: 90+ days since `closed_at`
- ✅ Low reference frequency:
- Mentioned in <5 git commits in last 90 days, OR
- Referenced by <3 issues created in last 90 days
## Configuration
### API Key Setup
**Option 1: Environment variable (recommended)**
```bash
export ANTHROPIC_API_KEY="sk-ant-..."
```
Add to your shell profile (`~/.zshrc`, `~/.bashrc`, etc.) for persistence.
**Option 2: CI/CD environments**
```yaml
# GitHub Actions
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
# GitLab CI
variables:
ANTHROPIC_API_KEY: $ANTHROPIC_API_KEY
```
### Parallel Processing
Control performance vs. API rate limits:
```bash
# Default: 5 workers, 10 issues per batch
bd compact --all
# High throughput (watch rate limits!)
bd compact --all --workers 20 --batch-size 50
# Conservative (avoid rate limits)
bd compact --all --workers 2 --batch-size 5
```
## Cost Analysis
### Pricing Basics
Compaction uses Claude Haiku (~$1 per 1M input tokens, ~$5 per 1M output tokens).
Typical issue:
- Input: ~500 tokens (issue content)
- Output: ~100 tokens (summary)
- Cost per issue: ~$0.001 (0.1¢)
### Cost Examples
| Issues | Est. Cost | Time (5 workers) |
|--------|-----------|------------------|
| 100 | $0.10 | ~2 minutes |
| 1,000 | $1.00 | ~20 minutes |
| 10,000 | $10.00 | ~3 hours |
### Monthly Cost Estimate
If you close 50 issues/month and compact monthly:
- **Monthly cost**: $0.05
- **Annual cost**: $0.60
Even large teams (500 issues/month) pay ~$6/year.
### Space Savings
| Database Size | Issues | After Tier 1 | After Tier 2 |
|---------------|--------|--------------|--------------|
| 10 MB | 2,000 | 3 MB (-70%) | 1 MB (-90%) |
| 100 MB | 20,000 | 30 MB (-70%) | 10 MB (-90%) |
| 1 GB | 200,000| 300 MB (-70%)| 100 MB (-90%)|
## Automation
### Monthly Cron Job
```bash
#!/bin/bash
# /etc/cron.monthly/bd-compact.sh
export ANTHROPIC_API_KEY="sk-ant-..."
cd /path/to/your/repo
# Compact Tier 1
bd compact --all 2>&1 | tee -a ~/.bd-compact.log
# Commit results
git add .beads/issues.jsonl issues.db
git commit -m "Monthly compaction: $(date +%Y-%m)"
git push
```
Make executable:
```bash
chmod +x /etc/cron.monthly/bd-compact.sh
```
### Automated Workflow Script
```bash
#!/bin/bash
# examples/compaction/workflow.sh
# Exit on error
set -e
echo "=== BD Compaction Workflow ==="
echo "Date: $(date)"
echo
# Check API key
if [ -z "$ANTHROPIC_API_KEY" ]; then
echo "Error: ANTHROPIC_API_KEY not set"
exit 1
fi
# Preview candidates
echo "--- Preview Tier 1 Candidates ---"
bd compact --dry-run --all
read -p "Proceed with Tier 1 compaction? (y/N) " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]; then
echo "--- Running Tier 1 Compaction ---"
bd compact --all
fi
# Preview Tier 2
echo
echo "--- Preview Tier 2 Candidates ---"
bd compact --dry-run --all --tier 2
read -p "Proceed with Tier 2 compaction? (y/N) " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]; then
echo "--- Running Tier 2 Compaction ---"
bd compact --all --tier 2
fi
# Show stats
echo
echo "--- Final Statistics ---"
bd compact --stats
echo
echo "=== Compaction Complete ==="
```
### Pre-commit Hook (Automatic)
```bash
#!/bin/bash
# .git/hooks/pre-commit
# Auto-compact before each commit (optional, experimental)
if command -v bd &> /dev/null && [ -n "$ANTHROPIC_API_KEY" ]; then
bd compact --all --dry-run > /dev/null 2>&1
# Only compact if >10 eligible issues
ELIGIBLE=$(bd compact --dry-run --all --json 2>/dev/null | jq '. | length')
if [ "$ELIGIBLE" -gt 10 ]; then
echo "Auto-compacting $ELIGIBLE eligible issues..."
bd compact --all
fi
fi
```
## Safety & Recovery
### Git History
Compaction is permanent - the original content is discarded to save space. However, you can recover old versions from git history:
```bash
# View issue before compaction
git log -p -- .beads/issues.jsonl | grep -A 50 "bd-42"
# Checkout old version
git checkout <commit-hash> -- .beads/issues.jsonl
# Or use git show
git show <commit-hash>:.beads/issues.jsonl | grep -A 50 "bd-42"
```
### Verification
After compaction, verify with:
```bash
# Check compaction stats
bd compact --stats
# Spot-check compacted issues
bd show bd-42
```
## Troubleshooting
### "ANTHROPIC_API_KEY not set"
```bash
export ANTHROPIC_API_KEY="sk-ant-..."
# Add to ~/.zshrc or ~/.bashrc for persistence
```
### Rate Limit Errors
Reduce parallelism:
```bash
bd compact --all --workers 2 --batch-size 5
```
Or add delays between batches (future enhancement).
### Issue Not Eligible
Check eligibility:
```bash
bd compact --dry-run --id bd-42
```
Force compact (if you know what you're doing):
```bash
bd compact --id bd-42 --force
```
## FAQ
### When should I compact?
- **Small projects (<500 issues)**: Rarely needed, maybe annually
- **Medium projects (500-5000 issues)**: Every 3-6 months
- **Large projects (5000+ issues)**: Monthly or quarterly
- **High-velocity teams**: Set up automated monthly compaction
### Can I recover compacted issues?
Compaction is permanent, but you can recover from git history:
```bash
git log -p -- .beads/issues.jsonl | grep -A 50 "bd-42"
```
### What happens to dependencies?
Dependencies are preserved. Compaction only affects the issue's text fields (description, design, notes, acceptance criteria).
### Does compaction affect git history?
No. Old versions of issues remain in git history. Compaction only affects the current state in `.beads/issues.jsonl` and `issues.db`.
### Should I commit compacted issues?
**Yes.** Compaction modifies both the database and JSONL. Commit and push:
```bash
git add .beads/issues.jsonl issues.db
git commit -m "Compact old closed issues"
git push
```
### What if my team disagrees on compaction frequency?
Use `bd compact --dry-run` to preview. Discuss the candidates before running. Since compaction is permanent, get team consensus first.
### Can I compact open issues?
No. Compaction only works on closed issues to ensure active work retains full detail.
### How does Tier 2 decide "rarely referenced"?
It checks:
1. Git commits mentioning the issue ID in last 90 days
2. Other issues referencing it in descriptions/notes
If references are low (< 5 commits or < 3 issues), it's eligible for Tier 2.
### Does compaction slow down queries?
No. Compaction reduces database size, making queries faster. Agents benefit from smaller context when reading issues.
### Can I customize the summarization prompt?
Not yet, but it's planned (bd-264). The current prompt is optimized for preserving key decisions and outcomes.
## Best Practices
1. **Start with dry-run**: Always preview before compacting
2. **Compact regularly**: Monthly or quarterly depending on project size
3. **Monitor costs**: Use `bd compact --stats` to track savings
4. **Automate it**: Set up cron jobs for hands-off maintenance
5. **Commit results**: Always commit and push after compaction
6. **Team communication**: Let team know before large compaction runs (it's permanent!)
## Examples
See [examples/compaction/](examples/compaction/) for:
- `workflow.sh` - Interactive compaction workflow
- `cron-compact.sh` - Automated monthly compaction
- `auto-compact.sh` - Smart auto-compaction with thresholds
## Related Documentation
- [README.md](README.md) - Quick start and overview
- [EXTENDING.md](EXTENDING.md) - Database schema and extensions
- [GIT_WORKFLOW.md](GIT_WORKFLOW.md) - Multi-machine collaboration
## Contributing
Found a bug or have ideas for improving compaction? Open an issue or PR!
Common enhancement requests:
- Custom summarization prompts (bd-264)
- Alternative LLM backends (local models)
- Configurable eligibility rules
- Compaction analytics dashboard
- Optional snapshot retention for restore (if requested)