Condense COMPACTION.md into README and make README more succinct

Steve Yegge
2025-10-16 15:22:44 -07:00
parent 1eb59fa120
commit a7a4600b31
3 changed files with 22 additions and 647 deletions


@@ -81,8 +81,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Community
- Merged PR #31: Windows Defender mitigation for export
- Merged PR #37: Fix NULL handling in statistics
- Merged PR #38: Nix flake for declarative builds
- Merged PR #40: MCP integration test fixes
- Merged PR #45: Label and title filtering for bd list
- Merged PR #46: Add --format flag to bd list
- Merged PR #47: Error handling consistency
- Merged PR #48: Cyclomatic complexity reduction

COMPACTION.md

@@ -1,451 +0,0 @@
# Database Compaction Guide
## Overview
Beads compaction is **agentic memory decay** - your database naturally forgets fine-grained details of old work while preserving the essential context agents need. This keeps your database lightweight and fast, even after thousands of issues.
### Key Concepts
- **Semantic compression**: Claude Haiku summarizes issues intelligently, preserving decisions and outcomes
- **Two-tier system**: Gradual decay from full detail → summary → ultra-brief
- **Permanent decay**: Original content is discarded to save space (not reversible)
- **Safe by design**: Dry-run preview, eligibility checks, git history preserves old versions
## How It Works
### Tier 1: Semantic Compression (30+ days)
**Target**: Closed issues 30+ days old with no open dependents
**Process**:
1. Check eligibility (closed, 30+ days, no blockers)
2. Send to Claude Haiku for summarization
3. Replace verbose fields with concise summary
4. Store original size for statistics
**Result**: 70-80% space reduction
**Example**:
*Before (856 bytes):*
```
Title: Fix authentication race condition in login flow
Description: Users report intermittent 401 errors during concurrent
login attempts. The issue occurs when multiple requests hit the auth
middleware simultaneously...
Design: [15 lines of implementation details]
Acceptance Criteria: [8 test scenarios]
Notes: [debugging session notes]
```
*After (171 bytes):*
```
Title: Fix authentication race condition in login flow
Description: Fixed race condition in auth middleware causing 401s
during concurrent logins. Added mutex locks and updated tests.
Resolution: Deployed in v1.2.3.
```
### Tier 2: Ultra Compression (90+ days)
**Target**: Tier 1 issues 90+ days old, rarely referenced
**Process**:
1. Verify existing Tier 1 compaction
2. Check reference frequency (git commits, other issues)
3. Ultra-compress to single paragraph
4. Optionally prune events (keep created/closed only)
**Result**: 90-95% space reduction
**Example**:
*After Tier 2 (43 bytes):*
```
Description: Auth race condition fixed, deployed v1.2.3.
```
## CLI Reference
### Preview Candidates
```bash
# See what would be compacted
bd compact --dry-run --all
# Check Tier 2 candidates
bd compact --dry-run --all --tier 2
# Preview specific issue
bd compact --dry-run --id bd-42
```
### Compact Issues
```bash
# Compact all eligible issues (Tier 1)
bd compact --all
# Compact specific issue
bd compact --id bd-42
# Force compact (bypass checks - use with caution)
bd compact --id bd-42 --force
# Tier 2 ultra-compression
bd compact --all --tier 2
# Control parallelism
bd compact --all --workers 10 --batch-size 20
```
### Statistics & Monitoring
```bash
# Show compaction stats
bd compact --stats
# Output:
# Total issues: 2,438
# Compacted: 847 (34.7%)
# Tier 1: 812 issues
# Tier 2: 35 issues
# Space saved: 1.2 MB (68% reduction)
# Estimated cost: $0.85
```
## Eligibility Rules
### Tier 1 Eligibility
- ✅ Status: `closed`
- ✅ Age: 30+ days since `closed_at`
- ✅ Dependents: No open issues depending on this one
- ✅ Not already compacted
### Tier 2 Eligibility
- ✅ Already Tier 1 compacted
- ✅ Age: 90+ days since `closed_at`
- ✅ Low reference frequency:
  - Mentioned in <5 git commits in last 90 days, OR
  - Referenced by <3 issues created in last 90 days
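The age thresholds above can be sketched as plain shell date arithmetic. This is a hypothetical illustration, not bd's actual implementation, and it assumes GNU `date`:

```shell
# Days elapsed between two ISO dates (GNU date).
days_between() {
  local a b
  a=$(date -ud "$1" +%s)
  b=$(date -ud "$2" +%s)
  echo $(( (b - a) / 86400 ))
}

# Tier 1 requires 30+ days since closed_at; Tier 2 requires 90+.
tier_age() {
  local age
  age=$(days_between "$1" "$2")
  if [ "$age" -ge 90 ]; then echo "tier2"
  elif [ "$age" -ge 30 ]; then echo "tier1"
  else echo "none"
  fi
}

tier_age 2025-01-01 2025-03-01   # 59 days since close → tier1
```

The real eligibility check also consults dependents and compaction state; this sketch covers only the date math.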
## Configuration
### API Key Setup
**Option 1: Environment variable (recommended)**
```bash
export ANTHROPIC_API_KEY="sk-ant-..."
```
Add to your shell profile (`~/.zshrc`, `~/.bashrc`, etc.) for persistence.
**Option 2: CI/CD environments**
```yaml
# GitHub Actions
env:
  ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}

# GitLab CI
variables:
  ANTHROPIC_API_KEY: $ANTHROPIC_API_KEY
```
### Parallel Processing
Control performance vs. API rate limits:
```bash
# Default: 5 workers, 10 issues per batch
bd compact --all
# High throughput (watch rate limits!)
bd compact --all --workers 20 --batch-size 50
# Conservative (avoid rate limits)
bd compact --all --workers 2 --batch-size 5
```
## Cost Analysis
### Pricing Basics
Compaction uses Claude Haiku (~$1 per 1M input tokens, ~$5 per 1M output tokens).
Typical issue:
- Input: ~500 tokens (issue content)
- Output: ~100 tokens (summary)
- Cost per issue: ~$0.001 (0.1¢)
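Those per-issue numbers multiply out as follows; this is just a sanity-check calculation using the token counts and pricing quoted above:

```shell
# ~500 input tokens at $1/1M plus ~100 output tokens at $5/1M
per_issue=$(awk 'BEGIN { printf "%.4f", 500/1e6*1.0 + 100/1e6*5.0 }')
echo "per issue: \$$per_issue"    # → per issue: $0.0010
# Scale to a batch of 1,000 compacted issues
echo "per 1000 issues: \$$(awk -v c="$per_issue" 'BEGIN { printf "%.2f", c*1000 }')"
```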
### Cost Examples
| Issues | Est. Cost | Time (5 workers) |
|--------|-----------|------------------|
| 100 | $0.10 | ~2 minutes |
| 1,000 | $1.00 | ~20 minutes |
| 10,000 | $10.00 | ~3 hours |
### Monthly Cost Estimate
If you close 50 issues/month and compact monthly:
- **Monthly cost**: $0.05
- **Annual cost**: $0.60
Even large teams (500 issues/month) pay ~$6/year.
### Space Savings
| Database Size | Issues  | After Tier 1  | After Tier 2  |
|---------------|---------|---------------|---------------|
| 10 MB         | 2,000   | 3 MB (-70%)   | 1 MB (-90%)   |
| 100 MB        | 20,000  | 30 MB (-70%)  | 10 MB (-90%)  |
| 1 GB          | 200,000 | 300 MB (-70%) | 100 MB (-90%) |
## Automation
### Monthly Cron Job
```bash
#!/bin/bash
# /etc/cron.monthly/bd-compact.sh
export ANTHROPIC_API_KEY="sk-ant-..."
cd /path/to/your/repo
# Compact Tier 1
bd compact --all 2>&1 | tee -a ~/.bd-compact.log
# Commit results
git add .beads/issues.jsonl issues.db
git commit -m "Monthly compaction: $(date +%Y-%m)"
git push
```
Make executable:
```bash
chmod +x /etc/cron.monthly/bd-compact.sh
```
### Automated Workflow Script
```bash
#!/bin/bash
# examples/compaction/workflow.sh
# Exit on error
set -e
echo "=== BD Compaction Workflow ==="
echo "Date: $(date)"
echo
# Check API key
if [ -z "$ANTHROPIC_API_KEY" ]; then
  echo "Error: ANTHROPIC_API_KEY not set"
  exit 1
fi

# Preview candidates
echo "--- Preview Tier 1 Candidates ---"
bd compact --dry-run --all

read -p "Proceed with Tier 1 compaction? (y/N) " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]; then
  echo "--- Running Tier 1 Compaction ---"
  bd compact --all
fi

# Preview Tier 2
echo
echo "--- Preview Tier 2 Candidates ---"
bd compact --dry-run --all --tier 2

read -p "Proceed with Tier 2 compaction? (y/N) " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]; then
  echo "--- Running Tier 2 Compaction ---"
  bd compact --all --tier 2
fi
# Show stats
echo
echo "--- Final Statistics ---"
bd compact --stats
echo
echo "=== Compaction Complete ==="
```
### Pre-commit Hook (Automatic)
```bash
#!/bin/bash
# .git/hooks/pre-commit
# Auto-compact before each commit (optional, experimental)
if command -v bd &> /dev/null && [ -n "$ANTHROPIC_API_KEY" ]; then
  # Only compact if >10 eligible issues
  ELIGIBLE=$(bd compact --dry-run --all --json 2>/dev/null | jq 'length')
  if [ "$ELIGIBLE" -gt 10 ]; then
    echo "Auto-compacting $ELIGIBLE eligible issues..."
    bd compact --all
  fi
fi
```
## Safety & Recovery
### Git History
Compaction is permanent - the original content is discarded to save space. However, you can recover old versions from git history:
```bash
# View issue before compaction
git log -p -- .beads/issues.jsonl | grep -A 50 "bd-42"
# Checkout old version
git checkout <commit-hash> -- .beads/issues.jsonl
# Or use git show
git show <commit-hash>:.beads/issues.jsonl | grep -A 50 "bd-42"
```
### Verification
After compaction, verify with:
```bash
# Check compaction stats
bd compact --stats
# Spot-check compacted issues
bd show bd-42
```
## Troubleshooting
### "ANTHROPIC_API_KEY not set"
```bash
export ANTHROPIC_API_KEY="sk-ant-..."
# Add to ~/.zshrc or ~/.bashrc for persistence
```
### Rate Limit Errors
Reduce parallelism:
```bash
bd compact --all --workers 2 --batch-size 5
```
Or add delays between batches (future enhancement).
### Issue Not Eligible
Check eligibility:
```bash
bd compact --dry-run --id bd-42
```
Force compact (if you know what you're doing):
```bash
bd compact --id bd-42 --force
```
## FAQ
### When should I compact?
- **Small projects (<500 issues)**: Rarely needed, maybe annually
- **Medium projects (500-5000 issues)**: Every 3-6 months
- **Large projects (5000+ issues)**: Monthly or quarterly
- **High-velocity teams**: Set up automated monthly compaction
### Can I recover compacted issues?
Compaction is permanent, but you can recover from git history:
```bash
git log -p -- .beads/issues.jsonl | grep -A 50 "bd-42"
```
### What happens to dependencies?
Dependencies are preserved. Compaction only affects the issue's text fields (description, design, notes, acceptance criteria).
### Does compaction affect git history?
No. Old versions of issues remain in git history. Compaction only affects the current state in `.beads/issues.jsonl` and `issues.db`.
### Should I commit compacted issues?
**Yes.** Compaction modifies both the database and JSONL. Commit and push:
```bash
git add .beads/issues.jsonl issues.db
git commit -m "Compact old closed issues"
git push
```
### What if my team disagrees on compaction frequency?
Use `bd compact --dry-run` to preview. Discuss the candidates before running. Since compaction is permanent, get team consensus first.
### Can I compact open issues?
No. Compaction only works on closed issues to ensure active work retains full detail.
### How does Tier 2 decide "rarely referenced"?
It checks:
1. Git commits mentioning the issue ID in last 90 days
2. Other issues referencing it in descriptions/notes
If references are low (< 5 commits or < 3 issues), it's eligible for Tier 2.
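That threshold test is easy to express on its own. In this hypothetical sketch the commit and issue counts are assumed to come from `git log` and the issue database; only the thresholds are taken from the rule above:

```shell
# Tier 2 "rarely referenced" test: eligible if mentioned in fewer
# than 5 commits OR referenced by fewer than 3 issues (last 90 days).
is_rarely_referenced() {
  local commits=$1 issues=$2
  if [ "$commits" -lt 5 ] || [ "$issues" -lt 3 ]; then
    echo "eligible"
  else
    echo "ineligible"
  fi
}

is_rarely_referenced 2 1   # → eligible
is_rarely_referenced 6 4   # → ineligible
```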
### Does compaction slow down queries?
No. Compaction reduces database size, making queries faster. Agents benefit from smaller context when reading issues.
### Can I customize the summarization prompt?
Not yet, but it's planned (bd-264). The current prompt is optimized for preserving key decisions and outcomes.
## Best Practices
1. **Start with dry-run**: Always preview before compacting
2. **Compact regularly**: Monthly or quarterly depending on project size
3. **Monitor costs**: Use `bd compact --stats` to track savings
4. **Automate it**: Set up cron jobs for hands-off maintenance
5. **Commit results**: Always commit and push after compaction
6. **Team communication**: Let team know before large compaction runs (it's permanent!)
## Examples
See [examples/compaction/](examples/compaction/) for:
- `workflow.sh` - Interactive compaction workflow
- `cron-compact.sh` - Automated monthly compaction
- `auto-compact.sh` - Smart auto-compaction with thresholds
## Related Documentation
- [README.md](README.md) - Quick start and overview
- [EXTENDING.md](EXTENDING.md) - Database schema and extensions
- [GIT_WORKFLOW.md](GIT_WORKFLOW.md) - Multi-machine collaboration
## Contributing
Found a bug or have ideas for improving compaction? Open an issue or PR!
Common enhancement requests:
- Custom summarization prompts (bd-264)
- Alternative LLM backends (local models)
- Configurable eligibility rules
- Compaction analytics dashboard
- Optional snapshot retention for restore (if requested)

README.md

@@ -281,87 +281,7 @@ Options:
#### Creating Issues from Markdown
You can draft multiple issues in a markdown file and create them all at once. This is useful for planning features or converting written notes into tracked work.
Markdown format:
```markdown
## Issue Title
Optional description text here.
### Priority
1
### Type
feature
### Description
More detailed description (overrides text after title).
### Design
Design notes and implementation details.
### Acceptance Criteria
- Must do this
- Must do that
### Assignee
username
### Labels
label1, label2, label3
### Dependencies
bd-10, bd-20
```
Example markdown file (`auth-improvements.md`):
```markdown
## Add OAuth2 support
We need to support OAuth2 authentication.
### Priority
1
### Type
feature
### Assignee
alice
### Labels
auth, high-priority
## Add rate limiting
### Priority
0
### Type
bug
### Description
Auth endpoints are vulnerable to brute force attacks.
### Labels
security, urgent
```
Create all issues:
```bash
bd create -f auth-improvements.md
# ✓ Created 2 issues from auth-improvements.md:
# bd-42: Add OAuth2 support [P1, feature]
# bd-43: Add rate limiting [P0, bug]
```
**Notes:**
- Each `## Heading` creates a new issue
- Sections (`### Priority`, `### Type`, etc.) are optional
- Defaults: Priority=2, Type=task
- Text immediately after the title becomes the description (unless overridden by `### Description`)
- All standard issue fields are supported
Draft multiple issues in a markdown file with `bd create -f file.md`. Format: `## Issue Title` creates a new issue, with optional sections `### Priority`, `### Type`, `### Description`, `### Assignee`, `### Labels`, `### Dependencies`. Defaults: Priority=2, Type=task.
### Viewing Issues
@@ -419,26 +339,7 @@ bd dep cycles
#### Cycle Prevention
beads maintains a directed acyclic graph (DAG) of dependencies and prevents cycles across **all** dependency types. This ensures:
- **Ready work is accurate**: Cycles can hide issues from `bd ready` by making them appear blocked when they're actually part of a circular dependency
- **Dependencies are clear**: Circular dependencies are semantically confusing (if A depends on B and B depends on A, which should be done first?)
- **Traversals work correctly**: Commands like `bd dep tree` rely on DAG structure
**Example - Prevented Cycle:**
```bash
bd dep add bd-1 bd-2 # bd-1 blocks on bd-2 ✓
bd dep add bd-2 bd-3 # bd-2 blocks on bd-3 ✓
bd dep add bd-3 bd-1 # ERROR: would create cycle bd-3 → bd-1 → bd-2 → bd-3 ✗
```
Cross-type cycles are also prevented:
```bash
bd dep add bd-1 bd-2 --type blocks # bd-1 blocks on bd-2 ✓
bd dep add bd-2 bd-1 --type parent-child # ERROR: would create cycle ✗
```
If you try to add a dependency that creates a cycle, you'll get a clear error message. After successfully adding dependencies, beads will warn you if any cycles are detected elsewhere in the graph.
Beads maintains a DAG and prevents cycles across all dependency types. Cycles break ready-work detection and tree traversals. Attempting to add a cycle-creating dependency returns an error.
### Finding Work
@@ -461,44 +362,26 @@ bd ready --json
### Compaction (Memory Decay)
Beads can semantically compress old closed issues to keep the database lightweight. This is agentic memory decay - the database naturally forgets details over time while preserving essential context.
Beads uses AI to compress old closed issues, keeping databases lightweight as they age. This is agentic memory decay - your database naturally forgets fine-grained details while preserving essential context agents need.
```bash
# Preview what would be compacted
bd compact --dry-run --all
# Show compaction statistics
bd compact --stats
# Compact all eligible issues (30+ days closed, no open dependents)
bd compact --all
# Compact specific issue
bd compact --id bd-42
# Force compact (bypass eligibility checks)
bd compact --id bd-42 --force
# Tier 2 ultra-compression (90+ days, 95% reduction)
bd compact --tier 2 --all
bd compact --dry-run --all # Preview candidates
bd compact --stats # Show statistics
bd compact --all # Compact eligible issues (30+ days closed)
bd compact --tier 2 --all # Ultra-compress (90+ days, rarely referenced)
```
Compaction uses Claude Haiku to semantically summarize issues:
- **Tier 1**: 70-80% space reduction (30+ days closed)
- **Tier 2**: 90-95% space reduction (90+ days closed, rarely referenced)
Uses Claude Haiku for semantic summarization. **Tier 1** (30+ days): 70-80% reduction. **Tier 2** (90+ days, low references): 90-95% reduction. Requires `ANTHROPIC_API_KEY`. Cost: ~$1 per 1,000 issues.
**Requirements:**
- Set `ANTHROPIC_API_KEY` environment variable
- Cost: ~$1 per 1,000 issues compacted (Haiku pricing)
Eligibility: Must be closed with no open dependents. Tier 2 requires low reference frequency (<5 commits or <3 issues in last 90 days).
**Eligibility:**
- Status: closed
- Tier 1: 30+ days since closed, no open dependents
- Tier 2: 90+ days since closed, rarely referenced in commits/issues
**Permanent:** Original content is discarded. Recover old versions from git history if needed.
**Note:** Compaction is permanent, graceful decay - original content is discarded to save space. Use git history to recover old versions if needed.
See [COMPACTION.md](COMPACTION.md) for detailed documentation, cost analysis, and automation examples.
**Automation:**
```bash
# Monthly cron
0 0 1 * * bd compact --all && git add .beads && git commit -m "Monthly compaction"
```
## Database Discovery
@@ -602,49 +485,15 @@ The `discovered-from` type is particularly useful for AI-supervised workflows, w
## AI Agent Integration
bd is designed to work seamlessly with AI coding agents:
```bash
# Agent discovers ready work
WORK=$(bd ready --limit 1 --json)
ISSUE_ID=$(echo $WORK | jq -r '.[0].id')
# Agent claims and starts work
bd update $ISSUE_ID --status in_progress --json
# Agent discovers new work while executing
bd create "Fix bug found in testing" -t bug -p 0 --json > new_issue.json
NEW_ID=$(cat new_issue.json | jq -r '.id')
bd dep add $NEW_ID $ISSUE_ID --type discovered-from
# Agent completes work
bd close $ISSUE_ID --reason "Implemented and tested" --json
```
The `--json` flag on every command makes bd perfect for programmatic workflows.
All commands support `--json` for programmatic use. Typical agent workflow: `bd ready --json` → `bd update --status in_progress` → `bd create` (discovered work) → `bd close`
## Ready Work Algorithm
An issue is "ready" if:
- Status is `open`
- It has NO open `blocks` dependencies
- All blockers are either closed or non-existent
Example:
```
bd-1 [open] ← blocks ← bd-2 [open] ← blocks ← bd-3 [open]
```
Ready work: `[bd-1]`
Blocked: `[bd-2, bd-3]`
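The rule reduces to a simple predicate. This sketch is illustrative only; the parameter names (`status`, count of open `blocks` dependencies) mirror the bullets above, not bd's internal schema:

```shell
# An issue is ready iff it is open and has zero open "blocks" dependencies.
is_ready() {
  local status=$1 open_blockers=$2
  if [ "$status" = "open" ] && [ "$open_blockers" -eq 0 ]; then
    echo "ready"
  else
    echo "not-ready"
  fi
}

is_ready open 0   # bd-1: no blockers     → ready
is_ready open 1   # bd-2: blocked by bd-1 → not-ready
```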
Issue is "ready" if status is `open` and it has no open `blocks` dependencies.
## Issue Lifecycle
```
open → in_progress → closed
blocked (manually set, or has open blockers)
```
`open → in_progress → closed` (or `blocked` if it has open blockers)
## Architecture
@@ -706,36 +555,11 @@ This pattern enables powerful integrations while keeping bd simple and focused.
## Why bd?
**bd is designed for AI coding agents, not humans.**
Traditional issue trackers (Jira, GitHub Issues, Linear) assume humans are the primary users. Humans click through web UIs, drag cards on boards, and manually update status.
bd assumes **AI agents are the primary users**, with humans supervising:
- **Agents discover work** - `bd ready --json` gives agents unblocked tasks to execute
- **Dependencies prevent wasted work** - Agents don't duplicate effort or work on blocked tasks
- **Discovery during execution** - Agents create issues for work they discover while executing, linked with `discovered-from`
- **Agents lose focus** - Long-running conversations can forget tasks; bd remembers everything
- **Humans supervise** - Check on progress with `bd list` and `bd dep tree`, but don't micromanage
In human-managed workflows, issues are planning artifacts. In agent-managed workflows, **issues are memory** - preventing agents from forgetting tasks during long coding sessions.
Traditional issue trackers were built for human project managers. bd is built for autonomous agents.
**bd is designed for AI agents**, not humans. Traditional trackers (Jira, GitHub) require web UIs. bd provides `--json` on all commands, explicit dependency types, and `bd ready` for unblocked-work detection. In agent workflows, issues are **memory** - preventing agents from forgetting tasks during long sessions.
## Architecture: JSONL + SQLite
bd uses a dual-storage approach:
- **JSONL files** (`.beads/issues.jsonl`) - Source of truth, committed to git
- **SQLite database** (`.beads/*.db`) - Ephemeral cache for fast queries, gitignored
This gives you:
- ✅ **Git-friendly storage** - Text diffs, AI-resolvable conflicts
- ✅ **Fast queries** - SQLite indexes for dependency graphs
- ✅ **Automatic sync** - Auto-export after CRUD ops, auto-import after pulls
- ✅ **No daemon required** - In-process SQLite, ~10-100ms per command
When you run `bd create`, it writes to SQLite. After 5 seconds of inactivity, changes automatically export to JSONL. After `git pull`, the next bd command automatically imports if JSONL is newer. No manual steps needed!
**JSONL** (`.beads/issues.jsonl`) is the source of truth, committed to git. **SQLite** (`.beads/*.db`) is an ephemeral cache for fast queries, gitignored. Auto-export after CRUD (5s debounce), auto-import after `git pull`. No manual sync needed.
## Export/Import (JSONL Format)