Condense COMPACTION.md into README and make README more succinct

This commit is contained in:
Steve Yegge
2025-10-16 15:22:44 -07:00
parent 1eb59fa120
commit a7a4600b31
3 changed files with 22 additions and 647 deletions


@@ -81,8 +81,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Community
- Merged PR #31: Windows Defender mitigation for export
- Merged PR #37: Fix NULL handling in statistics
- Merged PR #38: Nix flake for declarative builds
- Merged PR #40: MCP integration test fixes
- Merged PR #45: Label and title filtering for bd list
- Merged PR #46: Add --format flag to bd list
- Merged PR #47: Error handling consistency
- Merged PR #48: Cyclomatic complexity reduction


@@ -1,451 +0,0 @@
# Database Compaction Guide
## Overview
Beads compaction is **agentic memory decay** - your database naturally forgets fine-grained details of old work while preserving the essential context agents need. This keeps your database lightweight and fast, even after thousands of issues.
### Key Concepts
- **Semantic compression**: Claude Haiku summarizes issues intelligently, preserving decisions and outcomes
- **Two-tier system**: Gradual decay from full detail → summary → ultra-brief
- **Permanent decay**: Original content is discarded to save space (not reversible)
- **Safe by design**: Dry-run preview, eligibility checks, git history preserves old versions
## How It Works
### Tier 1: Semantic Compression (30+ days)
**Target**: Closed issues 30+ days old with no open dependents
**Process**:
1. Check eligibility (closed, 30+ days, no blockers)
2. Send to Claude Haiku for summarization
3. Replace verbose fields with concise summary
4. Store original size for statistics
**Result**: 70-80% space reduction
**Example**:
*Before (856 bytes):*
```
Title: Fix authentication race condition in login flow
Description: Users report intermittent 401 errors during concurrent
login attempts. The issue occurs when multiple requests hit the auth
middleware simultaneously...
Design: [15 lines of implementation details]
Acceptance Criteria: [8 test scenarios]
Notes: [debugging session notes]
```
*After (171 bytes):*
```
Title: Fix authentication race condition in login flow
Description: Fixed race condition in auth middleware causing 401s
during concurrent logins. Added mutex locks and updated tests.
Resolution: Deployed in v1.2.3.
```
### Tier 2: Ultra Compression (90+ days)
**Target**: Tier 1 issues 90+ days old, rarely referenced
**Process**:
1. Verify existing Tier 1 compaction
2. Check reference frequency (git commits, other issues)
3. Ultra-compress to single paragraph
4. Optionally prune events (keep created/closed only)
**Result**: 90-95% space reduction
**Example**:
*After Tier 2 (43 bytes):*
```
Description: Auth race condition fixed, deployed v1.2.3.
```
## CLI Reference
### Preview Candidates
```bash
# See what would be compacted
bd compact --dry-run --all
# Check Tier 2 candidates
bd compact --dry-run --all --tier 2
# Preview specific issue
bd compact --dry-run --id bd-42
```
### Compact Issues
```bash
# Compact all eligible issues (Tier 1)
bd compact --all
# Compact specific issue
bd compact --id bd-42
# Force compact (bypass checks - use with caution)
bd compact --id bd-42 --force
# Tier 2 ultra-compression
bd compact --all --tier 2
# Control parallelism
bd compact --all --workers 10 --batch-size 20
```
### Statistics & Monitoring
```bash
# Show compaction stats
bd compact --stats
# Output:
# Total issues: 2,438
# Compacted: 847 (34.7%)
# Tier 1: 812 issues
# Tier 2: 35 issues
# Space saved: 1.2 MB (68% reduction)
# Estimated cost: $0.85
```
## Eligibility Rules
### Tier 1 Eligibility
- ✅ Status: `closed`
- ✅ Age: 30+ days since `closed_at`
- ✅ Dependents: No open issues depending on this one
- ✅ Not already compacted
### Tier 2 Eligibility
- ✅ Already Tier 1 compacted
- ✅ Age: 90+ days since `closed_at`
- ✅ Low reference frequency:
- Mentioned in <5 git commits in last 90 days, OR
- Referenced by <3 issues created in last 90 days
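The Tier 1 age rule above can be sketched in plain shell. The `closed_at` value, the field handling, and GNU `date -d` are illustrative assumptions, not bd's actual implementation:

```shell
#!/bin/sh
# Hypothetical sketch of the Tier 1 age check (30+ days since closed_at).
closed_at="2024-01-01"                 # example closed_at date
now=$(date +%s)
closed=$(date -d "$closed_at" +%s)     # GNU date; BSD/macOS syntax differs
age_days=$(( (now - closed) / 86400 ))
if [ "$age_days" -ge 30 ]; then
  echo "tier1-eligible: closed $age_days days ago"
else
  echo "not yet eligible"
fi
```

The dependents check requires querying the issue graph, so `bd compact --dry-run` remains the authoritative way to test eligibility.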
## Configuration
### API Key Setup
**Option 1: Environment variable (recommended)**
```bash
export ANTHROPIC_API_KEY="sk-ant-..."
```
Add to your shell profile (`~/.zshrc`, `~/.bashrc`, etc.) for persistence.
**Option 2: CI/CD environments**
```yaml
# GitHub Actions
env:
  ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}

# GitLab CI
variables:
  ANTHROPIC_API_KEY: $ANTHROPIC_API_KEY
```
### Parallel Processing
Control performance vs. API rate limits:
```bash
# Default: 5 workers, 10 issues per batch
bd compact --all
# High throughput (watch rate limits!)
bd compact --all --workers 20 --batch-size 50
# Conservative (avoid rate limits)
bd compact --all --workers 2 --batch-size 5
```
## Cost Analysis
### Pricing Basics
Compaction uses Claude Haiku (~$1 per 1M input tokens, ~$5 per 1M output tokens).
Typical issue:
- Input: ~500 tokens (issue content)
- Output: ~100 tokens (summary)
- Cost per issue: ~$0.001 (0.1¢)
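Those numbers multiply out as follows, a back-of-envelope check using the approximate prices quoted above:

```shell
# ~500 input tokens at ~$1/1M plus ~100 output tokens at ~$5/1M
awk 'BEGIN {
  cost = 500 / 1000000 * 1.00 + 100 / 1000000 * 5.00
  printf "cost per issue: $%.4f\n", cost   # prints: cost per issue: $0.0010
}'
```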
### Cost Examples
| Issues | Est. Cost | Time (5 workers) |
|--------|-----------|------------------|
| 100 | $0.10 | ~2 minutes |
| 1,000 | $1.00 | ~20 minutes |
| 10,000 | $10.00 | ~3 hours |
### Monthly Cost Estimate
If you close 50 issues/month and compact monthly:
- **Monthly cost**: $0.05
- **Annual cost**: $0.60
Even large teams (500 issues/month) pay ~$6/year.
### Space Savings
| Database Size | Issues | After Tier 1 | After Tier 2 |
|---------------|--------|--------------|--------------|
| 10 MB | 2,000 | 3 MB (-70%) | 1 MB (-90%) |
| 100 MB | 20,000 | 30 MB (-70%) | 10 MB (-90%) |
| 1 GB | 200,000| 300 MB (-70%)| 100 MB (-90%)|
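The table rows follow directly from the stated reduction rates (30% of the original size remains after Tier 1, 10% after Tier 2, approximating 1 GB as 1000 MB); a quick sanity check:

```shell
# Remaining size = original * (1 - reduction), for 70% and 90% reductions.
awk 'BEGIN {
  split("10 100 1000", mb, " ")   # database sizes in MB
  for (i = 1; i <= 3; i++)
    printf "%4d MB -> Tier 1: %5.1f MB, Tier 2: %5.1f MB\n",
           mb[i], mb[i] * 0.30, mb[i] * 0.10
}'
```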
## Automation
### Monthly Cron Job
```bash
#!/bin/bash
# /etc/cron.monthly/bd-compact.sh
export ANTHROPIC_API_KEY="sk-ant-..."
cd /path/to/your/repo
# Compact Tier 1
bd compact --all 2>&1 | tee -a ~/.bd-compact.log
# Commit results
git add .beads/issues.jsonl issues.db
git commit -m "Monthly compaction: $(date +%Y-%m)"
git push
```
Make executable:
```bash
chmod +x /etc/cron.monthly/bd-compact.sh
```
### Automated Workflow Script
```bash
#!/bin/bash
# examples/compaction/workflow.sh
# Exit on error
set -e
echo "=== BD Compaction Workflow ==="
echo "Date: $(date)"
echo
# Check API key
if [ -z "$ANTHROPIC_API_KEY" ]; then
  echo "Error: ANTHROPIC_API_KEY not set"
  exit 1
fi
# Preview candidates
echo "--- Preview Tier 1 Candidates ---"
bd compact --dry-run --all
read -p "Proceed with Tier 1 compaction? (y/N) " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]; then
  echo "--- Running Tier 1 Compaction ---"
  bd compact --all
fi
# Preview Tier 2
echo
echo "--- Preview Tier 2 Candidates ---"
bd compact --dry-run --all --tier 2
read -p "Proceed with Tier 2 compaction? (y/N) " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]; then
  echo "--- Running Tier 2 Compaction ---"
  bd compact --all --tier 2
fi
# Show stats
echo
echo "--- Final Statistics ---"
bd compact --stats
echo
echo "=== Compaction Complete ==="
```
### Pre-commit Hook (Automatic)
```bash
#!/bin/bash
# .git/hooks/pre-commit
# Auto-compact before each commit (optional, experimental)
if command -v bd &> /dev/null && [ -n "$ANTHROPIC_API_KEY" ]; then
  # Only compact if >10 eligible issues
  ELIGIBLE=$(bd compact --dry-run --all --json 2>/dev/null | jq 'length')
  if [ "${ELIGIBLE:-0}" -gt 10 ]; then
    echo "Auto-compacting $ELIGIBLE eligible issues..."
    bd compact --all
  fi
fi
```
## Safety & Recovery
### Git History
Compaction is permanent - the original content is discarded to save space. However, you can recover old versions from git history:
```bash
# View issue before compaction
git log -p -- .beads/issues.jsonl | grep -A 50 "bd-42"
# Checkout old version
git checkout <commit-hash> -- .beads/issues.jsonl
# Or use git show
git show <commit-hash>:.beads/issues.jsonl | grep -A 50 "bd-42"
```
### Verification
After compaction, verify with:
```bash
# Check compaction stats
bd compact --stats
# Spot-check compacted issues
bd show bd-42
```
## Troubleshooting
### "ANTHROPIC_API_KEY not set"
```bash
export ANTHROPIC_API_KEY="sk-ant-..."
# Add to ~/.zshrc or ~/.bashrc for persistence
```
### Rate Limit Errors
Reduce parallelism:
```bash
bd compact --all --workers 2 --batch-size 5
```
Or add delays between batches (future enhancement).
### Issue Not Eligible
Check eligibility:
```bash
bd compact --dry-run --id bd-42
```
Force compact (if you know what you're doing):
```bash
bd compact --id bd-42 --force
```
## FAQ
### When should I compact?
- **Small projects (<500 issues)**: Rarely needed, maybe annually
- **Medium projects (500-5000 issues)**: Every 3-6 months
- **Large projects (5000+ issues)**: Monthly or quarterly
- **High-velocity teams**: Set up automated monthly compaction
### Can I recover compacted issues?
Compaction is permanent, but you can recover from git history:
```bash
git log -p -- .beads/issues.jsonl | grep -A 50 "bd-42"
```
### What happens to dependencies?
Dependencies are preserved. Compaction only affects the issue's text fields (description, design, notes, acceptance criteria).
### Does compaction affect git history?
No. Old versions of issues remain in git history. Compaction only affects the current state in `.beads/issues.jsonl` and `issues.db`.
### Should I commit compacted issues?
**Yes.** Compaction modifies both the database and JSONL. Commit and push:
```bash
git add .beads/issues.jsonl issues.db
git commit -m "Compact old closed issues"
git push
```
### What if my team disagrees on compaction frequency?
Use `bd compact --dry-run` to preview. Discuss the candidates before running. Since compaction is permanent, get team consensus first.
### Can I compact open issues?
No. Compaction only works on closed issues to ensure active work retains full detail.
### How does Tier 2 decide "rarely referenced"?
It checks:
1. Git commits mentioning the issue ID in last 90 days
2. Other issues referencing it in descriptions/notes
If references are low (< 5 commits or < 3 issues), it's eligible for Tier 2.
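The commit-side half of that heuristic can be approximated with plain git. This demo builds a throwaway repository so the count is deterministic; the exact query bd runs is not shown here:

```shell
#!/bin/sh
set -e
# Demo in a throwaway repo: count recent commits mentioning an issue ID.
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "fix auth race (bd-42)"
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "unrelated change"
# Tier 2 heuristic: fewer than 5 mentions in the last 90 days
mentions=$(git log --oneline --since="90 days ago" | grep -c "bd-42" || true)
echo "bd-42 mentioned in $mentions recent commit(s)"
```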
### Does compaction slow down queries?
No. Compaction reduces database size, making queries faster. Agents benefit from smaller context when reading issues.
### Can I customize the summarization prompt?
Not yet, but it's planned (bd-264). The current prompt is optimized for preserving key decisions and outcomes.
## Best Practices
1. **Start with dry-run**: Always preview before compacting
2. **Compact regularly**: Monthly or quarterly depending on project size
3. **Monitor costs**: Use `bd compact --stats` to track savings
4. **Automate it**: Set up cron jobs for hands-off maintenance
5. **Commit results**: Always commit and push after compaction
6. **Team communication**: Let team know before large compaction runs (it's permanent!)
## Examples
See [examples/compaction/](examples/compaction/) for:
- `workflow.sh` - Interactive compaction workflow
- `cron-compact.sh` - Automated monthly compaction
- `auto-compact.sh` - Smart auto-compaction with thresholds
## Related Documentation
- [README.md](README.md) - Quick start and overview
- [EXTENDING.md](EXTENDING.md) - Database schema and extensions
- [GIT_WORKFLOW.md](GIT_WORKFLOW.md) - Multi-machine collaboration
## Contributing
Found a bug or have ideas for improving compaction? Open an issue or PR!
Common enhancement requests:
- Custom summarization prompts (bd-264)
- Alternative LLM backends (local models)
- Configurable eligibility rules
- Compaction analytics dashboard
- Optional snapshot retention for restore (if requested)

README.md

@@ -281,87 +281,7 @@ Options:
#### Creating Issues from Markdown
You can draft multiple issues in a markdown file and create them all at once. This is useful for planning features or converting written notes into tracked work.
Draft multiple issues in a markdown file with `bd create -f file.md`. Format: `## Issue Title` creates a new issue; optional sections: `### Priority`, `### Type`, `### Description`, `### Assignee`, `### Labels`, `### Dependencies`. Defaults: Priority=2, Type=task.
Markdown format:
```markdown
## Issue Title
Optional description text here.
### Priority
1
### Type
feature
### Description
More detailed description (overrides text after title).
### Design
Design notes and implementation details.
### Acceptance Criteria
- Must do this
- Must do that
### Assignee
username
### Labels
label1, label2, label3
### Dependencies
bd-10, bd-20
```
Example markdown file (`auth-improvements.md`):
```markdown
## Add OAuth2 support
We need to support OAuth2 authentication.
### Priority
1
### Type
feature
### Assignee
alice
### Labels
auth, high-priority
## Add rate limiting
### Priority
0
### Type
bug
### Description
Auth endpoints are vulnerable to brute force attacks.
### Labels
security, urgent
```
Create all issues:
```bash
bd create -f auth-improvements.md
# ✓ Created 2 issues from auth-improvements.md:
# bd-42: Add OAuth2 support [P1, feature]
# bd-43: Add rate limiting [P0, bug]
```
**Notes:**
- Each `## Heading` creates a new issue
- Sections (`### Priority`, `### Type`, etc.) are optional
- Defaults: Priority=2, Type=task
- Text immediately after the title becomes the description (unless overridden by `### Description`)
- All standard issue fields are supported
### Viewing Issues
@@ -419,26 +339,7 @@ bd dep cycles
#### Cycle Prevention
beads maintains a directed acyclic graph (DAG) of dependencies and prevents cycles across **all** dependency types. This ensures:
Beads maintains a DAG and prevents cycles across all dependency types. Cycles break ready work detection and tree traversals. Attempting to add a cycle-creating dependency returns an error.
- **Ready work is accurate**: Cycles can hide issues from `bd ready` by making them appear blocked when they're actually part of a circular dependency
- **Dependencies are clear**: Circular dependencies are semantically confusing (if A depends on B and B depends on A, which should be done first?)
- **Traversals work correctly**: Commands like `bd dep tree` rely on DAG structure
**Example - Prevented Cycle:**
```bash
bd dep add bd-1 bd-2 # bd-1 blocks on bd-2 ✓
bd dep add bd-2 bd-3 # bd-2 blocks on bd-3 ✓
bd dep add bd-3 bd-1 # ERROR: would create cycle bd-3 → bd-1 → bd-2 → bd-3 ✗
```
Cross-type cycles are also prevented:
```bash
bd dep add bd-1 bd-2 --type blocks # bd-1 blocks on bd-2 ✓
bd dep add bd-2 bd-1 --type parent-child # ERROR: would create cycle ✗
```
If you try to add a dependency that creates a cycle, you'll get a clear error message. After successfully adding dependencies, beads will warn you if any cycles are detected elsewhere in the graph.
### Finding Work
@@ -461,44 +362,26 @@ bd ready --json
### Compaction (Memory Decay)
Beads can semantically compress old closed issues to keep the database lightweight. This is agentic memory decay - the database naturally forgets details over time while preserving essential context.
Beads uses AI to compress old closed issues, keeping databases lightweight as they age. This is agentic memory decay - your database naturally forgets fine-grained details while preserving essential context agents need.
```bash
# Preview what would be compacted
bd compact --dry-run --all

# Show compaction statistics
bd compact --stats

# Compact all eligible issues (30+ days closed, no open dependents)
bd compact --all

# Compact specific issue
bd compact --id bd-42

# Force compact (bypass eligibility checks)
bd compact --id bd-42 --force

# Tier 2 ultra-compression (90+ days, 95% reduction)
bd compact --tier 2 --all
```
```bash
bd compact --dry-run --all  # Preview candidates
bd compact --stats          # Show statistics
bd compact --all            # Compact eligible issues (30+ days closed)
bd compact --tier 2 --all   # Ultra-compress (90+ days, rarely referenced)
```
Compaction uses Claude Haiku to semantically summarize issues:
- **Tier 1**: 70-80% space reduction (30+ days closed)
- **Tier 2**: 90-95% space reduction (90+ days closed, rarely referenced)
**Requirements:**
- Set `ANTHROPIC_API_KEY` environment variable
- Cost: ~$1 per 1,000 issues compacted (Haiku pricing)
**Eligibility:**
- Status: closed
- Tier 1: 30+ days since closed, no open dependents
- Tier 2: 90+ days since closed, rarely referenced in commits/issues
**Note:** Compaction is permanent graceful decay - original content is discarded to save space. Use git history to recover old versions if needed.
See [COMPACTION.md](COMPACTION.md) for detailed documentation, cost analysis, and automation examples.
Uses Claude Haiku for semantic summarization. **Tier 1** (30+ days): 70-80% reduction. **Tier 2** (90+ days, low references): 90-95% reduction. Requires `ANTHROPIC_API_KEY`. Cost: ~$1 per 1,000 issues.
Eligibility: Must be closed with no open dependents. Tier 2 requires low reference frequency (<5 commits or <3 issues in last 90 days).
**Permanent:** Original content is discarded. Recover old versions from git history if needed.
**Automation:**
```bash
# Monthly cron
0 0 1 * * bd compact --all && git add .beads && git commit -m "Monthly compaction"
```
## Database Discovery
@@ -602,49 +485,15 @@ The `discovered-from` type is particularly useful for AI-supervised workflows, w
## AI Agent Integration
bd is designed to work seamlessly with AI coding agents:
All commands support `--json` for programmatic use. Typical agent workflow: `bd ready --json` → `bd update --status in_progress` → `bd create` (discovered work) → `bd close`.
```bash
# Agent discovers ready work
WORK=$(bd ready --limit 1 --json)
ISSUE_ID=$(echo $WORK | jq -r '.[0].id')
# Agent claims and starts work
bd update $ISSUE_ID --status in_progress --json
# Agent discovers new work while executing
bd create "Fix bug found in testing" -t bug -p 0 --json > new_issue.json
NEW_ID=$(cat new_issue.json | jq -r '.id')
bd dep add $NEW_ID $ISSUE_ID --type discovered-from
# Agent completes work
bd close $ISSUE_ID --reason "Implemented and tested" --json
```
The `--json` flag on every command makes bd perfect for programmatic workflows.
## Ready Work Algorithm
An issue is "ready" if:
Issue is "ready" if status is `open` and it has no open `blocks` dependencies.
- Status is `open`
- It has NO open `blocks` dependencies
- All blockers are either closed or non-existent
Example:
```
bd-1 [open] ← blocks ← bd-2 [open] ← blocks ← bd-3 [open]
```
Ready work: `[bd-1]`
Blocked: `[bd-2, bd-3]`
## Issue Lifecycle
```
open → in_progress → closed
blocked (manually set, or has open blockers)
```
`open → in_progress → closed` (or `blocked` if has open blockers)
## Architecture
@@ -706,36 +555,11 @@ This pattern enables powerful integrations while keeping bd simple and focused.
## Why bd?
**bd is designed for AI coding agents, not humans.**
**bd is designed for AI agents**, not humans. Traditional trackers (Jira, GitHub) require web UIs. bd provides `--json` on all commands, explicit dependency types, and `bd ready` for unblocked work detection. In agent workflows, issues are **memory** - preventing agents from forgetting tasks during long sessions.
Traditional issue trackers (Jira, GitHub Issues, Linear) assume humans are the primary users. Humans click through web UIs, drag cards on boards, and manually update status.
bd assumes **AI agents are the primary users**, with humans supervising:
- **Agents discover work** - `bd ready --json` gives agents unblocked tasks to execute
- **Dependencies prevent wasted work** - Agents don't duplicate effort or work on blocked tasks
- **Discovery during execution** - Agents create issues for work they discover while executing, linked with `discovered-from`
- **Agents lose focus** - Long-running conversations can forget tasks; bd remembers everything
- **Humans supervise** - Check on progress with `bd list` and `bd dep tree`, but don't micromanage
In human-managed workflows, issues are planning artifacts. In agent-managed workflows, **issues are memory** - preventing agents from forgetting tasks during long coding sessions.
Traditional issue trackers were built for human project managers. bd is built for autonomous agents.
## Architecture: JSONL + SQLite
bd uses a dual-storage approach:
**JSONL** (`.beads/issues.jsonl`) is source of truth, committed to git. **SQLite** (`.beads/*.db`) is an ephemeral cache for fast queries, gitignored. Auto-export after CRUD (5s debounce), auto-import after `git pull`. No manual sync needed.
- **JSONL files** (`.beads/issues.jsonl`) - Source of truth, committed to git
- **SQLite database** (`.beads/*.db`) - Ephemeral cache for fast queries, gitignored
This gives you:
- ✅ **Git-friendly storage** - Text diffs, AI-resolvable conflicts
- ✅ **Fast queries** - SQLite indexes for dependency graphs
- ✅ **Automatic sync** - Auto-export after CRUD ops, auto-import after pulls
- ✅ **No daemon required** - In-process SQLite, ~10-100ms per command
When you run `bd create`, it writes to SQLite. After 5 seconds of inactivity, changes automatically export to JSONL. After `git pull`, the next bd command automatically imports if JSONL is newer. No manual steps needed!
## Export/Import (JSONL Format)