# Adaptive ID Length

**Feature:** bd-ea2a13
**Status:** Implemented (v0.21+)

## Overview

Beads adapts the length of its hash IDs to the size of the database: small databases get short, readable IDs, and IDs grow longer as the issue count rises so that the collision probability stays below a configurable threshold.

## Motivation

- **Small databases** (0-500 issues): Very short, readable IDs like `bd-a3f2` (4 chars)
- **Medium databases** (501-1500 issues): Slightly longer IDs like `bd-7f3a8` (5 chars)
- **Large databases** (1501+ issues): Standard IDs like `bd-7f3a86` (6 chars)

Because the length depends on the current number of issues, users who actively archive old issues can keep their IDs shorter over time.

## How It Works

### Birthday Paradox Math

The collision probability is calculated using:

```
P(collision) ≈ 1 - e^(-n²/2N)
```

Where:
- `n` = number of issues in the database
- `N` = total possible IDs (36^length for lowercase alphanumeric)

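The formula above can be sketched in Go as follows. This is a minimal illustration rather than the code in `adaptive_length.go`; the function name `collisionProbability` and its signature are assumptions made for this example.

```go
package main

import (
	"fmt"
	"math"
)

// collisionProbability approximates the chance that at least two of
// numIssues random IDs collide when each ID has idLength base-36 characters.
// It evaluates P ≈ 1 - e^(-n²/2N) with N = 36^idLength.
// (Hypothetical helper; the name is not taken from the Beads source.)
func collisionProbability(numIssues, idLength int) float64 {
	n := float64(numIssues)
	total := math.Pow(36, float64(idLength)) // N: size of the ID space
	return 1 - math.Exp(-(n*n)/(2*total))
}

func main() {
	cases := []struct{ issues, length int }{{500, 4}, {1500, 5}, {10000, 6}}
	for _, c := range cases {
		p := collisionProbability(c.issues, c.length)
		fmt.Printf("n=%d length=%d P=%.4f\n", c.issues, c.length, p)
	}
}
```

For 500 issues at 4 chars this evaluates to roughly 7%, and for 1500 issues at 5 chars to roughly 2%.
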
### Default Thresholds (25% max collision)

| Database Size | ID Length | Collision Probability |
|--------------|-----------|----------------------|
| 0-500 | 4 chars | ~7% at 500 |
| 501-1500 | 5 chars | ~2% at 1500 |
| 1501+ | 6 chars | continues scaling |

### Collision Resolution

If a collision occurs (rare), the algorithm automatically tries progressively longer IDs:
1. Base length (e.g., 4 chars)
2. Base + 1 (e.g., 5 chars)
3. Base + 2 (e.g., 6 chars)

Each length is tried with 10 nonces, for up to 30 attempts in total.

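The retry order described above can be sketched as a pair of nested loops. Everything here is illustrative: `candidateID`, `resolveID`, the SHA-256 hashing, the base-36 encoding, and the `exists` callback are assumptions for the example and are not taken from `generateHashID`.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/big"
)

const noncesPerLength = 10 // from the text: 10 nonces per length

// candidateID is a hypothetical generator: it hashes content plus nonce with
// SHA-256, encodes the digest in base 36, and keeps the last `length` characters.
func candidateID(prefix, content string, nonce, length int) string {
	sum := sha256.Sum256([]byte(fmt.Sprintf("%s|%d", content, nonce)))
	encoded := new(big.Int).SetBytes(sum[:]).Text(36)
	for len(encoded) < length { // left-pad in the unlikely case of a short encoding
		encoded = "0" + encoded
	}
	return prefix + "-" + encoded[len(encoded)-length:]
}

// resolveID tries base, base+1, and base+2 characters, with noncesPerLength
// nonces at each length, and returns the first ID that exists reports as free.
func resolveID(prefix, content string, baseLength int, exists func(string) bool) (string, error) {
	for length := baseLength; length <= baseLength+2; length++ {
		for nonce := 0; nonce < noncesPerLength; nonce++ {
			id := candidateID(prefix, content, nonce, length)
			if !exists(id) {
				return id, nil
			}
		}
	}
	return "", fmt.Errorf("no free ID after %d attempts", 3*noncesPerLength)
}

func main() {
	taken := map[string]bool{}
	exists := func(id string) bool { return taken[id] }

	first, _ := resolveID("bd", "Fix bug", 4, exists)
	taken[first] = true
	second, _ := resolveID("bd", "Fix bug", 4, exists) // same content, so the next nonce is used
	fmt.Println(first, second)
}
```

Trying every nonce at the shorter length before growing the ID keeps IDs short in the common case where a single retry is enough.
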
## Configuration

Adaptive ID length is automatically enabled when using `id_mode=hash`. You can customize the behavior:

### Max Collision Probability

Default: 25% (0.25)

```bash
# More lenient (allow up to 50% collision probability)
bd config set max_collision_prob "0.50"

# Stricter (only allow 1% collision probability)
bd config set max_collision_prob "0.01"
```

### Minimum Hash Length

Default: 4 chars

```bash
# Start with 5-char IDs minimum
bd config set min_hash_length "5"

# Very short IDs (use with caution)
bd config set min_hash_length "3"
```

### Maximum Hash Length

Default: 8 chars

```bash
# Allow even longer IDs for huge databases
bd config set max_hash_length "10"
```

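Taken together, the three settings suggest a selection rule like the one below: pick the shortest length between the minimum and maximum whose collision probability stays at or below the threshold. This is a sketch under that assumption; the `lengthConfig` struct and `chooseLength` function are hypothetical names, and the real implementation may apply additional rules.

```go
package main

import (
	"fmt"
	"math"
)

// lengthConfig mirrors the three documented settings (hypothetical struct).
type lengthConfig struct {
	MaxCollisionProb float64 // max_collision_prob
	MinHashLength    int     // min_hash_length
	MaxHashLength    int     // max_hash_length
}

// collisionProbability evaluates P ≈ 1 - e^(-n²/2N) with N = 36^idLength.
func collisionProbability(numIssues, idLength int) float64 {
	n := float64(numIssues)
	return 1 - math.Exp(-(n*n)/(2*math.Pow(36, float64(idLength))))
}

// chooseLength returns the shortest length in [MinHashLength, MaxHashLength]
// whose collision probability is within the threshold, or MaxHashLength if
// no length qualifies.
func chooseLength(numIssues int, cfg lengthConfig) int {
	for length := cfg.MinHashLength; length <= cfg.MaxHashLength; length++ {
		if collisionProbability(numIssues, length) <= cfg.MaxCollisionProb {
			return length
		}
	}
	return cfg.MaxHashLength
}

func main() {
	defaults := lengthConfig{MaxCollisionProb: 0.25, MinHashLength: 4, MaxHashLength: 8}
	strict := lengthConfig{MaxCollisionProb: 0.01, MinHashLength: 4, MaxHashLength: 8}
	for _, n := range []int{100, 500, 1500, 10000} {
		fmt.Printf("n=%d default=%d strict=%d\n", n, chooseLength(n, defaults), chooseLength(n, strict))
	}
}
```

Note that a pure threshold rule like this keeps 4-char IDs under 25% somewhat beyond 500 issues, so it is not guaranteed to reproduce the exact breakpoints in the default thresholds table.
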
## Examples

### Default Configuration

```bash
# Initialize with hash IDs
bd init --id-mode hash --prefix myproject

# First 500 issues get 4-char IDs
bd create "Fix bug" -p 1
# → myproject-a3f2

# At 1,000 issues, new IDs use 5 chars
bd create "Add feature" -p 1
# → myproject-7f3a8

# At 10,000 issues, new IDs use 6 chars
bd create "Refactor" -p 1
# → myproject-b9d1e4
```

### Custom Configuration

```bash
# Very strict collision tolerance
bd config set max_collision_prob "0.01"

# With 1% threshold and 100 issues, uses 4-char IDs
# (collision probability is ~0.3% with 4 chars)

# Force minimum 5-char IDs for consistency
bd config set min_hash_length "5"

# All IDs will be at least 5 chars now
bd create "Task" -p 1
# → myproject-7f3a8
```

## Collision Probability Table

Use `scripts/collision-calculator.go` to explore collision probabilities:

```bash
go run scripts/collision-calculator.go
```

Output shows:
- Collision probabilities for different database sizes and ID lengths
- Recommended ID lengths for different thresholds
- Expected number of collisions
- Adaptive scaling strategy

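As a rough illustration of the "expected number of collisions" figure, the expected number of colliding pairs among `n` random IDs drawn from `N` possibilities is `n(n-1)/(2N)`. The sketch below uses that standard estimate; it does not reproduce the calculator's actual output or code, and `expectedCollisions` is a hypothetical name.

```go
package main

import (
	"fmt"
	"math"
)

// expectedCollisions estimates the expected number of colliding ID pairs for
// numIssues random IDs of idLength base-36 characters: n(n-1)/(2N).
// (Hypothetical helper; the calculator script may compute this differently.)
func expectedCollisions(numIssues, idLength int) float64 {
	n := float64(numIssues)
	total := math.Pow(36, float64(idLength)) // N = 36^idLength
	return n * (n - 1) / (2 * total)
}

func main() {
	fmt.Println("issues  length  expected colliding pairs")
	for _, n := range []int{500, 1500, 10000} {
		for length := 4; length <= 6; length++ {
			fmt.Printf("%6d  %6d  %.4f\n", n, length, expectedCollisions(n, length))
		}
	}
}
```
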
## Implementation Details

### Location

- Algorithm: `internal/storage/sqlite/adaptive_length.go`
- ID generation: `internal/storage/sqlite/sqlite.go` (`generateHashID`)
- Tests: `internal/storage/sqlite/adaptive_length_test.go`
- E2E tests: `internal/storage/sqlite/adaptive_e2e_test.go`

### Database Schema

Configuration is stored in the `config` table:

```sql
INSERT INTO config (key, value) VALUES ('max_collision_prob', '0.25');
INSERT INTO config (key, value) VALUES ('min_hash_length', '4');
INSERT INTO config (key, value) VALUES ('max_hash_length', '8');
```

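Since the values are stored as strings, they need to be parsed before use. The sketch below shows one way this could look, falling back to the documented defaults when a key is missing or malformed. The `adaptiveConfig` struct, the `parseConfig` function, and the fallback behavior are assumptions for illustration; the rows are passed in as a plain map rather than read from SQLite.

```go
package main

import (
	"fmt"
	"strconv"
)

// adaptiveConfig holds the parsed settings (hypothetical struct).
type adaptiveConfig struct {
	MaxCollisionProb float64
	MinHashLength    int
	MaxHashLength    int
}

// parseConfig converts key/value strings, as stored in the config table, into
// typed settings. Missing or invalid values keep the documented defaults.
func parseConfig(rows map[string]string) adaptiveConfig {
	cfg := adaptiveConfig{MaxCollisionProb: 0.25, MinHashLength: 4, MaxHashLength: 8}

	if v, err := strconv.ParseFloat(rows["max_collision_prob"], 64); err == nil && v > 0 && v <= 1 {
		cfg.MaxCollisionProb = v
	}
	if v, err := strconv.Atoi(rows["min_hash_length"]); err == nil && v > 0 {
		cfg.MinHashLength = v
	}
	if v, err := strconv.Atoi(rows["max_hash_length"]); err == nil && v > 0 {
		cfg.MaxHashLength = v
	}
	// Keep the range well-formed if the minimum was raised above the maximum.
	if cfg.MaxHashLength < cfg.MinHashLength {
		cfg.MaxHashLength = cfg.MinHashLength
	}
	return cfg
}

func main() {
	rows := map[string]string{"max_collision_prob": "0.01", "min_hash_length": "5"}
	fmt.Printf("%+v\n", parseConfig(rows))
}
```
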
### Performance

- Collision probability calculation: ~10ns per call
- ID generation with adaptive length: ~300ns (same as before)
- Database query to count issues: ~100μs (cached by SQLite)

## Migration

### Existing Databases

For existing databases with 6-char IDs:
1. New IDs continue to use 6 chars by default
2. Adaptive mode is opt-in via config; once enabled, new IDs use the adaptive length
3. Old IDs remain unchanged

### Sequential to Hash Migration

When migrating from sequential IDs to hash IDs with `bd migrate --to-hash-ids`:
- New IDs use the adaptive length algorithm
- Existing sequential IDs are preserved
- References are updated automatically

## Best Practices

1. **Default is good**: The 25% threshold works well for most use cases
2. **Active archival**: Delete closed issues to keep the database small and IDs short
3. **Consistency**: Set `min_hash_length` if you want all IDs to be the same length
4. **Monitoring**: Run the collision calculator periodically to check health

## Future Enhancements

Potential improvements (not yet implemented):

- **Automatic scaling notifications**: Warn when approaching threshold
- **Per-workspace thresholds**: Different configs for different projects
- **Dynamic adjustment**: Auto-adjust threshold based on observed collision rate
- **Compaction-aware**: Don't count compacted issues in collision calculation

## Related

- [Hash ID Design](HASH_ID_DESIGN.md) - Overview of hash-based IDs
- [Migration Guide](../README.md#migration) - Converting from sequential to hash IDs
- [Configuration](../CONFIG.md) - All configuration options