# Adaptive ID Length

**Feature:** bd-ea2a13  
**Status:** Implemented (v0.21+)

## Overview

Beads uses adaptive hash ID lengths that automatically scale based on database size, optimizing for readability in small databases while preventing collisions as databases grow.

## Motivation

- **Small databases** (0-500 issues): Very short, readable IDs like `bd-a3f2` (4 chars)
- **Medium databases** (500-1500 issues): Slightly longer IDs like `bd-7f3a8` (5 chars)
- **Large databases** (1500+ issues): Standard IDs like `bd-7f3a86` (6 chars)

Users who actively archive old issues can keep their IDs shorter over time.

## How It Works

### Birthday Paradox Math

The collision probability is calculated using:

```
P(collision) ≈ 1 - e^(-n²/2N)
```

Where:
- `n` = number of issues in database
- `N` = total possible IDs (36^length for lowercase alphanumeric)

### Default Thresholds (25% max collision)

| Database Size | ID Length | Collision Probability |
|--------------|-----------|----------------------|
| 0-500        | 4 chars   | ~7% at 500           |
| 501-1500     | 5 chars   | ~2% at 1500          |
| 1501+        | 6 chars   | continues scaling    |

### Collision Resolution

If a collision occurs (rare), the algorithm automatically tries:
1. Base length (e.g., 4 chars)
2. Base + 1 (e.g., 5 chars)
3. Base + 2 (e.g., 6 chars)

With 10 nonces per length, giving 30 attempts total.

## Configuration

Adaptive ID length is automatically enabled when using `id_mode=hash`. You can customize the behavior:

### Max Collision Probability

Default: 25% (0.25)

```bash
# More lenient (allow up to 50% collision probability)
bd config set max_collision_prob "0.50"

# Stricter (only allow 1% collision probability)
bd config set max_collision_prob "0.01"
```

### Minimum Hash Length

Default: 4 chars

```bash
# Start with 5-char IDs minimum
bd config set min_hash_length "5"

# Very short IDs (use with caution)
bd config set min_hash_length "3"
```

### Maximum Hash Length

Default: 8 chars

```bash
# Allow even longer IDs for huge databases
bd config set max_hash_length "10"
```

## Examples

### Default Configuration

```bash
# Initialize with hash IDs
bd init --id-mode hash --prefix myproject

# First 500 issues get 4-char IDs
bd create "Fix bug" -p 1
# → myproject-a3f2

# After 1000 issues, switches to 5-char IDs
bd create "Add feature" -p 1
# → myproject-7f3a8c

# At 10,000 issues, uses 6-char IDs
bd create "Refactor" -p 1
# → myproject-b9d1e4
```

### Custom Configuration

```bash
# Very strict collision tolerance
bd config set max_collision_prob "0.01"

# With 1% threshold and 100 issues, uses 4-char IDs
# (collision probability is ~0.3% with 4 chars)

# Force minimum 5-char IDs for consistency
bd config set min_hash_length "5"

# All IDs will be at least 5 chars now
bd create "Task" -p 1
# → myproject-7f3a8
```

## Collision Probability Table

Use `scripts/collision-calculator.go` to explore collision probabilities:

```bash
go run scripts/collision-calculator.go
```

Output shows:
- Collision probabilities for different database sizes and ID lengths
- Recommended ID lengths for different thresholds
- Expected number of collisions
- Adaptive scaling strategy

## Implementation Details

### Location

- Algorithm: `internal/storage/sqlite/adaptive_length.go`
- ID generation: `internal/storage/sqlite/sqlite.go` (`generateHashID`)
- Tests: `internal/storage/sqlite/adaptive_length_test.go`
- E2E tests: `internal/storage/sqlite/adaptive_e2e_test.go`

### Database Schema

Configuration is stored in the `config` table:

```sql
INSERT INTO config (key, value) VALUES ('max_collision_prob', '0.25');
INSERT INTO config (key, value) VALUES ('min_hash_length', '4');
INSERT INTO config (key, value) VALUES ('max_hash_length', '8');
```

### Performance

- Collision probability calculation: ~10ns per call
- ID generation with adaptive length: ~300ns (same as before)
- Database query to count issues: ~100μs (cached by SQLite)

## Migration

### Existing Databases

Existing databases with 6-char IDs will:
1. Continue using 6-char IDs by default
2. Can opt into adaptive mode by setting config (new IDs will use adaptive length)
3. Old IDs remain unchanged

### Sequential to Hash Migration

When migrating from sequential IDs to hash IDs with `bd migrate --to-hash-ids`:
- Uses adaptive length algorithm for new IDs
- Preserves existing sequential IDs
- References are automatically updated

## Best Practices

1. **Default is good**: The 25% threshold works well for most use cases
2. **Active archival**: Delete closed issues to keep database small and IDs short
3. **Consistency**: Set `min_hash_length` if you want all IDs to be same length
4. **Monitoring**: Run collision calculator periodically to check health

## Future Enhancements

Potential improvements (not yet implemented):

- **Automatic scaling notifications**: Warn when approaching threshold
- **Per-workspace thresholds**: Different configs for different projects
- **Dynamic adjustment**: Auto-adjust threshold based on observed collision rate
- **Compaction-aware**: Don't count compacted issues in collision calculation

## Related

- [Migration Guide](../README.md#migration) - Converting from sequential to hash IDs
- [Configuration](CONFIG.md) - All configuration options