Hash ID Collision Mathematics
This document explains the collision probability calculations for beads' adaptive hash-based IDs and the thresholds used for automatic length scaling.
Birthday Paradox Formula
The collision probability for hash IDs is calculated using the birthday paradox:
P(collision) ≈ 1 - e^(-n²/2N)
Where:
- n = number of issues in database
- N = total possible IDs = 36^length (lowercase alphanumeric: [a-z0-9])
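As a minimal sketch of this formula in Go (the names `idSpace` and `collisionProbability` are illustrative and not taken from the beads source):

```go
package main

import (
	"fmt"
	"math"
)

// idSpace returns N = 36^length, the number of possible IDs of a given length.
func idSpace(length int) float64 {
	return math.Pow(36, float64(length))
}

// collisionProbability approximates P(collision) ≈ 1 - e^(-n²/2N)
// for n issues and IDs of the given length.
func collisionProbability(n, length int) float64 {
	nf := float64(n)
	// -Expm1(-x) equals 1 - e^(-x) and stays accurate when x is tiny.
	return -math.Expm1(-nf * nf / (2 * idSpace(length)))
}

func main() {
	for _, n := range []int{500, 1000, 5000} {
		fmt.Printf("n=%d: 4-char %.2f%%, 5-char %.2f%%\n",
			n, 100*collisionProbability(n, 4), 100*collisionProbability(n, 5))
	}
}
```

For 500 issues and 4-char IDs this evaluates to about 7.17%.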
Collision Probability Table
| DB Size | 4-char | 5-char | 6-char | 7-char | 8-char |
|---|---|---|---|---|---|
| 50 | 0.07% | 0.00% | 0.00% | 0.00% | 0.00% |
| 100 | 0.30% | 0.01% | 0.00% | 0.00% | 0.00% |
| 200 | 1.18% | 0.03% | 0.00% | 0.00% | 0.00% |
| 500 | 7.17% | 0.21% | 0.01% | 0.00% | 0.00% |
| 1,000 | 25.75% | 0.82% | 0.02% | 0.00% | 0.00% |
| 2,000 | 69.60% | 3.25% | 0.09% | 0.00% | 0.00% |
| 5,000 | 99.94% | 18.68% | 0.57% | 0.02% | 0.00% |
| 10,000 | 100% | 56.26% | 2.27% | 0.06% | 0.00% |
Key Insights
- 4-char IDs are safe up to ~500 issues (7% collision risk)
- 5-char IDs are safe up to ~1,500 issues (2% collision risk)
- 6-char IDs are safe up to ~10,000 issues (2% collision risk)
- 7-char IDs support ~100,000 issues (6% collision risk)
- 8-char IDs support ~1,000,000 issues (16% collision risk)
Expected Number of Collisions
This table shows the expected (average) number of hash collisions for a given database size and ID length:
| DB Size | 4-char | 5-char | 6-char | 7-char | 8-char |
|---|---|---|---|---|---|
| 100 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 500 | 0.07 | 0.00 | 0.00 | 0.00 | 0.00 |
| 1,000 | 0.30 | 0.01 | 0.00 | 0.00 | 0.00 |
| 2,000 | 1.19 | 0.03 | 0.00 | 0.00 | 0.00 |
| 5,000 | 7.44 | 0.21 | 0.01 | 0.00 | 0.00 |
| 10,000 | 29.77 | 0.83 | 0.02 | 0.00 | 0.00 |
Example: With 5,000 issues using 4-char IDs, you'll likely see ~7 hash collisions (automatically retried with +1 nonce).
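The table entries are consistent with the usual approximation for the expected number of colliding pairs, E ≈ n²/2N, which is the exponent in the birthday formula. Working through the example with n = 5,000 and N = 36⁴:

```latex
E \approx \frac{n^2}{2N}
  = \frac{5000^2}{2 \cdot 36^4}
  = \frac{25{,}000{,}000}{3{,}359{,}232}
  \approx 7.44
```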
Adaptive Scaling Strategy
Beads automatically increases ID length when the collision probability exceeds 25% (configurable via max_collision_prob).
Default Thresholds (25% max collision)
| Database Size | ID Length | Collision Probability at Max |
|---|---|---|
| 0-500 | 4 chars | 7.17% at 500 issues |
| 501-1,500 | 5 chars | 1.84% at 1,500 issues |
| 1,501-5,000 | 5 chars | 18.68% at 5,000 issues |
| 5,001-15,000 | 6 chars | 5.04% at 15,000 issues |
| 15,001+ | continues scaling as needed | |
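The scaling logic itself is not shown here. As an assumption, the sketch below picks the shortest ID length whose birthday-paradox probability stays at or below the threshold. The function `chooseLength` and its `minLen`/`maxLen` bounds are hypothetical, and because the break points follow the formula directly they may differ from the ranges in the table.

```go
package main

import (
	"fmt"
	"math"
)

// collisionProbability approximates 1 - e^(-n²/2N) with N = 36^length.
func collisionProbability(n, length int) float64 {
	nf := float64(n)
	return -math.Expm1(-nf * nf / (2 * math.Pow(36, float64(length))))
}

// chooseLength returns the shortest length in [minLen, maxLen] whose
// collision probability for n issues does not exceed maxProb.
// It falls back to maxLen if no shorter length qualifies.
// (Hypothetical helper; not the beads implementation.)
func chooseLength(n int, maxProb float64, minLen, maxLen int) int {
	for length := minLen; length < maxLen; length++ {
		if collisionProbability(n, length) <= maxProb {
			return length
		}
	}
	return maxLen
}

func main() {
	for _, n := range []int{500, 2000, 20000} {
		fmt.Printf("%d issues -> %d chars\n", n, chooseLength(n, 0.25, 4, 8))
	}
}
```

Here `maxProb` plays the role of `max_collision_prob`. Lowering it makes the sketch switch to longer IDs earlier.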
Why 25%?
The 25% threshold balances:
- Readability: Keep IDs short for small databases
- Safety: Avoid frequent collision retries
- Scalability: Grow gracefully as database expands
Even at 25% collision probability, the expected number of actual collisions is low (< 1 collision per 1,000 issues created).
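To see why, assume the birthday approximation and write λ = n²/2N for the expected number of collisions, so that P = 1 - e^(-λ). Setting P = 0.25 and using 4-char IDs as the example:

```latex
\lambda = -\ln(1 - 0.25) = -\ln 0.75 \approx 0.2877
\qquad
n = \sqrt{2N\lambda} = \sqrt{2 \cdot 36^4 \cdot 0.2877} \approx 983
```

Under this approximation, a database using 4-char IDs reaches 25% collision probability near 983 issues, with roughly 0.29 collisions expected in total up to that point.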
Alternative Thresholds
You can customize the threshold with bd config set max_collision_prob <value>:
Conservative (10% threshold)
| DB Size | ID Length |
|---|---|
| 0-200 | 4 chars |
| 201-1,000 | 5 chars |
| 1,001-5,000 | 6 chars |
| 5,001+ | continues scaling |
Aggressive (50% threshold)
| DB Size | ID Length |
|---|---|
| 0-500 | 4 chars |
| 501-2,000 | 5 chars |
| 2,001-10,000 | 6 chars |
| 10,001+ | continues scaling |
Collision Resolution
When a hash collision occurs (same ID generated twice), beads automatically:
- Tries base length with different nonce (10 attempts)
- Tries base+1 length with different nonce (10 attempts)
- Tries base+2 length with different nonce (10 attempts)
Total: 30 attempts before failing (astronomically unlikely).
Example with 4-char base:
bd-a3f2 (nonce 0) - collision!
bd-a3f2 (nonce 1) - collision again!
bd-b7d4 (nonce 2) - success! ✓
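The retry order above can be sketched in Go as follows. This is an illustration under stated assumptions, not the beads implementation: `candidateID` is a hypothetical stand-in for the real hashing (it hashes a seed string plus the nonce with SHA-256 and keeps base36 characters), the nonce restarts at 0 for each length, and `exists` is a caller-supplied lookup.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"errors"
	"fmt"
	"strconv"
)

// candidateID is a hypothetical stand-in for beads' hashing: it hashes the
// seed together with the nonce and keeps `length` base36 characters.
func candidateID(seed string, nonce, length int) string {
	sum := sha256.Sum256([]byte(fmt.Sprintf("%s|%d", seed, nonce)))
	s := strconv.FormatUint(binary.BigEndian.Uint64(sum[:8]), 36)
	for len(s) < length {
		s = "0" + s // left-pad short encodings
	}
	return "bd-" + s[len(s)-length:]
}

// newID tries 10 nonces at base, base+1, and base+2 length (30 attempts total).
func newID(seed string, base int, exists func(string) bool) (string, error) {
	for length := base; length <= base+2; length++ {
		for nonce := 0; nonce < 10; nonce++ {
			if id := candidateID(seed, nonce, length); !exists(id) {
				return id, nil
			}
		}
	}
	return "", errors.New("no free ID after 30 attempts")
}

func main() {
	// Pretend the nonce-0 candidate is already taken to force one retry.
	taken := map[string]bool{candidateID("fix login bug", 0, 4): true}
	id, err := newID("fix login bug", 4, func(id string) bool { return taken[id] })
	fmt.Println(id, err)
}
```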
Mathematical Properties
ID Space Size
| Length | Possible IDs | Notation |
|---|---|---|
| 3 chars | 46,656 | 36³ |
| 4 chars | 1,679,616 | 36⁴ ≈ 1.7M |
| 5 chars | 60,466,176 | 36⁵ ≈ 60M |
| 6 chars | 2,176,782,336 | 36⁶ ≈ 2.2B |
| 7 chars | 78,364,164,096 | 36⁷ ≈ 78B |
| 8 chars | 2,821,109,907,456 | 36⁸ ≈ 2.8T |
Why Alphanumeric?
Using [a-z0-9] (36 characters) instead of hex (16 characters):
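- 4-char alphanumeric (36⁴ ≈ 1.7M IDs) needs 6 hex chars to match in capacity: 5-char hex gives only 16⁵ ≈ 1.0M, while 6-char hex gives 16⁶ ≈ 16.8M
- More readable: bd-a3f2 vs bd-a3f2e1
- Easier to type and communicate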
Verification
Run the collision calculator yourself:
go run scripts/collision-calculator.go
This generates the tables above and shows the adaptive scaling strategy for any threshold.
Related Documentation
- ADAPTIVE_IDS.md - Configuration and usage guide
- CONFIG.md - All configuration options