Cache compiled regexes in ID replacement for 1.9x performance boost

Implements bd-27: Cache compiled regexes in replaceIDReferences for performance

Problem:
replaceIDReferences() was compiling regex patterns on every call. With 100
issues and 10 ID mappings, that resulted in 4,000 regex compilations (100
issues × 4 text fields × 10 ID mappings).

Solution:
- Added buildReplacementCache() to pre-compile all regexes once
- Added replaceIDReferencesWithCache() to reuse compiled regexes
- Updated updateReferences() to build cache once and reuse for all issues
- Kept replaceIDReferences() for backward compatibility (calls cached version)

Performance Results (from benchmarks):
Single text:
- 1.33x faster (26,162 ns → 19,641 ns)
- 68% less memory (25,769 B → 8,241 B)
- 80% fewer allocations (278 → 55)

Real-world (400 texts, 10 mappings):
- 1.89x faster (5.1ms → 2.7ms)
- 90% less memory (7.7 MB → 0.8 MB)
- 86% fewer allocations (104,112 → 14,801)

Tests:
- All existing tests pass
- Added 3 benchmark tests demonstrating improvements

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
Steve Yegge
2025-10-13 23:50:48 -07:00
parent 33412871eb
commit 25644d9717
3 changed files with 155 additions and 29 deletions

View File

@@ -17,7 +17,7 @@
{"id":"bd-24","title":"Support ID space partitioning for parallel worker agents","description":"Enable external orchestrators (like AI worker swarms) to control issue ID assignment. Add --id flag to 'bd create' for explicit ID specification. Optionally support 'bd config set next_id N' to set the starting point for auto-increment. Storage layer already supports pre-assigned IDs (sqlite.go:52-71), just need CLI wiring. This keeps beads simple while letting orchestrators implement their own ID partitioning strategies to minimize merge conflicts. Complementary to bd-9's collision resolution.","status":"closed","priority":1,"issue_type":"feature","created_at":"2025-10-12T16:10:37.808226-07:00","updated_at":"2025-10-13T23:26:35.810383-07:00","closed_at":"2025-10-13T23:18:01.637695-07:00"}
{"id":"bd-25","title":"Add transaction support to storage layer for atomic multi-operation workflows","description":"Currently each storage method (CreateIssue, UpdateIssue, etc.) starts its own transaction. This makes it impossible to perform atomic multi-step operations like collision resolution. Add support for passing *sql.Tx through the storage interface, or create transaction-aware versions of methods. This would make remapCollisions and other batch operations truly atomic.","status":"closed","priority":4,"issue_type":"feature","created_at":"2025-10-12T16:39:00.66572-07:00","updated_at":"2025-10-13T23:26:35.810468-07:00","closed_at":"2025-10-13T22:53:56.401108-07:00"}
{"id":"bd-26","title":"Optimize reference updates to avoid loading all issues into memory","description":"In updateReferences(), we call SearchIssues with no filter to get ALL issues for updating references. For large databases (10k+ issues), this loads everything into memory. Options: 1) Use batched processing with LIMIT/OFFSET, 2) Use SQL UPDATE with REPLACE() directly, 3) Stream results instead of loading all at once. Located in collision.go:266","status":"open","priority":2,"issue_type":"task","created_at":"2025-10-12T16:39:10.327861-07:00","updated_at":"2025-10-13T23:26:35.810552-07:00"}
{"id":"bd-27","title":"Cache compiled regexes in replaceIDReferences for performance","description":"replaceIDReferences() compiles the same regex patterns on every call. With 100 issues and 10 ID mappings, that's 1000 regex compilations. Pre-compile regexes once and reuse. Can use a struct with compiled regex, placeholder, and newID. Located in collision.go:329. Estimated performance improvement: 10-100x for large batches.","status":"open","priority":2,"issue_type":"task","created_at":"2025-10-12T16:39:18.305517-07:00","updated_at":"2025-10-13T23:26:35.810644-07:00"}
{"id":"bd-27","title":"Cache compiled regexes in replaceIDReferences for performance","description":"replaceIDReferences() compiles the same regex patterns on every call. With 100 issues and 10 ID mappings, that's 1000 regex compilations. Pre-compile regexes once and reuse. Can use a struct with compiled regex, placeholder, and newID. Located in collision.go:329. Estimated performance improvement: 10-100x for large batches.","status":"closed","priority":2,"issue_type":"task","created_at":"2025-10-12T16:39:18.305517-07:00","updated_at":"2025-10-13T23:50:25.865317-07:00","closed_at":"2025-10-13T23:50:25.865317-07:00"}
{"id":"bd-28","title":"Improve error handling in dependency removal during remapping","description":"In updateDependencyReferences(), RemoveDependency errors are caught and ignored with continue (line 392). Comment says 'if dependency doesn't exist' but this catches ALL errors including real failures. Should check error type with errors.Is(err, ErrDependencyNotFound) and only ignore not-found errors, returning other errors properly.","status":"open","priority":3,"issue_type":"bug","created_at":"2025-10-12T16:39:26.78219-07:00","updated_at":"2025-10-13T23:26:35.810732-07:00"}
{"id":"bd-29","title":"Use safer placeholder pattern in replaceIDReferences","description":"Currently uses __PLACEHOLDER_0__ which could theoretically collide with user text. Use a truly unique placeholder like null bytes: \\x00REMAP\\x00_0_\\x00 which are unlikely to appear in normal text. Located in collision.go:324. Very low probability issue but worth fixing for completeness.","status":"open","priority":3,"issue_type":"task","created_at":"2025-10-12T16:39:33.665449-07:00","updated_at":"2025-10-13T23:26:35.810821-07:00"}
{"id":"bd-3","title":"Document git workflow in README","description":"Add Git Workflow section to README explaining binary vs text approaches","status":"closed","priority":1,"issue_type":"chore","created_at":"2025-10-12T00:43:03.461615-07:00","updated_at":"2025-10-13T23:26:35.810907-07:00","closed_at":"2025-10-12T00:43:30.283178-07:00"}