7.7 KiB
PR #752 Chaos Testing Review
PR: https://github.com/steveyegge/beads/pull/752 Author: jordanhubbard Bead: bd-kx1j Status: Under Review
Summary
Jordan proposes adding chaos testing and E2E test coverage to beads. The PR:
- Adds 4849 lines, removes 511 lines
- Introduces chaos testing framework (random corruption, disk space exhaustion, NFS-like failures)
- Creates side databases for testing recovery scenarios
- Adds E2E tests tracking documented user scenarios
- Brings code coverage to ~48%
Key Question from Jordan
"Is this level of testing something you actually want with the current pace of progress? It comes with an implied obligation to update and add to the tests as well as follow the CICD feedback in github (very spammy if your tests don't pass!)"
Files Changed (Major Categories)
Chaos/Doctor Infrastructure
cmd/bd/doctor_repair_chaos_test.go(378 lines) - Core chaos testingcmd/bd/doctor/fix/database_integrity.go(116 lines) - DB integrity fixescmd/bd/doctor/fix/jsonl_integrity.go(87 lines) - JSONL integrity fixescmd/bd/doctor/fix/fs.go(57 lines) - Filesystem fault injectioncmd/bd/doctor/fix/sqlite_open.go(52 lines) - SQLite open handlingcmd/bd/doctor/jsonl_integrity.go(123 lines) - JSONL checkscmd/bd/doctor/git.go(168 additions) - Git hygiene checks
Test Coverage Additions
internal/storage/memory/memory_more_coverage_test.go(921 lines) - Memory storage testscmd/bd/cli_coverage_show_test.go(426 lines) - CLI show command testscmd/bd/daemon_autostart_unit_test.go(331 lines) - Daemon autostart testsinternal/rpc/client_gate_shutdown_test.go(107 lines) - RPC client tests- Various other test files
Bug Fixes Discovered During Testing
internal/storage/sqlite/migrations/021_migrate_edge_fields.go- Major migration fixinternal/storage/sqlite/migrations/022_drop_edge_columns.go- Column cleanupinternal/storage/sqlite/migrations_template_pinned_regression_test.go- Regression test
Tradeoffs
Costs
- Maintenance burden: Must keep coverage above 48% (or whatever threshold is set)
- CI noise: Failed tests = spam until fixed
- Velocity tax: Every change needs test updates
- Complexity: Chaos testing framework itself needs maintenance
Benefits
- Robustness validation: Proves beads can recover from corruption
- Bug discovery: Already found migration bugs (021, 022)
- Confidence: If chaos tests pass, beads is more robust than feared
- Documentation: E2E tests document expected user scenarios
- Regression prevention: Future changes caught before release
Initial Assessment
Implementation Quality: HIGH
The chaos testing code is well-structured. Key observations:
What the Chaos Tests Actually Cover
From doctor_repair_chaos_test.go:
- Complete DB corruption - Writes "not a database" garbage, verifies recovery from JSONL
- Truncated DB without JSONL - Tests graceful failure when no recovery source exists
- Sidecar file backup - Ensures -wal, -shm, -journal files are preserved during repair
- Repair with running daemon - Tests recovery while daemon holds locks
- JSONL integrity - Malformed lines, re-export from DB
Each test:
- Uses isolated temp directories
- Builds a fresh
bdbinary for testing - Uses "side databases" (separate from real data)
- Has proper cleanup
Bug Fixes Already Discovered
The PR includes fixes for bugs found during testing:
- Migration 021/022:
pinnedandis_templatecolumns were being clobbered - Regression test added to prevent recurrence
Test Coverage Structure
Tests are organized by build tags:
//go:build chaos- Chaos/corruption tests (run separately)//go:build e2e- End-to-end CLI tests- Regular unit tests - No build tag required
This means chaos tests only run when explicitly requested, not on every go test.
Deep Analysis (Ultrathink)
The Core Question
Is the testing worth the ongoing maintenance cost?
Argument FOR Merging
-
Beads is more robust than feared. If Jordan got these tests passing, it means:
bd doctoractually recovers from corruption- JSONL/DB sync is working correctly
- Migration edge cases are handled
This validates the core design: SQLite + JSONL + git backstop.
-
Bugs already found. The migration 021/022 bugs are exactly the kind of subtle issues that would cause data loss in production. Finding them now is worth something.
-
Build tag isolation. Chaos tests won't slow down regular development:
go test ./... # Normal tests only go test -tags=chaos ./... # Include chaos tests go test -tags=e2e ./... # Include E2E tests -
48% coverage is a floor, not a target. The PR doesn't enforce maintaining 48%. Jordan is asking: "Is this level worth it?" We can always add more later, or let coverage drift if priorities change.
-
Documentation value. E2E tests document expected user scenarios. When an AI agent asks "what should happen when X?", the tests provide executable answers.
Argument AGAINST Merging
-
Velocity tax is real. Every behavior change needs test updates. This is especially painful during rapid iteration phases.
-
CI noise. Failed tests block merges. With multiple agents working, flaky tests become coordination bottlenecks.
-
Framework maintenance. The chaos testing framework itself (side databases, build tags, test helpers) becomes another thing to maintain.
-
False confidence. Tests passing doesn't mean beads is production-ready. It means tested scenarios work. Edge cases not covered still fail silently.
The Real Question: What Phase Are We In?
If beads is still in "rapid prototype" phase: The testing overhead is premature. Focus on features, fix crashes as they happen, lean on git backstop.
If beads is approaching "reliable tool" phase: Testing is essential. Multi-agent workflows amplify bugs. Corruption during a 10-agent batch is expensive.
Current reality: Beads is being dogfooded seriously. Multiple agents, real work, real data loss when things break. We're closer to "reliable tool" than "prototype."
ROI Calculation
Cost of NOT testing: When corruption happens:
- Agent loses context (30-60 min recovery)
- Human has to debug (variable, often 15-60 min)
- Trust erosion (hard to quantify)
Cost of testing:
- Review this PR (1-2 hours, one time)
- Update tests when behavior changes (5-15 min per change)
- Fix flaky tests when they appear (variable)
If corruption happens once a month, testing ROI is marginal. If corruption happens weekly (or with each new feature), testing pays for itself.
Recommendation
MERGE WITH MODIFICATIONS
Why Merge
- The implementation quality is high
- Bugs already found justify the effort
- Build tag isolation minimizes velocity impact
- Beads is past the prototype phase
Suggested Modifications
-
No hard coverage threshold in CI. Let coverage drift naturally. The value is in the chaos tests catching corruption, not in hitting a percentage.
-
Chaos tests optional in CI. Run chaos tests on release branches, not every PR. This reduces CI noise during active development.
-
Clear ownership. Jordan should document how to add new chaos scenarios. Future contributors need to know when to add vs skip tests.
Decision Framework for User
If you answer YES to 2+ of these, merge:
- Are you dogfooding beads for real work?
- Has corruption caused you to lose time in the last month?
- Do you expect multiple agents using beads concurrently?
- Is beads approaching a "v1.0" milestone?
If you answer NO to all, defer the PR until beads stabilizes.