* test(doctor): add comprehensive tests for fix and check functions Add edge case tests, e2e tests, and improve test coverage for: - database_test.go: database integrity and sync checks - git_test.go: git hooks, merge driver, sync branch tests - gitignore_test.go: gitignore validation - prefix_test.go: ID prefix handling - fix/fix_test.go: fix operations - fix/e2e_test.go: end-to-end fix scenarios - fix/fix_edge_cases_test.go: edge case handling * docs: add testing philosophy and anti-patterns guide - Create TESTING_PHILOSOPHY.md covering test pyramid, priority matrix, what NOT to test, and 5 anti-patterns with code examples - Add cross-reference from README_TESTING.md - Document beads-specific guidance (well-covered areas vs gaps) - Include target metrics (test-to-code ratio, execution time targets) * chore: revert .beads/ to upstream/main state * refactor(doctor): add category grouping and Ayu theme colors - Add Category field to DoctorCheck for organizing checks by type - Define category constants: Core, Git, Runtime, Data, Integration, Metadata - Update thanks command to use shared Ayu color palette from internal/ui - Simplify test fixtures by removing redundant test cases * fix(doctor): prevent test fork bomb and fix test failures - Add ErrTestBinary guard in getBdBinary() to prevent tests from recursively executing the test binary when calling bd subcommands - Update claude_test.go to use new check names (CLI Availability, Prime Documentation) - Fix syncbranch test path comparison by resolving symlinks (/var vs /private/var on macOS) - Fix permissions check to use exact comparison instead of bitmask - Fix UntrackedJSONL to use git commit --only to preserve staged changes - Fix MergeDriver edge case test by making both .git dir and config read-only - Add skipIfTestBinary helper for E2E tests that need real bd binary * test(doctor): skip read-only config test in CI environments GitHub Actions containers may have CAP_DAC_OVERRIDE or similar capabilities that allow writing to read-only files, causing the test to fail. Skip the test when CI=true or GITHUB_ACTIONS=true.
7.9 KiB
Testing Philosophy
This document covers what to test and what not to test. For how to run tests, see TESTING.md.
The Test Pyramid
┌─────────────────┐
│ E2E Tests │ ← PR/Deploy only (slow, expensive)
│ ~5% of tests │
└────────┬────────┘
│
┌──────────────┴──────────────┐
│ Integration Tests │ ← PR gate (moderate)
│ ~15% of tests │
└──────────────┬──────────────┘
│
┌────────────────────────┴────────────────────────┐
│ Unit Tests (Fast) │ ← Every save/commit
│ ~80% of tests │
└─────────────────────────────────────────────────┘
Tier 1: Fast Tests (< 5 seconds total)
When: Every file save, pre-commit hooks, continuous during development
- Pure function tests (no I/O)
- In-memory data structure tests
- Business logic validation
- Mock all external dependencies
In beads: Core logic tests using newTestStore() with in-memory SQLite
Tier 2: Integration Tests (< 30 seconds)
When: Pre-push, PR checks
- Real file system operations
- Git operations with temp repos
- Config file parsing
- CLI argument handling
In beads: Tests tagged with //go:build integration, daemon tests
Tier 3: E2E / Smoke Tests (1-5 minutes)
When: PR merge, pre-deploy, nightly
- Full
bd init→bd doctor→bd doctor --fixworkflow - Real API calls (to staging)
- Cross-platform verification
What Makes a Test "Right"
A good test:
- Catches a bug you'd actually ship - not theoretical edge cases
- Documents expected behavior - serves as living documentation
- Runs fast enough to not skip - slow tests get disabled
- Isn't duplicated elsewhere - tests one thing, one way
What to Test (Priority Matrix)
| Priority | What | Why | Examples in beads |
|---|---|---|---|
| High | Core business logic | This is what users depend on | sync, doctor, export, import |
| High | Error paths that could corrupt data | Data loss is catastrophic | Config handling, git operations, JSONL integrity |
| Medium | Edge cases from production bugs | Discovered through real issues | Orphan handling, ID collision detection |
| Low | Display/formatting | Visual output, can be manually verified | Table formatting, color output |
What NOT to Test Extensively
Simple utility functions
Trust the language. Don't test that strings.TrimSpace works.
Every permutation of inputs
Use table-driven tests with representative cases instead of exhaustive permutations.
// BAD: 10 separate test functions
func TestPriority0(t *testing.T) { ... }
func TestPriority1(t *testing.T) { ... }
func TestPriority2(t *testing.T) { ... }
// GOOD: One table-driven test
func TestPriorityMapping(t *testing.T) {
cases := []struct{ in, want int }{
{0, 4}, {1, 0}, {5, 3}, // includes boundary
}
for _, tc := range cases {
t.Run(fmt.Sprintf("priority_%d", tc.in), func(t *testing.T) {
got := mapPriority(tc.in)
if got != tc.want { t.Errorf(...) }
})
}
}
Obvious behavior
Don't test "if file exists, return true" - trust the implementation.
Same logic through different entry points
If you test a function directly, don't also test it through every caller.
Anti-Patterns to Avoid
1. Trivial Assertions
Testing obvious happy paths that would pass with trivial implementations.
// BAD: What bug would this catch?
func TestValidateBeadsWorkspace(t *testing.T) {
dir := setupTestWorkspace(t)
if err := validateBeadsWorkspace(dir); err != nil {
t.Errorf("expected no error, got: %v", err)
}
}
// GOOD: Test the interesting error cases
func TestValidateBeadsWorkspace(t *testing.T) {
cases := []struct{
name string
setup func(t *testing.T) string
wantErr string
}{
{"missing .beads dir", setupNoBeadsDir, "not a beads workspace"},
{"corrupted db", setupCorruptDB, "database is corrupted"},
{"permission denied", setupNoReadAccess, "permission denied"},
}
// ...
}
2. Duplicate Error Path Testing
Testing the same logic multiple ways instead of once with table-driven tests.
// BAD: Repetitive individual assertions
if config.PriorityMap["0"] != 4 { t.Errorf(...) }
if config.PriorityMap["1"] != 0 { t.Errorf(...) }
if config.PriorityMap["2"] != 1 { t.Errorf(...) }
// GOOD: Table-driven
for k, want := range expectedMap {
if got := config.PriorityMap[k]; got != want {
t.Errorf("PriorityMap[%q] = %d, want %d", k, got, want)
}
}
3. I/O Heavy Tests Without Mocking
Unit tests that execute real commands or heavy I/O when they could mock.
// BAD: Actually executes bd killall in unit test
func TestDaemonFix(t *testing.T) {
exec.Command("bd", "killall").Run()
// ...
}
// GOOD: Mock the execution or use integration test tag
func TestDaemonFix(t *testing.T) {
executor := &mockExecutor{}
fix := NewDaemonFix(executor)
// ...
}
4. Testing Implementation, Not Behavior
Tests that break when you refactor, even though behavior is unchanged.
// BAD: Tests internal state
if len(daemon.connectionPool) != 3 { t.Error(...) }
// GOOD: Tests observable behavior
if resp, err := daemon.HandleRequest(req); err != nil { t.Error(...) }
5. Missing Boundary Tests
Testing known-good values but not boundaries and invalid inputs.
// BAD: Only tests middle values
TestPriority(1) // works
TestPriority(2) // works
// GOOD: Tests boundaries and invalid
TestPriority(-1) // invalid - expect error
TestPriority(0) // boundary - min valid
TestPriority(4) // boundary - max valid
TestPriority(5) // boundary - first invalid
Target Metrics
| Metric | Target | Current (beads) | Status |
|---|---|---|---|
| Test-to-code ratio | 0.5:1 - 1.5:1 | 0.85:1 | Healthy |
| Fast test suite | < 5 seconds | 3.8 seconds | Good |
| Integration tests | < 30 seconds | ~15 seconds | Good |
| Compilation overhead | Minimize | 180 seconds | Bottleneck |
Interpretation
- 0.5:1 - Light coverage, fast iteration (acceptable for utilities)
- 1:1 - Solid coverage for most projects (our target)
- 1.5:1 - Heavy coverage for critical systems
- 2:1+ - Over-engineered, maintenance burden
Beads-Specific Guidance
Well-Covered (Maintain)
| Area | Why It's Well-Tested |
|---|---|
| Sync/Export/Import | Data integrity critical - comprehensive edge cases |
| SQLite transactions | Rollback safety, atomicity guarantees |
| Merge operations | 3-way merge with conflict resolution |
| Daemon locking | Prevents corruption from multiple instances |
Needs Attention
| Area | Gap | Priority |
|---|---|---|
| Daemon lifecycle | Shutdown/signal handling | Medium |
| Concurrent operations | Stress testing under load | Medium |
| Boundary validation | Edge inputs in mapping functions | Low |
Skip These
- Display formatting tests (manually verify)
- Simple getters/setters
- Tests that duplicate SQLite's guarantees
Related Docs
- TESTING.md - How to run tests
- README_TESTING.md - Fast vs integration test strategy
- dev-notes/TEST_SUITE_AUDIT.md - Test refactoring progress