beads/docs/TESTING_PHILOSOPHY.md
Ryan 3c08e5eb9d DOCTOR IMPROVEMENTS: visual improvements/grouping + add comprehensive tests + fix gosec warnings (#656) · 2025-12-20

Testing Philosophy

This document covers what to test and what not to test. For how to run tests, see TESTING.md.

The Test Pyramid

                  ┌─────────────────┐
                  │   E2E Tests     │  ← PR/Deploy only (slow, expensive)
                  │   ~5% of tests  │
                  └────────┬────────┘
                           │
            ┌──────────────┴──────────────┐
            │     Integration Tests       │  ← PR gate (moderate)
            │       ~15% of tests         │
            └──────────────┬──────────────┘
                           │
  ┌────────────────────────┴────────────────────────┐
  │              Unit Tests (Fast)                  │  ← Every save/commit
  │                 ~80% of tests                   │
  └─────────────────────────────────────────────────┘

Tier 1: Fast Tests (< 5 seconds total)

When: Every file save, pre-commit hooks, continuous during development

  • Pure function tests (no I/O)
  • In-memory data structure tests
  • Business logic validation
  • Mock all external dependencies

In beads: Core logic tests using newTestStore() with in-memory SQLite

Tier 2: Integration Tests (< 30 seconds)

When: Pre-push, PR checks

  • Real file system operations
  • Git operations with temp repos
  • Config file parsing
  • CLI argument handling

In beads: Tests tagged with //go:build integration, daemon tests

Tier 3: E2E / Smoke Tests (1-5 minutes)

When: PR merge, pre-deploy, nightly

  • Full bd init → bd doctor → bd doctor --fix workflow
  • Real API calls (to staging)
  • Cross-platform verification

What Makes a Test "Right"

A good test:

  1. Catches a bug you'd actually ship - not theoretical edge cases
  2. Documents expected behavior - serves as living documentation
  3. Runs fast enough to not skip - slow tests get disabled
  4. Isn't duplicated elsewhere - tests one thing, one way

What to Test (Priority Matrix)

| Priority | What | Why | Examples in beads |
|----------|------|-----|-------------------|
| High | Core business logic | This is what users depend on | sync, doctor, export, import |
| High | Error paths that could corrupt data | Data loss is catastrophic | Config handling, git operations, JSONL integrity |
| Medium | Edge cases from production bugs | Discovered through real issues | Orphan handling, ID collision detection |
| Low | Display/formatting | Visual output, can be manually verified | Table formatting, color output |

What NOT to Test Extensively

Simple utility functions

Trust the language. Don't test that strings.TrimSpace works.

Every permutation of inputs

Use table-driven tests with representative cases instead of exhaustive permutations.

// BAD: 10 separate test functions
func TestPriority0(t *testing.T) { ... }
func TestPriority1(t *testing.T) { ... }
func TestPriority2(t *testing.T) { ... }

// GOOD: One table-driven test
func TestPriorityMapping(t *testing.T) {
    cases := []struct{ in, want int }{
        {0, 4}, {1, 0}, {5, 3}, // includes boundary
    }
    for _, tc := range cases {
        t.Run(fmt.Sprintf("priority_%d", tc.in), func(t *testing.T) {
            got := mapPriority(tc.in)
            if got != tc.want { t.Errorf(...) }
        })
    }
}
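
The mapPriority in the example above is left undefined; a hypothetical implementation consistent with its cases (external priority in, internal 0-4 priority out, with unknown values clamped) might look like this. The mapping values are taken from the examples in this document, not from the beads source:

```go
package main

import "fmt"

// mapPriority converts an external priority value to the internal
// 0-4 scale. Hypothetical sketch consistent with the test cases above,
// not the actual beads implementation.
func mapPriority(in int) int {
	table := map[int]int{0: 4, 1: 0, 2: 1, 3: 2, 4: 3, 5: 3}
	if out, ok := table[in]; ok {
		return out
	}
	return 3 // anything out of range falls back to low priority
}

func main() {
	for _, in := range []int{0, 1, 5} {
		fmt.Printf("mapPriority(%d) = %d\n", in, mapPriority(in))
	}
}
```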

Obvious behavior

Don't test "if file exists, return true" - trust the implementation.

Same logic through different entry points

If you test a function directly, don't also test it through every caller.


Anti-Patterns to Avoid

1. Trivial Assertions

Testing obvious happy paths that would pass with trivial implementations.

// BAD: What bug would this catch?
func TestValidateBeadsWorkspace(t *testing.T) {
    dir := setupTestWorkspace(t)
    if err := validateBeadsWorkspace(dir); err != nil {
        t.Errorf("expected no error, got: %v", err)
    }
}

// GOOD: Test the interesting error cases
func TestValidateBeadsWorkspace(t *testing.T) {
    cases := []struct{
        name    string
        setup   func(t *testing.T) string
        wantErr string
    }{
        {"missing .beads dir", setupNoBeadsDir, "not a beads workspace"},
        {"corrupted db", setupCorruptDB, "database is corrupted"},
        {"permission denied", setupNoReadAccess, "permission denied"},
    }
    // ...
}

2. Duplicate Error Path Testing

Testing the same logic multiple ways instead of once with table-driven tests.

// BAD: Repetitive individual assertions
if config.PriorityMap["0"] != 4 { t.Errorf(...) }
if config.PriorityMap["1"] != 0 { t.Errorf(...) }
if config.PriorityMap["2"] != 1 { t.Errorf(...) }

// GOOD: Table-driven
for k, want := range expectedMap {
    if got := config.PriorityMap[k]; got != want {
        t.Errorf("PriorityMap[%q] = %d, want %d", k, got, want)
    }
}

3. I/O Heavy Tests Without Mocking

Unit tests that execute real commands or heavy I/O when they could mock.

// BAD: Actually executes bd killall in unit test
func TestDaemonFix(t *testing.T) {
    exec.Command("bd", "killall").Run()
    // ...
}

// GOOD: Mock the execution or use integration test tag
func TestDaemonFix(t *testing.T) {
    executor := &mockExecutor{}
    fix := NewDaemonFix(executor)
    // ...
}
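
The mockExecutor in the snippet above is not defined; one way to shape it, assuming a hypothetical Executor interface as the seam between fix logic and the shell (restartDaemon stands in for the code under test):

```go
package main

import (
	"fmt"
	"strings"
)

// Executor is the seam between fix logic and the shell.
// Hypothetical interface, named after the mockExecutor above.
type Executor interface {
	Run(name string, args ...string) error
}

// mockExecutor records every invocation instead of running anything.
type mockExecutor struct {
	calls []string
}

func (m *mockExecutor) Run(name string, args ...string) error {
	m.calls = append(m.calls, strings.Join(append([]string{name}, args...), " "))
	return nil
}

// restartDaemon is a stand-in for the logic under test: it shells out
// through the Executor instead of calling exec.Command directly.
func restartDaemon(e Executor) error {
	if err := e.Run("bd", "killall"); err != nil {
		return err
	}
	return e.Run("bd", "daemon", "start")
}

func main() {
	mock := &mockExecutor{}
	if err := restartDaemon(mock); err != nil {
		panic(err)
	}
	fmt.Println(mock.calls) // assert on recorded calls; no real processes spawned
}
```

The test then asserts on mock.calls rather than on side effects in the real environment.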

4. Testing Implementation, Not Behavior

Tests that break when you refactor, even though behavior is unchanged.

// BAD: Tests internal state
if len(daemon.connectionPool) != 3 { t.Error(...) }

// GOOD: Tests observable behavior
if resp, err := daemon.HandleRequest(req); err != nil { t.Error(...) }

5. Missing Boundary Tests

Testing known-good values but not boundaries and invalid inputs.

// BAD: Only tests middle values
TestPriority(1)  // works
TestPriority(2)  // works

// GOOD: Tests boundaries and invalid
TestPriority(-1) // invalid - expect error
TestPriority(0)  // boundary - min valid
TestPriority(4)  // boundary - max valid
TestPriority(5)  // boundary - first invalid
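
A concrete version of the boundary cases above, assuming a hypothetical validatePriority where 0-4 is the valid range:

```go
package main

import "fmt"

// validatePriority accepts the 0-4 range and rejects everything else.
// Hypothetical helper to make the boundary cases above concrete.
func validatePriority(p int) error {
	if p < 0 || p > 4 {
		return fmt.Errorf("priority %d out of range [0,4]", p)
	}
	return nil
}

func main() {
	// -1 and 5 sit just outside the boundaries; 0 and 4 sit just inside.
	for _, p := range []int{-1, 0, 4, 5} {
		fmt.Printf("validatePriority(%d) -> %v\n", p, validatePriority(p))
	}
}
```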

Target Metrics

| Metric | Target | Current (beads) | Status |
|--------|--------|-----------------|--------|
| Test-to-code ratio | 0.5:1 - 1.5:1 | 0.85:1 | Healthy |
| Fast test suite | < 5 seconds | 3.8 seconds | Good |
| Integration tests | < 30 seconds | ~15 seconds | Good |
| Compilation overhead | Minimize | 180 seconds | Bottleneck |

Interpretation

  • 0.5:1 - Light coverage, fast iteration (acceptable for utilities)
  • 1:1 - Solid coverage for most projects (our target)
  • 1.5:1 - Heavy coverage for critical systems
  • 2:1+ - Over-engineered, maintenance burden

Beads-Specific Guidance

Well-Covered (Maintain)

| Area | Why It's Well-Tested |
|------|----------------------|
| Sync/Export/Import | Data integrity critical - comprehensive edge cases |
| SQLite transactions | Rollback safety, atomicity guarantees |
| Merge operations | 3-way merge with conflict resolution |
| Daemon locking | Prevents corruption from multiple instances |

Needs Attention

| Area | Gap | Priority |
|------|-----|----------|
| Daemon lifecycle | Shutdown/signal handling | Medium |
| Concurrent operations | Stress testing under load | Medium |
| Boundary validation | Edge inputs in mapping functions | Low |

Skip These

  • Display formatting tests (manually verify)
  • Simple getters/setters
  • Tests that duplicate SQLite's guarantees