beads/docs/TESTING_PHILOSOPHY.md
Ryan 3c08e5eb9d DOCTOR IMPROVEMENTS: visual improvements/grouping + add comprehensive tests + fix gosec warnings (#656)
* test(doctor): add comprehensive tests for fix and check functions

Add edge case tests, e2e tests, and improve test coverage for:
- database_test.go: database integrity and sync checks
- git_test.go: git hooks, merge driver, sync branch tests
- gitignore_test.go: gitignore validation
- prefix_test.go: ID prefix handling
- fix/fix_test.go: fix operations
- fix/e2e_test.go: end-to-end fix scenarios
- fix/fix_edge_cases_test.go: edge case handling

* docs: add testing philosophy and anti-patterns guide

- Create TESTING_PHILOSOPHY.md covering test pyramid, priority matrix,
  what NOT to test, and 5 anti-patterns with code examples
- Add cross-reference from README_TESTING.md
- Document beads-specific guidance (well-covered areas vs gaps)
- Include target metrics (test-to-code ratio, execution time targets)

* chore: revert .beads/ to upstream/main state

* refactor(doctor): add category grouping and Ayu theme colors

- Add Category field to DoctorCheck for organizing checks by type
- Define category constants: Core, Git, Runtime, Data, Integration, Metadata
- Update thanks command to use shared Ayu color palette from internal/ui
- Simplify test fixtures by removing redundant test cases

* fix(doctor): prevent test fork bomb and fix test failures

- Add ErrTestBinary guard in getBdBinary() to prevent tests from
  recursively executing the test binary when calling bd subcommands
- Update claude_test.go to use new check names (CLI Availability,
  Prime Documentation)
- Fix syncbranch test path comparison by resolving symlinks
  (/var vs /private/var on macOS)
- Fix permissions check to use exact comparison instead of bitmask
- Fix UntrackedJSONL to use git commit --only to preserve staged changes
- Fix MergeDriver edge case test by making both .git dir and config
  read-only
- Add skipIfTestBinary helper for E2E tests that need real bd binary

* test(doctor): skip read-only config test in CI environments

GitHub Actions containers may have CAP_DAC_OVERRIDE or similar
capabilities that allow writing to read-only files, causing
the test to fail. Skip the test when CI=true or GITHUB_ACTIONS=true.
2025-12-20 03:10:06 -08:00


# Testing Philosophy
This document covers **what to test** and **what not to test**. For how to run tests, see [TESTING.md](TESTING.md).
## The Test Pyramid
```
                ┌─────────────────┐
                │    E2E Tests    │  ← PR/Deploy only (slow, expensive)
                │  ~5% of tests   │
                └────────┬────────┘
          ┌──────────────┴──────────────┐
          │      Integration Tests      │  ← PR gate (moderate)
          │        ~15% of tests        │
          └──────────────┬──────────────┘
┌────────────────────────┴────────────────────────┐
│                Unit Tests (Fast)                │  ← Every save/commit
│                  ~80% of tests                  │
└─────────────────────────────────────────────────┘
```
### Tier 1: Fast Tests (< 5 seconds total)
**When**: Every file save, pre-commit hooks, continuous during development
- Pure function tests (no I/O)
- In-memory data structure tests
- Business logic validation
- Mock all external dependencies
**In beads**: Core logic tests using `newTestStore()` with in-memory SQLite
### Tier 2: Integration Tests (< 30 seconds)
**When**: Pre-push, PR checks
- Real file system operations
- Git operations with temp repos
- Config file parsing
- CLI argument handling
**In beads**: Tests tagged with `//go:build integration`, daemon tests
### Tier 3: E2E / Smoke Tests (1-5 minutes)
**When**: PR merge, pre-deploy, nightly
- Full `bd init` → `bd doctor` → `bd doctor --fix` workflow
- Real API calls (to staging)
- Cross-platform verification
---
## What Makes a Test "Right"
A good test:
1. **Catches a bug you'd actually ship** - not theoretical edge cases
2. **Documents expected behavior** - serves as living documentation
3. **Runs fast enough to not skip** - slow tests get disabled
4. **Isn't duplicated elsewhere** - tests one thing, one way
---
## What to Test (Priority Matrix)
| Priority | What | Why | Examples in beads |
|----------|------|-----|-------------------|
| **High** | Core business logic | This is what users depend on | `sync`, `doctor`, `export`, `import` |
| **High** | Error paths that could corrupt data | Data loss is catastrophic | Config handling, git operations, JSONL integrity |
| **Medium** | Edge cases from production bugs | Discovered through real issues | Orphan handling, ID collision detection |
| **Low** | Display/formatting | Visual output, can be manually verified | Table formatting, color output |
---
## What NOT to Test Extensively
### Simple utility functions
Trust the language. Don't test that `strings.TrimSpace` works.
### Every permutation of inputs
Use table-driven tests with representative cases instead of exhaustive permutations.
```go
// BAD: 10 separate test functions
func TestPriority0(t *testing.T) { ... }
func TestPriority1(t *testing.T) { ... }
func TestPriority2(t *testing.T) { ... }

// GOOD: One table-driven test
func TestPriorityMapping(t *testing.T) {
	cases := []struct{ in, want int }{
		{0, 4}, {1, 0}, {5, 3}, // includes boundary
	}
	for _, tc := range cases {
		t.Run(fmt.Sprintf("priority_%d", tc.in), func(t *testing.T) {
			got := mapPriority(tc.in)
			if got != tc.want {
				t.Errorf("mapPriority(%d) = %d, want %d", tc.in, got, tc.want)
			}
		})
	}
}
```
### Obvious behavior
Don't test "if file exists, return true" - trust the implementation.
### Same logic through different entry points
If you test a function directly, don't also test it through every caller.
---
## Anti-Patterns to Avoid
### 1. Trivial Assertions
Testing obvious happy paths that would pass with trivial implementations.
```go
// BAD: What bug would this catch?
func TestValidateBeadsWorkspace(t *testing.T) {
	dir := setupTestWorkspace(t)
	if err := validateBeadsWorkspace(dir); err != nil {
		t.Errorf("expected no error, got: %v", err)
	}
}

// GOOD: Test the interesting error cases
func TestValidateBeadsWorkspace(t *testing.T) {
	cases := []struct {
		name    string
		setup   func(t *testing.T) string
		wantErr string
	}{
		{"missing .beads dir", setupNoBeadsDir, "not a beads workspace"},
		{"corrupted db", setupCorruptDB, "database is corrupted"},
		{"permission denied", setupNoReadAccess, "permission denied"},
	}
	// ...
}
```
### 2. Duplicate Error Path Testing
Testing the same logic multiple ways instead of once with table-driven tests.
```go
// BAD: Repetitive individual assertions
if config.PriorityMap["0"] != 4 { t.Errorf(...) }
if config.PriorityMap["1"] != 0 { t.Errorf(...) }
if config.PriorityMap["2"] != 1 { t.Errorf(...) }

// GOOD: Table-driven
for k, want := range expectedMap {
	if got := config.PriorityMap[k]; got != want {
		t.Errorf("PriorityMap[%q] = %d, want %d", k, got, want)
	}
}
```
### 3. I/O Heavy Tests Without Mocking
Unit tests that execute real commands or heavy I/O when they could mock.
```go
// BAD: Actually executes bd killall in unit test
func TestDaemonFix(t *testing.T) {
	exec.Command("bd", "killall").Run()
	// ...
}

// GOOD: Mock the execution or use integration test tag
func TestDaemonFix(t *testing.T) {
	executor := &mockExecutor{}
	fix := NewDaemonFix(executor)
	// ...
}
```
### 4. Testing Implementation, Not Behavior
Tests that break when you refactor, even though behavior is unchanged.
```go
// BAD: Tests internal state
if len(daemon.connectionPool) != 3 { t.Error(...) }

// GOOD: Tests observable behavior
if _, err := daemon.HandleRequest(req); err != nil { t.Error(...) }
```
### 5. Missing Boundary Tests
Testing known-good values but not boundaries and invalid inputs.
```go
// BAD: Only tests middle values
TestPriority(1) // works
TestPriority(2) // works

// GOOD: Tests boundaries and invalid inputs
TestPriority(-1) // invalid - expect error
TestPriority(0)  // boundary - min valid
TestPriority(4)  // boundary - max valid
TestPriority(5)  // boundary - first invalid
```
---
## Target Metrics
| Metric | Target | Current (beads) | Status |
|--------|--------|-----------------|--------|
| Test-to-code ratio | 0.5:1 - 1.5:1 | 0.85:1 | Healthy |
| Fast test suite | < 5 seconds | 3.8 seconds | Good |
| Integration tests | < 30 seconds | ~15 seconds | Good |
| Compilation overhead | Minimize | 180 seconds | Bottleneck |
### Interpretation
- **0.5:1** - Light coverage, fast iteration (acceptable for utilities)
- **1:1** - Solid coverage for most projects (our target)
- **1.5:1** - Heavy coverage for critical systems
- **2:1+** - Over-engineered, maintenance burden
---
## Beads-Specific Guidance
### Well-Covered (Maintain)
| Area | Why It's Well-Tested |
|------|---------------------|
| Sync/Export/Import | Data integrity critical - comprehensive edge cases |
| SQLite transactions | Rollback safety, atomicity guarantees |
| Merge operations | 3-way merge with conflict resolution |
| Daemon locking | Prevents corruption from multiple instances |
### Needs Attention
| Area | Gap | Priority |
|------|-----|----------|
| Daemon lifecycle | Shutdown/signal handling | Medium |
| Concurrent operations | Stress testing under load | Medium |
| Boundary validation | Edge inputs in mapping functions | Low |
### Skip These
- Display formatting tests (manually verify)
- Simple getters/setters
- Tests that duplicate SQLite's guarantees
---
## Related Docs
- [TESTING.md](TESTING.md) - How to run tests
- [README_TESTING.md](README_TESTING.md) - Fast vs integration test strategy
- [dev-notes/TEST_SUITE_AUDIT.md](dev-notes/TEST_SUITE_AUDIT.md) - Test refactoring progress