feat(formula): add verification legs and presets to code-review (#841)
Add three new verification legs to the code-review convoy formula: - wiring: detects dependencies added but not actually used - commit-discipline: reviews commit quality and atomicity - test-quality: verifies tests are meaningful, not just present Also adds presets for common review modes: - gate: light review (wiring, security, smells, test-quality) - full: all 10 legs for comprehensive review - security-focused: security-heavy for sensitive changes - refactor: code quality focus This is a minimal subset of PR #4's Worker → Reviewer pattern, focusing on the most valuable additions without Go code changes. Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
@@ -15,6 +15,8 @@ Each leg examines the code from a different perspective. Findings are
|
|||||||
collected and synthesized into a prioritized, actionable review.
|
collected and synthesized into a prioritized, actionable review.
|
||||||
|
|
||||||
## Legs (parallel execution)
|
## Legs (parallel execution)
|
||||||
|
|
||||||
|
### Analysis Legs (read and analyze code)
|
||||||
- **correctness**: Logic errors, bugs, edge cases
|
- **correctness**: Logic errors, bugs, edge cases
|
||||||
- **performance**: Bottlenecks, efficiency issues
|
- **performance**: Bottlenecks, efficiency issues
|
||||||
- **security**: Vulnerabilities, OWASP concerns
|
- **security**: Vulnerabilities, OWASP concerns
|
||||||
@@ -23,6 +25,16 @@ collected and synthesized into a prioritized, actionable review.
|
|||||||
- **style**: Convention compliance, consistency
|
- **style**: Convention compliance, consistency
|
||||||
- **smells**: Anti-patterns, technical debt
|
- **smells**: Anti-patterns, technical debt
|
||||||
|
|
||||||
|
### Verification Legs (check implementation quality)
|
||||||
|
- **wiring**: Installed-but-not-wired gaps (deps added but not used)
|
||||||
|
- **commit-discipline**: Commit quality and atomicity
|
||||||
|
- **test-quality**: Test meaningfulness, not just coverage
|
||||||
|
|
||||||
|
## Presets
|
||||||
|
- **gate**: Light review for automatic flow (wiring, security, smells, test-quality)
|
||||||
|
- **full**: Comprehensive review (all 10 legs)
|
||||||
|
- **custom**: Select specific legs via --legs flag
|
||||||
|
|
||||||
## Execution Model
|
## Execution Model
|
||||||
1. Each leg spawns as a separate polecat
|
1. Each leg spawns as a separate polecat
|
||||||
2. Polecats work in parallel
|
2. Polecats work in parallel
|
||||||
@@ -293,6 +305,125 @@ Review the code for code smells and anti-patterns.
|
|||||||
- Is technical debt being added or paid down?
|
- Is technical debt being added or paid down?
|
||||||
"""
|
"""
|
||||||
|
|
||||||
|
# ============================================================================
|
||||||
|
# VERIFICATION LEGS - Check implementation quality (not just code analysis)
|
||||||
|
# ============================================================================
|
||||||
|
|
||||||
|
[[legs]]
|
||||||
|
id = "wiring"
|
||||||
|
title = "Wiring Review"
|
||||||
|
focus = "Installed-but-not-wired gaps"
|
||||||
|
description = """
|
||||||
|
Detect dependencies, configs, or libraries that were added but not actually used.
|
||||||
|
|
||||||
|
This catches subtle bugs where the implementer THINKS they integrated something,
|
||||||
|
but the old implementation is still being used.
|
||||||
|
|
||||||
|
**Look for:**
|
||||||
|
- New dependency in manifest but never imported
|
||||||
|
- Go: module in go.mod but no import
|
||||||
|
- Rust: crate in Cargo.toml but no `use`
|
||||||
|
- Node: package in package.json but no import/require
|
||||||
|
|
||||||
|
- SDK added but old implementation remains
|
||||||
|
- Added Sentry but still using console.error for errors
|
||||||
|
- Added Zod but still using manual typeof validation
|
||||||
|
|
||||||
|
- Config/env var defined but never loaded
|
||||||
|
- New .env var that isn't accessed in code
|
||||||
|
|
||||||
|
**Questions to answer:**
|
||||||
|
- Is every new dependency actually used?
|
||||||
|
- Are there old patterns that should have been replaced?
|
||||||
|
- Is there dead config that suggests incomplete migration?
|
||||||
|
"""
|
||||||
|
|
||||||
|
[[legs]]
|
||||||
|
id = "commit-discipline"
|
||||||
|
title = "Commit Discipline Review"
|
||||||
|
focus = "Commit quality and atomicity"
|
||||||
|
description = """
|
||||||
|
Review commit history for good practices.
|
||||||
|
|
||||||
|
Good commits make the codebase easier to understand, bisect, and revert.
|
||||||
|
|
||||||
|
**Look for:**
|
||||||
|
- Giant "WIP" or "fix" commits
|
||||||
|
- Multiple unrelated changes in one commit
|
||||||
|
- Commits that touch 20+ files across different features
|
||||||
|
|
||||||
|
- Poor commit messages
|
||||||
|
- "stuff", "update", "asdf", "fix"
|
||||||
|
- No context about WHY the change was made
|
||||||
|
|
||||||
|
- Unatomic commits
|
||||||
|
- Feature + refactor + bugfix in same commit
|
||||||
|
- Should be separable logical units
|
||||||
|
|
||||||
|
- Missing type prefixes (if project uses conventional commits)
|
||||||
|
- feat:, fix:, refactor:, test:, docs:, chore:
|
||||||
|
|
||||||
|
**Questions to answer:**
|
||||||
|
- Could this history be bisected effectively?
|
||||||
|
- Would a reviewer understand the progression?
|
||||||
|
- Are commits atomic (one logical change each)?
|
||||||
|
"""
|
||||||
|
|
||||||
|
[[legs]]
|
||||||
|
id = "test-quality"
|
||||||
|
title = "Test Quality Review"
|
||||||
|
focus = "Test meaningfulness, not just coverage"
|
||||||
|
description = """
|
||||||
|
Verify tests are actually testing something meaningful.
|
||||||
|
|
||||||
|
Coverage numbers lie. A test that can't fail provides no value.
|
||||||
|
|
||||||
|
**Look for:**
|
||||||
|
- Weak assertions
|
||||||
|
- Only checking != nil / !== null / is not None
|
||||||
|
- Using .is_ok() without checking the value
|
||||||
|
- assertTrue(true) or equivalent
|
||||||
|
|
||||||
|
- Missing negative test cases
|
||||||
|
- Happy path only, no error cases
|
||||||
|
- No boundary testing
|
||||||
|
- No invalid input testing
|
||||||
|
|
||||||
|
- Tests that can't fail
|
||||||
|
- Mocked so heavily the test is meaningless
|
||||||
|
- Testing implementation details, not behavior
|
||||||
|
|
||||||
|
- Flaky test indicators
|
||||||
|
- Sleep/delay in tests
|
||||||
|
- Time-dependent assertions
|
||||||
|
|
||||||
|
**Questions to answer:**
|
||||||
|
- Do these tests actually verify behavior?
|
||||||
|
- Would a bug in the implementation cause a test failure?
|
||||||
|
- Are edge cases and error paths tested?
|
||||||
|
"""
|
||||||
|
|
||||||
|
# ============================================================================
|
||||||
|
# PRESETS - Configurable leg selection
|
||||||
|
# ============================================================================
|
||||||
|
|
||||||
|
[presets]
|
||||||
|
[presets.gate]
|
||||||
|
description = "Light review for automatic flow - fast, focused on blockers"
|
||||||
|
legs = ["wiring", "security", "smells", "test-quality"]
|
||||||
|
|
||||||
|
[presets.full]
|
||||||
|
description = "Comprehensive review - all legs, for major features"
|
||||||
|
legs = ["correctness", "performance", "security", "elegance", "resilience", "style", "smells", "wiring", "commit-discipline", "test-quality"]
|
||||||
|
|
||||||
|
[presets.security-focused]
|
||||||
|
description = "Security-heavy review for sensitive changes"
|
||||||
|
legs = ["security", "resilience", "correctness", "wiring"]
|
||||||
|
|
||||||
|
[presets.refactor]
|
||||||
|
description = "Review focused on code quality during refactoring"
|
||||||
|
legs = ["elegance", "smells", "style", "commit-discipline"]
|
||||||
|
|
||||||
# Synthesis step - combines all leg outputs
|
# Synthesis step - combines all leg outputs
|
||||||
[synthesis]
|
[synthesis]
|
||||||
title = "Review Synthesis"
|
title = "Review Synthesis"
|
||||||
@@ -310,10 +441,13 @@ A synthesized review at: {{.output.directory}}/{{.output.synthesis}}
|
|||||||
2. **Critical Issues** - P0 items from all legs, deduplicated
|
2. **Critical Issues** - P0 items from all legs, deduplicated
|
||||||
3. **Major Issues** - P1 items, grouped by theme
|
3. **Major Issues** - P1 items, grouped by theme
|
||||||
4. **Minor Issues** - P2 items, briefly listed
|
4. **Minor Issues** - P2 items, briefly listed
|
||||||
5. **Positive Observations** - What's done well
|
5. **Wiring Gaps** - Dependencies added but not used (from wiring leg)
|
||||||
6. **Recommendations** - Actionable next steps
|
6. **Commit Quality** - Notes on commit discipline
|
||||||
|
7. **Test Quality** - Assessment of test meaningfulness
|
||||||
|
8. **Positive Observations** - What's done well
|
||||||
|
9. **Recommendations** - Actionable next steps
|
||||||
|
|
||||||
Deduplicate issues found by multiple legs (note which legs found them).
|
Deduplicate issues found by multiple legs (note which legs found them).
|
||||||
Prioritize by impact and effort. Be actionable.
|
Prioritize by impact and effort. Be actionable.
|
||||||
"""
|
"""
|
||||||
depends_on = ["correctness", "performance", "security", "elegance", "resilience", "style", "smells"]
|
depends_on = ["correctness", "performance", "security", "elegance", "resilience", "style", "smells", "wiring", "commit-discipline", "test-quality"]
|
||||||
|
|||||||
@@ -15,6 +15,8 @@ Each leg examines the code from a different perspective. Findings are
|
|||||||
collected and synthesized into a prioritized, actionable review.
|
collected and synthesized into a prioritized, actionable review.
|
||||||
|
|
||||||
## Legs (parallel execution)
|
## Legs (parallel execution)
|
||||||
|
|
||||||
|
### Analysis Legs (read and analyze code)
|
||||||
- **correctness**: Logic errors, bugs, edge cases
|
- **correctness**: Logic errors, bugs, edge cases
|
||||||
- **performance**: Bottlenecks, efficiency issues
|
- **performance**: Bottlenecks, efficiency issues
|
||||||
- **security**: Vulnerabilities, OWASP concerns
|
- **security**: Vulnerabilities, OWASP concerns
|
||||||
@@ -23,6 +25,16 @@ collected and synthesized into a prioritized, actionable review.
|
|||||||
- **style**: Convention compliance, consistency
|
- **style**: Convention compliance, consistency
|
||||||
- **smells**: Anti-patterns, technical debt
|
- **smells**: Anti-patterns, technical debt
|
||||||
|
|
||||||
|
### Verification Legs (check implementation quality)
|
||||||
|
- **wiring**: Installed-but-not-wired gaps (deps added but not used)
|
||||||
|
- **commit-discipline**: Commit quality and atomicity
|
||||||
|
- **test-quality**: Test meaningfulness, not just coverage
|
||||||
|
|
||||||
|
## Presets
|
||||||
|
- **gate**: Light review for automatic flow (wiring, security, smells, test-quality)
|
||||||
|
- **full**: Comprehensive review (all 10 legs)
|
||||||
|
- **custom**: Select specific legs via --legs flag
|
||||||
|
|
||||||
## Execution Model
|
## Execution Model
|
||||||
1. Each leg spawns as a separate polecat
|
1. Each leg spawns as a separate polecat
|
||||||
2. Polecats work in parallel
|
2. Polecats work in parallel
|
||||||
@@ -293,6 +305,125 @@ Review the code for code smells and anti-patterns.
|
|||||||
- Is technical debt being added or paid down?
|
- Is technical debt being added or paid down?
|
||||||
"""
|
"""
|
||||||
|
|
||||||
|
# ============================================================================
|
||||||
|
# VERIFICATION LEGS - Check implementation quality (not just code analysis)
|
||||||
|
# ============================================================================
|
||||||
|
|
||||||
|
[[legs]]
|
||||||
|
id = "wiring"
|
||||||
|
title = "Wiring Review"
|
||||||
|
focus = "Installed-but-not-wired gaps"
|
||||||
|
description = """
|
||||||
|
Detect dependencies, configs, or libraries that were added but not actually used.
|
||||||
|
|
||||||
|
This catches subtle bugs where the implementer THINKS they integrated something,
|
||||||
|
but the old implementation is still being used.
|
||||||
|
|
||||||
|
**Look for:**
|
||||||
|
- New dependency in manifest but never imported
|
||||||
|
- Go: module in go.mod but no import
|
||||||
|
- Rust: crate in Cargo.toml but no `use`
|
||||||
|
- Node: package in package.json but no import/require
|
||||||
|
|
||||||
|
- SDK added but old implementation remains
|
||||||
|
- Added Sentry but still using console.error for errors
|
||||||
|
- Added Zod but still using manual typeof validation
|
||||||
|
|
||||||
|
- Config/env var defined but never loaded
|
||||||
|
- New .env var that isn't accessed in code
|
||||||
|
|
||||||
|
**Questions to answer:**
|
||||||
|
- Is every new dependency actually used?
|
||||||
|
- Are there old patterns that should have been replaced?
|
||||||
|
- Is there dead config that suggests incomplete migration?
|
||||||
|
"""
|
||||||
|
|
||||||
|
[[legs]]
|
||||||
|
id = "commit-discipline"
|
||||||
|
title = "Commit Discipline Review"
|
||||||
|
focus = "Commit quality and atomicity"
|
||||||
|
description = """
|
||||||
|
Review commit history for good practices.
|
||||||
|
|
||||||
|
Good commits make the codebase easier to understand, bisect, and revert.
|
||||||
|
|
||||||
|
**Look for:**
|
||||||
|
- Giant "WIP" or "fix" commits
|
||||||
|
- Multiple unrelated changes in one commit
|
||||||
|
- Commits that touch 20+ files across different features
|
||||||
|
|
||||||
|
- Poor commit messages
|
||||||
|
- "stuff", "update", "asdf", "fix"
|
||||||
|
- No context about WHY the change was made
|
||||||
|
|
||||||
|
- Unatomic commits
|
||||||
|
- Feature + refactor + bugfix in same commit
|
||||||
|
- Should be separable logical units
|
||||||
|
|
||||||
|
- Missing type prefixes (if project uses conventional commits)
|
||||||
|
- feat:, fix:, refactor:, test:, docs:, chore:
|
||||||
|
|
||||||
|
**Questions to answer:**
|
||||||
|
- Could this history be bisected effectively?
|
||||||
|
- Would a reviewer understand the progression?
|
||||||
|
- Are commits atomic (one logical change each)?
|
||||||
|
"""
|
||||||
|
|
||||||
|
[[legs]]
|
||||||
|
id = "test-quality"
|
||||||
|
title = "Test Quality Review"
|
||||||
|
focus = "Test meaningfulness, not just coverage"
|
||||||
|
description = """
|
||||||
|
Verify tests are actually testing something meaningful.
|
||||||
|
|
||||||
|
Coverage numbers lie. A test that can't fail provides no value.
|
||||||
|
|
||||||
|
**Look for:**
|
||||||
|
- Weak assertions
|
||||||
|
- Only checking != nil / !== null / is not None
|
||||||
|
- Using .is_ok() without checking the value
|
||||||
|
- assertTrue(true) or equivalent
|
||||||
|
|
||||||
|
- Missing negative test cases
|
||||||
|
- Happy path only, no error cases
|
||||||
|
- No boundary testing
|
||||||
|
- No invalid input testing
|
||||||
|
|
||||||
|
- Tests that can't fail
|
||||||
|
- Mocked so heavily the test is meaningless
|
||||||
|
- Testing implementation details, not behavior
|
||||||
|
|
||||||
|
- Flaky test indicators
|
||||||
|
- Sleep/delay in tests
|
||||||
|
- Time-dependent assertions
|
||||||
|
|
||||||
|
**Questions to answer:**
|
||||||
|
- Do these tests actually verify behavior?
|
||||||
|
- Would a bug in the implementation cause a test failure?
|
||||||
|
- Are edge cases and error paths tested?
|
||||||
|
"""
|
||||||
|
|
||||||
|
# ============================================================================
|
||||||
|
# PRESETS - Configurable leg selection
|
||||||
|
# ============================================================================
|
||||||
|
|
||||||
|
[presets]
|
||||||
|
[presets.gate]
|
||||||
|
description = "Light review for automatic flow - fast, focused on blockers"
|
||||||
|
legs = ["wiring", "security", "smells", "test-quality"]
|
||||||
|
|
||||||
|
[presets.full]
|
||||||
|
description = "Comprehensive review - all legs, for major features"
|
||||||
|
legs = ["correctness", "performance", "security", "elegance", "resilience", "style", "smells", "wiring", "commit-discipline", "test-quality"]
|
||||||
|
|
||||||
|
[presets.security-focused]
|
||||||
|
description = "Security-heavy review for sensitive changes"
|
||||||
|
legs = ["security", "resilience", "correctness", "wiring"]
|
||||||
|
|
||||||
|
[presets.refactor]
|
||||||
|
description = "Review focused on code quality during refactoring"
|
||||||
|
legs = ["elegance", "smells", "style", "commit-discipline"]
|
||||||
|
|
||||||
# Synthesis step - combines all leg outputs
|
# Synthesis step - combines all leg outputs
|
||||||
[synthesis]
|
[synthesis]
|
||||||
title = "Review Synthesis"
|
title = "Review Synthesis"
|
||||||
@@ -310,10 +441,13 @@ A synthesized review at: {{.output.directory}}/{{.output.synthesis}}
|
|||||||
2. **Critical Issues** - P0 items from all legs, deduplicated
|
2. **Critical Issues** - P0 items from all legs, deduplicated
|
||||||
3. **Major Issues** - P1 items, grouped by theme
|
3. **Major Issues** - P1 items, grouped by theme
|
||||||
4. **Minor Issues** - P2 items, briefly listed
|
4. **Minor Issues** - P2 items, briefly listed
|
||||||
5. **Positive Observations** - What's done well
|
5. **Wiring Gaps** - Dependencies added but not used (from wiring leg)
|
||||||
6. **Recommendations** - Actionable next steps
|
6. **Commit Quality** - Notes on commit discipline
|
||||||
|
7. **Test Quality** - Assessment of test meaningfulness
|
||||||
|
8. **Positive Observations** - What's done well
|
||||||
|
9. **Recommendations** - Actionable next steps
|
||||||
|
|
||||||
Deduplicate issues found by multiple legs (note which legs found them).
|
Deduplicate issues found by multiple legs (note which legs found them).
|
||||||
Prioritize by impact and effort. Be actionable.
|
Prioritize by impact and effort. Be actionable.
|
||||||
"""
|
"""
|
||||||
depends_on = ["correctness", "performance", "security", "elegance", "resilience", "style", "smells"]
|
depends_on = ["correctness", "performance", "security", "elegance", "resilience", "style", "smells", "wiring", "commit-discipline", "test-quality"]
|
||||||
|
|||||||
Reference in New Issue
Block a user