feat(formula): add verification legs and presets to code-review (#841)

Add three new verification legs to the code-review convoy formula: - wiring: detects dependencies added but not actually used - commit-discipline: reviews commit quality and atomicity - test-quality: verifies tests are meaningful, not just present Also adds presets for common review modes: - gate: light review (wiring, security, smells, test-quality) - full: all 10 legs for comprehensive review - security-focused: security-heavy for sensitive changes - refactor: code quality focus This is a minimal subset of PR #4's Worker → Reviewer pattern, focusing on the most valuable additions without Go code changes. Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 00:08:45 +05:30
parent 987502ebb3
commit afc1ff04b1
2 changed files with 274 additions and 6 deletions
@@ -15,6 +15,8 @@ Each leg examines the code from a different perspective. Findings are
 collected and synthesized into a prioritized, actionable review.
 ## Legs (parallel execution)
 ### Analysis Legs (read and analyze code)
 - **correctness**: Logic errors, bugs, edge cases
 - **performance**: Bottlenecks, efficiency issues
 - **security**: Vulnerabilities, OWASP concerns
@@ -23,6 +25,16 @@ collected and synthesized into a prioritized, actionable review.
 - **style**: Convention compliance, consistency
 - **smells**: Anti-patterns, technical debt
 ### Verification Legs (check implementation quality)
 - **wiring**: Installed-but-not-wired gaps (deps added but not used)
 - **commit-discipline**: Commit quality and atomicity
 - **test-quality**: Test meaningfulness, not just coverage
 ## Presets
 - **gate**: Light review for automatic flow (wiring, security, smells, test-quality)
 - **full**: Comprehensive review (all 10 legs)
 - **custom**: Select specific legs via --legs flag
 ## Execution Model
 1. Each leg spawns as a separate polecat
 2. Polecats work in parallel
@@ -293,6 +305,125 @@ Review the code for code smells and anti-patterns.
 - Is technical debt being added or paid down?
 """
 # ============================================================================
 # VERIFICATION LEGS - Check implementation quality (not just code analysis)
 # ============================================================================
 [[legs]]
 id = "wiring"
 title = "Wiring Review"
 focus = "Installed-but-not-wired gaps"
 description = """
 Detect dependencies, configs, or libraries that were added but not actually used.
 This catches subtle bugs where the implementer THINKS they integrated something,
 but the old implementation is still being used.
 **Look for:**
 - New dependency in manifest but never imported
  - Go: module in go.mod but no import
  - Rust: crate in Cargo.toml but no `use`
  - Node: package in package.json but no import/require
 - SDK added but old implementation remains
  - Added Sentry but still using console.error for errors
  - Added Zod but still using manual typeof validation
 - Config/env var defined but never loaded
  - New .env var that isn't accessed in code
 **Questions to answer:**
 - Is every new dependency actually used?
 - Are there old patterns that should have been replaced?
 - Is there dead config that suggests incomplete migration?
 """
 [[legs]]
 id = "commit-discipline"
 title = "Commit Discipline Review"
 focus = "Commit quality and atomicity"
 description = """
 Review commit history for good practices.
 Good commits make the codebase easier to understand, bisect, and revert.
 **Look for:**
 - Giant "WIP" or "fix" commits
  - Multiple unrelated changes in one commit
  - Commits that touch 20+ files across different features
 - Poor commit messages
  - "stuff", "update", "asdf", "fix"
  - No context about WHY the change was made
 - Unatomic commits
  - Feature + refactor + bugfix in same commit
  - Should be separable logical units
 - Missing type prefixes (if project uses conventional commits)
  - feat:, fix:, refactor:, test:, docs:, chore:
 **Questions to answer:**
 - Could this history be bisected effectively?
 - Would a reviewer understand the progression?
 - Are commits atomic (one logical change each)?
 """
 [[legs]]
 id = "test-quality"
 title = "Test Quality Review"
 focus = "Test meaningfulness, not just coverage"
 description = """
 Verify tests are actually testing something meaningful.
 Coverage numbers lie. A test that can't fail provides no value.
 **Look for:**
 - Weak assertions
  - Only checking != nil / !== null / is not None
  - Using .is_ok() without checking the value
  - assertTrue(true) or equivalent
 - Missing negative test cases
  - Happy path only, no error cases
  - No boundary testing
  - No invalid input testing
 - Tests that can't fail
  - Mocked so heavily the test is meaningless
  - Testing implementation details, not behavior
 - Flaky test indicators
  - Sleep/delay in tests
  - Time-dependent assertions
 **Questions to answer:**
 - Do these tests actually verify behavior?
 - Would a bug in the implementation cause a test failure?
 - Are edge cases and error paths tested?
 """
 # ============================================================================
 # PRESETS - Configurable leg selection
 # ============================================================================
 [presets]
 [presets.gate]
 description = "Light review for automatic flow - fast, focused on blockers"
 legs = ["wiring", "security", "smells", "test-quality"]
 [presets.full]
 description = "Comprehensive review - all legs, for major features"
 legs = ["correctness", "performance", "security", "elegance", "resilience", "style", "smells", "wiring", "commit-discipline", "test-quality"]
 [presets.security-focused]
 description = "Security-heavy review for sensitive changes"
 legs = ["security", "resilience", "correctness", "wiring"]
 [presets.refactor]
 description = "Review focused on code quality during refactoring"
 legs = ["elegance", "smells", "style", "commit-discipline"]
 # Synthesis step - combines all leg outputs
 [synthesis]
 title = "Review Synthesis"
@@ -310,10 +441,13 @@ A synthesized review at: {{.output.directory}}/{{.output.synthesis}}
 2. **Critical Issues** - P0 items from all legs, deduplicated
 3. **Major Issues** - P1 items, grouped by theme
 4. **Minor Issues** - P2 items, briefly listed
-5. **Positive Observations** - What's done well
+5. **Wiring Gaps** - Dependencies added but not used (from wiring leg)
-6. **Recommendations** - Actionable next steps
+6. **Commit Quality** - Notes on commit discipline
 7. **Test Quality** - Assessment of test meaningfulness
 8. **Positive Observations** - What's done well
 9. **Recommendations** - Actionable next steps
 Deduplicate issues found by multiple legs (note which legs found them).
 Prioritize by impact and effort. Be actionable.
 """
-depends_on = ["correctness", "performance", "security", "elegance", "resilience", "style", "smells"]
+depends_on = ["correctness", "performance", "security", "elegance", "resilience", "style", "smells", "wiring", "commit-discipline", "test-quality"]
@@ -15,6 +15,8 @@ Each leg examines the code from a different perspective. Findings are
 collected and synthesized into a prioritized, actionable review.
 ## Legs (parallel execution)
 ### Analysis Legs (read and analyze code)
 - **correctness**: Logic errors, bugs, edge cases
 - **performance**: Bottlenecks, efficiency issues
 - **security**: Vulnerabilities, OWASP concerns
@@ -23,6 +25,16 @@ collected and synthesized into a prioritized, actionable review.
 - **style**: Convention compliance, consistency
 - **smells**: Anti-patterns, technical debt
 ### Verification Legs (check implementation quality)
 - **wiring**: Installed-but-not-wired gaps (deps added but not used)
 - **commit-discipline**: Commit quality and atomicity
 - **test-quality**: Test meaningfulness, not just coverage
 ## Presets
 - **gate**: Light review for automatic flow (wiring, security, smells, test-quality)
 - **full**: Comprehensive review (all 10 legs)
 - **custom**: Select specific legs via --legs flag
 ## Execution Model
 1. Each leg spawns as a separate polecat
 2. Polecats work in parallel
@@ -293,6 +305,125 @@ Review the code for code smells and anti-patterns.
 - Is technical debt being added or paid down?
 """
 # ============================================================================
 # VERIFICATION LEGS - Check implementation quality (not just code analysis)
 # ============================================================================
 [[legs]]
 id = "wiring"
 title = "Wiring Review"
 focus = "Installed-but-not-wired gaps"
 description = """
 Detect dependencies, configs, or libraries that were added but not actually used.
 This catches subtle bugs where the implementer THINKS they integrated something,
 but the old implementation is still being used.
 **Look for:**
 - New dependency in manifest but never imported
  - Go: module in go.mod but no import
  - Rust: crate in Cargo.toml but no `use`
  - Node: package in package.json but no import/require
 - SDK added but old implementation remains
  - Added Sentry but still using console.error for errors
  - Added Zod but still using manual typeof validation
 - Config/env var defined but never loaded
  - New .env var that isn't accessed in code
 **Questions to answer:**
 - Is every new dependency actually used?
 - Are there old patterns that should have been replaced?
 - Is there dead config that suggests incomplete migration?
 """
 [[legs]]
 id = "commit-discipline"
 title = "Commit Discipline Review"
 focus = "Commit quality and atomicity"
 description = """
 Review commit history for good practices.
 Good commits make the codebase easier to understand, bisect, and revert.
 **Look for:**
 - Giant "WIP" or "fix" commits
  - Multiple unrelated changes in one commit
  - Commits that touch 20+ files across different features
 - Poor commit messages
  - "stuff", "update", "asdf", "fix"
  - No context about WHY the change was made
 - Unatomic commits
  - Feature + refactor + bugfix in same commit
  - Should be separable logical units
 - Missing type prefixes (if project uses conventional commits)
  - feat:, fix:, refactor:, test:, docs:, chore:
 **Questions to answer:**
 - Could this history be bisected effectively?
 - Would a reviewer understand the progression?
 - Are commits atomic (one logical change each)?
 """
 [[legs]]
 id = "test-quality"
 title = "Test Quality Review"
 focus = "Test meaningfulness, not just coverage"
 description = """
 Verify tests are actually testing something meaningful.
 Coverage numbers lie. A test that can't fail provides no value.
 **Look for:**
 - Weak assertions
  - Only checking != nil / !== null / is not None
  - Using .is_ok() without checking the value
  - assertTrue(true) or equivalent
 - Missing negative test cases
  - Happy path only, no error cases
  - No boundary testing
  - No invalid input testing
 - Tests that can't fail
  - Mocked so heavily the test is meaningless
  - Testing implementation details, not behavior
 - Flaky test indicators
  - Sleep/delay in tests
  - Time-dependent assertions
 **Questions to answer:**
 - Do these tests actually verify behavior?
 - Would a bug in the implementation cause a test failure?
 - Are edge cases and error paths tested?
 """
 # ============================================================================
 # PRESETS - Configurable leg selection
 # ============================================================================
 [presets]
 [presets.gate]
 description = "Light review for automatic flow - fast, focused on blockers"
 legs = ["wiring", "security", "smells", "test-quality"]
 [presets.full]
 description = "Comprehensive review - all legs, for major features"
 legs = ["correctness", "performance", "security", "elegance", "resilience", "style", "smells", "wiring", "commit-discipline", "test-quality"]
 [presets.security-focused]
 description = "Security-heavy review for sensitive changes"
 legs = ["security", "resilience", "correctness", "wiring"]
 [presets.refactor]
 description = "Review focused on code quality during refactoring"
 legs = ["elegance", "smells", "style", "commit-discipline"]
 # Synthesis step - combines all leg outputs
 [synthesis]
 title = "Review Synthesis"
@@ -310,10 +441,13 @@ A synthesized review at: {{.output.directory}}/{{.output.synthesis}}
 2. **Critical Issues** - P0 items from all legs, deduplicated
 3. **Major Issues** - P1 items, grouped by theme
 4. **Minor Issues** - P2 items, briefly listed
-5. **Positive Observations** - What's done well
+5. **Wiring Gaps** - Dependencies added but not used (from wiring leg)
-6. **Recommendations** - Actionable next steps
+6. **Commit Quality** - Notes on commit discipline
 7. **Test Quality** - Assessment of test meaningfulness
 8. **Positive Observations** - What's done well
 9. **Recommendations** - Actionable next steps
 Deduplicate issues found by multiple legs (note which legs found them).
 Prioritize by impact and effort. Be actionable.
 """
-depends_on = ["correctness", "performance", "security", "elegance", "resilience", "style", "smells"]
+depends_on = ["correctness", "performance", "security", "elegance", "resilience", "style", "smells", "wiring", "commit-discipline", "test-quality"]