beads/tests/integration/AGENT_MAIL_TEST_COVERAGE.md

# Agent Mail Integration Test Coverage

## Test Suite Summary

**Total test time**: ~55 seconds (all suites)
**Total tests**: 66 tests across 5 files

## Coverage by Category

### 1. HTTP Adapter Unit Tests (`lib/test_beads_mail_adapter.py`)
**51 tests in 0.019s**

✅ **Enabled/Disabled Mode**
- Server available vs unavailable
- Graceful degradation when server dies mid-operation
- Operations no-op when disabled

✅ **Reservation Operations**
- Successful reservation (201)
- Conflict handling (409)
- Custom TTL support
- Multiple reservations by same agent
- Release operations (204)
- Double release idempotency

✅ **HTTP Error Handling**
- 500 Internal Server Error
- 404 Not Found
- 409 Conflict with malformed body
- Network timeouts
- Malformed JSON responses
- Empty response bodies (204 No Content)

✅ **Configuration**
- Environment variable configuration
- Constructor parameter overrides
- URL normalization (trailing slash removal)
- Default agent name from hostname
- Timeout configuration

✅ **Authorization**
- Bearer token headers
- Missing token behavior
- Content-Type headers

✅ **Request Validation**
- Body structure for reservations
- Body structure for notifications
- URL structure for releases
- URL structure for inbox checks

✅ **Inbox & Notifications**
- Send notifications
- Check inbox with messages
- Empty inbox handling
- Dict wrapper responses
- Large message lists (100 messages)
- Nested payload data
- Empty and large payloads
- Unicode handling

### 2. Multi-Agent Race Conditions (`tests/integration/test_agent_race.py`)
**3 tests in ~15s**

✅ **Collision Prevention**
- 3 agents competing for 1 issue (WITH Agent Mail)
- Only one winner with reservations
- Multiple agents without Agent Mail (collision demo)

✅ **Stress Testing**
- 10 agents competing for 1 issue
- Exactly one winner guaranteed
- JSONL consistency verification

### 3. Server Failure Scenarios (`tests/integration/test_mail_failures.py`)
**7 tests in ~20s**

✅ **Failure Modes**
- Server never started (connection refused)
- Server crash during operation
- Network partition (timeout)
- Server 500 errors
- Invalid bearer token (401)
- Malformed JSON responses

✅ **Graceful Degradation**
- Agents continue working in Beads-only mode
- JSONL remains consistent across failures
- No crashes or data loss

### 4. Reservation TTL & Expiration (`tests/integration/test_reservation_ttl.py`)
**4 tests in ~60s** (includes 30s waits for expiration)

✅ **Time-Based Behavior**
- Short TTL reservations (30s)
- Reservation blocking verification
- Auto-release after expiration
- Renewal/heartbeat mechanisms

### 5. Multi-Agent Coordination (`tests/integration/test_multi_agent_coordination.py`)
**4 tests in ~11s** ⭐ NEW

✅ **Fairness**
- 10 agents competing for 5 issues
- Each issue claimed exactly once
- No duplicate claims in JSONL

✅ **Notifications**
- End-to-end message delivery
- Inbox consumption (messages cleared after read)
- Message structure validation

✅ **Handoff Scenarios**
- Agent releases, another immediately claims
- Clean reservation ownership transfer

✅ **Idempotency**
- Double reserve by same agent (safe)
- Double release by same agent (safe)
- Reservation count verification

## Coverage Gaps (Intentionally Not Tested)

### Low-Priority Edge Cases
- **Path traversal in issue IDs**: Issue IDs are validated elsewhere in bd
- **429 Retry-After logic**: Nice-to-have, not critical for v1
- **HTTPS/TLS verification**: Out of scope for integration layer
- **Re-enable after recovery**: Complex, requires persistent health checking
- **Token rotation mid-run**: Rare scenario, not worth complexity
- **Slow tests**: 50+ agent stress tests, soak tests, inbox flood (>10k messages)

### Why Skipped
These scenarios are either:
1. **Validated elsewhere** (e.g., issue ID validation in bd core)
2. **Low probability** (e.g., token rotation during agent run)
3. **Nice-to-have features** (e.g., automatic re-enable, retry policies)
4. **Too slow for CI** (e.g., multi-hour soak tests, 50-agent races)

## Test Execution

### Run All Tests
```bash
# Unit tests (fast, 0.02s)
python3 lib/test_beads_mail_adapter.py

# Multi-agent coordination (11s)
python3 tests/integration/test_multi_agent_coordination.py

# Race conditions (15s, requires Agent Mail server or falls back)
python3 tests/integration/test_agent_race.py

# Failure scenarios (20s)
python3 tests/integration/test_mail_failures.py

# TTL/expiration (60s - includes deliberate waits)
python3 tests/integration/test_reservation_ttl.py
```

### Quick Validation (No Slow Tests)
```bash
python3 lib/test_beads_mail_adapter.py
python3 tests/integration/test_multi_agent_coordination.py
python3 tests/integration/test_mail_failures.py
# Total: ~31s
```

## Assertions Verified

✅ **Correctness**
- Only one agent claims each issue (collision prevention)
- Notifications deliver correctly
- Reservations block other agents
- JSONL remains consistent across all failure modes

✅ **Reliability**
- Graceful degradation when server unavailable
- Idempotent operations don't corrupt state
- Expired reservations auto-release
- Handoffs work cleanly

✅ **Performance**
- Fast timeout detection (1-2s)
- No blocking on server failures
- Tests complete in reasonable time (<2min total)

## Future Enhancements (Optional)

If real-world usage reveals issues:

1. **Retry policies** with exponential backoff for 429/5xx
2. **Pagination** for inbox/reservations (if >1k messages)
3. **Automatic re-enable** with periodic health checks
4. **Agent instance IDs** to prevent same-name collisions
5. **Soak/stress testing** for production validation

Current test suite provides **strong confidence** for multi-agent workflows without overengineering.