Files

Steve Yegge f1e5a6206f feat(mcp): Add compaction config and extended context engineering docs

- Extended CONTEXT_ENGINEERING.md with additional optimization strategies
- Added compaction configuration support to MCP server
- Added tests for compaction config and MCP compaction

Amp-Thread-ID: https://ampcode.com/threads/T-019b1f07-daa0-750c-878f-20bcc2d24f50
Co-authored-by: Amp <amp@ampcode.com>

2025-12-14 15:12:43 -08:00

11 KiB

Raw Blame History

Context Engineering for beads-mcp

Overview

This document describes the context engineering optimizations added to beads-mcp to reduce context window usage by ~80-90% while maintaining full functionality.

The Problem

MCP servers load all tool schemas at startup, consuming significant context:

Before: ~10-50k tokens for full beads tool schemas
After: ~2-5k tokens with lazy loading and compaction

For coding agents operating in limited context windows (100k-200k tokens), this overhead leaves less room for:

Code files and diffs
Conversation history
Task planning and reasoning

Solutions Implemented

1. Lazy Tool Schema Loading

Instead of loading all tool schemas upfront, agents can discover tools on-demand:

# Step 1: Discover available tools (lightweight - ~500 bytes)
discover_tools()
# Returns: { "tools": { "ready": "Find ready tasks", ... }, "count": 15 }

# Step 2: Get details for specific tool (~300 bytes each)
get_tool_info("ready")
# Returns: { "name": "ready", "parameters": {...}, "example": "..." }

Savings: ~95% reduction in initial schema overhead

2. Minimal Issue Models

List operations now return IssueMinimal instead of full Issue:

# IssueMinimal (~80 bytes per issue)
{
  "id": "bd-a1b2",
  "title": "Fix auth bug",
  "status": "open",
  "priority": 1,
  "issue_type": "bug",
  "assignee": "alice",
  "labels": ["backend"],
  "dependency_count": 2,
  "dependent_count": 0
}

# vs Full Issue (~400 bytes per issue)
{
  "id": "bd-a1b2",
  "title": "Fix auth bug",
  "description": "Long description...",
  "design": "Design notes...",
  "acceptance_criteria": "...",
  "notes": "...",
  "status": "open",
  "priority": 1,
  "issue_type": "bug",
  "created_at": "2024-01-01T...",
  "updated_at": "2024-01-02T...",
  "closed_at": null,
  "assignee": "alice",
  "labels": ["backend"],
  "dependencies": [...],
  "dependents": [...],
  ...
}

Savings: ~80% reduction per issue in list views

3. Result Compaction

When results exceed threshold (20 issues), returns preview + metadata:

# Request: list(status="open")
# Response when >20 results:
{
  "compacted": true,
  "total_count": 47,
  "preview": [/* first 5 issues */],
  "preview_count": 5,
  "hint": "Use show(issue_id) for full details or add filters"
}

Savings: Prevents unbounded context growth from large queries

Usage Patterns

Efficient Workflow (Recommended)

# 1. Set context once
set_context(workspace_root="/path/to/project")

# 2. Get ready work (minimal format)
issues = ready(limit=10, priority=1)

# 3. Pick an issue and get full details only when needed
full_issue = show(issue_id="bd-a1b2")

# 4. Do work...

# 5. Close when done
close(issue_id="bd-a1b2", reason="Fixed in PR #123")

Tool Discovery Workflow

# First time using beads? Discover tools efficiently:
tools = discover_tools()
# → {"tools": {"ready": "...", "list": "...", ...}, "count": 15}

# Need to know how to use a specific tool?
info = get_tool_info("create")
# → {"parameters": {...}, "example": "create(title='...', ...)"}

Handling Large Result Sets

When a query returns more than 20 results, the response switches to CompactedResult format. This section explains how to detect and handle compacted responses.

CompactedResult Schema

# Response when >20 results
{
    "compacted": True,
    "total_count": 47,           # Total matching issues (not shown)
    "preview": [
        {
            "id": "bd-a1b2",
            "title": "Fix auth bug",
            "status": "open",
            "priority": 1,
            "issue_type": "bug",
            "assignee": "alice",
            "labels": ["backend"],
            "dependency_count": 2,
            "dependent_count": 0
        },
        # ... 4 more issues (PREVIEW_COUNT=5)
    ],
    "preview_count": 5,
    "hint": "Use show(issue_id) for full details or add filters"
}

Detecting Compacted Results

Check the compacted field in the response:

import json

def handle_issue_list(response):
    """Handle both list and compacted responses."""
    
    if isinstance(response, dict) and response.get("compacted"):
        # Compacted response
        total = response["total_count"]
        shown = response["preview_count"]
        issues = response["preview"]
        print(f"Showing {shown} of {total} issues")
        return issues
    else:
        # Regular list response (all results included)
        return response

Getting Full Results When Compacted

When you get a compacted response, you have several options:

Option 1: Use `show()` for Specific Issues

# Get full details for a specific issue
full_issue = show(issue_id="bd-a1b2")
# Returns complete Issue model with dependencies, description, etc.

Option 2: Narrow Your Query

Add filters to reduce the result set:

# Instead of list(status="open")  # Returns 47+ results
# Try:
issues = list(status="open", priority=0)  # Returns 8 results

Option 3: Check Type Hints

The response type tells you what to expect:

from typing import Union

def handle_response(response: Union[list, dict]):
    """Properly typed response handling."""
    
    if isinstance(response, dict) and response.get("compacted"):
        # Handle CompactedResult
        for issue in response["preview"]:
            process_minimal_issue(issue)
        print(f"Note: {response['total_count']} total issues exist")
    else:
        # Handle list[IssueMinimal]
        for issue in response:
            process_minimal_issue(issue)

Python Client Example

Here's a complete example handling both response types:

class BeadsClient:
    """Example client with proper compaction handling."""
    
    def get_all_ready_work(self):
        """Safely get ready work, handling compaction."""
        response = self.ready(limit=10, priority=1)
        
        # Check if compacted
        if isinstance(response, dict) and response.get("compacted"):
            print(f"Warning: Showing {response['preview_count']} "
                  f"of {response['total_count']} ready items")
            print(f"Hint: {response['hint']}")
            return response["preview"]
        
        # Full list returned
        return response
    
    def list_with_fallback(self, **filters):
        """List issues, with automatic filter refinement on compaction."""
        response = self.list(**filters)
        
        if isinstance(response, dict) and response.get("compacted"):
            # Too many results - add priority filter to narrow down
            if "priority" not in filters:
                print(f"Too many results ({response['total_count']}). "
                      "Filtering by priority=1...")
                filters["priority"] = 1
                return self.list_with_fallback(**filters)
            else:
                # Can't narrow further, return preview
                return response["preview"]
        
        return response
    
    def show_full_issue(self, issue_id: str):
        """Always get full issue details (never compacted)."""
        return self.show(issue_id=issue_id)

Migration Guide for Clients

If your client was written expecting list() and ready() to always return list[Issue], follow these steps:

Step 1: Update Type Hints

# OLD (incorrect with new server)
def process_issues(issues: list[Issue]) -> None:
    for issue in issues:
        print(issue.description)

# NEW (handles both cases)
from typing import Union

IssueListOrCompacted = Union[list[IssueMinimal], CompactedResult]

def process_issues(response: IssueListOrCompacted) -> None:
    if isinstance(response, dict) and response.get("compacted"):
        issues = response["preview"]
        print(f"Note: Only showing preview of {response['total_count']} total")
    else:
        issues = response
    
    for issue in issues:
        print(f"{issue.id}: {issue.title}")  # Works with IssueMinimal

Step 2: Handle Missing Fields

IssueMinimal doesn't include description, design, or dependencies. Adjust your code:

# OLD (would fail if using IssueMinimal)
for issue in issues:
    print(f"{issue.title}\n{issue.description}")

# NEW (only use available fields)
for issue in issues:
    print(f"{issue.title}")
    if hasattr(issue, 'description'):
        print(issue.description)
    elif need_description:
        full = show(issue.id)
        print(full.description)

Step 3: Use `show()` for Full Details

When you need dependencies or detailed information:

# Get minimal info for listing
ready_issues = ready(limit=20)

# For detailed work, fetch full issue
for minimal_issue in ready_issues if not isinstance(ready_issues, dict) else ready_issues.get("preview", []):
    full_issue = show(issue_id=minimal_issue.id)
    print(f"Dependencies: {full_issue.dependencies}")

Configuration

Environment Variables (v0.29.0+)

Compaction behavior can be tuned via environment variables:

# Set custom compaction threshold
export BEADS_MCP_COMPACTION_THRESHOLD=50

# Set custom preview size
export BEADS_MCP_PREVIEW_COUNT=10

Environment Variables:

Variable	Default	Purpose	Constraints
`BEADS_MCP_COMPACTION_THRESHOLD`	20	Compact results with more than N issues	Must be ≥ 1
`BEADS_MCP_PREVIEW_COUNT`	5	Show first N issues in preview	Must be ≥ 1 and ≤ threshold

Examples:

# Disable compaction by setting high threshold
BEADS_MCP_COMPACTION_THRESHOLD=10000 beads-mcp

# Show more preview items in compacted results
BEADS_MCP_PREVIEW_COUNT=10 beads-mcp

# Show fewer items for limited context windows
BEADS_MCP_COMPACTION_THRESHOLD=10 BEADS_MCP_PREVIEW_COUNT=3 beads-mcp

Default Values:

If not set, the server uses:

COMPACTION_THRESHOLD = 20  # Compact results with more than 20 issues
PREVIEW_COUNT = 5          # Show first 5 issues in preview

Use Cases:

Tight context windows (100k tokens): Reduce threshold and preview count
```
BEADS_MCP_COMPACTION_THRESHOLD=10 BEADS_MCP_PREVIEW_COUNT=3
```
Plenty of context (200k+ tokens): Increase both settings or disable compaction
```
BEADS_MCP_COMPACTION_THRESHOLD=1000 BEADS_MCP_PREVIEW_COUNT=20
```

Debugging/Testing: Disable compaction entirely

BEADS_MCP_COMPACTION_THRESHOLD=999999 beads-mcp

Comparison

Scenario	Before	After	Savings
Tool schemas (all)	~15,000 bytes	~500 bytes	97%
List 50 issues	~20,000 bytes	~4,000 bytes	80%
Ready work (10)	~4,000 bytes	~800 bytes	80%
Single show()	~400 bytes	~400 bytes	0% (full details)

Design Principles

Lazy Loading: Only fetch what you need, when you need it
Minimal by Default: List views use lightweight models
Full Details On-Demand: Use show() for complete information
Graceful Degradation: Large results auto-compact with hints
Backward Compatible: Existing workflows continue to work

Credits

Inspired by:

MCP Bridge - Context engineering for MCP servers
Manus Context Engineering - Compaction and offloading patterns
Anthropic's Context Engineering Guide

11 KiB Raw Blame History