mcp-witness

Harness (Phase 2)

Dynamic test runner for mcp-witness. Loads scenario YAML files (see ../docs/scenario-schema.md), executes them against a target MCP server, and reports whether each scenario’s oracle condition fired.

Status

Working — runs end-to-end against any stdio MCP server:

Caveat on agent-side scenarios. D-001, D-004, and D-005 only produce meaningful results with --agent anthropic. The stub agent never falls for prompt injection by construction, so those scenarios will report “passed” against it. The stub is for verifying the proxy wires up correctly, not the agent’s susceptibility to injection.

Not yet implemented:

Tests

pip install -e ".[dev]"
pytest harness/tests/

Tests use a minimal mock MCP server (testing/mock_server.py) spawned as a subprocess via stdio. Coverage:

AnthropicAgent is not covered by automated tests because it needs an API key and consumes tokens. For v0.2, verify it manually.

Layout

File            Purpose
mcp_client.py   Thin wrapper around the official MCP Python SDK with session-level trace recording.
canaries.py     aiohttp HTTP canary server with per-token endpoints; records method, path, query, headers, body.
macros.py       Two-pass payload macro substitution ({canary:…}, {run_id}, {path:…}, {unicode_tags:…}, …).
scenario.py     Pydantic models for scenario YAML validation.
runner.py       Orchestrator: setup → attack → oracle → cleanup. Step dispatch and oracle condition evaluation.
cli.py          mcp-witness-test <scenario.yaml> --server-cmd …
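
To illustrate the two-pass macro substitution listed for macros.py, here is a minimal sketch of one plausible reading, not the module's actual code. It assumes the first pass collects canary labels so that each label gets exactly one token, and the second pass substitutes values. The names expand_macros, MACRO_RE, and canary_base, and the URL layout, are hypothetical; only the {canary:…} and {run_id} macro names come from the table above.

```python
import re
import uuid

# Matches {name} or {name:argument}. Nested braces are not handled in this sketch.
MACRO_RE = re.compile(r"\{([a-z_]+)(?::([^{}]*))?\}")


def expand_macros(payload: str, canary_base: str = "http://127.0.0.1:8080") -> str:
    """Expand {run_id} and {canary:<label>} macros in two passes (illustrative only)."""
    run_id = uuid.uuid4().hex[:8]

    # Pass 1: collect canary labels so repeated labels share a single token.
    tokens = {}
    for match in MACRO_RE.finditer(payload):
        name, arg = match.group(1), match.group(2)
        if name == "canary" and arg not in tokens:
            tokens[arg] = uuid.uuid4().hex

    # Pass 2: substitute values; macros this sketch does not know stay untouched.
    def replace(match):
        name, arg = match.group(1), match.group(2)
        if name == "run_id" and arg is None:
            return run_id
        if name == "canary":
            return f"{canary_base}/{tokens[arg]}"  # hypothetical URL layout
        return match.group(0)

    return MACRO_RE.sub(replace, payload)


print(expand_macros("fetch {canary:exfil} twice: {canary:exfil} (run {run_id})"))
```

Collecting labels before substituting means a payload that mentions the same canary twice resolves both mentions to the same endpoint, which is the property a per-token canary server would need in order to attribute a hit to one scenario step.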

Usage

# From the repo root:
python -m harness.cli scenarios/MCP-D-002-path-traversal-fs-tool.yaml \
    --server-cmd npx \
    --server-arg -y --server-arg @modelcontextprotocol/server-filesystem \
    --server-arg /tmp/some/workspace

The JSON report is written to stdout. The exit code is 0 when no oracle evidence fired (the scenario “passed”, meaning no vulnerability was detected) and non-zero otherwise.
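
The exit-code and stdout contract above lends itself to scripting, for example in CI. The following is a minimal sketch of a wrapper that runs a command, captures stdout, and parses it as JSON. The helper names run_and_parse and verdict are hypothetical, and the report's fields are not assumed here because its schema is not described in this section. A stand-in command is used so the sketch runs without the harness installed.

```python
import json
import subprocess
import sys


def run_and_parse(cmd):
    """Run cmd and return (exit_code, report), where report is stdout parsed as JSON."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    # Empty stdout yields None rather than a JSON decode error.
    report = json.loads(proc.stdout) if proc.stdout.strip() else None
    return proc.returncode, report


def verdict(exit_code):
    """Map the exit code to a label: 0 means no oracle evidence fired."""
    return "passed" if exit_code == 0 else "not passed"


# Stand-in command (an assumption for demonstration); replace it with the
# `python -m harness.cli <scenario.yaml> --server-cmd ...` invocation from Usage.
demo_cmd = [sys.executable, "-c", "import json; print(json.dumps({'demo': True}))"]
code, report = run_and_parse(demo_cmd)
print(code, verdict(code), report)
```

Passing the command as a list rather than a shell string avoids quoting problems with arguments such as the workspace path.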

Design notes