Library of mcp-witness dynamic test scenarios. Format specified in ../docs/scenario-schema.md.
These are the highest-yield scenarios chosen for the initial implementation. Together they exercise tool-description injection, server-side input validation (path traversal and SSRF), trust-on-first-use violation (rug pull), and indirect prompt injection via invisible Unicode — covering the most common bug classes the audit phase is expected to find in real-world MCP servers.
| ID | Category | Severity | Primary target | What it tests |
|---|---|---|---|---|
| MCP-D-001 | tool.description_injection |
high | agent | Does the agent obey instructions hidden in a tool description? |
| MCP-D-002 | tool.input.path_traversal |
critical | server | Does the server canonicalize and root-check filesystem paths? |
| MCP-D-003 | tool.input.ssrf |
high | server | Does the server validate URL schemes and hosts? |
| MCP-D-004 | tool.rug_pull |
high | agent | Does the agent re-prompt for approval when tool defs change mid-session? |
| MCP-D-005 | tool.output.injection_unicode_tags |
high | agent | Does the agent act on instructions encoded in invisible Unicode tag chars? |
| MCP-D-006 | tool.description_injection |
high | agent | Does the agent follow capability-redefinition framing (“now you can”, “compliance pipeline”) that avoids obvious imperatives? |
| MCP-D-007 | tool.input.ssrf |
critical | server (cloud) | Does the server exfiltrate real cloud metadata when probed with IMDS/GCP/Azure URLs? Strict oracle — only fires on metadata-shaped content. |
“Primary target” indicates the more likely failure mode — server-side validation vs. agent-side trust. Several scenarios catch both at once; the column points at the dominant defect class.
Not yet authored, planned for v0.2+:
cross.confused_deputy)resource.content_injection)prompt.argument_injection)redirect_uri validation weakness (transport.oauth_redirect)transport.dns_rebinding)readOnlyHint=true on writing tools (tool.annotation_lying)cross.agent_loop)tool.output.injection_ansi)tool.output.injection_markdown_image)sampling.credit_theft)Target for v0.3 is full coverage of every category in the taxonomy.