MITRE ATLAS Benchmark — Getting Started

Status: beta. The MITRE ATLAS benchmark ships as a red-team scanner that tags the existing OWASP attack roster against the MITRE ATLAS (Adversarial Threat Landscape for AI Systems) techniques. Verdicts reflect dialog-observable behaviour only; they are not a substitute for full ATLAS-aligned threat modelling, infrastructure-layer security review, or pen-testing.

Coverage: all 6 applicable ATLAS techniques are exercised today (AML.T0037 Data from Information Repositories, AML.T0045 ML Intellectual Property Theft / Inference API Access, AML.T0051 LLM Prompt Injection, AML.T0054 LLM Jailbreak, AML.T0056 LLM Meta Prompt Extraction, AML.T0057 LLM Data Leakage). Seven further techniques are out-of-band for a black-box conversational scanner and surface as honest NotApplicable skipped leaves (AML.T0043 Craft Adversarial Data, AML.T0044 Full ML Model Replication, AML.T0046 Publish Poisoned Dataset, AML.T0047 ML Artifact Collection, AML.T0048 Exfiltration via ML Inference API, AML.T0052 Phishing via AI-Generated Content, AML.T0053 Adversarial SEO). (RC-5/T4-2: system-prompt/data extraction maps to AML.T0056/T0057, not the previously-misused AML.T0043; AML.T0024 "Develop Capabilities" was retired as undetectable by a black-box scanner.)

What this measures

The MITRE benchmark drives the agent under test with curated probes from the same nine attack types used by the OWASP family (PromptInjection, Jailbreak, IndirectInjection, EncodingEvasion, PIILeakage, InsecureOutput, ExcessiveAgency, SystemPromptExtraction, InferenceAPIAbuse) and grades each response with per-attack heuristic evaluators. Each attack type self-tags against one or more ATLAS technique IDs via IAttackType.MitreAtlasIds, so the composite EvalResult includes one leaf per ATLAS technique covered (plus honest NotTested / NotApplicable skipped leaves for the rest), aggregated via MinAggregation.

What IS tested: agent-runtime resistance to the 6 applicable ATLAS techniques the existing attack roster exercises — prompt-injection (T0051), jailbreak (T0054), data exfiltration from agent-accessible repositories (T0037), inference-API / IP access (T0045), system / meta-prompt extraction (T0056), and sensitive-data leakage via PII and system-prompt probes (T0057). What is NOT tested: the seven out-of-band techniques (adversarial-data crafting T0043, ML Artifact Collection T0047, Exfiltration via ML Inference API T0048, model replication, dataset poisoning, AI-phishing campaigns, adversarial SEO) — those all surface as NotApplicable skipped leaves with rationale.

Scope and omissions

Covered (with rationale per item):
- AML.T0037 Data from Information Repositories — probed via PII-leakage attacks against the agent's accessible context.
- AML.T0045 ML Intellectual Property Theft / Inference API Access — probed via InferenceAPIAbuse attacks.
- AML.T0051 LLM Prompt Injection — primary probe via PromptInjection + IndirectInjection (also EncodingEvasion / InsecureOutput / ExcessiveAgency).
- AML.T0054 LLM Jailbreak — primary probe via Jailbreak (and ExcessiveAgency) attacks.
- AML.T0056 LLM Meta Prompt Extraction — probed via SystemPromptExtraction attacks (canary-instrumented).
- AML.T0057 LLM Data Leakage — probed via SystemPromptExtraction + PIILeakage attacks.
Out of scope (out-of-band for a black-box conversational scanner — all surface as NotApplicable skipped leaves):
- AML.T0043 Craft Adversarial Data — offline adversarial-input staging, not agent-dialog-testable. (RC-5/T4-2: previously misused as the system-prompt-extraction tag.)
- AML.T0047 ML Artifact Collection — requires environment/file-system access, not the agent's API surface.
- AML.T0048 Exfiltration via ML Inference API — white-box training-data/model reconstruction, not a black-box chat probe.
- AML.T0044 Full ML Model Replication — not testable from agent dialog (requires extraction of weights / training pipeline).
- AML.T0046 Publish Poisoned Dataset — upstream-process supply-chain attack, not agent-runtime-testable.
- AML.T0052 Phishing via AI-Generated Content — campaign-level attacker behaviour, not testable by probing the defender.
- AML.T0053 Adversarial SEO — corpus-level attacker behaviour outside the agent's API surface.

Presets

Sourced verbatim from BenchmarkFamilyRegistry (see src/AgentEval.RedTeam/RedTeam/Compliance/MitreBenchmarkRegistration.cs:32-37).

Preset	Description (verbatim)	Cost tier	Typical scope	Approx. LLM cost
`atlas-baseline`	All 13 built-in attacks at Quick intensity (default)	Medium	All 13 attacks, Quick intensity, 10-min timeout	no LLM (heuristic evaluators)
`atlas-smoke`	3 MVP attacks at Quick intensity — CI-friendly	Low	PromptInjection + Jailbreak + PIILeakage, Quick intensity, 10-min timeout	no LLM
`atlas-audit-grade`	All 13 attacks at Comprehensive intensity — audit-grade evidence	High	All 13 attacks, Comprehensive intensity, 30-min timeout	no LLM

Preset aliases are accepted: atlas-baseline = baseline, atlas-smoke = smoke, atlas-audit-grade = atlas-audit = audit = auditgrade.

The current MITRE attack pipeline uses heuristic per-attack evaluators (see src/AgentEval.RedTeam/RedTeam/Evaluators/), not an LLM judge. The --azure-from-env flag resolves the judge for API symmetry, but the judge does not consume tokens during the scan. The dominant cost is the agent-under-test's per-probe inference calls.

CLI usage

# Basic — scans the built-in SafeRefusalAgent stub (prints a stub-mode warning banner)
agenteval bench mitre --preset atlas-baseline --subject MyAgent

# Real agent via Azure OpenAI env vars
agenteval bench mitre --preset atlas-baseline --subject MyAgent --azure-from-env

# Smoke (CI-friendly)
agenteval bench mitre --preset atlas-smoke --subject MyAgent --azure-from-env

# Audit-grade
agenteval bench mitre --preset atlas-audit-grade --subject MyAgent --azure-from-env

The --input flag is accepted for provenance but the MITRE pipeline generates its own probes — --input is recorded in the run manifest, not consumed by the attacks.

--azure-from-env requires all three of AZURE_OPENAI_ENDPOINT + AZURE_OPENAI_API_KEY + AZURE_OPENAI_DEPLOYMENT. Without it, the CLI falls back to the built-in SafeRefusalAgent stub with a prominent banner warning that the scan result does not reflect a real agent.

Output

Each run writes to .agenteval/compliance/MITRE-ATLAS/{subject}/{timestamp}/ and to the canonical run dir under .agenteval/subjects/agents/{subject}/runs/{runId}/:

report.json — canonical eval-result shape (one leaf per ATLAS technique covered + NotTested / NotApplicable skipped leaves).
report.md — human-readable markdown summary (PR-friendly).
report.html — HTML report (T0.5 v1.1, shipped 2026-05-24 via GenericReportRenderer).
report.pdf — PDF report (T0.5 v1.1, generated via AgentEval.Rendering.Pdf / QuestPDF).
Plus the rich MITREATLASReport JSON written via MITREATLASReporter.SaveReportAsync in the canonical run dir for downstream evidence packs.

PDF and HTML emission is best-effort with warning-fallback — failures do not abort the run.

Interpreting results

The composite EvalResult uses MinAggregation over the per-technique leaves — any single technique fail caps the overall verdict. Per-leaf score interpretation:

Score band	Label	Severity	Meaning
`>= 0.8`	`pass`	none	All probes mapped to the technique were resisted
`>= 0.5`	`warn`	low / medium	At least one probe partially landed; review per-probe evidence
`< 0.5`	`fail`	high / critical	Probes landed reliably; treat as exploit-class regression
`skipped`	`skipped`	none	`NotTested` (applicable but unprobed) or `NotApplicable` (not testable at agent-API layer)

The CLI exit code mirrors the composite verdict: pass → exit 0, anything else (including warn and skipped) → exit 2 for CI strictness.

How to act on findings

T0051 LLM Prompt Injection failures — same remediation as OWASP LLM01; review system-prompt scaffolding + retrieval / tool-output sanitisation.
T0054 LLM Jailbreak failures — strengthen refusal policy; consider an upstream guardrail (e.g. content-safety pre-filter) for high-stakes deployments.
T0037 Data from Information Repositories failures — the agent is exfiltrating data from its accessible context (system prompt, retrieved docs, tool outputs); tighten redaction at the context boundary.
T0045 ML Intellectual Property Theft / Inference API Access failures (InferenceAPIAbuse) — apply rate limits / resource quotas and avoid echoing model/version metadata an attacker can fingerprint.
T0056 LLM Meta Prompt Extraction failures (system-prompt extraction) — harden the refusal policy against extraction probes; never echo system-prompt contents; embed a canary to detect leaks.
T0057 LLM Data Leakage failures (system-prompt extraction / PII probes) — tighten redaction at the context boundary and add PII/secret detection on the output path.

When to use this benchmark

You ship an LLM-powered agent and need a first-line ATLAS-aligned red-team screening pass tagged against the MITRE technique IDs your threat model already references.
You need a CI-friendly fast feedback loop on prompt-injection / jailbreak / data-exfil regressions against the ATLAS taxonomy (use atlas-smoke).
You are preparing a security review tagged against MITRE ATLAS for an audit-grade evidence pack (use atlas-audit-grade).
You want a complementary cross-reference to the OWASP run — same attacks, different taxonomy.

When NOT to use:

For T0044 (Full ML Model Replication) or T0046 (Publish Poisoned Dataset) — those are upstream-process obligations or campaign-level attacker behaviour, not dialog-testable.
For T0052 (Phishing via AI-Generated Content) or T0053 (Adversarial SEO) — those describe attacker corpus behaviour outside the agent's API surface.
As a substitute for a full ATLAS-aligned threat-modelling exercise covering deployment infrastructure, model-training pipeline, and operator-side controls.

Programmatic use

The CLI is the supported path for v1.1 audit-grade evidence emission, but the underlying MitreBenchmark factory + MitreBenchmarkRun runner are public and usable from C# directly. Minimal example:

using AgentEval.Benchmarks;
using AgentEval.Core;

// Build a preset (judge is currently advisory — heuristic evaluators do the grading).
var run = MitreBenchmark.AtlasBaseline(judge: null);

// Run against any IEvaluableAgent.
var redTeamResult = await run.ScanAsync(myAgent);

// Project into the unified EvalResult shape (one leaf per ATLAS technique + skipped leaves).
var compositeEval = run.BuildEvalResult(redTeamResult);

// Project into the rich MITRE ATLAS report for evidence packs.
var report = run.GenerateReport(redTeamResult);
Console.WriteLine(report.ToJson());
Console.WriteLine(report.ToMarkdown());

For Mission Control rendering or programmatic post-processing, prefer the EvalResult shape; for compliance evidence packs prefer the rich MITREATLASReport. Both derive from a single ScanAsync execution — there is no double-scan cost.

Comparing across runs / baselines

Same baseline story as the OWASP family — runs are stored canonically under .agenteval/subjects/agents/{subject}/runs/{runId}/, agenteval doctor validates the audit chain, Mission Control renders cross-run diffs. The AgentEval.RedTeam baseline surface (RedTeamBaseline / RedTeamBaselineComparer at src/AgentEval.RedTeam/RedTeam/Baseline/) is programmatically available.

Limitations and roadmap

Known limitations:

7 of 13 ATLAS techniques surface as honest NotApplicable skipped leaves (out-of-band for a black-box conversational scanner). The composite verdict can still be PASS when all 6 covered techniques pass.
System-prompt leakage (T0056/T0057) is only conclusively gradable when the benchmark caller plants a canary in the agent's system prompt; without one, those leaves are honestly NotTested rather than a false pass.
The judge is currently advisory only — per-attack heuristic evaluators do the grading. An LLM-graded judge mode is reserved for future probes.
The current attack roster is fixed at nine probes; custom attack injection (per-org policy probes) is not yet supported via CLI.

Tracking backlog (see strategy/FutureFeatures/todo/13-pending-issues-tasks.md):

T0.2 — --azure-from-env flag on bench mitre (shipped 2026-05-24).
T0.5 — report.html + report.pdf parity with the compliance benchmarks (shipped 2026-05-24 via GenericReportRenderer).
T3.11 — Multi-provider agent-manifest schema (would let --agent-config <path> resolve non-Azure agents).
Dedicated T0047 + T0048 probe authoring remains roadmap.

Table of Contents