# AgentEval CLI Reference
The AgentEval CLI provides command-line tools for running evaluations and managing configurations in CI/CD pipelines.

## Installation

```bash
# Install as a global .NET tool
dotnet tool install -g AgentEval.Cli

# Or install locally in your project (requires a tool manifest,
# created with `dotnet new tool-manifest`)
dotnet tool install AgentEval.Cli
```
## Commands

### eval
Run evaluations against an AI agent.

```bash
agenteval eval [options]
```
Options:

| Option | Alias | Description | Default |
|---|---|---|---|
| `--config <path>` | `-c` | Path to evaluation configuration file (YAML or JSON) | - |
| `--dataset <path>` | `-d` | Path to dataset file (JSON, JSONL, CSV, YAML) | - |
| `--output <path>` | `-o` | Output file path for results | stdout |
| `--format <format>` | `-f` | Output format (json, junit, markdown, trx) | json |
| `--baseline <path>` | `-b` | Baseline file for regression comparison | - |
| `--fail-on-regression` | | Exit with code 1 if regressions detected | false |
| `--pass-threshold <n>` | | Minimum score to pass (0-100) | 70 |
Examples:

```bash
# Run evaluation with a JSON dataset, emitting JUnit XML
agenteval eval --dataset testcases.json --format junit --output results.xml

# Run with a config file and a YAML dataset
agenteval eval --config agent-config.json --dataset cases.yaml --format markdown

# Set a custom pass threshold
agenteval eval --dataset data.jsonl --pass-threshold 80

# Compare against a baseline and fail on regressions
agenteval eval --dataset tests.json --baseline baseline.json --fail-on-regression
```
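
Because `--fail-on-regression` signals regressions through the exit code, any automation can gate on the result. Here is a minimal C# sketch, assuming only that the tool is installed globally and resolvable on PATH, that shells out to the CLI and propagates a non-zero exit code:

```csharp
using System;
using System.Diagnostics;

// Run the CLI with regression gating. Assumes `agenteval` is installed
// as a global tool and available on PATH.
var psi = new ProcessStartInfo
{
    FileName = "agenteval",
    Arguments = "eval --dataset tests.json --baseline baseline.json --fail-on-regression",
    RedirectStandardOutput = true,
};

using var process = Process.Start(psi)!;
Console.WriteLine(await process.StandardOutput.ReadToEndAsync());
await process.WaitForExitAsync();

// Per the options table, exit code 1 means a regression was detected.
if (process.ExitCode != 0)
{
    Console.Error.WriteLine("Regression detected; failing this step.");
    Environment.Exit(process.ExitCode);
}
```

The same exit-code contract is what lets the CI pipelines later in this page fail the build automatically.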
### init
Create a starter evaluation configuration file.

```bash
agenteval init [options]
```
Options:

| Option | Alias | Description | Default |
|---|---|---|---|
| `--output <path>` | `-o` | Output path for configuration file | agenteval.json |
| `--format <format>` | `-f` | Configuration format (json, yaml) | json |
Examples:

```bash
# Create JSON configuration
agenteval init

# Create YAML configuration
agenteval init --format yaml --output agenteval.yaml
```
### list
List available metrics, assertions, and formats.

```bash
agenteval list <subcommand>
```
Subcommands:

| Subcommand | Description |
|---|---|
| metrics | List all available evaluation metrics |
| assertions | List all available assertion types |
| formats | List available output formats |
Examples:

```bash
# List available metrics
agenteval list metrics

# List assertion types
agenteval list assertions

# List output formats
agenteval list formats
```
## Dataset Formats
The CLI supports multiple dataset formats for loading test cases.

### JSON

```json
[
  {
    "name": "Test Case 1",
    "input": "What is the weather?",
    "expectedOutput": "The weather is sunny",
    "context": ["Weather data: sunny, 72°F"]
  }
]
```
### JSONL (JSON Lines)

```jsonl
{"name": "Test 1", "input": "Hello", "expectedOutput": "Hi there!"}
{"name": "Test 2", "input": "Goodbye", "expectedOutput": "See you later!"}
```
### CSV

```csv
name,input,expectedOutput,context
Test 1,What is 2+2?,4,
Test 2,Capital of France?,Paris,Geography data
```
### YAML

```yaml
- name: Test Case 1
  input: What is the weather?
  expectedOutput: The weather is sunny
  context:
    - "Weather data: sunny, 72°F"
- name: Test Case 2
  input: Book a flight
  expectedOutput: Flight booked successfully
  expectedTools:
    - FlightSearch
    - BookFlight
```
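
All four formats map onto the same logical test-case fields. As a rough orientation only, the record below sketches how those fields line up; the type name, shapes, and optionality are assumptions here, not AgentEval's actual API:

```csharp
using System.Collections.Generic;

// Hypothetical shape of a loaded test case. Field names mirror the
// dataset examples above; AgentEval's real type may differ.
public sealed record TestCase(
    string Name,                                   // display name, e.g. "Test Case 1"
    string Input,                                  // prompt sent to the agent
    string? ExpectedOutput = null,                 // expected agent response
    IReadOnlyList<string>? Context = null,         // grounding context passages
    IReadOnlyList<string>? ExpectedTools = null);  // tools the agent should invoke
```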
## Output Formats

### Console (default)
Human-readable output with colors and formatting.
### JSON

```json
{
  "summary": {
    "total": 10,
    "passed": 8,
    "failed": 2,
    "duration": "00:00:15.234"
  },
  "results": [...]
}
```
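
If a later pipeline step consumes the JSON report, the summary block can be deserialized with System.Text.Json. A minimal sketch, assuming the field names shown in the sample above:

```csharp
using System;
using System.IO;
using System.Text.Json;

// Read a report produced by: agenteval eval --format json --output results.json
var json = await File.ReadAllTextAsync("results.json");
var report = JsonSerializer.Deserialize<Report>(
    json, new JsonSerializerOptions { PropertyNameCaseInsensitive = true })!;

Console.WriteLine(
    $"{report.Summary.Passed}/{report.Summary.Total} passed in {report.Summary.Duration}");

// Minimal projection of the sample output above; extra fields in the
// real schema are simply ignored during deserialization.
public sealed record Summary(int Total, int Passed, int Failed, string Duration);
public sealed record Report(Summary Summary);
```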
### JUnit XML

Compatible with CI systems such as GitHub Actions, Azure DevOps, and Jenkins.

```xml
<?xml version="1.0" encoding="utf-8"?>
<testsuites>
  <testsuite name="AgentEval" tests="10" failures="2" time="15.234">
    <testcase name="Test Case 1" time="1.234" />
    <testcase name="Test Case 2" time="2.345">
      <failure message="Expected output mismatch">...</failure>
    </testcase>
  </testsuite>
</testsuites>
```
### Markdown

```markdown
# Evaluation Results

| Test Case | Status | Duration | Score |
|-----------|--------|----------|-------|
| Test 1 | ✅ Pass | 1.23s | 95% |
| Test 2 | ❌ Fail | 2.34s | 45% |

## Summary

- **Total:** 10
- **Passed:** 8
- **Failed:** 2
```

### TRX

Visual Studio TRX test results (`--format trx`), for tooling that consumes .trx files; run `agenteval list formats` for the full list of supported formats.
## CI/CD Integration

### GitHub Actions

```yaml
name: Agent Evaluation

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup .NET
        uses: actions/setup-dotnet@v4
        with:
          dotnet-version: '8.0.x'

      - name: Install AgentEval CLI
        run: dotnet tool install -g AgentEval.Cli

      - name: Run Evaluation
        run: agenteval eval --dataset tests/cases.jsonl --format junit --output results.xml

      - name: Publish Results
        uses: dorny/test-reporter@v1
        if: always()
        with:
          name: Agent Tests
          path: results.xml
          reporter: java-junit
```
### Azure DevOps

```yaml
trigger:
  - main

pool:
  vmImage: 'ubuntu-latest'

steps:
  - task: UseDotNet@2
    inputs:
      version: '8.0.x'

  - script: dotnet tool install -g AgentEval.Cli
    displayName: 'Install AgentEval CLI'

  - script: agenteval eval --dataset tests/cases.jsonl --format junit --output $(Build.ArtifactStagingDirectory)/results.xml
    displayName: 'Run Evaluation'

  - task: PublishTestResults@2
    inputs:
      testResultsFormat: 'JUnit'
      testResultsFiles: '$(Build.ArtifactStagingDirectory)/results.xml'
```
## Programmatic Usage
You can also use the exporters and loaders programmatically from the main AgentEval library:

```csharp
using AgentEval.DataLoaders;
using AgentEval.Exporters;

// Load test cases from various formats
var jsonlLoader = DatasetLoaderFactory.CreateFromExtension(".jsonl");
var testCases = await jsonlLoader.LoadAsync("testcases.jsonl");

// Or create a loader by format name
var yamlLoader = DatasetLoaderFactory.Create("yaml");
var yamlCases = await yamlLoader.LoadAsync("testcases.yaml");

// Export results to various formats
var report = new EvaluationReport { /* ... */ };
var exporter = ResultExporterFactory.Create(ExportFormat.JUnit);
await exporter.ExportAsync(report, "results.xml");

// Register custom loaders
DatasetLoaderFactory.Register(".custom", () => new JsonlDatasetLoader());
```
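
The `Register` call above maps a new file extension onto an existing loader. To support a genuinely custom format, implement the loader contract from `AgentEval.DataLoaders` yourself. The sketch below is an assumption-laden illustration: `IDatasetLoader`, the `LoadAsync` signature, and the `TestCase` constructor are stand-in names inferred from the factory usage above; see the Extensibility doc for the real contract:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using AgentEval.DataLoaders;

// Hypothetical loader for a pipe-delimited format: name|input|expectedOutput
// per line. IDatasetLoader and TestCase are assumed names, not confirmed API.
public sealed class PipeDelimitedLoader : IDatasetLoader
{
    public async Task<IReadOnlyList<TestCase>> LoadAsync(string path)
    {
        var cases = new List<TestCase>();
        foreach (var line in await File.ReadAllLinesAsync(path))
        {
            var parts = line.Split('|');
            cases.Add(new TestCase(parts[0], parts[1], parts[2]));
        }
        return cases;
    }
}
```

Once registered, for example with `DatasetLoaderFactory.Register(".psv", () => new PipeDelimitedLoader());`, the factory resolves that extension like any built-in format.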
## See Also
- Benchmarks - Running performance benchmarks
- Conversations - Multi-turn testing
- Extensibility - Custom exporters and loaders