AgentEval Documentation
Welcome to the AgentEval documentation. AgentEval is the first .NET-native AI agent testing, evaluation, and benchmarking framework.
Quick Install
dotnet add package AgentEval --prerelease
NuGet: https://www.nuget.org/packages/AgentEval
Getting Started
| Guide | Description |
|---|---|
| Installation | Install AgentEval and verify setup |
| Quick Start | Run your first agent test in 5 minutes |
| Walkthrough | Step-by-step tutorial with examples |
Features
Tool Usage Assertions
Assert on tool calls, order, arguments, results, errors, and duration with a fluent API.
result.ToolUsage!
    .Should()
    .HaveCalledTool("SearchFlights")
        .BeforeTool("BookFlight")
        .WithArgument("destination", "Paris")
    .And()
    .HaveNoErrors();
Performance Metrics
Track latency, TTFT (Time To First Token), tokens, estimated cost, and per-tool timing.
result.Performance!
    .Should()
    .HaveTotalDurationUnder(TimeSpan.FromSeconds(10))
    .HaveTimeToFirstTokenUnder(TimeSpan.FromSeconds(2))
    .HaveEstimatedCostUnder(0.10m);
Multi-Turn Conversation Testing
Test complex multi-turn conversations with the ConversationalTestCase builder and ConversationRunner. See Conversations.
Workflow Testing
Test multi-agent orchestration with edge assertions, conditional routing, and Mermaid diagram export. See Workflow Testing.
Snapshot Testing
Compare agent responses against saved baselines with JSON diff, field ignoring, pattern scrubbing, and semantic similarity. See Snapshots.
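To illustrate why pattern scrubbing matters for baselines, here is a minimal standalone sketch. It does not use AgentEval's snapshot API; the `Scrub` helper, the placeholder tokens, and the sample strings are all hypothetical. The idea is that volatile values such as IDs and timestamps are replaced with stable placeholders before two responses are compared.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not an AgentEval API): replace volatile values
// with stable placeholders so they do not cause spurious snapshot diffs.
static string Scrub(string text)
{
    // ISO-8601 UTC timestamps such as 2024-05-01T12:30:00Z
    text = Regex.Replace(text, @"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z", "<timestamp>");
    // GUIDs in the usual 8-4-4-4-12 hex layout
    text = Regex.Replace(text, @"[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}", "<guid>");
    return text;
}

// Made-up responses that differ only in their volatile fields.
string baseline = "Booking 3f2504e0-4f89-11d3-9a0c-0305e82c3301 confirmed at 2024-05-01T12:30:00Z";
string current = "Booking 9b2d7c1a-0e4f-4c3b-8a6d-5f1e2d3c4b5a confirmed at 2025-01-15T08:05:42Z";

bool matches = Scrub(baseline) == Scrub(current);
Console.WriteLine(matches); // True
```

Without scrubbing, these two responses would be reported as different even though the agent's behavior is the same.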
RAG Metrics
Evaluate faithfulness, relevance, context precision/recall, and answer correctness.
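As a rough intuition for context precision and recall, the sketch below uses a simplified set-based formulation over chunk IDs. This is an assumption for illustration only: AgentEval's metrics may be computed differently (for example, with an LLM judge), and the IDs and relevance labels here are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data: chunks the retriever returned, and chunks labeled relevant.
var retrieved = new List<string> { "doc-1", "doc-4", "doc-7", "doc-9" };
var relevant = new HashSet<string> { "doc-1", "doc-2", "doc-7" };

// Retrieved chunks that are also relevant (doc-1 and doc-7).
int hits = retrieved.Count(id => relevant.Contains(id));

// Precision: share of retrieved chunks that are relevant (2 of 4).
double precision = retrieved.Count == 0 ? 0.0 : (double)hits / retrieved.Count;
// Recall: share of relevant chunks that were retrieved (2 of 3).
double recall = relevant.Count == 0 ? 0.0 : (double)hits / relevant.Count;

Console.WriteLine($"precision={precision}, recall={recall}");
```

High precision with low recall suggests the retriever is too selective; the reverse suggests it returns too much noise.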
Agentic Metrics
Measure tool selection accuracy, tool arguments, tool success, task completion, and efficiency.
Benchmarks
Run latency, throughput, cost, and agentic benchmarks with percentile statistics (p50/p90/p95/p99). See Benchmarks.
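For readers unfamiliar with percentile statistics, the following standalone sketch computes p50/p90/p95/p99 with the nearest-rank method. It is not AgentEval's implementation, which may use a different method such as interpolation; the `Percentile` helper and the latency samples are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not an AgentEval API): nearest-rank percentile.
// Returns the smallest sample such that at least `percent`% of samples are <= it.
static double Percentile(double[] sortedAscending, int percent)
{
    if (sortedAscending.Length == 0)
        throw new ArgumentException("At least one sample is required.", nameof(sortedAscending));
    if (percent < 1 || percent > 100)
        throw new ArgumentOutOfRangeException(nameof(percent), "percent must be in [1, 100].");

    // Integer ceiling of percent * n / 100 avoids floating-point rounding issues.
    int rank = (percent * sortedAscending.Length + 99) / 100;
    return sortedAscending[rank - 1];
}

// Made-up latency samples in milliseconds, including one slow outlier.
double[] latenciesMs = { 120, 95, 310, 150, 101, 99, 480, 133, 127, 2050 };
double[] sorted = latenciesMs.OrderBy(x => x).ToArray();

foreach (int p in new[] { 50, 90, 95, 99 })
{
    Console.WriteLine($"p{p}: {Percentile(sorted, p)} ms");
}
// p50: 127 ms
// p90: 480 ms
// p95: 2050 ms
// p99: 2050 ms
```

The single 2050 ms outlier dominates p95 and p99 but leaves p50 untouched, which is why tail percentiles are reported alongside the median.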
CLI Tool
Full command-line interface for CI/CD integration with multiple output formats (JSON, JUnit XML, Markdown) and dataset loaders (JSON, JSONL, CSV, YAML). See CLI Reference.
Guides
| Guide | Description |
|---|---|
| Architecture | Component diagrams and metric hierarchy |
| Benchmarks | BFCL, GAIA, ToolBench guides |
| CLI Reference | Command-line tool usage |
| Conversations | Multi-turn testing guide |
| Embedding Metrics | Semantic similarity metrics |
| Extensibility | Custom metrics, plugins, adapters |
| Snapshots | Snapshot testing guide |
| Tracing & Record/Replay | Deterministic testing with trace capture |
| Workflow Testing | Multi-agent orchestration testing |
| Roadmap | Future development plans |
API Reference
API documentation is auto-generated from XML comments. Browse the API Reference section in the navigation menu for detailed type documentation.
Test Coverage
AgentEval has 1,000+ tests covering all major features. Run against each of its 3 target frameworks, that amounts to 3,000+ test executions.
Community
- GitHub: https://github.com/joslat/AgentEval
- NuGet: https://www.nuget.org/packages/AgentEval
- Issues: https://github.com/joslat/AgentEval/issues
- Discussions: https://github.com/joslat/AgentEval/discussions
Contributing
Contributions are welcome! Please read: