Getting Started with AgentEval
Time to complete: 5 minutes
This guide walks you through installing AgentEval and writing your first AI agent test.
Quick Path
| Time | Step |
|---|---|
| 1 min | Install NuGet package |
| 2 min | Create a MAF agent |
| 2 min | Write your first test |
Prerequisites
- .NET 8.0, 9.0, or 10.0 SDK
- An xUnit, NUnit, or MSTest test project
- Azure OpenAI or OpenAI API access
Required Environment Variables
Set these before running tests:
# PowerShell
$env:AZURE_OPENAI_ENDPOINT = "https://your-resource.openai.azure.com/"
$env:AZURE_OPENAI_API_KEY = "your-api-key"
$env:AZURE_OPENAI_DEPLOYMENT = "gpt-4o" # Your deployment name
# Bash/Linux/macOS
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
export AZURE_OPENAI_API_KEY="your-api-key"
export AZURE_OPENAI_DEPLOYMENT="gpt-4o"
Tip: Add these to your .bashrc, .zshrc, or Windows user environment variables so they persist across sessions.
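On Windows, setx writes a variable into your user environment so it survives new terminal sessions (the values below are placeholders; restart the terminal to pick them up):
# PowerShell or cmd
setx AZURE_OPENAI_ENDPOINT "https://your-resource.openai.azure.com/"
setx AZURE_OPENAI_API_KEY "your-api-key"
setx AZURE_OPENAI_DEPLOYMENT "gpt-4o"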
Installation
Install the AgentEval NuGet package:
dotnet add package AgentEval --prerelease
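If you don't yet have a test project, one way to bootstrap an xUnit project first (the project name is illustrative):
# Create an xUnit test project and add AgentEval to it
dotnet new xunit -n MyAgent.Tests
cd MyAgent.Tests
dotnet add package AgentEval --prerelease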
Creating a MAF Agent
AgentEval works with Microsoft Agent Framework (MAF) agents. Here's how to create one:
using Azure.AI.OpenAI;
using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;
public static AIAgent CreateMyAgent()
{
// Connect to Azure OpenAI
var azureClient = new AzureOpenAIClient(
new Uri(Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!),
new Azure.AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_OPENAI_API_KEY")!));
var deployment = Environment.GetEnvironmentVariable("AZURE_OPENAI_DEPLOYMENT") ?? "gpt-4o";
var chatClient = azureClient
.GetChatClient(deployment)
.AsIChatClient();
// Create a MAF ChatClientAgent
return new ChatClientAgent(
chatClient,
new ChatClientAgentOptions
{
Name = "MyAgent",
Instructions = "You are a helpful assistant."
});
}
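This guide uses Azure OpenAI throughout, though the prerequisites allow direct OpenAI access as well. A minimal sketch of the same wiring for that case, assuming the official OpenAI .NET SDK and its Microsoft.Extensions.AI adapter package (the OPENAI_API_KEY variable name is an assumption of this sketch):
using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;
using OpenAI;

public static AIAgent CreateMyOpenAIAgent()
{
    // Read the API key from an environment variable (name assumed for this sketch)
    var openAIClient = new OpenAIClient(
        Environment.GetEnvironmentVariable("OPENAI_API_KEY")!);

    // Adapt the SDK's ChatClient to Microsoft.Extensions.AI's IChatClient
    var chatClient = openAIClient
        .GetChatClient("gpt-4o")
        .AsIChatClient();

    return new ChatClientAgent(
        chatClient,
        new ChatClientAgentOptions
        {
            Name = "MyAgent",
            Instructions = "You are a helpful assistant."
        });
}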
Adding Tools to Your Agent
Agents with tools are more capable, and tool calls give AgentEval concrete behavior to assert on:
public static AIAgent CreateWeatherAgent()
{
var azureClient = new AzureOpenAIClient(
new Uri(Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!),
new Azure.AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_OPENAI_API_KEY")!));
var deployment = Environment.GetEnvironmentVariable("AZURE_OPENAI_DEPLOYMENT") ?? "gpt-4o";
var chatClient = azureClient
.GetChatClient(deployment)
.AsIChatClient();
return new ChatClientAgent(
chatClient,
new ChatClientAgentOptions
{
Name = "WeatherAgent",
Instructions = "You are a weather assistant. Use the get_weather tool to check weather.",
Tools = [AIFunctionFactory.Create(GetWeather, "get_weather")] // Name the tool explicitly so it matches the instructions and assertions
});
}
// Define a tool as a simple method (requires using System.ComponentModel for [Description])
[Description("Gets the current weather for a location")]
static string GetWeather(
[Description("The city name")] string location)
{
// Your actual weather API call would go here
return $"The weather in {location} is 72°F and sunny.";
}
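Before writing formal tests, you can exercise the agent ad hoc; a quick sketch, assuming MAF's AIAgent.RunAsync entry point:
// One-off smoke test: ask the weather agent a question and print the reply
var agent = CreateWeatherAgent();
var response = await agent.RunAsync("What's the weather in Seattle?");
Console.WriteLine(response);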
Your First Test
1. Create a Test Class
using AgentEval;
using AgentEval.MAF;
using AgentEval.Models;
using AgentEval.Assertions;
using Azure.AI.OpenAI;
using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;
using System.ComponentModel;
using Xunit;
public class MyAgentTests
{
[Fact]
public async Task Agent_ShouldRespondToGreeting()
{
// Arrange: Create your MAF agent
var agent = CreateGreetingAgent();
var adapter = new MAFAgentAdapter(agent);
var harness = new MAFTestHarness();
// Arrange: Define the test case
var testCase = new TestCase
{
Name = "Greeting Test",
Input = "Hello, my name is Alice!",
ExpectedOutputContains = "Alice"
};
// Act: Run the test
var result = await harness.RunTestAsync(adapter, testCase);
// Assert: Check results
Assert.True(result.Passed, result.FailureReason);
}
private static AIAgent CreateGreetingAgent()
{
var azureClient = new AzureOpenAIClient(
new Uri(Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!),
new Azure.AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_OPENAI_API_KEY")!));
var deployment = Environment.GetEnvironmentVariable("AZURE_OPENAI_DEPLOYMENT") ?? "gpt-4o";
var chatClient = azureClient
.GetChatClient(deployment)
.AsIChatClient();
return new ChatClientAgent(
chatClient,
new ChatClientAgentOptions
{
Name = "GreetingAgent",
Instructions = "You are a friendly greeting assistant. When someone introduces themselves, greet them warmly by name."
});
}
}
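Run it like any other unit test (the environment variables must be set in the same shell):
dotnet test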
2. Add Tool Assertions
AgentEval shines when testing agents that use tools:
[Fact]
public async Task Agent_ShouldUseWeatherTool()
{
// Arrange: Create agent with weather tool
var agent = CreateWeatherAgent();
var adapter = new MAFAgentAdapter(agent);
var harness = new MAFTestHarness();
var testCase = new TestCase
{
Name = "Weather Query",
Input = "What's the weather in Seattle?"
};
// Act
var result = await harness.RunTestAsync(adapter, testCase);
// Assert: Fluent tool assertions
result.ToolUsage!.Should()
.HaveCalledTool("get_weather")
.WithArgument("location", "Seattle");
Assert.True(result.Passed);
}
private static AIAgent CreateWeatherAgent()
{
var azureClient = new AzureOpenAIClient(
new Uri(Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!),
new Azure.AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_OPENAI_API_KEY")!));
var deployment = Environment.GetEnvironmentVariable("AZURE_OPENAI_DEPLOYMENT") ?? "gpt-4o";
var chatClient = azureClient
.GetChatClient(deployment)
.AsIChatClient();
return new ChatClientAgent(
chatClient,
new ChatClientAgentOptions
{
Name = "WeatherAgent",
Instructions = "You are a weather assistant. Use the get_weather tool to check weather conditions.",
Tools = [AIFunctionFactory.Create(GetWeather, "get_weather")]
});
}
[Description("Gets the current weather for a location")]
static string GetWeather([Description("The city name")] string location)
{
return $"The weather in {location} is 72°F and sunny.";
}
3. Add Performance Assertions
Track streaming performance and costs:
[Fact]
public async Task Agent_ShouldMeetPerformanceSLAs()
{
// Arrange: Reuse your agent creation method
var agent = CreateGreetingAgent();
var adapter = new MAFAgentAdapter(agent);
var harness = new MAFTestHarness();
var testCase = new TestCase
{
Name = "Performance Test",
Input = "Summarize the quarterly report."
};
// Act
var result = await harness.RunTestAsync(adapter, testCase);
// Assert: Performance metrics
result.Performance!.Should()
.HaveTotalDurationUnder(TimeSpan.FromSeconds(5))
.HaveTimeToFirstTokenUnder(TimeSpan.FromMilliseconds(500))
.HaveEstimatedCostUnder(0.05m);
Assert.True(result.Passed);
}
Using AI-Powered Evaluation
For more sophisticated evaluation, provide an LLM evaluator:
using Azure.AI.OpenAI;
using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;
public class AdvancedAgentTests
{
private readonly IChatClient _evaluator;
public AdvancedAgentTests()
{
// Create an evaluator (any IChatClient implementation)
var client = new AzureOpenAIClient(
new Uri(Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!),
new Azure.AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_OPENAI_API_KEY")!));
var deployment = Environment.GetEnvironmentVariable("AZURE_OPENAI_DEPLOYMENT") ?? "gpt-4o";
_evaluator = client.GetChatClient(deployment).AsIChatClient();
}
[Fact]
public async Task Agent_ShouldProvideHelpfulResponse()
{
// Arrange: Use evaluator for AI-powered scoring
var harness = new MAFTestHarness(_evaluator);
var agent = CreateHelpDeskAgent();
var adapter = new MAFAgentAdapter(agent);
var testCase = new TestCase
{
Name = "Helpfulness Test",
Input = "How do I reset my password?",
EvaluationCriteria = "Response should provide clear step-by-step instructions"
};
// Act
var result = await harness.RunTestAsync(adapter, testCase);
// Assert: AI-evaluated quality
Assert.True(result.Passed, result.Details);
Assert.True(result.Score >= 80, $"Score was {result.Score}");
}
private static AIAgent CreateHelpDeskAgent()
{
var azureClient = new AzureOpenAIClient(
new Uri(Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!),
new Azure.AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_OPENAI_API_KEY")!));
var deployment = Environment.GetEnvironmentVariable("AZURE_OPENAI_DEPLOYMENT") ?? "gpt-4o";
var chatClient = azureClient
.GetChatClient(deployment)
.AsIChatClient();
return new ChatClientAgent(
chatClient,
new ChatClientAgentOptions
{
Name = "HelpDeskAgent",
Instructions = "You are a helpful IT support agent. Provide clear, step-by-step instructions."
});
}
}
Running Tests from CLI
AgentEval includes a CLI for batch evaluation; the dataset file uses the same YAML format shown under Dataset-Driven Testing below:
# Initialize a new evaluation project
agenteval init
# Run evaluations from a dataset
agenteval eval --dataset tests.yaml --output results/
# List available exporters
agenteval list exporters
Dataset-Driven Testing
Load test cases from files:
using AgentEval.DataLoaders;
using AgentEval.MAF;
using Azure.AI.OpenAI;
using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;
[Fact]
public async Task Agent_ShouldPassAllDatasetTests()
{
// Load test cases from YAML
var loader = new YamlDatasetLoader();
var testCases = await loader.LoadAsync("testcases.yaml");
// Create agent and harness (_evaluator is an IChatClient, built as in AdvancedAgentTests above)
var agent = CreateMyAgent();
var harness = new MAFTestHarness(_evaluator);
var adapter = new MAFAgentAdapter(agent);
// Run all test cases
var summary = await harness.RunBatchAsync(adapter, testCases);
// Assert all passed
Assert.Equal(summary.TotalTests, summary.PassedTests);
}
private static AIAgent CreateMyAgent()
{
var azureClient = new AzureOpenAIClient(
new Uri(Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!),
new Azure.AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_OPENAI_API_KEY")!));
var deployment = Environment.GetEnvironmentVariable("AZURE_OPENAI_DEPLOYMENT") ?? "gpt-4o";
var chatClient = azureClient
.GetChatClient(deployment)
.AsIChatClient();
return new ChatClientAgent(
chatClient,
new ChatClientAgentOptions
{
Name = "MyAgent",
Instructions = "You are a helpful assistant."
});
}
Example testcases.yaml (keys are snake_case versions of the TestCase properties, e.g. expected_output_contains maps to ExpectedOutputContains):
- name: Greeting Test
input: Hello, how are you?
expected_output_contains: Hello
- name: Weather Query
input: What's the weather in Paris?
evaluation_criteria: Should mention weather conditions
- name: Math Problem
input: What is 25 * 4?
expected_output_contains: "100"
Next Steps
- Architecture Guide — Understand AgentEval's component model
- Benchmarks Guide — Run performance and agentic benchmarks
- CLI Reference — Full CLI command documentation
- Conversations Guide — Test multi-turn agent interactions
- Extensibility Guide — Create custom metrics and plugins
Quick Reference
Assertion Cheat Sheet
// Tool assertions
result.ToolUsage!.Should()
.HaveCalledTool("tool_name")
.NotHaveCalledTool("forbidden_tool")
.HaveCallCount(3)
.HaveCallOrder("tool1", "tool2", "tool3")
.WithArgument("param", "value")
.WithResultContaining("expected")
.WithDurationUnder(TimeSpan.FromSeconds(1));
// Performance assertions
result.Performance!.Should()
.HaveTotalDurationUnder(TimeSpan.FromSeconds(5))
.HaveTimeToFirstTokenUnder(TimeSpan.FromMilliseconds(500))
.HaveTokenCountUnder(1000)
.HaveEstimatedCostUnder(0.10m);
// Response assertions
result.ActualOutput!.Should()
.Contain("expected text")
.ContainAll("word1", "word2")
.ContainAny("option1", "option2")
.NotContain("forbidden")
.MatchPattern(@"\d{3}-\d{4}")
.HaveLengthBetween(100, 500);
Common Test Patterns
| Pattern | Use Case |
|---|---|
| ExpectedOutputContains | Simple substring matching |
| EvaluationCriteria | AI-powered quality evaluation |
| ToolUsage.Should() | Assert tool usage |
| Performance.Should() | Assert latency, cost, tokens |
| ConversationRunner | Multi-turn testing |
| SnapshotComparer | Regression testing |
Troubleshooting
DeploymentNotFound (HTTP 404)
Symptom: DeploymentNotFound error when running tests
Cause: The deployment name doesn't match any deployment in your Azure OpenAI resource
Solution:
# Verify your deployment exists in Azure Portal
# Set the correct deployment name:
$env:AZURE_OPENAI_DEPLOYMENT = "your-actual-deployment-name"
Environment Variables Not Set
Symptom: NullReferenceException or empty configuration
Cause: Missing required environment variables
Solution: Ensure all three are set:
$env:AZURE_OPENAI_ENDPOINT # Required: https://xxx.openai.azure.com/
$env:AZURE_OPENAI_API_KEY # Required: Your API key
$env:AZURE_OPENAI_DEPLOYMENT # Required: Your deployment name (e.g., gpt-4o)
Rate Limiting (HTTP 429)
Symptom: TooManyRequests error during batch tests
Solution: Add delays between tests:
var options = new TestRunOptions
{
DelayBetweenTests = TimeSpan.FromSeconds(1)
};
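How these options are wired in depends on your AgentEval version; a sketch, assuming a RunBatchAsync overload that accepts them:
// Hypothetical overload: verify your version's RunBatchAsync signature
var summary = await harness.RunBatchAsync(adapter, testCases, options);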
Timeout Errors
Symptom: Tests time out waiting for a response
Solution: Increase timeout in test configuration:
var testCase = new TestCase
{
Name = "Long Running Test",
Input = "Generate a detailed report...",
TimeoutSeconds = 60 // Default is 30
};
Inconsistent Tool Calls
Symptom: Tool sometimes called, sometimes not
Causes:
- Prompt is ambiguous
- Temperature too high
Solution: Use more specific prompts:
// ❌ Ambiguous
var testCase = new TestCase { Input = "What's the weather?" };
// ✅ Specific
var testCase = new TestCase
{
Input = "What is the current temperature in Seattle, WA in Fahrenheit?"
};
High Variance in LLM Scores
Symptom: Quality scores vary widely between runs
Solution: Use Stochastic Testing to run the same test multiple times and analyze the aggregate statistics:
// testOptions: reuse the TestRunOptions shown under Rate Limiting above
var stochasticRunner = new StochasticRunner(harness, testOptions);
var result = await stochasticRunner.RunStochasticTestAsync(
agent, testCase,
new StochasticOptions(Runs: 10, SuccessRateThreshold: 0.8));
Need help? Check the samples or open an issue on GitHub.