Walkthrough: Testing Your First AI Agent
This walkthrough shows how to test an AI agent with AgentEval, from setting up a test harness through assertions to exporting results for CI/CD.
What You'll Learn
- Setting up a test harness
- Wrapping an agent for testing
- Running a test and capturing results
- Asserting on tool usage
- Asserting on performance
- Exporting results for CI/CD
Prerequisites
- .NET 8.0+ SDK
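- An AI agent (this tutorial uses a Microsoft Agent Framework agent backed by Azure OpenAI; its tools are mocks that return canned data)
- The AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_API_KEY environment variables, which the agent setup reads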
- AgentEval installed (dotnet add package AgentEval --prerelease)
Step 1: Create a Test Harness
The test harness runs your agent and captures all the data needed for assertions.
using AgentEval.MAF;
// Create a test harness with optional verbose logging
var harness = new MAFTestHarness(verbose: true);
Step 2: Wrap Your Agent
AgentEval uses adapters to wrap different agent types. For Microsoft Agent Framework agents:
using AgentEval.MAF;
using Azure.AI.OpenAI;
using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;
// First, create your MAF agent
var azureClient = new AzureOpenAIClient(
new Uri(Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!),
new Azure.AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_OPENAI_API_KEY")!));
var chatClient = azureClient
.GetChatClient("gpt-4o")
.AsIChatClient();
var myAgent = new ChatClientAgent(
chatClient,
new ChatClientAgentOptions
{
Name = "TravelPlannerAgent",
Instructions = "You are a travel planning assistant.",
Tools = [
AIFunctionFactory.Create(SearchFlights),
AIFunctionFactory.Create(SearchHotels),
AIFunctionFactory.Create(GetWeather)
]
});
// Then wrap it for testing
var adapter = new MAFAgentAdapter(myAgent);
For any IChatClient:
using AgentEval.Adapters;
// Wrap an IChatClient
var adapter = new ChatClientAgentAdapter(chatClient, "MyAgent");
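The agent setup above reads AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_API_KEY with the null-forgiving operator (!), so a missing variable surfaces as a less helpful exception from the Uri or credential constructor. A minimal sketch of a clearer failure is shown below. RequireEnv is a hypothetical helper written for this walkthrough, not part of AgentEval or the Azure SDK.

```csharp
using System;

// Hypothetical helper (not part of AgentEval): return a required environment
// variable, or throw with a message that names the missing variable.
static string RequireEnv(string name)
{
    var value = Environment.GetEnvironmentVariable(name);
    if (string.IsNullOrWhiteSpace(value))
    {
        throw new InvalidOperationException(
            $"Environment variable '{name}' is not set. Set it before running the agent tests.");
    }
    return value;
}

try
{
    // These two values can be passed to AzureOpenAIClient in place of the
    // Environment.GetEnvironmentVariable("...")! expressions.
    var endpoint = new Uri(RequireEnv("AZURE_OPENAI_ENDPOINT"));
    var apiKey = RequireEnv("AZURE_OPENAI_API_KEY");
    Console.WriteLine($"Using endpoint host: {endpoint.Host}");
}
catch (InvalidOperationException ex)
{
    // Report the missing variable by name instead of failing later on a null.
    Console.Error.WriteLine(ex.Message);
}
```

Failing early with the variable name tends to save time in CI, where a missing secret is a common cause of a red build.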
Step 3: Define a Test Case
Test cases describe what to test and how to evaluate the results:
using AgentEval.Models;
var testCase = new TestCase
{
Name = "Travel Planning Test",
Input = "Plan a trip to Paris for next weekend",
// Optional: Expected tools the agent should use
ExpectedTools = new[] { "SearchFlights", "SearchHotels", "GetWeather" },
// Optional: Criteria for AI-powered evaluation
EvaluationCriteria = new[]
{
"Should include flight options",
"Should include hotel recommendations",
"Should consider weather"
},
// Minimum score to pass (0-100)
PassingScore = 70
};
Step 4: Run the Test
Execute the test and capture results:
using AgentEval.Core;
// Run the test - tool tracking and performance metrics are captured automatically
var result = await harness.RunTestAsync(adapter, testCase);
// Check if the test passed
Console.WriteLine($"Passed: {result.Passed}");
Console.WriteLine($"Score: {result.Score}");
Console.WriteLine($"Output: {result.ActualOutput}");
Step 5: Assert on Tool Usage
Use fluent assertions to verify the agent used tools correctly:
using AgentEval.Assertions;
// Assert specific tools were called
result.ToolUsage!
.Should()
.HaveCalledTool("SearchFlights")
.HaveCalledTool("SearchHotels")
.HaveCalledTool("GetWeather");
// Assert tool ordering
result.ToolUsage!
.Should()
.HaveCalledTool("GetWeather")
.BeforeTool("SearchFlights"); // Weather should be checked before flights are searched
// Assert tool arguments
result.ToolUsage!
.Should()
.HaveCalledTool("SearchFlights")
.WithArgument("destination", "Paris");
// Assert no errors occurred
result.ToolUsage!
.Should()
.HaveNoErrors();
// Assert call count limits
result.ToolUsage!
.Should()
.HaveTotalCallsLessThan(10); // Efficiency check
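To make the ordering assertion concrete, a "before" check can be thought of as comparing positions in the recorded call sequence. The sketch below models the call log as a plain list of tool names. CalledBefore is a hypothetical helper for illustration, not AgentEval's implementation, and it assumes that "before" means the first call to one tool precedes the first call to the other.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration: true when the first call to `first` happens
// before the first call to `second`. Returns false if either tool was never called.
static bool CalledBefore(IReadOnlyList<string> calls, string first, string second)
{
    int firstIndex = -1;
    int secondIndex = -1;
    for (int i = 0; i < calls.Count; i++)
    {
        if (firstIndex < 0 && calls[i] == first) firstIndex = i;
        if (secondIndex < 0 && calls[i] == second) secondIndex = i;
    }
    return firstIndex >= 0 && secondIndex >= 0 && firstIndex < secondIndex;
}

// A made-up call log in the order the agent invoked its tools.
var recordedCalls = new List<string> { "GetWeather", "SearchFlights", "SearchHotels" };

Console.WriteLine(CalledBefore(recordedCalls, "GetWeather", "SearchFlights"));  // True
Console.WriteLine(CalledBefore(recordedCalls, "SearchHotels", "GetWeather"));   // False
```

Because a model may legitimately choose a different tool order between runs, ordering assertions are best reserved for sequences the agent's instructions actually require.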
Step 6: Assert on Performance
Verify the agent meets performance requirements:
using AgentEval.Assertions;
result.Performance!
.Should()
.HaveTotalDurationUnder(TimeSpan.FromSeconds(30))
.HaveTimeToFirstTokenUnder(TimeSpan.FromSeconds(2))
.HaveTokenCountUnder(4000)
.HaveEstimatedCostUnder(0.10m); // Max $0.10 per request
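A cost ceiling such as 0.10m is easier to choose with the arithmetic written out. The sketch below estimates cost from token counts and per-million-token prices. EstimateCost is a hypothetical helper, the prices and token counts are placeholders rather than real gpt-4o pricing, and AgentEval's own estimate may be computed differently.

```csharp
using System;

// Hypothetical helper: prices are expressed in USD per one million tokens.
static decimal EstimateCost(
    int inputTokens, int outputTokens,
    decimal inputPricePerMillion, decimal outputPricePerMillion)
{
    return inputTokens / 1_000_000m * inputPricePerMillion
         + outputTokens / 1_000_000m * outputPricePerMillion;
}

// Placeholder numbers for illustration only.
decimal cost = EstimateCost(
    inputTokens: 2_500, outputTokens: 1_000,
    inputPricePerMillion: 2.00m, outputPricePerMillion: 8.00m);

// 2,500 input tokens  -> 0.005
// 1,000 output tokens -> 0.008
Console.WriteLine(cost < 0.10m);   // True: 0.013 is under the 0.10 ceiling
```

Using decimal rather than double keeps small currency amounts exact, which matches the decimal literal passed to HaveEstimatedCostUnder.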
Step 7: Assert on Response Content
Verify the response contains expected information:
using AgentEval.Assertions;
result.Response
.Should()
.Contain("Paris")
.ContainAny("flight", "airline")
.ContainAny("hotel", "accommodation")
.NotContain("error")
.HaveLengthBetween(100, 5000);
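Keyword checks like these are simple substring tests, so it helps to see what they do and do not catch. The sketch below approximates a "contains any" check with plain string operations. ContainsAny is a hypothetical helper that assumes case-insensitive matching; AgentEval's comparison rules may differ.

```csharp
using System;
using System.Linq;

// Hypothetical helper: true when `text` contains at least one of `candidates`,
// ignoring case.
static bool ContainsAny(string text, params string[] candidates) =>
    candidates.Any(c => text.Contains(c, StringComparison.OrdinalIgnoreCase));

// A made-up agent response used only for this illustration.
var response = "Flights to Paris start at $420. The Hilton has rooms available.";

Console.WriteLine(ContainsAny(response, "flight", "airline"));        // True
Console.WriteLine(ContainsAny(response, "hotel", "accommodation"));   // False: "Hilton" is not "hotel"
```

The second check fails even though the response does recommend a hotel, which is why keyword assertions pair well with the criteria-based evaluation from Step 3.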
Step 8: Export Results for CI/CD
Export results in formats your CI/CD system understands:
using AgentEval.Exporters;
// JUnit XML for GitHub Actions, Azure DevOps, Jenkins
var junitExporter = new JUnitExporter();
await junitExporter.ExportAsync(new[] { result }, "results.xml");
// Markdown for PR comments
var markdownExporter = new MarkdownExporter();
await markdownExporter.ExportAsync(new[] { result }, "results.md");
// JSON for custom dashboards
var jsonExporter = new JsonExporter();
await jsonExporter.ExportAsync(new[] { result }, "results.json");
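An exported report describes failures, but a console runner usually still has to signal failure through its exit code for the pipeline step to fail. The sketch below derives an exit code from a collection of pass/fail flags. ExitCodeFor is a hypothetical helper, and treating an empty run as a failure is a design assumption, not AgentEval behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: 0 when every test passed, 1 otherwise.
// An empty run counts as a failure so a misconfigured suite is not silently green.
static int ExitCodeFor(IReadOnlyCollection<bool> flags) =>
    flags.Count > 0 && flags.All(p => p) ? 0 : 1;

// In a real runner these flags would come from result.Passed for each test.
var passed = new List<bool> { true, true, false };

int exitCode = ExitCodeFor(passed);
Console.WriteLine($"Exit code: {exitCode}");   // Exit code: 1

// In a console runner, finish with: Environment.ExitCode = exitCode;
```

When the tests run under xUnit, NUnit, or MSTest instead, a failed assertion already fails the run through the test runner, so this step applies only to standalone console programs.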
Complete Example
Here's the full test in one file:
using AgentEval.MAF;
using AgentEval.Models;
using AgentEval.Core;
using AgentEval.Assertions;
using AgentEval.Exporters;
using Azure.AI.OpenAI;
using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;
using System.ComponentModel;
// ═══════════════════════════════════════════════════════════════
// 1. Create your MAF agent with tools
// ═══════════════════════════════════════════════════════════════
var azureClient = new AzureOpenAIClient(
new Uri(Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!),
new Azure.AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_OPENAI_API_KEY")!));
var chatClient = azureClient
.GetChatClient("gpt-4o")
.AsIChatClient();
var myAgent = new ChatClientAgent(
chatClient,
new ChatClientAgentOptions
{
Name = "TravelPlannerAgent",
Instructions = "You are a travel planning assistant. Use tools to search for flights, hotels, and weather.",
Tools = [
AIFunctionFactory.Create(SearchFlights),
AIFunctionFactory.Create(SearchHotels),
AIFunctionFactory.Create(GetWeather)
]
});
// ═══════════════════════════════════════════════════════════════
// 2. Set up test harness and adapter
// ═══════════════════════════════════════════════════════════════
var harness = new MAFTestHarness(verbose: true);
var adapter = new MAFAgentAdapter(myAgent);
// ═══════════════════════════════════════════════════════════════
// 3. Define test case
// ═══════════════════════════════════════════════════════════════
var testCase = new TestCase
{
Name = "Travel Planning Test",
Input = "Plan a trip to Paris for next weekend",
ExpectedTools = new[] { "SearchFlights", "SearchHotels", "GetWeather" },
PassingScore = 70
};
// ═══════════════════════════════════════════════════════════════
// 4. Run test
// ═══════════════════════════════════════════════════════════════
var result = await harness.RunTestAsync(adapter, testCase);
// ═══════════════════════════════════════════════════════════════
// 5. Assert
// ═══════════════════════════════════════════════════════════════
result.ToolUsage!
.Should()
.HaveCalledTool("SearchFlights")
.HaveCalledTool("SearchHotels")
.HaveNoErrors();
result.Performance!
.Should()
.HaveTotalDurationUnder(TimeSpan.FromSeconds(30))
.HaveEstimatedCostUnder(0.10m);
// ═══════════════════════════════════════════════════════════════
// 6. Export results
// ═══════════════════════════════════════════════════════════════
var exporter = new JUnitExporter();
await exporter.ExportAsync(new[] { result }, "results.xml");
Console.WriteLine(result.Passed ? "✅ Test PASSED" : "❌ Test FAILED");
Console.WriteLine($" Output: {result.ActualOutput}");
// ═══════════════════════════════════════════════════════════════
// Tool definitions
// ═══════════════════════════════════════════════════════════════
[Description("Search for available flights")]
static string SearchFlights(
[Description("Destination city")] string destination,
[Description("Departure date")] string date)
{
return $"Found 3 flights to {destination} on {date}: AA123, UA456, DL789";
}
[Description("Search for hotels")]
static string SearchHotels(
[Description("City name")] string city,
[Description("Check-in date")] string checkIn)
{
return $"Found hotels in {city}: Hilton ($200/night), Marriott ($180/night)";
}
[Description("Get weather forecast")]
static string GetWeather(
[Description("City name")] string city)
{
return $"Weather in {city}: Sunny, 72°F";
}
Using with xUnit/NUnit/MSTest
AgentEval tests can run inside a standard .NET test framework. The example below uses xUnit:
using AgentEval.MAF;
using AgentEval.Models;
using AgentEval.Assertions;
using Azure.AI.OpenAI;
using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;
using Xunit;
public class TravelAgentTests
{
private readonly AIAgent _agent;
public TravelAgentTests()
{
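// xUnit creates a new instance of the test class for each test method,
// so this setup runs before every test. Use IClassFixture<T> to share
// one agent across all tests in the class.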
var azureClient = new AzureOpenAIClient(
new Uri(Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!),
new Azure.AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_OPENAI_API_KEY")!));
var chatClient = azureClient
.GetChatClient("gpt-4o")
.AsIChatClient();
_agent = new ChatClientAgent(
chatClient,
new ChatClientAgentOptions
{
Name = "TravelPlannerAgent",
Instructions = "You are a travel planning assistant.",
Tools = [AIFunctionFactory.Create(SearchFlights)]
});
}
[Fact]
public async Task Agent_ShouldPlanTrip_WithCorrectTools()
{
// Arrange
var harness = new MAFTestHarness();
var adapter = new MAFAgentAdapter(_agent);
var testCase = new TestCase
{
Name = "Travel Planning",
Input = "Plan a trip to Paris",
ExpectedTools = new[] { "SearchFlights" }
};
// Act
var result = await harness.RunTestAsync(adapter, testCase);
// Assert
result.ToolUsage!
.Should()
.HaveCalledTool("SearchFlights")
.HaveNoErrors();
Assert.True(result.Passed);
}
[System.ComponentModel.Description("Search for flights")]
private static string SearchFlights(
[System.ComponentModel.Description("Destination")] string destination)
{
return $"Found flights to {destination}";
}
}
Next Steps
- Architecture - Understand the framework design
- CLI Reference - Run tests from command line
- Benchmarks - Performance testing at scale
- Conversations - Multi-turn testing
- Snapshots - Regression testing