Rich Evaluation Output & Time-Travel Debugging
AgentEval provides a rich evaluation output system that captures detailed execution traces, enabling "time-travel debugging" for your AI agent evaluations. This guide walks you through enabling and using these features step by step.
What You'll Get
After enabling rich evaluation output, you'll have:
- Detailed console output showing tool call timelines, performance metrics, and errors
- JSON trace files that capture the complete execution for post-mortem debugging
- CI/CD integration with automatic artifact collection for failed evaluations
Quick Start (2 Minutes)
Step 1: Set Environment Variables
# PowerShell (Windows)
$env:AGENTEVAL_VERBOSITY = "Detailed"
$env:AGENTEVAL_SAVE_TRACES = "true"
# Bash (Linux/macOS)
export AGENTEVAL_VERBOSITY=Detailed
export AGENTEVAL_SAVE_TRACES=true
Step 2: Run Your Tests
dotnet test
That's it! You'll see rich output in the console and trace files in TestResults/traces/.
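To confirm traces were actually written, list the output directory; each recorded test gets one timestamped JSON file, following the filename pattern shown later in this guide:

```shell
# Each recorded test produces one timestamped JSON trace file,
# e.g. BookingAgent_BooksFlight_20260113_143022_123.json
ls TestResults/traces/
```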
Understanding Verbosity Levels
AgentEval offers four verbosity levels, in increasing order of detail:
None
Minimal output - just pass/fail. No trace files are saved.
TestName ✓ PASS
Summary
Basic statistics for quick scanning:
═══ BookingAgent_BooksFlight ═══
Status: ✓ PASS
Summary:
Duration: 1.2s
Tool calls: 3
Tokens: 1,234 in / 567 out
Detailed (Default)
Full tool timeline and performance breakdown:
═══ BookingAgent_BooksFlight ═══
Status: ✓ PASS
Performance:
Total Duration: 1.234s
Time to First Token: 234ms
Total Tokens: 1,234 in / 567 out
Estimated Cost: $0.0234
Tools (3 calls):
1. SearchFlights
Args: {"destination": "Paris", "date": "2026-03-15"}
Result: [{"flightId": "AF123", "price": 450}]
Duration: 234ms
2. GetFlightDetails
Args: {"flightId": "AF123"}
Result: {"airline": "Air France", ...}
Duration: 89ms
3. BookFlight
Args: {"flightId": "AF123", "passenger": "John Doe"}
Result: {"confirmationCode": "XYZ789"}
Duration: 156ms
Full
Everything above plus the complete JSON trace:
═══ BookingAgent_BooksFlight ═══
Status: ✓ PASS
[... all the detailed output ...]
Full JSON Trace:
```json
{
  "testName": "BookingAgent_BooksFlight",
  "passed": true,
  "startTime": "2026-01-13T14:30:00Z",
  ...
}
```
Step-by-Step: Using AgentEvalTestBase
For the best experience, inherit from AgentEvalTestBase:
Step 1: Create Your Test Class
using AgentEval.Output;
using AgentEval.Models;
using Xunit;
using Xunit.Abstractions;
public class MyAgentTests : AgentEvalTestBase, IDisposable
{
public MyAgentTests(ITestOutputHelper output)
: base(new XUnitTextWriter(output))
{
// Optional: Enable trace saving for ALL tests (not just failures)
// SaveTracesForAllTests = true;
}
[Fact]
public async Task MyAgent_HandlesQuery_Successfully()
{
// Arrange
var agent = new MyAgent();
// Act
var response = await agent.ExecuteAsync("Book a flight to Paris");
// Assert & Record
var result = CreateResult("MyAgent_HandlesQuery")
.WithOutput(response)
.WithToolCall("SearchFlights", "call_1",
new Dictionary<string, object?> { ["destination"] = "Paris" },
"[{\"flightId\": \"AF123\"}]")
.WithToolCall("BookFlight", "call_2")
.WithTokens(inputTokens: 100, outputTokens: 50)
.WithCost(0.0025m)
.Passed(score: 95)
.Build();
RecordResult(result);
// Traditional assertions still work
Assert.Contains("Paris", response);
}
}
Step 2: Create XUnitTextWriter Helper
Since AgentEvalTestBase uses TextWriter (to avoid an xUnit dependency in the main library), you need a simple adapter:
using System.Text;
using Xunit.Abstractions;
public class XUnitTextWriter : TextWriter
{
private readonly ITestOutputHelper _output;
private readonly StringBuilder _buffer = new();
public XUnitTextWriter(ITestOutputHelper output) => _output = output;
public override Encoding Encoding => Encoding.UTF8;
public override void Write(char value) => _buffer.Append(value);
public override void WriteLine(string? value)
{
if (_buffer.Length > 0)
{
_output.WriteLine(_buffer.ToString() + value);
_buffer.Clear();
}
else
{
_output.WriteLine(value ?? string.Empty);
}
}
public override void Flush()
{
if (_buffer.Length > 0)
{
_output.WriteLine(_buffer.ToString());
_buffer.Clear();
}
}
}
Step 3: Run and View Output
dotnet test --logger "console;verbosity=detailed"
Step-by-Step: Manual Output Writer
If you prefer not to inherit from a base class:
using AgentEval.Output;
using AgentEval.Models;
[Fact]
public async Task MyAgent_Test()
{
// Create output writer with your settings
var settings = new VerbositySettings
{
Level = VerbosityLevel.Detailed,
IncludeToolArguments = true,
IncludeToolResults = true,
IncludePerformanceMetrics = true,
SaveTraceFiles = true,
TraceOutputDirectory = "TestResults/traces"
};
var output = new StringWriter();
var writer = new EvaluationOutputWriter(settings, output);
// Run your test...
var result = new TestResult
{
TestName = "MyAgentTest",
Passed = true,
Score = 100,
ActualOutput = "Flight booked successfully",
Performance = new PerformanceMetrics
{
StartTime = DateTimeOffset.UtcNow.AddSeconds(-2),
EndTime = DateTimeOffset.UtcNow,
PromptTokens = 100,
CompletionTokens = 50
}
};
// Write output
writer.WriteTestResult(result);
// Output goes to the StringWriter
Console.WriteLine(output.ToString());
}
Step-by-Step: Save and Load Traces
Saving Traces
using AgentEval.Output;
// Create artifact manager
var manager = new TraceArtifactManager("TestResults/traces");
// Save a test result (creates JSON file)
var result = new TestResult { TestName = "MyTest", Passed = true, Score = 100 };
string filePath = manager.SaveTestResult(result);
Console.WriteLine($"Trace saved: {filePath}");
// Output: Trace saved: TestResults/traces/MyTest_20260113_143022_123.json
// Save a full TimeTravelTrace
var trace = new TimeTravelTrace
{
TraceId = Guid.NewGuid().ToString(),
ExecutionType = ExecutionType.SingleAgent,
Test = new EvaluationMetadata
{
TestName = "DetailedTest",
StartTime = DateTimeOffset.UtcNow.AddSeconds(-5),
EndTime = DateTimeOffset.UtcNow,
Passed = true
},
Agents = new List<AgentInfo>
{
new() { AgentId = "agent1", AgentName = "BookingAgent", ModelId = "gpt-4o" }
},
Steps = new List<ExecutionStep>
{
new()
{
StepNumber = 1,
Type = StepType.UserInput,
Timestamp = DateTimeOffset.UtcNow.AddSeconds(-5),
OffsetFromStart = TimeSpan.Zero,
Duration = TimeSpan.FromMilliseconds(10),
Data = new UserInputStepData { Message = "Book a flight to Paris" }
}
},
Summary = new ExecutionSummary
{
Passed = true,
TotalDuration = TimeSpan.FromSeconds(5),
TotalSteps = 1,
ToolCallCount = 0,
ToolErrorCount = 0,
LlmRequestCount = 1
}
};
string tracePath = manager.SaveTrace(trace);
Loading Traces
// Load a specific trace
var loadedTrace = manager.LoadTrace("TestResults/traces/MyTest_20260113_143022_123.json");
Console.WriteLine($"Loaded trace: {loadedTrace.TraceId}");
Console.WriteLine($"Steps: {loadedTrace.Steps.Count}");
// List all traces
foreach (var file in manager.ListTraceFiles())
{
Console.WriteLine(file);
}
// Get most recent trace for a test
var recentTrace = manager.GetMostRecentTrace("MyTest");
if (recentTrace != null)
{
Console.WriteLine($"Most recent: {recentTrace}");
}
// Clean up old traces (older than 7 days)
int deleted = manager.CleanupOldTraces(TimeSpan.FromDays(7));
Console.WriteLine($"Deleted {deleted} old trace files");
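If you'd rather prune old traces outside of test code (for example, in a CI cleanup step), the same 7-day policy can be expressed in the shell; a minimal sketch, assuming the default `TestResults/traces` directory and GNU `find`:

```shell
# Delete trace files last modified more than 7 days ago
find TestResults/traces -name '*.json' -mtime +7 -delete
```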
Step-by-Step: CI/CD Setup
GitHub Actions
- Update your workflow file (`.github/workflows/test.yml`):
name: Tests
on: [push, pull_request]
env:
# Enable rich output for all test runs
AGENTEVAL_VERBOSITY: Detailed
AGENTEVAL_SAVE_TRACES: true
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Setup .NET
uses: actions/setup-dotnet@v4
with:
dotnet-version: '9.0.x'
- name: Test
run: dotnet test --logger trx
env:
AZURE_OPENAI_ENDPOINT: ${{ secrets.AZURE_OPENAI_ENDPOINT }}
AZURE_OPENAI_API_KEY: ${{ secrets.AZURE_OPENAI_API_KEY }}
# Always upload traces, even on success
- name: Upload Traces
uses: actions/upload-artifact@v4
if: always()
with:
name: test-traces
path: TestResults/traces/
retention-days: 7
- View traces after a failed build:
  - Go to the failed workflow run
  - Click the "Artifacts" section
  - Download `test-traces`
  - Open the JSON files to see what happened
Azure DevOps
Add to your azure-pipelines.yml:
trigger:
- main
pool:
vmImage: 'ubuntu-latest'
variables:
AGENTEVAL_VERBOSITY: Detailed
AGENTEVAL_SAVE_TRACES: true
steps:
- task: DotNetCoreCLI@2
inputs:
command: test
arguments: '--logger trx'
env:
AZURE_OPENAI_ENDPOINT: $(AZURE_OPENAI_ENDPOINT)
AZURE_OPENAI_API_KEY: $(AZURE_OPENAI_API_KEY)
- task: PublishBuildArtifacts@1
condition: always()
inputs:
pathToPublish: TestResults/traces
artifactName: AgentTraces
Debugging Failed Tests
When a test fails in CI, follow these steps:
1. Download the Trace
From GitHub Actions:
# Using gh CLI
gh run download <run-id> -n test-traces
Or click the download button in the Actions UI.
2. Pretty-Print the JSON
# View with jq
cat TestResults/traces/FailedTest_*.json | jq '.'
# Or use VS Code's built-in JSON viewer
code TestResults/traces/FailedTest_*.json
3. Analyze the Execution
Look for key sections in the trace:
{
"testName": "BookingAgent_ShouldNotDeleteData",
"passed": false,
"steps": [
// Find the step where things went wrong
{
"stepNumber": 5,
"type": "ToolCall",
"data": {
"toolName": "DeleteData", // <- Unexpected tool call!
"arguments": { "scope": "all" }
}
}
],
"summary": {
"passed": false,
"toolCallCount": 3,
"toolErrorCount": 0,
"assertions": [
{
"name": "NeverCallTool(DeleteData)",
"passed": false,
"message": "Tool was called at step 5"
}
]
}
}
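With `jq` (already used above for pretty-printing), you can jump straight to the failing assertions instead of scanning the whole file; the `summary.assertions` path follows the trace schema shown here:

```shell
# List only the assertions that failed, with their messages
jq '.summary.assertions[] | select(.passed == false)' TestResults/traces/FailedTest_*.json
```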
4. Reproduce Locally (Optional)
If you need to debug further, the trace tells you exactly what input caused the problem:
// Extract the user input from the trace
var userInput = trace.Steps
.First(s => s.Type == StepType.UserInput)
.Data as UserInputStepData;
Console.WriteLine($"User said: {userInput.Message}");
// "Delete all my old bookings"
// Now run locally with the same input
var agent = new BookingAgent();
var response = await agent.ExecuteAsync(userInput.Message);
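The same extraction works from the shell if you only need the triggering input; `trace.json` below is a stand-in for your downloaded trace file, and the `type`/`data.message` field names follow the trace examples in this guide:

```shell
# Print the user message that started the failing run
jq -r '.steps[] | select(.type == "UserInput") | .data.message' trace.json
```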
Configuration Reference
Environment Variables
| Variable | Type | Default | Description |
|---|---|---|---|
| `AGENTEVAL_VERBOSITY` | Enum | `Detailed` | Output level: `None`, `Summary`, `Detailed`, `Full` |
| `AGENTEVAL_SAVE_TRACES` | Boolean | `true` unless verbosity is `None` | Whether to save trace files |
| `AGENTEVAL_TRACE_DIR` | String | `TestResults/traces` | Directory for trace files |
VerbositySettings Properties
var settings = new VerbositySettings
{
Level = VerbosityLevel.Full, // Verbosity level
IncludeToolArguments = true, // Show tool arguments
IncludeToolResults = true, // Show tool results
IncludePerformanceMetrics = true, // Show perf stats
IncludeConversationHistory = false, // Show full conversation
SaveTraceFiles = true, // Save JSON traces
TraceOutputDirectory = "TestResults/traces" // Where to save
};
Troubleshooting
Traces not being saved
- Check that `AGENTEVAL_SAVE_TRACES=true` is set
- Verify the trace directory is writable
- Ensure you're calling `RecordResult()` at the end of your tests
Output not appearing in xUnit
Make sure you're:
- Using `ITestOutputHelper` properly
- Passing a `TextWriter` to `AgentEvalTestBase`
- Running with `--logger "console;verbosity=detailed"`
Large trace files
If traces are too large:
- Set `IncludeConversationHistory = false`
- Use `VerbosityLevel.Summary` for routine runs
- Use `VerbosityLevel.Full` only for debugging
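To see which traces are actually taking up the space before trimming verbosity, a quick shell check (assuming the default trace directory):

```shell
# Show the five largest trace files, biggest first
du -h TestResults/traces/*.json | sort -rh | head -n 5
```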
Environment variables not working
Environment variables are read at startup. If you change them:
- In tests: use `VerbosityConfiguration.SetOverride()` instead
- In CI: set them in the `env:` section of your workflow
See Also
- Export Formats - Complete guide to JSON, JUnit XML, Markdown, TRX, and CSV export via `IResultExporter`
- Step-by-Step Walkthrough - End-to-end evaluation walkthrough including export
- Tracing - Trace Record & Replay patterns
- Assertions - Fluent assertion API