Trace Record & Replay

AgentEval's Record & Replay captures an agent execution once and replays it any number of times without calling the underlying LLM. This "time-travel debugging" makes evaluation of AI agents deterministic.

Why Record & Replay?

| Benefit | Description |
| --- | --- |
| Deterministic Evaluation | Replay produces identical responses every time |
| Cost Reduction | No LLM API calls during replay |
| Speed | Instant replay vs. network latency |
| CI/CD Integration | Reliable tests without API credentials |
| Regression Evaluation | Detect behavior changes over time |
| Debugging | Inspect and analyze past executions |

Core Components

| Component | Description |
| --- | --- |
| TraceRecordingAgent | Wraps an agent to capture executions (agent boundary) |
| TraceRecordingChatClient | Wraps an IChatClient to capture every LLM round-trip (chat boundary, Glass Box) |
| TraceReplayingAgent | Replays recorded traces deterministically |
| ChatTraceRecorder | Records multi-turn conversations |
| WorkflowTraceRecorder | Records multi-agent workflow orchestrations |
| WorkflowTraceReplayingAgent | Replays workflow traces |
| TraceSerializer | Saves/loads AgentTrace to/from JSON |
| WorkflowTraceSerializer | Saves/loads WorkflowTrace to/from JSON |

Two recording layers (Glass Box)

AgentEval records at two layers. They are complementary, not alternatives:

| | ChatTraceRecorder | TraceRecordingChatClient (Glass Box) |
| --- | --- | --- |
| Wraps | An agent / chat agent | An IChatClient |
| Records at | The application conversation level | The model round-trip level |
| Granularity | One entry per user turn you submit (AddUserTurnAsync) | One entry per LLM call, including the agent's internal turns the user never issued |
| Sees internal retries / hidden turns | No | Yes |
| Sees tool schemas sent to the model | No | Yes (per turn, with finish reason, request options, provider metadata) |
| Primary use | Replay a scripted conversation | Forensic / compliance evidence and Trace Fidelity |

ChatTraceRecorder answers "replay this conversation"; TraceRecordingChatClient answers "what did the model actually see on every call." Chat-boundary entries carry Scope = TraceEntryScope.ChatTurn in the v1.1 trace schema.

⚠️ Composition is load-bearing. MEAI's ChatClientBuilder makes the first .Use(...) call the outermost layer, and FunctionInvokingChatClient (FICC) calls its inner client once per round-trip. To capture every round-trip, place UseTraceRecording inside UseFunctionInvocation, that is, after it in the builder chain. Placing it outside (before UseFunctionInvocation) records a single entry for the whole tool loop.

using AgentEval.Tracing;

var trace = new AgentTrace();
var traced = rawChatClient
    .AsBuilder()
    .UseFunctionInvocation()                       // the tool loop
    .UseTraceRecording("planner", trace)           // INNER of FICC → one entry per real model round-trip
    .Build();

// Optional: group all entries of one invocation under a correlation id.
using (new ToolCorrelationScope("invocation-1"))
{
    var response = await traced.GetResponseAsync(messages, options);
}

await TraceSerializer.SaveToFileAsync(trace, "chat-boundary.trace.json");
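
To see why the placement matters, the layering can be reduced to a toy model. The sketch below is only an illustration: the "client", "recorder", and "tool loop" are plain delegates invented for this example, not MEAI or AgentEval types. It shows that a recorder wrapped inside the loop observes every call the loop makes, while a recorder wrapped around the loop observes only the single outer call.

```csharp
using System;
using System.Collections.Generic;

// Toy model (hypothetical stand-ins, not MEAI/AgentEval types):
// a "client" is just a function from prompt to reply.
Func<string, string> model = prompt =>
    prompt.StartsWith("tool-result:") ? "final answer" : "call-tool";

// Recording layer: logs every prompt that passes through it.
Func<Func<string, string>, List<string>, Func<string, string>> record =
    (inner, log) => prompt => { log.Add(prompt); return inner(prompt); };

// Tool loop: keeps calling its inner client until no tool call is requested.
Func<Func<string, string>, Func<string, string>> toolLoop =
    inner => prompt =>
    {
        var reply = inner(prompt);
        while (reply == "call-tool")
        {
            reply = inner("tool-result: 52F");
        }
        return reply;
    };

var insideLog = new List<string>();
var outsideLog = new List<string>();

// Recorder inside the tool loop: sees each round-trip.
var recorderInside = toolLoop(record(model, insideLog));
// Recorder outside the tool loop: sees only the top-level call.
var recorderOutside = record(toolLoop(model), outsideLog);

recorderInside("What's the weather?");
recorderOutside("What's the weather?");

Console.WriteLine($"inside: {insideLog.Count} entries, outside: {outsideLog.Count} entry");
// prints "inside: 2 entries, outside: 1 entry"
```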

Quick Start

Single-Agent Record & Replay

using AgentEval.Tracing;

// RECORD: Capture the agent execution
var realAgent = new MyToolAgent(chatClient);
await using var recorder = new TraceRecordingAgent(realAgent, "weather_query");

var response = await recorder.InvokeAsync("What's the weather in Seattle?");
var trace = recorder.Trace;

// Save for later use
await TraceSerializer.SaveToFileAsync(trace, "weather-trace.json");

// REPLAY: Deterministic playback
var replayer = new TraceReplayingAgent(trace);
var replayed = await replayer.InvokeAsync("What's the weather in Seattle?");

// Response is IDENTICAL every time
Assert.Equal(response.Text, replayed.Text);

Multi-Turn Chat Recording

using AgentEval.Tracing;

// RECORD: Multi-turn conversation
await using var chatRecorder = new ChatTraceRecorder(chatAgent, "travel_conv");

var r1 = await chatRecorder.AddUserTurnAsync("Hello, what can you help me with?");
var r2 = await chatRecorder.AddUserTurnAsync("Book a flight to Paris");
var r3 = await chatRecorder.AddUserTurnAsync("Book the first option");

// Get results and trace
var chatResult = chatRecorder.GetResult();
Console.WriteLine($"Recorded {chatResult.TotalTurnCount} turns");

var trace = chatRecorder.ToAgentTrace();

// REPLAY: Deterministic conversation replay
var replayer = new TraceReplayingAgent(trace);
while (!replayer.IsComplete)
{
    var replayed = await replayer.InvokeAsync("next turn");
    // Each response matches the original
}

Workflow Recording

using AgentEval.Tracing;

// RECORD: Multi-agent workflow
await using var workflowRecorder = new WorkflowTraceRecorder(workflowAdapter, "travel-booking-workflow");
var result = await workflowRecorder.ExecuteWorkflowAsync("Plan trip to Paris");

// Examine recorded steps
foreach (var step in result.Steps)
{
    Console.WriteLine($"Step: {step.ExecutorId} ({step.Duration.TotalSeconds:F1}s)");
}

// Save workflow trace
var workflowTrace = workflowRecorder.Trace;
await WorkflowTraceSerializer.SaveToFileAsync(workflowTrace, "workflow-trace.json");

// REPLAY: Deterministic workflow replay
var loaded = await WorkflowTraceSerializer.LoadFromFileAsync("workflow-trace.json");
var workflowReplayer = new WorkflowTraceReplayingAgent(loaded);
var replayResult = await workflowReplayer.ExecuteWorkflowAsync("Plan trip to Paris");
foreach (var step in replayResult.Steps)
{
    Console.WriteLine($"Step: {step.ExecutorId} -> {step.Output}");
}

Streaming Support

Recording and replaying streaming responses:

// RECORD: Streaming execution
await using var streamRecorder = new TraceRecordingAgent(streamingAgent, "story_stream");

var chunks = new List<string>();
await foreach (var chunk in streamRecorder.InvokeStreamingAsync("Tell me a story"))
{
    chunks.Add(chunk.Text);
}
var trace = streamRecorder.Trace;

// REPLAY: Chunks are replayed in order
var replayer = new TraceReplayingAgent(trace);
var replayedChunks = new List<string>();
await foreach (var chunk in replayer.InvokeStreamingAsync("Tell me a story"))
{
    replayedChunks.Add(chunk.Text);
}

// Same chunks in same order (Assert.Equal compares the sequences element by element)
Assert.Equal(chunks, replayedChunks);

Trace Model

AgentTrace

The AgentTrace class captures a single agent's execution:

public class AgentTrace
{
    public string Version { get; set; }
    public string TraceName { get; set; }
    public DateTimeOffset CapturedAt { get; set; }
    public string? AgentName { get; set; }
    public string? ModelId { get; set; }
    public List<TraceEntry> Entries { get; set; }
    public TracePerformance? Performance { get; set; }
    public Dictionary<string, object>? Metadata { get; set; }
}

TraceEntry

Each entry represents a request or response:

public class TraceEntry
{
    public TraceEntryType Type { get; set; }    // Request or Response
    public int Index { get; set; }               // Matches request to response
    public string? Prompt { get; set; }          // For requests
    public string? Text { get; set; }            // For responses
    public long? DurationMs { get; set; }
    public TraceTokenUsage? TokenUsage { get; set; }
    public List<TraceToolCall>? ToolCalls { get; set; }
    public TraceError? Error { get; set; }
    public bool IsStreaming { get; set; }
    public List<TraceStreamChunk>? StreamingChunks { get; set; }
}

WorkflowTrace

For multi-agent workflows:

public class WorkflowTrace
{
    public string Version { get; set; }
    public string TraceName { get; set; }
    public DateTimeOffset CapturedAt { get; set; }
    public string? WorkflowType { get; set; }
    public string? OriginalPrompt { get; set; }
    public string? FinalOutput { get; set; }
    public List<string> ExecutorIds { get; set; }
    public List<WorkflowTraceStep> Steps { get; set; }
    public WorkflowTracePerformance? Performance { get; set; }
}

Serialization

Save and Load Traces

// Save trace to JSON
await TraceSerializer.SaveToFileAsync(trace, "trace.json");
await WorkflowTraceSerializer.SaveToFileAsync(workflowTrace, "workflow.json");

// Load trace from JSON
var loadedTrace = await TraceSerializer.LoadFromFileAsync("trace.json");
var loadedWorkflow = await WorkflowTraceSerializer.LoadFromFileAsync("workflow.json");

JSON Format

Traces are stored in human-readable JSON:

{
  "version": "1.0",
  "traceName": "weather_query",
  "capturedAt": "2026-01-08T12:00:00Z",
  "agentName": "WeatherAgent",
  "entries": [
    {
      "type": "Request",
      "index": 0,
      "prompt": "What's the weather in Seattle?"
    },
    {
      "type": "Response",
      "index": 0,
      "text": "The weather in Seattle is currently 52°F with light rain.",
      "durationMs": 1234,
      "tokenUsage": {
        "promptTokens": 15,
        "completionTokens": 20
      },
      "toolCalls": [
        {
          "name": "GetWeather",
          "result": "{\"temp\": 52, \"condition\": \"rain\"}",
          "succeeded": true
        }
      ]
    }
  ],
  "performance": {
    "totalDurationMs": 1234,
    "totalPromptTokens": 15,
    "totalCompletionTokens": 20,
    "callCount": 1
  }
}
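
Because the format is plain JSON, a trace can also be inspected without AgentEval. The following is a minimal sketch using only System.Text.Json; the embedded JSON is a trimmed copy of the shape shown above, and the pairing logic assumes that a request and its response share the same index value, as they do in that example.

```csharp
using System;
using System.Linq;
using System.Text.Json;

// A trimmed copy of the trace shape shown above.
var json = """
{
  "traceName": "weather_query",
  "entries": [
    { "type": "Request",  "index": 0, "prompt": "What's the weather in Seattle?" },
    { "type": "Response", "index": 0, "text": "52°F with light rain.", "durationMs": 1234 }
  ]
}
""";

using var doc = JsonDocument.Parse(json);
var entries = doc.RootElement.GetProperty("entries").EnumerateArray().ToList();

// Pair each request with the response that shares its index.
foreach (var request in entries.Where(e => e.GetProperty("type").GetString() == "Request"))
{
    var index = request.GetProperty("index").GetInt32();
    var response = entries.First(e =>
        e.GetProperty("type").GetString() == "Response" &&
        e.GetProperty("index").GetInt32() == index);

    Console.WriteLine($"[{index}] {request.GetProperty("prompt").GetString()}");
    Console.WriteLine($"    -> {response.GetProperty("text").GetString()} ({response.GetProperty("durationMs").GetInt64()} ms)");
}
```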

Best Practices

1. Test Organization

Store traces alongside your tests:

tests/
├── traces/
│   ├── weather-agent.json
│   ├── booking-workflow.json
│   └── chat-session.json
├── AgentTests.cs
└── WorkflowTests.cs

2. Version Control

Commit traces to source control for regression evaluation:

git add tests/traces/*.json
git commit -m "Add baseline traces for v1.0"

3. CI/CD Integration

Use replay in CI pipelines without API credentials:

- name: Run Agent Tests
  run: dotnet test --filter "Category=Replay"
  # No AZURE_OPENAI_API_KEY needed!
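
For the filter above to select anything, the replay tests need a matching category. A minimal sketch, assuming xUnit (whose [Trait] attribute is what `--filter "Category=Replay"` matches) and a trace file copied to the test output directory; the class name, test name, and file path are hypothetical:

```csharp
using System.Threading.Tasks;
using AgentEval.Tracing;
using Xunit;

public class WeatherReplayTests
{
    [Fact]
    [Trait("Category", "Replay")] // selected by: dotnet test --filter "Category=Replay"
    public async Task Replay_ReturnsRecordedResponse()
    {
        // Assumption: traces/weather-agent.json is copied to the test output directory.
        var trace = await TraceSerializer.LoadFromFileAsync("traces/weather-agent.json");
        var replayer = new TraceReplayingAgent(trace);

        // No LLM call happens here; the response comes from the recorded trace.
        var replayed = await replayer.InvokeAsync("What's the weather in Seattle?");

        Assert.False(string.IsNullOrEmpty(replayed.Text));
    }
}
```

Tests that record against a live model can carry a different category so that CI never selects them.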

4. Golden Master Evaluation

Compare new responses against recorded "golden" responses:

var goldenTrace = await TraceSerializer.LoadFromFileAsync("golden-weather.json");
var currentAgent = new WeatherAgent(client);
await using var currentRecorder = new TraceRecordingAgent(currentAgent, "golden_test");

var currentResponse = await currentRecorder.InvokeAsync("Weather in Seattle?");
var goldenResponse = goldenTrace.Entries.First(e => e.Type == TraceEntryType.Response).Text;

// Compare text to text: goldenResponse is a string, currentResponse is the response object.
Assert.Equal(goldenResponse, currentResponse.Text);

5. Performance Baseline

Compare performance metrics over time:

var oldTrace = await TraceSerializer.LoadFromFileAsync("baseline.json");
var newTrace = recorder.Trace;

var oldDuration = oldTrace.Entries.First(e => e.Type == TraceEntryType.Response).DurationMs;
var newDuration = newTrace.Entries.First(e => e.Type == TraceEntryType.Response).DurationMs;

Assert.True(newDuration <= oldDuration * 1.1, 
    $"Performance regression: {newDuration}ms > 110% of {oldDuration}ms");
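
DurationMs is nullable, and a comparison against a null value simply evaluates to false, which makes the assertion above fail with an unhelpful message. One way to make the check explicit is a small helper such as the one sketched below; IsWithinBudget is a hypothetical name, not part of AgentEval, and the 10% tolerance mirrors the 1.1 factor used above.

```csharp
using System;

// Hypothetical helper: true when currentMs is at most (1 + tolerance) times baselineMs.
static bool IsWithinBudget(long? baselineMs, long? currentMs, double tolerance = 0.10)
{
    // Fail loudly when a trace entry carries no timing instead of silently comparing null.
    if (baselineMs is null || currentMs is null)
        throw new InvalidOperationException("Trace entry has no DurationMs; cannot compare.");

    return currentMs.Value <= baselineMs.Value * (1 + tolerance);
}

Console.WriteLine(IsWithinBudget(1234, 1300)); // 1300 is within 110% of 1234
Console.WriteLine(IsWithinBudget(1234, 1400)); // 1400 exceeds 110% of 1234
```

In a test, the result would be passed to Assert.True together with the two durations in the failure message.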

See Sample 13 for runnable examples of single-agent, multi-turn, workflow, and streaming traces.