Snapshot Testing
AgentEval provides snapshot testing capabilities for comparing agent responses against saved baselines. This is especially useful for detecting regressions in agent behavior and ensuring consistent responses over time.
Overview
Snapshot testing allows you to:
- Save agent responses as baselines (snapshots)
- Compare new responses against saved snapshots
- Ignore dynamic fields (timestamps, IDs)
- Scrub sensitive or variable data with patterns
- Use semantic similarity for fuzzy matching
- Track changes over time
Quick Start
using AgentEval.Snapshots;
using System.Text.RegularExpressions;
// Configure snapshot comparison
var options = new SnapshotOptions
{
    IgnoreFields = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
    {
        "timestamp", "requestId"
    },
    ScrubPatterns = new List<(Regex Pattern, string Replacement)>
    {
        (new Regex(@"\d{4}-\d{2}-\d{2}"), "[DATE]")
    }
};
// Compare responses: expectedJson is the saved baseline, actualJson is the new response (both JSON strings)
var comparer = new SnapshotComparer(options);
var result = comparer.Compare(expectedJson, actualJson);
if (result.IsMatch)
{
    Console.WriteLine("✅ Response matches snapshot");
}
else
{
    Console.WriteLine("❌ Differences found:");
    foreach (var diff in result.Differences)
    {
        Console.WriteLine($"  {diff.Path}: {diff.Expected} → {diff.Actual}");
    }
}
SnapshotOptions
Configure how snapshots are compared:
using System.Text.RegularExpressions;
var options = new SnapshotOptions
{
    // Fields to completely ignore (case-insensitive HashSet)
    IgnoreFields = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
    {
        "timestamp",
        "requestId",
        "duration",
        "elapsed"
    },
    // Patterns to scrub, as (Regex, Replacement) tuples.
    // List more specific patterns first (date-time before date) in case they are applied in order.
    ScrubPatterns = new List<(Regex Pattern, string Replacement)>
    {
        // Dates
        (new Regex(@"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}"), "[DATETIME]"),
        (new Regex(@"\d{4}-\d{2}-\d{2}"), "[DATE]"),
        // IDs (GUIDs in either letter case)
        (new Regex(@"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"), "[GUID]"),
        (new Regex(@"id_[a-zA-Z0-9]+"), "[ID]"),
        // Secrets (keys such as "sk-proj-..." contain hyphens and underscores)
        (new Regex(@"\bsk-[a-zA-Z0-9_-]+"), "[API_KEY]")
    },
    // Enable semantic similarity comparison for text fields
    UseSemanticComparison = true,
    // Fields to compare semantically (case-insensitive HashSet)
    SemanticFields = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
    {
        "response",
        "content",
        "summary"
    },
    // Similarity threshold for semantic comparison (0.0 - 1.0)
    SemanticThreshold = 0.85
};
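Because each scrub pattern rewrites the text before the next one sees it, the order of the list can change the result. The sketch below assumes patterns are applied one after another in list order with Regex.Replace; the Scrub helper is hypothetical and is not AgentEval's ApplyScrubbing, which may behave differently.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: apply each (Pattern, Replacement) pair in list order,
// feeding the output of one replacement into the next. Not AgentEval's own code.
static string Scrub(string input, IEnumerable<(Regex Pattern, string Replacement)> patterns)
{
    var result = input;
    foreach (var (pattern, replacement) in patterns)
    {
        result = pattern.Replace(result, replacement);
    }
    return result;
}

var specificFirst = new List<(Regex Pattern, string Replacement)>
{
    (new Regex(@"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}"), "[DATETIME]"),
    (new Regex(@"\d{4}-\d{2}-\d{2}"), "[DATE]")
};

// Same two patterns reversed: the date pattern consumes the date part first,
// so the date-time pattern never gets a chance to match.
var generalFirst = new List<(Regex Pattern, string Replacement)>
{
    specificFirst[1],
    specificFirst[0]
};

var sample = "Booked at 2024-05-01T09:30:00 for 2024-05-03";
Console.WriteLine(Scrub(sample, specificFirst)); // Booked at [DATETIME] for [DATE]
Console.WriteLine(Scrub(sample, generalFirst));  // Booked at [DATE]T09:30:00 for [DATE]
```

With the general pattern first, the time portion survives scrubbing and would still show up as a difference between runs.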
SnapshotComparer
The SnapshotComparer performs JSON comparison with configurable options:
var comparer = new SnapshotComparer(options);
// Compare JSON strings
var result = comparer.Compare(expectedJson, actualJson);
// Access results
Console.WriteLine($"Match: {result.IsMatch}");
Console.WriteLine($"Differences: {result.Differences.Count}");
Console.WriteLine($"Ignored Fields: {result.IgnoredFields.Count}");
Console.WriteLine($"Semantic Results: {result.SemanticResults.Count}");
// Apply scrubbing to a value
var scrubbed = comparer.ApplyScrubbing(rawValue);
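To make "JSON comparison with ignored fields" concrete, here is a minimal structural diff built on System.Text.Json. It is a sketch, not SnapshotComparer itself: the Diff and Walk helpers, the "$.a.b[0]" path notation, the "<missing>" marker, and the rule that ignored names are skipped at any depth are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical sketch of a structural JSON diff with ignored fields (not AgentEval's code).
static List<(string Path, string Expected, string Actual)> Diff(
    string expectedJson, string actualJson, ISet<string> ignoreFields)
{
    var found = new List<(string Path, string Expected, string Actual)>();
    using var expectedDoc = JsonDocument.Parse(expectedJson);
    using var actualDoc = JsonDocument.Parse(actualJson);
    Walk(expectedDoc.RootElement, actualDoc.RootElement, "$", ignoreFields, found);
    return found;
}

static void Walk(JsonElement e, JsonElement a, string path, ISet<string> ignore,
    List<(string Path, string Expected, string Actual)> diffs)
{
    if (e.ValueKind != a.ValueKind)
    {
        diffs.Add((path, e.GetRawText(), a.GetRawText()));
        return;
    }
    switch (e.ValueKind)
    {
        case JsonValueKind.Object:
            var names = e.EnumerateObject().Select(p => p.Name)
                .Union(a.EnumerateObject().Select(p => p.Name));
            foreach (var name in names)
            {
                if (ignore.Contains(name)) continue; // ignored fields are skipped at any depth
                var inExpected = e.TryGetProperty(name, out var ev);
                var inActual = a.TryGetProperty(name, out var av);
                if (inExpected && inActual)
                    Walk(ev, av, $"{path}.{name}", ignore, diffs);
                else
                    diffs.Add(($"{path}.{name}",
                        inExpected ? ev.GetRawText() : "<missing>",
                        inActual ? av.GetRawText() : "<missing>"));
            }
            break;
        case JsonValueKind.Array:
            var ea = e.EnumerateArray().ToList();
            var aa = a.EnumerateArray().ToList();
            for (var i = 0; i < Math.Max(ea.Count, aa.Count); i++)
            {
                if (i < ea.Count && i < aa.Count)
                    Walk(ea[i], aa[i], $"{path}[{i}]", ignore, diffs);
                else
                    diffs.Add(($"{path}[{i}]",
                        i < ea.Count ? ea[i].GetRawText() : "<missing>",
                        i < aa.Count ? aa[i].GetRawText() : "<missing>"));
            }
            break;
        default:
            // Strings, numbers, booleans and null: compare the raw JSON text
            if (e.GetRawText() != a.GetRawText())
                diffs.Add((path, e.GetRawText(), a.GetRawText()));
            break;
    }
}

var ignored = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "timestamp" };
var changes = Diff(
    "{\"answer\":\"4\",\"timestamp\":\"2024-01-01\",\"tags\":[\"math\"]}",
    "{\"answer\":\"four\",\"timestamp\":\"2024-06-30\",\"tags\":[\"math\",\"easy\"]}",
    ignored);
foreach (var d in changes)
{
    Console.WriteLine($"{d.Path}: {d.Expected} → {d.Actual}");
}
// $.answer: "4" → "four"
// $.tags[1]: <missing> → "easy"
```

Comparing leaf values by their raw JSON text keeps the sketch short, but it means 1 and 1.0 are reported as different; a production comparer might normalize numbers first.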
Comparison Result
public class SnapshotComparisonResult
{
    // Whether the snapshots match
    public bool IsMatch { get; set; }
    // List of differences found
    public List<SnapshotDifference> Differences { get; set; }
    // Fields that were ignored during comparison
    public List<string> IgnoredFields { get; set; }
    // Results of semantic comparisons
    public List<SemanticComparisonResult> SemanticResults { get; set; }
}
public record SnapshotDifference(
    string Path,      // JSON path to the difference
    string Expected,  // Expected value
    string Actual,    // Actual value
    string Message    // Description of the difference
);
public record SemanticComparisonResult(
    string Path,        // JSON path
    string Expected,    // Expected value
    string Actual,      // Actual value
    double Similarity,  // Computed similarity score
    bool Passed         // Whether it met the threshold
);
SnapshotStore
Persist and retrieve snapshots from disk:
using System.Text.Json;
var store = new SnapshotStore("./snapshots");
// Save a snapshot (async)
var response = await agent.GetResponseAsync("What is 2+2?");
await store.SaveAsync("math-test", response);
// Save with a suffix for variants
await store.SaveAsync("math-test", response, "v2");
// Load a snapshot (async); MyResponseType is your own response type
var baseline = await store.LoadAsync<MyResponseType>("math-test");
// Load with suffix
var baselineV2 = await store.LoadAsync<MyResponseType>("math-test", "v2");
// Check if a snapshot exists before comparing against it
if (store.Exists("math-test"))
{
    var saved = await store.LoadAsync<MyResponseType>("math-test");
    var newResponse = await agent.GetResponseAsync("What is 2+2?");
    // comparer: the SnapshotComparer configured earlier
    var result = comparer.Compare(
        JsonSerializer.Serialize(saved),
        JsonSerializer.Serialize(newResponse));
}
// Get the file path for a snapshot
var path = store.GetSnapshotPath("math-test");
var pathWithSuffix = store.GetSnapshotPath("math-test", "v2");
File Structure
Snapshots are stored as JSON files:
./snapshots/
├── math-test.json
├── math-test.v2.json
├── booking-flow.json
└── error-handling.json
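The naming scheme above (name.json, or name.suffix.json for a variant) is simple enough to sketch with System.Text.Json. The helpers below are hypothetical and only illustrate that layout; they are not AgentEval's SnapshotStore, whose serializer settings and error handling may differ.

```csharp
using System;
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical helpers mirroring the documented layout:
// <dir>/<name>.json, or <dir>/<name>.<suffix>.json for a variant.
static string SnapshotPath(string dir, string name, string suffix = "") =>
    Path.Combine(dir, suffix.Length == 0 ? $"{name}.json" : $"{name}.{suffix}.json");

static async Task SaveAsync<T>(string dir, string name, T value, string suffix = "")
{
    Directory.CreateDirectory(dir); // the first save creates the folder
    var json = JsonSerializer.Serialize(value, new JsonSerializerOptions { WriteIndented = true });
    await File.WriteAllTextAsync(SnapshotPath(dir, name, suffix), json);
}

static async Task<T> LoadAsync<T>(string dir, string name, string suffix = "")
{
    var json = await File.ReadAllTextAsync(SnapshotPath(dir, name, suffix));
    return JsonSerializer.Deserialize<T>(json)!;
}

var dir = Path.Combine(Path.GetTempPath(), "snapshots-demo");
await SaveAsync(dir, "math-test", new[] { "2+2", "4" });
await SaveAsync(dir, "math-test", new[] { "2+2", "four" }, "v2");

var v1 = await LoadAsync<string[]>(dir, "math-test");
var v2 = await LoadAsync<string[]>(dir, "math-test", "v2");
Console.WriteLine(Path.GetFileName(SnapshotPath(dir, "math-test", "v2"))); // math-test.v2.json
Console.WriteLine(v2[1]); // four
```

Writing indented JSON keeps snapshot files readable in code review, which matters once they are committed to source control.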
Usage in Tests
Basic Snapshot Test
[Fact]
public async Task Agent_Response_MatchesSnapshot()
{
    var store = new SnapshotStore("./snapshots");
    var comparer = new SnapshotComparer(new SnapshotOptions
    {
        IgnoreFields = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "timestamp"
        }
    });
    var response = await _agent.GetResponseAsync("What is the capital of France?");
    var responseJson = JsonSerializer.Serialize(response);
    if (!store.Exists("capital-france"))
    {
        // First run: record the baseline; there is nothing to compare against yet
        await store.SaveAsync("capital-france", response);
        return;
    }
    var baselineJson = await File.ReadAllTextAsync(store.GetSnapshotPath("capital-france"));
    var result = comparer.Compare(baselineJson, responseJson);
    Assert.True(result.IsMatch,
        $"Response differs from snapshot:\n{string.Join("\n", result.Differences.Select(d => $"{d.Path}: {d.Message}"))}");
}
Update Snapshots Programmatically
[Fact]
public async Task Agent_Response_UpdateSnapshot()
{
    // Opt-in flag: set UPDATE_SNAPSHOTS=true to re-record the baseline instead of comparing
    var updateSnapshots = string.Equals(
        Environment.GetEnvironmentVariable("UPDATE_SNAPSHOTS"), "true", StringComparison.OrdinalIgnoreCase);
    var store = new SnapshotStore("./snapshots");
    var response = await _agent.GetResponseAsync("...");
    if (updateSnapshots)
    {
        // Replace the saved snapshot with the current response
        await store.SaveAsync("my-test", response);
        return;
    }
    // Normal comparison against the saved snapshot
    var baselineJson = await File.ReadAllTextAsync(store.GetSnapshotPath("my-test"));
    var responseJson = JsonSerializer.Serialize(response);
    var result = new SnapshotComparer(new SnapshotOptions()).Compare(baselineJson, responseJson);
    Assert.True(result.IsMatch,
        string.Join("\n", result.Differences.Select(d => $"{d.Path}: {d.Message}")));
}
Run with: UPDATE_SNAPSHOTS=true dotnet test
Semantic Comparison
For fields where exact matching is too strict, use semantic comparison:
var options = new SnapshotOptions
{
    UseSemanticComparison = true,
    SemanticFields = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
    {
        "response", "summary", "explanation"
    },
    SemanticThreshold = 0.7 // 70% similarity required
};
var comparer = new SnapshotComparer(options);
// These would match semantically:
// Expected: "The capital of France is Paris"
// Actual: "Paris is the capital city of France"
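The semantic comparison uses Jaccard similarity on word sets, which works well for:
- Rephrased sentences that reuse most of the same words
- Different word order
- Minor wording changes
Because it measures word overlap rather than meaning, it does not recognize synonyms (for example, "car" and "automobile" count as different words).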
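Jaccard similarity divides the number of distinct words two texts share by the number of distinct words across both. The sketch below assumes a simple tokenizer (lowercase, then split on anything that is not a-z or 0-9), which may not match AgentEval's own normalization; with it, the Paris example above shares 6 of 7 distinct words.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Assumed tokenizer: lowercase, then split on any run of characters that are not a-z or 0-9.
static HashSet<string> Words(string text) =>
    Regex.Split(text.ToLowerInvariant(), "[^a-z0-9]+")
        .Where(w => w.Length > 0)
        .ToHashSet();

// Jaccard similarity: |A ∩ B| / |A ∪ B| over the two word sets.
static double Jaccard(string first, string second)
{
    var a = Words(first);
    var b = Words(second);
    if (a.Count == 0 && b.Count == 0) return 1.0; // treat two empty texts as identical
    return (double)a.Intersect(b).Count() / a.Union(b).Count();
}

var score = Jaccard("The capital of France is Paris", "Paris is the capital city of France");
Console.WriteLine(score.ToString("F3", CultureInfo.InvariantCulture)); // 0.857
Console.WriteLine(score >= 0.7); // True

// Adding "not" reverses the meaning but contributes only one new word.
var negated = Jaccard("The capital of France is Paris", "The capital of France is not Paris");
Console.WriteLine(negated.ToString("F3", CultureInfo.InvariantCulture)); // 0.857
```

The second comparison scores exactly the same as the rephrasing, so a threshold alone cannot tell a harmless rewording from a contradiction; fields where that distinction matters may need an exact check or manual review.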
Integration with Verify.Xunit
AgentEval also supports the popular Verify library for more advanced snapshot testing:
using VerifyXunit;
using static VerifyXunit.Verifier;
[UsesVerify]
public class AgentSnapshotTests
{
    // _agent: the agent under test (for example, created in the constructor)
    [Fact]
    public async Task Response_MatchesVerifySnapshot()
    {
        var response = await _agent.GetResponseAsync("What is 2+2?");
        await Verify(response)
            .ScrubMember("timestamp")
            .ScrubMember("requestId");
    }
}
Best Practices
- Ignore volatile fields - Always ignore timestamps, request IDs, and other dynamic data
- Scrub secrets - Use patterns to replace API keys, tokens, and sensitive data
- Use semantic matching for natural language - Exact matching is too brittle for LLM outputs
- Version your snapshots - Commit snapshot files to source control
- Review snapshot updates - Don't blindly update; verify changes are intentional
- Organize by feature - Use descriptive names and folder structure
- Set appropriate thresholds - Start with 0.8 similarity and adjust based on your needs
Common Patterns
Scrubbing Dynamic Data
using System.Text.RegularExpressions;
var options = new SnapshotOptions
{
    ScrubPatterns = new List<(Regex Pattern, string Replacement)>
    {
        // ISO timestamps
        (new Regex(@"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?"), "[TIMESTAMP]"),
        // GUIDs
        (new Regex(@"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"), "[GUID]"),
        // Email addresses
        (new Regex(@"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"), "[EMAIL]"),
        // Phone numbers (country code optional, so 555-123-4567 also matches)
        (new Regex(@"(\+?\d{1,3}[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}"), "[PHONE]"),
        // IP addresses
        (new Regex(@"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"), "[IP]")
    }
};
Testing Multiple Response Formats
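[Theory]
[InlineData("json")]
[InlineData("markdown")]
[InlineData("plain")]
public async Task Response_Format_MatchesSnapshot(string format)
{
    var store = new SnapshotStore("./snapshots");
    var comparer = new SnapshotComparer(new SnapshotOptions());
    var response = await _agent.GetResponseAsync($"Format: {format}");
    var snapshotName = $"format-{format}";
    var responseJson = JsonSerializer.Serialize(response);
    if (!store.Exists(snapshotName))
    {
        // First run for this format: record its baseline
        await store.SaveAsync(snapshotName, response);
        return;
    }
    // Each format is compared against its own snapshot file
    var baselineJson = await File.ReadAllTextAsync(store.GetSnapshotPath(snapshotName));
    var result = comparer.Compare(baselineJson, responseJson);
    Assert.True(result.IsMatch);
}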
See Also
- CLI Reference - Running snapshot tests from command line
- Conversations - Snapshot testing multi-turn conversations
- Extensibility - Custom snapshot comparers