Workflow Testing
Comprehensive guide to testing multi-agent workflows with AgentEval
AgentEval provides first-class support for testing multi-agent workflows, including graph-based execution, conditional routing, parallel branches, and edge traversal assertions.
Overview
Modern AI applications often orchestrate multiple agents in complex workflows:
- Sequential chains: One agent's output feeds the next
- Conditional routing: Different agents handle different scenarios
- Parallel execution: Multiple agents work simultaneously
- Switch patterns: Route to specific handlers based on classification
AgentEval captures the full execution graph, enabling you to:
- Assert on which edges were traversed
- Verify routing decisions
- Test parallel branch completion
- Replay and visualize workflow execution
Quick Start
using AgentEval.MAF;
using AgentEval.Models;
using AgentEval.Assertions;
// Create a workflow adapter
var adapter = MAFWorkflowAdapter.FromSteps(
"support-workflow",
("classifier", "This is a billing inquiry"),
("billing-handler", "I'll help with your billing question"),
("response-formatter", "Here's your formatted response"));
// Execute the workflow
var result = await adapter.ExecuteWorkflowAsync("I need help with my bill");
// Assert on execution
result.Should()
.HaveStepCount(3)
.HaveExecutedStep("classifier")
.HaveExecutedStep("billing-handler")
.HaveTraversedEdge("classifier", "billing-handler");
Core Concepts
Workflow Execution Result
The WorkflowExecutionResult captures everything about a workflow run:
public record WorkflowExecutionResult
{
    // Final aggregated output
    public required string FinalOutput { get; init; }

    // Individual steps executed
    public required IReadOnlyList<ExecutorStep> Steps { get; init; }

    // Total execution time
    public TimeSpan TotalDuration { get; init; }

    // Graph structure and traversed edges
    public WorkflowGraphSnapshot? Graph { get; init; }

    // Routing decisions made during execution
    public IReadOnlyList<RoutingDecision>? RoutingDecisions { get; init; }

    // Helper properties
    public bool HasConditionalRouting { get; }
    public bool HasParallelExecution { get; }
    public IEnumerable<string> GetExecutionPath() { }
}
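The record above lists its helper members without showing how they are computed. As a rough sketch of how such values could be derived from the recorded steps, the following uses hypothetical stand-in types (SketchEdgeType, SketchStep, SketchResultHelpers). The derivation rules are an assumption for illustration, not AgentEval's actual implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for illustration only; these are NOT AgentEval types.
public enum SketchEdgeType
{
    Sequential, Conditional, Switch, ParallelFanOut,
    ParallelFanIn, Loop, Error, Terminal
}

// A reduced step: who ran, in which order, and via which kind of edge (null for the entry step).
public sealed record SketchStep(
    string ExecutorId,
    int StepIndex,
    SketchEdgeType? IncomingEdgeType = null);

public static class SketchResultHelpers
{
    // Assumed rule: routing happened if any step was reached via a Conditional or Switch edge.
    public static bool HasConditionalRouting(IEnumerable<SketchStep> steps) =>
        steps.Any(s => s.IncomingEdgeType is SketchEdgeType.Conditional or SketchEdgeType.Switch);

    // Assumed rule: parallelism happened if any step was reached via a fan-out or fan-in edge.
    public static bool HasParallelExecution(IEnumerable<SketchStep> steps) =>
        steps.Any(s => s.IncomingEdgeType is SketchEdgeType.ParallelFanOut or SketchEdgeType.ParallelFanIn);

    // The execution path is the executor IDs ordered by step index.
    public static IReadOnlyList<string> GetExecutionPath(IEnumerable<SketchStep> steps) =>
        steps.OrderBy(s => s.StepIndex).Select(s => s.ExecutorId).ToList();
}
```

With a classifier step followed by a conditionally routed billing-handler step, this sketch would report conditional routing, no parallel execution, and the path classifier, billing-handler.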
Executor Steps
Each step in the workflow is captured as an ExecutorStep:
public record ExecutorStep
{
    // Identification
    public required string ExecutorId { get; init; }
    public string? ExecutorName { get; init; }

    // Output
    public required string Output { get; init; }

    // Timing
    public TimeSpan StartOffset { get; init; }
    public TimeSpan Duration { get; init; }
    public int StepIndex { get; init; }

    // Edge information
    public EdgeExecution? IncomingEdge { get; init; }
    public IReadOnlyList<EdgeExecution>? OutgoingEdges { get; init; }

    // Parallel execution
    public string? ParallelBranchId { get; init; }
    public bool IsParallelBranch { get; }
    public bool WasConditionallyRouted { get; }

    // Tool tracking
    public IReadOnlyList<ToolCallRecord>? ToolCalls { get; init; }
}
Edge Types
AgentEval supports eight edge types:
| Edge Type | Description | Use Case |
|---|---|---|
| Sequential | Direct linear flow | A → B → C |
| Conditional | Flow based on condition evaluation | If approved → proceed |
| Switch | Route to one of many targets based on value | Classify → handler |
| ParallelFanOut | Split to multiple parallel branches | Orchestrator → workers |
| ParallelFanIn | Merge parallel branches | Workers → aggregator |
| Loop | Return to previous step | Retry logic |
| Error | Error handling path | Handler → error-handler |
| Terminal | Exit the workflow | Final step |
Creating Workflow Adapters
From Predefined Steps
The simplest way to create a testable workflow:
// Sequential workflow with 3 steps
var adapter = MAFWorkflowAdapter.FromSteps(
"my-workflow",
("step-1", "output from step 1"),
("step-2", "output from step 2"),
("step-3", "final output"));
With Conditional Edges
For workflows with routing logic:
var adapter = MAFWorkflowAdapter.FromConditionalSteps(
"conditional-workflow",
steps: [
("classifier", "billing"),
("billing-handler", "Handled billing request"),
("tech-handler", "Handled tech request")
],
edges: [
("classifier", "billing-handler", EdgeType.Conditional, "output.Contains('billing')"),
("classifier", "tech-handler", EdgeType.Conditional, "output.Contains('tech')")
]);
With Predefined Graph
For complex workflows with full graph control:
var graph = new WorkflowGraphSnapshot
{
    Nodes = [
        new WorkflowNode { NodeId = "router", IsEntryPoint = true },
        new WorkflowNode { NodeId = "handler-a" },
        new WorkflowNode { NodeId = "handler-b" },
        new WorkflowNode { NodeId = "merger", IsExitNode = true }
    ],
    Edges = [
        new WorkflowEdge {
            EdgeId = "e1",
            SourceExecutorId = "router",
            TargetExecutorId = "handler-a",
            EdgeType = EdgeType.ParallelFanOut
        },
        new WorkflowEdge {
            EdgeId = "e2",
            SourceExecutorId = "router",
            TargetExecutorId = "handler-b",
            EdgeType = EdgeType.ParallelFanOut
        },
        new WorkflowEdge {
            EdgeId = "e3",
            SourceExecutorId = "handler-a",
            TargetExecutorId = "merger",
            EdgeType = EdgeType.ParallelFanIn
        },
        new WorkflowEdge {
            EdgeId = "e4",
            SourceExecutorId = "handler-b",
            TargetExecutorId = "merger",
            EdgeType = EdgeType.ParallelFanIn
        }
    ],
    EntryNodeId = "router",
    ExitNodeIds = ["merger"]
};

var adapter = MAFWorkflowAdapter.WithGraph(
    "parallel-workflow",
    graph,
    MyWorkflowExecutor);
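A hand-written graph is easy to get subtly wrong, for example an edge that points at a node ID that was never declared. The sketch below shows one way to sanity-check node and edge IDs before running a test. SketchGraphCheck and its parameters are hypothetical helpers written for this guide; the source does not say whether AgentEval performs a similar check itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of AgentEval.
public static class SketchGraphCheck
{
    // Returns human-readable problems; an empty list means the IDs are consistent.
    public static List<string> Validate(
        IReadOnlyCollection<string> nodeIds,
        IEnumerable<(string Source, string Target)> edges,
        string entryNodeId,
        IEnumerable<string> exitNodeIds)
    {
        var known = new HashSet<string>(nodeIds);
        var problems = new List<string>();

        // The entry node and every exit node must be declared nodes.
        if (!known.Contains(entryNodeId))
            problems.Add($"Entry node '{entryNodeId}' is not declared.");
        foreach (var exit in exitNodeIds.Where(id => !known.Contains(id)))
            problems.Add($"Exit node '{exit}' is not declared.");

        // Both endpoints of every edge must be declared nodes.
        foreach (var (source, target) in edges)
        {
            if (!known.Contains(source))
                problems.Add($"Edge source '{source}' is not declared.");
            if (!known.Contains(target))
                problems.Add($"Edge target '{target}' is not declared.");
        }

        return problems;
    }
}
```

For the router, handler-a, handler-b, merger graph above, passing the four node IDs, the four source and target pairs, "router" as the entry, and "merger" as the only exit would yield an empty problem list.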
Custom Workflow Executor
For full control, provide a custom executor function:
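using System.Runtime.CompilerServices; // [EnumeratorCancellation]
using System.Threading;                // CancellationToken
using System.Threading.Tasks;          // Task

static async IAsyncEnumerable<WorkflowEvent> MyWorkflowExecutor(
    string prompt,
    [EnumeratorCancellation] CancellationToken ct)
{
    // An async iterator without any await triggers compiler warning CS1998
    await Task.Yield();

    // Emit events as the workflow executes
    yield return new ExecutorOutputEvent("step-1", "processing...");
    yield return new EdgeTraversedEvent("step-1", "step-2", EdgeType.Sequential);
    yield return new ExecutorOutputEvent("step-2", "completed");
    yield return new WorkflowCompleteEvent();
}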
Workflow Events
The workflow adapter recognizes these event types:
| Event | Purpose |
|---|---|
| ExecutorOutputEvent | Step produced output |
| EdgeTraversedEvent | Edge was traversed |
| RoutingDecisionEvent | Routing decision was made |
| ParallelBranchStartEvent | Parallel branch started |
| ParallelBranchEndEvent | Parallel branch completed |
| WorkflowCompleteEvent | Workflow finished |
| WorkflowErrorEvent | Error occurred |
Event Examples
// Simple output
yield return new ExecutorOutputEvent("agent-id", "output text");
// Conditional edge traversal
yield return new EdgeTraversedEvent(
sourceExecutorId: "router",
targetExecutorId: "handler",
edgeType: EdgeType.Conditional,
conditionResult: true,
routingReason: "Output matched billing pattern");
// Routing decision (switch pattern)
yield return new RoutingDecisionEvent(
deciderExecutorId: "classifier",
possibleEdgeIds: ["billing", "tech", "general"],
selectedEdgeId: "billing",
evaluatedValue: "billing inquiry",
selectionReason: "Matched billing keywords");
// Parallel execution
yield return new ParallelBranchStartEvent("branch-1", ["worker-a", "worker-b"]);
yield return new ExecutorOutputEvent("worker-a", "result A");
yield return new ExecutorOutputEvent("worker-b", "result B");
yield return new ParallelBranchEndEvent(
branchId: "branch-1",
executorIds: ["worker-a", "worker-b"],
startTime: TimeSpan.Zero,
endTime: TimeSpan.FromSeconds(2),
isSuccess: true,
output: "merged results");
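To make the relationship between the event stream and the final result more concrete, here is a minimal sketch of folding events into steps and traversed edges. It uses hypothetical stand-in types (SketchEvent, SketchOutput, SketchEdge, SketchComplete, SketchTrace) covering only three of the event kinds. How the real adapter aggregates events is not described in this guide, so treat this as an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for illustration only; these are NOT AgentEval's event types.
public abstract record SketchEvent;
public sealed record SketchOutput(string ExecutorId, string Output) : SketchEvent;
public sealed record SketchEdge(string Source, string Target) : SketchEvent;
public sealed record SketchComplete : SketchEvent;

public sealed class SketchTrace
{
    public List<(string ExecutorId, string Output)> Steps { get; } = new();
    public List<(string Source, string Target)> Edges { get; } = new();
    public bool Completed { get; private set; }

    // Walks the events in order and sorts each one into the matching bucket.
    public static SketchTrace Fold(IEnumerable<SketchEvent> events)
    {
        var trace = new SketchTrace();
        foreach (var e in events)
        {
            switch (e)
            {
                case SketchOutput o:
                    trace.Steps.Add((o.ExecutorId, o.Output));
                    break;
                case SketchEdge edge:
                    trace.Edges.Add((edge.Source, edge.Target));
                    break;
                case SketchComplete _:
                    trace.Completed = true;
                    break;
            }
        }
        return trace;
    }
}
```

Folding an output for step-1, an edge from step-1 to step-2, an output for step-2, and a completion event gives two steps, one edge, and Completed set to true.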
Workflow Assertions
Basic Step Assertions
result.Should()
.HaveStepCount(3)
.HaveExecutedStep("classifier")
.HaveExecutedStep("handler")
.HaveNoErrors();
Edge Assertions
result.Should()
.HaveGraphStructure()
.HaveTraversedEdge("classifier", "billing-handler")
.NotHaveTraversedEdge("classifier", "tech-handler");
Conditional Routing Assertions
result.Should()
.HaveConditionalRouting()
.HaveRoutingDecision("classifier", "edge-billing");
Parallel Execution Assertions
result.Should()
.HaveParallelExecution()
.HaveParallelBranch("branch-1")
.HaveCompletedAllParallelBranches();
Edge-Level Assertions
For detailed edge verification:
result.Should()
.ForEdge("classifier", "billing-handler")
.BeOfType(EdgeType.Conditional)
.HaveConditionResult(true)
.HaveRoutingReason("matched billing keywords")
.And()
.ForEdge("billing-handler", "response-formatter")
.BeOfType(EdgeType.Sequential);
Step-Level Assertions
result.Should()
.ForStep("classifier")
.HaveOutput("billing inquiry")
.HaveDurationUnder(TimeSpan.FromSeconds(5))
.NotBeParallelBranch()
.And()
.ForStep("parallel-worker")
.BeInParallelBranch("branch-1")
.HaveToolCall("ProcessData");
Execution Path Assertions
result.Should()
.HaveExecutionPath(["classifier", "billing-handler", "response-formatter"])
.HaveExecutionPathContaining("billing-handler");
JSON Export for Visualization
AgentEval can export workflow execution to JSON for visualization tools:
using AgentEval.Models.Serialization;
// Execute workflow
var result = await adapter.ExecuteWorkflowAsync("test prompt");
// Export for visualization
var json = WorkflowSerializer.ToJson(result);
File.WriteAllText("workflow-trace.json", json);
// Export just the graph structure
var graphJson = WorkflowSerializer.ToGraphJson(result);
// Export for Mermaid diagram generation
var mermaid = WorkflowSerializer.ToMermaid(result);
JSON Structure
The exported JSON includes both the static graph structure and the dynamic execution trace:
{
"workflowId": "support-workflow",
"executedAt": "2026-01-05T10:30:00Z",
"totalDuration": "00:00:02.345",
"finalOutput": "Here's your formatted response",
"graph": {
"nodes": [
{ "nodeId": "classifier", "isEntryPoint": true },
{ "nodeId": "billing-handler" },
{ "nodeId": "response-formatter", "isExitNode": true }
],
"edges": [
{ "edgeId": "e1", "source": "classifier", "target": "billing-handler", "type": "Conditional" },
{ "edgeId": "e2", "source": "billing-handler", "target": "response-formatter", "type": "Sequential" }
]
},
"executionTrace": {
"steps": [
{
"stepIndex": 0,
"executorId": "classifier",
"output": "This is a billing inquiry",
"startOffset": "00:00:00",
"duration": "00:00:00.500",
"incomingEdge": null
},
{
"stepIndex": 1,
"executorId": "billing-handler",
"output": "I'll help with your billing question",
"startOffset": "00:00:00.500",
"duration": "00:00:01.200",
"incomingEdge": {
"source": "classifier",
"target": "billing-handler",
"type": "Conditional",
"conditionResult": true,
"routingReason": "matched billing keywords"
}
}
],
"traversedEdges": [
{
"source": "classifier",
"target": "billing-handler",
"traversedAt": "00:00:00.500",
"type": "Conditional"
}
],
"routingDecisions": [
{
"decider": "classifier",
"possibleEdges": ["billing-handler", "tech-handler", "general-handler"],
"selectedEdge": "billing-handler",
"reason": "matched billing keywords"
}
]
}
}
Mermaid Diagram Export
Generate flowchart diagrams:
var mermaid = WorkflowSerializer.ToMermaid(result);
// Output:
// ```mermaid
// graph TD
// classifier([classifier])
// billing-handler([billing-handler])
// response-formatter([response-formatter])
//
// classifier -->|Conditional| billing-handler
// billing-handler --> response-formatter
//
// classDef executed fill:#90EE90
// class classifier,billing-handler,response-formatter executed
// ```
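For readers who want to post-process traces themselves, the following sketch shows how flowchart text in the shape above could be assembled from node IDs and edges. SketchMermaid is a hypothetical helper, not the WorkflowSerializer implementation, and it leaves out the classDef styling lines.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration only; not AgentEval's WorkflowSerializer.
public static class SketchMermaid
{
    // Builds "graph TD" text: one rounded node per ID, then one arrow per edge.
    // An empty or null label produces a plain arrow without edge text.
    public static string Render(
        IEnumerable<string> nodeIds,
        IEnumerable<(string Source, string Target, string Label)> edges)
    {
        var sb = new StringBuilder();
        sb.Append("graph TD\n");

        foreach (var id in nodeIds)
            sb.Append($"    {id}([{id}])\n");

        foreach (var (source, target, label) in edges)
        {
            sb.Append(string.IsNullOrEmpty(label)
                ? $"    {source} --> {target}\n"
                : $"    {source} -->|{label}| {target}\n");
        }

        return sb.ToString();
    }
}
```

Passing the classifier and billing-handler nodes with a single edge labeled "Conditional" produces the same node and arrow lines as the first part of the output shown above.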
Time-Travel Debugging
The execution trace captures timing for every step and edge, enabling replay:
// Get execution timeline
foreach (var step in result.Steps.OrderBy(s => s.StartOffset))
{
Console.WriteLine($"[{step.StartOffset}] {step.ExecutorId} started");
Console.WriteLine($"[{step.StartOffset + step.Duration}] {step.ExecutorId} completed");
}
// Replay edges
foreach (var edge in result.Graph?.TraversedEdges ?? [])
{
Console.WriteLine($"[{edge.TraversedAt}] {edge.SourceExecutorId} → {edge.TargetExecutorId}");
}
Testing Patterns
Test Sequential Workflow
[Fact]
public async Task SequentialWorkflow_Should_ExecuteInOrder()
{
var adapter = MAFWorkflowAdapter.FromSteps(
"pipeline",
("extract", "data extracted"),
("transform", "data transformed"),
("load", "data loaded"));
var result = await adapter.ExecuteWorkflowAsync("process data");
result.Should()
.HaveStepCount(3)
.HaveExecutionPath(["extract", "transform", "load"])
.HaveTraversedEdge("extract", "transform")
.HaveTraversedEdge("transform", "load");
}
Test Conditional Routing
[Fact]
public async Task ConditionalWorkflow_Should_RouteCorrectly()
{
var adapter = MAFWorkflowAdapter.FromConditionalSteps(
"router",
[("classifier", "billing"), ("billing-agent", "handled")],
[("classifier", "billing-agent", EdgeType.Conditional, null)]);
var result = await adapter.ExecuteWorkflowAsync("I need billing help");
result.Should()
.HaveConditionalRouting()
.HaveExecutedStep("billing-agent")
.ForEdge("classifier", "billing-agent")
.BeOfType(EdgeType.Conditional)
.HaveConditionResult(true);
}
Test Parallel Execution
[Fact]
public async Task ParallelWorkflow_Should_ExecuteConcurrently()
{
var adapter = new MAFWorkflowAdapter("parallel-test", EmitParallelEvents);
var result = await adapter.ExecuteWorkflowAsync("parallel task");
result.Should()
.HaveParallelExecution()
.HaveParallelBranch("branch-1")
.HaveCompletedAllParallelBranches();
var branch = result.Graph?.ParallelBranches?.First();
Assert.Contains("worker-a", branch!.ExecutorIds);
Assert.Contains("worker-b", branch.ExecutorIds);
}
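The test above confirms that both workers belong to the same branch, but not that they actually ran at the same time. If that matters, one option is to compare the StartOffset and Duration values recorded on each ExecutorStep. The helper below is a hypothetical sketch of that interval check (SketchTiming is not an AgentEval type).

```csharp
using System;

// Hypothetical helper for illustration only; not part of AgentEval.
public static class SketchTiming
{
    // Two half-open intervals [start, start + duration) overlap
    // when each one starts before the other one ends.
    public static bool Overlaps(
        TimeSpan startA, TimeSpan durationA,
        TimeSpan startB, TimeSpan durationB) =>
        startA < startB + durationB && startB < startA + durationA;
}
```

In a test, the arguments would come from the two worker steps, for example workerA.StartOffset, workerA.Duration, workerB.StartOffset, and workerB.Duration, and the result would be passed to Assert.True. Steps that run back to back, where one ends exactly when the other starts, do not count as overlapping.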
Test Error Handling
[Fact]
public async Task Workflow_Should_CaptureErrors()
{
var adapter = new MAFWorkflowAdapter("error-test", EmitErrorEvents);
var result = await adapter.ExecuteWorkflowAsync("will fail");
Assert.False(result.IsSuccess);
Assert.NotNull(result.Errors);
Assert.Contains(result.Errors, e => e.ExecutorId == "failing-step");
}
Best Practices
1. Use Descriptive Executor IDs
// Good
("intent-classifier", "...")
("billing-inquiry-handler", "...")
// Avoid
("step1", "...")
("agent", "...")
2. Test Edge Types Explicitly
result.Should()
.ForEdge("classifier", "handler")
.BeOfType(EdgeType.Conditional); // Be specific about edge type
3. Assert on Timing for Performance
result.Should()
.ForStep("slow-step")
.HaveDurationUnder(TimeSpan.FromSeconds(5));
Assert.True(result.TotalDuration < TimeSpan.FromSeconds(30));
4. Use Parallel Assertions Carefully
// Verify all parallel branches completed
result.Should().HaveCompletedAllParallelBranches();
// Check specific branch
result.Should()
.HaveParallelBranch("branch-1")
.ForStep("worker-a")
.BeInParallelBranch("branch-1");
5. Export for Debugging
// Inside a test: write the trace to disk when the run fails
[Fact]
public async Task ComplexWorkflow_Test()
{
    // 'adapter' is a MAFWorkflowAdapter created as in the earlier examples
    var result = await adapter.ExecuteWorkflowAsync("test");

    // Export on failure for debugging
    if (!result.IsSuccess)
    {
        var json = WorkflowSerializer.ToJson(result);
        File.WriteAllText($"failed-workflow-{DateTime.Now:yyyyMMdd-HHmmss}.json", json);
    }

    result.Should().HaveNoErrors();
}
Integration with CI/CD
Export workflow results for CI/CD visibility:
// In your test setup
var result = await adapter.ExecuteWorkflowAsync(prompt);
// Export as JUnit-compatible artifact
var report = new EvaluationReport
{
Name = "Workflow Tests",
Results = [new TestResult {
Name = "support-workflow",
Passed = result.IsSuccess,
Duration = result.TotalDuration
}]
};
await new JUnitXmlExporter().ExportAsync(report, outputStream);
See Also
- Architecture Overview
- Benchmarks - Including BFCL workflow testing
- CLI Usage - Running workflow tests from command line
- Conversations - Multi-turn conversation testing