ADR-012: Workflow Assertion Design
Status
✅ Accepted - February 14, 2026
Context
Workflow evaluation requires a rich assertion API that can validate complex multi-agent execution patterns. Unlike simple agent evaluations that focus on single inputs/outputs, workflows involve:
- Sequential Execution: Validating that agents execute in the correct order
- Graph Structure: Asserting on workflow topology and edge traversal
- Per-Executor Validation: Individual agent performance within the workflow context
- Tool Chain Evaluation: Tool usage patterns across multiple agents
- Timing and Performance: Execution timing, costs, and resource constraints
- Error Propagation: How errors flow through the workflow pipeline
Design Challenges
- Complexity: Workflows have many validation dimensions (structure, timing, tools, outputs)
- Readability: Assertions should read naturally despite complex validation logic
- Composability: Multiple assertion types must work together seamlessly
- Error Messages: Clear failure messages with actionable suggestions
- Performance: Assertions shouldn't significantly impact evaluation performance
Existing Patterns
AgentEval already has successful fluent assertion patterns:
// Tool usage assertions (proven pattern)
result.ToolUsage!.Should()
.HaveCalledTool("SearchFlights")
.BeforeTool("BookFlight")
.WithoutError();
// Performance assertions (proven pattern)
result.Performance!.Should()
.HaveTotalDurationUnder(TimeSpan.FromSeconds(10))
.HaveEstimatedCostUnder(0.05m);
Challenge: Extend this pattern to workflow complexity without losing readability.
Decision
Design a hierarchical fluent assertion system that mirrors workflow structure while maintaining AgentEval's existing assertion patterns.
Architecture: Hierarchical Assertions
Level 1: Workflow-Level Assertions
// Entry point - validates overall workflow execution
public static WorkflowResultAssertions Should(this WorkflowExecutionResult result)
{
return new WorkflowResultAssertions(result);
}
public class WorkflowResultAssertions
{
public WorkflowResultAssertions HaveStepCount(int expectedCount, string? because = null)
public WorkflowResultAssertions HaveExecutedInOrder(params string[] executorIds)
public WorkflowResultAssertions HaveCompletedWithin(TimeSpan duration, string? because = null)
public WorkflowResultAssertions HaveNoErrors(string? because = null)
public WorkflowResultAssertions HaveNonEmptyOutput(string? because = null)
// Navigate to sub-assertions
public ExecutorAssertions ForExecutor(string executorId)
public GraphAssertions HaveGraphStructure()
public ToolUsageAssertions HaveCalledTool(string toolName, string? because = null)
public PerformanceAssertions Performance => new(_result.Performance);
}
Level 2: Per-Executor Assertions
public class ExecutorAssertions
{
private readonly WorkflowExecutionResult _result;
private readonly WorkflowResultAssertions _workflowAssertions; // Parent chain that owns the queued assertions
private readonly string _executorId;
public ExecutorAssertions HaveNonEmptyOutput(string? because = null)
public ExecutorAssertions HaveOutputContaining(string text, string? because = null)
public ExecutorAssertions HaveOutputLongerThan(int minLength, string? because = null)
public ExecutorAssertions HaveCompletedWithin(TimeSpan duration, string? because = null)
public ExecutorAssertions HaveToolCalls(string? because = null)
public ExecutorAssertions HaveNoErrors(string? because = null)
public ExecutorAssertions HaveInputTokensLessThan(int maxTokens, string? because = null)
public ExecutorAssertions HaveEstimatedCostUnder(decimal maxCost, string? because = null)
// Return to workflow level (same parent instance, so assertions queued earlier in the chain are kept)
public WorkflowResultAssertions And() => _workflowAssertions;
}
Level 3: Graph Structure Assertions
public class GraphAssertions
{
public GraphAssertions HaveNodes(params string[] expectedNodes)
public GraphAssertions HaveEntryPoint(string nodeId, string? because = null)
public GraphAssertions HaveExitPoint(string nodeId, string? because = null)
public GraphAssertions HaveTraversedEdge(string sourceNode, string targetNode)
public GraphAssertions NotHaveTraversedEdge(string sourceNode, string targetNode)
public GraphAssertions HaveExecutionPath(params string[] expectedPath)
public GraphAssertions HaveUsedEdgeType(EdgeType expectedType)
// Return to workflow level
public WorkflowResultAssertions And() => _workflowAssertions;
}
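The edge and path checks above can be illustrated for the simplest case, a strictly sequential workflow, where each consecutive pair of executed steps counts as one traversed edge. This is a minimal sketch rather than the AgentEval implementation: `ExecutionPathChecks` and its methods are hypothetical names, and a workflow with fan-out or conditional edges would need real edge events instead of the recorded step order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of AgentEval): derives traversed edges from the recorded executor order.
public static class ExecutionPathChecks
{
    // Treats each consecutive pair of executed steps as one traversed edge.
    public static HashSet<(string Source, string Target)> TraversedEdges(IReadOnlyList<string> executedSteps)
    {
        var edges = new HashSet<(string Source, string Target)>();
        for (var i = 0; i + 1 < executedSteps.Count; i++)
        {
            edges.Add((executedSteps[i], executedSteps[i + 1]));
        }
        return edges;
    }

    // True when the step order contains source immediately followed by target.
    public static bool HasTraversedEdge(IReadOnlyList<string> executedSteps, string source, string target)
        => TraversedEdges(executedSteps).Contains((source, target));

    // True when the executed steps match the expected path exactly, in order.
    public static bool MatchesExecutionPath(IReadOnlyList<string> executedSteps, params string[] expectedPath)
        => executedSteps.SequenceEqual(expectedPath);
}
```

An assertion such as HaveTraversedEdge("Planner", "Researcher") could delegate to a check like this and turn a false result into a WorkflowAssertionException.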
Assertion Chaining Strategy
Enable natural reading flow with multiple assertion levels:
// Complex assertion chain - reads like specification
result.ExecutionResult!.Should()
// Level 1: Overall workflow structure
.HaveStepCount(4, because: "pipeline has 4 distinct stages")
.HaveExecutedInOrder("Planner", "Researcher", "Writer", "Editor")
.HaveCompletedWithin(TimeSpan.FromMinutes(3), because: "reasonable time for content generation")
.HaveNoErrors(because: "clean execution is required")
// Level 2: Per-executor validation
.ForExecutor("Planner")
.HaveNonEmptyOutput()
.HaveCompletedWithin(TimeSpan.FromSeconds(60), because: "planning should be reasonably fast")
.And()
.ForExecutor("Writer")
.HaveOutputLongerThan(200, because: "articles should be substantial")
.HaveEstimatedCostUnder(0.10m)
.And()
// Level 3: Graph validation
.HaveGraphStructure()
.HaveEntryPoint("Planner", because: "planning is the starting point")
.HaveExecutionPath("Planner", "Researcher", "Writer", "Editor")
.And() // Return from graph level to workflow level
// Back to Level 1: Tool validation (if applicable)
.HaveCalledTool("SearchFlights")
.WithoutError()
.And()
// Final validation trigger
.Validate();
Error Message Design
Structured Error Information
public class WorkflowAssertionException : AgentEvalAssertionException
{
// Exception.Message is read-only, so the message is passed through the constructor
// (assumes AgentEvalAssertionException exposes a matching constructor)
public WorkflowAssertionException(string message) : base(message) { }
public required string WorkflowName { get; init; }
public required string AssertionType { get; init; } // "StepCount", "ExecutionOrder", etc.
public required object Expected { get; init; }
public required object Actual { get; init; }
public string? ExecutorId { get; init; } // For executor-specific failures
public TimeSpan? ActualDuration { get; init; } // For timing failures
public List<string> Suggestions { get; init; } = new();
}
Rich Error Messages
// Example: Step count assertion failure
throw new WorkflowAssertionException(
"Expected workflow 'ContentPipeline' to have 4 steps, but found 3 steps.")
{
WorkflowName = "ContentPipeline",
AssertionType = "StepCount",
Expected = 4,
Actual = 3,
Suggestions =
[
"Check if all agents are properly bound as executors",
"Verify workflow graph has all expected edges",
"Ensure no agent failed silently during execution"
]
};
// Example: Execution order failure
throw new WorkflowAssertionException(
"Expected execution order [Planner, Researcher, Writer, Editor], but got [Planner, Writer, Editor]. Missing: Researcher.")
{
WorkflowName = "ContentPipeline",
AssertionType = "ExecutionOrder",
// Explicit arrays: a collection expression cannot target a property of type object
Expected = new[] { "Planner", "Researcher", "Writer", "Editor" },
Actual = new[] { "Planner", "Writer", "Editor" }, // Missing Researcher
Suggestions =
[
"Verify Researcher agent is bound correctly",
"Check if Planner → Researcher edge exists in workflow",
"Ensure Researcher agent didn't fail silently"
]
};
// Example: Per-executor timeout failure
throw new WorkflowAssertionException(
"Expected executor 'Writer' to complete within 60 seconds, but took 3 minutes.")
{
WorkflowName = "ContentPipeline",
AssertionType = "ExecutorTimeout",
ExecutorId = "Writer",
Expected = TimeSpan.FromSeconds(60),
Actual = TimeSpan.FromMinutes(3), // Required member, so it must be set alongside ActualDuration
ActualDuration = TimeSpan.FromMinutes(3),
Suggestions =
[
"Consider increasing timeout for content generation tasks",
"Check if Writer agent is using efficient prompts",
"Verify LLM service response times are normal"
]
};
Performance-Optimized Validation
Lazy Validation Pattern
public class WorkflowResultAssertions
{
private readonly List<Func<WorkflowExecutionResult, AssertionResult>> _assertions = new();
public WorkflowResultAssertions HaveStepCount(int expectedCount, string? because = null)
{
_assertions.Add(result => ValidateStepCount(result, expectedCount, because));
return this; // Fluent chaining
}
// Validate() actually executes all assertions
public void Validate()
{
var failures = new List<WorkflowAssertionException>();
foreach (var assertion in _assertions)
{
var result = assertion(_result);
if (!result.Success)
{
failures.Add(result.Exception);
}
}
if (failures.Count > 0)
{
throw new AggregateWorkflowAssertionException(failures);
}
}
}
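The lazy validation snippet relies on two types it does not define, AssertionResult and AggregateWorkflowAssertionException. The following is one possible shape for them, offered as a sketch under assumptions: the member names Pass, Fail, and Failures are hypothetical, and WorkflowAssertionException is reduced to a message-only stand-in so the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for the ADR's WorkflowAssertionException so this sketch compiles on its own.
public class WorkflowAssertionException : Exception
{
    public WorkflowAssertionException(string message) : base(message) { }
}

// Hypothetical result type: either a success or a failure carrying its exception.
public sealed class AssertionResult
{
    public bool Success => Exception is null;
    public WorkflowAssertionException? Exception { get; }

    private AssertionResult(WorkflowAssertionException? exception) => Exception = exception;

    public static AssertionResult Pass() => new(null);
    public static AssertionResult Fail(WorkflowAssertionException exception) => new(exception);
}

// Hypothetical aggregate: reports every failed assertion in a single exception.
public sealed class AggregateWorkflowAssertionException : Exception
{
    public IReadOnlyList<WorkflowAssertionException> Failures { get; }

    public AggregateWorkflowAssertionException(IEnumerable<WorkflowAssertionException> failures)
        : this(failures.ToList()) { }

    // Private overload so the failures are enumerated once and shared by the message and the property.
    private AggregateWorkflowAssertionException(List<WorkflowAssertionException> failures)
        : base($"{failures.Count} workflow assertion(s) failed:{Environment.NewLine}"
               + string.Join(Environment.NewLine, failures.Select(f => " - " + f.Message)))
    {
        Failures = failures;
    }
}
```

Collecting all failures before throwing means a single test run reports every broken expectation, at the cost of running assertions that may depend on an earlier one having passed.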
Assertion Result Caching
public class AssertionResultCache
{
private readonly Dictionary<string, object?> _cache = new();
public T GetOrCompute<T>(string key, Func<T> computation)
{
if (_cache.TryGetValue(key, out var cached))
{
return (T)cached!;
}
var result = computation();
_cache[key] = result;
return result;
}
}
// Usage in assertions
public bool ValidateGraphStructure()
{
return _cache.GetOrCompute("graph_extracted", () =>
{
// Expensive graph extraction - cache result
return _result.GraphDefinition != null && _result.GraphDefinition.Nodes.Count > 0;
});
}
Tool Usage Integration
Extend existing tool assertions to work in workflow context:
// Workflow-level tool validation (aggregated across all executors)
result.ExecutionResult!.Should()
.HaveCalledTool("GetInfoAbout", because: "TripPlanner must research cities")
.AtLeast(2.Times()) // Multiple cities
.WithoutError()
.InExecutor("TripPlanner") // Specific executor
.And()
.HaveCalledTool("SearchFlights")
.BeforeTool("BookFlight", because: "can't book without search results")
.InExecutor("FlightReservation")
.WithArgument("from", "Seattle")
.And()
.HaveToolCallPattern("Search", "Book") // Pattern across workflow
.HaveNoToolErrors()
.Validate();
Conditional Assertions
Handle workflows where some assertions are conditional:
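public class WorkflowResultAssertions
{
// Condition guarding the assertions that follow When(); null means "always validate"
private Func<WorkflowExecutionResult, bool>? _currentCondition;
public WorkflowResultAssertions When(Func<WorkflowExecutionResult, bool> condition)
{
_currentCondition = condition;
return this;
}
public WorkflowResultAssertions HaveCalledTool(string toolName, string? because = null)
{
if (_currentCondition == null || _currentCondition(_result))
{
// Only validate if condition is met
_assertions.Add(result => ValidateToolCall(result, toolName, because));
}
return this;
}
// Ends the conditional scope so that later assertions always run
// (assumed behavior; the usage below relies on it)
public WorkflowResultAssertions And()
{
_currentCondition = null;
return this;
}
}
// Usage: Only validate tools if workflow actually has tool usage
result.ExecutionResult!.Should()
.When(r => r.ToolUsage != null)
.HaveCalledTool("SearchFlights")
.HaveNoToolErrors()
.And()
.HaveStepCount(4) // Always validate step count
.Validate(); // Assertions are lazy, so nothing is checked until Validate() runs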
Assertion Composition Patterns
Common Assertion Bundles
// Pre-built assertion patterns for common scenarios
public static class WorkflowAssertionBundles
{
public static WorkflowResultAssertions ValidateSequentialPipeline(
this WorkflowResultAssertions assertions,
params string[] expectedExecutors)
{
return assertions
.HaveStepCount(expectedExecutors.Length)
.HaveExecutedInOrder(expectedExecutors)
.HaveNoErrors()
.HaveNonEmptyOutput();
}
public static WorkflowResultAssertions ValidatePerformanceBounds(
this WorkflowResultAssertions assertions,
TimeSpan maxDuration,
decimal maxCost)
{
return assertions
.HaveCompletedWithin(maxDuration)
.Performance!.HaveEstimatedCostUnder(maxCost)
.And();
}
}
// Usage
result.ExecutionResult!.Should()
.ValidateSequentialPipeline("Planner", "Writer", "Editor")
.ValidatePerformanceBounds(TimeSpan.FromMinutes(2), 0.50m)
.Validate();
Custom Assertion Extensions
// Domain-specific extensions
public static class ContentPipelineAssertions
{
public static WorkflowResultAssertions ValidateContentQuality(
this WorkflowResultAssertions assertions)
{
return assertions
.ForExecutor("Planner")
.HaveOutputContaining("outline")
.HaveOutputContaining("research")
.And()
.ForExecutor("Writer")
.HaveOutputLongerThan(500)
.HaveOutputNotContaining("[TODO]")
.And()
.ForExecutor("Editor")
.HaveOutputNotContaining("DRAFT")
.And();
}
}
Benefits
Readability Benefits
- Natural Language Flow: Assertions read like specifications
- Hierarchical Structure: Mirrors workflow complexity naturally
- Selective Focus: Can validate just structure, just performance, or everything
- Contextual Chaining: .And() maintains assertion context clearly
Maintainability Benefits
- Composable: Assertion bundles reduce duplication
- Extensible: Easy to add new assertion types
- Consistent: Follows established AgentEval patterns
- Cacheable: Expensive validations cached automatically
Developer Experience Benefits
- Rich Error Messages: Clear expected/actual with actionable suggestions
- Incremental Validation: Can build assertions step-by-step
- IntelliSense Support: Fluent API provides good IDE experience
- Conditional Logic: When() clause handles complex scenarios
Implementation Strategy
Phase 1: Core Workflow Assertions (Completed)
- Basic structure validation (step count, execution order)
- Per-executor validation (output, timing)
- Error handling and reporting
- Integration with existing WorkflowEvaluationHarness
Phase 2: Advanced Assertions (Completed)
- Graph structure validation
- Tool usage integration at workflow level
- Performance and cost validation
- Conditional assertion logic
Phase 3: Optimization & Convenience (Future)
- Assertion result caching for expensive operations
- Pre-built assertion bundles for common patterns
- Custom assertion extension points
- Performance monitoring of assertion overhead
Testing Strategy
Unit Tests for Assertions
[Fact]
public void HaveStepCount_WhenCorrectCount_ShouldPass()
{
var result = CreateWorkflowResult(stepCount: 3);
var assertion = () => result.Should().HaveStepCount(3).Validate();
assertion.Should().NotThrow();
}
[Fact]
public void HaveStepCount_WhenIncorrectCount_ShouldThrowWithDetails()
{
var result = CreateWorkflowResult(stepCount: 2);
var assertion = () => result.Should().HaveStepCount(3).Validate();
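// Validate() throws the aggregate exception; Failures is an assumed property exposing the individual failures
var exception = assertion.Should().Throw<AggregateWorkflowAssertionException>()
.Which.Failures.Should().ContainSingle().Subject;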
exception.AssertionType.Should().Be("StepCount");
exception.Expected.Should().Be(3);
exception.Actual.Should().Be(2);
exception.Suggestions.Should().NotBeEmpty();
}
[Fact]
public void ForExecutor_WhenChained_ShouldValidateBoth()
{
var result = CreateWorkflowResult();
var assertion = () => result.Should()
.ForExecutor("Agent1")
.HaveNonEmptyOutput()
.And()
.ForExecutor("Agent2")
.HaveNonEmptyOutput()
.And()
.Validate();
assertion.Should().NotThrow();
}
Integration Tests with Real Workflows
[Fact]
public async Task ComplexWorkflowAssertions_ShouldValidateCorrectly()
{
var result = await harness.RunWorkflowTestAsync(complexAdapter, testCase);
// Test complex assertion chain
result.ExecutionResult!.Should()
.ValidateSequentialPipeline("A", "B", "C", "D")
.ValidatePerformanceBounds(TimeSpan.FromMinutes(5), 1.0m)
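.HaveCalledTool("ProcessData")
.InExecutor("C") // Tool assertions are scoped to an executor via InExecutor()
.And()
.HaveGraphStructure()
.HaveTraversedEdge("A", "B")
.HaveTraversedEdge("B", "C")
.And() // Return to workflow level, where Validate() is defined
.Validate();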
}
Alternatives Considered
Alternative 1: Separate Assertion Classes
Create separate classes for each assertion type.
Advantages:
- Clear separation of concerns
- Easier to unit test individual assertion types
- More explicit interfaces
Disadvantages:
- Breaks fluent chaining
- Requires manual composition
- Less readable assertion chains
- More complex API surface
Decision: Rejected - fluent chaining is core to AgentEval's assertion design.
Alternative 2: Configuration-Based Assertions
Define assertions in YAML/JSON configuration.
Advantages:
- Non-developers can write assertions
- Version control for assertion definitions
- Reusable assertion libraries
Disadvantages:
- Not type-safe
- Poor IDE support
- Limited expressiveness
- Complex error reporting
Decision: Rejected - type safety and IDE support are crucial for developer productivity.
Alternative 3: Simple Boolean Methods
Use simple methods returning true/false.
Assert.True(result.HasStepCount(4));
Assert.True(result.ExecutedInOrder("A", "B", "C"));
Advantages:
- Simple implementation
- Familiar to xUnit users
- Minimal learning curve
Disadvantages:
- Poor error messages
- No fluent chaining
- Inconsistent with AgentEval patterns
- Limited composability
Decision: Rejected - doesn't provide the rich error experience AgentEval requires.
Performance Considerations
Assertion Overhead
// Measured overhead of assertion processing
public class AssertionPerformanceMetrics
{
public TimeSpan ValidationTime { get; } // ~1-5ms for complex workflows
public long MemoryUsage { get; } // ~10KB for assertion state
public int AssertionCount { get; } // Track assertion complexity
}
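A value such as ValidationTime could be captured by timing the Validate() call with a stopwatch. The helper below is a hypothetical sketch (AssertionTiming and Measure are not part of AgentEval), and it only records the duration of a validation that completes without throwing.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not part of AgentEval) for measuring assertion overhead.
public static class AssertionTiming
{
    // Runs the given validation action and returns how long it took.
    // If the action throws, the exception propagates and no duration is returned.
    public static TimeSpan Measure(Action validate)
    {
        var stopwatch = Stopwatch.StartNew();
        validate();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

A caller might wrap a chain as AssertionTiming.Measure(() => assertions.Validate()) and compare the result against a budget.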
Optimization Strategies
- Lazy Evaluation: Don't compute expensive validations until .Validate() is called
- Result Caching: Cache expensive computations (graph extraction, tool analysis)
- Early Exit: Stop validation on first failure when appropriate
- Batch Operations: Group similar validations for efficiency
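The lazy evaluation and early exit strategies listed above can be combined in one small runner. This is an illustrative sketch, not the AgentEval implementation: LazyAssertionRunner and the stopOnFirstFailure parameter are hypothetical, and failures are modeled as plain strings to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical runner: queues checks and optionally stops at the first failure.
public sealed class LazyAssertionRunner
{
    // Each check returns null on success or a failure message.
    private readonly List<Func<string?>> _checks = new();

    public LazyAssertionRunner Add(Func<string?> check)
    {
        _checks.Add(check); // Nothing runs until Validate() is called
        return this;
    }

    // Runs the queued checks in order and returns the failure messages.
    public IReadOnlyList<string> Validate(bool stopOnFirstFailure = false)
    {
        var failures = new List<string>();
        foreach (var check in _checks)
        {
            var failure = check();
            if (failure is null) continue;
            failures.Add(failure);
            if (stopOnFirstFailure) break; // Early exit: skip the remaining (possibly expensive) checks
        }
        return failures;
    }
}
```

Early exit trades completeness for speed: it suits cheap structural checks that gate expensive ones, while the collect-everything mode gives fuller diagnostics in a single run.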
Related ADRs
- ADR-010: MAF Workflow Integration Architecture - Foundation for workflow evaluation
- ADR-011: Workflow Event Processing and Timeout Handling - Event processing that feeds assertions
- ADR-003: CLI Review Commands - CLI integration for workflow assertions
- ADR-006: Service-Based Architecture - DI integration for assertion services
This ADR establishes the assertion design principles for workflow evaluation, ensuring consistent, readable, and comprehensive validation capabilities across AgentEval's workflow features.