ADR-012: Workflow Assertion Design

Status

✅ Accepted - February 14, 2026

Context

Workflow evaluation requires a rich assertion API that can validate complex multi-agent execution patterns. Unlike simple agent evaluations that focus on single inputs/outputs, workflows involve:

Sequential Execution: Validating that agents execute in the correct order
Graph Structure: Asserting on workflow topology and edge traversal
Per-Executor Validation: Individual agent performance within the workflow context
Tool Chain Evaluation: Tool usage patterns across multiple agents
Timing and Performance: Execution timing, costs, and resource constraints
Error Propagation: How errors flow through the workflow pipeline

Design Challenges

Complexity: Workflows have many validation dimensions (structure, timing, tools, outputs)
Readability: Assertions should read naturally despite complex validation logic
Composability: Multiple assertion types must work together seamlessly
Error Messages: Clear failure messages with actionable suggestions
Performance: Assertions shouldn't significantly impact evaluation performance

Existing Patterns

AgentEval already has successful fluent assertion patterns:

// Tool usage assertions (proven pattern)
result.ToolUsage!.Should()
    .HaveCalledTool("SearchFlights")
    .BeforeTool("BookFlight")
    .WithoutError();

// Performance assertions (proven pattern)  
result.Performance!.Should()
    .HaveTotalDurationUnder(TimeSpan.FromSeconds(10))
    .HaveEstimatedCostUnder(0.05m);

Challenge: Extend this pattern to workflow complexity without losing readability.

Decision

Design a hierarchical fluent assertion system that mirrors workflow structure while maintaining AgentEval's existing assertion patterns.

Architecture: Hierarchical Assertions

Level 1: Workflow-Level Assertions

// Entry point - validates overall workflow execution
public static WorkflowResultAssertions Should(this WorkflowExecutionResult result)
{
    return new WorkflowResultAssertions(result);
}

public class WorkflowResultAssertions
{
    public WorkflowResultAssertions HaveStepCount(int expectedCount, string? because = null)
    public WorkflowResultAssertions HaveExecutedInOrder(params string[] executorIds)
    public WorkflowResultAssertions HaveCompletedWithin(TimeSpan duration, string? because = null)
    public WorkflowResultAssertions HaveNoErrors(string? because = null)
    public WorkflowResultAssertions HaveNonEmptyOutput(string? because = null)
    
    // Navigate to sub-assertions
    public ExecutorAssertions ForExecutor(string executorId)
    public GraphAssertions HaveGraphStructure()
    public ToolUsageAssertions HaveCalledTool(string toolName, string? because = null)
    public PerformanceAssertions Performance => new(_result.Performance);
}

Level 2: Per-Executor Assertions

public class ExecutorAssertions
{
    private readonly WorkflowExecutionResult _result;
    private readonly string _executorId;
    
    public ExecutorAssertions HaveNonEmptyOutput(string? because = null)
    public ExecutorAssertions HaveOutputContaining(string text, string? because = null)
    public ExecutorAssertions HaveOutputLongerThan(int minLength, string? because = null)
    public ExecutorAssertions HaveCompletedWithin(TimeSpan duration, string? because = null)
    public ExecutorAssertions HaveToolCalls(string? because = null)
    public ExecutorAssertions HaveNoErrors(string? because = null)
    public ExecutorAssertions HaveInputTokensLessThan(int maxTokens, string? because = null)
    public ExecutorAssertions HaveEstimatedCostUnder(decimal maxCost, string? because = null)
    
    // Return to workflow level
    public WorkflowResultAssertions And() => new(_result);
}

Level 3: Graph Structure Assertions

public class GraphAssertions  
{
    public GraphAssertions HaveNodes(params string[] expectedNodes)
    public GraphAssertions HaveEntryPoint(string nodeId, string? because = null)
    public GraphAssertions HaveExitPoint(string nodeId, string? because = null)
    public GraphAssertions HaveTraversedEdge(string sourceNode, string targetNode)
    public GraphAssertions NotHaveTraversedEdge(string sourceNode, string targetNode)
    public GraphAssertions HaveExecutionPath(params string[] expectedPath)
    public GraphAssertions HaveUsedEdgeType(EdgeType expectedType)
    
    // Return to workflow level
    public WorkflowResultAssertions And() => _workflowAssertions;
}

Assertion Chaining Strategy

Enable natural reading flow with multiple assertion levels:

// Complex assertion chain - reads like specification
result.ExecutionResult!.Should()
    // Level 1: Overall workflow structure
    .HaveStepCount(4, because: "pipeline has 4 distinct stages")
    .HaveExecutedInOrder("Planner", "Researcher", "Writer", "Editor")
    .HaveCompletedWithin(TimeSpan.FromMinutes(3), because: "reasonable time for content generation")
    .HaveNoErrors(because: "clean execution is required")
    
    // Level 2: Per-executor validation
    .ForExecutor("Planner")
        .HaveNonEmptyOutput()
        .HaveCompletedWithin(TimeSpan.FromSeconds(60), because: "planning should be reasonably fast")
        .And()
    .ForExecutor("Writer")    
        .HaveOutputLongerThan(200, because: "articles should be substantial")
        .HaveEstimatedCostUnder(0.10m)
        .And()
        
    // Level 3: Graph validation
    .HaveGraphStructure()
    .HaveEntryPoint("Planner", because: "planning is the starting point")  
    .HaveExecutionPath("Planner", "Researcher", "Writer", "Editor")
    
    // Back to Level 1: Tool validation (if applicable)
    .HaveCalledTool("SearchFlights")?.WithoutError()
    .And()
    
    // Final validation trigger
    .Validate();

Error Message Design

Structured Error Information

public class WorkflowAssertionException : AgentEvalAssertionException
{
    public required string WorkflowName { get; init; }
    public required string AssertionType { get; init; }  // "StepCount", "ExecutionOrder", etc.
    public required object Expected { get; init; }
    public required object Actual { get; init; }
    public string? ExecutorId { get; init; }            // For executor-specific failures
    public TimeSpan? ActualDuration { get; init; }      // For timing failures
    public List<string> Suggestions { get; init; } = new();
}

Rich Error Messages

// Example: Step count assertion failure
throw new WorkflowAssertionException
{
    WorkflowName = "ContentPipeline",
    AssertionType = "StepCount", 
    Expected = 4,
    Actual = 3,
    Message = "Expected workflow 'ContentPipeline' to have 4 steps, but found 3 steps.",
    Suggestions = 
    [
        "Check if all agents are properly bound as executors",
        "Verify workflow graph has all expected edges",
        "Ensure no agent failed silently during execution"
    ]
};

// Example: Execution order failure
throw new WorkflowAssertionException
{
    WorkflowName = "ContentPipeline",
    AssertionType = "ExecutionOrder",
    Expected = ["Planner", "Researcher", "Writer", "Editor"],
    Actual = ["Planner", "Writer", "Editor"],  // Missing Researcher
    Message = "Expected execution order [Planner, Researcher, Writer, Editor], but got [Planner, Writer, Editor]. Missing: Researcher.",
    Suggestions =
    [
        "Verify Researcher agent is bound correctly",
        "Check if Planner → Researcher edge exists in workflow",
        "Ensure Researcher agent didn't fail silently"
    ]
};

// Example: Per-executor timeout failure
throw new WorkflowAssertionException
{
    WorkflowName = "ContentPipeline",
    AssertionType = "ExecutorTimeout",
    ExecutorId = "Writer",
    Expected = TimeSpan.FromSeconds(60),
    ActualDuration = TimeSpan.FromMinutes(3),
    Message = "Expected executor 'Writer' to complete within 60 seconds, but took 3 minutes.",
    Suggestions =
    [
        "Consider increasing timeout for content generation tasks",
        "Check if Writer agent is using efficient prompts",
        "Verify LLM service response times are normal"
    ]
};

Performance-Optimized Validation

Lazy Validation Pattern

public class WorkflowResultAssertions
{
    private readonly List<Func<WorkflowExecutionResult, AssertionResult>> _assertions = new();
    
    public WorkflowResultAssertions HaveStepCount(int expectedCount, string? because = null)
    {
        _assertions.Add(result => ValidateStepCount(result, expectedCount, because));
        return this;  // Fluent chaining
    }
    
    // Validate() actually executes all assertions
    public void Validate()
    {
        var failures = new List<WorkflowAssertionException>();
        
        foreach (var assertion in _assertions)
        {
            var result = assertion(_result);
            if (!result.Success)
            {
                failures.Add(result.Exception);
            }
        }
        
        if (failures.Count > 0)
        {
            throw new AggregateWorkflowAssertionException(failures);
        }
    }
}

Assertion Result Caching

public class AssertionResultCache
{
    private readonly Dictionary<string, object?> _cache = new();
    
    public T GetOrCompute<T>(string key, Func<T> computation)
    {
        if (_cache.TryGetValue(key, out var cached))
        {
            return (T)cached!;
        }
        
        var result = computation();
        _cache[key] = result;
        return result;
    }
}

// Usage in assertions
public bool ValidateGraphStructure()
{
    return _cache.GetOrCompute("graph_extracted", () => 
    {
        // Expensive graph extraction - cache result
        return _result.GraphDefinition != null && _result.GraphDefinition.Nodes.Count > 0;
    });
}

Tool Usage Integration

Extend existing tool assertions to work in workflow context:

// Workflow-level tool validation (aggregated across all executors)
result.ExecutionResult!.Should()
    .HaveCalledTool("GetInfoAbout", because: "TripPlanner must research cities")
        .AtLeast(2.Times())                                    // Multiple cities
        .WithoutError()
        .InExecutor("TripPlanner")                            // Specific executor
        .And()
    .HaveCalledTool("SearchFlights")  
        .BeforeTool("BookFlight", because: "can't book without search results")
        .InExecutor("FlightReservation")
        .WithArgument("from", "Seattle")
        .And()
    .HaveToolCallPattern("Search", "Book")                    // Pattern across workflow
    .HaveNoToolErrors()
    .Validate();

Conditional Assertions

Handle workflows where some assertions are conditional:

public class WorkflowResultAssertions 
{
    public WorkflowResultAssertions When(Func<WorkflowExecutionResult, bool> condition)
    {
        _currentCondition = condition;
        return this;
    }
    
    public WorkflowResultAssertions HaveCalledTool(string toolName, string? because = null)
    {
        if (_currentCondition == null || _currentCondition(_result))
        {
            // Only validate if condition is met
            _assertions.Add(result => ValidateToolCall(result, toolName, because));
        }
        return this;
    }
}

// Usage: Only validate tools if workflow actually has tool usage
result.ExecutionResult!.Should()
    .When(r => r.ToolUsage != null)
        .HaveCalledTool("SearchFlights")
        .HaveNoToolErrors()
    .And()
    .HaveStepCount(4);  // Always validate step count

Assertion Composition Patterns

Common Assertion Bundles

// Pre-built assertion patterns for common scenarios
public static class WorkflowAssertionBundles
{
    public static WorkflowResultAssertions ValidateSequentialPipeline(
        this WorkflowResultAssertions assertions,
        params string[] expectedExecutors)
    {
        return assertions
            .HaveStepCount(expectedExecutors.Length)
            .HaveExecutedInOrder(expectedExecutors)
            .HaveNoErrors()
            .HaveNonEmptyOutput();
    }
    
    public static WorkflowResultAssertions ValidatePerformanceBounds(
        this WorkflowResultAssertions assertions,
        TimeSpan maxDuration,
        decimal maxCost)
    {
        return assertions
            .HaveCompletedWithin(maxDuration)
            .Performance!.HaveEstimatedCostUnder(maxCost)
            .And();
    }
}

// Usage
result.ExecutionResult!.Should()
    .ValidateSequentialPipeline("Planner", "Writer", "Editor")
    .ValidatePerformanceBounds(TimeSpan.FromMinutes(2), 0.50m)
    .Validate();

Custom Assertion Extensions

// Domain-specific extensions
public static class ContentPipelineAssertions
{
    public static WorkflowResultAssertions ValidateContentQuality(
        this WorkflowResultAssertions assertions)
    {
        return assertions
            .ForExecutor("Planner")
                .HaveOutputContaining("outline")
                .HaveOutputContaining("research")
                .And()
            .ForExecutor("Writer")
                .HaveOutputLongerThan(500)
                .HaveOutputNotContaining("[TODO]")
                .And()
            .ForExecutor("Editor")
                .HaveOutputNotContaining("DRAFT")
                .And();
    }
}

Benefits

Readability Benefits

Natural Language Flow: Assertions read like specifications
Hierarchical Structure: Mirrors workflow complexity naturally
Selective Focus: Can validate just structure, just performance, or everything
Contextual Chaining: .And() maintains assertion context clearly

Maintainability Benefits

Composable: Assertion bundles reduce duplication
Extensible: Easy to add new assertion types
Consistent: Follows established AgentEval patterns
Cacheable: Expensive validations cached automatically

Developer Experience Benefits

Rich Error Messages: Clear expected/actual with actionable suggestions
Incremental Validation: Can build assertions step-by-step
IntelliSense Support: Fluent API provides good IDE experience
Conditional Logic: When() clause handles complex scenarios

Implementation Strategy

Phase 1: Core Workflow Assertions (Completed)

Basic structure validation (step count, execution order)
Per-executor validation (output, timing)
Error handling and reporting
Integration with existing WorkflowEvaluationHarness

Phase 2: Advanced Assertions (Completed)

Graph structure validation
Tool usage integration at workflow level
Performance and cost validation
Conditional assertion logic

Phase 3: Optimization & Convenience (Future)

Assertion result caching for expensive operations
Pre-built assertion bundles for common patterns
Custom assertion extension points
Performance monitoring of assertion overhead

Testing Strategy

Unit Tests for Assertions

[Fact]
public void HaveStepCount_WhenCorrectCount_ShouldPass()
{
    var result = CreateWorkflowResult(stepCount: 3);
    
    var assertion = () => result.Should().HaveStepCount(3).Validate();
    
    assertion.Should().NotThrow();
}

[Fact] 
public void HaveStepCount_WhenIncorrectCount_ShouldThrowWithDetails()
{
    var result = CreateWorkflowResult(stepCount: 2);
    
    var assertion = () => result.Should().HaveStepCount(3).Validate();
    
    var exception = assertion.Should().Throw<WorkflowAssertionException>().Which;
    exception.AssertionType.Should().Be("StepCount");
    exception.Expected.Should().Be(3);
    exception.Actual.Should().Be(2);
    exception.Suggestions.Should().NotBeEmpty();
}

[Fact]
public void ForExecutor_WhenChained_ShouldValidateBoth()
{
    var result = CreateWorkflowResult();
    
    var assertion = () => result.Should()
        .ForExecutor("Agent1")
            .HaveNonEmptyOutput()
            .And()
        .ForExecutor("Agent2")
            .HaveNonEmptyOutput()
            .And()
        .Validate();
        
    assertion.Should().NotThrow();
}

Integration Tests with Real Workflows

[Fact]
public async Task ComplexWorkflowAssertions_ShouldValidateCorrectly()
{
    var result = await harness.RunWorkflowTestAsync(complexAdapter, testCase);
    
    // Test complex assertion chain
    result.ExecutionResult!.Should()
        .ValidateSequentialPipeline("A", "B", "C", "D")
        .ValidatePerformanceBounds(TimeSpan.FromMinutes(5), 1.0m)
        .ForExecutor("C")
            .HaveCalledTool("ProcessData")
            .And()
        .HaveGraphStructure()
            .HaveTraversedEdge("A", "B")
            .HaveTraversedEdge("B", "C")
        .Validate();
}

Alternatives Considered

Alternative 1: Separate Assertion Classes

Create separate classes for each assertion type.

Advantages:

Clear separation of concerns
Easier to unit test individual assertion types
More explicit interfaces

Disadvantages:

Breaks fluent chaining
Requires manual composition
Less readable assertion chains
More complex API surface

Decision: Rejected - fluent chaining is core to AgentEval's assertion design.

Alternative 2: Configuration-Based Assertions

Define assertions in YAML/JSON configuration.

Advantages:

Non-developers can write assertions
Version control for assertion definitions
Reusable assertion libraries

Disadvantages:

Not type-safe
Poor IDE support
Limited expressiveness
Complex error reporting

Decision: Rejected - type safety and IDE support are crucial for developer productivity.

Alternative 3: Simple Boolean Methods

Use simple methods returning true/false.

Assert.True(result.HasStepCount(4));
Assert.True(result.ExecutedInOrder("A", "B", "C"));

Advantages:

Simple implementation
Familiar to xUnit users
Minimal learning curve

Disadvantages:

Poor error messages
No fluent chaining
Inconsistent with AgentEval patterns
Limited composability

Decision: Rejected - doesn't provide the rich error experience AgentEval requires.

Performance Considerations

Assertion Overhead

// Measured overhead of assertion processing
public class AssertionPerformanceMetrics
{
    public TimeSpan ValidationTime { get; }      // ~1-5ms for complex workflows
    public long MemoryUsage { get; }             // ~10KB for assertion state
    public int AssertionCount { get; }           // Track assertion complexity
}

Optimization Strategies

Lazy Evaluation: Don't compute expensive validations until .Validate() called
Result Caching: Cache expensive computations (graph extraction, tool analysis)
Early Exit: Stop validation on first failure when appropriate
Batch Operations: Group similar validations for efficiency

ADR-010: MAF Workflow Integration Architecture - Foundation for workflow evaluation
ADR-011: Workflow Event Processing and Timeout Handling - Event processing that feeds assertions
ADR-003: CLI Review Commands - CLI integration for workflow assertions
ADR-006: Service-Based Architecture - DI integration for assertion services

This ADR establishes the assertion design principles for workflow evaluation, ensuring consistent, readable, and comprehensive validation capabilities across AgentEval's workflow features.

Table of Contents

ADR-012: Workflow Assertion Design

Status

Context

Design Challenges

Existing Patterns

Decision

Architecture: Hierarchical Assertions

Level 1: Workflow-Level Assertions

Level 2: Per-Executor Assertions

Level 3: Graph Structure Assertions

Assertion Chaining Strategy

Error Message Design

Structured Error Information

Rich Error Messages

Performance-Optimized Validation

Lazy Validation Pattern

Assertion Result Caching

Tool Usage Integration

Conditional Assertions

Assertion Composition Patterns

Common Assertion Bundles

Custom Assertion Extensions

Benefits

Readability Benefits

Maintainability Benefits

Developer Experience Benefits

Implementation Strategy

Phase 1: Core Workflow Assertions (Completed)

Phase 2: Advanced Assertions (Completed)

Phase 3: Optimization & Convenience (Future)

Testing Strategy

Unit Tests for Assertions

Integration Tests with Real Workflows

Alternatives Considered

Alternative 1: Separate Assertion Classes

Alternative 2: Configuration-Based Assertions

Alternative 3: Simple Boolean Methods

Performance Considerations

Assertion Overhead

Optimization Strategies

Related ADRs