ADR-010: MAF Workflow Integration Architecture
Status
✅ Accepted - February 14, 2026
Context
Microsoft Agent Framework (MAF) provides a native workflow system using WorkflowBuilder for orchestrating multi-agent execution pipelines. AgentEval needs to evaluate these workflows while maintaining compatibility with MAF's event streaming architecture and execution model.
Key Integration Challenges
- Event Streaming: MAF workflows emit real-time events via WatchStreamAsync() that must be captured and processed into AgentEval's evaluation model
- Graph Extraction: Workflow structure must be automatically extracted from MAF's Workflow objects for assertion validation
- Timeout Handling: MAF's InProcessExecution may not honor cancellation tokens during active LLM calls, requiring hard timeout mechanisms
- Chat Protocol: MAF agents use a two-phase protocol (message accumulation + TurnToken processing) that requires specific event sequencing
Existing Solutions Considered
Mock Workflows: Create fake workflow adapters with predefined outputs
- ❌ Doesn't test real MAF integration
- ❌ Can't validate actual agent behavior
- ❌ Misses timing and performance characteristics
Direct MAF Usage: Use MAF workflows without AgentEval wrapper
- ❌ No structured evaluation capabilities
- ❌ No assertion APIs
- ❌ No timeline generation or visualization
Fork/Wrapper Approach: Create a custom workflow execution engine
- ❌ High maintenance overhead
- ❌ Diverges from MAF's evolution
- ❌ Loses MAF's native optimizations
Decision
Adopt an Adapter Pattern that bridges MAF workflows into AgentEval's evaluation system while preserving MAF's native execution path.
Architecture Components
1. MAFWorkflowAdapter
public class MAFWorkflowAdapter : IWorkflowAdapter
{
    public static MAFWorkflowAdapter FromMAFWorkflow(
        Workflow workflow,            // MAF workflow object
        string workflowName,          // Human-readable name
        string[] executorIds,         // Expected executor names
        string? workflowType = null)  // Optional classification
    {
        // Extract graph structure via MAF's ReflectEdges()
        var graph = MAFGraphExtractor.ExtractGraph(workflow);
        // Create event processing bridge
        var eventBridge = new MAFWorkflowEventBridge(workflow);
        return new MAFWorkflowAdapter(workflowName, graph, eventBridge, executorIds);
    }
}
2. MAFWorkflowEventBridge
Converts MAF workflow events into AgentEval's evaluation model:
public class MAFWorkflowEventBridge
{
    private readonly Workflow _workflow;

    public MAFWorkflowEventBridge(Workflow workflow) => _workflow = workflow;

    public async IAsyncEnumerable<WorkflowEvaluationEvent> StreamAsAgentEvalEvents(
        string input,
        // [EnumeratorCancellation] (System.Runtime.CompilerServices) lets a token
        // supplied via WithCancellation() reach this iterator
        [EnumeratorCancellation] CancellationToken cancellationToken)
    {
        // Protocol detection for ChatProtocol vs function-based workflows
        var protocol = await _workflow.DescribeProtocolAsync(cancellationToken);
        bool isChatProtocol = ChatProtocolExtensions.IsChatProtocol(protocol);
        StreamingRun run;
        if (isChatProtocol)
        {
            // ChatClientAgent workflows: ChatMessage + TurnToken
            run = await InProcessExecution.RunStreamingAsync(_workflow,
                new ChatMessage(ChatRole.User, input), cancellationToken);
            await run.TrySendMessageAsync(new TurnToken(emitEvents: true));
        }
        else
        {
            // Function-based workflows: direct string input
            run = await InProcessExecution.RunStreamingAsync<string>(_workflow, input, cancellationToken);
        }
        // Convert MAF events to AgentEval events, skipping events with no mapping
        await foreach (var mafEvent in run.WatchStreamAsync(cancellationToken))
        {
            var agentEvalEvent = ConvertMAFEvent(mafEvent);
            if (agentEvalEvent != null)
                yield return agentEvalEvent;
        }
    }
}
3. MAFGraphExtractor
Extracts workflow structure from MAF workflows:
public static class MAFGraphExtractor
{
    public static WorkflowGraphDefinition ExtractGraph(Workflow workflow)
    {
        // Use MAF's native graph reflection
        var edges = workflow.ReflectEdges();
        var nodes = ExtractNodesFromEdges(edges);
        return new WorkflowGraphDefinition
        {
            Nodes = nodes.Select(n => new WorkflowNode { NodeId = n }).ToList(),
            Edges = edges.Select(e => new WorkflowEdge
            {
                EdgeId = $"{e.Source}->{e.Target}",
                SourceNodeId = e.Source,
                TargetNodeId = e.Target,
                EdgeType = EdgeType.Sequential // MAF uses sequential by default
            }).ToList(),
            EntryNodeId = DetermineEntryNode(edges),
            ExitNodeIds = DetermineExitNodes(edges)
        };
    }
}
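ExtractGraph calls three helpers that the snippet does not show. The following is a minimal sketch of how they might work, assuming edges can be reduced to plain (Source, Target) pairs and that entry and exit nodes can be inferred from edge direction alone: an entry node has no incoming edge, and an exit node has no outgoing edge. The `Edge` record and the `GraphHelpers` class are hypothetical stand-ins, not MAF or AgentEval types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for whatever edge shape the reflection API yields.
public sealed record Edge(string Source, string Target);

public static class GraphHelpers
{
    // Every distinct node that appears on either end of an edge.
    public static IReadOnlyList<string> ExtractNodesFromEdges(IEnumerable<Edge> edges) =>
        edges.SelectMany(e => new[] { e.Source, e.Target }).Distinct().ToList();

    // Entry node: a source that is never a target.
    // Returns null for an empty or fully cyclic edge set.
    public static string? DetermineEntryNode(IReadOnlyCollection<Edge> edges)
    {
        var targets = edges.Select(e => e.Target).ToHashSet();
        return edges.Select(e => e.Source).FirstOrDefault(s => !targets.Contains(s));
    }

    // Exit nodes: targets that never act as a source.
    public static IReadOnlyList<string> DetermineExitNodes(IReadOnlyCollection<Edge> edges)
    {
        var sources = edges.Select(e => e.Source).ToHashSet();
        return edges.Select(e => e.Target)
                    .Where(t => !sources.Contains(t))
                    .Distinct()
                    .ToList();
    }
}
```

Under these assumptions a single-executor workflow has no edges, so the entry node would have to come from another source (for example, the workflow's declared start executor) rather than from edge inference.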
4. Event Mapping Strategy
| MAF Event | AgentEval Event | Purpose |
|---|---|---|
| SuperStepStartedEvent | WorkflowStepStartEvent | Workflow execution phase begins |
| ExecutorInvokedEvent | ExecutorStartEvent | Agent begins processing |
| AgentResponseUpdateEvent | StreamingTokenEvent | Real-time LLM token output |
| ExecutorCompletedEvent | ExecutorCompleteEvent | Agent finishes processing |
| SuperStepCompletedEvent | WorkflowStepCompleteEvent | Workflow execution phase ends |
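The mapping table lends itself to a single pattern-matching switch. The sketch below illustrates the idea with hypothetical stand-in records that reuse the event names from the table; the payload fields (`StepNumber`, `ExecutorId`, `Text`), the base records, and the `EventMapper` class are assumptions made for illustration and do not reflect the real MAF or AgentEval type definitions.

```csharp
using System;

// Hypothetical stand-ins: the real event types carry different, richer data.
public abstract record MafEvent;
public sealed record SuperStepStartedEvent(int StepNumber) : MafEvent;
public sealed record ExecutorInvokedEvent(string ExecutorId) : MafEvent;
public sealed record AgentResponseUpdateEvent(string ExecutorId, string Text) : MafEvent;
public sealed record ExecutorCompletedEvent(string ExecutorId) : MafEvent;
public sealed record SuperStepCompletedEvent(int StepNumber) : MafEvent;
public sealed record UnmappedEvent : MafEvent; // any event the table does not list

public abstract record EvalEvent;
public sealed record WorkflowStepStartEvent(int StepNumber) : EvalEvent;
public sealed record ExecutorStartEvent(string ExecutorId) : EvalEvent;
public sealed record StreamingTokenEvent(string ExecutorId, string Token) : EvalEvent;
public sealed record ExecutorCompleteEvent(string ExecutorId) : EvalEvent;
public sealed record WorkflowStepCompleteEvent(int StepNumber) : EvalEvent;

public static class EventMapper
{
    // Returns null for events without a mapping so the caller can skip them.
    public static EvalEvent? Convert(MafEvent e) => e switch
    {
        SuperStepStartedEvent s    => new WorkflowStepStartEvent(s.StepNumber),
        ExecutorInvokedEvent i     => new ExecutorStartEvent(i.ExecutorId),
        AgentResponseUpdateEvent u => new StreamingTokenEvent(u.ExecutorId, u.Text),
        ExecutorCompletedEvent c   => new ExecutorCompleteEvent(c.ExecutorId),
        SuperStepCompletedEvent s  => new WorkflowStepCompleteEvent(s.StepNumber),
        _                          => null
    };
}
```

Returning null for unknown events matches the null check in the event bridge's streaming loop and means a new MAF event type is dropped rather than throwing, at the cost of silently losing it until a mapping is added.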
Benefits
- Native MAF Integration: Uses MAF's actual execution engine, not simulation
- Event Fidelity: Captures real-time streaming events including token-level updates
- Graph Auto-Detection: Automatically extracts workflow structure using MAF's reflection APIs
- Protocol Compatibility: Handles both ChatProtocol and function-based workflow types
- Future-Proof: MAF API changes are absorbed inside the adapter, so AgentEval consumers keep a stable interface
Implementation Details
Workflow Creation Pattern
// 1. Create MAF workflow with WorkflowBuilder
var chatClient = azureClient.GetChatClient(deployment).AsIChatClient();
var planner = new ChatClientAgent(chatClient, new ChatClientAgentOptions
{
    Name = "Planner",
    ChatOptions = new() { Instructions = "Create content plans..." }
});
var writer = new ChatClientAgent(chatClient, new ChatClientAgentOptions
{
    Name = "Writer",
    ChatOptions = new() { Instructions = "Write content from plans..." }
});
// 2. Bind as executors with event emission
var plannerBinding = planner.BindAsExecutor(emitEvents: true);
var writerBinding = writer.BindAsExecutor(emitEvents: true);
// 3. Build MAF workflow (Planner is the start executor and feeds Writer)
var workflow = new WorkflowBuilder(plannerBinding)
    .AddEdge(plannerBinding, writerBinding)
    .Build();
// 4. Create AgentEval adapter
var adapter = MAFWorkflowAdapter.FromMAFWorkflow(
    workflow, "ContentPipeline", ["Planner", "Writer"]);
// 5. Evaluate with standard harness (testCase is defined by the caller)
var harness = new WorkflowEvaluationHarness();
var result = await harness.RunWorkflowTestAsync(adapter, testCase);
Error Handling Strategy
- MAF Errors: Captured via MAF's error events and mapped to AgentEval error model
- Timeout Errors: Hard timeout wrapper prevents indefinite hangs (see ADR-011)
- Protocol Errors: EventBridge gracefully degrades if protocol detection fails
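The hard timeout mentioned above has to work even when the underlying call ignores its cancellation token. One common way to get that behavior in .NET is to race the work against a delay with Task.WhenAny. The sketch below shows that general technique; the `HardTimeout` class and its signature are hypothetical, and the mechanism AgentEval actually uses is specified in ADR-011, not here.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class HardTimeout
{
    // Runs 'operation' but stops waiting after 'timeout', even if the
    // operation never observes its cancellation token.
    public static async Task<T> RunAsync<T>(
        Func<CancellationToken, Task<T>> operation, TimeSpan timeout)
    {
        // Not disposed on purpose: an abandoned operation may still hold the token.
        var cts = new CancellationTokenSource();
        Task<T> work = operation(cts.Token);
        Task delay = Task.Delay(timeout);

        if (await Task.WhenAny(work, delay) == work)
            return await work; // finished in time; rethrows if it faulted

        cts.Cancel(); // best-effort cooperative cancellation
        throw new TimeoutException(
            $"Operation exceeded the {timeout.TotalSeconds:F1}s hard timeout.");
    }
}
```

Note that this only stops waiting: an operation that ignores the token keeps running in the background, so the caller regains control without reclaiming the resources the operation still holds.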
Performance Characteristics
- Overhead: <5% relative to direct MAF execution, primarily from event processing
- Memory: Comparable to MAF (events are streamed, not buffered)
- Latency: Real-time event streaming maintains MAF's responsive characteristics
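An overhead figure like the one above is typically expressed as the relative difference between an adapted run and a direct run. The sketch below shows that arithmetic along with a simple stopwatch helper; the `OverheadBenchmark` class is hypothetical and is not the benchmark used to produce the number quoted in this ADR.

```csharp
using System;
using System.Diagnostics;

public static class OverheadBenchmark
{
    // Relative overhead of the adapted run over the direct run, in percent.
    public static double OverheadPercent(TimeSpan direct, TimeSpan adapted)
    {
        if (direct <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(direct), "Baseline must be positive.");
        return 100.0 * (adapted.Ticks - direct.Ticks) / direct.Ticks;
    }

    // Times a single invocation of 'action' with a monotonic stopwatch.
    public static TimeSpan Measure(Action action)
    {
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.Elapsed;
    }
}
```

Because LLM response times vary far more than a few percent from call to call, a meaningful comparison would likely need many repetitions or stubbed model responses so that only the adapter's own work differs between the two runs.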
Alternatives Considered
Alternative 1: Workflow DSL
Create an AgentEval-specific workflow definition language.
Advantages:
- Full control over evaluation capabilities
- Optimized for testing scenarios
- Custom assertion APIs
Disadvantages:
- High development and maintenance cost
- Diverges from MAF standard
- Forces users to learn separate workflow system
- Can't evaluate production MAF workflows
Decision: Rejected - too much divergence from the MAF ecosystem.
Alternative 2: MAF Extension
Extend MAF with built-in evaluation capabilities.
Advantages:
- Native integration
- No adapter complexity
- Direct access to MAF internals
Disadvantages:
- Requires MAF team coordination
- AgentEval becomes MAF-dependent for core features
- Harder to support multiple MAF versions
- Evaluation concerns mixed with execution concerns
Decision: Rejected - violates separation of concerns.
Alternative 3: Interceptor Pattern
Intercept MAF workflow calls at the execution boundary.
Advantages:
- Transparent to workflow definition
- Could work with any execution engine
Disadvantages:
- Complex implementation
- Fragile to MAF internal changes
- Limited event visibility
- Difficult to extract graph structure
Decision: Rejected - too fragile and complex.
Consequences
Positive
- Real-World Fidelity: Tests actual production workflows, not simulations
- MAF Ecosystem Alignment: Maintains compatibility with MAF evolution
- Comprehensive Event Capture: Full visibility into workflow execution including streaming
- Developer Experience: Familiar MAF workflow creation patterns
- Performance Visibility: Actual timing, cost, and resource usage data
Negative
- MAF Dependency: AgentEval workflow features require MAF installation
- Version Coupling: Must maintain compatibility with MAF version evolution
- Protocol Complexity: ChatProtocol handling adds implementation complexity
- Testing Complexity: Integration tests require Azure OpenAI credentials
Implementation Risks
- MAF Breaking Changes: Future MAF versions might break event processing
- Event Model Evolution: MAF event structure changes could require adapter updates
- Performance Degradation: Event processing overhead might impact large workflows
Mitigation Strategies
- Version Pinning: Pin to specific MAF versions with tested compatibility
- Graceful Degradation: EventBridge falls back to basic execution if event processing fails
- Integration Test Coverage: Comprehensive tests against actual MAF workflows
- Performance Monitoring: Track adapter overhead in benchmarks
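Graceful degradation can take several forms. One possible interpretation, sketched below, is per-event: if converting a single event throws, the stream emits a fallback value instead of aborting the whole run. The `DegradingBridge` class and its delegates are hypothetical, and the actual EventBridge fallback ("basic execution") may work differently.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class DegradingBridge
{
    // Converts each raw event; when conversion throws, emits the fallback
    // value for that event and continues with the rest of the stream.
    public static async IAsyncEnumerable<TOut> ConvertOrFallback<TIn, TOut>(
        IAsyncEnumerable<TIn> source,
        Func<TIn, TOut> convert,
        Func<TIn, Exception, TOut> fallback)
    {
        await foreach (var raw in source)
        {
            TOut result;
            try
            {
                result = convert(raw);
            }
            catch (Exception ex)
            {
                result = fallback(raw, ex);
            }
            // The yield sits outside the try/catch because C# iterators do not
            // allow 'yield return' inside a try block that has a catch clause.
            yield return result;
        }
    }
}
```

A caller could pass a fallback that wraps the raw event in a generic "unknown event" record, which keeps the timeline complete while flagging the entries that need a proper mapping.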
Implementation Timeline
- Phase 1 (Completed): Basic adapter with sequential workflow support
- Phase 2 (Completed): Event streaming and graph extraction
- Phase 3 (Completed): Tool usage tracking across workflow steps
- Phase 4 (Future): Conditional routing and parallel execution support
Related ADRs
- ADR-011: Workflow Event Processing and Timeout Handling
- ADR-012: Workflow Assertion Design
- ADR-004: Trace Recording and Replay - Workflow trace support
This ADR documents the architectural foundation for MAF workflow integration in AgentEval, establishing patterns for event processing, graph extraction, and evaluation harness integration.