# Naming Conventions
This document defines naming conventions for AgentEval APIs, metrics, and code.
📋 Decision Records: For detailed rationale behind these conventions, see the ADR folder.
## Metric Names

- ADR: `001-metric-naming-prefixes.md`
- ADR: `007-metrics-taxonomy.md`
### Prefix Convention
Metrics use prefixes to indicate their computation method and cost:
| Prefix | Computation | Cost | MetricCategory Flag |
|---|---|---|---|
| `llm_` | LLM-evaluated via prompt | $$$ (API calls) | `MetricCategory.LLMBased` |
| `code_` | Computed by code logic | Free | `MetricCategory.CodeBased` |
| `embed_` | Computed via embeddings | $ (embedding API) | `MetricCategory.EmbeddingBased` |
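To illustrate the prefix convention, the sketch below maps a metric name to its computation-method flag. This is a minimal example, not AgentEval's actual API; the helper name and the `MetricCategory.None` fallback are assumptions.

```csharp
// Minimal sketch, not AgentEval's API: infer the computation-method flag
// from a metric name's prefix, per the table above.
// The helper name and the MetricCategory.None fallback are assumptions.
static MetricCategory ComputationFlagFromName(string metricName) => metricName switch
{
    var n when n.StartsWith("llm_")   => MetricCategory.LLMBased,
    var n when n.StartsWith("code_")  => MetricCategory.CodeBased,
    var n when n.StartsWith("embed_") => MetricCategory.EmbeddingBased,
    _                                 => MetricCategory.None
};
```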
### Domain Categories
Metrics are also categorized by evaluation domain:
| Domain | Interface | MetricCategory Flag | Examples |
|---|---|---|---|
| RAG | `IRAGMetric` | `MetricCategory.RAG` | Faithfulness, Relevance |
| Agentic | `IAgenticMetric` | `MetricCategory.Agentic` | Tool Selection, Tool Success |
| Conversation | Special | `MetricCategory.Conversation` | ConversationCompleteness |
| Safety | `ISafetyMetric` | `MetricCategory.Safety` | Toxicity (planned) |
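Because each metric carries both a computation-method flag and a domain flag, `MetricCategory` behaves like a flags enum. The sketch below shows one plausible shape: the flag names come from the two tables above, but the bit values and the `None` member are assumptions.

```csharp
// Plausible shape of MetricCategory as a [Flags] enum; the flag names come from
// the tables above, but the bit values and the None member are assumptions.
[Flags]
public enum MetricCategory
{
    None           = 0,

    // Computation method
    LLMBased       = 1 << 0,
    CodeBased      = 1 << 1,
    EmbeddingBased = 1 << 2,

    // Evaluation domain
    RAG            = 1 << 3,
    Agentic        = 1 << 4,
    Conversation   = 1 << 5,
    Safety         = 1 << 6,
}

// A metric such as llm_faithfulness would then carry both dimensions:
// MetricCategory.LLMBased | MetricCategory.RAG.
```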
### Complete Metric Reference
| Metric Name | Type | Description |
|---|---|---|
| `llm_faithfulness` | LLM | Response grounded in context |
| `llm_relevance` | LLM | Response addresses the question |
| `llm_context_precision` | LLM | Retrieved context is relevant |
| `llm_context_recall` | LLM | Context contains needed info |
| `llm_answer_correctness` | LLM | Answer matches ground truth |
| `llm_tool_selection` | LLM | Correct tools were chosen |
| `llm_tool_arguments` | LLM | Tool arguments are correct |
| `llm_task_completion` | LLM | Task was completed successfully |
| `embed_answer_similarity` | Embedding | Answer similar to ground truth |
| `embed_response_context` | Embedding | Response relates to context |
| `embed_query_context` | Embedding | Query relates to context |
| `code_tool_success` | Code | Tools executed without errors |
| `code_tool_efficiency` | Code | Minimal tool calls used |
### Usage Examples

```csharp
// Filter by cost
var freeMetrics = metrics.Where(m => m.Name.StartsWith("code_"));
var llmMetrics = metrics.Where(m => m.Name.StartsWith("llm_"));
```
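Filtering can also be expressed against `MetricCategory` flags rather than name prefixes. The sketch below assumes each metric exposes a `Category` property; that property is an assumption for illustration, not a documented member.

```csharp
// Sketch: filter by MetricCategory flags instead of name prefix.
// Assumes each metric exposes a Category property of type MetricCategory.
var ragMetrics = metrics.Where(m => m.Category.HasFlag(MetricCategory.RAG));
var codeMetrics = metrics.Where(m => m.Category.HasFlag(MetricCategory.CodeBased));
```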
## Result File Structure (Proposed)

- ADR: `002-result-directory-structure.md`
- **Status**: Proposed (not yet implemented)
When the DirectoryExporter is implemented, it will produce:
| File | Purpose | Format |
|---|---|---|
| `results.jsonl` | Per-test results | JSON Lines |
| `summary.json` | Aggregate statistics | JSON |
| `run.json` | Run metadata | JSON |
| `config.json` | Original config copy | JSON |
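For orientation, a run exported by the proposed DirectoryExporter would place these four files in a single output directory; the directory name below is a placeholder, not part of the proposal.

```
<output-directory>/
├── results.jsonl
├── summary.json
├── run.json
└── config.json
```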
**Current state**: AgentEval exports single files via `JsonExporter`, `JUnitXmlExporter`, etc.
## Test Data Files (Existing)

AgentEval's `DatasetLoader` supports these formats:
| Extension | Format | Loader Method |
|---|---|---|
| `.jsonl` | JSON Lines | `LoadJsonlAsync()` |
| `.json` | JSON Array | `LoadJsonAsync()` |
| `.csv` | CSV | `LoadCsvAsync()` |
| `.yaml` | YAML | `LoadYamlAsync()` |
These are existing conventions, not new proposals.
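A minimal loading sketch follows. Only the method names above are documented; the parameterless constructor, the file path, and the shape of the returned collection are assumptions.

```csharp
// Minimal sketch: load a JSON Lines dataset with DatasetLoader.
// The parameterless constructor, the file path, and the returned
// collection type are assumptions; only the method name is documented.
var loader = new DatasetLoader();
var testCases = await loader.LoadJsonlAsync("datasets/rag-eval.jsonl");
```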
## Class Naming

### Metrics

`[Category][Name]Metric`

Examples:

- `FaithfulnessMetric`
- `ToolSelectionMetric`
- `ResponseLengthMetric`
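To make the pattern concrete, here is a hypothetical skeleton for a RAG metric class; the members shown for `IRAGMetric` are assumptions for illustration, not its actual definition.

```csharp
// Hypothetical skeleton following the [Category][Name]Metric pattern.
// The IRAGMetric members shown here are assumptions.
public sealed class FaithfulnessMetric : IRAGMetric
{
    public string Name => "llm_faithfulness";

    public MetricCategory Category =>
        MetricCategory.LLMBased | MetricCategory.RAG;
}
```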
### Assertions

`[Subject]Assertions`

Examples:

- `ToolUsageAssertions`
- `ResponseAssertions`
- `PerformanceAssertions`
- `WorkflowAssertions`
### Exporters

`[Format]Exporter`

Examples:

- `JsonExporter`
- `JUnitExporter`
- `MarkdownExporter`
- `DirectoryExporter`
## CLI Commands

### Command Structure

```
agenteval <command> [subcommand] [options]
```
### Commands

| Command | Purpose |
|---|---|
| `eval` | Run evaluation |
| `init` | Initialize config |
| `list` | List available items |
| `summary` | View run summaries |
| `diff` | Compare runs |
### Options

- Use `--kebab-case` for multi-word options
- Use short forms for common options: `-o` (output), `-c` (config), `-v` (verbose)
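Putting the command structure and option conventions together, invocations might look like the following; the file names, run identifiers, and the `metrics` argument to `list` are illustrative assumptions.

```
# Hypothetical invocations; file names, run identifiers, and the
# "metrics" argument are placeholders, not documented values.
agenteval eval -c agenteval.json -o results.json
agenteval list metrics
agenteval diff run-001 run-002 -v
```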
## See Also
- Metrics Reference - Complete metric catalog with usage guidance
- Evaluation Guide - How to choose the right metrics
- Architecture - System design and metric hierarchy
Last updated: January 2026