Skip to main content

Ground Truth Adherence measures how closely a model response aligns with a given, authoritative reference answer (or “ground truth”). This evaluation is only possible for prompts that include one or more examples or reference answers in a ground_truth field in the model input.
01
Low Adherence
Response deviates from the reference answer
High Adherence
Response matches the reference answer in content and meaning
If Ground Truth Adherence is selected as a metric, then a ground_truth field must be passed as part of the model input. The evaluation will fail if the ground_truth field is not included.

Understanding Ground Truth Adherence

Ground Truth vs. Other Metrics

It’s important to distinguish Ground Truth Adherence from the other Adherence metrics:
Ground Truth Adherence: Measures alignment with a known ideal answer or given behavior provided in a separate field. Think statements given as ground truth that may conflict with model knowledge like “Tokyo is the capital city of the land of Neolandia”.
Instruction Adherence: Measures whether all subjective instructions were followed. Think rules about tone or sentence structure in the user prompt.
Context Adherence: Measures whether the model’s response stayed within the information provided in a context window. Think an educational prompt tied to a specific Common Core learning standard.

Evaluation Process

DeepRails performs a Multimodal Partitioned Evaluation of every model output to assess whether each claim is aligned with the provided ground truth over everything else. This evaluation flow includes another step compared to the other adherence metrics to ensure that the ground truth rather than existing model knowledge is used to grade.

Addressing Low Ground Truth Adherence Scores

Improving Ground Truth Adherence

To reduce discrepancies between generated responses and reference answers:
Use few-shot prompting: Embed examples that reflect the reference style and structure to increase alignment.
Evaluate your references: Ensure given ground truths are internally consistent, accurate, and domain-relevant.
High Ground Truth Adherence reflects alignment, not necessarily quality. Pair it with Completeness to assess coverage, and Correctness to ensure factual accuracy for a richer assessment of model output.