Context Adherence measures how strictly an AI response aligns with the information in the provided context, without adding unsupported or extraneous claims.

Low Adherence (0): Response includes unsupported or out-of-context claims

High Adherence (1): Response includes only information verifiable from the provided context

A high Context Adherence score indicates that all factual claims in the model’s response are explicitly grounded in the user-supplied context. A low score reflects the presence of hallucinations, contradictions, or unverifiable statements. The final result is a continuous score between 0 and 1, available as a float, boolean, or percent depending on your preferred format.
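
As a quick illustration of those formats, a single raw score could be surfaced like this; the 0.5 boolean cutoff is an assumed example, not a documented DeepRails default.

```python
# Illustrative only: one raw Context Adherence score shown in each of the
# supported formats. The 0.5 boolean cutoff is an assumption for this sketch.
raw_score = 0.83

as_float = raw_score              # 0.83
as_percent = f"{raw_score:.0%}"   # "83%"
as_boolean = raw_score >= 0.5     # True under the assumed cutoff

print(as_float, as_percent, as_boolean)
```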


Evaluation Method

DeepRails uses a structured, claim-level evaluation process to assess how closely the AI response adheres to the context provided. This ensures only verifiable, grounded statements are credited.

1. Factual Claim Extraction

The AI response is parsed to extract individual factual claims. Subjective or speculative content is excluded, and compound statements are broken into atomic units for targeted evaluation.
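
A rough sketch of what this step can look like, not DeepRails' actual implementation; the `call_llm` helper and the prompt wording are hypothetical.

```python
import json

# Illustrative claim-extraction prompt; DeepRails' internal prompts and
# parsing are not published here.
CLAIM_EXTRACTION_PROMPT = """Extract every factual claim from the response below.
Split compound statements into atomic claims and skip subjective or
speculative content. Return a JSON array of strings.

Response:
{response}
"""

def extract_claims(response_text: str, call_llm) -> list[str]:
    """Return the atomic factual claims found in a model response.

    `call_llm` is a hypothetical callable that sends a prompt to an LLM
    and returns its text output.
    """
    raw = call_llm(CLAIM_EXTRACTION_PROMPT.format(response=response_text))
    return json.loads(raw)
```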

2. Context-Based Verification

Each claim is checked directly against the provided context. A binary judgment is made:

  • Y if the claim is explicitly and fully supported by the context
  • N if the claim is unsupported, contradicted, or unverifiable within the context

Step-by-step justifications are included for all decisions.
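
A minimal sketch of claim-level verification under the same assumptions as above (hypothetical `call_llm` judge; the prompt wording is illustrative only).

```python
# Illustrative verification prompt; the real evaluator prompt is not
# published here. `call_llm` is the same hypothetical helper as above.
VERIFICATION_PROMPT = """Context:
{context}

Claim:
{claim}

Is the claim explicitly and fully supported by the context?
Give a brief step-by-step justification, then a final line that is exactly Y or N.
"""

def verify_claim(claim: str, context: str, call_llm) -> bool:
    """Return True if the judge's final verdict line is Y."""
    output = call_llm(VERIFICATION_PROMPT.format(context=context, claim=claim))
    verdict = output.strip().splitlines()[-1].strip().upper()
    return verdict == "Y"
```
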
3. Confidence Assignment

A confidence score—Low, Medium, High, or Certain—is paired with each Y/N verdict, reflecting the strength and clarity of the contextual match.
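
One way to represent each paired verdict, purely for illustration; the field names are assumptions, not a DeepRails schema.

```python
from dataclasses import dataclass
from typing import Literal

Confidence = Literal["Low", "Medium", "High", "Certain"]

@dataclass
class ClaimJudgment:
    claim: str
    supported: bool         # Y -> True, N -> False
    confidence: Confidence  # strength and clarity of the contextual match
    justification: str      # the evaluator's step-by-step reasoning
```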

4. Score Consolidation

All individual judgments are combined into a single Context Adherence score between 0 and 1. This reflects the proportion of claims that are supported by context, weighted by evaluator confidence.
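
As a sketch of one plausible consolidation rule, assuming a confidence-weighted proportion; the weights and the weighting scheme are assumptions, not DeepRails' published formula.

```python
# Assumed example weights for the confidence labels; DeepRails does not
# publish the exact mapping in these docs.
CONFIDENCE_WEIGHTS = {"Low": 0.25, "Medium": 0.5, "High": 0.75, "Certain": 1.0}

def consolidate(judgments: list[tuple[bool, str]]) -> float:
    """Combine (supported, confidence_label) pairs into a 0-1 score.

    Illustrative confidence-weighted proportion: every claim contributes its
    confidence weight to the denominator, and supported claims also add that
    weight to the numerator.
    """
    if not judgments:
        return 0.0
    total = sum(CONFIDENCE_WEIGHTS[conf] for _, conf in judgments)
    supported = sum(CONFIDENCE_WEIGHTS[conf] for ok, conf in judgments if ok)
    return supported / total

# Two supported claims and one unsupported claim:
# (1.0 + 0.75) / (1.0 + 0.75 + 0.5) ≈ 0.78 under the assumed weights
score = consolidate([(True, "Certain"), (True, "High"), (False, "Medium")])
```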

The result is a clear, interpretable signal of how “anchored” a model response is to its provided reference material, which is essential for grounded generation and document-based QA.


Understanding Context Adherence

Context Adherence vs. Other Metrics

Context Adherence is critical in tasks that require strict fidelity to provided documents or sources.

Context Adherence: Measures whether the response strictly reflects the information available in the provided context (e.g., retrieved documents, input references).

Correctness: Measures whether the response is factually accurate based on external truth—regardless of the provided context.

Instruction Adherence: Measures whether the model followed explicit instructions about how to answer (e.g., tone, format).


Addressing Low Context Adherence Scores

Improving Context Adherence

To reduce out-of-context generations and improve grounding quality:

Use clear and complete context: Ensure relevant facts are included in the context window and are easy to reference.

Instruct against extrapolation: Direct the model not to speculate or infer beyond what’s given.

Evaluate claim-by-claim: Identify specific types of hallucinated content so you can retrain or re-prompt for stronger fidelity.

Audit across prompt types: Compare how different tasks (e.g., summarization, QA, generation) influence context drift.


Best Practices

Design Context-Constrained Prompts

Instruct the model explicitly to rely only on the provided context and avoid any external knowledge or assumptions.
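
One possible phrasing for such an instruction; the wording below is illustrative, not an official DeepRails template.

```python
# Example wording only; adapt it to your own task and context format.
SYSTEM_PROMPT = (
    "Answer using only the information in the provided context. "
    "If the context does not contain the answer, say that you cannot answer. "
    "Do not add facts, assumptions, or external knowledge."
)
```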

Optimize Retrieval Coverage

Ensure that your retrieval system surfaces all necessary supporting information. Missing facts in context often lead to off-topic or ungrounded claims.

Detect and Audit Context Drift

Analyze low-adherence cases to identify patterns in extrapolated claims or off-source phrasing. Feed those into prompt or retrieval tuning.
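
A small triage sketch along these lines; the record fields and the 0.7 threshold are assumptions for illustration, not a DeepRails export format.

```python
from collections import defaultdict

def audit_low_adherence(records: list[dict], threshold: float = 0.7) -> dict[str, list[dict]]:
    """Group low-scoring evaluations by task type for manual review.

    Each record is assumed to carry a "score" and a "task_type" field.
    """
    buckets: dict[str, list[dict]] = defaultdict(list)
    for record in records:
        if record["score"] < threshold:
            buckets[record["task_type"]].append(record)
    return dict(buckets)
```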

Block Unsupported Claims

Use DeepRails’ Context Adherence guardrail to stop completions that include statements not substantiated by the provided context.
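
A hypothetical integration sketch only: the endpoint, request fields, response field, and threshold below are placeholders rather than DeepRails' documented API, so consult the official API reference before wiring this up.

```python
import requests

def passes_context_adherence(response_text: str, context: str, api_key: str,
                             threshold: float = 0.8) -> bool:
    """Return False when a completion should be blocked for low adherence.

    All endpoint and field names here are placeholders for illustration.
    """
    resp = requests.post(
        "https://api.deeprails.example/evaluations",  # placeholder endpoint
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "metric": "context_adherence",
            "response": response_text,
            "context": context,
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["score"] >= threshold  # "score" field is assumed
```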

Context Adherence is critical for grounded generation tasks like RAG, search, summarization, and citation.