Context Adherence
Evaluate whether AI responses stay strictly within the bounds of the provided context using DeepRails Guardrail Metrics.
Context Adherence measures how strictly an AI response aligns with the information in the provided context, without adding unsupported or extraneous claims.
- 0 (Low Adherence): Response includes unsupported or out-of-context claims
- 1 (High Adherence): Response only includes information verifiable from the provided context
A high Context Adherence score indicates that all factual claims in the model’s response are explicitly grounded in the user-supplied context. A low score reflects the presence of hallucinations, contradictions, or unverifiable statements. The final result is a continuous score between 0 and 1, available as a float, boolean, or percent depending on your preferred format.
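As a quick illustration of those output formats, the sketch below shows how a single raw score might be surfaced as a float, a percent string, or a pass/fail boolean. The 0.5 cutoff is an assumption for this example, not a DeepRails default.

```python
# Illustrative only: presenting one raw Context Adherence score
# in the three formats mentioned above.
# The 0.5 pass/fail cutoff is an assumed example value.

raw_score = 0.82                      # continuous score in [0, 1]

as_float = raw_score                  # 0.82
as_percent = f"{raw_score:.0%}"       # "82%"
as_boolean = raw_score >= 0.5         # True under the assumed cutoff

print(as_float, as_percent, as_boolean)
```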
Evaluation Method
DeepRails uses a structured, claim-level evaluation process to assess how closely the AI response adheres to the context provided. This ensures only verifiable, grounded statements are credited.
Factual Claim Extraction
The AI response is parsed to extract individual factual claims. Subjective or speculative content is excluded, and compound statements are broken into atomic units for targeted evaluation.
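For intuition, the sketch below shows how a compound sentence from a response might be split into atomic factual claims while a subjective remark is excluded. The structure and field names are illustrative assumptions, not the DeepRails schema.

```python
# Illustrative sketch of factual claim extraction (field names are assumptions).
response = (
    "The warranty covers parts for 24 months and labor for 12 months, "
    "which seems quite generous."
)

extracted_claims = [
    {"claim": "The warranty covers parts for 24 months."},  # atomic factual claim
    {"claim": "The warranty covers labor for 12 months."},  # atomic factual claim
    # "which seems quite generous" is subjective and therefore excluded
]
```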
Context-Based Verification
Each claim is checked directly against the provided context. A binary judgment is made:
- Y if the claim is explicitly and fully supported by the context
- N if the claim is unsupported, contradicted, or unverifiable within the context
Step-by-step justifications are included for all decisions.
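A hypothetical verdict record for the claims above, checked against a context that only mentions the parts warranty, might look like the sketch below; the field names and justification wording are illustrative assumptions.

```python
# Illustrative claim-level verdicts (structure is an assumption, not the API output).
verdicts = [
    {
        "claim": "The warranty covers parts for 24 months.",
        "supported": "Y",
        "justification": "The context states parts are covered for 24 months.",
    },
    {
        "claim": "The warranty covers labor for 12 months.",
        "supported": "N",
        "justification": "The context does not mention labor coverage.",
    },
]
```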
Confidence Assignment
A confidence score—Low, Medium, High, or Certain—is paired with each Y/N verdict, reflecting the strength and clarity of the contextual match.
Score Consolidation
All individual judgments are combined into a single Context Adherence score between 0 and 1. This reflects the proportion of claims that are supported by context, weighted by evaluator confidence.
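One way to picture this consolidation is as a confidence-weighted proportion, sketched below. The specific confidence-to-weight mapping is an assumption for illustration and may differ from the evaluator's internal weighting.

```python
# Illustrative consolidation: proportion of supported claims, weighted by
# evaluator confidence. The weight values are assumed for this sketch.
CONFIDENCE_WEIGHTS = {"Low": 0.25, "Medium": 0.5, "High": 0.75, "Certain": 1.0}

def consolidate(judgments):
    """judgments: list of (supported: 'Y'/'N', confidence: str) tuples."""
    total = sum(CONFIDENCE_WEIGHTS[conf] for _, conf in judgments)
    supported = sum(CONFIDENCE_WEIGHTS[conf] for verdict, conf in judgments if verdict == "Y")
    return supported / total if total else 0.0

# Two claims: one supported with high confidence, one unsupported with certainty.
print(consolidate([("Y", "High"), ("N", "Certain")]))  # 0.75 / 1.75, about 0.43
```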
The result is a clear, interpretable signal of how “anchored” a model response is to its provided reference material, which is essential for grounded generation and document-based QA.
Understanding Context Adherence
Context Adherence vs. Other Metrics
Context Adherence is critical in tasks that require strict fidelity to provided documents or sources.
Context Adherence: Measures whether the response strictly reflects the information available in the provided context (e.g., retrieved documents, input references).
Correctness: Measures whether the response is factually accurate based on external truth—regardless of the provided context.
Instruction Adherence: Measures whether the model followed explicit instructions about how to answer (e.g., tone, format).
Addressing Low Context Adherence Scores
Improving Context Adherence
To reduce out-of-context generations and improve grounding quality:
Use clear and complete context: Ensure relevant facts are included in the context window and are easy to reference.
Instruct against extrapolation: Direct the model not to speculate or infer beyond what’s given.
Evaluate claim-by-claim: Identify specific types of hallucinated content so you can retrain or re-prompt for stronger fidelity.
Audit across prompt types: Compare how different tasks (e.g., summarization, QA, generation) influence context drift.
Best Practices
Design Context-Constrained Prompts
Instruct the model explicitly to rely only on the provided context and avoid any external knowledge or assumptions.
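A minimal prompt sketch along these lines might look like the following; the wording and placeholder values are examples, not a required template.

```python
# Illustrative context-constrained prompt template (wording is an example).
PROMPT_TEMPLATE = """Answer the question using ONLY the context below.
If the context does not contain the answer, reply "I don't know."
Do not use outside knowledge or make assumptions.

Context:
{context}

Question:
{question}
"""

prompt = PROMPT_TEMPLATE.format(
    context="The warranty covers parts for 24 months.",
    question="How long are parts covered under the warranty?",
)
print(prompt)
```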
Optimize Retrieval Coverage
Ensure that your retrieval system surfaces all necessary supporting information. Missing facts in context often lead to off-topic or ungrounded claims.
Detect and Audit Context Drift
Analyze low-adherence cases to identify patterns in extrapolated claims or off-source phrasing. Feed those into prompt or retrieval tuning.
Block Unsupported Claims
Use DeepRails’ Context Adherence guardrail to stop completions that include statements not substantiated by the provided context.
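The exact guardrail call depends on your DeepRails integration; as a generic illustration of the gating pattern, the sketch below withholds a completion whose Context Adherence score falls under an assumed threshold.

```python
# Generic gating sketch (not the DeepRails API): block or surface a completion
# based on its Context Adherence score. The threshold is an assumed example.
ADHERENCE_THRESHOLD = 0.8

def gate_completion(completion: str, adherence_score: float) -> str:
    if adherence_score < ADHERENCE_THRESHOLD:
        # Fall back instead of returning an ungrounded answer.
        return "I can't answer that from the provided documents."
    return completion

print(gate_completion("Labor is covered for 12 months.", adherence_score=0.43))
```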
Context Adherence is critical for grounded generation tasks like RAG, search, summarization, and citation.