Context Adherence measures how closely an AI response aligns with the information in the context provided in or with the user prompt. This metric is only appropriate for prompts that include a significant context window.
Low Adherence: Response includes unsupported or out-of-context statements.
High Adherence: Response only includes information verifiable from the provided context.
Understanding Context Adherence
Context Adherence vs. Other Metrics
Context Adherence: Measures whether the response reflects the information available in the provided context (e.g., retrieved documents, input references).
Correctness: Measures whether the response is factually accurate based on external truth—regardless of the provided context.
Instruction Adherence: Measures whether the model followed how it was supposed to answer, based on explicit instructions (e.g., tone, format).
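To make the distinction concrete, the sketch below labels two hypothetical responses with how each metric might judge them. The context, instruction, responses, and labels are all illustrative assumptions, not DeepRails output.

```python
# Context given to the model: "Our Q3 revenue was $2.1M."
# Instruction: "Reply in one sentence."
# Each response is mapped to an illustrative verdict per metric.
example_responses = {
    # Factually true and a single sentence, but nowhere in the context:
    "Paris is the capital of France.": dict(
        context_adherence="low",
        correctness="high",
        instruction_adherence="high",
    ),
    # Grounded in the context, accurate, and a single sentence:
    "Q3 revenue was $2.1M.": dict(
        context_adherence="high",
        correctness="high",
        instruction_adherence="high",
    ),
}
```

The first response shows why Context Adherence and Correctness must be scored separately: a statement can be true in the world yet still ungrounded in the provided context.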
Evaluation Process
DeepRails performs a Multimodal Partitioned Evaluation of every model output to assess whether each claim is grounded in the context provided to the model. Because adherence metrics require deeper analysis of the model input than metrics like Correctness, the evaluation flow is somewhat more complex. Context Adherence evaluations terminate early if insufficient context is detected; in that case the output receives a default score of 100%.
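A minimal sketch of the partitioned flow described above: split the response into claims, score each claim against the context, and average. The word-overlap grounding test and the `min_context_chars` threshold are simplifying assumptions for illustration; the real evaluation is far more sophisticated. The insufficient-context default of 100% is from the text.

```python
def _words(text):
    """Lowercase tokens with trailing punctuation stripped."""
    return {w.strip(".,!?").lower() for w in text.split()}

def context_adherence(response_sentences, context, min_context_chars=40):
    """Toy partitioned adherence score.

    Each response sentence ("claim") scores 1.0 if all of its words
    appear in the context, else 0.0; the final score is the mean.
    """
    if len(context) < min_context_chars:
        return 1.0  # insufficient context detected: default 100% score
    context_words = _words(context)
    scores = [
        1.0 if _words(claim) <= context_words else 0.0
        for claim in response_sentences
    ]
    return sum(scores) / len(scores)
```

For example, with the context "The Eiffel Tower is 330 m tall after a 2022 antenna upgrade.", the claim "The Eiffel Tower is 330 m tall." is grounded while "It was painted gold in 2023." is not, yielding a score of 0.5 for a response containing both.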
Addressing Low Context Adherence Scores
Improving Context Adherence
Use clear and complete context: Ensure that all facts needed for an ideal response are included in the context window.
Instruct against extrapolation: Direct the model to avoid speculation or deviation in every prompt that uses a context window.
Audit across prompt types: Compare how different tasks (e.g., summarization, QA) influence context drift, and tailor models, prompts, and/or context windows to each.
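One way to apply the "instruct against extrapolation" advice is a prompt template like the sketch below. The wording is an illustrative assumption, not a DeepRails-provided template; adapt it to your task.

```python
# Illustrative grounded-answer template; the exact wording is an assumption.
GROUNDED_PROMPT = """Answer using ONLY the context below.
If the context does not contain the answer, reply "I don't know."
Do not speculate or add outside knowledge.

Context:
{context}

Question:
{question}"""

def build_prompt(context, question):
    """Fill the template with a specific context window and question."""
    return GROUNDED_PROMPT.format(context=context, question=question)
```

Explicitly naming a fallback response ("I don't know") gives the model a permitted action other than extrapolating, which tends to reduce out-of-context claims.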
Context Adherence is critical for grounded generation tasks like RAG, search, summarization, and citation. However, it should not be used in context-light applications.
