Role of Run Modes in Evaluations

Choosing the Right Run Mode

Choose your evaluation accuracy vs. cost tradeoff with Run Modes

Run Modes

Name	Description	When to Use	Example Use Case
Precision Plus	Our most rigorous evaluation mode. Uses advanced reasoning models across providers, routed per guardrail metric. Offers the most accurate and interpretable results, at higher latency and cost.	Use for critical evaluations like benchmarking, safety testing, or high-stakes applications.	Running a final evaluation sweep on a new healthcare agent before deployment.
Precision	A high-accuracy mode using a latency-aware selection of models. Balances depth of reasoning with production responsiveness.	Use when you want strong accuracy and reliability for evaluations running in active experiments or CI pipelines.	Monitoring prompt regressions across daily deployments of a legal research bot.
Smart (default)	Balanced mode that selects cost-effective models with solid accuracy across tasks. Default for all DeepRails workflows.	Best for iterative development, in-tool debugging, and general experimentation where cost matters.	Testing different prompt variants for summarizing financial documents.
Economy	Lowest-cost mode that uses compact models for fast, large-scale signal generation. Tradeoff: less precise than other modes.	Useful for early-stage data triage, bulk screening, or exploration where granular fidelity isn’t yet needed.	Screening 10,000 outputs from a code generation model to flag possible safety violations.
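
The guidance in the table can be sketched as a small decision helper. This is an illustrative sketch only: `RunMode`, `choose_run_mode`, and its three flags are hypothetical names invented for this example and are not part of any DeepRails SDK. The enum values are the display names from the table, not necessarily the identifiers the API expects.

```python
from enum import Enum


class RunMode(Enum):
    # Display names taken from the table above (assumption: the real
    # API may use different identifiers for these modes).
    PRECISION_PLUS = "Precision Plus"
    PRECISION = "Precision"
    SMART = "Smart"
    ECONOMY = "Economy"


def choose_run_mode(
    high_stakes: bool = False,
    production_pipeline: bool = False,
    bulk_triage: bool = False,
) -> RunMode:
    """Hypothetical helper that maps a workload to a run mode.

    The priority order (accuracy needs first, cost savings last) is an
    assumption made for this sketch, not documented behavior.
    """
    if high_stakes:
        # Benchmarking, safety testing, high-stakes applications.
        return RunMode.PRECISION_PLUS
    if production_pipeline:
        # Active experiments or CI pipelines that need responsiveness.
        return RunMode.PRECISION
    if bulk_triage:
        # Early-stage triage or bulk screening where fidelity matters less.
        return RunMode.ECONOMY
    # Iterative development and general experimentation (the default).
    return RunMode.SMART


if __name__ == "__main__":
    print(choose_run_mode(high_stakes=True).value)
    print(choose_run_mode(bulk_triage=True).value)
    print(choose_run_mode().value)
```

Checking the high-stakes case first means a workload that is both high-stakes and large-scale still gets the most rigorous mode. Reorder the checks if cost should take priority for your use case.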
