Run Modes control how DeepRails executes evaluations across Evaluate, Monitor, and Defend. Every run uses two different LLMs in parallel to reduce bias and improve accuracy. Run Modes determine which models are selected — from compact cost-efficient models to advanced reasoning models — so you can optimize for speed, accuracy, or cost depending on your workflow.
Why Run Modes Matter
Not every task requires the same evaluation depth. A simple summarization prompt can be tested cost-effectively with smaller models, while multi-step reasoning (math generation, chained steps, or multi-task prompts) benefits from reasoning-capable models. Run Modes let you tune this balance.
- Always two models in parallel: Every evaluation uses two distinct LLMs to generate scores, avoiding single-model bias.
- Reasoning vs. non-reasoning models: For complex prompts, modes that include reasoning models yield better accuracy and interpretability.
- Available everywhere: Run Modes apply uniformly across Evaluate, Monitor, and Defend APIs, on all plans.
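To make the "two models in parallel" idea concrete, here is a minimal Python sketch. It is not DeepRails code: `evaluator_a`, `evaluator_b`, and `evaluate_in_parallel` are hypothetical names, the scores are hard-coded stubs, and this page does not describe how the two scores are combined, so the mean at the end is only an illustrative assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


# Hypothetical stand-ins for two distinct evaluator models. In a real system
# each would call a different LLM; here they return fixed scores so the
# sketch runs offline.
def evaluator_a(output: str) -> float:
    return 0.80


def evaluator_b(output: str) -> float:
    return 0.90


def evaluate_in_parallel(
    output: str, evaluators: Dict[str, Callable[[str], float]]
) -> Dict[str, float]:
    """Run every evaluator concurrently and return each score keyed by name."""
    with ThreadPoolExecutor(max_workers=len(evaluators)) as pool:
        futures = {name: pool.submit(fn, output) for name, fn in evaluators.items()}
        return {name: future.result() for name, future in futures.items()}


scores = evaluate_in_parallel(
    "The capital of France is Paris.",
    {"model_a": evaluator_a, "model_b": evaluator_b},
)

# Assumption: a simple mean. The actual aggregation is not documented here.
mean_score = sum(scores.values()) / len(scores)
print(scores, round(mean_score, 2))
```

Keeping both individual scores alongside any aggregate makes it possible to spot cases where the two evaluators disagree.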
The Four Run Modes
Precision Plus
Uses two reasoning models in parallel for maximum depth. Best for complex, multi-step workflows and mission-critical use cases where accuracy outweighs cost or latency.
Precision
Combines one reasoning model and one non-reasoning model. Recommended for complex prompts that benefit from reasoning without incurring the full cost of Precision Plus.
Smart (default)
Balanced mode. Selects cost-effective, accurate models with solid coverage across most tasks. Default for all workflows.
Economy
Fastest and most cost-efficient mode. Uses compact models to generate signals at scale, with less precision than other modes.
Choosing whether to use reasoning models is often part of the prompt engineering process. If your task involves multi-step logic, mathematics, or complex instructions, Precision or Precision Plus is recommended.
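The four modes described above can be captured as a small lookup table in client code. The sketch below is one possible representation, not a DeepRails API: the `RunMode` enum, its string values, and `ModeProfile` are hypothetical names. The reasoning-model count is left as `None` for Smart and Economy because this page does not state it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RunMode(Enum):
    # String values are illustrative identifiers, not confirmed API values.
    PRECISION_PLUS = "precision_plus"
    PRECISION = "precision"
    SMART = "smart"
    ECONOMY = "economy"


@dataclass(frozen=True)
class ModeProfile:
    reasoning_models: Optional[int]  # None where the text does not specify
    summary: str


MODE_PROFILES = {
    RunMode.PRECISION_PLUS: ModeProfile(2, "two reasoning models; maximum depth"),
    RunMode.PRECISION: ModeProfile(1, "one reasoning and one non-reasoning model"),
    RunMode.SMART: ModeProfile(None, "balanced, cost-effective models"),
    RunMode.ECONOMY: ModeProfile(None, "compact models; fastest and cheapest"),
}

# Smart is documented as the default for all workflows.
DEFAULT_MODE = RunMode.SMART


def uses_reasoning(mode: RunMode) -> Optional[bool]:
    """Return True/False when the text states it, or None when unspecified."""
    count = MODE_PROFILES[mode].reasoning_models
    return None if count is None else count > 0


for mode, profile in MODE_PROFILES.items():
    print(f"{mode.value}: {profile.summary}")
```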
Choosing the Right Run Mode
Name | Description | When to Use | Example Use Case |
---|---|---|---|
Precision Plus | Two reasoning models in parallel; most rigorous and interpretable results (highest cost and latency). | Mission-critical evaluations, final QA sweeps, regulated or safety-sensitive domains. | Compliance evaluation on a healthcare agent before production. |
Precision | One reasoning + one non-reasoning model; strong reasoning coverage with better cost/latency than Plus. | Prompts with logic/calculations or multi-step reasoning where turnaround speed still matters. | Monitoring daily regressions in a legal research bot. |
Smart (default) | Balanced model selection optimized for cost and accuracy across general tasks. | Most workflows: iterative development, debugging, day-to-day experimentation. | Testing variations of prompts for summarizing financial documents. |
Economy | Compact, efficient models for lowest cost and fastest evaluation; less precise than higher modes. | Large batch screening, early exploration, low-stakes triage. | Screening 10,000 code-gen outputs to flag potential safety risks. |
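The "When to Use" guidance in the table can be approximated as a simple decision rule. The following Python sketch is a rough heuristic under stated assumptions: `choose_run_mode` and its three flags are hypothetical, the priority order (stakes first, then reasoning needs, then volume) is a simplification of the table, and the returned strings are the display names from this page rather than confirmed API values.

```python
def choose_run_mode(
    *,
    mission_critical: bool = False,
    multi_step_reasoning: bool = False,
    bulk_low_stakes: bool = False,
) -> str:
    """Pick a run mode name from coarse task traits (illustrative heuristic)."""
    if mission_critical:
        # Final QA sweeps, regulated or safety-sensitive domains.
        return "Precision Plus"
    if multi_step_reasoning:
        # Logic, calculations, or chained steps where turnaround still matters.
        return "Precision"
    if bulk_low_stakes:
        # Large batch screening, early exploration, low-stakes triage.
        return "Economy"
    # Default for iterative development and day-to-day experimentation.
    return "Smart"


print(choose_run_mode(mission_critical=True))
print(choose_run_mode(multi_step_reasoning=True))
print(choose_run_mode(bulk_low_stakes=True))
print(choose_run_mode())
```

Checking the highest-stakes condition first means a task that is both mission-critical and high-volume still gets the most rigorous mode; teams with tight budgets may prefer a different ordering.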