Run Modes give you control over evaluation cost and accuracy by selecting different HyperChainpoll strategies—ranging from ultra-efficient to high-precision multi-model evaluation.

Role of Run Modes in Evaluations

Every DeepRails evaluation request, whether from Evaluate, Defend, or Monitor, executes under a specific Run Mode, which determines the cost, accuracy, and model ensemble used.

Each Run Mode represents a balance between accuracy, cost, and latency. Under the hood, Run Modes route requests to different combinations of foundation models (via HyperChainpoll), using distinct orchestration strategies.

Users can choose the best Run Mode depending on their operational priorities:

Precision Plus

Highest evaluation accuracy using reasoning-optimized foundation models. Best for mission-critical benchmarks and QA.

Precision

Balanced accuracy and performance. Ideal for production monitoring and intelligent tuning.

Smart (default)

Optimized for overall performance. Good enough for most everyday analysis and fast iteration.

Economy

Cost-efficient. Uses fast models and smart polling strategies to reduce overhead.

You never need to select individual models. The DeepRails Evaluation Engine automatically selects the optimal foundation models from OpenAI, Anthropic, Google, Meta, and other providers based on the Guardrail Metric and task. Run Modes influence the depth and breadth of that ensemble.
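Because model selection is automatic, choosing a Run Mode amounts to setting a single field on the evaluation request. The sketch below is illustrative only: the helper function, field names, and mode identifiers are assumptions for this example, not the official DeepRails API.

```python
# Hypothetical sketch of building an evaluation request with a Run Mode.
# Field names and mode identifiers are assumptions, not the official API.

def build_evaluation_request(model_input, model_output,
                             guardrail_metrics, run_mode="smart"):
    """Assemble an evaluation payload; Smart is the default Run Mode."""
    valid_modes = {"precision_plus", "precision", "smart", "economy"}
    if run_mode not in valid_modes:
        raise ValueError(f"unknown run mode: {run_mode}")
    return {
        "input": model_input,
        "output": model_output,
        "guardrail_metrics": guardrail_metrics,
        "run_mode": run_mode,
    }

# Example: request a higher-accuracy evaluation for a CI pipeline.
payload = build_evaluation_request(
    "Summarize this contract.",
    "The contract grants a two-year license...",
    ["correctness"],
    run_mode="precision",
)
print(payload["run_mode"])  # precision
```

The point of the sketch is that only `run_mode` changes between cheap exploration and rigorous benchmarking; the rest of the request stays identical.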

Choosing the Right Run Mode

| Name | Description | When to Use | Example Use Case |
| --- | --- | --- | --- |
| Precision Plus | Our most rigorous evaluation mode. Uses advanced reasoning models across providers, routed per guardrail metric. Offers the most accurate and interpretable results, at higher latency and cost. | Use for critical evaluations like benchmarking, safety testing, or high-stakes applications. | Running a final evaluation sweep on a new healthcare agent before deployment. |
| Precision | A high-accuracy mode using a latency-aware selection of models. Balances depth of reasoning with production responsiveness. | When you want strong accuracy and reliability for evaluations running in active experiments or CI pipelines. | Monitoring prompt regressions across daily deployments of a legal research bot. |
| Smart (default) | Balanced mode that selects cost-effective models with solid accuracy across tasks. Default for all DeepRails workflows. | Best for iterative development, in-tool debugging, and general experimentation where cost matters. | Testing different prompt variants for summarizing financial documents. |
| Economy | Lowest-cost mode that uses compact models for fast, large-scale signal generation. Tradeoff: less precise than other modes. | Useful for early-stage data triage, bulk screening, or exploration where granular fidelity isn’t yet needed. | Screening 10,000 outputs from a code generation model to flag possible safety violations. |
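The guidance above can be condensed into a simple decision heuristic. This is a sketch of the selection logic only, using assumed mode identifiers; it is not part of any DeepRails SDK.

```python
# Illustrative heuristic mirroring the table above.
# Mode identifier strings are assumptions for this example.

def pick_run_mode(high_stakes: bool, in_ci: bool, bulk_screening: bool) -> str:
    """Map workload traits to a Run Mode, from most to least rigorous."""
    if high_stakes:
        return "precision_plus"  # final sweeps, safety testing, benchmarks
    if in_ci:
        return "precision"       # regression monitoring in active pipelines
    if bulk_screening:
        return "economy"         # large-scale triage and exploration
    return "smart"               # default for everyday iteration

# Example: screening thousands of outputs calls for the Economy mode.
print(pick_run_mode(high_stakes=False, in_ci=False, bulk_screening=True))  # economy
```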