Choose your evaluation accuracy vs. cost tradeoff with Run Modes
Name | Description | When to Use | Example Use Case |
---|---|---|---|
Precision Plus | Our most rigorous evaluation mode. Uses advanced reasoning models across providers, routed per guardrail metric. Offers the most accurate and interpretable results, at higher latency and cost. | Use for critical evaluations like benchmarking, safety testing, or high-stakes applications. | Running a final evaluation sweep on a new healthcare agent before deployment. |
Precision | A high-accuracy mode using a latency-aware selection of models. Balances depth of reasoning with production responsiveness. | When you want strong accuracy and reliability for evaluations running in active experiments or CI pipelines. | Monitoring prompt regressions across daily deployments of a legal research bot. |
Smart (default) | Balanced mode that selects cost-effective models with solid accuracy across tasks. Default for all DeepRails workflows. | Best for iterative development, in-tool debugging, and general experimentation where cost matters. | Testing different prompt variants for summarizing financial documents. |
Economy | Lowest-cost mode that uses compact models for fast, large-scale signal generation. Tradeoff: less precise than other modes. | Useful for early-stage data triage, bulk screening, or exploration where granular fidelity isn’t yet needed. | Screening 10,000 outputs from a code generation model to flag possible safety violations. |