Hallucinations in DeepRails are defined by user-set thresholds in Defend. Each guardrail metric returns a score; any score below the workflow's hallucination threshold for that metric is treated as a hallucination, meaning the output failed and needs remediation.
How Defend Identifies Hallucinations
- Users can customize each workflow by selecting guardrail metrics for its evaluations and either setting custom thresholds per metric or enabling DeepRails' automatic detection at high, medium, or low tolerance.
- Defend compares each metric's score to its configured hallucination threshold. If all metrics meet or exceed their thresholds, the output passes; if any metric falls below its threshold, the output is labeled a hallucination (see the sketch after this list).
- Hallucinations detected in evaluations drive remediation for the failed metrics. Based on the workflow’s improvement action (FixIt, ReGen, or Do Nothing) and retry limit, Defend attempts to improve the output and re-evaluates until all metrics pass or the retry budget is exhausted.
- DeepRails uses multimodal partitioned evaluation to analyze outputs more intelligently, running two evaluation models in parallel to greatly reduce the chance that a hallucination is missed (read more about multimodal partitioned evaluation here).
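To make the pass/fail rule concrete, here is a minimal sketch in Python. The function, metric names, scores, and thresholds below are hypothetical illustrations, not part of the DeepRails API:

```python
# Illustrative sketch of the pass/fail rule: an output is a hallucination
# if any metric scores below its configured threshold.

def detect_hallucination(scores: dict[str, float],
                         thresholds: dict[str, float]) -> list[str]:
    """Return the metrics whose score fell below the workflow's threshold."""
    return [metric for metric, score in scores.items()
            if score < thresholds[metric]]

# Hypothetical metric names and values for illustration only.
scores = {"correctness": 0.62, "completeness": 0.91, "instruction_adherence": 0.88}
thresholds = {"correctness": 0.75, "completeness": 0.80, "instruction_adherence": 0.80}

failed = detect_hallucination(scores, thresholds)
if failed:
    print(f"Hallucination detected; failed metrics: {failed}")  # ['correctness']
else:
    print("Output passed all guardrail metrics.")
```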
What Happens to Hallucinations
When Defend detects a hallucination, it sends the output to remediation using the workflow's selected improvement tool (learn more about improvement tools in the Defend Overview). The improved output is re-evaluated as soon as the tool finishes, as if it were a new event, completely independent of the original input/output pair. This evaluation-remediation cycle continues until all selected metrics pass or the workflow's max retries are reached. To reduce cost and latency, DeepRails Defend uses Assumed Pass logic: after a retry, only the metrics that failed in the previous output are re-evaluated, and scores and rationales for metrics that already passed are carried over to subsequent improvement attempts. This keeps remediation fast and efficient even when outputs have persistent hallucinations.
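The cycle with Assumed Pass carryover can be sketched as a simple loop. Everything below is illustrative: `evaluate` stands in for Defend's evaluation step and `improve` for the workflow's improvement tool; neither is an actual DeepRails function:

```python
# Sketch of the evaluation-remediation cycle with Assumed Pass logic.
# `evaluate(output, metrics)` is assumed to return a dict of metric -> score.

def defend_cycle(output, metrics, thresholds, evaluate, improve, max_retries):
    results = evaluate(output, metrics)          # score every selected metric once
    for _ in range(max_retries):
        failed = [m for m in metrics if results[m] < thresholds[m]]
        if not failed:
            break                                # all metrics pass; stop retrying
        output = improve(output, failed)         # remediate the failed metrics
        # Assumed Pass: re-evaluate only the metrics that just failed; scores
        # and rationales for metrics that already passed carry over unchanged.
        results.update(evaluate(output, failed))
    return output, results                       # passed, or retry budget exhausted
```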
The Impact of Run Modes
All of DeepRails' evaluations are affected by the selected run mode (all run modes and more details about them are listed here). Run modes like Precision Max and Precision Max Codex use advanced reasoning models, incurring more cost and latency, while Fast uses more balanced, cost-effective models. The more expensive run modes are more consistent at both detecting hallucinations and providing higher-quality improvements. Higher-quality hallucination detection and correction are worth the cost and latency for many applications, so experiment with different run modes before finalizing workflow configurations.
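As a rough illustration, the run mode is just another workflow-level choice made alongside metrics and retries. The field names and mode identifiers below are hypothetical stand-ins, not the actual DeepRails configuration schema:

```python
# Hypothetical workflow configurations contrasting two run modes.
# Field names and mode identifiers are illustrative assumptions.
fast_workflow = {
    "run_mode": "fast",  # balanced, cost-effective models
    "metrics": ["correctness", "completeness"],
    "max_retries": 2,
}

precision_workflow = {
    "run_mode": "precision_max",  # advanced reasoning models: more cost and
    "metrics": ["correctness", "completeness"],  # latency, more consistent detection
    "max_retries": 2,
}
```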
Threshold Types
DeepRails offers two types of hallucination thresholds in Defend: automatic tolerances and custom thresholds. Automatic tolerances are designed for ease of use, while custom thresholds give experienced users more control.
Automatic Tolerances
- Choose qualitative tolerance levels (`low`, `medium`, or `high`) per guardrail using `automatic_hallucination_tolerance_levels` (see the example after the table below).
- DeepRails translates each tolerance into numeric thresholds and adapts them based on workflow performance as more events are recorded.
- Best for fast setup or evolving prompts.
| Tolerance | Behavior | When to Use |
|---|---|---|
| `low` | Lenient thresholds that allow for more creative or exploratory outputs while still filtering obvious failures. | Early prototyping, creative writing, or low-risk internal tools. |
| `medium` | Balanced thresholds tuned for general production use; catches most issues without over-blocking. | Default choice for new workflows and broad user-facing assistants. |
| `high` | Strict thresholds that only accept high-quality, confident outputs. | Regulated domains, compliance-heavy workflows, or scenarios with near-zero tolerance for hallucinations. |
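For example, a workflow might mix tolerance levels across guardrails. Only the parameter name `automatic_hallucination_tolerance_levels` comes from the documentation above; the metric names and overall shape are assumptions:

```python
# Hypothetical configuration enabling automatic tolerances per guardrail.
# Only the parameter name is from the docs; the rest is illustrative.
workflow_config = {
    "automatic_hallucination_tolerance_levels": {
        "correctness": "high",           # strict: compliance-heavy content
        "completeness": "medium",        # balanced production default
        "instruction_adherence": "low",  # lenient: exploratory outputs allowed
    }
}
```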
Custom Thresholds
- Provide numeric cutoffs (0.0–1.0) for each metric in the console or via the API, as in the sketch after this list.
- Thresholds remain fixed until you change them, giving maximum control for regulated or benchmarked use cases.
- Best when you are familiar with DeepRails and your use case, and you know the minimum acceptable scores for your chosen metrics.
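Here is the same kind of configuration with fixed numeric cutoffs. The 0.0–1.0 range per metric is from the docs above; the field name `hallucination_thresholds` and the metric names are hypothetical:

```python
# Hypothetical custom-threshold configuration with fixed numeric cutoffs.
# The field name and metric names are illustrative assumptions.
workflow_config = {
    "hallucination_thresholds": {
        "correctness": 0.85,           # minimum acceptable score per metric
        "completeness": 0.80,
        "instruction_adherence": 0.90,
    }
}
# Unlike automatic tolerances, these values stay fixed until you change them.
```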
What a Hallucination Looks Like
You can view all evaluations for your Monitors and Defend workflows on their respective data tabs, Monitor Data and Defend Data.
The details for an evaluation in Defend Data show the output's performance on each metric compared to its configured threshold, with hallucinations highlighted in red or orange.

Similar to Defend Data, hallucinations in the Playground are highlighted in red, and when they occur, the improved output's scores on the failed metrics are displayed side-by-side with the original's.
