
Why Monitor Exists

Blind spots in production GenAI cost teams time, money, and trust. Monitor closes that gap. It evaluates live traffic with the same research-backed Guardrail Metrics used across DeepRails, correlates quality with operational signals (latency, tokens, cost, volume), and highlights trends and regressions so you can fix issues fast and improve with confidence.

Key Definitions

  • Guardrail Metrics: DeepRails’ General-Purpose Guardrail Metrics (GPMs) for correctness, completeness, adherence (instruction, context, ground truth), and comprehensive safety. Custom Guardrail Metrics are supported on SME & Enterprise plans.
  • Monitor: A read-only evaluation pipeline for a specific LLM use case or surface. A monitor receives your input/output pairs and model metadata, scores them with selected guardrails, and exposes real-time metrics, trends, and drill-downs. It does not remediate outputs (use Defend for correction).
  • Nametag (Segmentation): An optional label you attach to events (e.g., “staging”, “release-2025-09”, “feature-x”) to slice charts, compare cohorts, and run A/B or pre/post analyses.
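As a sketch of the segmentation idea (the naming scheme below is illustrative, not something DeepRails prescribes), a team might derive nametags from environment and release so the same cohorts line up across charts, A/B splits, and pre/post comparisons:

```python
# Illustrative only: DeepRails does not prescribe a nametag format.
# Deriving nametags from environment + release keeps cohorts comparable.
ENVIRONMENT = "staging"            # or "production"
RELEASE = "release-2025-09"

def make_nametag(feature: str = "") -> str:
    """Build a consistent label, e.g. 'staging/release-2025-09/feature-x'."""
    parts = [ENVIRONMENT, RELEASE] + ([feature] if feature else [])
    return "/".join(parts)

print(make_nametag("feature-x"))   # staging/release-2025-09/feature-x
print(make_nametag())              # staging/release-2025-09
```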

How Monitor Works

Monitor operates through a simple lifecycle:
Step 1: Define a Monitor

Name the use case you want to observe and select the guardrail metrics to apply. (Monitors are read-only: they measure and surface insight without altering outputs.)
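As a hedged sketch of what defining a monitor might look like programmatically (the endpoint path, payload fields, auth header, and response shape are assumptions for illustration; the DeepRails API reference is the source of truth):

```python
# A minimal sketch of creating a monitor over HTTP. Endpoint, field names,
# and auth header are assumptions; check the DeepRails API reference.
import os
import requests

API_KEY = os.environ["DEEPRAILS_API_KEY"]        # assumed env var name

resp = requests.post(
    "https://api.deeprails.com/v1/monitors",     # hypothetical endpoint
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "name": "support-bot-replies",           # the use case being observed
        "description": "Production answers from the support assistant",
        "guardrail_metrics": [                   # metrics named on this page
            "correctness",
            "completeness",
            "instruction_adherence",
        ],
    },
    timeout=30,
)
resp.raise_for_status()
monitor_id = resp.json()["id"]                   # assumed response shape
```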
Step 2: Stream Events

Send each model completion (input/output pair, model used, and optional nametag). Monitor ingests this traffic continuously from staging or production.
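A minimal sketch of streaming one event, again with an assumed endpoint and payload schema; the field names mirror the concepts on this page rather than a confirmed contract:

```python
# A sketch of streaming one completion to an existing monitor.
# The endpoint path and payload field names are assumptions for illustration.
import os
import requests

API_KEY = os.environ["DEEPRAILS_API_KEY"]      # assumed env var name
MONITOR_ID = "mon_abc123"                      # hypothetical monitor ID

event = {
    "input": "How do I reset my password?",    # prompt sent to the model
    "output": "Open Settings > Security and choose 'Reset password'.",
    "model": "gpt-4o-mini",                    # model metadata
    "nametag": "staging/release-2025-09",      # optional segmentation label
}

resp = requests.post(
    f"https://api.deeprails.com/v1/monitors/{MONITOR_ID}/events",  # hypothetical path
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=event,
    timeout=30,
)
resp.raise_for_status()
```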
Step 3: Evaluate at Ingest

Monitor scores every output against your selected guardrails, associates operational signals (latency, tokens, cost), and indexes the event for search and analysis.
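If you want to supply the operational signals yourself, one common pattern is to time the model call and read token usage from the provider response before building the event. The OpenAI client fields below are real; the event keys that carry them ("latency_ms", "input_tokens", "output_tokens") are assumptions:

```python
# Capture latency and token usage around the model call, then include them
# in the event payload so Monitor can correlate cost and speed with quality.
import time
from openai import OpenAI

client = OpenAI()
prompt = "How do I reset my password?"

start = time.perf_counter()
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
latency_ms = (time.perf_counter() - start) * 1000

event = {
    "input": prompt,
    "output": completion.choices[0].message.content,
    "model": "gpt-4o-mini",
    "latency_ms": round(latency_ms, 1),                    # assumed event key
    "input_tokens": completion.usage.prompt_tokens,        # assumed event key
    "output_tokens": completion.usage.completion_tokens,   # assumed event key
}
```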
Step 4: Analyze Trends

Dashboards update in real time—track request volume, failure rate, latency, tokens, and cost; examine guardrail distributions; compare cohorts via filters and time windows.
Step 5: Investigate & Act

Drill into any event to see per-metric scores and rationales. Use findings to fix prompts, tune models, adjust thresholds, and more.
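For programmatic investigation, here is a hedged sketch of fetching one event's per-metric scores and rationales by run ID (the endpoint and response shape are assumptions; the Monitor Data tab exposes the same details interactively):

```python
# Pull one event's evaluation back for debugging. Endpoint and response
# shape are assumptions; field names mirror the concepts on this page.
import os
import requests

API_KEY = os.environ["DEEPRAILS_API_KEY"]      # assumed env var name
RUN_ID = "run_abc123"                          # hypothetical run ID

resp = requests.get(
    f"https://api.deeprails.com/v1/monitors/events/{RUN_ID}",  # hypothetical path
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
resp.raise_for_status()

for metric in resp.json().get("evaluations", []):          # assumed response shape
    print(metric["name"], metric["score"], metric["rationale"])
```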

Console Walkthrough

The Monitor Console brings observability to life across three tabs: Monitor Metrics, Monitor Data, and Manage Monitors.

Monitor Metrics

The Monitor Metrics tab shows real-time operational and quality performance for a selected monitor: request volume, failure rate, latency, tokens, and cost—plus guardrail score distributions that reveal drift and regressions at a glance.
[Screenshot: Monitor Metrics dashboard with cost, volume, failure rate, tokens, latency, and guardrail histograms]

Operational and economic signals (top) with guardrail score distributions (bottom) help you spot regressions, latency spikes, and quality drift in real time.

Monitor Data

The Monitor Data tab lists every evaluated event for deep inspection. Filter by monitor, metrics, status, model, date range, or nametag; search by run ID or prompt; and open any row to view full details.
[Screenshot: Monitor Data table of evaluated runs with filters and guardrail score columns]

Event table with flexible filters for monitor, metrics, status, model, and time window. Use nametags to compare releases and cohorts.

[Screenshot: Monitor Data event detail panel with evaluation metrics and rationales]

Event details show per-metric scores and rationales, status, processing time, and the original input/output for precise debugging.

Manage Monitors

The Manage Monitors tab is where you create and maintain monitors across environments and surfaces. See when each monitor last received traffic, how many outputs it has evaluated, and more.
[Screenshot: Manage Monitors screen with create form and active monitors list]

Create a new monitor with a name and description, and manage existing monitors with last-used status and total outputs evaluated.
