A workflow event is defined by a model_input dictionary (containing at least a system_prompt or user_prompt field), a model_output string to be evaluated, the model_used to generate the output (e.g., gpt-5-mini), the run_mode to select the speed/accuracy/cost trade-off for evaluation, and a nametag for the workflow event.
The run mode determines which models power the evaluation:
- precision_plus - Maximum accuracy using the most advanced models
- precision - High accuracy with optimized performance
- smart - Balanced speed and accuracy (default)
- economy - Fastest evaluation at lowest cost
The event will be run with the guardrail metrics and improvement steps configured in its associated workflow.
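As a rough illustration of the fields and run modes described above, the following sketch assembles and checks a request body before it is sent. The helper name build_event_payload is hypothetical, and treating model_used as always present is an assumption; only the field names and run mode values come from this page.

```python
# Run modes listed above; "smart" is the documented default.
RUN_MODES = ("precision_plus", "precision", "smart", "economy")


def build_event_payload(model_input, model_output, model_used,
                        run_mode="smart", nametag=None):
    """Assemble a request body from the documented fields (hypothetical helper)."""
    # model_input must carry at least one of the two prompt fields.
    if not ("system_prompt" in model_input or "user_prompt" in model_input):
        raise ValueError("model_input needs a system_prompt or user_prompt field")
    if run_mode not in RUN_MODES:
        raise ValueError(f"unknown run_mode: {run_mode!r}")

    payload = {
        "model_input": model_input,
        "model_output": model_output,
        "model_used": model_used,
        "run_mode": run_mode,
    }
    # nametag is optional, so include it only when provided.
    if nametag is not None:
        payload["nametag"] = nametag
    return payload


payload = build_event_payload(
    model_input={"user_prompt": "What is the capital of France?"},
    model_output="Paris is the capital of France.",
    model_used="gpt-5-mini",
)
```

Validating locally like this catches a missing prompt field or a misspelled run mode before a request is made, though the service remains the authority on what it accepts.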
When you create a workflow event, you’ll receive an event ID. Use this ID to track the event’s progress and retrieve all evaluations and improvement results.
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Path Parameters
Workflow ID associated with this event.
Body
A dictionary of inputs sent to the LLM to generate output. The dictionary must contain at least a user_prompt or system_prompt field. For the ground_truth_adherence guardrail metric, ground_truth should also be provided.
Output generated by the LLM to be evaluated.
Model ID used to generate the output, like gpt-4o or o3.
Run mode for the workflow event. The run mode allows the user to optimize for speed, accuracy, and cost by determining which models are used to evaluate the event. Available run modes include precision_plus, precision, smart, and economy. Defaults to smart.
An optional, user-defined tag for the event.
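The sketch below shows one way the authorization header, path parameter, and body described above might come together in a single request. The base URL, route, and the use of POST are assumptions made for illustration (this section does not state them), and the workflow ID and token are placeholders. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Placeholder base URL: the real endpoint is not given in this section.
BASE_URL = "https://api.example.com"


def make_event_request(workflow_id, token, body):
    """Build (but do not send) a request that creates a workflow event."""
    # Assumed route; substitute the documented path for this endpoint.
    url = f"{BASE_URL}/workflows/{workflow_id}/events"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Bearer authentication header of the form "Bearer <token>".
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",  # assumed HTTP method for creating an event
    )


body = {
    "model_input": {
        "user_prompt": "What is the capital of France?",
        # Supplied because ground-truth adherence needs a reference answer.
        "ground_truth": "Paris",
    },
    "model_output": "Paris is the capital of France.",
    "model_used": "gpt-4o",
    "run_mode": "precision",
    "nametag": "geography-check",
}

request = make_event_request("YOUR_WORKFLOW_ID", "YOUR_TOKEN", body)
# Sending it would be: urllib.request.urlopen(request)
```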
Response
Workflow event created successfully
A unique workflow event ID.
Workflow ID associated with the event.
False if the evaluation passed all of the guardrail metrics, True if the evaluation failed any of the guardrail metrics.
A unique evaluation ID associated with this event. Every event has one or more evaluation attempts.
Count of improvement attempts for the event. If greater than one, all previous improvement attempts failed.
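To make the meaning of the response fields concrete, here is a small sketch that interprets them. The attribute names (event_id, workflow_id, failed, evaluation_id, improvement_attempts) are placeholders chosen for this example, not confirmed response keys; map them from whatever keys the actual response uses.

```python
from dataclasses import dataclass


@dataclass
class WorkflowEventResult:
    # All attribute names are placeholders, not confirmed response keys.
    event_id: str
    workflow_id: str
    failed: bool               # True if any guardrail metric failed
    evaluation_id: str
    improvement_attempts: int  # count of improvement attempts so far


def describe(result):
    """Summarize a result using the semantics described above."""
    if result.failed:
        status = "failed a guardrail metric"
    else:
        status = "passed all guardrail metrics"
    note = ""
    # A count above one means every earlier improvement attempt failed.
    if result.improvement_attempts > 1:
        earlier = result.improvement_attempts - 1
        note = f"; {earlier} earlier improvement attempt(s) failed"
    return f"event {result.event_id}: {status}{note}"


first_try = WorkflowEventResult("evt_1", "wkfl_1", False, "eval_1", 1)
print(describe(first_try))
```

Note that the boolean reads inverted from a typical "success" flag: False is the passing case.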