eval_id, including evaluation_status (in_progress, completed, canceled, queued, or failed), the guardrail_metrics evaluated, model_input, model_output, run_mode, completion progress, final evaluation_result with scores and rationales, evaluation_total_cost, and timestamps.

Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Path Parameters
The ID of the evaluation to retrieve.
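For illustration, a minimal request sketch combining the Bearer header and the eval_id path parameter described above; the base URL and exact route are assumptions, since this section does not show them.

```python
import requests

# Assumed base URL and route; substitute the actual API host and path.
BASE_URL = "https://api.example.com/v1"
EVAL_ID = "eval_123"            # the evaluation ID to retrieve
AUTH_TOKEN = "YOUR_AUTH_TOKEN"  # your auth token

response = requests.get(
    f"{BASE_URL}/evaluations/{EVAL_ID}",
    headers={"Authorization": f"Bearer {AUTH_TOKEN}"},
)
response.raise_for_status()
evaluation = response.json()
print(evaluation["evaluation_status"])
```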
Response
Evaluation retrieved successfully
A unique evaluation ID.
Status of the evaluation.
in_progress, completed, canceled, queued, failed
Run mode for the evaluation. The run mode allows the user to optimize for speed, accuracy, and cost by determining which models are used to evaluate the event.
precision_plus, precision, smart, economy
A dictionary of inputs sent to the LLM to generate output. The dictionary must contain at least a user_prompt or system_prompt field. For the ground_truth_adherence guardrail metric, ground_truth should also be provided.
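As a hedged illustration of the model_input dictionary described above, a retrieved evaluation might include something like the following; the prompt text is invented, and ground_truth is only relevant when the ground_truth_adherence metric is evaluated.

```python
# Illustrative model_input; at least user_prompt or system_prompt is present.
model_input = {
    "system_prompt": "You are a helpful assistant.",
    "user_prompt": "Summarize the attached policy document.",
    # Provided when the ground_truth_adherence guardrail metric is evaluated.
    "ground_truth": "The policy covers remote-work eligibility and expense limits.",
}
```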
Output generated by the LLM to be evaluated.
An array of guardrail metrics that the model input and output pair will be evaluated on.
Model ID used to generate the output, like gpt-4o or o3.
An optional, user-defined tag for the evaluation.
Evaluation progress. Values range between 0 and 100; 100 corresponds to a completed evaluation_status.
0 <= x <= 100
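Because queued and in_progress evaluations eventually reach a terminal status (completed, canceled, or failed) and progress reaches 100 on completion, a simple polling sketch follows; it reuses the assumed URL and headers from the request example above.

```python
import time
import requests

def get_evaluation(base_url: str, eval_id: str, token: str) -> dict:
    # Same assumed route as the earlier request example.
    resp = requests.get(
        f"{base_url}/evaluations/{eval_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    resp.raise_for_status()
    return resp.json()

def wait_for_evaluation(base_url: str, eval_id: str, token: str, interval: float = 5.0) -> dict:
    # Poll until evaluation_status leaves queued/in_progress.
    while True:
        evaluation = get_evaluation(base_url, eval_id, token)
        if evaluation["evaluation_status"] not in ("queued", "in_progress"):
            return evaluation
        time.sleep(interval)
```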
Evaluation result consisting of average scores and rationales for each of the evaluated guardrail metrics.
Total cost of the evaluation.
The time the evaluation was created in UTC.
Description of the error causing the evaluation to fail, if any.
The time the error causing the evaluation to fail was recorded.
The time the evaluation started in UTC.
The time the evaluation completed in UTC.
The most recent time the evaluation was modified in UTC.
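Tying the response fields together, a hedged sketch of reading a finished evaluation; the per-metric structure of evaluation_result and the error field name are assumptions based on the descriptions above.

```python
evaluation = wait_for_evaluation(BASE_URL, EVAL_ID, AUTH_TOKEN)

if evaluation["evaluation_status"] == "completed":
    # evaluation_result is described as average scores and rationales per
    # evaluated guardrail metric; the key names below are assumptions.
    for metric, result in evaluation["evaluation_result"].items():
        print(metric, result.get("score"), result.get("rationale"))
    print("Total cost:", evaluation["evaluation_total_cost"])
else:
    # The error description field name is an assumption.
    print("Evaluation ended as", evaluation["evaluation_status"],
          "-", evaluation.get("error_description"))
```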