GET /evaluate/{eval_id}

Retrieve an evaluation
from deeprails import Deeprails

DEEPRAILS_API_KEY = "YOUR_API_KEY"

# Initialize the client with your API key
client = Deeprails(
    api_key=DEEPRAILS_API_KEY,
)

# Retrieve the evaluation by its ID
evaluation_response = client.evaluate.retrieve(
    eval_id="eval_ghi_789"
)
print(evaluation_response.evaluation_status)
print(evaluation_response.evaluation_result)
{
  "eval_id": "<string>",
  "evaluation_status": "in_progress",
  "guardrail_metrics": [
    "correctness"
  ],
  "model_used": "<string>",
  "run_mode": "precision_plus",
  "model_input": {
    "system_prompt": "<string>",
    "user_prompt": "<string>",
    "ground_truth": "<string>"
  },
  "model_output": "<string>",
  "nametag": "<string>",
  "progress": 50,
  "evaluation_result": {},
  "evaluation_total_cost": 123,
  "created_at": "2023-11-07T05:31:56Z",
  "error_message": "<string>",
  "error_timestamp": "2023-11-07T05:31:56Z",
  "start_timestamp": "2023-11-07T05:31:56Z",
  "end_timestamp": "2023-11-07T05:31:56Z",
  "modified_at": "2023-11-07T05:31:56Z"
}
Returns details for the evaluation specified by eval_id, including evaluation_status (in_progress, completed, canceled, queued, or failed), the guardrail_metrics evaluated, model_input, model_output, run_mode, completion progress, the final evaluation_result with scores and rationales, evaluation_total_cost, and timestamps.
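
Because an evaluation runs asynchronously, a common pattern is to poll this endpoint until the status is terminal. Below is a minimal sketch built on the retrieve call shown above; the polling interval and eval_id are placeholders.

import time

from deeprails import Deeprails

client = Deeprails(api_key="YOUR_API_KEY")

TERMINAL_STATUSES = {"completed", "canceled", "failed"}

def wait_for_evaluation(eval_id: str, interval: float = 5.0):
    """Poll the evaluation until it reaches a terminal status."""
    while True:
        response = client.evaluate.retrieve(eval_id=eval_id)
        if response.evaluation_status in TERMINAL_STATUSES:
            return response
        # Still queued or in_progress; progress runs from 0 to 100.
        time.sleep(interval)

result = wait_for_evaluation("eval_ghi_789")
print(result.evaluation_status, result.evaluation_result)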

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
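
For example, a raw HTTP request carries the token in the Authorization header. A minimal sketch using the requests library; the base URL is a placeholder, not a documented value.

import requests

BASE_URL = "https://<your-deeprails-api-host>"  # placeholder; substitute the documented base URL
API_KEY = "YOUR_API_KEY"

response = requests.get(
    f"{BASE_URL}/evaluate/eval_ghi_789",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
response.raise_for_status()
print(response.json()["evaluation_status"])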

Path Parameters

eval_id
string
required

The ID of the evaluation to retrieve.

Response

Evaluation retrieved successfully

eval_id
string
required

A unique evaluation ID.

evaluation_status
enum<string>
required

Status of the evaluation.

Available options:
in_progress,
completed,
canceled,
queued,
failed
run_mode
enum<string>
required

Run mode for the evaluation. The run mode lets the user trade off speed, accuracy, and cost by determining which models are used to run the evaluation.

Available options:
precision_plus,
precision,
smart,
economy
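
The run mode is chosen when the evaluation is created, not at retrieval time. The sketch below is illustrative only: this page documents retrieval, and the create call and its parameters shown here are assumptions.

# Hypothetical create call; the exact method name and parameters are
# assumptions, shown only to illustrate where run_mode applies.
evaluation = client.evaluate.create(
    model_input={"user_prompt": "Summarize the attached report."},
    model_output="The report covers Q3 revenue...",
    run_mode="economy",  # one of: precision_plus, precision, smart, economy
    guardrail_metrics=["correctness"],
)
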
model_input
object
required

A dictionary of inputs sent to the LLM to generate the output. The dictionary must contain at least a user_prompt or system_prompt field. For the ground_truth_adherence guardrail metric, ground_truth should also be provided; see the sketch below.

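
For example, both of the following are valid shapes, using the field names shown in the response schema above:

# Minimal input: a user prompt only
model_input = {"user_prompt": "What is the capital of France?"}

# Fuller input: system prompt plus ground truth, needed for
# the ground_truth_adherence guardrail metric
model_input = {
    "system_prompt": "You are a concise geography assistant.",
    "user_prompt": "What is the capital of France?",
    "ground_truth": "Paris",
}
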
model_output
string
required

Output generated by the LLM to be evaluated.

guardrail_metrics
enum<string>[]

An array of guardrail metrics that the model input and output pair will be evaluated on.

model_used
string

Model ID used to generate the output, like gpt-4o or o3.

nametag
string

An optional, user-defined tag for the evaluation.

progress
integer

Evaluation progress. Values range between 0 and 100; 100 corresponds to a completed evaluation_status.

Required range: 0 <= x <= 100
evaluation_result
object

Evaluation result consisting of average scores and rationales for each of the evaluated guardrail metrics.
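
The exact shape of evaluation_result is not spelled out on this page. The sketch below assumes each evaluated metric maps to an object with score and rationale fields; that shape is an assumption for illustration.

evaluation = client.evaluate.retrieve(eval_id="eval_ghi_789")

# Assumed shape: {"correctness": {"score": ..., "rationale": ...}, ...}
for metric, outcome in (evaluation.evaluation_result or {}).items():
    print(metric, outcome.get("score"), outcome.get("rationale"))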

evaluation_total_cost
number

Total cost of the evaluation.

created_at
string<date-time>

The time the evaluation was created in UTC.

error_message
string

Description of the error causing the evaluation to fail, if any.

error_timestamp
string<date-time>

The time the error causing the evaluation to fail was recorded.

start_timestamp
string<date-time>

The time the evaluation started in UTC.

end_timestamp
string<date-time>

The time the evaluation completed in UTC.

modified_at
string<date-time>

The most recent time the evaluation was modified in UTC.
