GET /evaluate/{eval_id}

Retrieve an evaluation
from deeprails import Deeprails

DEEPRAILS_API_KEY = "YOUR_API_KEY"

# Initialize the client with your API key
client = Deeprails(
    api_key=DEEPRAILS_API_KEY,
)

# Retrieve the evaluation by its ID
evaluation_response = client.evaluate.retrieve(
    eval_id="eval_ghi_789"
)
print(evaluation_response.evaluation_status)
print(evaluation_response.evaluation_result)
{
  "eval_id": "<string>",
  "evaluation_status": "in_progress",
  "guardrail_metrics": [
    "correctness"
  ],
  "model_used": "<string>",
  "run_mode": "precision_plus",
  "model_input": {
    "system_prompt": "<string>",
    "user_prompt": "<string>",
    "ground_truth": "<string>"
  },
  "model_output": "<string>",
  "nametag": "<string>",
  "progress": 50,
  "evaluation_result": {},
  "evaluation_total_cost": 123,
  "created_at": "2023-11-07T05:31:56Z",
  "error_message": "<string>",
  "error_timestamp": "2023-11-07T05:31:56Z",
  "start_timestamp": "2023-11-07T05:31:56Z",
  "end_timestamp": "2023-11-07T05:31:56Z",
  "modified_at": "2023-11-07T05:31:56Z"
}
Returns details for the evaluation specified by eval_id, including evaluation_status (in_progress, completed, canceled, queued, or failed), the guardrail_metrics evaluated, model_input, model_output, run_mode, completion progress, the final evaluation_result with scores and rationales, evaluation_total_cost, and timestamps.
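
Because an evaluation runs asynchronously, a common pattern is to poll this endpoint until the status is terminal. Below is a minimal sketch built on the retrieve call shown above; the polling interval and eval_id are placeholders.

import time

from deeprails import Deeprails

client = Deeprails(api_key="YOUR_API_KEY")

TERMINAL_STATUSES = {"completed", "canceled", "failed"}

def wait_for_evaluation(eval_id: str, interval: float = 5.0):
    """Poll the evaluation until it reaches a terminal status."""
    while True:
        response = client.evaluate.retrieve(eval_id=eval_id)
        if response.evaluation_status in TERMINAL_STATUSES:
            return response
        # Still queued or in_progress; progress runs from 0 to 100.
        time.sleep(interval)

result = wait_for_evaluation("eval_ghi_789")
print(result.evaluation_status, result.evaluation_result)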

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
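
For example, a raw HTTP request carries the token in the Authorization header. A minimal sketch using the requests library; the base URL is a placeholder, not a documented value.

import requests

BASE_URL = "https://<your-deeprails-api-host>"  # placeholder; substitute the documented base URL
API_KEY = "YOUR_API_KEY"

response = requests.get(
    f"{BASE_URL}/evaluate/eval_ghi_789",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
response.raise_for_status()
print(response.json()["evaluation_status"])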

Path Parameters

eval_id
string
required

The ID of the evaluation to retrieve.

Response

Evaluation retrieved successfully

eval_id
string
required

A unique evaluation ID.

evaluation_status
enum<string>
required

Status of the evaluation.

Available options:
in_progress,
completed,
canceled,
queued,
failed
run_mode
enum<string>
required

Run mode for the evaluation. The run mode lets the user trade off speed, accuracy, and cost by determining which models are used to run the evaluation.

Available options:
precision_plus,
precision,
smart,
economy
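
The run mode is chosen when the evaluation is created, not at retrieval time. The sketch below is illustrative only: this page documents retrieval, and the create call and its parameters shown here are assumptions.

# Hypothetical create call; the exact method name and parameters are
# assumptions, shown only to illustrate where run_mode applies.
evaluation = client.evaluate.create(
    model_input={"user_prompt": "Summarize the attached report."},
    model_output="The report covers Q3 revenue...",
    run_mode="economy",  # one of: precision_plus, precision, smart, economy
    guardrail_metrics=["correctness"],
)
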
model_input
object
required

A dictionary of inputs sent to the LLM to generate the output. The dictionary must contain at least a user_prompt or system_prompt field. For the ground_truth_adherence guardrail metric, ground_truth should also be provided; see the sketch below.

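
For example, both of the following are valid shapes, using the field names shown in the response schema above:

# Minimal input: a user prompt only
model_input = {"user_prompt": "What is the capital of France?"}

# Fuller input: system prompt plus ground truth, needed for
# the ground_truth_adherence guardrail metric
model_input = {
    "system_prompt": "You are a concise geography assistant.",
    "user_prompt": "What is the capital of France?",
    "ground_truth": "Paris",
}
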
model_output
string
required

Output generated by the LLM to be evaluated.

guardrail_metrics
enum<string>[]

An array of guardrail metrics that the model input and output pair will be evaluated on.

model_used
string

Model ID used to generate the output, like gpt-4o or o3.

nametag
string

An optional, user-defined tag for the evaluation.

progress
integer

Evaluation progress. Values range between 0 and 100; 100 corresponds to a completed evaluation_status.

Required range: 0 <= x <= 100
evaluation_result
object

Evaluation result consisting of average scores and rationales for each of the evaluated guardrail metrics.
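
The exact shape of evaluation_result is not spelled out on this page. The sketch below assumes each evaluated metric maps to an object with score and rationale fields; that shape is an assumption for illustration.

evaluation = client.evaluate.retrieve(eval_id="eval_ghi_789")

# Assumed shape: {"correctness": {"score": ..., "rationale": ...}, ...}
for metric, outcome in (evaluation.evaluation_result or {}).items():
    print(metric, outcome.get("score"), outcome.get("rationale"))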

evaluation_total_cost
number

Total cost of the evaluation.

created_at
string<date-time>

The time the evaluation was created in UTC.

error_message
string

Description of the error causing the evaluation to fail, if any.

error_timestamp
string<date-time>

The time the error causing the evaluation to fail was recorded.

start_timestamp
string<date-time>

The time the evaluation started in UTC.

end_timestamp
string<date-time>

The time the evaluation completed in UTC.

modified_at
string<date-time>

The most recent time the evaluation was modified in UTC.
