
Create an API Key

  1. In your organization’s DeepRails API Console, go to API Keys.
  2. Click Create key, name it, then copy the key.
  3. (Optional) Save it as the DEEPRAILS_API_KEY environment variable.

Create and manage API keys in the API Console.
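
If you saved the key as DEEPRAILS_API_KEY, you can read it from the environment at runtime instead of pasting it into code. A minimal sketch (assumes the SDK from the next section is installed; the export value in the comment is a placeholder, not a real key):

import os

from deeprails import DeepRails

# Key saved in step 3, e.g. set in your shell with:
#   export DEEPRAILS_API_KEY="your-key-here"
api_key = os.environ["DEEPRAILS_API_KEY"]

# Same constructor as the examples below, just without a hard-coded secret
client = DeepRails(token=api_key)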

Install the SDK

  • Python
  • TypeScript / Node
  • Ruby
  • Go
pip install deeprails

Create an Evaluation

Initialize a client with your API key and create an evaluation by sending model_input (which must include at least a system_prompt or user_prompt), the model_output to score, and the guardrail_metrics to run.
Tip: You can check all of your evaluations via the DeepRails API Console.
  • Python
  • TypeScript / Node
  • Ruby
  • Go
from deeprails import DeepRails

# Initialize (env var DEEPRAILS_API_KEY is recommended)
client = DeepRails(token="YOUR_API_KEY")

# Create an evaluation
evaluation = client.create_evaluation(
    model_input={"user_prompt": "Write a friendly welcome email to a new customer."},
    model_output="Hi! Welcome aboard. We're thrilled to have you...",
    guardrail_metrics=["correctness", "completeness", "instruction_adherence"]
)

print(f"Evaluation created with ID: {evaluation.eval_id}")

Required Parameters

Field              Type      Description
model_input        object    Must include at least system_prompt or user_prompt.
model_output       string    The LLM output to be evaluated.
guardrail_metrics  string[]  Metrics to score (e.g., correctness, completeness, instruction_adherence, context_adherence, ground_truth_adherence, comprehensive_safety).
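
For reference, here is a sketch of a create call that supplies both input fields from the table above along with an expanded metric list (the prompt text is illustrative):

# Sketch: model_input with both system_prompt and user_prompt,
# plus additional guardrail metrics from the list above
evaluation = client.create_evaluation(
    model_input={
        "system_prompt": "You are a concise customer-support assistant.",
        "user_prompt": "Write a friendly welcome email to a new customer.",
    },
    model_output="Hi! Welcome aboard. We're thrilled to have you...",
    guardrail_metrics=["correctness", "completeness", "comprehensive_safety"],
)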

Optional Parameters

Field       Type    Description
model_used  string  The model that produced model_output (e.g., gpt-4o-mini).
run_mode    string  Evaluation run mode (defaults to “smart”).
nametag     string  Custom identifier for this run (e.g., a ticket or test name).
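
These fields can be passed alongside the required ones. A sketch using the values from the table above (the nametag value is illustrative):

# Sketch: the same call with the optional fields from the table above
evaluation = client.create_evaluation(
    model_input={"user_prompt": "Write a friendly welcome email to a new customer."},
    model_output="Hi! Welcome aboard. We're thrilled to have you...",
    guardrail_metrics=["correctness", "completeness"],
    model_used="gpt-4o-mini",
    run_mode="smart",
    nametag="welcome-email-smoke-test",
)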

Retrieve an Evaluation

Evaluations are processed asynchronously. You’ll need to poll the evaluation using its eval_id to check the evaluation_status. Once complete, the evaluation_result field will be populated with scores and rationales.
  • Python
  • TypeScript / Node
  • Ruby
  • Go
import time
from deeprails import DeepRails

client = DeepRails(token="YOUR_API_KEY")

# Poll until evaluation is complete
eval_id = evaluation.eval_id  # From the creation step
max_attempts = 30
attempt = 0

while attempt < max_attempts:
    try:
        evaluation = client.get_evaluation(eval_id)
        
        print(f"Status: {evaluation.evaluation_status}")
        
        if evaluation.evaluation_result:
            print("\nResults:")
            for metric, result in evaluation.evaluation_result.items():
                score = result.get('score', 'N/A')
                print(f"  {metric}: {score}")
            break
        
        print(f"Evaluation in progress... (attempt {attempt + 1}/{max_attempts})")
        time.sleep(2)
        attempt += 1
    except Exception as e:
        print(f"Error: {e}")
        break
else:
    print("Evaluation timed out")

Check Evaluation History via the API Console

  1. Open your DeepRails API Console → Evaluate.
  2. Browse historical runs and filter by model or search by nametag.
  3. Compare prompts, completions, and evaluations side-by-side.

You can browse and filter your evaluation history in the API Console.

Next Steps

I