Evals

Evals Best Practices
Best practices for designing and running evals.
guide

Getting Started with Evals
Step-by-step guide to setting up your first eval.
guide

Graders
Guide to using graders for evaluations.
guide

Launch apps with evaluations
Video on incorporating evals when deploying AI products.
video

Model optimization guide
Guide on optimizing OpenAI models for performance and cost.
guide

Prompt Optimizer
Guide to refining prompts with the Prompt Optimizer.
guide

Working with the Evals API
Guide to building evaluations with the Evals API.
guide

Eval Driven System Design - From Prototype to Production
Cookbook for eval-driven design of a receipt parsing automation workflow.
cookbook

Reinforcement Fine-Tuning for Conversational Reasoning with the OpenAI API
Cookbook for reinforcement fine-tuning of conversational reasoning, using HealthBench evaluations.
cookbook

Evals API Use-case - Responses Evaluation
Cookbook for evaluating new models against stored Responses API logs.
cookbook