Evals

Evals Best Practices

Best practices for designing and running evals.

Getting Started with Evals

Step-by-step guide to setting up your first eval.

Graders

Guide to using graders for evaluations.

Launch apps with evaluations

Video on incorporating evals when deploying AI products.

Model optimization guide

Guide on optimizing OpenAI models for performance and cost.

Prompt Optimizer

Guide to refining prompts with the Prompt Optimizer.

Working with the Evals API

Guide to building evaluations with the Evals API.

Eval Driven System Design - From Prototype to Production

Cookbook for eval-driven design of a receipt-parsing automation workflow.

Reinforcement Fine-Tuning for Conversational Reasoning with the OpenAI API

Cookbook for reinforcement fine-tuning conversational reasoning using HealthBench evaluations.

Evals API Use-case - Responses Evaluation

Cookbook to evaluate new models against stored Responses API logs.