flutter/evals

Evaluation framework for testing AI agents' ability to write Dart and Flutter code. Built on Inspect AI.

Tip

Full documentation at evals-docs.web.app/

Overview

evals provides:

  • Evaluation Runner — Python package for running LLM evaluations with configurable tasks, variants, and models
  • Evaluation Configuration — Dart and Python packages that resolve dataset YAML into EvalSet JSON for the runner
  • devals CLI — Dart CLI for creating and managing dataset samples, tasks, and jobs
  • Evaluation Explorer — Dart/Flutter app for browsing and analyzing results
  • Dataset — Curated samples for Dart/Flutter Q&A, code generation, and debugging tasks
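The configuration packages resolve dataset YAML into EvalSet JSON for the runner. A minimal Python sketch of that resolution step, merging per-sample overrides over task defaults — the field names (`model`, `epochs`, `id`, `prompt`) are illustrative assumptions, not the repo's actual schema (see the Configuration Reference for that):

```python
import json

# Hypothetical task-level defaults; the real YAML schema is defined
# by dataset_config_dart / dataset_config_python, not shown here.
task_defaults = {"model": "example-model", "epochs": 1}

# Stand-in for samples parsed from a dataset YAML file.
samples = [
    {"id": "qa-001", "prompt": "What does `const` mean in Dart?"},
    {"id": "codegen-002", "prompt": "Write a counter widget.", "epochs": 3},
]

def resolve(defaults, sample_list):
    """Merge per-sample overrides over task defaults into one EvalSet dict."""
    return {"samples": [{**defaults, **s} for s in sample_list]}

eval_set = resolve(task_defaults, samples)
print(json.dumps(eval_set, indent=2))
```

Per-sample keys win over defaults (plain dict merge), so `codegen-002` keeps its `epochs: 3` while `qa-001` inherits the default.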

Packages

| Package | Description | Docs |
| --- | --- | --- |
| dash_evals | Python evaluation runner using Inspect AI | dash_evals docs |
| dataset_config_dart | Dart library for resolving dataset YAML into EvalSet JSON (includes shared data models) | dataset_config_dart docs |
| dataset_config_python | Python configuration models | |
| devals_cli | Dart CLI for managing evaluation tasks and jobs | CLI docs |
| eval_explorer | Dart/Flutter results viewer (Serverpod) | eval_explorer docs |

Note

The uploader and report_app packages are deprecated and are being replaced by eval_explorer.

Documentation

| Doc | Description |
| --- | --- |
| Quick Start | Get started authoring your own evals |
| Contributing Guide | Development setup and guidelines |
| CLI Reference | Full devals CLI command reference |
| Configuration Reference | YAML configuration file reference |
| Repository Structure | Project layout |
| Glossary | Terminology guide |

Contributing

See CONTRIBUTING.md for details, or go directly to the Contributing Guide.

License

See LICENSE for details.
