25 stable releases (3 major)
| new 3.1.0 | Mar 15, 2026 |
|---|---|
| 3.0.0 | Mar 6, 2026 |
| 2.18.0 | Feb 11, 2026 |
| 2.12.0 | Jan 29, 2026 |
| 0.9.0 | Dec 28, 2025 |
#69 in Profiling
2.5MB
59K
SLoC
Assay
Policy-as-Code for AI agents.
Deterministic MCP governance, CI gates, and verifiable evidence bundles. Runs offline-first with no required hosted backend.
Assay validates tool-call behavior against explicit policy, records auditable decisions, and produces replayable evidence. It is built for teams that want hard gates and reviewable artifacts.
Why Assay
- Deterministic gates for MCP-compatible agents in local runs and CI
- Auditable evidence with export, verify, lint, diff, and replay flows
- Runtime control on the tool-call path via
assay mcp wrap - Offline-first workflow with portable outputs
- DX-first CLI with SARIF, JUnit, PR-comment, and markdown outputs
Security Model (Bounded Claims)
Assay’s strongest wedge is deterministic governance on the tool-call route.
In the MCP fragmented-IPI experiment line, stateful sequence policy remained effective across payload fragmentation, tool-hopping, sink-failure pressure, and delayed cross-session sink attempts, where wrap-only lexical checks failed.
Assay does not claim to solve semantic hijacking in general, and it does not claim to block raw outbound network bytes by itself. The bounded claim is narrower: Assay governs sink-call routes with explicit policy decisions, audit-grade evidence, and low single-digit millisecond overhead in the published experiment line.
Results and rerun docs:
- Fragmented IPI results
- Wrap-bypass results
- Second-sink results
- Cross-session decay results
- Sink-failure results
Open Core Boundary
Open core covers the engine, CLI, runtime governance, evidence flows, and baseline packs.
Compliance packs and organization-specific governance packs can be commercial. See ADR-016.
Quickstart
Install
cargo install assay-cli
From scratch
# Scaffold config + policy + CI
assay init --ci
# Run an offline smoke gate
assay ci --config eval.yaml --trace-file traces/hello.jsonl
From an existing trace
# Generate policy from recorded behavior
assay init --from-trace trace.jsonl
# Validate trace against config + policy
assay validate --config eval.yaml --trace-file trace.jsonl
From an MCP Inspector session
# Import Inspector session to Assay trace format
assay import --format inspector session.json --out-trace traces/session.jsonl
# Run policy checks
assay run --config eval.yaml --trace-file traces/session.jsonl
Demo
make demo # full break/fix walkthrough
make test # safe trace (PASS)
make fail # unsafe trace (FAIL)
Core Commands
Testing and validation
| Command | What it does |
|---|---|
assay run |
Execute a test suite against a trace and write run outputs. |
assay ci |
CI-mode run with SARIF, JUnit, and PR-comment outputs. |
assay validate |
Stateless policy validation with text, JSON, or SARIF output. |
assay replay |
Replay from a self-contained offline bundle. |
Policy and config
| Command | What it does |
|---|---|
assay init |
Scaffold policy, config, and CI workflow. |
assay generate |
Generate policy from traces or profiles. |
assay profile |
Multi-run stability profiling. |
assay doctor |
Diagnose config, trace, baseline, and runtime issues. |
assay explain |
Explain policy behavior against a trace. |
Evidence and compliance
| Command | What it does |
|---|---|
assay evidence export |
Create an evidence bundle. |
assay evidence verify |
Verify bundle integrity. |
assay evidence lint |
Lint evidence with optional packs and SARIF output. |
assay evidence diff |
Diff two verified bundles. |
assay evidence push/pull/list |
BYOS object storage flows. |
Runtime
| Command | What it does |
|---|---|
assay mcp wrap |
Wrap an MCP process with policy enforcement. |
assay sandbox |
Rootless Landlock sandbox execution on Linux. |
assay monitor |
eBPF/LSM runtime enforcement on Linux. |
Misc
| Command | What it does |
|---|---|
assay import |
Import traces from Inspector or JSON-RPC logs. |
assay tool sign/verify/keygen |
Local-key tool signing and verification. |
assay fix |
Interactive policy fix suggestions. |
CI Integration
GitHub Actions
name: Assay Gate
on: [push, pull_request]
permissions:
contents: read
pull-requests: write
security-events: write
jobs:
assay:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@<PINNED_SHA>
- uses: Rul1an/assay-action@v2
Assay Action installs Assay, runs the gate, uploads SARIF, and can publish PR-friendly outputs.
You can also generate a starter workflow:
assay init --ci github
assay init --ci gitlab
Or run manually:
assay ci \
--config eval.yaml \
--trace-file traces/golden.jsonl \
--sarif reports/sarif.json \
--junit reports/junit.xml \
--pr-comment reports/pr-comment.md \
--replay-strict
Exit codes:
0pass1test failure2config or measurement error3infra error
Configuration
Assay usually works with two files:
eval.yamlfor the test suitepolicy.yamlfor the allowed behavior
eval.yaml:
version: 1
suite: "my_agent"
model: "trace"
tests:
- id: "deploy_args"
input:
prompt: "deploy_staging"
expected:
type: args_valid
schema:
deploy_service:
type: object
required: [env]
properties:
env:
type: string
enum: [staging, prod]
policy.yaml:
version: "1.0"
name: "my-policy"
allow: ["*"]
deny:
- "exec"
- "shell"
- "bash"
constraints:
- tool: "read_file"
params:
path:
matches: "^/app/.*|^/data/.*"
Starter presets:
assay init --preset default
assay init --preset hardened
assay init --preset dev
Evidence Bundles
Assay produces tamper-evident .tar.gz bundles with manifests, hashes, and event streams.
assay evidence export --profile profile.yaml --out bundle.tar.gz
assay evidence verify bundle.tar.gz
assay evidence lint --pack eu-ai-act-baseline bundle.tar.gz
assay evidence diff baseline.tar.gz current.tar.gz
Python Package
The Python package is published as assay-it:
pip install assay-it
Standards and Related Projects
Assay is easier to evaluate when mapped to established specs and ecosystems:
- Model Context Protocol (MCP)
- OpenTelemetry specification
- CloudEvents specification
- SARIF specification
- JSON Schema specification
These are interoperability references, not claims of full feature parity with each project.
Documentation
- Getting started:
docs/getting-started/quickstart.md - CI guide:
docs/guides/github-action.md - MCP quickstart:
docs/mcp/quickstart.md - Use cases:
docs/use-cases/index.md - Experiment runbooks/results:
docs/ops/ - Architecture index:
docs/architecture/index.md - ADR index:
docs/architecture/adrs.md - Roadmap:
docs/ROADMAP.md - Contributing docs:
docs/contributing/index.md
Contributing
cargo test --workspace
cargo clippy --workspace --all-targets -- -D warnings
See CONTRIBUTING.md.
License
Dependencies
~63–105MB
~1.5M SLoC