2 releases

| Version | Date |
|---|---|
| 0.1.1 | Mar 22, 2026 |
| 0.1.0 | Mar 22, 2026 |
# car-inference

Local and remote model inference for the Common Agent Runtime.
## What it does

Provides on-device inference using Candle (Metal on macOS, CUDA on Linux) with Qwen3 models downloaded from HuggingFace on first use. Also supports remote APIs (OpenAI, Anthropic, Google) via the same typed `ModelSchema` interface. The `AdaptiveRouter` selects the best model using a filter-score-explore strategy, learning from outcomes over time via the `OutcomeTracker`.
## Usage

```rust
use car_inference::{InferenceEngine, InferenceConfig, GenerateRequest, GenerateParams};

let engine = InferenceEngine::new(InferenceConfig::default());
let result = engine.generate(GenerateRequest {
    prompt: "Explain quicksort".into(),
    params: GenerateParams::default(),
    ..Default::default()
}).await?;
```
## Crate features

- `metal` -- Apple Silicon GPU acceleration
- `cuda` -- NVIDIA GPU acceleration
- `ast` -- AST-aware code generation via `car-ast`
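These features are opt-in. A macOS build, for instance, might enable Metal acceleration in `Cargo.toml` (the version number is taken from the release list above; the exact feature spelling is as listed):

```toml
[dependencies]
car-inference = { version = "0.1.1", features = ["metal"] }
```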
Part of CAR -- see the main repo for full documentation.
## Dependencies

~39–64MB, ~1M SLoC