2 releases

| Version | Date |
|---|---|
| 0.1.1 | Mar 22, 2026 |
| 0.1.0 | Mar 22, 2026 |
# car-inference

Local and remote model inference for the Common Agent Runtime.
## What it does

Provides on-device inference using Candle (Metal on macOS, CUDA on Linux) with Qwen3 models downloaded from HuggingFace on first use. Also supports remote APIs (OpenAI, Anthropic, Google) via the same typed `ModelSchema` interface. The `AdaptiveRouter` selects the best model using a filter-score-explore strategy, learning from outcomes over time via the `OutcomeTracker`.
## Usage

```rust
use car_inference::{InferenceEngine, InferenceConfig, GenerateRequest, GenerateParams};

let engine = InferenceEngine::new(InferenceConfig::default());
let result = engine.generate(GenerateRequest {
    prompt: "Explain quicksort".into(),
    params: GenerateParams::default(),
    ..Default::default()
}).await?;
```
## Crate features

- `metal` -- Apple Silicon GPU acceleration
- `cuda` -- NVIDIA GPU acceleration
- `ast` -- AST-aware code generation via `car-ast`
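These features are opt-in. A macOS build, for instance, might enable Metal acceleration in `Cargo.toml` (the version number is taken from the release list above; the exact feature spelling is as listed):

```toml
[dependencies]
car-inference = { version = "0.1.1", features = ["metal"] }
```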
Part of CAR -- see the main repo for full documentation.
## Dependencies

~39–64MB, ~1M SLoC