Nexus
One API endpoint. Any backend. Zero configuration.
Nexus is a distributed LLM model serving orchestrator that unifies heterogeneous inference backends behind a single, intelligent API gateway.
Features
- Auto-Discovery: Automatically finds LLM backends on your network via mDNS
- Intelligent Routing: Routes requests based on model capabilities and load
- Transparent Failover: Automatically retries with fallback backends
- OpenAI-Compatible: Works with any OpenAI API client
- Zero Config: Just run it - works out of the box with Ollama
- Structured Logging: Queryable JSON logs for every request with correlation IDs (quickstart)
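The transparent-failover behavior can be sketched as a simple retry loop. The `send` callable and backend names below are hypothetical stand-ins for illustration, not Nexus's actual internals:

```python
def complete_with_failover(backends, request, send):
    """Try each backend in preference order; return the first success.

    `backends` is a list of backend identifiers, `send` is a callable
    that raises on failure (both are illustrative stand-ins).
    """
    errors = []
    for backend in backends:
        try:
            return send(backend, request)
        except Exception as exc:  # a real router would match specific errors
            errors.append((backend, exc))
    raise RuntimeError(f"all backends failed: {errors}")


# Example: the first backend is down, the second answers.
def fake_send(backend, request):
    if backend == "ollama-down":
        raise ConnectionError("connection refused")
    return {"backend": backend, "reply": "Hello!"}


result = complete_with_failover(
    ["ollama-down", "vllm-up"], {"prompt": "Hi"}, fake_send
)
print(result["backend"])  # vllm-up
```

A real orchestrator would also weigh backend health and load before retrying, but the shape of the fallback loop is the same.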
Supported Backends
| Backend | Status | Notes |
|---|---|---|
| Ollama | Supported | Auto-discovery via mDNS |
| LM Studio | Supported | OpenAI-compatible API |
| vLLM | Supported | Static configuration |
| llama.cpp server | Supported | Static configuration |
| exo | Supported | Auto-discovery via mDNS |
| OpenAI | Supported | Cloud fallback |
| LocalAI | Planned | |
Quick Start
From Source
```sh
# Install
cargo install --path .

# Generate a configuration file
nexus config init

# Run with auto-discovery
nexus serve

# Or with a custom config file
nexus serve --config nexus.toml
```
Docker
```sh
# Run with default settings
docker run -d -p 3000:3000 leocamello/nexus

# Run with custom config
docker run -d -p 3000:3000 \
  -v $(pwd)/nexus.toml:/home/nexus/nexus.toml \
  leocamello/nexus serve --config nexus.toml

# Run with host network (for mDNS discovery)
docker run -d --network host leocamello/nexus
```
From GitHub Releases
Download pre-built binaries from Releases.
CLI Commands
```sh
# Start the server
nexus serve [--config FILE] [--port PORT] [--host HOST]

# List backends
nexus backends list [--json] [--status healthy|unhealthy|unknown]

# Add a backend manually (auto-detects type)
nexus backends add http://localhost:11434 [--name NAME] [--type ollama|vllm|llamacpp]

# Remove a backend
nexus backends remove <ID>

# List available models
nexus models [--json] [--backend ID]

# Show system health
nexus health [--json]

# Generate config file
nexus config init [--output FILE] [--force] [--minimal]

# Generate shell completions
nexus completions bash > ~/.bash_completion.d/nexus
nexus completions zsh > ~/.zsh/completions/_nexus
nexus completions fish > ~/.config/fish/completions/nexus.fish
```
Environment Variables
| Variable | Description | Default |
|---|---|---|
| `NEXUS_CONFIG` | Config file path | `nexus.toml` |
| `NEXUS_PORT` | Listen port | `8000` |
| `NEXUS_HOST` | Listen address | `0.0.0.0` |
| `NEXUS_LOG_LEVEL` | Log level (`trace`/`debug`/`info`/`warn`/`error`) | `info` |
| `NEXUS_LOG_FORMAT` | Log format (`pretty`/`json`) | `pretty` |
| `NEXUS_DISCOVERY` | Enable mDNS discovery | `true` |
| `NEXUS_HEALTH_CHECK` | Enable health checking | `true` |
Precedence: CLI args > Environment variables > Config file > Defaults
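That precedence order can be sketched as a small resolver; the values below are illustrative only:

```python
import os

def resolve_port(cli_port=None, config=None):
    """Resolve the listen port: CLI arg > env var > config file > default."""
    if cli_port is not None:
        return int(cli_port)
    env_port = os.environ.get("NEXUS_PORT")
    if env_port is not None:
        return int(env_port)
    if config and "port" in config:
        return int(config["port"])
    return 8000  # documented default

os.environ["NEXUS_PORT"] = "9000"
print(resolve_port(cli_port=3000, config={"port": 8080}))  # 3000: CLI wins
print(resolve_port(config={"port": 8080}))                 # 9000: env beats config
del os.environ["NEXUS_PORT"]
print(resolve_port(config={"port": 8080}))                 # 8080: config file
print(resolve_port())                                      # 8000: built-in default
```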
API Usage
Once running, Nexus exposes an OpenAI-compatible API:
```sh
# Health check
curl http://localhost:8000/health

# List available models
curl http://localhost:8000/v1/models

# Chat completion (non-streaming)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3:70b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Chat completion (streaming)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3:70b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
```
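Streaming responses in OpenAI-compatible APIs follow the server-sent-events convention: each chunk arrives as a `data: {json}` line and the stream ends with `data: [DONE]`. A minimal parser sketch (the sample payload below is illustrative, not captured from Nexus):

```python
import json

def parse_sse_chunks(raw: str):
    """Yield content deltas from an OpenAI-style SSE stream body."""
    for line in raw.splitlines():
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

sample = (
    'data: {"choices":[{"delta":{"content":"Hel"}}]}\n'
    'data: {"choices":[{"delta":{"content":"lo!"}}]}\n'
    'data: [DONE]\n'
)
print("".join(parse_sse_chunks(sample)))  # Hello!
```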
Web Dashboard
Nexus includes a web dashboard for real-time monitoring and observability. Access it at http://localhost:8000/ in your browser.
Features:
- Real-time backend health monitoring with status indicators
- Model availability matrix showing which models are available on which backends
- Request history with last 100 requests, durations, and error details
- WebSocket-based live updates (with HTTP polling fallback)
- Fully responsive - works on desktop, tablet, and mobile
- Dark mode support (system preference)
- Works without JavaScript (graceful degradation with auto-refresh)
The dashboard provides a visual overview of your Nexus cluster, making it easy to monitor backend health, track model availability, and debug request issues in real-time.
With Claude Code / Continue.dev
Point your AI coding assistant to http://localhost:8000 as the API endpoint.
With OpenAI SDK
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed"
)

response = client.chat.completions.create(
    model="llama3:70b",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)
```
Observability
Nexus exposes metrics for monitoring and debugging:
```sh
# Prometheus metrics (for Grafana, Prometheus, etc.)
curl http://localhost:8000/metrics

# JSON stats (for dashboards and debugging)
curl http://localhost:8000/v1/stats | jq
```
Prometheus metrics include request counters, duration histograms, error rates, backend latency, token usage, and fleet state gauges. Configure your Prometheus scraper to target http://<nexus-host>:8000/metrics.
JSON stats provide an at-a-glance view with uptime, per-backend request counts, latency, and pending request depth.
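The Prometheus text-exposition format is simple to consume directly: one `name{labels} value` line per sample, with `# HELP`/`# TYPE` comment lines. A minimal parser sketch (the sample metric names below are illustrative, not Nexus's actual metric names):

```python
def parse_metrics(text: str) -> dict:
    """Parse Prometheus text-exposition lines into {sample_name: value}.

    Simplified sketch: assumes no spaces inside label values and no
    trailing timestamps, which is enough for simple counters and gauges.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip HELP/TYPE comments and blank lines
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples

sample = """\
# HELP requests_total Total requests handled.
# TYPE requests_total counter
requests_total{backend="ollama"} 42
requests_total{backend="vllm"} 7
"""
metrics = parse_metrics(sample)
print(metrics['requests_total{backend="ollama"}'])  # 42.0
```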
Configuration
```toml
# nexus.toml
[server]
host = "0.0.0.0"
port = 8000

[discovery]
enabled = true

[[backends]]
name = "local-ollama"
url = "http://localhost:11434"
type = "ollama"
priority = 1

[[backends]]
name = "gpu-server"
url = "http://192.168.1.100:8000"
type = "vllm"
priority = 2
```
Architecture
```
+---------------------------------------------+
|             Nexus Orchestrator              |
|  - Discovers backends via mDNS              |
|  - Tracks model capabilities                |
|  - Routes to best available backend         |
|  - OpenAI-compatible API                    |
+---------------------------------------------+
        |             |             |
        v             v             v
   +--------+    +--------+    +--------+
   | Ollama |    |  vLLM  |    |  exo   |
   |   7B   |    |  70B   |    |  32B   |
   +--------+    +--------+    +--------+
```
Development
```sh
# Build
cargo build

# Run tests
cargo test

# Run with logging
RUST_LOG=debug cargo run -- serve

# Check formatting
cargo fmt --check

# Lint
cargo clippy
```
License
Apache License 2.0 - see LICENSE for details.
Related Projects