One API endpoint. Any backend. Zero configuration.
Nexus is a distributed LLM model serving orchestrator that unifies heterogeneous inference backends behind a single, intelligent API gateway.
- 🔍 Auto-Discovery: Automatically finds LLM backends on your network via mDNS
- 🎯 Intelligent Routing: Routes requests based on model capabilities and load
- 🔄 Transparent Failover: Automatically retries with fallback backends
- 🔌 OpenAI-Compatible: Works with any OpenAI API client
- ⚡ Zero Config: Just run it - works out of the box with Ollama
- 📊 Structured Logging: Queryable JSON logs for every request with correlation IDs
- 🔒 Privacy Zones: Structural enforcement prevents sensitive data from reaching cloud backends
- 🏷️ Capability Tiers: Prevent silent quality downgrades with strict/flexible routing modes
| Backend | Status | Notes |
|---|---|---|
| Ollama | ✅ Supported | Auto-discovery via mDNS |
| LM Studio | ✅ Supported | OpenAI-compatible API |
| vLLM | ✅ Supported | Static configuration |
| llama.cpp server | ✅ Supported | Static configuration |
| exo | ✅ Supported | Auto-discovery via mDNS |
| OpenAI | ✅ Supported | Cloud fallback |
| LocalAI | 🔜 Planned | |
```bash
# Install
cargo install --path .

# Generate a configuration file
nexus config init

# Run with auto-discovery
nexus serve

# Or with a custom config file
nexus serve --config nexus.toml
```

```bash
# Run with default settings
docker run -d -p 8000:8000 leocamello/nexus

# Run with custom config
docker run -d -p 8000:8000 \
  -v $(pwd)/nexus.toml:/home/nexus/nexus.toml \
  leocamello/nexus serve --config nexus.toml

# Run with host network (for mDNS discovery)
docker run -d --network host leocamello/nexus
```

Download pre-built binaries from Releases.
```bash
# Start the server
nexus serve [--config FILE] [--port PORT] [--host HOST]

# List backends
nexus backends list [--json] [--status healthy|unhealthy|unknown]

# Add a backend manually (auto-detects type)
nexus backends add http://localhost:11434 [--name NAME] [--type ollama|vllm|llamacpp]

# Remove a backend
nexus backends remove <ID>

# List available models
nexus models [--json] [--backend ID]

# Show system health
nexus health [--json]

# Generate config file
nexus config init [--output FILE] [--force] [--minimal]

# Generate shell completions
nexus completions bash > ~/.bash_completion.d/nexus
nexus completions zsh > ~/.zsh/completions/_nexus
nexus completions fish > ~/.config/fish/completions/nexus.fish
```

| Variable | Description | Default |
|---|---|---|
| NEXUS_CONFIG | Config file path | nexus.toml |
| NEXUS_PORT | Listen port | 8000 |
| NEXUS_HOST | Listen address | 0.0.0.0 |
| NEXUS_LOG_LEVEL | Log level (trace/debug/info/warn/error) | info |
| NEXUS_LOG_FORMAT | Log format (pretty/json) | pretty |
| NEXUS_DISCOVERY | Enable mDNS discovery | true |
| NEXUS_HEALTH_CHECK | Enable health checking | true |
Precedence: CLI args > Environment variables > Config file > Defaults
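The same order can be sketched as a first-match lookup. This is an illustrative sketch, not Nexus's actual code; the function and parameter names are hypothetical:

```python
def resolve_option(cli=None, env=None, config_file=None, default=None):
    """Return the first value that is set, in precedence order:
    CLI args > environment variables > config file > defaults."""
    for value in (cli, env, config_file, default):
        if value is not None:
            return value
    return None

# A CLI flag wins even when the environment variable is also set:
port = resolve_option(cli=9000, env=8080, default=8000)
print(port)  # 9000
```

So `nexus serve --port 9000` listens on 9000 even if `NEXUS_PORT=8080` is exported.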
Once running, Nexus exposes an OpenAI-compatible API:
```bash
# Health check
curl http://localhost:8000/health

# List available models
curl http://localhost:8000/v1/models

# Chat completion (non-streaming)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3:70b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Chat completion (streaming)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3:70b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
```

Nexus includes a web dashboard for real-time monitoring and observability. Access it at http://localhost:8000/ in your browser.
Features:
- 📊 Real-time backend health monitoring with status indicators
- 🗺️ Model availability matrix showing which models are available on which backends
- 📝 Request history with last 100 requests, durations, and error details
- 🔄 WebSocket-based live updates (with HTTP polling fallback)
- 📱 Fully responsive - works on desktop, tablet, and mobile
- 🌙 Dark mode support (system preference)
- 🚀 Works without JavaScript (graceful degradation with auto-refresh)
The dashboard provides a visual overview of your Nexus cluster, making it easy to monitor backend health, track model availability, and debug request issues in real time.
Point your AI coding assistant to http://localhost:8000 as the API endpoint.
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed"
)

response = client.chat.completions.create(
    model="llama3:70b",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
```

Nexus exposes metrics for monitoring and debugging:
```bash
# Prometheus metrics (for Grafana, Prometheus, etc.)
curl http://localhost:8000/metrics

# JSON stats (for dashboards and debugging)
curl http://localhost:8000/v1/stats | jq
```

Prometheus metrics include request counters, duration histograms, error rates, backend latency, token usage, and fleet state gauges. Configure your Prometheus scraper to target http://<nexus-host>:8000/metrics.
JSON stats provide an at-a-glance view with uptime, per-backend request counts, latency, and pending request depth.
```toml
# nexus.toml

[server]
host = "0.0.0.0"
port = 8000

[discovery]
enabled = true

[[backends]]
name = "local-ollama"
url = "http://localhost:11434"
type = "ollama"
priority = 1

[[backends]]
name = "gpu-server"
url = "http://192.168.1.100:8000"
type = "vllm"
priority = 2

# Cloud backend with privacy zone and budget (v0.3)
# [[backends]]
# name = "openai-gpt4"
# url = "https://api.openai.com"
# type = "openai"
# api_key_env = "OPENAI_API_KEY"
# zone = "open"
# tier = 4

# [routing.budget]
# monthly_limit_usd = 50.0
# soft_limit_percent = 75
# hard_limit_action = "block_cloud"
```

```
┌─────────────────────────────────────────────┐
│             Nexus Orchestrator              │
│  - Discovers backends via mDNS              │
│  - Tracks model capabilities                │
│  - Routes to best available backend         │
│  - OpenAI-compatible API                    │
└─────────────────────────────────────────────┘
         │               │               │
         ▼               ▼               ▼
    ┌────────┐      ┌────────┐      ┌────────┐
    │ Ollama │      │  vLLM  │      │  exo   │
    │   7B   │      │  70B   │      │  32B   │
    └────────┘      └────────┘      └────────┘
```
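To make the routing idea concrete, here is an illustrative sketch of priority-plus-health selection with failover. The names and structure are hypothetical, not Nexus's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Backend:
    name: str
    priority: int      # lower number = preferred
    healthy: bool
    models: set[str]

def pick_backend(backends, model):
    """Choose the healthy backend that serves `model` with the best (lowest) priority."""
    candidates = [b for b in backends if b.healthy and model in b.models]
    if not candidates:
        return None  # caller surfaces an error or retries later
    return min(candidates, key=lambda b: b.priority)

fleet = [
    Backend("local-ollama", priority=1, healthy=False, models={"llama3:8b"}),
    Backend("gpu-server", priority=2, healthy=True, models={"llama3:8b", "llama3:70b"}),
]
# local-ollama is preferred but unhealthy, so the request fails over:
print(pick_backend(fleet, "llama3:8b").name)  # gpu-server
```

Nexus's actual router also weighs load and, in v0.3, privacy zones and capability tiers, but the skeleton is the same: filter to eligible backends, then rank.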
```bash
# Build
cargo build

# Run tests
cargo test

# Run with logging
RUST_LOG=debug cargo run -- serve

# Check formatting
cargo fmt --check

# Lint
cargo clippy
```

| Document | Description |
|---|---|
| Architecture | System architecture, module structure, data flows |
| Features | Detailed feature specifications (F01–F23) |
| RFC-001 | Platform architecture RFC: NII, Control Plane, Reconcilers |
| Contributing | Development workflow, coding standards, PR guidelines |
| Changelog | Release history and version notes |
| Manual Testing Guide | How to test Nexus manually |
| WebSocket Protocol | Dashboard WebSocket API reference |
Apache License 2.0 - see LICENSE for details.