feat(health): implement Health Checker module (F03)#26

Merged
leocamello merged 1 commit into main from 002-health-checker on Feb 1, 2026

@leocamello
Owner

Summary

Implements the Health Checker feature (F03 - 002-health-checker) for the Nexus project.

Changes

New Files

  • src/health/mod.rs - Main HealthChecker struct and background loop
  • src/health/config.rs - HealthCheckConfig with TOML/JSON serialization
  • src/health/error.rs - HealthCheckError enum with 6 variants
  • src/health/state.rs - BackendHealthState for per-backend tracking
  • src/health/parser.rs - Response parsing for Ollama/OpenAI/LlamaCpp
  • src/health/tests.rs - 40 unit tests
  • tests/health_integration.rs - 6 integration tests with mock servers

Modified Files

  • src/lib.rs - Added health module export
  • Cargo.toml - Added tokio-util and wiremock dependencies

Features

  • Backend-Specific Health Endpoints: Ollama (/api/tags), LlamaCpp (/health), vLLM/OpenAI (/v1/models)
  • Smart Status Transitions: Configurable thresholds (3 failures → unhealthy, 2 successes → healthy)
  • Automatic Model Discovery: Parses models from backend responses with capability detection
  • Graceful Shutdown: CancellationToken-based shutdown with in-flight check completion
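The backend-specific endpoints listed above could be selected with a small lookup. This is a minimal sketch, not the code from this PR: the `BackendKind` enum and the `health_endpoint` and `health_url` helpers are hypothetical names, and only the three paths come from the description.

```rust
/// Hypothetical backend kinds; the names are illustrative, not taken from the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BackendKind {
    Ollama,
    LlamaCpp,
    Vllm,
    OpenAi,
}

/// Returns the health endpoint path that the PR description lists for each backend.
fn health_endpoint(kind: BackendKind) -> &'static str {
    match kind {
        BackendKind::Ollama => "/api/tags",
        BackendKind::LlamaCpp => "/health",
        // vLLM and OpenAI-compatible backends share the model listing endpoint.
        BackendKind::Vllm | BackendKind::OpenAi => "/v1/models",
    }
}

/// Joins a base URL and the endpoint path without doubling the slash.
fn health_url(base_url: &str, kind: BackendKind) -> String {
    format!("{}{}", base_url.trim_end_matches('/'), health_endpoint(kind))
}
```

With these helpers, `health_url("http://localhost:11434/", BackendKind::Ollama)` yields `http://localhost:11434/api/tags`.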

Configuration

[health_check]
enabled = true
interval_seconds = 30
timeout_seconds = 5
failure_threshold = 3
recovery_threshold = 2
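
To illustrate how failure_threshold and recovery_threshold might drive status changes, here is a minimal sketch of a threshold-based state machine. The `Status`, `Thresholds`, and `HealthState` types and the `record` method are hypothetical, and the rule that the first result moves a backend straight out of Unknown is an assumption; the actual logic in src/health/state.rs may differ.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Unknown,
    Healthy,
    Unhealthy,
}

/// Hypothetical mirror of the two threshold settings in [health_check].
#[derive(Debug, Clone, Copy)]
struct Thresholds {
    failure_threshold: u32,
    recovery_threshold: u32,
}

impl Default for Thresholds {
    fn default() -> Self {
        // Defaults taken from the configuration shown above.
        Self { failure_threshold: 3, recovery_threshold: 2 }
    }
}

/// Hypothetical per-backend state with consecutive result counters.
#[derive(Debug)]
struct HealthState {
    status: Status,
    consecutive_failures: u32,
    consecutive_successes: u32,
}

impl HealthState {
    fn new() -> Self {
        Self { status: Status::Unknown, consecutive_failures: 0, consecutive_successes: 0 }
    }

    /// Records one check result and returns the (possibly unchanged) status.
    fn record(&mut self, success: bool, t: Thresholds) -> Status {
        // A result of one kind resets the counter of the other kind.
        if success {
            self.consecutive_successes += 1;
            self.consecutive_failures = 0;
        } else {
            self.consecutive_failures += 1;
            self.consecutive_successes = 0;
        }
        self.status = match (self.status, success) {
            // Assumption: the first result decides the status directly.
            (Status::Unknown, true) => Status::Healthy,
            (Status::Unknown, false) => Status::Unhealthy,
            (Status::Healthy, false) if self.consecutive_failures >= t.failure_threshold => {
                Status::Unhealthy
            }
            (Status::Unhealthy, true) if self.consecutive_successes >= t.recovery_threshold => {
                Status::Healthy
            }
            // Below the threshold, the status stays where it is.
            (current, _) => current,
        };
        self.status
    }
}
```

Under the defaults, a healthy backend tolerates two consecutive failures and becomes unhealthy on the third; it then needs two consecutive successes to recover. Counting consecutive results rather than totals keeps a single flaky check from flipping the status.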

Test Results

  • ✅ 94 unit tests passing
  • ✅ 6 integration tests passing
  • ✅ 4 doc tests passing
  • ✅ Zero clippy warnings
  • ✅ Formatted with rustfmt

Closes

Closes #13, #14, #15, #16, #17, #18, #19, #20, #21, #22, #23, #24, #25

Spec References

  • Spec: specs/002-health-checker/spec.md
  • Plan: specs/002-health-checker/plan.md
  • Tasks: specs/002-health-checker/tasks.md

Implements periodic health checking for backend monitoring with:

- Health check configuration with defaults (30s interval, 5s timeout)
- Error classification (timeout, connection, DNS, TLS, HTTP, parse errors)
- Backend health state tracking with transition thresholds
- Response parsing for Ollama, OpenAI/vLLM, and LlamaCpp endpoints
- Status transitions: Unknown→Healthy/Unhealthy, threshold-based recovery
- Registry integration for status, models, and latency updates
- Background task with graceful shutdown via CancellationToken
- Comprehensive test coverage (40 unit + 6 integration tests)

Technical details:
- Uses reqwest with connection pooling
- DashMap for lock-free per-backend state
- Tokio async runtime for non-blocking checks
- Automatic model discovery from backend responses

Quality gates:
- Zero clippy warnings
- Code formatted with rustfmt
- All 104 tests passing (94 unit + 6 integration + 4 doc)

Closes #13, #14, #15, #16, #17, #18, #19, #20, #21, #22, #23, #24, #25
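
The six error categories named above (timeout, connection, DNS, TLS, HTTP, parse) could map onto an enum along these lines. This is only a sketch: the `HealthCheckError` name appears in the PR, but the variant names, their payloads, the messages, and the `check_status` helper are assumptions made for illustration.

```rust
use std::fmt;

/// Sketch of a six-variant error type; variant shapes are assumed, not from the PR.
#[derive(Debug, Clone, PartialEq, Eq)]
enum HealthCheckError {
    Timeout { seconds: u64 },
    Connection(String),
    Dns(String),
    Tls(String),
    Http(u16),
    Parse(String),
}

impl fmt::Display for HealthCheckError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Timeout { seconds } => write!(f, "timed out after {seconds}s"),
            Self::Connection(msg) => write!(f, "connection failed: {msg}"),
            Self::Dns(host) => write!(f, "DNS resolution failed for {host}"),
            Self::Tls(msg) => write!(f, "TLS error: {msg}"),
            Self::Http(code) => write!(f, "unexpected HTTP status {code}"),
            Self::Parse(msg) => write!(f, "invalid response body: {msg}"),
        }
    }
}

impl std::error::Error for HealthCheckError {}

/// Hypothetical helper: treats any non-2xx status code as an HTTP error.
fn check_status(code: u16) -> Result<(), HealthCheckError> {
    if (200..300).contains(&code) {
        Ok(())
    } else {
        Err(HealthCheckError::Http(code))
    }
}
```

Keeping each failure class in its own variant lets callers log or count them separately, for example to distinguish a slow backend (timeout) from a misconfigured one (DNS or TLS).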
@leocamello added the enhancement (New feature or request), P0 (MVP Priority), and health-checker (Health Checker feature (F03)) labels on Feb 1, 2026
@leocamello merged commit bf6af09 into main on Feb 1, 2026
8 checks passed
@leocamello deleted the 002-health-checker branch on February 1, 2026 at 23:18
leocamello added a commit that referenced this pull request Feb 17, 2026
Development

Successfully merging this pull request may close these issues.

[Health Checker] T01: Module scaffolding & dependencies