Stars
A browser extension for insights into GitHub, Gitee projects and developers.
AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-text.
An internal improved version of Hessian3/4 powered by Ant Group CO., Ltd.
Large Language Model Text Generation Inference
wangshuai09 / vllm
Forked from vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Tensors and Dynamic neural networks in Python with strong GPU acceleration
📖 A free, lightweight, modern documentation theme for Hugo
Dynamic Memory Management for Serving LLMs without PagedAttention
Nvidia GPU exporter for prometheus using nvidia-smi binary
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
The Prometheus monitoring system and time series database.
The open and composable observability and data visualization platform. Visualize metrics, logs, and traces from multiple sources like Prometheus, Loki, Elasticsearch, InfluxDB, Postgres and many mo…
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.
Ongoing research training transformer models at scale
Development repository for the Triton language and compiler
A high-throughput and memory-efficient inference and serving engine for LLMs
Git Source Code Mirror - This is a publish-only repository but pull requests can be turned into patches to the mailing list via GitGitGadget (https://gitgitgadget.github.io/). Please follow Documen…
Python-based research interface for blackbox and hyperparameter optimization, based on the internal Google Vizier Service.