Stars
Powerful yet simple to use screenshot software 🖥️ 📸
Let your Claude think
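An AI model API management and distribution system that supports converting multiple large models to OpenAI-format calls, supports Midjourney Proxy, Suno, and Rerank, and is compatible with the EasyPay (易支付) protocol. It can be used by individuals or within enterprises to manage and distribute channels. This project is a secondary development based on One API. 🍥 The next-generation LLM gateway and AI asset management system supports multiple lan…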
A Toolkit to Help Optimize Large ONNX Models
Count number of parameters / MACs / FLOPS for ONNX models.
Deploying LLMs offline on the NVIDIA Jetson platform marks the dawn of a new era in embodied intelligence, where devices can function independently without continuous internet access.
Awesome LLMs on Device: A Comprehensive Survey
[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
Universal LLM Deployment Engine with ML Compilation
[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
The official implementation of the EMNLP 2023 paper LLM-FP4
Code repo for the paper "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models"
[ICLR 2024 Spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
A tool to modify ONNX models visually, based on Netron and Flask.