
Splits a single NVIDIA GPU into multiple partitions with complete compute and memory isolation (with respect to performance) between the partitions

C 157 39 Updated Apr 21, 2019

NCCL Profiling Kit

Python 127 12 Updated Jul 1, 2024
Rust 1 Updated Jan 15, 2025

Chinese translation of Bjarne Stroustrup's HOPL4 paper

2,241 398 Updated Dec 10, 2024

FlashInfer: Kernel Library for LLM Serving

Cuda 1,886 188 Updated Jan 29, 2025

Applied AI experiments and examples for PyTorch

Python 216 22 Updated Jan 21, 2025

Microbenchmark that unveils the mechanisms behind power readings reported by nvidia-smi/NVML on your NVIDIA GPU.

C++ 11 Updated Dec 12, 2024

minted is a LaTeX package that provides syntax highlighting using the Pygments library. Highlighted source code can be customized using fancyvrb.

TeX 1,775 128 Updated Nov 25, 2024

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

Python 1,195 97 Updated Jan 24, 2025

eBPF verifier based on abstract interpretation

C++ 400 45 Updated Jan 25, 2025

Userspace eBPF VM

C 852 141 Updated Jan 29, 2025

CUDA GDB

C 192 56 Updated Aug 23, 2024

Learning through minimalistic server implementations.

Python 10 Updated Oct 20, 2024

📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

Cuda 2,159 229 Updated Jan 27, 2025

A JIT Compiler Fuzzer for JVMs via CSE/JoNM in "Validating JIT Compilers via Compilation Space Exploration" (SOSP'23)

Java 52 2 Updated Dec 19, 2024

K Framework Tools 7.0

Python 465 152 Updated Jan 28, 2025

The WebAssembly Binary Toolkit

C++ 7,037 722 Updated Jan 18, 2025

Basic implementations of standard cryptography algorithms, like AES and SHA-1.

C 1,877 698 Updated Dec 28, 2020

WebAssembly Virtual Machine

C++ 2,668 226 Updated Feb 14, 2024

🚀 A fast WebAssembly interpreter and the most universal WASM runtime

C 7,417 473 Updated Sep 10, 2024

The libdispatch Project, (a.k.a. Grand Central Dispatch), for concurrency on multicore hardware

C 2,489 465 Updated Jan 29, 2025

Write Cloudflare Workers in 100% Rust via WebAssembly

Rust 2,704 300 Updated Jan 29, 2025

An open and reliable container runtime

Go 17,884 3,521 Updated Jan 27, 2025

A self-developed version of the user-mode CUDA emulator project and a learning repository for Rust

Rust 4 2 Updated Sep 22, 2023

C macros for hash tables and more

C 4,253 936 Updated Oct 15, 2024

Proxy: Next Generation Polymorphism in C++

C++ 2,430 159 Updated Jan 27, 2025

Latency Debug compatible LLVM compiler based on LLVM 14

14 3 Updated Apr 15, 2024

Thermal Array-based Detection and Ranging for Privacy-Preserving Human Sensing

Jupyter Notebook 7 Updated Oct 22, 2024

A simple console application, built in Java, that implements the FIFO, LRU, LFU, Second-Chance, Enhanced Second-Chance, and Optimal page replacement algorithms.

Java 1 Updated Dec 24, 2018

CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels that provides an on-the-fly view of the assembly.

C# 110 16 Updated Jan 17, 2023