- Hong Kong
- huangs0.github.io
- in/huang-songlin
- https://huangs0.notion.site
Highlights
- Pro
Stars
Splits a single NVIDIA GPU into multiple partitions with complete compute and memory isolation (with respect to performance) between the partitions
Chinese translation of Bjarne Stroustrup's HOPL4 paper
FlashInfer: Kernel Library for LLM Serving
Applied AI experiments and examples for PyTorch
Microbenchmark that unveils the mechanisms behind power readings reported by nvidia-smi/NVML on your NVIDIA GPU.
minted is a LaTeX package that provides syntax highlighting using the Pygments library. Highlighted source code can be customized using fancyvrb.
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
Learning through minimalistic server implementations.
📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
A JIT Compiler Fuzzer for JVMs via CSE/JoNM in "Validating JIT Compilers via Compilation Space Exploration" (SOSP'23)
Basic implementations of standard cryptography algorithms, like AES and SHA-1.
🚀 A fast WebAssembly interpreter and the most universal WASM runtime
The libdispatch Project (a.k.a. Grand Central Dispatch), for concurrency on multicore hardware
Write Cloudflare Workers in 100% Rust via WebAssembly
A self-developed version of the user-mode CUDA emulator project and a learning repository for Rust
Latency Debug compatible LLVM compiler based on LLVM 14
Thermal Array-based Detection and Ranging for Privacy-Preserving Human Sensing
A simple console application, built using Java, that implements the FIFO, LRU, LFU, Second-Chance, Enhanced Second-Chance, and Optimal page replacement algorithms.
CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels that provides an on-the-fly view of the assembly.