Stars
Riona 🌸 is built with Node.js and TypeScript 🛠️ and designed to run jobs 📸 effortlessly. Lightweight, efficient, and a work in progress 🚧, with more features coming soon! 🌟
[arXiv 2025] Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control
✨✨ VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
An open-source multimodal large language model that can hear and talk while thinking. It features real-time, end-to-end speech input and streaming audio output for conversation.
Joint speech-language model that responds directly to audio!
EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
Efficient and Scalable Implementations of Clustering Algorithms using PyTorch.
Vector (and Scalar) Quantization, in PyTorch
The official code for our ICCV 2023 work: Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation
The simplest working implementation of StyleGAN2, a state-of-the-art generative adversarial network, in PyTorch, enabling everyone to experience disentanglement.
[NeurIPS 2024] OmniTokenizer: one model and one weight for joint image-video tokenization.
MARS5 speech model (TTS) from CAMB.AI
A generative speech model for daily dialogue.
Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
Command-line program to download videos from YouTube.com and other video sites
LVDM: Latent Video Diffusion Models for High-Fidelity Long Video Generation
Latte: Latent Diffusion Transformer for Video Generation.
⚡ Flash Diffusion ⚡: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation (AAAI 2025)
Locally Hierarchical Auto-Regressive Modeling for Image Generation (HQ-Transformer)
[NeurIPS 2024] CV-VAE: A Compatible Video VAE for Latent Generative Video Models