Stars
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Official Implementations for Paper - MagicQuill: An Intelligent Interactive Image Editing System
A suite of image and video neural tokenizers
Code for "Diffusion Model Alignment Using Direct Preference Optimization"
Video Generation, Physical Commonsense, Semantic Adherence, VideoCon-Physics
Scaling Diffusion Transformers with Mixture of Experts
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
Official Implementation of paper "MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion"
A native PyTorch Library for large model training
Long context evaluation for large language models
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
A continuously updated collection of DCTLs (DaVinci Color Transform Language) designed to enhance and educate on workflows using ARRI LogC3, Gen5 and Cineon in DaVinci Resolve. This collection offe…
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
[ECCV 2024] 3DPE: Real-time 3D-aware Portrait Editing from a Single Image
[TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"
Anole: An Open, Autoregressive and Native Multimodal Model for Interleaved Image-Text Generation
SEED-Voken: A Series of Powerful Visual Tokenizers
This repo contains the code for a 1D tokenizer and generator
AuraSR: GAN-based Super-Resolution for real-world
Official code for NeurIPS 2024 paper LRM-Zero: Training Large Reconstruction Models with Synthesized Data
[arXiv:2406.07548] Image and Video Tokenization with Binary Spherical Quantization
[NeurIPS 2024] AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
ChronoDepth: Learning Temporally Consistent Video Depth from Video Diffusion Priors