Stars
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
[arXiv'25] Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
ReNeg: Learning Negative Embedding with Reward Guidance
Liquid: Language Models are Scalable Multi-modal Generators
Code for "StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models", arXiv 2024
Official PyTorch Implementation of Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think (ICLR 2025)
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
Infinity ∞ : Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
HunyuanVideo: A Systematic Framework For Large Video Generation Model
[ECCV 2024] Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting
Truncated Diffusion Model for Real-Time End-to-End Autonomous Driving
[ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation
Bridging Large Vision-Language Models and End-to-End Autonomous Driving
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
[NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization.
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
[NeurIPS 2024] A Generalizable World Model for Autonomous Driving
[AAAI 2025] Linear-complexity Visual Sequence Learning with Gated Linear Attention
DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…
[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
[CVPR 2024] LMDrive: Closed-Loop End-to-End Driving with Large Language Models
A method for matching the 3D point cloud sub-map generated by a robot during SLAM against the 2D map.
[CVPR2024] Official Repository of Paper "Panacea: Panoramic and Controllable Video Generation for Autonomous Driving"
[ICRA'2024] Rethinking Imitation-based Planner for Autonomous Driving
[CVPR 2024] 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering