Starred repositories
Perceptual video quality assessment based on multi-method fusion.
A suite of image and video neural tokenizers
Memory optimized finetuning scripts for CogVideoX & Mochi using TorchAO and DeepSpeed
[ECCV 2024, Oral] DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
Official implementation of "MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling"
High-resolution models for human tasks.
Code of Pyramidal Flow Matching for Efficient Video Generative Modeling
📹 A more flexible CogVideoX that can generate videos at any resolution and create videos from images.
PantoMatrix: Co-Speech Talking Head and Gestures Generation
[CVPR 2024] SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion
Official codes of VEnhancer: Generative Space-Time Enhancement for Video Generation
[ICLR 2024] SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Towards Variable and Coordinated Holistic Co-Speech Motion Generation, CVPR 2024
AI-based tool for removing hard-coded subtitles and text-like watermarks from videos or pictures, producing output files at the original resolution. Runs locally; no third-party API required.
This is the official repository for TalkSHOW: Generating Holistic 3D Human Motion from Speech [CVPR2023].
[CVPR'24] DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation
Video-Infinity generates long videos quickly using multiple GPUs without extra training.
MiniSora: A community that aims to explore the implementation path and future development direction of Sora.
Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
VideoSys: An easy and efficient system for video generation
Open-Sora: Democratizing Efficient Video Production for All
Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance
📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion
EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
HaMeR: Reconstructing Hands in 3D with Transformers
4DHumans: Reconstructing and Tracking Humans with Transformers