- Beijing
✨ Important
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
Stable Video Diffusion Training Code and Extensions.
Memory-optimized fine-tuning scripts for CogVideoX & Mochi using TorchAO and DeepSpeed.
GPT-4V-level open-source multimodal model based on Llama3-8B.
📹 A more flexible CogVideoX that can generate videos at any resolution and create videos from images.
📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion
Tarsier: a family of large-scale video-language models designed to generate high-quality video descriptions, with strong general video understanding capability.
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023).