Stars
An open-source implementation for fine-tuning the Qwen2-VL series by Alibaba Cloud.
SimVQ: Addressing Representation Collapse in Vector Quantized Models with One Linear Layer
Visualizing the attention of vision-language models
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
Qwen2-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
Utilities intended for use with Llama models.
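代码随想录 (Code Thoughts) LeetCode practice guide: a recommended order for 200 classic problems, 600,000 characters of detailed illustrated explanations, video walkthroughs of the hard parts, and more than 50 mind maps, with versions in C++, Java, Python, Go, JavaScript, and other languages, so that learning algorithms is no longer confusing! 🔥🔥 Take a look; you will wish you had found it sooner! 🚀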
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.
Reference implementation for DPO (Direct Preference Optimization)
SEED-Voken: A Series of Powerful Visual Tokenizers
A Python toolbox for performing gradient-free optimization
[COLM 2024] LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition
[ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"
MambaOut: Do We Really Need Mamba for Vision?
[COLING 2024] Official code for paper "Few-shot Temporal Pruning Accelerates Diffusion Models for Text Generation".
Official release of InternLM2.5 base and chat models. 1M context support
[NAACL 2024] LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-text Generation?
[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-sim…
Open-Sora: Democratizing Efficient Video Production for All
PyTorch code and models for V-JEPA self-supervised learning from video.
Painter & SegGPT Series: Vision Foundation Models from BAAI
Fast and memory-efficient exact attention
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects.
Generative Models by Stability AI
Implementation of MagViT2 Tokenizer in PyTorch