Awesome-Autoregressive-Visual-Generation

This is a repo to track the latest autoregressive visual generation papers.

Image Tokenizers

Neural Discrete Representation Learning Paper, NeurIPS 2017
Generating Diverse High-Fidelity Images with VQ-VAE-2 Paper, NeurIPS 2019
Taming Transformers for High-Resolution Image Synthesis Paper, CVPR 2021
Autoregressive Image Generation using Residual Quantization Paper, CVPR 2022
* BEIT V2: Masked Image Modeling with Vector-Quantized Visual Tokenizers (for understanding) Paper, Arxiv 2022
Vector-quantized Image Modeling with Improved VQGAN Paper, ICLR 2022
MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation Paper, NeurIPS 2022
* PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers (for understanding) Paper, AAAI 2023
* All in Tokens: Unifying Output Space of Visual Tasks via Soft Token (for understanding) Paper, CVPR 2023
Regularized Vector Quantization for Tokenized Image Synthesis Paper, CVPR 2023
Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization Paper, CVPR 2023
Not all image regions matter: Masked vector quantization for autoregressive image generation Paper, CVPR 2023
Spae: Semantic pyramid autoencoder for multimodal generation with frozen llms Paper, NeurIPS 2023
HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes Paper, TMLR 2024
Finite Scalar Quantization: VQ-VAE Made Simple Paper, ICLR 2024
Planting a seed of vision in large language model Paper, ICLR 2024
Language model beats diffusion - tokenizer is key to visual generation Paper, ICLR 2024
Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis Paper, CVPR 2024
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction Paper, NeurIPS 2024
An Image is Worth 32 Tokens for Reconstruction and Generation Paper, NeurIPS 2024
Image Understanding Makes for A Good Tokenizer for Image Generation Paper, NeurIPS 2024
Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99% Paper, Arxiv 2024
Quantised Global Autoencoder: A Holistic Approach to Representing Visual Data Paper, Arxiv 2024
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation Paper, Arxiv 2024
Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation Paper, Arxiv 2024
MaskBit: Embedding-free Image Generation via Bit Tokens Paper, Arxiv 2024
Image and Video Tokenization with Binary Spherical Quantization Paper, Arxiv 2024
Cosmos Tokenizer: A suite of image and video neural tokenizers Website
Adaptive Length Image Tokenization via Recurrent Allocation Paper, Arxiv 2024
RandAR: Decoder-only Autoregressive Visual Generation in Random Orders Paper, Arxiv 2024
Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective Paper, Arxiv 2024
MUSE-VL: Modeling Unified VLM through Semantic Discrete Encoding Paper, Arxiv 2024
TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation Paper, Arxiv 2024
XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation Paper, Arxiv 2024
ImageFolder: Autoregressive Image Generation with Folded Tokens 🚀 Paper, Arxiv 2024
Taming Scalable Visual Tokenizer for Autoregressive Image Generation Paper, Arxiv 2024
Language-Guided Image Tokenization for Generation Paper, Arxiv 2024
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation Paper, Arxiv 2024
Scaling Image Tokenizers with Grouped Spherical Quantization Paper, Arxiv 2024
Spectral Image Tokenizer Paper, Arxiv 2024

AutoRegressive Image Generation

Conditional image generation with pixelcnn decoders Paper, NeurIPS 2016
DiVAE: Photorealistic Images Synthesis with Denoising Diffusion Decoder Paper
Vector Quantized Diffusion Model for Text-to-Image Synthesis Paper
MaskGIT: Masked Generative Image Transformer Paper
BEIT: BERT Pre-Training of Image Transformers Paper
BEIT V2: Masked Image Modeling with Vector-Quantized Visual Tokenizers Paper
MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis Paper
Sequential modeling enables scalable learning for large vision models Paper, Arxiv 2023
4M: Massively Multimodal Masked Modeling Paper, NeurIPS 2023
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation Paper, Arxiv 2024
ControlVAR: Exploring Controllable Visual Autoregressive Modeling Paper, Arxiv 2024
Autoregressive Image Generation without Vector Quantization Paper, Arxiv 2024
MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis Paper, Arxiv 2024
ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation Paper, Arxiv 2024
VAR-CLIP: Text-to-Image Generator with Visual Auto-Regressive Modeling Paper, Arxiv 2024
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining Paper, Arxiv 2024
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model Paper, Arxiv 2024
Scalable Autoregressive Image Generation with Mamba Paper, Arxiv 2024
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation Paper, Arxiv 2024
DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation Paper, Arxiv 2024
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens Paper, Arxiv 2024
Customize Your Visual Autoregressive Recipe with Set Autoregressive Modeling Paper, Arxiv 2024
M-VAR: Decoupled Scale-wise Autoregressive Modeling for High-Quality Image Generation Paper, Arxiv 2024
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling Paper, Arxiv 2024
Randomized Autoregressive Visual Generation Paper, Arxiv 2024
Elucidating the design space of language models for image generation Paper, Arxiv 2024
Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment Paper, Arxiv 2024
CART: Compositional Auto-Regressive Transformer for Image Generation Paper, Arxiv 2024
CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient Paper, Arxiv 2024
X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models Paper, Arxiv 2024
JetFormer: An Autoregressive Generative Model of Raw Images and Text Paper, Arxiv 2024
Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis Paper, Arxiv 2024
Liquid: Language Models are Scalable Multi-modal Generators Paper, Arxiv 2024
FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching Paper, Arxiv 2024

lxa9867/Awesome-Autoregressive-Visual-Generation
