This is a repo to track the latest autoregressive visual generation papers.
- Neural Discrete Representation Learning Paper, NeurIPS 2017
- Generating Diverse High-Fidelity Images with VQ-VAE-2 Paper, NeurIPS 2019
- Taming Transformers for High-Resolution Image Synthesis Paper, CVPR 2021
- Autoregressive Image Generation using Residual Quantization Paper, CVPR 2022
- * BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers (for understanding) Paper, Arxiv 2022
- Vector-quantized Image Modeling with Improved VQGAN Paper, ICLR 2022
- MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation Paper, NeurIPS 2022
- * PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers (for understanding) Paper, AAAI 2023
- * All in Tokens: Unifying Output Space of Visual Tasks via Soft Token (for understanding) Paper, CVPR 2023
- Regularized Vector Quantization for Tokenized Image Synthesis Paper, CVPR 2023
- Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization Paper, CVPR 2023
- Not All Image Regions Matter: Masked Vector Quantization for Autoregressive Image Generation Paper, CVPR 2023
- SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs Paper, NeurIPS 2023
- HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes Paper, TMLR 2024
- Finite Scalar Quantization: VQ-VAE Made Simple Paper, ICLR 2024
- Planting a SEED of Vision in Large Language Model Paper, ICLR 2024
- Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation Paper, ICLR 2024
- Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis Paper, CVPR 2024
- Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction Paper, NeurIPS 2024
- An Image is Worth 32 Tokens for Reconstruction and Generation Paper, NeurIPS 2024
- Image Understanding Makes for A Good Tokenizer for Image Generation Paper, NeurIPS 2024
- Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99% Paper, Arxiv 2024
- Quantised Global Autoencoder: A Holistic Approach to Representing Visual Data Paper, Arxiv 2024
- VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation Paper, Arxiv 2024
- Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation Paper, Arxiv 2024
- MaskBit: Embedding-free Image Generation via Bit Tokens Paper, Arxiv 2024
- Image and Video Tokenization with Binary Spherical Quantization Paper, Arxiv 2024
- Cosmos Tokenizer: A suite of image and video neural tokenizers Website
- Adaptive Length Image Tokenization via Recurrent Allocation Paper, Arxiv 2024
- RandAR: Decoder-only Autoregressive Visual Generation in Random Orders Paper, Arxiv 2024
- Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective Paper, Arxiv 2024
- MUSE-VL: Modeling Unified VLM through Semantic Discrete Encoding Paper, Arxiv 2024
- TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation Paper, Arxiv 2024
- XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation Paper, Arxiv 2024
- ImageFolder: Autoregressive Image Generation with Folded Tokens 🚀 Paper, Arxiv 2024
- Taming Scalable Visual Tokenizer for Autoregressive Image Generation Paper, Arxiv 2024
- Language-Guided Image Tokenization for Generation Paper, Arxiv 2024
- Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation Paper, Arxiv 2024
- Scaling Image Tokenizers with Grouped Spherical Quantization Paper, Arxiv 2024
- Spectral Image Tokenizer Paper, Arxiv 2024
- Conditional Image Generation with PixelCNN Decoders Paper, NeurIPS 2016
- DiVAE : Photorealistic Images Synthesis with Denoising Diffusion Decoder Paper
- Vector Quantized Diffusion Model for Text-to-Image Synthesis Paper
- MaskGIT: Masked Generative Image Transformer Paper
- BEiT: BERT Pre-Training of Image Transformers Paper
- BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers Paper
- MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis Paper
- Sequential Modeling Enables Scalable Learning for Large Vision Models Paper, Arxiv 2023
- 4M: Massively Multimodal Masked Modeling Paper, NeurIPS 2023
- Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation Paper, Arxiv 2024
- ControlVAR: Exploring Controllable Visual Autoregressive Modeling Paper, Arxiv 2024
- Autoregressive Image Generation without Vector Quantization Paper, Arxiv 2024
- MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis Paper, Arxiv 2024
- ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation Paper, Arxiv 2024
- VAR-CLIP: Text-to-Image Generator with Visual Auto-Regressive Modeling Paper, Arxiv 2024
- Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining Paper, Arxiv 2024
- Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model Paper, Arxiv 2024
- Scalable Autoregressive Image Generation with Mamba Paper, Arxiv 2024
- Show-o: One Single Transformer to Unify Multimodal Understanding and Generation Paper, Arxiv 2024
- DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation Paper, Arxiv 2024
- Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens Paper, Arxiv 2024
- Customize Your Visual Autoregressive Recipe with Set Autoregressive Modeling Paper, Arxiv 2024
- M-VAR: Decoupled Scale-wise Autoregressive Modeling for High-Quality Image Generation Paper, Arxiv 2024
- MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling Paper, Arxiv 2024
- Randomized Autoregressive Visual Generation Paper, Arxiv 2024
- Elucidating the Design Space of Language Models for Image Generation Paper, Arxiv 2024
- Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment Paper, Arxiv 2024
- CART: Compositional Auto-Regressive Transformer for Image Generation Paper, Arxiv 2024
- CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient Paper, Arxiv 2024
- X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models Paper, Arxiv 2024
- JetFormer: An Autoregressive Generative Model of Raw Images and Text Paper, Arxiv 2024
- Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis Paper, Arxiv 2024
- Liquid: Language Models are Scalable Multi-modal Generators Paper, Arxiv 2024
- FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching Paper, Arxiv 2024