
(NeurIPS 2024 Oral 🔥) Improved Distribution Matching Distillation for Fast Image Synthesis

Python 604 31 Updated Sep 27, 2024

A family of versatile, state-of-the-art video tokenizers.

Python 311 19 Updated Jan 4, 2025

NOVA: Autoregressive Video Generation without Vector Quantization

Python 288 8 Updated Jan 3, 2025

A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.

Python 1,688 65 Updated Jan 2, 2025

A generative world for general-purpose robotics & embodied AI learning.

Python 21,684 1,724 Updated Jan 6, 2025

Liquid: Language Models are Scalable Multi-modal Generators

55 Updated Dec 12, 2024

[ECCV 2024] Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting

Python 945 58 Updated Dec 31, 2024

🔥 Official implementation of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".

Python 211 1 Updated Dec 28, 2024

Infinity ∞ : Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

Python 723 20 Updated Dec 30, 2024

The code and models for the paper: Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis

Jupyter Notebook 147 13 Updated Dec 29, 2024

XQ-GAN🚀: An Open-source Image Tokenization Framework for Autoregressive Generation

Python 168 Updated Dec 10, 2024

CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient

Python 75 1 Updated Nov 28, 2024

ElasticTok: Adaptive Tokenization for Image and Video

Python 42 Updated Nov 4, 2024

A collection of papers on autoregressive models in vision.

343 12 Updated Dec 27, 2024

DIAMOND (DIffusion As a Model Of eNvironment Dreams) is a reinforcement learning agent trained in a diffusion world model. NeurIPS 2024 Spotlight.

Python 1,671 114 Updated Dec 6, 2024

Movie Gen Bench - two media generation evaluation benchmarks released with Meta Movie Gen

357 21 Updated Dec 18, 2024

HART: Efficient Visual Generation with Hybrid Autoregressive Transformer

Python 398 19 Updated Oct 16, 2024

[NeurIPS 2024 Spotlight] Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models

Python 572 54 Updated Jan 3, 2025

Official PyTorch implementation of the CVPR 2023 paper "Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization"

Python 163 6 Updated Jul 23, 2023

Official inference repo for FLUX.1 models

Python 19,122 1,351 Updated Dec 31, 2024

This is the official implementation for ControlVAR.

Python 79 3 Updated Dec 10, 2024

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)

Python 37,298 4,598 Updated Jan 4, 2025

Tarsier: a family of large-scale video-language models designed to generate high-quality video descriptions, with good general video understanding capability.

Python 171 10 Updated Dec 25, 2024

Implements VAR+CLIP for text-to-image (T2I) generation

Python 106 2 Updated Dec 30, 2024

🔥 Stable, simple, state-of-the-art VQVAE toolkit & cookbook

Python 69 5 Updated Jun 23, 2024

[NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization.

Python 279 7 Updated Jul 9, 2024

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Python 1,438 57 Updated Aug 15, 2024

Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions"

Python 377 10 Updated Sep 2, 2024

[ECCV 2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization

Python 591 61 Updated Jun 7, 2024