🚀 [NeurIPS 2024] Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models

Our approach can easily be combined with various diffusion-model-based tasks 🧠 (such as text-to-image, personalized generation, and video generation) and various sampling strategies (such as DDIM with 50 steps or DPM-Solver with 20 steps) to achieve training-free acceleration.

🔥Stellar Features

🎯 Training-free acceleration, plug-and-play;
🎯 Supports popular text-to-image models such as Stable Diffusion and DeepFloyd-IF, Civitai community models such as Realistic Vision V6.0 and ReV Animated, as well as ControlNet;
🎯 Compatible with various schedulers and step counts, such as DDIM (50 steps), DPM-Solver++ (20 steps), and more.

📋 TODO List

Release code that combines our method with Stable Diffusion;
Release code that combines our method with DeepFloyd-IF;
Release code that combines our method with ControlNet (we released code that supports the canny condition; for other conditions, you can modify the code in the same way);
Release code that combines our method with custom community models, such as Realistic Vision V6.0 and ReV Animated. Please see the demo code and images here;
Release code that combines our method with Text2Video-Zero and VideoFusion.

📘 Introduction

Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models

Senmao Li*, Taihang Hu*, Fahad Khan, Linxuan Li, Shiqi Yang, Yaxing Wang, Ming-Ming Cheng, Jian Yang

📚arXiv 🌈Project Page

*Denotes equal contribution.

We propose FasterDiffusion, a training-free diffusion model acceleration scheme that can be widely integrated with various generative tasks and sampling strategies. Quantitative metrics such as FID and CLIPScore, together with user studies, all indicate that our approach is on par with the original model in terms of generated-image quality. Specifically, we observe that the internal features of the UNet encoder are similar at adjacent time steps of the diffusion model. Consequently, encoder features computed at a previous time step can be reused at certain later time steps to reduce the computational load. We propose a feature propagation scheme for accelerated generation. Because this propagation makes the computation at certain time steps independent of one another, those steps can be run in parallel to further exploit the GPU. Additionally, we introduce a prior noise injection method to improve the texture details of generated images.

Our method is suitable not only for standard text-to-image tasks (~1.8x acceleration for Stable Diffusion and ~1.3x acceleration for DeepFloyd-IF) but also for diverse tasks such as text-to-video (~1.5x acceleration for VideoFusion), personalized generation (~1.8x acceleration for DreamBooth and Custom Diffusion), and reference-guided generation (~2.1x acceleration for ControlNet), among others.

Method Overview. For more details, please see our paper.
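The encoder feature reuse described above can be illustrated with a toy sampling loop. This is only a minimal sketch under stated assumptions, not the implementation in this repository: the function sample_with_encoder_propagation, the toy encoder and decoder, and the update rule are all hypothetical stand-ins chosen so the example runs without a GPU or a real UNet. The encoder is called only at key time steps; at every other step the decoder consumes the cached encoder features.

```python
from typing import Callable, Optional, Sequence, Set, Tuple


def sample_with_encoder_propagation(
    x: float,
    timesteps: Sequence[int],
    key_timesteps: Set[int],
    encoder: Callable[[float, int], float],
    decoder: Callable[[float, int], float],
    step_size: float = 0.01,
) -> Tuple[float, int]:
    """Toy denoising loop that runs the encoder only at key time steps.

    Returns the final sample and the number of encoder calls made.
    """
    cached_features: Optional[float] = None
    encoder_calls = 0
    for t in timesteps:
        # Recompute encoder features at key steps (and at the very first step).
        if t in key_timesteps or cached_features is None:
            cached_features = encoder(x, t)
            encoder_calls += 1
        # The decoder always runs, fed by the most recent encoder features.
        noise_pred = decoder(cached_features, t)
        x = x - step_size * noise_pred  # hypothetical update rule
    return x, encoder_calls


# Hypothetical stand-ins for the two halves of a UNet.
def toy_encoder(x: float, t: int) -> float:
    return 0.5 * x


def toy_decoder(features: float, t: int) -> float:
    return features * (t / 50)


timesteps = list(range(50, 0, -1))   # 50 sampling steps
key_timesteps = set(timesteps[::5])  # every 5th step is a key step
x_final, calls = sample_with_encoder_propagation(
    1.0, timesteps, key_timesteps, toy_encoder, toy_decoder
)
print(f"encoder calls: {calls} of {len(timesteps)} steps")
```

In this toy setting the encoder runs 10 times across 50 steps while the decoder runs at every step, which is where the saving comes from. The parallel strategy and prior noise injection are not modeled here.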

🔧 Quick Start

Create the environment:

conda create -n fastersd python=3.9
conda activate fastersd
pip install -r requirements.txt

Execute

# if using `stable diffusion`
python sd_demo.py

# if using `deepfloyd if`
python if_demo.py

# if using ControlNet (canny condition)
python controlnet_demo.py

sd_demo.py output

Origin Pipeline: 2.524 seconds
Faster Diffusion: 1.476 seconds

controlnet_demo.py output

Origin Pipeline: 3.264 seconds
Faster Diffusion: 1.526 seconds

The timings above were measured on an RTX 3090 GPU.
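As a quick sanity check, the timings printed above can be converted into speedup factors. The snippet below is a small illustrative helper (the function name speedup is our own and is not part of the repository); it uses only the four numbers shown in the demo output.

```python
def speedup(baseline_seconds: float, accelerated_seconds: float) -> float:
    """Return how many times faster the accelerated run is than the baseline."""
    if accelerated_seconds <= 0:
        raise ValueError("accelerated_seconds must be positive")
    return baseline_seconds / accelerated_seconds


# (Origin Pipeline, Faster Diffusion) timings copied from the demo output above.
timings = {
    "sd_demo.py": (2.524, 1.476),
    "controlnet_demo.py": (3.264, 1.526),
}

for name, (baseline, accelerated) in timings.items():
    print(f"{name}: {speedup(baseline, accelerated):.2f}x")
# sd_demo.py: 1.71x
# controlnet_demo.py: 2.14x
```

These are single-run figures from one GPU, so some variation across hardware and runs is to be expected.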

Usage

Our method integrates easily with the diffusers library. Below is an example using Stable Diffusion v1.5.

For Stable Diffusion

from diffusers import StableDiffusionPipeline
import torch
from utils_sd import register_normal_pipeline, register_faster_forward, register_parallel_pipeline, seed_everything  # 1. import the helper functions

seed_everything(2023)
model_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

#------------------------------
# 2. enable parallel sampling. If GPU memory is limited, use `register_normal_pipeline(pipe)` instead
register_parallel_pipeline(pipe, mod = '50ls')
# 3. enable encoder propagation on the UNet
register_faster_forward(pipe.unet, mod = '50ls')
#------------------------------
prompt = "a cat wearing sunglasses"
image = pipe.call(prompt).images[0]  
  
image.save("cat.png")

When the hyperparameter mod is set to '50ls', the key time steps follow the setting described in our paper. When mod is set to an integer, such as 4, the key time steps are placed uniformly at a 1:4 ratio. For Civitai community models, we recommend a uniform mod of 4.
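To make the uniform setting concrete, here is a minimal sketch of one way to read the 1:4 ratio: every mod-th sampling step is treated as a key step. This interpretation and the helper uniform_key_steps are assumptions made for illustration; utils_sd.py remains the authoritative definition, and the '50ls' schedule from the paper is not reproduced here.

```python
from typing import List


def uniform_key_steps(num_steps: int, mod: int) -> List[int]:
    """Return the step indices treated as key steps when every mod-th step is key.

    Assumption: step 0 (the first sampling step) is always a key step.
    """
    if num_steps < 1 or mod < 1:
        raise ValueError("num_steps and mod must be positive integers")
    return list(range(0, num_steps, mod))


key_steps = uniform_key_steps(num_steps=50, mod=4)
print(key_steps[:5])   # first few key step indices
print(len(key_steps))  # number of key steps out of 50
```

Under this reading, a larger mod means fewer key steps and therefore fewer encoder evaluations, at the cost of reusing older encoder features for longer.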

✨ Qualitative results

Text to Image

~1.8x acceleration for Stable Diffusion, 50 DDIM steps

~1.8x acceleration for Stable Diffusion, 20 DPM-Solver++ steps

~1.3x acceleration for DeepFloyd-IF

Text to Video

~1.4x acceleration for Text2Video-Zero

~1.5x acceleration for VideoFusion, original video (left) and ours (right)

ControlNet

~2.1x acceleration for ControlNet

Personalized Generation

~1.8x acceleration for DreamBooth and Custom Diffusion

Other tasks based on Diffusion Model

Our method can also be integrated with other tasks, such as image editing (P2P) and ReVersion.

📈 Quantitative results

Citation

@misc{li2023faster,
      title={Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models}, 
      author={Senmao Li and Taihang Hu and Fahad Shahbaz Khan and Linxuan Li and Shiqi Yang and Yaxing Wang and Ming-Ming Cheng and Jian Yang},
      year={2023},
      eprint={2312.09608},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
