
🚀 Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models

Our approach can easily be combined with various diffusion-model-based tasks 🧠 (such as text-to-image, personalized generation, and video generation) and various sampling strategies (such as 50-step DDIM and 20-step DPM-Solver) to achieve training-free acceleration.

📋 TODO List

📘 Introduction

Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models (NeurIPS 2024)

Senmao Li*, Taihang Hu*, Fahad Shahbaz Khan, Linxuan Li, Shiqi Yang, Yaxing Wang, Ming-Ming Cheng, Jian Yang

arXiv Paper: https://arxiv.org/abs/2312.09608

*Denotes equal contribution.

We propose FasterDiffusion, a training-free diffusion model acceleration scheme that can be widely integrated with various generative tasks and sampling strategies. Quantitative metrics such as FID and CLIP Score, as well as user studies, all indicate that our approach is on par with the original models in terms of generated-image quality. Specifically, we observe that the internal features of the UNet encoder are highly similar at adjacent time steps of the diffusion process. Consequently, encoder features from previous time steps can be reused at certain time steps to reduce the computational load. We propose a feature propagation scheme for accelerated generation; this feature propagation also makes certain time steps independent of one another, allowing us to further leverage GPU acceleration through a parallel strategy. Additionally, we introduce a prior noise injection method to improve the texture details of generated images.
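
To make this concrete, below is a minimal, self-contained sketch of the encoder-propagation idea (our own toy illustration, not the repo's actual API; `toy_encoder`, `toy_decoder`, and the scheduler update are stand-ins): the full encoder runs only at a subset of key time steps, while the other steps reuse the cached encoder features and only the decoder runs every step.

    import torch

    def toy_encoder(x, t):
        # stand-in for the UNet encoder: produces the (skip) features at step t
        return 0.9 * x + 1e-3 * t

    def toy_decoder(feats, x, t):
        # stand-in for the UNet decoder: predicts noise from encoder features
        return feats - 0.1 * x

    timesteps = list(range(50, 0, -1))
    key_steps = set(timesteps[::5])        # illustrative choice of key time steps
    x = torch.randn(1, 4, 64, 64)          # latent
    cached = None
    for t in timesteps:
        if t in key_steps or cached is None:
            cached = toy_encoder(x, t)     # full encoder pass only at key steps
        eps = toy_decoder(cached, x, t)    # decoder runs every step, reusing cached features
        x = x - 0.02 * eps                 # toy scheduler update (e.g. a DDIM-like step)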

Our method is not only suitable for standard text-to-image generation (~1.8x acceleration for Stable Diffusion and ~1.3x acceleration for DeepFloyd-IF) but can also be applied to diverse tasks such as text-to-video (~1.5x acceleration for VideoFusion), personalized generation (~1.8x acceleration for DreamBooth and Custom Diffusion), and reference-guided generation (~2.1x acceleration for ControlNet), among others.


Method Overview. For more details, please see our paper.

🔧 Quick Start

  • Create environment:

    conda create -n fastersd python=3.9
    conda activate fastersd
    pip install -r requirements.txt
  • Execute:

    # if using `stable diffusion`
    python sd_demo.py

    # if using `deepfloyd if`
    python if_demo.py

    # if using `controlnet` (Canny condition)
    python controlnet_demo.py

    sd_demo.py output

    Origin Pipeline: 2.369 seconds
    Faster Diffusion: 1.407 seconds

    controlnet_demo.py output

    Origin Pipeline: 3.256 seconds
    Faster Diffusion: 1.537 seconds

    All of our experiments were conducted on an A40 GPU (48 GB of VRAM).
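
For reference, wall-clock comparisons like these can be reproduced by averaging a few timed runs after a warm-up call; the helper below is our own sketch (the `time_fn` name and its defaults are assumptions, not part of the repo):

    import time

    import torch

    def time_fn(fn, n_warmup=1, n_runs=3):
        # average wall-clock seconds per call, synchronizing CUDA around the timed region
        for _ in range(n_warmup):
            fn()
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(n_runs):
            fn()
        torch.cuda.synchronize()
        return (time.time() - start) / n_runs

    # e.g. time_fn(lambda: pipe("a cat wearing sunglasses")) with the pipeline built below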

For Stable Diffusion:

    from diffusers import StableDiffusionPipeline
    import torch
    from utils_sd import register_normal_pipeline, register_faster_forward, register_parallel_pipeline, seed_everything  # 1. import packages

    seed_everything(2023)
    model_id = "runwayml/stable-diffusion-v1-5"
    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipe = pipe.to("cuda")

    # ------------------------------
    # 2. enable the parallel strategy. If memory is limited, replace this with `register_normal_pipeline(pipe)`
    register_parallel_pipeline(pipe)
    # 3. enable encoder propagation
    register_faster_forward(pipe.unet)
    # ------------------------------
    prompt = "a cat wearing sunglasses"
    image = pipe.call(prompt).images[0]

    image.save("cat.png")
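
Continuing from the snippet above, the sampler can also be swapped, for example to 20-step DPM-Solver++ via the standard diffusers scheduler API; this is a sketch that assumes the patched `pipe.call` forwards `num_inference_steps` like the original pipeline:

    from diffusers import DPMSolverMultistepScheduler

    # replace the default scheduler with DPM-Solver++ and sample with 20 steps
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
    image = pipe.call(prompt, num_inference_steps=20).images[0]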

✨ Qualitative results

Text to Image

~1.8x acceleration for Stable Diffusion, 50 DDIM steps


~1.8x acceleration for Stable Diffusion, 20 DPM-Solver++ steps


~1.3x acceleration for DeepFloyd-IF


Text to Video

~1.4x acceleration for Text2Video-Zero



~1.5x acceleration for VideoFusion, original video (left) and ours (right)

ControlNet

~2.1x acceleration for ControlNet


Personalized Generation

~1.8x acceleration for DreamBooth and Custom Diffusion


Other tasks based on Diffusion Model


Our method can also be integrated with other tasks, such as image editing (P2P) and ReVersion.

📈 Quantitative results

Quantitative comparison between the original models (origin) and ours; see our paper for full results.

Citation

@misc{li2023faster,
      title={Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models}, 
      author={Senmao Li and Taihang Hu and Fahad Shahbaz Khan and Linxuan Li and Shiqi Yang and Yaxing Wang and Ming-Ming Cheng and Jian Yang},
      year={2023},
      eprint={2312.09608},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
