TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets

This repository is an official implementation of the paper "TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets".

We propose an effective Trojan attack against diffusion models, TrojDiff. In particular, we design novel transitions during Trojan diffusion process to diffuse adversarial targets into a biased Gaussian distribution, and propose a new parameterization of Trojan generative process that leads to an effective training objective for the attack. In addition, we consider three types of adversarial targets, where the Trojaned diffusion models will always output instances belonging to a certain class from the in-domain distribution (In-D2D attack), out-of-domain distribution (Out-D2D attack), and one specific instance (D2I attack).

Environments

This code is implemented in PyTorch, and we have tested the code under the following environment settings:

python = 3.8.13
pytorch = 1.12.0
torchvision = 0.13.0

Train

CIFAR-10

In-D2D attack:

# using blend-based trigger
python main_attack.py --dataset cifar10 --config cifar10.yml --target_label 7 --ni --resume_training --gamma 0.6

# using patch-based trigger
python main_attack.py --dataset cifar10 --config cifar10.yml --target_label 7 --ni --resume_training --target_label 7 --gamma 0.1 --trigger_type patch --miu_path './images/white.png' --patch_size 3

Out-D2D attack or D2I attack:

Please replace 'main_attack.py' with 'main_attack_d2dout.py' or 'main_attack_d2i.py'.

CelebA

In-D2D attack:

# using blend-based trigger
python main_attack.py --dataset celeba --config celeba.yml --doc celeba --target_label 7 --ni --resume_training --gamma 0.6

# using patch-based trigger
python main_attack.py --dataset celeba --config celeba.yml --doc celeba --target_label 7 --ni --resume_training --gamma 0.0 --trigger_type patch --miu_path './images/white.png' --patch_size 6

Out-D2D attack or D2I attack:

Please replace 'main_attack.py' with 'main_attack_d2dout.py' or 'main_attack_d2i.py'.

Sample

CIFAR-10

In-D2D attack:

If generating images using Trojaned DDPMs,

# using blend-based trigger
python main_attack.py --dataset cifar10 --config cifar10.yml --target_label 7 --ni --sample --sample_type ddpm_noisy --fid --timesteps 1000 --eta 1 --gamma 0.6

# using patch-based trigger
python main_attack.py --dataset cifar10 --config cifar10.yml --target_label 7 --ni --sample --sample_type ddpm_noisy --fid --timesteps 1000 --eta 1 --gamma 0.1 --trigger_type patch --miu_path './images/white.png' --patch_size 3

If generating images using Trojaned DDIMs,

# using blend-based trigger
python main_attack.py --dataset cifar10 --config cifar10.yml --target_label 7 --ni --sample --fid --timesteps 100 --eta 0 --gamma 0.6 --skip_type 'quad'

# using patch-based trigger
python main_attack.py --dataset cifar10 --config cifar10.yml --target_label 7 --ni --sample --fid --timesteps 100 --eta 0 --gamma 0.1 --trigger_type patch --miu_path './images/white.png' --patch_size 3 --skip_type 'quad'

Out-D2D attack or D2I attack:

Please replace 'main_attack.py' with 'main_attack_d2dout.py' or 'main_attack_d2i.py'.

CelebA

In-D2D attack:

If generating images using Trojaned DDPMs,

# using blend-based trigger
python main_attack.py --dataset celeba --config celeba.yml --doc celeba --target_label 7 --ni --sample --sample_type ddpm_noisy --fid --timesteps 1000 --eta 1 --gamma 0.6

# using patch-based trigger
python main_attack.py --dataset celeba --config celeba.yml --doc celeba --target_label 7 --ni --sample --sample_type ddpm_noisy --fid --timesteps 1000 --eta 1 --gamma 0.1 --trigger_type patch --miu_path './images/white.png' --patch_size 6

If generating images using Trojaned DDIMs,

# using blend-based trigger
python main_attack.py --dataset celeba --config celeba.yml --doc celeba --target_label 7 --ni --sample --fid --timesteps 100 --eta 0 --gamma 0.6

# using patch-based trigger
python main_attack.py --dataset celeba --config celeba.yml --doc celeba --target_label 7 --ni --sample --fid --timesteps 100 --eta 0 --gamma 0.1 --trigger_type patch --miu_path './images/white.png' --patch_size 6

Out-D2D attack or D2I attack:

Please replace 'main_attack.py' with 'main_attack_d2dout.py' or 'main_attack_d2i.py'.

Evaluate

Benign Performance

FID:

Please refer to here for implementation. Note that the evaluation takes some time.

# on CIFAR-10
python evaluate.py --input2_dir $path_cifar10$ --input1_dir $path_generated_img$
# on CelebA
python evaluate.py --input2_dir $path_celeba$ --input1_dir $path_generated_img$

Precision, Recall:

Please refer to here for implementation.

# on CIFAR-10
python improved_precision_recall.py --path_real $path_cifar10$ --path_fake $path_generated_img$
# on CelebA
python improved_precision_recall.py --path_real $path_celeba$ --path_fake $path_generated_img$

Trojan Performance

Attack Precision:

Also refer to here for implementation.

# on CIFAR-10
python improved_precision_recall.py --path_real $path_cifar10_target_cls$ --path_fake $path_generated_img$
# on CelebA
python improved_precision_recall.py --path_real $path_celeba_target_cls$ --path_fake $path_generated_img$

ASR:

Please refer to here for implementation.

# on CIFAR-10
python eval.py --dataset cifar10 --data_dir $path_generated_img$
# on CelebA
python eval.py --dataset celeba --data_dir $path_generated_img$
# on MNIST
python eval.py --dataset mnist --data_dir $path_generated_img$

MSE:

python test_mse.py --data_dir $path_generated_img$

Performance

Numeric Results

Visualization Results

The code is based on source code from ICLR 2021 paper "Denoising Diffusion Implicit Models". Pre-trained diffusion models are downloaded from here. Please consider leaving 🌟 on their repositories.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
configs		configs
datasets		datasets
figures		figures
functions		functions
images		images
models		models
runners		runners
saved		saved
.DS_Store		.DS_Store
.gitattributes		.gitattributes
README.md		README.md
main.py		main.py
main_attack.py		main_attack.py
main_attack_d2dout.py		main_attack_d2dout.py
main_attack_d2i.py		main_attack_d2i.py
test_mse.py		test_mse.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets

Environments

Train

CIFAR-10

CelebA

Sample

CIFAR-10

CelebA

Evaluate

Benign Performance

Trojan Performance

Performance

Numeric Results

Visualization Results

About

Releases

Packages

Languages

chenweixin107/TrojDiff

Folders and files

Latest commit

History

Repository files navigation

TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets

Environments

Train

CIFAR-10

CelebA

Sample

CIFAR-10

CelebA

Evaluate

Benign Performance

Trojan Performance

Performance

Numeric Results

Visualization Results

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages