T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation

Kaiyi Huang1, Kaiyue Sun1, Enze Xie2, Zhenguo Li2, and Xihui Liu1.

1The University of Hong Kong, 2Huawei Noah’s Ark Lab

Installing the dependencies

Before running the scripts, make sure to install the library's training dependencies:

Important

To make sure you can successfully run the latest versions of the example scripts, we highly recommend installing from source and keeping the install up to date as we update the example scripts frequently and install some example-specific requirements. To do this, execute the following steps in a new virtual environment:

git clone https://github.com/huggingface/diffusers
cd diffusers
pip install .

Then cd into the example folder and run

pip install -r requirements.txt

And initialize an 🤗Accelerate environment with:

accelerate config

Finetuning

  1. LoRA finetuning

We use the LoRA finetuning method. Please refer to the following link to download the "lora_diffusion" directory:

https://github.com/cloneofsimo/lora/tree/master
  2. Example usage
export project_dir=/T2I-CompBench
cd $project_dir

export train_data_dir="examples/samples/"
export output_dir="examples/output/"
export reward_root="examples/reward/"
export dataset_root="examples/dataset/color.txt"
export script=GORS_finetune/train_text_to_image.py

accelerate launch --multi_gpu --mixed_precision=fp16 \
--num_processes=8 --num_machines=1 \
--dynamo_backend=no "${script}" \
--train_data_dir="${train_data_dir}" \
--output_dir="${output_dir}" \
--reward_root="${reward_root}" \
--dataset_root="${dataset_root}"

or run

cd T2I-CompBench
bash GORS_finetune/train.sh

The image directory should contain the training images, e.g.,

examples/samples/
        ├── a green bench and a blue bowl_000000.png
        ├── a green bench and a blue bowl_000001.png
        └──...

The reward directory should include a JSON file named "vqa_result.json". The JSON file should contain a list of {"question_id", "answer"} entries, e.g.,

[{"question_id": 0, "answer": "0.7110"},
 {"question_id": 1, "answer": "0.7110"},
 ...]
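As a minimal sketch, the reward file can be produced from a list of per-image scores like this (the assumption that each question_id simply indexes one image in a fixed order is not specified in this README):

import json

# Minimal sketch: write the reward file in the format shown above.
# Assumption: each question_id indexes one image in a fixed order.
scores = [0.7110, 0.7110]  # per-image reward scores, e.g. produced by BLIP-VQA

result = [{"question_id": i, "answer": f"{s:.4f}"} for i, s in enumerate(scores)]
with open("examples/reward/vqa_result.json", "w") as f:
    json.dump(result, f)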

The dataset should be placed in the directory "examples/dataset/".
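Judging from the dataset path used above ("examples/dataset/color.txt") and the sample image names, the dataset file is assumed here to be a plain-text prompt list, one prompt per line, e.g.,

a green bench and a blue bowl
...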

Evaluation

  1. Install the requirements

The MiniGPT-4 evaluation is built on the following repository; please refer to it for environment dependencies and weights:

https://github.com/Vision-CAIR/MiniGPT-4
  2. Example usage

For evaluation, the input image files are stored in the directory "examples/samples/", in the same format as the training data.

BLIP-VQA:

export project_dir="BLIPvqa_eval/"
cd $project_dir
out_dir="examples/"
python BLIP_vqa.py --out_dir=$out_dir

or run

cd T2I-CompBench
bash BLIPvqa_eval/test.sh

The output is a JSON file named "vqa_result.json" in the "examples/annotation_blip/" directory.
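A minimal sketch for loading the result and reporting the mean BLIP-VQA score (assuming the output follows the same {"question_id", "answer"} format shown above):

import json

# Minimal sketch: load the BLIP-VQA output and report the average score.
with open("examples/annotation_blip/vqa_result.json") as f:
    results = json.load(f)

scores = [float(r["answer"]) for r in results]
print(f"BLIP-VQA mean over {len(scores)} images: {sum(scores) / len(scores):.4f}")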

UniDet:

Download the weight and put it under the repo directory experts/expert_weights:

mkdir -p UniDet_eval/experts/expert_weights
cd UniDet_eval/experts/expert_weights
wget https://huggingface.co/shikunl/prismer/resolve/main/expert_weights/Unified_learned_OCIM_RS200_6x%2B2x.pth
export project_dir=UniDet_eval
cd $project_dir

python determine_position_for_eval.py --outpath=../examples/
# Add --simple_structure if caption/prompt has a 'simple' structure

The output is a JSON file named "vqa_result.json" in the "examples/labels/annotation_obj_detection" directory.

If no positional information is found in the caption, the score will be -1.
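A minimal sketch for averaging the UniDet scores while skipping the -1 entries (the same result format as above is assumed):

import json

# Minimal sketch: average UniDet scores, skipping captions without positional information (-1).
with open("examples/labels/annotation_obj_detection/vqa_result.json") as f:
    results = json.load(f)

scores = [float(r["answer"]) for r in results if float(r["answer"]) >= 0]
print(f"UniDet mean over {len(scores)} scored images: {sum(scores) / len(scores):.4f}")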

CLIPScore:

outpath="examples/"
python CLIPScore_eval/CLIP_similarity.py --outpath=${outpath}

or run

cd T2I-CompBench
bash CLIPScore_eval/test.sh

The output is a JSON file named "vqa_result.json" in the "examples/annotation_clip" directory.

3-in-1:

export project_dir="3_in_1_eval/"
cd $project_dir
outpath="examples/"
data_path="examples/dataset/"
python "3_in_1.py" --outpath=${outpath} --data_path=${data_path}

The output is a JSON file named "vqa_result.json" in the "examples/annotation_3_in_1" directory.

MiniGPT4-CoT:

If the category to be evaluated is one of color, shape and texture:

export project_dir=MiniGPT4_CoT_eval
cd $project_dir
category="color"
img_file="examples/samples/"
output_path="examples/"
python mGPT_cot_attribute.py --category=${category} --img_file=${img_file} --output_path=${output_path} 

If the category to be evaluated is one of spatial, non-spatial and complex:

export project_dir=MiniGPT4_CoT_eval/
cd $project_dir
category="non-spatial"
img_file="examples/samples/"
output_path="examples"
python mGPT_cot_general.py --category=${category} --img_file=${img_file} --output_path=${output_path} 

The output is a CSV file named "mGPT_cot_output.csv" in the output_path directory.
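The column layout of this CSV is not documented here, so the following minimal sketch only inspects the file (the path assumes output_path="examples"):

import csv

# Minimal sketch: inspect the MiniGPT4-CoT output CSV (column layout is not documented here).
with open("examples/mGPT_cot_output.csv", newline="") as f:
    rows = list(csv.reader(f))

print("columns:", rows[0])
print("evaluated rows:", len(rows) - 1)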

Inference

Run inference.py:

export pretrained_model_path="checkpoint/color/lora_weight.pt"
export prompt="A bathroom with green tile and a red shower curtain"
python inference.py --pretrained_model_path "${pretrained_model_path}" --prompt "${prompt}"
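For reference, a minimal diffusers sketch of the generation step; the base model name is an assumption, and the patching of the model with "lora_weight.pt" (handled inside inference.py via the lora_diffusion utilities) is omitted here:

import torch
from diffusers import StableDiffusionPipeline

# Minimal sketch of the generation step only.
# Assumption: Stable Diffusion v1-5 as the base model; the LoRA patching with
# "lora_weight.pt" (done by inference.py via lora_diffusion) is omitted here.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "A bathroom with green tile and a red shower curtain"
image = pipe(prompt).images[0]
image.save("example_output.png")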

Citation

If you're using T2I-CompBench in your research or applications, please cite using this BibTeX:

@article{huang2023t2icompbench,
      title={T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation}, 
      author={Kaiyi Huang and Kaiyue Sun and Enze Xie and Zhenguo Li and Xihui Liu},
      journal={arXiv preprint arXiv:2307.06350},
      year={2023},
}

License

This project is licensed under the MIT License. See the "License.txt" file for details.
