THUDM/CogView2

Official code repo for the paper "CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers".
Generate vivid images for Chinese / English text

CogView2 is a hierarchical transformer (6B-9B-9B parameters) for general-domain text-to-image generation. This implementation is based on the SwissArmyTransformer library (v0.2).

@article{ding2022cogview2,
  title={CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers},
  author={Ding, Ming and Zheng, Wendi and Hong, Wenyi and Tang, Jie},
  journal={arXiv preprint arXiv:2204.14217},
  year={2022}
}

Web Demo

  • Thanks to the Hugging Face team for integrating CogView2 into Hugging Face Spaces 🤗 using Gradio. Try out the web demo: Hugging Face Spaces

  • Thanks to the Replicate team for deploying a web demo! Try it at Replicate.

Getting Started

Setup

  • Hardware: Linux servers with Nvidia A100s are recommended, but you can also run the pretrained models with a smaller --max-inference-batch-size, or train smaller models, on less powerful GPUs.
  • Environment: install dependencies via pip install -r requirements.txt.
  • LocalAttention: Make sure you have CUDA installed and compile the local attention kernel.
git clone https://github.com/Sleepychord/Image-Local-Attention
cd Image-Local-Attention && python setup.py install

If you do not install this kernel, you can still run the first stage (20×20 tokens) of text-to-image generation via --only-first-stage.

Download

Our code automatically downloads the models into (or detects them under) the path defined by the environment variable SAT_HOME. You can also download them from here and place the folders (named coglm, cogview2-dsr and cogview2-itersr) under SAT_HOME.
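As a sketch of the expected layout (the SAT_HOME path below is a made-up example; the three folder names are the ones listed above):

```shell
# Example only: SAT_HOME can be any writable path (a temp dir here for illustration).
export SAT_HOME="${TMPDIR:-/tmp}/sat_home_demo"

# The three pretrained models live as sibling folders under SAT_HOME:
mkdir -p "$SAT_HOME/coglm" "$SAT_HOME/cogview2-dsr" "$SAT_HOME/cogview2-itersr"

ls "$SAT_HOME"
```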

Text-to-Image Generation

./text2image.sh --input-source input.txt

Arguments useful in inference are mainly:

  • --input-source [path or "interactive"]. The path of the input file; it can also be "interactive", which launches a CLI.
  • --output-path [path]. The folder containing the results.
  • --batch-size [int]. The number of samples generated per query.
  • --max-inference-batch-size [int]. Maximum batch size per forward. Reduce it if OOM.
  • --debug. Only save concatenated images for all generated samples, and name them by input text and date.
  • --with-id. When enabled, you must prefix each input with an id, e.g. 001\t一个漂亮的女孩 (a beautiful girl), where \t denotes TAB (NOT space). For each input, batch-size images are generated into a folder named after the id. Conflicts with --debug.
  • --device [int]. Running on which GPU.
  • --inverse-prompt. Sort the generated images by the perplexity of generating the original text from each image.
  • --only-first-stage. Only run the first stage (20×20 tokens), skipping the super-resolution stages.
  • --style. The style of the generated images, choices=['none', 'mainbody', 'photo', 'flat', 'comics', 'oil', 'sketch', 'isometric', 'chinese', 'watercolor']. The default style is mainbody, usually an isolated object on a white background.
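Putting a few of the flags above together, a hypothetical session might look like this (the prompts, paths and flag values are made up for illustration; the scripts are the ones shipped in this repo):

```shell
# Plain input file: one prompt per line.
printf 'a tiger playing guitar\n' > input.txt

# With --with-id, each line is id<TAB>prompt (a real TAB, not spaces).
printf '001\t一个漂亮的女孩\n' > input_with_id.txt   # prompt means "a beautiful girl"

# Hypothetical invocations (require the downloaded models and a GPU):
# ./text2image.sh --input-source input.txt --output-path samples --batch-size 4 --style photo
# ./text2image.sh --input-source input_with_id.txt --with-id --batch-size 8 --device 0
```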

We recommend setting the environment variable SAT_HOME to specify where the downloaded models are stored.

Chinese input usually gives much better results than English input.

Text-guided Completion

./text_guided_completion.sh --input-source input_comp.txt

The input format is text image_path h0 w0 h1 w1, where all separators are TABs (NOT spaces). The image at image_path is center-cropped to 480×480 pixels, and the square from (h0, w0) to (h1, w1) is masked; these coordinates range from 0 to 1. The model fills the square with the object described in text. Please use a square much larger than the desired region.
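A minimal sketch of building one line of input_comp.txt (the text, image path and coordinates are made-up examples; the six fields must be TAB-separated):

```shell
# Fields: text<TAB>image_path<TAB>h0<TAB>w0<TAB>h1<TAB>w1
# h/w coordinates are fractions in [0, 1] of the 480x480 center crop;
# the square (h0, w0)-(h1, w1) is masked and refilled from the text.
printf 'a red apple\t./photo.jpg\t0.25\t0.25\t0.75\t0.75\n' > input_comp.txt

# Hypothetical invocation (requires the downloaded models and a GPU):
# ./text_guided_completion.sh --input-source input_comp.txt
```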
[Figure: the text-guided completion pipeline]

Gallery

[Figure: more generated samples]
