THUDM/CogView2

Official code repo for the paper "CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers".
Generate vivid images for Chinese / English text

CogView2 is a hierarchical transformer (6B-9B-9B parameters) for general-domain text-to-image generation. This implementation is based on the SwissArmyTransformer library (v0.2).

@article{ding2022cogview2,
  title={CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers},
  author={Ding, Ming and Zheng, Wendi and Hong, Wenyi and Tang, Jie},
  journal={arXiv preprint arXiv:2204.14217},
  year={2022}
}

Web Demo

  • Thanks to the Hugging Face team for integrating CogView2 into Hugging Face Spaces 🤗 using Gradio. Try out the web demo: Hugging Face Spaces

  • Thanks to the Replicate team for deploying a web demo! Try it at Replicate.

Getting Started

Setup

  • Hardware: Linux servers with Nvidia A100s are recommended, but you can also run the pretrained models with a smaller --max-inference-batch-size, or train smaller models, on less powerful GPUs.
  • Environment: install dependencies via pip install -r requirements.txt.
  • LocalAttention: Make sure you have CUDA installed and compile the local attention kernel.
git clone https://github.com/Sleepychord/Image-Local-Attention
cd Image-Local-Attention && python setup.py install

If you do not install this kernel, you can still run the first stage (20×20 tokens) of text-to-image generation via --only-first-stage.

Download

Our code automatically downloads the models into (or detects them under) the path defined by the environment variable SAT_HOME. You can also download them from here and place the folders (named coglm, cogview2-dsr and cogview2-itersr) under SAT_HOME.
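As a sketch of the expected layout (the SAT_HOME path below is a made-up example; the three folder names are the ones listed above):

```shell
# Example only: SAT_HOME can be any writable path (a temp dir here for illustration).
export SAT_HOME="${TMPDIR:-/tmp}/sat_home_demo"

# The three pretrained models live as sibling folders under SAT_HOME:
mkdir -p "$SAT_HOME/coglm" "$SAT_HOME/cogview2-dsr" "$SAT_HOME/cogview2-itersr"

ls "$SAT_HOME"
```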

Text-to-Image Generation

./text2image.sh --input-source input.txt

Arguments useful in inference are mainly:

  • --input-source [path or "interactive"]. The path of the input file; it can also be "interactive", which launches a CLI.
  • --output-path [path]. The folder containing the results.
  • --batch-size [int]. The number of samples generated per query.
  • --max-inference-batch-size [int]. Maximum batch size per forward. Reduce it if OOM.
  • --debug. Only save concatenated images for all generated samples, and name them by input text and date.
  • --with-id. When enabled, you must prefix each input with an id, e.g. 001\t一个漂亮的女孩 (a beautiful girl), where \t denotes TAB (NOT space). For each input, batch-size images are generated into a folder named after the id. Conflicts with --debug.
  • --device [int]. Running on which GPU.
  • --inverse-prompt. Sort the generated images by the perplexity of generating the original text from each image.
  • --only-first-stage. Only run the first stage (20×20 tokens), skipping the super-resolution stages.
  • --style. The style of the generated images, choices=['none', 'mainbody', 'photo', 'flat', 'comics', 'oil', 'sketch', 'isometric', 'chinese', 'watercolor']. The default style is mainbody, usually an isolated object on a white background.
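Putting a few of the flags above together, a hypothetical session might look like this (the prompts, paths and flag values are made up for illustration; the scripts are the ones shipped in this repo):

```shell
# Plain input file: one prompt per line.
printf 'a tiger playing guitar\n' > input.txt

# With --with-id, each line is id<TAB>prompt (a real TAB, not spaces).
printf '001\t一个漂亮的女孩\n' > input_with_id.txt   # prompt means "a beautiful girl"

# Hypothetical invocations (require the downloaded models and a GPU):
# ./text2image.sh --input-source input.txt --output-path samples --batch-size 4 --style photo
# ./text2image.sh --input-source input_with_id.txt --with-id --batch-size 8 --device 0
```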

We recommend setting the environment variable SAT_HOME to specify where the downloaded models are stored.

Chinese input usually gives much better results than English input.

Text-guided Completion

./text_guided_completion.sh --input-source input_comp.txt

The input format is text image_path h0 w0 h1 w1, where all separators are TABs (NOT spaces). The image at image_path is center-cropped to 480×480 pixels, and the square from (h0, w0) to (h1, w1) is masked; these coordinates range from 0 to 1. The model fills the square with the object described in text. Please use a square much larger than the desired region.
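A minimal sketch of building one line of input_comp.txt (the text, image path and coordinates are made-up examples; the six fields must be TAB-separated):

```shell
# Fields: text<TAB>image_path<TAB>h0<TAB>w0<TAB>h1<TAB>w1
# h/w coordinates are fractions in [0, 1] of the 480x480 center crop;
# the square (h0, w0)-(h1, w1) is masked and refilled from the text.
printf 'a red apple\t./photo.jpg\t0.25\t0.25\t0.75\t0.75\n' > input_comp.txt

# Hypothetical invocation (requires the downloaded models and a GPU):
# ./text_guided_completion.sh --input-source input_comp.txt
```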
[Figure: the text-guided completion pipeline]

Gallery

[Figure: more generated samples]
