first commit

xingjianleng · May 11, 2021 · commit aca3097

Showing 26 changed files with 5,065 additions and 0 deletions.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,3 @@
+.DS_Store
+__pycache__/
+
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2021 OpenAI
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/README.md b/README.md
@@ -0,0 +1,36 @@
+# guided-diffusion
+
+This is the codebase for [Diffusion Models Beat GANs on Image Synthesis](openai.com).
+
+This repository is based on [openai/improved-diffusion](https://github.com/openai/improved-diffusion), with modifications for classifier conditioning and architecture improvements.
+
+# Usage
+
+Training diffusion models is described in the [parent repository](https://github.com/openai/improved-diffusion). Training a classifier is similar. We assume you have put training hyperparameters into a `TRAIN_FLAGS` variable, and classifier hyperparameters into a `CLASSIFIER_FLAGS` variable. Then you can run:
+
+```sh
+mpiexec -n N python scripts/classifier_train.py --data_dir path/to/imagenet $TRAIN_FLAGS $CLASSIFIER_FLAGS
+```
+
+Make sure to divide the batch size in `TRAIN_FLAGS` by the number of MPI processes you are using.
+
+Here are flags for training the 128x128 classifier. You can modify these for training classifiers at other resolutions:
+
+```sh
+TRAIN_FLAGS="--iterations 300000 --anneal_lr True --batch_size 256 --lr 3e-4 --save_interval 10000 --weight_decay 0.05"
+CLASSIFIER_FLAGS="--image_size 128 --classifier_attention_resolutions 32,16,8 --classifier_depth 2 --classifier_width 128 --classifier_pool attention --classifier_resblock_updown True --classifier_use_scale_shift_norm True"
+```
+
+To sample from a 128x128 classifier-guided model with 25-step DDIM:
+
+```sh
+MODEL_FLAGS="--attention_resolutions 32,16,8 --class_cond True --image_size 128 --learn_sigma True --num_channels 256 --num_heads 4 --num_res_blocks 2 --resblock_updown True --use_fp16 True --use_scale_shift_norm True"
+CLASSIFIER_FLAGS="--image_size 128 --classifier_attention_resolutions 32,16,8 --classifier_depth 2 --classifier_width 128 --classifier_pool attention --classifier_resblock_updown True --classifier_use_scale_shift_norm True --classifier_scale 1.0 --classifier_use_fp16 True"
+SAMPLE_FLAGS="--batch_size 4 --num_samples 50000 --timestep_respacing ddim25 --use_ddim True"
+mpiexec -n N python scripts/classifier_sample.py \
+    --model_path /path/to/model.pt \
+    --classifier_path path/to/classifier.pt \
+    $MODEL_FLAGS $CLASSIFIER_FLAGS $SAMPLE_FLAGS
+```
+
+To sample for 250 timesteps without DDIM, replace `--timestep_respacing ddim25` with `--timestep_respacing 250`, and replace `--use_ddim True` with `--use_ddim False`.
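The README's note about dividing the batch size by the number of MPI processes can be sketched as follows. This assumes the `--batch_size 256` shown in `TRAIN_FLAGS` is the intended global batch size and that the flag is interpreted per process; the helper name `per_process_batch_size` is hypothetical and not part of this repository.

```python
def per_process_batch_size(global_batch_size: int, num_processes: int) -> int:
    """Split a global batch size evenly across MPI processes.

    Hypothetical helper for illustration; not part of this repository.
    """
    if num_processes <= 0:
        raise ValueError("num_processes must be positive")
    if global_batch_size % num_processes != 0:
        raise ValueError(
            "global batch size must be divisible by the number of processes"
        )
    return global_batch_size // num_processes


# Global batch size 256 (from TRAIN_FLAGS) spread over 8 MPI processes.
print(per_process_batch_size(256, 8))  # 32
```

Under these assumptions, a run with `mpiexec -n 8` would pass `--batch_size 32` rather than `--batch_size 256`.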
diff --git a/datasets/README.md b/datasets/README.md
@@ -0,0 +1,27 @@
+# Downloading datasets
+
+This directory includes instructions and scripts for downloading ImageNet and LSUN bedrooms for use in this codebase.
+
+## Class-conditional ImageNet
+
+For our class-conditional models, we use the official ILSVRC2012 dataset with manual center cropping and downsampling. To obtain this dataset, navigate to [this page on image-net.org](http://www.image-net.org/challenges/LSVRC/2012/downloads) and sign in (or create an account if you do not already have one). Then click on the link reading "Training images (Task 1 & 2)". This is a 138GB tar file containing 1000 sub-tar files, one per class.
+
+Once the file is downloaded, extract it and look inside. You should see 1000 `.tar` files. You need to extract each of these, which may be impractical to do by hand on your operating system. To automate the process on a Unix-based system, you can `cd` into the directory and run this short shell script:
+
+```sh
+for file in *.tar; do tar xf "$file"; rm "$file"; done
+```
+
+This will extract and remove each tar file in turn.
+
+Once all of the images have been extracted, the resulting directory should be usable as a data directory (the `--data_dir` argument for the training script). The filenames should all start with a WNID (the class ID) followed by an underscore, like `n01440764_2708.JPEG`. Conveniently (but not by accident), this is how the automated data loader expects to discover class labels.
+
+## LSUN bedroom
+
+To download and pre-process LSUN bedroom, clone [fyu/lsun](https://github.com/fyu/lsun) on GitHub and run their download script `python3 download.py bedroom`. The result will be an "lmdb" database named like `bedroom_train_lmdb`. You can pass this to our [lsun_bedroom.py](lsun_bedroom.py) script like so:
+
+```sh
+python lsun_bedroom.py bedroom_train_lmdb lsun_train_output_dir
+```
+
+This creates a directory called `lsun_train_output_dir`. This directory can be passed to the training scripts via the `--data_dir` argument.
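The filename convention described in `datasets/README.md` (a WNID, then an underscore) suggests how class labels could be recovered from a data directory. The sketch below is an illustration only: the data loader itself is not part of this section of the diff, and the function names `wnid_from_filename` and `build_class_index` are hypothetical.

```python
import os


def wnid_from_filename(path: str) -> str:
    """Return the text before the first underscore in the file's base name."""
    return os.path.basename(path).split("_")[0]


def build_class_index(paths):
    """Map each distinct WNID to an integer label, in sorted WNID order."""
    wnids = sorted({wnid_from_filename(p) for p in paths})
    return {wnid: i for i, wnid in enumerate(wnids)}


files = [
    "train/n01440764_2708.JPEG",
    "train/n01443537_11.JPEG",
    "train/n01440764_18.JPEG",
]
index = build_class_index(files)
labels = [index[wnid_from_filename(p)] for p in files]
print(labels)  # [0, 1, 0]
```

Sorting the WNIDs before numbering them keeps the label assignment stable across runs, which matters if a classifier and a diffusion model are trained separately on the same directory.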
diff --git a/datasets/lsun_bedroom.py b/datasets/lsun_bedroom.py
@@ -0,0 +1,54 @@
+"""
+Convert an LSUN lmdb database into a directory of images.
+"""
+
+import argparse
+import io
+import os
+
+from PIL import Image
+import lmdb
+import numpy as np
+
+
+def read_images(lmdb_path, image_size):
+    env = lmdb.open(lmdb_path, map_size=1099511627776, max_readers=100, readonly=True)
+    with env.begin(write=False) as transaction:
+        cursor = transaction.cursor()
+        for _, webp_data in cursor:
+            img = Image.open(io.BytesIO(webp_data))
+            width, height = img.size
+            scale = image_size / min(width, height)
+            img = img.resize(
+                (int(round(scale * width)), int(round(scale * height))),
+                resample=Image.BOX,
+            )
+            arr = np.array(img)
+            h, w, _ = arr.shape
+            h_off = (h - image_size) // 2
+            w_off = (w - image_size) // 2
+            arr = arr[h_off : h_off + image_size, w_off : w_off + image_size]
+            yield arr
+
+
+def dump_images(out_dir, images, prefix):
+    if not os.path.exists(out_dir):
+        os.mkdir(out_dir)
+    for i, img in enumerate(images):
+        Image.fromarray(img).save(os.path.join(out_dir, f"{prefix}_{i:07d}.png"))
+
+
+def main():
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--image-size", help="new image size", type=int, default=256)
+    parser.add_argument("--prefix", help="class name", type=str, default="bedroom")
+    parser.add_argument("lmdb_path", help="path to an LSUN lmdb database")
+    parser.add_argument("out_dir", help="path to output directory")
+    args = parser.parse_args()
+
+    images = read_images(args.lmdb_path, args.image_size)
+    dump_images(args.out_dir, images, args.prefix)
+
+
+if __name__ == "__main__":
+    main()
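The resize and center-crop arithmetic in `read_images` can be checked without loading any pixels. The sketch below repeats the same formulas on plain integers; the function name `resize_and_crop_box` is hypothetical and exists only for this illustration.

```python
def resize_and_crop_box(width: int, height: int, image_size: int):
    """Repeat the size arithmetic from read_images() on plain integers.

    Returns the resized (width, height) and the crop offsets (w_off, h_off).
    Hypothetical helper for illustration; not part of this repository.
    """
    # Scale so that the shorter side becomes exactly image_size.
    scale = image_size / min(width, height)
    new_w = int(round(scale * width))
    new_h = int(round(scale * height))
    # Center the image_size x image_size window inside the resized image.
    w_off = (new_w - image_size) // 2
    h_off = (new_h - image_size) // 2
    return (new_w, new_h), (w_off, h_off)


# A 640x480 source image with the script's default --image-size of 256.
print(resize_and_crop_box(640, 480, 256))  # ((341, 256), (42, 0))
```

For a landscape image the height already matches the target after resizing, so only the width is cropped, here by dropping 42 columns on the left and 43 on the right.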
diff --git a/guided_diffusion/__init__.py b/guided_diffusion/__init__.py
@@ -0,0 +1,3 @@
+"""
+Codebase for "Improved Denoising Diffusion Probabilistic Models".
+"""
diff --git a/guided_diffusion/dist_util.py b/guided_diffusion/dist_util.py
@@ -0,0 +1,82 @@
+"""
+Helpers for distributed training.
+"""
+
+import io
+import os
+import socket
+
+import blobfile as bf
+from mpi4py import MPI
+import torch as th
+import torch.distributed as dist
+
+# Change this to reflect your cluster layout.
+# The GPU for a given rank is (rank % GPUS_PER_NODE).
+GPUS_PER_NODE = 8
+
+SETUP_RETRY_COUNT = 3
+
+
+def setup_dist():
+    """
+    Set up a distributed process group.
+    """
+    if dist.is_initialized():
+        return
+
+    comm = MPI.COMM_WORLD
+    backend = "gloo" if not th.cuda.is_available() else "nccl"
+
+    if backend == "gloo":
+        hostname = "localhost"
+    else:
+        hostname = socket.gethostbyname(socket.getfqdn())
+    os.environ["MASTER_ADDR"] = comm.bcast(hostname, root=0)
+    os.environ["RANK"] = str(comm.rank)
+    os.environ["WORLD_SIZE"] = str(comm.size)
+
+    port = comm.bcast(_find_free_port(), root=0)
+    os.environ["MASTER_PORT"] = str(port)
+    dist.init_process_group(backend=backend, init_method="env://")
+
+
+def dev():
+    """
+    Get the device to use for torch.distributed.
+    """
+    if th.cuda.is_available():
+        return th.device(f"cuda:{MPI.COMM_WORLD.Get_rank() % GPUS_PER_NODE}")
+    return th.device("cpu")
+
+
+def load_state_dict(path, **kwargs):
+    """
+    Load a PyTorch file without redundant fetches across MPI ranks.
+    """
+    if MPI.COMM_WORLD.Get_rank() == 0:
+        with bf.BlobFile(path, "rb") as f:
+            data = f.read()
+    else:
+        data = None
+    data = MPI.COMM_WORLD.bcast(data)
+    return th.load(io.BytesIO(data), **kwargs)
+
+
+def sync_params(params):
+    """
+    Synchronize a sequence of Tensors across ranks from rank 0.
+    """
+    for p in params:
+        with th.no_grad():
+            dist.broadcast(p, 0)
+
+
+def _find_free_port():
+    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+    try:
+        s.bind(("", 0))
+        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
+        return s.getsockname()[1]
+    finally:
+        s.close()
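The device selection in `dev()` relies on `rank % GPUS_PER_NODE`. The sketch below shows that mapping in isolation, without MPI or PyTorch. It assumes the MPI launcher places ranks contiguously on each node, which depends on the cluster setup; `device_name_for_rank` is a hypothetical name used only here.

```python
GPUS_PER_NODE = 8  # same default as in dist_util.py


def device_name_for_rank(rank: int, cuda_available: bool = True) -> str:
    """Return the device string that dev() would build for a given MPI rank.

    Hypothetical helper for illustration; not part of this repository.
    """
    if cuda_available:
        return f"cuda:{rank % GPUS_PER_NODE}"
    return "cpu"


# Two 8-GPU nodes, 16 ranks: ranks 0 and 8 both use cuda:0 on their own node.
print([device_name_for_rank(r) for r in (0, 7, 8, 15)])
# ['cuda:0', 'cuda:7', 'cuda:0', 'cuda:7']
```

On a machine with fewer than 8 GPUs, `GPUS_PER_NODE` would need to be lowered to match, as the comment in `dist_util.py` indicates; otherwise some ranks would request a device index that does not exist.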