NIGHTS (Novel Image Generations with Human-Tested Similarities) is a dataset of 20,019 image triplets with human scores of perceptual similarity. Each triplet consists of a reference image and two distortions, located at ref/xxx/yyy.png, distort/xxx/yyy_0.png, and distort/xxx/yyy_1.png, respectively.
Directories are numbered 000 to 099 and files are numbered 000 to 999, though there is not an image triplet at every number.
nights/
├── ref/
│ ├── 000/
│ │ ├── 000.png
│ │ ├── 001.png
│ │ ├── ...
│ │ └── 999.png
│ └── 001/ 002/ ... 099/
├── distort/
│ ├── 000/
│ │ ├── 000_0.png
│ │ ├── 000_1.png
│ │ ├── ...
│ │ ├── 999_0.png
│ │ └── 999_1.png
│ └── 001/ 002/ ... 099/
├── data.csv
└── README.md
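The layout above can be walked with a short helper. This is a minimal sketch, not part of the dataset's tooling: the names Triplet, triplet_paths, and iter_triplets are hypothetical, and the root argument is assumed to point at the nights/ directory. Because not every number has a triplet, the iterator skips any index where one of the three files is missing.

```python
from pathlib import Path
from typing import Iterator, NamedTuple


class Triplet(NamedTuple):
    """Paths of one reference image and its two distortions (hypothetical helper type)."""
    ref: Path
    distort_0: Path
    distort_1: Path


def triplet_paths(root: Path, dir_idx: int, file_idx: int) -> Triplet:
    """Build the three paths for one triplet from its directory and file numbers."""
    d, f = f"{dir_idx:03d}", f"{file_idx:03d}"  # zero-padded to three digits
    return Triplet(
        ref=root / "ref" / d / f"{f}.png",
        distort_0=root / "distort" / d / f"{f}_0.png",
        distort_1=root / "distort" / d / f"{f}_1.png",
    )


def iter_triplets(root: Path) -> Iterator[Triplet]:
    """Yield only the triplets whose three files are all present on disk."""
    for dir_idx in range(100):        # directories 000..099
        for file_idx in range(1000):  # files 000..999
            triplet = triplet_paths(root, dir_idx, file_idx)
            if all(p.is_file() for p in triplet):
                yield triplet
```

For example, triplet_paths(Path("nights"), 3, 42).ref is nights/ref/003/042.png.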
All images were generated with Stable Diffusion 2.1 [1] by sampling each triplet from a prompt of the same category with different random seeds, using the prompt structure "An image of a <category>". The category is drawn from image labels in popular datasets: ImageNet [2], CIFAR-10 [3], CIFAR-100 [3], Oxford 102 Flower [4], Food-101 [5], and SUN397 [6].
See data.csv for the full list of categories corresponding to each image triplet.
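The prompt construction described above might look like the following. It is only an illustrative sketch: the function name triplet_generation_inputs, the 32-bit seed range, and the choice of one seed per image are assumptions, and the call into Stable Diffusion itself is deliberately left out.

```python
import random

# Prompt structure stated in this README.
PROMPT_TEMPLATE = "An image of a {category}"


def triplet_generation_inputs(category: str, rng: random.Random) -> list[tuple[str, int]]:
    """Return three (prompt, seed) pairs: one shared prompt, three distinct seeds.

    Hypothetical helper; the seed range and per-image seeding are assumptions.
    """
    prompt = PROMPT_TEMPLATE.format(category=category)
    # sample() draws without replacement, so the three seeds are distinct.
    seeds = rng.sample(range(2**32), k=3)
    return [(prompt, seed) for seed in seeds]
```

Calling triplet_generation_inputs("tabby cat", random.Random(0)) returns three pairs that all carry the prompt "An image of a tabby cat" and differ only in the seed.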
We note that by using Stable Diffusion, our benchmark is exposed to preexisting biases and sensitive content in the model. As such, we generate our images with a pre-defined set of categories, while largely avoiding human faces. Our perceptual model is finetuned from existing pre-trained backbones, and thus may also inherit prior errors and biases.
[1] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022.
[2] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. IEEE, 2009.
[3] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.
[4] Maria-Elena Nilsback and Andrew Zisserman. Automated flower classification over a large number of classes. In 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing, pages 722–729. IEEE, 2008.
[5] Lukas Bossard, Matthieu Guillaumin, and Luc Van Gool. Food-101–mining discriminative components with random forests. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part VI 13, pages 446–461. Springer, 2014.
[6] Jianxiong Xiao, James Hays, Krista A Ehinger, Aude Oliva, and Antonio Torralba. Sun database: Large-scale scene recognition from abbey to zoo. In 2010 IEEE computer society conference on computer vision and pattern recognition, pages 3485–3492. IEEE, 2010.