This repository contains a set of tools for working with the 2nd version of the Common Objects in 3D (CO3Dv2) dataset.
The original dataset has been introduced in our ICCV'21 paper: Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction. For accessing the original data, please switch to the v1
branch of this repository.
- Common Objects in 3D Challenge which allows transparent evaluation on a hidden test server - more details in the challenge README
- 2x larger number of sequences, and 4x larger number of frames
- Improved image quality - less blocky artifacts due to better video decoding
- Improved segmentation masks - stable tracking of the main foreground object without jumping to background objects
- Enabled downloading of a smaller single-sequence subset of ~100 sequences consisting only of the sequences used to evalute the many-view single-sequence task
- Dataset files are hosted in 20 GB chunks facilitating more stable downloads
- A novel, more user-friendly, dataset format
- All images within a sequence are cropped to the same height x width
The links to all dataset files are present in this repository in dataset/links.json
.
We also provide a python script that allows downloading all dataset files at once. In order to do so, execute the download script:
python ./co3d/download_dataset.py --download_folder DOWNLOAD_FOLDER
where DOWNLOAD_FOLDER
is a local target folder for downloading the dataset files.
Make sure to create this folder before commencing the download.
Size: All zip files of the dataset occupy 5.5 TB of disk-space.
We also provide a subset of the dataset consisting only of the sequences selected for the many-view single-sequence task where both training and evaluation are commonly conducted on a single image sequence. In order to download this subset add the
--single_sequence_subset
option to download_dataset.py
:
python ./co3d/download_dataset.py --download_folder DOWNLOAD_FOLDER --single_sequence_subset
Size: The single-sequence subset is much smaller than the full dataset and takes 8.9 GB of disk-space.
Together with releasing v2 of the dataset, we also organize the Common Objects in 3D Challenge hosted on EvalAI. Please visit the [challenge website](https://eval.ai/web/challenges/challenge-page/1819/overview) and [challenge README](./co3d/challenge/README.md) for the more information.This is a Python 3
/ PyTorch
codebase.
- Install
PyTorch
. - Install
PyTorch3D
.- Please note that Pytorch3D has to be built from source to enable the Implicitron module
- Install the remaining dependencies in
requirements.txt
:
pip install visdom tqdm requests h5py
- Install the CO3D package itself:
pip install -e .
Note that the core data model in co3d/dataset/data_types.py
is independent of PyTorch
/PyTorch3D
and can be imported and used with other machine-learning frameworks.
- Install dependencies - See Installation above.
- Download the dataset [here] to a given root folder
CO3DV2_DATASET_ROOT
. - Set the environment variable
CO3DV2_DATASET_ROOT
to the dataset root:export CO3DV2_DATASET_ROOT="your_dataset_root_folder"
- Run
example_co3d_challenge_submission.py
:Note thatcd examples python example_co3d_challenge_submission.py
example_co3d_challenge_submission.py
runs an evaluation of a simple depth-based image rendering (DBIR) model on all challenges and sets of the CO3D Challenge. Feel free to extend the script in order to provide your own submission to the CO3D Challenge.
Unit tests can be executed with:
python -m unittest
Implicitron is our open-source framework used to train all implicit shape learning methods from the CO3D paper. Please visit the following link for more details: https://github.com/facebookresearch/pytorch3d/tree/main/projects/implicitron_trainer
The dataset is organized in the filesystem as follows:
CO3DV2_DATASET_ROOT
├── <category_0>
│ ├── <sequence_name_0>
│ │ ├── depth_masks
│ │ ├── depths
│ │ ├── images
│ │ ├── masks
│ │ └── pointcloud.ply
│ ├── <sequence_name_1>
│ │ ├── depth_masks
│ │ ├── depths
│ │ ├── images
│ │ ├── masks
│ │ └── pointcloud.ply
│ ├── ...
│ ├── <sequence_name_N>
│ ├── set_lists
│ ├── set_lists_<subset_name_0>.json
│ ├── set_lists_<subset_name_1>.json
│ ├── ...
│ ├── set_lists_<subset_name_M>.json
│ ├── eval_batches
│ │ ├── eval_batches_<subset_name_0>.json
│ │ ├── eval_batches_<subset_name_1>.json
│ │ ├── ...
│ │ ├── eval_batches_<subset_name_M>.json
│ ├── frame_annotations.jgz
│ ├── sequence_annotations.jgz
├── <category_1>
├── ...
├── <category_K>
The dataset contains sequences named <sequence_name_i>
from K
categories with
names <category_j>
. Each category comprises sequence folders <category_k>/<sequence_name_i>
containing the list of sequence images, depth maps, foreground masks, and valid-depth masks images
, depths
, masks
, and depth_masks
respectively. Furthermore, <category_k>/<sequence_name_i>/set_lists/
stores M
json files set_lists_<subset_name_l>.json
, each describing a certain sequence subset.
Users specify the loaded dataset subset by setting self.subset_name
to one of the
available subset names <subset_name_l>
.
frame_annotations.jgz
and sequence_annotations.jgz
are gzipped json files containing the list of all frames and sequences of the given category stored as lists of FrameAnnotation
and SequenceAnnotation
objects respectivelly.
Each set_lists_<subset_name_l>.json
file contains the following dictionary:
{
"train": [
(sequence_name: str, frame_number: int, image_path: str),
...
],
"val": [
(sequence_name: str, frame_number: int, image_path: str),
...
],
"test": [
(sequence_name: str, frame_number: int, image_path: str),
...
],
}
defining the list of frames (identified with their sequence_name
and frame_number
) in the "train", "val", and "test" subsets of the dataset.
Note that frame_number
can be obtained only from frame_annotations.jgz
and does not necesarrily correspond to the numeric suffix of the corresponding image file name (e.g. a file <category_0>/<sequence_name_0>/images/frame00005.jpg
can have its frame number set to 20, not 5).
In CO3DV2, by default, each category contains a subset of the following set lists:
"set_lists_fewview_test.json" # Few-view task on the "test" sequence set.
"set_lists_fewview_dev.json" # Few-view task on the "dev" sequence set.
"set_lists_manyview_test.json" # Many-view task on the "test" sequence of a category.
"set_lists_manyview_dev_0.json" # Many-view task on the 1st "dev" sequence of a category.
"set_lists_manyview_dev_1.json" # Many-view task on the 2nd "dev" sequence of a category.
Each eval_batches_<subset_name_l>.json
file contains a list of evaluation examples in the following form:
[
[ # batch 1
(sequence_name: str, frame_number: int, image_path: str),
...
],
[ # batch 1
(sequence_name: str, frame_number: int, image_path: str),
...
],
]
Note that the evaluation examples always come from the "test"
part of the corresponding set list set_lists_<subset_name_l>.json
.
The evaluation task then consists of generating the first image in each batch given the knowledge of the other ones. Hence, the first image in each batch represents the (unseen) target frame, for which only the camera parameters are known, while the rest of the images in the batch are the known source frames whose cameras and colors are given.
Note that for the Many-view task, where a user is given many known views of a particular sequence and the goal is to generate held-out views from the same sequence, eval_batches_manyview_<sequence_set>_<sequence_id>.json
contain a single (target) frame per evaluation batch. Users can obtain the known views from the corresponding "train"
list of frames in the set list set_lists_manyview_<sequence_set>_<sequence_id>.json
.
The core data model in co3d/dataset/data_types.py
is independent of PyTorch
/PyTorch3D
and can be imported and used with other machine-learning frameworks.
For example, in order to load the per-category frame and sequence annotations users can execute the following code:
from typing import List
from co3d.dataset.data_types import (
load_dataclass_jgzip, FrameAnnotation, SequenceAnnotation
)
category_frame_annotations = load_dataclass_jgzip(
f"{CO3DV2_DATASET_ROOT}/{category_name}/frame_annotations.jgz", List[FrameAnnotation]
)
category_sequence_annotations = load_dataclass_jgzip(
f"{CO3DV2_DATASET_ROOT}/{category_name}/sequence_annotations.jgz", List[SequenceAnnotation]
)
Furthermore, all challenge-related code under co3d/challenge
also does not depend on PyTorch
.
If you use our dataset, please use the following citation:
@inproceedings{reizenstein21co3d,
Author = {Reizenstein, Jeremy and Shapovalov, Roman and Henzler, Philipp and Sbordone, Luca and Labatut, Patrick and Novotny, David},
Booktitle = {International Conference on Computer Vision},
Title = {Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction},
Year = {2021},
}
The CO3D codebase is released under the CC BY-NC 4.0.
The following presentation of the dataset was delivered at the Extreme Vision Workshop at CVPR 2021: