In the following sections, we'll provide an example of the data-preparation process using COLMAP and ffmpeg, and then show how to train and render Instant-NGP+T, 3DGS+T and ENeRF models inside EasyVolcap.
Note that although the provided example uses a static dataset, the same preparation, training and rendering process also applies to a dynamic dataset (e.g. multi-view video) with the same or similar commands. For now, we only provide examples on static datasets since most dynamic datasets are very large and hard to download.
For multi-view datasets with static cameras (assuming there are enough cameras for COLMAP to run), running COLMAP and Instant-NGP+T on the first frame should give rough camera poses.
The training and rendering process for models like Instant-NGP+T, 3DGS+T and ENeRF on dynamic datasets is almost identical to the static case. Pay attention to the dataloader_cfg.dataset_cfg.frame_sample and val_dataloader_cfg.dataset_cfg.frame_sample parameters, which control the number of frames used for training and evaluation. Please refer to config.md for more details on how to pass in arguments using EasyVolcap's configuration system.
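As a rough illustration of what a dotted key such as dataloader_cfg.dataset_cfg.frame_sample refers to, the sketch below writes a value into a nested dictionary. The helper name set_by_dotted_key and the example value [0, 1, 1] are hypothetical, and this is not EasyVolcap's actual argument parser; config.md remains the authoritative reference.

```python
def set_by_dotted_key(cfg: dict, dotted_key: str, value) -> dict:
    """Set cfg[a][b][c] = value for a dotted key 'a.b.c', creating levels as needed."""
    *parents, leaf = dotted_key.split(".")
    node = cfg
    for name in parents:
        # Descend into (or create) each intermediate dictionary
        node = node.setdefault(name, {})
    node[leaf] = value
    return cfg


# Hypothetical starting config; the [begin, end, step] value format is an assumption
cfg = {"dataloader_cfg": {"dataset_cfg": {"frame_sample": [0, None, 1]}}}

# Restrict both training and evaluation to the first frame only
set_by_dotted_key(cfg, "dataloader_cfg.dataset_cfg.frame_sample", [0, 1, 1])
set_by_dotted_key(cfg, "val_dataloader_cfg.dataset_cfg.frame_sample", [0, 1, 1])
print(cfg)
```

Both the training and the validation dataloader carry their own frame_sample entry, so the two keys need to be set separately.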
The example dataset used in this section can be downloaded from this Google Drive link. After extracting the dataset, place the content inside data/zju such that you see files like:
data/zju/zju3dv/images
data/zju/zju3dv/optimized
COLMAP for Camera Poses
In this section, we provide a basic example of extracting video frames, extracting camera parameters and preparing the dataset structure for EasyVolcap for a static scene.
For now, we have only implemented the pipeline of extracting video frames and running COLMAP on them.
Foreground segmentation is also supported and documented in segmentation.md.
NeRFStudio provides guidance on how to utilize ARKit of LiDAR-enabled iPhone or iPad devices to quickly get access to the camera parameters, and we're looking forward to incorporating those data formats into our pipeline.
COLMAP is robust, but extremely slow to run. We therefore wrote a script that handles the tedious parameter tuning and database management for you and prints its progress in a readable form. The parameter choices provided in the script were tested with a recording of Xiaowei Zhou's Lab.
Here we assume the user has the following directory structure after completing the data preprocessing step.
data/zju/zju3dv # data_root
├── intri.yml # required
├── extri.yml # required
├── images # required
├── colmap # optional, required during colmap pre-processing
│   ├── colmap.db
│   ├── colmap_ply
│   │   └── 000000.ply
│   ├── colmap_sparse
│   │   └── 0
│   │       ├── cameras.bin
│   │       ├── images.bin
│   │       ├── points3D.bin
│   │       └── project.ini
│   └── colmap_text
│       ├── cameras.txt
│       ├── images.txt
│       └── points3D.txt
└── videos # optional, required before extracting images
    └── 00
        └── output.mp4
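To quickly check a dataset folder against the entries marked "required" in the tree above, a small helper along these lines may be handy. This is only a sketch: the function name missing_required is hypothetical and is not part of EasyVolcap.

```python
from pathlib import Path

# Entries marked "required" in the directory tree above
REQUIRED = ("intri.yml", "extri.yml", "images")


def missing_required(data_root: str) -> list:
    """Return the required entries that do not exist under data_root."""
    root = Path(data_root)
    return [name for name in REQUIRED if not (root / name).exists()]


# An empty list means every required entry is present
print(missing_required("data/zju/zju3dv"))
```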
To begin with, place your video inside the data/zju/zju3dv/videos/00 folder.
Then run the commands below to:
- Extract frames from the video
- Discard blurry frames
- Run COLMAP
- Convert to the EasyVolcap camera format
Let's assume you've stored the location of data/zju/zju3dv in a variable data_root and also used shell variables for the other parameter choices.
expname="zju3dv"
data_root="data/zju/zju3dv"
video="output.mp4"
ffmpeg="/usr/bin/ffmpeg"
fps="20"
resolution="1920:1080"
ffmpeg_vf="zscale=t=linear:npl=100,format=gbrpf32le,zscale=p=bt709,tonemap=tonemap=hable:desat=0,zscale=t=bt709:m=bt709:r=tv,"
After the preparation, you can extract the video frames using:
# Extract video frames into images/00, make sure to tune fps and change the scale to match your video (or omit them to use the video's default)
mkdir -p "${data_root}/images/00"
${ffmpeg} -i "${data_root}/videos/00/${video}" -vf "${ffmpeg_vf}fps=${fps},scale=${resolution}" -q:v 1 -qmin 1 -start_number 0 "${data_root}/images/00/%06d.jpg"
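If you prefer to drive the extraction from Python, the same command line can be assembled as an argument list. The sketch below only builds and prints the command; the function name build_ffmpeg_cmd is hypothetical, and the optional vf_prefix plays the role of the ffmpeg_vf variable above.

```python
import shlex


def build_ffmpeg_cmd(data_root, video, fps, resolution, vf_prefix="", ffmpeg="ffmpeg"):
    """Assemble the frame-extraction command shown above as an argument list."""
    # vf_prefix is expected to end with a comma when non-empty, like ffmpeg_vf above
    vf = f"{vf_prefix}fps={fps},scale={resolution}"
    return [
        ffmpeg,
        "-i", f"{data_root}/videos/00/{video}",
        "-vf", vf,
        "-q:v", "1",
        "-qmin", "1",
        "-start_number", "0",
        f"{data_root}/images/00/%06d.jpg",
    ]


cmd = build_ffmpeg_cmd("data/zju/zju3dv", "output.mp4", 20, "1920:1080")
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues with the filter string.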
Before running COLMAP, it's also recommended to discard blurry images to aid the reconstruction. Afterwards, you can run COLMAP and convert the result using the following commands:
# Discard blurry images, maybe tune the threshold, check the script for more details
python scripts/colmap/discard_blurry.py --data_root "${data_root}/images/00"
# Run COLMAP, this took roughly 3 hours on ~700 images
python scripts/colmap/run_colmap.py --data_root "${data_root}"
# Convert to the EasyVolcap camera format
python scripts/colmap/colmap_to_easyvolcap.py --data_root "${data_root}"
# Unflatten the images folder (check the script for the resulting layout)
python scripts/colmap/unflatten_dataset.py --data_root "${data_root}"
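For intuition about the blur-discarding step, one common sharpness measure is the variance of the image Laplacian: blurry images have weak edges and therefore a low variance. The sketch below implements that measure with NumPy only. It is an assumption that discard_blurry.py uses something similar (check the script for its actual criterion), and the threshold of 100.0 is purely illustrative.

```python
import numpy as np


def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of the 4-neighbour Laplacian over the image interior."""
    g = gray.astype(np.float64)
    lap = (
        g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
        - 4.0 * g[1:-1, 1:-1]
    )
    return float(lap.var())


def is_blurry(gray: np.ndarray, threshold: float = 100.0) -> bool:
    """Flag an image as blurry when its Laplacian variance is below threshold."""
    return laplacian_variance(gray) < threshold


# Synthetic examples: a one-pixel checkerboard (sharp) and a constant image (no edges)
sharp = (np.indices((64, 64)).sum(axis=0) % 2) * 255.0
flat = np.full((64, 64), 128.0)
print(is_blurry(sharp), is_blurry(flat))
```

In practice the threshold depends on resolution and content, so it is usually tuned per recording.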
Our Instant-NGP+T Implementation
l3mhet_zju3dv_render.mp4
Now that the data preparation is complete and you have an images folder and a pair of intri.yml and extri.yml files, you can run the l3mhet model.
- L3: 3 levels of importance sampling.
- MHE: Multi-resolution hash encoding.
- T: Supports time dimension input.
You need to write a config file for this model:
- Write the data-folder-related stuff inside configs/datasets. Just copy and paste configs/datasets/zju/zju3dv.yaml and modify the data_root and bounds (bounding box), or maybe add the camera's near and far thresholds.
- Write the experiment config inside configs/exps. Just copy and paste configs/exps/l3mhet/l3mhet_zju3dv.yaml and modify the dataset-related line in configs.
The training should converge to a meaningful stage after 10-20 minutes on a 3090 (in fact, the first 500 iterations already produce a reasonable result).
# With your config files ready, you can run the following command to train the model
evc -c configs/exps/l3mhet/l3mhet_${expname}.yaml
# Now run the following command to render some output
evc -t test -c configs/exps/l3mhet/l3mhet_${expname}.yaml,configs/specs/spiral.yaml
configs/specs/spiral.yaml: please check this file for more details. It's a collection of configurations that tell the dataloader and visualizer to generate a spiral path by interpolating the given cameras.
Our 3DGS+T Implementation
gaussiant_zju3dv_render.mp4
gaussiant_zju3dv_demo.mp4
After preparing the dataset and model with COLMAP and Instant-NGP following the tutorial in the previous sections, we're ready to train a more expressive representation with better rendering quality.
The original 3DGS uses the sparse reconstruction result of COLMAP for initialization.
However, we found that the sparse reconstruction result often contains many floating points (floaters), which are hard for 3DGS to prune and can easily prevent the model from converging.
Thus, we opted to use the "dense" reconstruction result of our Instant-NGP+T implementation by computing the RGBD image for input views and concatenating them as the input of 3DGS. The script volume_fusion.py controls this process and it should work similarly on all models that support depth output.
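To make the idea of turning per-view depth into a point cloud concrete, here is a minimal NumPy sketch of back-projecting one depth map into world space. It is not the code of volume_fusion.py. The function name backproject_depth is hypothetical, and the world-to-camera convention x_cam = R x_world + T is an assumption stated for this example.

```python
import numpy as np


def backproject_depth(depth, K, R, T):
    """Back-project a depth map to world-space points.

    depth: (H, W) z-depth per pixel, values <= 0 are treated as invalid
    K:     (3, 3) camera intrinsics
    R, T:  (3, 3) and (3,) world-to-camera pose, assumed x_cam = R @ x_world + T
    Returns an (N, 3) array of world points for the valid pixels.
    """
    H, W = depth.shape
    v, u = np.mgrid[0:H, 0:W]  # pixel rows (v) and columns (u)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    z = depth.reshape(-1).astype(np.float64)
    valid = z > 0
    rays = pix[valid] @ np.linalg.inv(K).T  # camera-space directions with z = 1
    cam = rays * z[valid, None]             # camera-space points
    return (cam - T) @ R                    # row-vector form of R^T (x_cam - T)


# Tiny example: a 3x3 depth map at constant depth 2 with an identity pose
K = np.array([[100.0, 0.0, 1.0], [0.0, 100.0, 1.0], [0.0, 0.0, 1.0]])
pts = backproject_depth(np.full((3, 3), 2.0), K, np.eye(3), np.zeros(3))
print(pts.shape)
```

Fusing several views then amounts to concatenating the per-view arrays, for example with np.concatenate, together with the matching pixel colors.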
Following the official Instant-NGP implementation, we also perform camera parameter optimizations during the training of the Instant-NGP+T model (extrinsic parameters for now).
The script responsible for extracting the camera parameters is extract_optimized_cameras.py.
An optimized folder will be created inside your dataset root, where a pair of intri.yml and extri.yml will be stored.
After extraction, use optimized.yaml in your experiment config to apply the optimized parameters.
The following script block provides examples of how to extract the optimized camera parameters and prepare an initialization for our 3DGS+T implementation.
# Extract optimized cameras from the l3mhet model
python scripts/tools/extract_optimized_cameras.py -- -c configs/exps/l3mhet/l3mhet_${expname}.yaml
# Extract geometry (point cloud) for initialization from the l3mhet model
# Tune image sample rate and resizing ratio for a denser or sparser estimation
python scripts/fusion/volume_fusion.py -- -c configs/exps/l3mhet/l3mhet_${expname}.yaml val_dataloader_cfg.dataset_cfg.ratio=0.25 val_dataloader_cfg.dataset_cfg.view_sample=0,null,25
# Move the rendering results to the dataset folder
mkdir -p ${data_root}/vhulls
cp data/geometry/l3mhet_${expname}/POINT/frame0000.ply ${data_root}/vhulls/000000.ply
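The view_sample=0,null,25 override above reads naturally as a begin, end, step triple. Assuming it behaves like a Python slice (an assumption for illustration, not confirmed against the dataset code), the selected indices can be previewed as follows. The helper name sample_indices is hypothetical.

```python
def sample_indices(n_items: int, sample) -> list:
    """Indices selected by a (begin, end, step) triple, using Python slice semantics."""
    begin, end, step = sample
    return list(range(n_items))[slice(begin, end, step)]


# With 100 views, (0, None, 25) keeps every 25th view starting from the first
print(sample_indices(100, (0, None, 25)))  # [0, 25, 50, 75]
```

A smaller step keeps more views and gives a denser point cloud at the cost of a longer fusion run.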
Our convention for storing initialization point clouds:
- Raw point clouds extracted using Instant-NGP or Space Carving are placed inside the vhulls folder. These files might be large. It's OK to directly optimize 3DGS+T on these.
- We might perform some clean-up of the point clouds and store them in the surfs folder.
  - For 3DGS+T, the cleaned-up point clouds might be easier to optimize since 3DGS is good at growing details but not so good at dealing with floaters (removing or splitting).
  - For other representations, the cleaned-up point clouds work better than the visual hull (from Space Carving) but might not work so well compared to the raw point clouds of Instant-NGP.
Then, prepare an experiment config like configs/exps/gaussiant/gaussiant_zju3dv.yaml.
The colmap.yaml provides some heuristics for large-scale static scenes. Remove these if you're not planning on using COLMAP's parameters directly.
# Train a 3DGS model on the ${expname} dataset
evc -c configs/exps/gaussiant/gaussiant_${expname}.yaml
# Perform rendering on the trained ${expname} dataset
evc -t test -c configs/exps/gaussiant/gaussiant_${expname}.yaml,configs/specs/superm.yaml,configs/specs/spiral.yaml
# Perform rendering with GUI, do this on a machine with monitor, tested on Windows and Ubuntu
evc -t gui -c configs/exps/gaussiant/gaussiant_${expname}.yaml,configs/specs/superm.yaml
The superm.yaml skips the loading of input images and other initializations for network-only rendering since all the information we need is contained inside the trained model.
Our Improved ENeRF Implementation and Generalization
enerf_zju3dv_render.mp4
enerf_zju3dv_demo.mp4
A pre-trained ENeRFi model trained on the DTU dataset can be downloaded from this Google Drive link. After downloading, rename the model to latest.npz and place it in data/trained_model/enerfi_dtu.
# Render ENeRF with GUI on zju3dv dataset
evc -t gui -c configs/base.yaml,configs/models/enerfi.yaml,configs/datasets/zju/zju3dv.yaml,configs/specs/optimized.yaml exp_name=enerfi_dtu val_dataloader_cfg.dataset_cfg.cache_raw=False val_dataloader_cfg.dataset_cfg.ratio=1.0 val_dataloader_cfg.dataset_cfg.view_sample=0,None,5 val_dataloader_cfg.dataset_cfg.force_sparse_view=True val_dataloader_cfg.dataset_cfg.n_srcs_list=3, val_dataloader_cfg.dataset_cfg.use_vhulls=True
# Render ENeRF with GUI on zju3dv dataset using fp16, higher performance, worse results on some views
evc -t gui -c configs/base.yaml,configs/models/enerfi.yaml,configs/datasets/zju/zju3dv.yaml,configs/specs/optimized.yaml,configs/specs/fp16.yaml exp_name=enerfi_dtu val_dataloader_cfg.dataset_cfg.cache_raw=False val_dataloader_cfg.dataset_cfg.ratio=1.0 val_dataloader_cfg.dataset_cfg.view_sample=0,None,5 val_dataloader_cfg.dataset_cfg.force_sparse_view=True val_dataloader_cfg.dataset_cfg.n_srcs_list=3, val_dataloader_cfg.dataset_cfg.use_vhulls=True # only cache ten images
# Render ENeRF on zju3dv dataset with spiral paths
evc -t test -c configs/base.yaml,configs/models/enerfi.yaml,configs/datasets/zju/zju3dv.yaml,configs/specs/optimized.yaml,configs/specs/spiral.yaml,configs/specs/ibr.yaml exp_name=enerfi_dtu val_dataloader_cfg.dataset_cfg.cache_raw=False val_dataloader_cfg.dataset_cfg.ratio=1.0 val_dataloader_cfg.dataset_cfg.view_sample=0,None,5 val_dataloader_cfg.dataset_cfg.force_sparse_view=True val_dataloader_cfg.dataset_cfg.n_srcs_list=3, val_dataloader_cfg.dataset_cfg.use_vhulls=True val_dataloader_cfg.dataset_cfg.render_size=768,1366 runner_cfg.visualizer_cfg.save_tag=zju3dv
# Render ENeRF with GUI on RenBody dataset
evc -t gui -c configs/base.yaml,configs/models/enerfi.yaml,configs/datasets/renbody/0013_01_obj.yaml,configs/specs/optimized.yaml,configs/specs/mask.yaml,configs/specs/vf0.yaml exp_name=enerfi_dtu val_dataloader_cfg.dataset_cfg.n_srcs_list=4, val_dataloader_cfg.dataset_cfg.use_vhulls=True val_dataloader_cfg.dataset_cfg.ratio=1.0
evc -t gui -c configs/base.yaml,configs/models/enerfi.yaml,configs/datasets/renbody/0008_01_obj.yaml,configs/specs/optimized.yaml,configs/specs/mask.yaml,configs/specs/vf0.yaml exp_name=enerfi_dtu val_dataloader_cfg.dataset_cfg.n_srcs_list=4, val_dataloader_cfg.dataset_cfg.use_vhulls=True val_dataloader_cfg.dataset_cfg.ratio=1.0