
ULIP-2: Towards Scalable Multimodal Pre-training For 3D Understanding (CVPR2024)

ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding (CVPR2023)

Official implementation of ULIP-2: Towards Scalable Multimodal Pre-training For 3D Understanding

Official implementation of ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding

Project Website

News

[06/17/2024] The CVPR 2024 (latest) version of the ULIP-2 ensembled pre-trained model, trained with 10k xyzrgb points, has been uploaded here.

[02/26/2024] "ULIP-2: Towards Scalable Multimodal Pre-training For 3D Understanding" has been accepted to CVPR 2024!

[06/09/2023] "PointBERT ULIP-2 pretrained model released, please find it in the here".

[06/09/2023] A smaller version of "ULIP - ShapeNet Triplets" has been released here; it is around 420GB. Check the image folder "only_rgb_depth_images": you can download this subset of rendered images, which are the exact images used by ULIP, instead of downloading the full "rendered_images" folder (more than 1TB).

[05/22/2023] "ULIP - Objaverse Triplets" and "ULIP - ShapeNet Triplets" have been uploaded here.

[05/14/2023] ULIP-2 has been released!

[02/28/2023] ULIP has been accepted by CVPR 2023! 🔥🔥🔥

Animation

Pipeline Animation

What is ULIP

ULIP is a model-agnostic multimodal pre-training framework that leverages information from other modalities (images, language) to improve a model's ability to understand 3D data without introducing any extra latency.
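As a rough illustration of the idea (this is not the exact training code in this repository), the 3D encoder's features can be aligned with frozen image and text features using a CLIP-style contrastive objective; all names below are illustrative:

import torch
import torch.nn.functional as F

def clip_style_alignment_loss(pc_feat, img_feat, text_feat, temperature=0.07):
    # Illustrative tri-modal alignment loss, not the repo's exact implementation.
    # pc_feat, img_feat, text_feat: (B, D) features from the 3D, image,
    # and text encoders for the same batch of triplets.
    pc_feat = F.normalize(pc_feat, dim=-1)
    img_feat = F.normalize(img_feat, dim=-1)
    text_feat = F.normalize(text_feat, dim=-1)
    labels = torch.arange(pc_feat.size(0), device=pc_feat.device)

    def contrastive(a, b):
        logits = a @ b.t() / temperature
        # Symmetric InfoNCE: the i-th 3D feature should match the i-th image/text feature.
        return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2

    # Align 3D features with both the image and the text features.
    return contrastive(pc_feat, img_feat) + contrastive(pc_feat, text_feat)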

Pipeline

Overall Pipeline

Instructions

ULIP is a highly extensible multimodal pre-training framework, and it is model-architecture agnostic: you can easily plug in any 3D backbone model and pre-train it using our framework to get a jump-start for various downstream tasks!

[Install environments]

We pre-train ULIP on 8 Nvidia A100 GPUs; the code is tested with CUDA==11.0 and pytorch==1.10.1.
conda create -n ulip python=3.7.15
conda activate ulip
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install -r requirements.txt
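As an optional sanity check (not part of the official instructions), you can verify that PyTorch sees your GPUs before pre-training:

python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.device_count())"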

[optional]
If you want to pre-train PointNeXt, a modified PointNeXt codebase is embedded in ./models/pointnext; please run the following to install it:

cd ./models/pointnext/PointNeXt
bash update.sh
bash install.sh

[Download datasets and initialize models, put them in the right paths.]

Download the datasets and the initialization models from here. For now, you ONLY need to download "initialize_models", "modelnet40_normal_resampled", and "shapenet-55". You might need a Gmail account to access it.
After you download the datasets and initialization models, you can choose one of the following options:
(1) Put them in, or soft-link them into, the data folder. By default, the data folder should have the following structure:

./data
|-- ModelNet40.yaml
|-- ShapeNet-55.yaml
|-- dataset_3d.py
|-- dataset_catalog.json
|-- initialize_models
|-- labels.json
|-- modelnet40_normal_resampled
|-- shapenet-55
|-- templates.json

(2) Change the paths accordingly (only needed if you don't want to put/link the downloaded files in the data folder):

# Change "DATA_PATH", "PC_PATH", and "IMAGE_PATH" in:
./data/ShapeNet-55.yaml
# Change "DATA_PATH" in:
./data/ModelNet40.yaml
# Change the initialize_models path in:
./models/ULIP_models.py
# by modifying this line:
pretrain_slip_model = torch.load('./data/initialize_models/slip_base_100ep.pt', map_location=torch.device('cpu'))

[Pre-train 3D backbones]

Our framework is model-architecture agnostic; currently four 3D backbones are supported:
Pointnet2(ssg)
PointBERT
PointMLP
PointNeXt

Please change the script to accommodate your system; by default, the script pre-trains on 8 GPUs. You can also modify the desired output folder in the script.

# the scripts are named after their corresponding 3D backbone.
bash ./scripts/(choose your pre-train script)

[Test pre-trained models for zero-shot classification on ModelNet40]

You may also change the output path in the test scripts.

bash ./scripts/(choose your test script) /path/to/your/checkpoint.pt

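Conceptually, the zero-shot test encodes the ModelNet40 category names with the text encoder and classifies each point cloud by cosine similarity. The sketch below is only illustrative; the encode_text/encode_pc method names, the prompt template, and the shapes are assumptions rather than the repo's exact API:

import torch
import torch.nn.functional as F

@torch.no_grad()
def zero_shot_classify(model, point_cloud, category_names):
    # Illustrative only; method names and prompt template are assumptions.
    # point_cloud: (B, N, 3) tensor; category_names: list of class strings.
    prompts = [f"a point cloud model of a {c}." for c in category_names]
    text_feat = F.normalize(model.encode_text(prompts), dim=-1)  # (C, D)
    pc_feat = F.normalize(model.encode_pc(point_cloud), dim=-1)  # (B, D)
    logits = pc_feat @ text_feat.t()                             # (B, C)
    return logits.argmax(dim=-1)  # predicted category index per shape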

[Pre-train & Test using different number of points]

Change the npoints argument in the scripts; by default it is 8192.
Note: Currently we use FPS to subsample the 8192 points, which might slow down training. If you'd like, you can cache or save the pre-processed datasets with different numbers of points to speed up your pre-training.
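If you decide to cache pre-subsampled point clouds, a minimal sketch could look like the following; the .npy file layout and folder names are assumptions, not the repo's actual dataset format:

import numpy as np
from pathlib import Path

def cache_subsampled(src_dir, dst_dir, npoints=8192):
    # Save subsampled copies once so pre-training can skip on-the-fly FPS.
    # Assumes each source file is an (N, 3) or (N, 6) .npy point cloud;
    # adapt to your dataset's actual storage format and subsampling method.
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for f in Path(src_dir).glob("*.npy"):
        pts = np.load(f)
        # Simple random subsampling shown here; substitute FPS if you prefer.
        idx = np.random.choice(len(pts), npoints, replace=len(pts) < npoints)
        np.save(dst / f.name, pts[idx])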

[Pre-train your customized 3D backbones]

There are only two things you need to change to pre-train your own customized 3D backbone:
(1) Define your own 3D backbone in the ./models folder.
We put a template, "customized_backbone", here; you can refer to the comments to see the expected input and output shapes. You can also refer to how pointnet2 is defined here.
(2) Use or modify the "ULIP_CUSTOMIZED" class in ./models/ULIP_models.py.
Please refer to the comments in the "ULIP_CUSTOMIZED" class; it should be straightforward to follow. Please be sure to change "pc_feat_dims" accordingly, since we are agnostic to the point cloud output feature dimensions of your customized 3D backbone.
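For illustration only, a minimal backbone skeleton is sketched below, assuming a (B, N, 3) point-cloud input and a (B, pc_feat_dims) global feature output; the actual expected interface is documented in the "customized_backbone" template, so treat the shapes here as assumptions:

import torch.nn as nn

class CustomizedBackbone(nn.Module):
    # Illustrative skeleton only; follow ./models/customized_backbone for the
    # exact expected input/output shapes.
    def __init__(self, pc_feat_dims=512):
        super().__init__()
        # Toy per-point MLP + max pooling; replace with your real architecture.
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 128), nn.ReLU(),
            nn.Linear(128, pc_feat_dims),
        )

    def forward(self, pc):
        # Assumed input: (B, N, 3) point clouds.
        feat = self.point_mlp(pc)      # (B, N, pc_feat_dims)
        return feat.max(dim=1).values  # (B, pc_feat_dims) global feature

Whatever feature dimension your backbone outputs, make sure "pc_feat_dims" in the "ULIP_CUSTOMIZED" class matches it.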

Pre-trained models for zero-shot classification

ULIP-1 models, zero-shot classification on ModelNet40 (8k points pre-train, 8k points test, best checkpoint):

| model | top1 | top5 |
| --- | --- | --- |
| Pointnet2(ssg) | 57.7 | 78.9 |
| PointMLP | 60.0 | 79.4 |
| PointBERT | 60.3 | 84.0 |
| PointNeXt | 56.2 | 77.0 |

ULIP-2 models, pre-trained with 10k xyzrgb point clouds on the ensembled ULIP-Objaverse + ULIP-ShapeNet; check the Google Drive.

Note that the ULIP-2 models are improved in the CVPR 2024 version compared to the initial arXiv release; please refer to the new version to match the CVPR 2024 numbers.

| model | Objaverse top1 | Objaverse top5 | ModelNet40 top1 | ModelNet40 top5 |
| --- | --- | --- | --- | --- |
| openshape-pointbert-10kxyzrgb-ensembled-objavserse-shapenet-abo-3d_future | 46.8 | 77.0 | 84.4 | 98.0 |
| ULIP2-PointBERT-10kxyzrgb-ensembled-objaverse-shapenet | 50.6 | 79.1 | 84.7 | 97.1 |

License and term of use for the released pre-train datasets

The code is under https://github.com/salesforce/ULIP/blob/main/LICENSE.txt.

The released "ULIP - Objaverse Triplets" is under https://opendatacommons.org/licenses/by/1-0/, consistent with Objaverse's license.

The released "ULIP - ShapeNet Triplets" is under the terms of use from https://shapenet.org/terms, consistent with ShapeNet's terms of use.

Citation

@inproceedings{xue2023ulip,
  title={Ulip: Learning a unified representation of language, images, and point clouds for 3d understanding},
  author={Xue, Le and Gao, Mingfei and Xing, Chen and Mart{\'\i}n-Mart{\'\i}n, Roberto and Wu, Jiajun and Xiong, Caiming and Xu, Ran and Niebles, Juan Carlos and Savarese, Silvio},
  booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
  pages={1179--1189},
  year={2023}
}
@inproceedings{xue2024ulip,
  title={Ulip-2: Towards scalable multimodal pre-training for 3d understanding},
  author={Xue, Le and Yu, Ning and Zhang, Shu and Panagopoulou, Artemis and Li, Junnan and Mart{\'\i}n-Mart{\'\i}n, Roberto and Wu, Jiajun and Xiong, Caiming and Xu, Ran and Niebles, Juan Carlos and others},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={27091--27101},
  year={2024}
}


Contact

If you have any questions about this project, please contact lxue@salesforce.com.
