This repo is the official implementation of the paper "FLEX: Full-Body Grasping Without Full-Body Grasps".
Purva Tendulkar · Dídac Surís · Carl Vondrick
FLEX is a generative model that produces full-body avatars grasping 3D objects in a 3D environment. It leverages existing pre-trained prior models for:
- Full-Body Pose - VPoser (trained on the AMASS dataset)
- Right-Hand Grasping - GrabNet (trained on right-handed grasps of the GRAB dataset)
- Pose-Ground Relation - PGPrior (trained on the AMASS dataset)
For more details, please refer to the paper or the project website.
This implementation:
- Can run FLEX on arbitrary objects in arbitrary scenes provided by users.
- Can run FLEX on the test objects of the GRAB dataset (with pre-computed object centering and BPS representation).
This package has been tested for the following:
- PyTorch >= 1.7.1
- Python >= 3.7.0
- MANO
- SMPLX
- chamfer_distance
- bps_torch
- psbody-mesh
- Kaolin
To install the dependencies, follow these steps:
- Clone this repository:
    git clone https://github.com/purvaten/FLEX.git
    cd FLEX
- Install the dependencies by the following commands:
    conda create -n flex python=3.7.11
    conda activate flex
    conda install pytorch==1.10.1 torchvision torchaudio cudatoolkit=11.3 -c pytorch
    conda install pytorch3d -c pytorch3d
    conda install meshplot
    conda install -c conda-forge jupyterlab
    pip install -r requirements.txt
    pip install kaolin==0.12.0 -f https://nvidia-kaolin.s3.us-east-2.amazonaws.com/torch-1.10.1_cu113.html
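Once the environment is set up, a quick import check can confirm that the packages listed above resolve. The snippet below is a minimal sketch and is not part of the FLEX repository. The import names (`torch`, `smplx`, `mano`, `bps_torch`, `psbody.mesh`, `chamfer_distance`, `kaolin`, `pytorch3d`) are assumptions based on how these packages are commonly imported, so adjust them if your installation differs.

```python
import importlib.util

# Assumed import names for the dependencies listed above (not taken from the repo).
ASSUMED_MODULES = [
    "torch", "smplx", "mano", "bps_torch", "psbody.mesh",
    "chamfer_distance", "kaolin", "pytorch3d",
]


def find_missing(modules):
    """Return the module names that cannot be located in the current environment."""
    missing = []
    for name in modules:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised, for example, when the parent package of a dotted name is absent.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


missing = find_missing(ASSUMED_MODULES)
print("Missing modules:", missing if missing else "none")
```

Run it inside the activated `flex` environment. An empty result only shows that the modules can be found, not that their versions are compatible.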
In order to run FLEX, create a data/ directory and follow the steps below:
- Download the Habitat receptacle meshes info from here. This is a dictionary where the keys are the names of the receptacles, and the values are a list of [vertices, faces] for different configurations of that receptacle (e.g., with doors open, doors closed, drawer opened, etc.).
- Download the main dataset from here. This is a dictionary where the keys are the names of the example instances, and the values are [object_translation, object_orientation, recept_idx], where recept_idx is the index to the receptacle configuration in receptacles.npz.
- Store both files under FLEX/data/replicagrasp/.
- To visualize random instances of the dataset, run the notebook FLEX/flex/notebooks/viz_replicagrasp.ipynb.
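The two dictionaries described above can be inspected with a few lines of NumPy. This is a minimal sketch under stated assumptions: it assumes both files are standard `.npz` archives whose values are stored as pickled object arrays (hence `allow_pickle=True`), and that each `dset_info.npz` entry unpacks into the three fields named above. The helper `load_npz_as_dict` is hypothetical and not part of the repository.

```python
from pathlib import Path

import numpy as np


def load_npz_as_dict(path):
    """Load an .npz archive into a plain dict mapping each key to its array."""
    # allow_pickle=True is an assumption: ragged nested lists are usually stored
    # as object arrays. Only enable it for files from a source you trust.
    with np.load(path, allow_pickle=True) as archive:
        return {key: archive[key] for key in archive.files}


data_dir = Path("data/replicagrasp")  # location given in this README
recept_path = data_dir / "receptacles.npz"
dset_path = data_dir / "dset_info.npz"

if recept_path.exists() and dset_path.exists():
    receptacles = load_npz_as_dict(recept_path)
    dset_info = load_npz_as_dict(dset_path)
    print(f"{len(receptacles)} receptacles, {len(dset_info)} example instances")

    # Assumed entry layout: [object_translation, object_orientation, recept_idx]
    name = next(iter(dset_info))
    object_translation, object_orientation, recept_idx = dset_info[name]
    print(name, "uses receptacle configuration", int(recept_idx))
```

If the unpacking fails, print `dset_info[name]` directly to see how the entry is actually stored.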
- Download the SMPL-X and MANO models from the SMPL-X website and MANO website.
- Download the GRAB object mesh (.ply) files and BPS points (bps.npz) from the GRAB website. Download obj_info.npy from here.
- Download full-body related desiderata here.
- The final structure of data should look as below:
FLEX
├── data
│   │
│   ├── smplx_models
│   │   ├── mano
│   │   │   ├── MANO_LEFT.pkl
│   │   │   ├── MANO_RIGHT.pkl
│   │   └── smplx
│   │       ├── SMPLX_FEMALE.npz
│   │       └── ...
│   ├── obj
│   │   ├── obj_info.npy
│   │   ├── bps.npz
│   │   └── contact_meshes
│   │       ├── airplane.ply
│   │       └── ...
│   ├── sbj
│   │   ├── adj_matrix_original.npy
│   │   ├── adj_matrix_simplified.npy
│   │   ├── faces_simplified.npy
│   │   ├── interesting.npz
│   │   ├── MANO_SMPLX_vertex_ids.npy
│   │   ├── sbj_verts_region_mapping.npy
│   │   └── vertices_simplified_correspondences.npy
│   │
│   └── replicagrasp
│       ├── dset_info.npz
│       └── receptacles.npz
.
.
- Download the VPoser prior (VPoser v2.0) from the SMPL-X website.
- Download the checkpoints of the hand-grasping pre-trained model (coarsenet.pt and refinenet.pt) from the GRAB website.
- Download the pose-ground prior from here.
- Place all pre-trained models in FLEX/flex/pretrained_models/ckpts as follows:
ckpts
├── vposer_amass
│   │
│   ├── snapshots
│   │   └── V02_05_epoch=13_val_loss=0.03
│   ├── V02_05.log
│   └── V02_05.yaml
│
├── coarsenet.pt
├── refinenet.pt
└── pgp.pth
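Before running anything, it can help to confirm that the downloads ended up where the two directory trees in this README expect them. The sketch below is not part of the repository. It checks only paths that appear explicitly in those trees, relative to the FLEX root, and the helper name `missing_paths` is hypothetical. Entries shown as `...` in the trees are not checked, and the VPoser snapshot is verified only at the directory level.

```python
from pathlib import Path

# Paths copied from the two directory trees in this README, relative to the FLEX root.
EXPECTED = [
    "data/smplx_models/mano/MANO_LEFT.pkl",
    "data/smplx_models/mano/MANO_RIGHT.pkl",
    "data/smplx_models/smplx/SMPLX_FEMALE.npz",
    "data/obj/obj_info.npy",
    "data/obj/bps.npz",
    "data/obj/contact_meshes",
    "data/sbj/adj_matrix_original.npy",
    "data/sbj/adj_matrix_simplified.npy",
    "data/sbj/faces_simplified.npy",
    "data/sbj/interesting.npz",
    "data/sbj/MANO_SMPLX_vertex_ids.npy",
    "data/sbj/sbj_verts_region_mapping.npy",
    "data/sbj/vertices_simplified_correspondences.npy",
    "data/replicagrasp/dset_info.npz",
    "data/replicagrasp/receptacles.npz",
    "flex/pretrained_models/ckpts/vposer_amass/snapshots",
    "flex/pretrained_models/ckpts/vposer_amass/V02_05.yaml",
    "flex/pretrained_models/ckpts/coarsenet.pt",
    "flex/pretrained_models/ckpts/refinenet.pt",
    "flex/pretrained_models/ckpts/pgp.pth",
]


def missing_paths(root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


# Run from the FLEX root; prints nothing when every listed path is present.
for rel in missing_paths("."):
    print("missing:", rel)
```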
After installing the FLEX package, dependencies, and downloading the data and the models, you should be able to run the following examples:
    python run.py \
        --obj_name stapler \
        --receptacle_name receptacle_aabb_TvStnd1_Top3_frl_apartment_tvstand \
        --ornt_name all \
        --gender 'female'
The result will be saved in FLEX/save. The optimization for a single example should take 7-8 minutes on a single RTX 2080 Ti.
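To run several objects in sequence, the command above can be assembled programmatically. The following is a hedged sketch, not part of the repository: it uses only the four flags shown in the example, and the helpers `build_command` and `run_batch` are hypothetical. Whether an object name other than `stapler` is accepted by `run.py` is an assumption you would need to verify.

```python
import subprocess
import sys


def build_command(obj_name, receptacle_name, ornt_name="all", gender="female"):
    """Build the run.py invocation using only the flags shown in this README."""
    return [
        sys.executable, "run.py",
        "--obj_name", obj_name,
        "--receptacle_name", receptacle_name,
        "--ornt_name", ornt_name,
        "--gender", gender,
    ]


def run_batch(obj_names, receptacle_name, dry_run=True):
    """Print each command and execute it only when dry_run is False."""
    commands = [build_command(name, receptacle_name) for name in obj_names]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the batch at the first failing run.
            subprocess.run(cmd, check=True)
    return commands


# Dry run: prints the command for the example above without launching it.
run_batch(["stapler"], "receptacle_aabb_TvStnd1_Top3_frl_apartment_tvstand")
```

Call `run_batch(..., dry_run=False)` from the directory containing `run.py` to launch the runs. Given the timing quoted above, each example adds several minutes.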
@inproceedings{tendulkar2022flex,
title = {FLEX: Full-Body Grasping Without Full-Body Grasps},
author = {Tendulkar, Purva and Sur\'is, D\'idac and Vondrick, Carl},
booktitle = {Conference on Computer Vision and Pattern Recognition ({CVPR})},
year = {2023},
url = {https://flex.cs.columbia.edu/}
}
This research is based on work partially supported by NSF NRI Award #2132519, and the DARPA MCS program under Federal Agreement No. N660011924032. Dídac Surís is supported by the Microsoft PhD fellowship. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the sponsors.
We thank: Alexander Clegg for helping with Habitat-related questions and Harsh Agrawal for helpful discussions and feedback.
This template was adapted from the GitHub repository of GOAL.
The code of this repository was implemented by Purva Tendulkar and Dídac Surís.
For questions, please contact [email protected].