3D Human Pose Estimation (HPE) is the task of locating key points of the human body in 3D space from 2D or 3D representations such as RGB images, depth maps, or point clouds. Current HPE methods from depth and point clouds predominantly rely on single-frame estimation and do not exploit temporal information from sequences. This paper presents SPiKE, a novel approach to 3D HPE using point cloud sequences. Unlike existing methods that process frames of a sequence independently, SPiKE leverages temporal context by adopting a Transformer architecture to encode spatio-temporal relationships between points across the sequence. By partitioning the point cloud into local volumes and using spatial feature extraction via point spatial convolution, SPiKE ensures efficient processing by the Transformer while preserving spatial integrity per timestamp. Experiments on the ITOP benchmark for 3D HPE show that SPiKE reaches 89.19% mAP, achieving state-of-the-art performance with significantly lower inference times. Extensive ablations further validate the effectiveness of sequence exploitation and our algorithmic choices.
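As an illustration of the local-volume partitioning mentioned above, here is a minimal NumPy sketch that groups one frame's points into voxel cells. The function name, `voxel_size`, and the dictionary layout are our own illustrative choices, not SPiKE's actual implementation (which also applies point spatial convolution within each volume):

```python
import numpy as np

def partition_into_volumes(points, voxel_size=0.5):
    # Group the points of a single frame into local volumes (voxel cells),
    # keyed by their integer cell coordinates. Illustrative only.
    keys = np.floor(points / voxel_size).astype(int)
    volumes = {}
    for key, p in zip(map(tuple, keys), points):
        volumes.setdefault(key, []).append(p)
    return {k: np.stack(v) for k, v in volumes.items()}

# Three points: two fall in the same 0.5 m cell, one in another.
pts = np.array([[0.1, 0.1, 0.1],
                [0.2, 0.1, 0.1],
                [0.9, 0.9, 0.9]])
vols = partition_into_volumes(pts)
```

Each volume can then be encoded independently per timestamp before the Transformer relates them across the sequence.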
The code has been tested with the following environment:
- Python: 3.8.6
- g++: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
- PyTorch: 1.8.1+cu111
Ensure these tools are available in your environment before proceeding.
- Clone the repository:

  ```shell
  git clone https://github.com/iballester/spike
  cd spike
  ```
- Install the required Python packages:

  ```shell
  pip install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html
  ```
- Compile the CUDA layers required for PointNet++:

  ```shell
  cd modules
  python setup.py install
  ```
- Download the ITOP SIDE dataset (point clouds and labels) from the ITOP Dataset page on Zenodo and unzip the contents.
- Isolate the points corresponding to the human body in the point clouds and save the results as `.npz` files. You can use the provided script `utils/preprocess_itop.py` as an example. This script takes the original `.h5` files, removes the background by clustering and depth thresholding (see the paper for more details), and saves the results as point cloud sequences in `.npz` format. To run this script, make sure you have the `open3d` library installed.
- Update the `ITOP_SIDE_PATH` variable in `const/path` to point to your dataset location. Structure your dataset directory as follows:

  ```
  dataset_directory/
  ├── test/            # Folder containing .npz files for testing
  ├── train/           # Folder containing .npz files for training
  ├── test_labels.h5   # Labels for the test set
  └── train_labels.h5  # Labels for the training set
  ```
To train the model, check that `config.yaml` has the correct parameters and run:

```shell
python train_itop.py --config experiments/ITOP-SIDE/1/config.yaml
```
For predictions, update the path pointing to the model weights, check that `config.yaml` has the correct parameters and run:

```shell
python predict_itop.py --config experiments/ITOP-SIDE/1/config.yaml --model experiments/ITOP-SIDE/1/log/model.pth
```
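For reference, the mAP figure reported on ITOP is typically the fraction of predicted joints whose 3D position lies within 10 cm of the ground truth. A minimal sketch of that metric (function name and array shapes are our own illustrative choices):

```python
import numpy as np

def map_at_10cm(pred, gt, thresh=0.10):
    # pred, gt: (n_frames, n_joints, 3) arrays in metres.
    # A joint counts as correct if its Euclidean error is below `thresh`.
    dist = np.linalg.norm(pred - gt, axis=-1)
    return (dist < thresh).mean()

gt = np.zeros((4, 15, 3))
pred = gt.copy()
pred[..., 0] += 0.05           # 5 cm error on every joint
print(map_at_10cm(pred, gt))   # 1.0: all joints within 10 cm
```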
For video samples showcasing pose predictions on the testing set, please visit:
If you find our work useful, please cite us:
```
@inproceedings{ballester2024spike,
  title={SPiKE: 3D Human Pose from Point Cloud Sequences},
  author={Ballester, Irene and Peterka, Ond{\v{r}}ej and Kampel, Martin},
  booktitle={Pattern Recognition},
  year={2024}
}
```
A big thanks to the following open-source projects for their contributions:
Their work greatly facilitated the development of this project.
This project is licensed under the MIT License. See the LICENSE file for details.