The original source repository for the `pytorch_i3d` model and the pre-trained weights under `models/` can be found here. The original LICENSE has been included here.
All four variants (for both RGB and optical flow inputs) can be trained and evaluated using the following command:

```bash
python main.py \
  --train_root=/path/to/training_set \
  --train_labels=/path/to/training_labels \
  --validation_root=/path/to/val_set \
  --validation_labels=/path/to/val_labels \
  --test_root=/path/to/test_set \
  --test_labels=/path/to/test_labels \
  --model_path=models/<video_type>_imagenet.pt \
  --classifier_type=<classifier_type> \
  --video_type=<video_type> \
  --num_classes=4 \
  --batch_size=8 \
  --default_root_dir=logs/ \
  --learning_rate=0.001 \
  --max_epochs=40 \
  --log_every_n_steps=1 \
  --gpus=1
```
Select one of the following:
- `classifier_type=video`
- `classifier_type=multimodal` (all three modalities)
- `classifier_type=FT_only`
- `classifier_type=gripper_only`
- `classifier_type=FTgripper_only`
- `classifier_type=video_FT`
- `classifier_type=video_gripper`
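For example, to train the full multimodal classifier on RGB inputs, the placeholders in the command above would be filled in as follows (the remaining arguments stay the same):

```bash
--model_path=models/rgb_imagenet.pt
--classifier_type=multimodal
--video_type=rgb
```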
For late fusion in each of the following variants, sum the logits from the individual networks at test time to get the final result (a minimal sketch of this summation appears after the variant lists below).
Separately train the following networks:
- I3D RGB (`classifier_type=video`, `model_path=models/rgb_imagenet.pt`, `video_type=rgb`)
- I3D Flow (`classifier_type=video`, `model_path=models/flow_imagenet.pt`, `video_type=flow`)
- FT (`classifier_type=FT_only`)
- Gripper (`classifier_type=gripper_only`)
Separately train the following networks:
- I3D RGB+FT+Gripper (`classifier_type=multimodal`, `model_path=models/rgb_imagenet.pt`, `video_type=rgb`)
- I3D Flow (`classifier_type=video`, `model_path=models/flow_imagenet.pt`, `video_type=flow`)
Separately train the following networks:
- I3D RGB (`classifier_type=video`, `model_path=models/rgb_imagenet.pt`, `video_type=rgb`)
- I3D Flow+FT+Gripper (`classifier_type=multimodal`, `model_path=models/flow_imagenet.pt`, `video_type=flow`)
Separately train the following networks for the full versions:
- I3D RGB+FT+Gripper (`classifier_type=multimodal`, `model_path=models/rgb_imagenet.pt`, `video_type=rgb`)
- I3D Flow+FT+Gripper (`classifier_type=multimodal`, `model_path=models/flow_imagenet.pt`, `video_type=flow`)
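A minimal sketch of the late-fusion step described above, assuming the per-class logits of each separately trained network have already been saved as NumPy arrays for a given test clip (the file names and loading code are illustrative, not part of this repository):

```python
import numpy as np

# Per-clip logits from two separately trained networks (shape: [num_classes]).
# How these are obtained depends on your evaluation script; the file names are placeholders.
logits_rgb = np.load("logits_rgb.npy")    # e.g. the I3D RGB(+FT+Gripper) network
logits_flow = np.load("logits_flow.npy")  # e.g. the I3D Flow(+FT+Gripper) network

# Late fusion: sum the logits and take the argmax as the final prediction.
# For variants with more networks (e.g. FT and Gripper), add their logits the same way.
fused_logits = logits_rgb + logits_flow
predicted_class = int(np.argmax(fused_logits))
print(predicted_class)
```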
The paper also reports results for `classifier_type=video_FT` and `classifier_type=video_gripper` (rows 9 and 10 in Table III).
To limit training to one of the robot types, use the `robot_type_trainval` argument as follows:
- HSR only: `robot_type_trainval="Toyota HSR"`
- Kinova only: `robot_type_trainval="Kinova Gen3"`

To evaluate only on one of the robot types, use:
- HSR only: `robot_type_test="Toyota HSR"`
- Kinova only: `robot_type_test="Kinova Gen3"`
To train and evaluate each handover task separately, use:
- Human to Robot Handover only: `task_type_trainval="human to robot handover"` and `task_type_test="human to robot handover"`
- Robot to Human Handover only: `task_type_trainval="robot to human handover"` and `task_type_test="robot to human handover"`
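For example, to train and evaluate only on the Toyota HSR for human-to-robot handovers, these filters would be appended to the training command, assuming they are passed as additional `--` options to `main.py` like the other arguments above:

```bash
--robot_type_trainval="Toyota HSR" \
--robot_type_test="Toyota HSR" \
--task_type_trainval="human to robot handover" \
--task_type_test="human to robot handover"
```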
The dataset already contains the I3D features for both RGB and optical flow. If you want to extract them again, use the following commands; all of them use the I3D model pre-trained on the Kinetics dataset.

Extract features for RGB frames:

```bash
python get_i3d_features.py --root /path/to/training_set --model kinetics
```

Extract features for augmented RGB frames:

```bash
python get_i3d_features.py --root /path/to/training_set --model kinetics --augmented
```

Extract features for optical flow frames:

```bash
python get_i3d_features_optical_flow.py --root /path/to/training_set --model kinetics
```
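If the RGB and optical-flow features need to be regenerated for every split, a loop along these lines could be used (the split paths are placeholders matching the ones in the training command above; add `--augmented` runs as needed):

```bash
for split in /path/to/training_set /path/to/val_set /path/to/test_set; do
  python get_i3d_features.py --root "$split" --model kinetics
  python get_i3d_features_optical_flow.py --root "$split" --model kinetics
done
```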