Description
I tried two approaches to reproduce the reported performance of Keypoint R-CNN with ResNet-50 (box AP = 54.6, keypoint AP = 65.0):
a) evaluate the pretrained Keypoint R-CNN with train.py
b) train Keypoint R-CNN myself with train.py
Neither reproduced the reported numbers. As for a), my guess is that I need to set some parameters besides the pretrained flag.
Could you please help me reproduce the performance, ideally for both a) and b)? More details about my results are given below.
Environment
- 3 GPUs
- Ubuntu 18.04 LTS
- Python 3.6.8
- torch==1.3.1
- torchvision==0.4.2
Details
a) use pretrained Keypoint R-CNN with train.py
command: pipenv run python train.py --data-path ./coco2017/ --dataset coco_kp --model keypointrcnn_resnet50_fpn --test-only --pretrained
log
Not using distributed mode
Namespace(aspect_ratio_group_factor=0, batch_size=2, data_path='./coco2017/', dataset='coco_kp', device='cuda', dist_url='env://', distributed=False, epochs=13, lr=0.02, lr_gamma=0.1, lr_step_size=8, lr_steps=[8, 11], model='keypointrcnn_resnet50_fpn', momentum=0.9, output_dir='.', pretrained=True, print_freq=20, resume='', test_only=True, weight_decay=0.0001, workers=4, world_size=1)
Loading data
loading annotations into memory...
Done (t=6.30s)
creating index...
index created!
loading annotations into memory...
Done (t=0.74s)
creating index...
index created!
Creating data loaders
Using [0, 1.0, inf] as bins for aspect ratio quantization
Count of instances per bin: [12345 35717]
Creating model
Test: [ 0/5000] eta: 0:44:23 model_time: 0.2646 (0.2646) evaluator_time: 0.0069 (0.0069) time: 0.5326 data: 0.2532 max mem: 624
Test: [ 100/5000] eta: 0:07:54 model_time: 0.0764 (0.0810) evaluator_time: 0.0037 (0.0089) time: 0.0880 data: 0.0019 max mem: 712
Test: [ 200/5000] eta: 0:07:25 model_time: 0.0719 (0.0785) evaluator_time: 0.0031 (0.0088) time: 0.0899 data: 0.0018 max mem: 795
Test: [ 300/5000] eta: 0:07:08 model_time: 0.0733 (0.0779) evaluator_time: 0.0040 (0.0082) time: 0.0933 data: 0.0019 max mem: 817
Test: [ 400/5000] eta: 0:06:57 model_time: 0.0720 (0.0780) evaluator_time: 0.0035 (0.0081) time: 0.0824 data: 0.0017 max mem: 820
Test: [ 500/5000] eta: 0:06:43 model_time: 0.0656 (0.0772) evaluator_time: 0.0032 (0.0077) time: 0.0851 data: 0.0019 max mem: 820
Test: [ 600/5000] eta: 0:06:38 model_time: 0.0693 (0.0780) evaluator_time: 0.0033 (0.0082) time: 0.0793 data: 0.0018 max mem: 846
Test: [ 700/5000] eta: 0:06:32 model_time: 0.0678 (0.0783) evaluator_time: 0.0034 (0.0085) time: 0.0820 data: 0.0018 max mem: 853
Test: [ 800/5000] eta: 0:06:21 model_time: 0.0731 (0.0782) evaluator_time: 0.0032 (0.0083) time: 0.0805 data: 0.0017 max mem: 853
Test: [ 900/5000] eta: 0:06:12 model_time: 0.0748 (0.0782) evaluator_time: 0.0029 (0.0084) time: 0.0851 data: 0.0015 max mem: 858
Test: [1000/5000] eta: 0:06:01 model_time: 0.0713 (0.0779) evaluator_time: 0.0030 (0.0082) time: 0.0884 data: 0.0019 max mem: 858
Test: [1100/5000] eta: 0:05:52 model_time: 0.0713 (0.0778) evaluator_time: 0.0040 (0.0082) time: 0.0859 data: 0.0018 max mem: 858
Test: [1200/5000] eta: 0:05:43 model_time: 0.0715 (0.0780) evaluator_time: 0.0031 (0.0082) time: 0.0941 data: 0.0018 max mem: 872
Test: [1300/5000] eta: 0:05:36 model_time: 0.0725 (0.0783) evaluator_time: 0.0033 (0.0085) time: 0.0847 data: 0.0017 max mem: 872
Test: [1400/5000] eta: 0:05:28 model_time: 0.0780 (0.0785) evaluator_time: 0.0042 (0.0086) time: 0.1081 data: 0.0020 max mem: 872
Test: [1500/5000] eta: 0:05:18 model_time: 0.0718 (0.0782) evaluator_time: 0.0033 (0.0085) time: 0.0884 data: 0.0017 max mem: 872
Test: [1600/5000] eta: 0:05:08 model_time: 0.0752 (0.0782) evaluator_time: 0.0047 (0.0084) time: 0.1013 data: 0.0020 max mem: 872
Test: [1700/5000] eta: 0:05:00 model_time: 0.0687 (0.0784) evaluator_time: 0.0032 (0.0085) time: 0.0954 data: 0.0019 max mem: 884
Test: [1800/5000] eta: 0:04:50 model_time: 0.0665 (0.0782) evaluator_time: 0.0028 (0.0084) time: 0.0767 data: 0.0016 max mem: 884
Test: [1900/5000] eta: 0:04:41 model_time: 0.0689 (0.0782) evaluator_time: 0.0027 (0.0085) time: 0.0863 data: 0.0014 max mem: 888
Test: [2000/5000] eta: 0:04:32 model_time: 0.0712 (0.0781) evaluator_time: 0.0032 (0.0084) time: 0.0873 data: 0.0017 max mem: 888
Test: [2100/5000] eta: 0:04:22 model_time: 0.0720 (0.0781) evaluator_time: 0.0028 (0.0084) time: 0.0955 data: 0.0017 max mem: 888
Test: [2200/5000] eta: 0:04:13 model_time: 0.0734 (0.0780) evaluator_time: 0.0039 (0.0083) time: 0.0938 data: 0.0019 max mem: 888
Test: [2300/5000] eta: 0:04:04 model_time: 0.0688 (0.0781) evaluator_time: 0.0027 (0.0083) time: 0.0816 data: 0.0015 max mem: 894
Test: [2400/5000] eta: 0:03:55 model_time: 0.0777 (0.0781) evaluator_time: 0.0032 (0.0083) time: 0.0898 data: 0.0017 max mem: 895
Test: [2500/5000] eta: 0:03:46 model_time: 0.0704 (0.0783) evaluator_time: 0.0034 (0.0084) time: 0.0905 data: 0.0018 max mem: 895
Test: [2600/5000] eta: 0:03:37 model_time: 0.0723 (0.0783) evaluator_time: 0.0030 (0.0083) time: 0.0892 data: 0.0015 max mem: 895
Test: [2700/5000] eta: 0:03:28 model_time: 0.0708 (0.0783) evaluator_time: 0.0029 (0.0084) time: 0.0847 data: 0.0016 max mem: 896
Test: [2800/5000] eta: 0:03:19 model_time: 0.0719 (0.0782) evaluator_time: 0.0032 (0.0083) time: 0.0906 data: 0.0017 max mem: 896
Test: [2900/5000] eta: 0:03:10 model_time: 0.0741 (0.0782) evaluator_time: 0.0037 (0.0083) time: 0.0879 data: 0.0019 max mem: 896
Test: [3000/5000] eta: 0:03:01 model_time: 0.0756 (0.0783) evaluator_time: 0.0042 (0.0083) time: 0.0950 data: 0.0018 max mem: 900
Test: [3100/5000] eta: 0:02:51 model_time: 0.0709 (0.0782) evaluator_time: 0.0029 (0.0082) time: 0.0834 data: 0.0017 max mem: 900
Test: [3200/5000] eta: 0:02:42 model_time: 0.0734 (0.0782) evaluator_time: 0.0035 (0.0082) time: 0.0858 data: 0.0017 max mem: 900
Test: [3300/5000] eta: 0:02:34 model_time: 0.0726 (0.0783) evaluator_time: 0.0029 (0.0083) time: 0.0946 data: 0.0017 max mem: 903
Test: [3400/5000] eta: 0:02:24 model_time: 0.0687 (0.0782) evaluator_time: 0.0032 (0.0082) time: 0.0788 data: 0.0017 max mem: 903
Test: [3500/5000] eta: 0:02:15 model_time: 0.0685 (0.0782) evaluator_time: 0.0030 (0.0082) time: 0.0822 data: 0.0017 max mem: 903
Test: [3600/5000] eta: 0:02:06 model_time: 0.0764 (0.0783) evaluator_time: 0.0029 (0.0082) time: 0.0878 data: 0.0016 max mem: 903
Test: [3700/5000] eta: 0:01:57 model_time: 0.0739 (0.0783) evaluator_time: 0.0043 (0.0082) time: 0.0979 data: 0.0020 max mem: 903
Test: [3800/5000] eta: 0:01:48 model_time: 0.0790 (0.0783) evaluator_time: 0.0047 (0.0083) time: 0.1088 data: 0.0021 max mem: 906
Test: [3900/5000] eta: 0:01:39 model_time: 0.0701 (0.0782) evaluator_time: 0.0029 (0.0082) time: 0.0775 data: 0.0016 max mem: 906
Test: [4000/5000] eta: 0:01:30 model_time: 0.0720 (0.0782) evaluator_time: 0.0035 (0.0081) time: 0.0886 data: 0.0016 max mem: 906
Test: [4100/5000] eta: 0:01:21 model_time: 0.0739 (0.0782) evaluator_time: 0.0037 (0.0082) time: 0.0856 data: 0.0019 max mem: 906
Test: [4200/5000] eta: 0:01:12 model_time: 0.0745 (0.0781) evaluator_time: 0.0032 (0.0081) time: 0.0894 data: 0.0018 max mem: 906
Test: [4300/5000] eta: 0:01:03 model_time: 0.0754 (0.0781) evaluator_time: 0.0039 (0.0081) time: 0.0880 data: 0.0018 max mem: 906
Test: [4400/5000] eta: 0:00:54 model_time: 0.0709 (0.0780) evaluator_time: 0.0032 (0.0081) time: 0.0966 data: 0.0017 max mem: 906
Test: [4500/5000] eta: 0:00:45 model_time: 0.0742 (0.0780) evaluator_time: 0.0033 (0.0081) time: 0.0984 data: 0.0017 max mem: 906
Test: [4600/5000] eta: 0:00:36 model_time: 0.0746 (0.0779) evaluator_time: 0.0034 (0.0080) time: 0.0879 data: 0.0018 max mem: 906
Test: [4700/5000] eta: 0:00:27 model_time: 0.0749 (0.0780) evaluator_time: 0.0036 (0.0080) time: 0.0969 data: 0.0017 max mem: 906
Test: [4800/5000] eta: 0:00:18 model_time: 0.0732 (0.0780) evaluator_time: 0.0037 (0.0080) time: 0.1013 data: 0.0017 max mem: 906
Test: [4900/5000] eta: 0:00:09 model_time: 0.0785 (0.0780) evaluator_time: 0.0056 (0.0080) time: 0.0949 data: 0.0019 max mem: 906
Test: [4999/5000] eta: 0:00:00 model_time: 0.0710 (0.0780) evaluator_time: 0.0031 (0.0080) time: 0.0817 data: 0.0017 max mem: 906
Test: Total time: 0:07:30 (0.0901 s / it)
Averaged stats: model_time: 0.0710 (0.0780) evaluator_time: 0.0031 (0.0080)
Accumulating evaluation results...
DONE (t=1.05s).
Accumulating evaluation results...
DONE (t=0.30s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.502
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.796
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.545
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.341
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.591
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.648
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.176
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.519
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.603
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.460
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.669
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.738
IoU metric: keypoints
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.599
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.834
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.650
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.553
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.675
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.672
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.889
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.721
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.623
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.741
b) train Keypoint R-CNN by myself with train.py
command: pipenv run python -m torch.distributed.launch --nproc_per_node=3 --use_env train.py --data-path ./coco2017/ --dataset coco_kp --model keypointrcnn_resnet50_fpn --world-size 3 --lr 0.0075
The learning rate is set following the suggestion in train.py:
If you use a different number of GPUs, the learning rate should be changed to 0.02/8*$NGPU.
With 3 GPUs this gives 0.02 / 8 * 3 = 0.0075.
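That suggestion is just linear scaling of the default 8-GPU learning rate. A small sketch (the helper name is mine, not from train.py):

```python
# Linear LR scaling: train.py's default lr=0.02 assumes 8 GPUs,
# so it is scaled proportionally to the number of GPUs actually used.
def scaled_lr(num_gpus, base_lr=0.02, base_gpus=8):
    return base_lr / base_gpus * num_gpus

print(scaled_lr(3))  # 3 GPUs, as in the command above
```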
box AP = 50.6 (Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ])
keypoint AP = 61.1 (Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ])
Thank you!