
Train error in voc-style dataset #178

Closed
kknoob opened this issue Jul 3, 2019 · 4 comments
kknoob commented Jul 3, 2019

I have run demo.py successfully.

But when I try to run main.py on my own dataset, I get the errors below. Any help is appreciated, thanks.

The command I used is:

```
python main.py ctdet --not_prefetch_test --dataset detection --exp_id det1 --gpus 0 --num_workers 0
```

At first it worked:

```
Fix size testing.
training chunk_sizes: [32]
The output will be saved to F:\kuyanfeng\CenterNet\src\lib\..\..\exp\ctdet\det1
heads {'hm': 5, 'wh': 2, 'reg': 2}
Namespace(K=100, aggr_weight=0.0, agnostic_ex=False, arch='dla_34', aug_ddd=0.5, aug_rot=0, batch_size=32, cat_spec_wh=False, center_thresh=0.1, chunk_sizes=[32], data_dir='F:\\kuyanfeng\\CenterNet\\src\\lib\\..\\..\\data', dataset='detection', debug=0, debug_dir='F:\\kuyanfeng\\CenterNet\\src\\lib\\..\\..\\exp\\ctdet\\det1\\debug', debugger_theme='white', demo='', dense_hp=False, dense_wh=False, dep_weight=1, dim_weight=1, down_ratio=4, eval_oracle_dep=False, eval_oracle_hm=False, eval_oracle_hmhp=False, eval_oracle_hp_offset=False, eval_oracle_kps=False, eval_oracle_offset=False, eval_oracle_wh=False, exp_dir='F:\\kuyanfeng\\CenterNet\\src\\lib\\..\\..\\exp\\ctdet', exp_id='det1', fix_res=True, flip=0.5, flip_test=False, gpus=[0], gpus_str='0', head_conv=256, heads={'hm': 5, 'wh': 2, 'reg': 2}, hide_data_time=False, hm_hp=True, hm_hp_weight=1, hm_weight=1, hp_weight=1, input_h=384, input_res=384, input_w=384, keep_res=False, kitti_split='3dop', load_model='', lr=0.000125, lr_step=[90, 120], master_batch_size=32, mean=array([[[0.485, 0.456, 0.406]]], dtype=float32), metric='loss', mse_loss=False, nms=False, no_color_aug=False, norm_wh=False, not_cuda_benchmark=False, not_hm_hp=False, not_prefetch_test=True, not_rand_crop=False, not_reg_bbox=False, not_reg_hp_offset=False, not_reg_offset=False, num_classes=5, num_epochs=140, num_iters=-1, num_stacks=1, num_workers=0, off_weight=1, output_h=96, output_res=96, output_w=96, pad=31, peak_thresh=0.2, print_iter=0, rect_mask=False, reg_bbox=True, reg_hp_offset=True, reg_loss='l1', reg_offset=True, resume=False, root_dir='F:\\kuyanfeng\\CenterNet\\src\\lib\\..\\..', rot_weight=1, rotate=0, save_all=False, save_dir='F:\\kuyanfeng\\CenterNet\\src\\lib\\..\\..\\exp\\ctdet\\det1', scale=0.4, scores_thresh=0.1, seed=317, shift=0.1, std=array([[[0.229, 0.224, 0.225]]], dtype=float32), task='ctdet', test=False, test_scales=[1.0], trainval=False, val_intervals=5, vis_thresh=0.3, wh_weight=0.1)
'cp' is not recognized as an internal or external command, operable program or batch file.
Creating model...
Setting up data...
==> initializing pascal test2007 data.
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Loaded val 357 samples
==> initializing pascal trainval0712 data.
loading annotations into memory...
Done (t=0.03s)
creating index...
index created!
Loaded train 1790 samples
Starting training...
E:\Anaconda\envs\pytorch\lib\site-packages\torch\nn\_reduction.py:46: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
  warnings.warn(warning.format(ret))
ctdet/det1 | | train: [1][0/55]|Tot: 0:00:10 |ETA: 0:00:00 |loss 36.6218 |hm_loss 35.2704
ctdet/det1 |# | train: [1][1/55]|Tot: 0:00:12 |ETA: 0:09:10 |loss 40.8138 |hm_loss 39.1497
ctdet/det1 |###################### | train: [1][37/55]|Tot: 0:01:22 |ETA: 0:00:34 |loss 14.5188 |hm_loss 13.4028 |wh_loss 8.2473 |off_loss 0.2913
ctdet/det1 |###################### | train: [1][38/55]|Tot: 0:01:24 |ETA: 0:00:32 |loss 14.2986 |hm_loss 13.1903 |wh_loss 8.1855
ctdet/det1 |####################### | train: [1][39/55]|Tot: 0:01:26 |ETA: 0:00:30 |loss 14.1014 |hm_loss 12.9820
ctdet/det1 |####################### | train: [1][40/55]|Tot: 0:01:28 |ETA: 0:00:29 |loss 13.8740
ctdet/det1 |######################## | train: [1][41/55]|Tot: 0:01:30 |ETA: 0:00:27 |loss 13.6514
ctdet/det1 |######################### | train: [1][42/55]|Tot: 0:01:32 |ETA: 0:00:25 |loss 13.4658
ctdet/det1 |######################### | train: [1][43/55]|Tot: 0:01:34 |ETA: 0:00:23 |loss 13.2914
```

(The 'cp' line above is Windows' cmd error, translated from Chinese; overlapping progress-bar fragments from carriage-return rewrites have been split onto separate lines, with unrecoverable tails dropped.)

But after a few minutes, the error below occurred:

```
ctdet/det1 |############################### | train: [5][53/55]|Tot: 0:01:39 |ETA: 0:00:04 |loss 2.4441 |hm_loss 1.6735
ctdet/det1 |################################| train: [5][54/55]|Tot: 0:01:41 |ETA: 0:00:02 |loss 2.4373 |hm_loss 1.6711 |wh_loss 5.5025 |off_loss 0.2159 |Data 0.726s(0.742s) |Net 1.848s
Traceback (most recent call last):
  File "main.py", line 102, in <module>
    main(opt)
  File "main.py", line 79, in main
    log_dict_val, preds = trainer.val(epoch, val_loader)
  File "F:\kuyanfeng\CenterNet\src\lib\trains\base_trainer.py", line 116, in val
    return self.run_epoch('val', epoch, data_loader)
  File "F:\kuyanfeng\CenterNet\src\lib\trains\base_trainer.py", line 61, in run_epoch
    for iter_id, batch in enumerate(data_loader):
  File "E:\Anaconda\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 193, in __iter__
    return _DataLoaderIter(self)
  File "E:\Anaconda\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 469, in __init__
    w.start()
  File "E:\Anaconda\envs\pytorch\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "E:\Anaconda\envs\pytorch\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "E:\Anaconda\envs\pytorch\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "E:\Anaconda\envs\pytorch\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "E:\Anaconda\envs\pytorch\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'get_dataset.<locals>.Dataset'

(pytorch) F:\kuyanfeng\CenterNet\src>Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "E:\Anaconda\envs\pytorch\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "E:\Anaconda\envs\pytorch\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
```
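For anyone who lands on the same trace: the key line is `Can't pickle local object 'get_dataset.<locals>.Dataset'`. On Windows, `DataLoader` workers are started with the `spawn` method, which pickles the dataset object and sends it to the child process, and pickle cannot resolve a class whose qualified name contains `<locals>` (i.e. a class defined inside a function). A minimal sketch of the difference, with illustrative names only (not CenterNet's actual code):

```python
import pickle

def get_dataset():
    # Mirrors the failure: a class defined inside a function has a
    # qualified name like 'get_dataset.<locals>.Dataset', which pickle
    # cannot look up when sending objects to spawned worker processes.
    class Dataset:
        pass
    return Dataset

class ModuleLevelDataset:
    # Defined at module scope: pickle can find it by name, so it works
    # with Windows' 'spawn' start method.
    pass

def picklable(obj):
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

print(picklable(get_dataset()))       # the local class fails to pickle
print(picklable(ModuleLevelDataset))  # the module-level class pickles fine
```

This is why the error only shows up when the validation loader tries to start workers, and why it does not happen on Linux, where the default `fork` start method inherits the dataset without pickling it.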

@Kenneth-X commented:

Hi, I am still trapped in demo.py, and I got the following problem:

```
$ python demo.py ctdet --demo ../images/17790319373_bd19b24cfc_k.jpg --load_model ../models/ctdet_coco_dla_2x.pth --debug 2
Fix size testing.
training chunk_sizes: [1]
The output will be saved to /home/yuxi.xt/experinment/CenterNet/src/lib/../../exp/ctdet/default
heads {'hm': 80, 'wh': 2, 'reg': 2}
Creating model...
loaded ../models/ctdet_coco_dla_2x.pth, epoch 230
Segmentation fault
```

Could you tell me your environment details? torch version, gcc, and CUDA version?


kknoob commented Jul 3, 2019

> Could you tell me your environment details? torch version, gcc, and CUDA version?

You can find them in https://github.com/xingyizhou/CenterNet/blob/master/readme/INSTALL.md
Some other details are in https://github.com/xingyizhou/CenterNet/tree/master/readme


coderKyf commented Jul 3, 2019

I found the problem: I had to set `num_workers` to zero. When I modify the val loader in main.py to

```python
val_loader = torch.utils.data.DataLoader(
    Dataset(opt, 'val'), batch_size=1, shuffle=False,
    num_workers=0, pin_memory=True)
```

it works.
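As a side note (my own sketch with hypothetical class names, not the repo's actual code): `num_workers=0` works because the dataset is then never pickled at all. Judging from the error message, `get_dataset()` builds the `Dataset` class inside the function body; the other common fix is to hoist that combined class to module scope so it *can* be pickled, which re-enables `num_workers > 0` on Windows:

```python
import pickle

# Hypothetical module-level rewrite of a factory that combines a
# dataset with a task-specific sampler. Because CTDetPascal is defined
# at module scope, pickle can resolve it by its qualified name, so
# instances survive the round trip that spawn-based workers require.
class PascalVOC:
    def __init__(self, opt, split):
        self.opt = opt
        self.split = split

class CTDetSampler:
    def __getitem__(self, idx):
        return {"input": idx}  # stand-in for real sample loading

class CTDetPascal(CTDetSampler, PascalVOC):
    """Module-level combination: picklable, unlike a <locals> class."""

ds = CTDetPascal(opt=None, split='val')
clone = pickle.loads(pickle.dumps(ds))  # what spawn workers effectively do
print(clone.split)
```

With the class at module scope, the same `DataLoader` call no longer needs `num_workers=0`, though 0 remains the simplest workaround on Windows.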

xingyizhou (Owner) commented:

You can also set `--val_intervals 1000000` to avoid problems in validation.
