
Train error in voc-style dataset #178

Closed
kknoob opened this issue Jul 3, 2019 · 4 comments
kknoob commented Jul 3, 2019

I have run demo.py successfully.

But when I try to run main.py on my own dataset, I get the errors below. Any help is appreciated, thanks.

The command I used is:

```
python main.py ctdet --not_prefetch_test --dataset detection --exp_id det1 --gpus 0 --num_workers 0
```

At first it worked:

```
Fix size testing.
training chunk_sizes: [32]
The output will be saved to F:\kuyanfeng\CenterNet\src\lib\..\..\exp\ctdet\det1
heads {'hm': 5, 'wh': 2, 'reg': 2}
Namespace(K=100, aggr_weight=0.0, agnostic_ex=False, arch='dla_34', aug_ddd=0.5, aug_rot=0, batch_size=32, cat_spec_wh=False, center_thresh=0.1, chunk_sizes=[32], data_dir='F:\\kuyanfeng\\CenterNet\\src\\lib\\..\\..\\data', dataset='detection', debug=0, debug_dir='F:\\kuyanfeng\\CenterNet\\src\\lib\\..\\..\\exp\\ctdet\\det1\\debug', debugger_theme='white', demo='', dense_hp=False, dense_wh=False, dep_weight=1, dim_weight=1, down_ratio=4, eval_oracle_dep=False, eval_oracle_hm=False, eval_oracle_hmhp=False, eval_oracle_hp_offset=False, eval_oracle_kps=False, eval_oracle_offset=False, eval_oracle_wh=False, exp_dir='F:\\kuyanfeng\\CenterNet\\src\\lib\\..\\..\\exp\\ctdet', exp_id='det1', fix_res=True, flip=0.5, flip_test=False, gpus=[0], gpus_str='0', head_conv=256, heads={'hm': 5, 'wh': 2, 'reg': 2}, hide_data_time=False, hm_hp=True, hm_hp_weight=1, hm_weight=1, hp_weight=1, input_h=384, input_res=384, input_w=384, keep_res=False, kitti_split='3dop', load_model='', lr=0.000125, lr_step=[90, 120], master_batch_size=32, mean=array([[[0.485, 0.456, 0.406]]], dtype=float32), metric='loss', mse_loss=False, nms=False, no_color_aug=False, norm_wh=False, not_cuda_benchmark=False, not_hm_hp=False, not_prefetch_test=True, not_rand_crop=False, not_reg_bbox=False, not_reg_hp_offset=False, not_reg_offset=False, num_classes=5, num_epochs=140, num_iters=-1, num_stacks=1, num_workers=0, off_weight=1, output_h=96, output_res=96, output_w=96, pad=31, peak_thresh=0.2, print_iter=0, rect_mask=False, reg_bbox=True, reg_hp_offset=True, reg_loss='l1', reg_offset=True, resume=False, root_dir='F:\\kuyanfeng\\CenterNet\\src\\lib\\..\\..', rot_weight=1, rotate=0, save_all=False, save_dir='F:\\kuyanfeng\\CenterNet\\src\\lib\\..\\..\\exp\\ctdet\\det1', scale=0.4, scores_thresh=0.1, seed=317, shift=0.1, std=array([[[0.229, 0.224, 0.225]]], dtype=float32), task='ctdet', test=False, test_scales=[1.0], trainval=False, val_intervals=5, vis_thresh=0.3, wh_weight=0.1)
'cp' is not recognized as an internal or external command, operable program or batch file.
Creating model...
Setting up data...
==> initializing pascal test2007 data.
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Loaded val 357 samples
==> initializing pascal trainval0712 data.
loading annotations into memory...
Done (t=0.03s)
creating index...
index created!
Loaded train 1790 samples
Starting training...
E:\Anaconda\envs\pytorch\lib\site-packages\torch\nn\_reduction.py:46: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
  warnings.warn(warning.format(ret))
ctdet/det1 | | train: [1][0/55]|Tot: 0:00:10 |ETA: 0:00:00 |loss 36.6218 |hm_loss 35.2704
ctdet/det1 |# | train: [1][1/55]|Tot: 0:00:12 |ETA: 0:09:10 |loss 40.8138 |hm_loss 39.1497
ctdet/det1 |###################### | train: [1][37/55]|Tot: 0:01:22 |ETA: 0:00:34 |loss 14.5188 |hm_loss 13.4028 |wh_loss 8.2473 |off_loss 0.2913
ctdet/det1 |###################### | train: [1][38/55]|Tot: 0:01:24 |ETA: 0:00:32 |loss 14.2986 |hm_loss 13.1903 |wh_loss 8.1855
ctdet/det1 |####################### | train: [1][39/55]|Tot: 0:01:26 |ETA: 0:00:30 |loss 14.1014 |hm_loss 12.9820
ctdet/det1 |####################### | train: [1][40/55]|Tot: 0:01:28 |ETA: 0:00:29 |loss 13.8740
ctdet/det1 |######################## | train: [1][41/55]|Tot: 0:01:30 |ETA: 0:00:27 |loss 13.6514
ctdet/det1 |######################### | train: [1][42/55]|Tot: 0:01:32 |ETA: 0:00:25 |loss 13.4658
ctdet/det1 |######################### | train: [1][43/55]|Tot: 0:01:34 |ETA: 0:00:23 |loss 13.2914
```

(The 'cp' line above is Windows' cmd error, translated from Chinese; overlapping progress-bar fragments from carriage-return rewrites have been split onto separate lines, with unrecoverable tails dropped.)

But after a few minutes, the error below occurred:

```
ctdet/det1 |############################### | train: [5][53/55]|Tot: 0:01:39 |ETA: 0:00:04 |loss 2.4441 |hm_loss 1.6735
ctdet/det1 |################################| train: [5][54/55]|Tot: 0:01:41 |ETA: 0:00:02 |loss 2.4373 |hm_loss 1.6711 |wh_loss 5.5025 |off_loss 0.2159 |Data 0.726s(0.742s) |Net 1.848s
Traceback (most recent call last):
  File "main.py", line 102, in <module>
    main(opt)
  File "main.py", line 79, in main
    log_dict_val, preds = trainer.val(epoch, val_loader)
  File "F:\kuyanfeng\CenterNet\src\lib\trains\base_trainer.py", line 116, in val
    return self.run_epoch('val', epoch, data_loader)
  File "F:\kuyanfeng\CenterNet\src\lib\trains\base_trainer.py", line 61, in run_epoch
    for iter_id, batch in enumerate(data_loader):
  File "E:\Anaconda\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 193, in __iter__
    return _DataLoaderIter(self)
  File "E:\Anaconda\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 469, in __init__
    w.start()
  File "E:\Anaconda\envs\pytorch\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "E:\Anaconda\envs\pytorch\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "E:\Anaconda\envs\pytorch\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "E:\Anaconda\envs\pytorch\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "E:\Anaconda\envs\pytorch\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'get_dataset.<locals>.Dataset'

(pytorch) F:\kuyanfeng\CenterNet\src>Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "E:\Anaconda\envs\pytorch\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "E:\Anaconda\envs\pytorch\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
```
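For anyone who lands on the same trace: the key line is `Can't pickle local object 'get_dataset.<locals>.Dataset'`. On Windows, `DataLoader` workers are started with the `spawn` method, which pickles the dataset object and sends it to the child process, and pickle cannot resolve a class whose qualified name contains `<locals>` (i.e. a class defined inside a function). A minimal sketch of the difference, with illustrative names only (not CenterNet's actual code):

```python
import pickle

def get_dataset():
    # Mirrors the failure: a class defined inside a function has a
    # qualified name like 'get_dataset.<locals>.Dataset', which pickle
    # cannot look up when sending objects to spawned worker processes.
    class Dataset:
        pass
    return Dataset

class ModuleLevelDataset:
    # Defined at module scope: pickle can find it by name, so it works
    # with Windows' 'spawn' start method.
    pass

def picklable(obj):
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

print(picklable(get_dataset()))       # the local class fails to pickle
print(picklable(ModuleLevelDataset))  # the module-level class pickles fine
```

This is why the error only shows up when the validation loader tries to start workers, and why it does not happen on Linux, where the default `fork` start method inherits the dataset without pickling it.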

@Kenneth-X commented:

Hi, I am still trapped in demo.py, and I got the following problem:

```
$ python demo.py ctdet --demo ../images/17790319373_bd19b24cfc_k.jpg --load_model ../models/ctdet_coco_dla_2x.pth --debug 2
Fix size testing.
training chunk_sizes: [1]
The output will be saved to /home/yuxi.xt/experinment/CenterNet/src/lib/../../exp/ctdet/default
heads {'hm': 80, 'wh': 2, 'reg': 2}
Creating model...
loaded ../models/ctdet_coco_dla_2x.pth, epoch 230
Segmentation fault
```

Could you tell me your environment details? torch version, gcc, and CUDA version?


kknoob commented Jul 3, 2019

> Could you tell me your environment details? torch version, gcc, and CUDA version?

You can find them in https://github.com/xingyizhou/CenterNet/blob/master/readme/INSTALL.md
Some other details are in https://github.com/xingyizhou/CenterNet/tree/master/readme


coderKyf commented Jul 3, 2019

I found the problem: I had to set `num_workers` to zero. When I modify the val loader in main.py to

```python
val_loader = torch.utils.data.DataLoader(
    Dataset(opt, 'val'), batch_size=1, shuffle=False,
    num_workers=0, pin_memory=True)
```

it works.
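As a side note (my own sketch with hypothetical class names, not the repo's actual code): `num_workers=0` works because the dataset is then never pickled at all. Judging from the error message, `get_dataset()` builds the `Dataset` class inside the function body; the other common fix is to hoist that combined class to module scope so it *can* be pickled, which re-enables `num_workers > 0` on Windows:

```python
import pickle

# Hypothetical module-level rewrite of a factory that combines a
# dataset with a task-specific sampler. Because CTDetPascal is defined
# at module scope, pickle can resolve it by its qualified name, so
# instances survive the round trip that spawn-based workers require.
class PascalVOC:
    def __init__(self, opt, split):
        self.opt = opt
        self.split = split

class CTDetSampler:
    def __getitem__(self, idx):
        return {"input": idx}  # stand-in for real sample loading

class CTDetPascal(CTDetSampler, PascalVOC):
    """Module-level combination: picklable, unlike a <locals> class."""

ds = CTDetPascal(opt=None, split='val')
clone = pickle.loads(pickle.dumps(ds))  # what spawn workers effectively do
print(clone.split)
```

With the class at module scope, the same `DataLoader` call no longer needs `num_workers=0`, though 0 remains the simplest workaround on Windows.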

xingyizhou (Owner) commented:

You can also set `--val_intervals 1000000` to avoid problems in validation.
