I got a problem when I use KITTI dataset to train your model #38

Open

Machine97 opened this issue Mar 1, 2021 · 2 comments

@Machine97

Thanks for sharing your research and code with us. Much appreciated.
I tried to train your model on the KITTI dataset, but the following error occurs every time:

2021-03-01 09:19:43,961 - mmdet - INFO - Epoch [1][440/1856] lr: 5.067e-06, eta: 8:32:39,
time: 0.400, data_time: 0.100, memory: 4264, loss_rpn_cls: 0.3128, loss_rpn_bbox: 0.2060, loss_cls: 0.2960, acc: 96.9043, loss_reg: 0.2682, loss_mask: 0.6795, loss: 1.7624
2021-03-01 09:19:47,923 - mmdet - INFO - Epoch [1][450/1856] lr: 5.177e-06, eta: 8:32:01,
time: 0.396, data_time: 0.095, memory: 4264, loss_rpn_cls: 0.3051, loss_rpn_bbox: 0.1585, loss_cls: 0.2783, acc: 96.7188, loss_reg: 0.2841, loss_mask: 0.6796, loss: 1.7056
2021-03-01 09:19:51,800 - mmdet - INFO - Epoch [1][460/1856] lr: 5.286e-06, eta: 8:31:11,
time: 0.388, data_time: 0.087, memory: 4264, loss_rpn_cls: 0.2838, loss_rpn_bbox: 0.1638, loss_cls: 0.2632, acc: 96.6406, loss_reg: 0.2799, loss_mask: 0.6799, loss: 1.6707
Traceback (most recent call last):
  File "train.py", line 161, in <module>
    main()
  File "train.py", line 157, in main
    meta=meta)
  File "/work_dirs/D2Det_mmdet2.1/mmdet/apis/train.py", line 179, in train_detector
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 122, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 43, in train
    self.call_hook('after_train_iter')
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 282, in call_hook
    getattr(hook, fn_name)(self)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/hooks/optimizer.py", line 21, in after_train_iter
    runner.outputs['loss'].backward()
  File "/opt/conda/lib/python3.7/site-packages/torch/tensor.py", line 198, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/opt/conda/lib/python3.7/site-packages/torch/autograd/__init__.py", line 100, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: shape mismatch: value tensor of shape [8, 256, 7, 7] cannot be broadcast to indexing result of shape [9, 256, 7, 7] (make_index_put_iterator at /opt/conda/conda-bld/pytorch_1587428398394/work/aten/src/ATen/native/TensorAdvancedIndexing.cpp:215)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x4e (0x7f43abb90b5e in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: at::native::index_put_impl(at::Tensor&, c10::ArrayRef<at::Tensor>, at::Tensor const&, bool, bool) + 0x712 (0x7f43d38d0b82 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #2: <unknown function> + 0xee23de (0x7f43d3c543de in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #3: at::native::index_put_(at::Tensor&, c10::ArrayRef<at::Tensor>, at::Tensor const&, bool) + 0x135 (0x7f43d38c0255 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0xee210e (0x7f43d3c5410e in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #5: <unknown function> + 0x288fa88 (0x7f43d5601a88 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #6: torch::autograd::generated::IndexPutBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) + 0x251 (0x7f43d53cc201 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0x2ae8215 (0x7f43d585a215 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #8: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&) + 0x16f3 (0x7f43d5857513 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #9: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&, bool) + 0x3d2 (0x7f43d58582f2 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #10: torch::autograd::Engine::thread_init(int) + 0x39 (0x7f43d5850969 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #11: torch::autograd::python::PythonEngine::thread_init(int) + 0x38 (0x7f43d8b97558 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #12: <unknown function> + 0xc819d (0x7f43db5ff19d in /opt/conda/lib/python3.7/site-packages/torch/lib/../../../.././libstdc++.so.6)
frame #13: <unknown function> + 0x76db (0x7f43fbfdf6db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #14: clone + 0x3f (0x7f43fbd0888f in /lib/x86_64-linux-gnu/libc.so.6)

This error occurs randomly at different iterations. In addition, each time the error occurs, the first dimension of the value tensor (shown as [8, 256, 7, 7] above) is different.
Do you know the possible reasons for this error?

@Machine97 (Author)

When the above error occurs, the batch size is 2. Once I set the batch size to 1, the following error also occurs in addition to the one above:

Traceback (most recent call last):
  File "/work_dirs/D2Det_mmdet2.1/tools/train.py", line 161, in <module>
    main()
  File "/work_dirs/D2Det_mmdet2.1/tools/train.py", line 157, in main
    meta=meta)
  File "/work_dirs/D2Det_mmdet2.1/mmdet/apis/train.py", line 179, in train_detector
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 122, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 43, in train
    self.call_hook('after_train_iter')
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 282, in call_hook
    getattr(hook, fn_name)(self)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/hooks/optimizer.py", line 21, in after_train_iter
    runner.outputs['loss'].backward()
  File "/opt/conda/lib/python3.7/site-packages/torch/tensor.py", line 198, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/opt/conda/lib/python3.7/site-packages/torch/autograd/__init__.py", line 100, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: Function IndexPutBackward returned an invalid gradient at index 1 - got [353, 256, 7, 7] but expected shape compatible with [355, 256, 7, 7] (validate_outputs at /opt/conda/conda-bld/pytorch_1587428398394/work/torch/csrc/autograd/engine.cpp:472)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x4e (0x7fc24d54fb5e in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x2ae3134 (0x7fc277214134 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #2: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&) + 0x548 (0x7fc277215368 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #3: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&, bool) + 0x3d2 (0x7fc2772172f2 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #4: torch::autograd::Engine::thread_init(int) + 0x39 (0x7fc27720f969 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #5: torch::autograd::python::PythonEngine::thread_init(int) + 0x38 (0x7fc27a556558 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #6: <unknown function> + 0xc819d (0x7fc28fae019d in /opt/conda/bin/../lib/libstdc++.so.6)
frame #7: <unknown function> + 0x76db (0x7fc29e1746db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #8: clone + 0x3f (0x7fc29de9d88f in /lib/x86_64-linux-gnu/libc.so.6)

Process finished with exit code 1

@Machine97 (Author)

@JialeCao001 The problem has been solved. The reason is that, during my processing of the KITTI dataset, classes other than Car, Pedestrian, Cyclist and DontCare were marked with the label -1. In mmdetection 2.1 there is no error reminder like "assertion `cur_target >= 0 && cur_target < n_classes` failed", so the invalid labels were not caught and only showed up later as the shape mismatch above.
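
For reference, here is a minimal sketch of the kind of filtering that fixed it for me, assuming the KITTI label files are converted to mmdetection-style annotations in a custom dataset. The function and variable names below are only illustrative and are not part of D2Det or mmdetection:

import numpy as np

# Classes actually used for training; everything else (DontCare, Van, Truck, Misc, ...)
# is simply dropped instead of being given the label -1.
CLASSES = ('Car', 'Pedestrian', 'Cyclist')
CLASS_TO_LABEL = {name: i for i, name in enumerate(CLASSES)}

def kitti_objects_to_ann(objects):
    """objects: list of (class_name, [x1, y1, x2, y2]) parsed from a KITTI label file."""
    bboxes, labels = [], []
    for name, bbox in objects:
        if name not in CLASS_TO_LABEL:   # skip classes we do not train on
            continue
        bboxes.append(bbox)
        labels.append(CLASS_TO_LABEL[name])
    return dict(
        bboxes=np.array(bboxes, dtype=np.float32).reshape(-1, 4),
        labels=np.array(labels, dtype=np.int64),
    )

Once every label is guaranteed to lie in [0, num_classes), the shape-mismatch error above no longer appears for me.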
