
Some question about training progress #10

Closed
wangjiyuan9 opened this issue Jun 5, 2023 · 6 comments

Comments

@wangjiyuan9

Dear @Dwawayu
Thank you for your amazing work!

  1. I'm now trying to retrain your model with only one RTX 3090. For the first stage I use: CUDA_VISIBLE_DEVICES=0 python -u train.py --png --model_name ** --use_denseaspp --use_mixture_loss --plane_residual --flip_right and delete all the code like dist.get_rank() == 0: etc. Are there any other details I need to pay attention to?
  2. I would like to see what your training output looks like. I've noticed that my training loss is often negative, and after about 20 epochs the val abs_rel is around 0.11. Is this normal?
  3. I notice that here you record the best model, but at HRfinetune you use --load_weights_folder ./log/ResNet/exp1/last_models. Why don't you use the best model?
  4. In the paper:
    [image: results table from the paper]
    I'm wondering: are all of your results from the 50th epoch (last_model), or from the best_model? And all three lines use the HRfinetune, right?

Thank you for your time and help.

@Dwawayu
Member

Dwawayu commented Jun 5, 2023

Dear Jiyuan,
Hi! Thank you again for your kind words and attention. Here are some suggestions:

  1. Instead of modifying the source code, I recommend changing only the launch command. To run the code on a single GPU, you can use:
CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 train.py [--args]
  2. The mixture Laplace loss is the negative log-likelihood of the distribution, so a negative loss is perfectly normal. Since we validate the model at 640x192 during the first stage to save as much memory as possible, the val abs_rel may be slightly worse, but it should drop below 0.1 after about 20 epochs.
  3. I think loading either the last or the best model is fine. Since I'm continuing training rather than testing results, I personally prefer loading the last model. You can load the best_model, since it may avoid overfitting and yield better results.
  4. All three results are from the best_model after HRfinetune and self-distillation.
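The negative loss mentioned above can be illustrated with a minimal sketch (the function names and values here are my own, not from the repository): a negative-log-likelihood loss dips below zero whenever the mixture density exceeds 1, which happens when a predicted Laplace scale is small relative to the residual.

```python
import math

def laplace_pdf(x, mu, b):
    # Laplace density: exp(-|x - mu| / b) / (2 b)
    return math.exp(-abs(x - mu) / b) / (2.0 * b)

def mixture_laplace_nll(x, components):
    # components: list of (weight, mu, b) tuples; weights sum to 1
    density = sum(w * laplace_pdf(x, mu, b) for w, mu, b in components)
    return -math.log(density)

# A tight component (small b) near the residual pushes the mixture
# density above 1, so the negative log-likelihood drops below zero.
loss = mixture_laplace_nll(0.01, [(0.7, 0.0, 0.05), (0.3, 0.0, 0.5)])
print(loss)  # negative
```

So a negative training loss is a property of continuous likelihoods, not a sign of a bug.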

I hope these help, please let me know if you have any further questions.

Best,
Ruoyu Wang

@wangjiyuan9
Author

Helpful! Thanks! :)

@wangjiyuan9
Author

Dear Ruoyu,
I have some questions about the results presented in the paper. Specifically, in the Stage 1 phase, my results on the kitti_raw_test dataset show an RMSE of 4.413 and an absolute relative difference (abs_rel) of 0.103. After applying HRfinetune, the RMSE is reduced to 4.05, but abs_rel increases to 0.9569. In contrast, the paper reports an abs_rel of 0.085 and an RMSE of 4.023.

I am unsure whether the difference in the abs_rel result is caused by an error in my implementation or by a mistake somewhere else. Moreover, I have observed that the paper refers to the self-distillation method only in relation to the third line of results (marked with a † symbol, which was obtained by using your self-distillation method for post-processing).
[image: results table from the paper]

And you said:

4. All three results are from the best_model after HRfinetune and self-distillation.

Does this mean that the first two lines of results also include self-distillation, or did I misunderstand something? (If so, I am also curious about the difference between the self-distillation used during training and the post-processing self-distillation for generating labels. Could you please elaborate on this?)

Thank you in advance for your help and clarification.

@wangjiyuan9 wangjiyuan9 reopened this Jun 9, 2023
@wangjiyuan9
Author

Also, here, why do you use color_aug to predict the pose instead of color?

@Dwawayu
Member

Dwawayu commented Jun 9, 2023

Dear Jiyuan,
Hi! For the first question, the model after the first stage should perform better. Specifically, it should have an abs_rel of approximately 0.090 and an RMSE of around 4.180, so there might be mistakes in the first stage or evaluation. Have you checked whether the model was tested at 1280*384? Furthermore, things became more perplexing after the HRfinetuning, as it showed a good RMSE but a completely inaccurate abs_rel. Considering the difference between these two metrics, there might be a very small GT depth occurring somewhere. A visualization of the predicted depth or abs_rel error of each image may help us to find the mistake.

For the second question, self-distillation enhances the raw predictions, so it can both generate labels during training and improve predictions at test time. Our best model was obtained after self-distillation training, and the three results are the raw prediction, the pp (post-processing) of the raw prediction, and the sd (self-distillation) of the raw prediction when testing that best model.
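For readers unfamiliar with "pp": it typically refers to flip-based post-processing, which this sketch assumes (horizontal-flip averaging as commonly used in self-supervised depth code; this is my own minimal version, not the repository's implementation).

```python
import numpy as np

def flip_postprocess(disp, disp_from_flipped):
    # disp: disparity predicted for the original image (H x W).
    # disp_from_flipped: disparity predicted for the horizontally
    # mirrored image. Re-flip it back and average the two.
    return 0.5 * (disp + disp_from_flipped[:, ::-1])

d = np.array([[1.0, 2.0, 3.0]])
d_flip = np.array([[3.0, 2.0, 1.0]])   # prediction on mirrored input
out = flip_postprocess(d, d_flip)
```

Averaging the two views suppresses occlusion artifacts near the image borders at the cost of a second forward pass.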

@Dwawayu
Member

Dwawayu commented Jun 9, 2023

Oh! Regarding the new question: the augmented images are used as inputs to all networks to enrich the training data.
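For context, the usual convention in self-supervised depth pipelines (e.g. Monodepth2-style code; this sketch and its names are my own, not the repository's) is that augmented frames feed the networks while the photometric loss still targets the raw frames, so color jitter never corrupts the supervision signal:

```python
import numpy as np

def photometric_step(color, color_aug, net):
    # The network consumes the augmented frame...
    pred = net(color_aug)
    # ...but the loss compares against the raw (non-augmented) frame.
    return np.abs(pred - color).mean()

identity_net = lambda x: x        # stand-in for depth/pose networks
img = np.zeros((4, 4))
img_aug = img + 0.1               # simulated color jitter
loss = photometric_step(img, img_aug, identity_net)
```

Feeding color_aug to the pose network is therefore a data-augmentation choice, not a change to the loss target.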
