
linear learning rate scaling? #476

Open
LaCandela opened this issue Nov 18, 2019 · 2 comments

@LaCandela

Hi! I have a question concerning the linear learning rate scaling that you are using. In the publication https://arxiv.org/abs/1706.02677 this scaling rule is only proven for SGD but you are using Adam. Did you do or do you know about any experiments that back up this approach?
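For reference, a minimal sketch of how the linear scaling rule from that paper is usually applied, written here for PyTorch with Adam; the base learning rate of 1.25e-4 and base batch size of 32 are illustrative assumptions, not a confirmed configuration from this repository:

```python
# Minimal sketch of the linear scaling rule (Goyal et al., arXiv:1706.02677):
# when the batch size changes by a factor k, scale the learning rate by k.
# The base values below are assumptions for illustration only.
import torch

def scaled_lr(base_lr: float, base_batch_size: int, batch_size: int) -> float:
    # Linear scaling: lr grows/shrinks proportionally to the batch size.
    return base_lr * batch_size / base_batch_size

model = torch.nn.Linear(10, 2)            # placeholder model
batch_size = 64
lr = scaled_lr(1.25e-4, 32, batch_size)   # -> 2.5e-4 when the batch size doubles
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
```

Whether this rule, derived for SGD, also holds for Adam is exactly the open question here.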

@xingyizhou
Owner

I searched the learning rate locally (2.5e-4 and 6.125e-5 for batch size 32); the performance differences are within random noise (<0.4 COCO AP).

@LaCandela
Author

OK, thank you for your answer! Did you experiment with different batch sizes and correspondingly scaled-up/down learning rates with Adam to see whether the linear scaling rule holds?
