
linear learning rate scaling? #476

Open
LaCandela opened this issue Nov 18, 2019 · 2 comments

@LaCandela

Hi! I have a question concerning the linear learning rate scaling that you are using. In the publication https://arxiv.org/abs/1706.02677 this scaling rule is only proven for SGD but you are using Adam. Did you do or do you know about any experiments that back up this approach?
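For reference, a minimal sketch of how the linear scaling rule from that paper is usually applied, written here for PyTorch with Adam; the base learning rate of 1.25e-4 and base batch size of 32 are illustrative assumptions, not a confirmed configuration from this repository:

```python
# Minimal sketch of the linear scaling rule (Goyal et al., arXiv:1706.02677):
# when the batch size changes by a factor k, scale the learning rate by k.
# The base values below are assumptions for illustration only.
import torch

def scaled_lr(base_lr: float, base_batch_size: int, batch_size: int) -> float:
    # Linear scaling: lr grows/shrinks proportionally to the batch size.
    return base_lr * batch_size / base_batch_size

model = torch.nn.Linear(10, 2)            # placeholder model
batch_size = 64
lr = scaled_lr(1.25e-4, 32, batch_size)   # -> 2.5e-4 when the batch size doubles
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
```

Whether this rule, derived for SGD, also holds for Adam is exactly the open question here.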

@xingyizhou
Owner

I searched the learning rate locally (2.5e-4 and 6.125e-5 for batch size 32); the performance differences are within random noise (<0.4 COCO AP).

@LaCandela
Author

OK, thank you for your answer! Did you experiment with different batch sizes and correspondingly scaled-up/down learning rates with Adam to see whether the linear scaling rule holds?
