Learned Optimizers that Scale and Generalize

Wichrowska, Olga; Maheswaranathan, Niru; Hoffman, Matthew W.; Colmenarejo, Sergio Gomez; Denil, Misha; de Freitas, Nando; Sohl-Dickstein, Jascha

arXiv:1703.04813 (cs)

[Submitted on 14 Mar 2017 (v1), last revised 7 Sep 2017 (this version, v4)]

Abstract: Learning to learn has emerged as an important direction for achieving artificial intelligence. Two of the primary barriers to its adoption are an inability to scale to larger problems and a limited ability to generalize to new tasks. We introduce a learned gradient descent optimizer that generalizes well to new tasks, and which has significantly reduced memory and computation overhead. We achieve this by introducing a novel hierarchical RNN architecture, with minimal per-parameter overhead, augmented with additional architectural features that mirror the known structure of optimization tasks. We also develop a meta-training ensemble of small, diverse optimization tasks capturing common properties of loss landscapes. The optimizer learns to outperform RMSProp/ADAM on problems in this corpus. More importantly, it performs comparably or better when applied to small convolutional neural networks, despite seeing no neural networks in its meta-training set. Finally, it generalizes to train Inception V3 and ResNet V2 architectures on the ImageNet dataset for thousands of steps, optimization problems that are of a vastly different scale than those it was trained on. We release an open source implementation of the meta-training algorithm.

Comments:	Final ICML paper after reviewer suggestions
Subjects:	Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Cite as:	arXiv:1703.04813 [cs.LG]
	(or arXiv:1703.04813v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1703.04813

Submission history

From: Olga Wichrowska
[v1] Tue, 14 Mar 2017 23:05:54 UTC (612 KB)
[v2] Mon, 8 May 2017 21:55:33 UTC (612 KB)
[v3] Fri, 23 Jun 2017 22:22:38 UTC (1,210 KB)
[v4] Thu, 7 Sep 2017 23:38:09 UTC (1,210 KB)
