The Non-IID Data Quagmire of Decentralized Machine Learning

Hsieh, Kevin; Phanishayee, Amar; Mutlu, Onur; Gibbons, Phillip B.

arXiv:1910.00189 (cs)

[Submitted on 1 Oct 2019 (v1), last revised 19 Aug 2020 (this version, v2)]

Abstract: Many large-scale machine learning (ML) applications need to perform decentralized learning over datasets generated at different devices and locations. Such datasets pose a significant challenge to decentralized learning because their different contexts result in significant data distribution skew across devices/locations. In this paper, we take a step toward better understanding this challenge by presenting a detailed experimental study of decentralized DNN training on a common type of data skew: skewed distribution of data labels across devices/locations. Our study shows that: (i) skewed data labels are a fundamental and pervasive problem for decentralized learning, causing significant accuracy loss across many ML applications, DNN models, training datasets, and decentralized learning algorithms; (ii) the problem is particularly challenging for DNN models with batch normalization; and (iii) the degree of data skew is a key determinant of the difficulty of the problem. Based on these findings, we present SkewScout, a system-level approach that adapts the communication frequency of decentralized learning algorithms to the (skew-induced) accuracy loss between data partitions. We also show that group normalization can recover much of the accuracy loss of batch normalization.
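
To make the notion of "skewed distribution of data labels across devices/locations" concrete, the following is a minimal sketch of one way such partitions could be constructed for an experiment. It is an illustration only and is not taken from the paper: the function names `partition_with_label_skew` and `label_histograms`, the `skew_fraction` knob, and the rule that maps label k to partition k mod P are all assumptions. A `skew_fraction` of 0.0 gives a uniformly random split, and 1.0 gives partitions with disjoint label sets.

```python
import random
from collections import Counter


def partition_with_label_skew(labels, num_partitions, skew_fraction, seed=0):
    """Split sample indices into partitions with a tunable degree of label skew.

    Hypothetical helper, not the paper's code. A `skew_fraction` share of the
    samples is assigned by label (integer label k goes to partition
    k % num_partitions); the remaining samples are assigned uniformly at random.
    """
    if not 0.0 <= skew_fraction <= 1.0:
        raise ValueError("skew_fraction must be in [0, 1]")
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)
    cut = int(round(skew_fraction * len(indices)))
    partitions = [[] for _ in range(num_partitions)]
    # Label-partitioned (non-IID) share of the data.
    for i in indices[:cut]:
        partitions[labels[i] % num_partitions].append(i)
    # Randomly partitioned (IID) share of the data.
    for i in indices[cut:]:
        partitions[rng.randrange(num_partitions)].append(i)
    return partitions


def label_histograms(labels, partitions):
    """Count how often each label occurs in each partition."""
    return [Counter(labels[i] for i in part) for part in partitions]


if __name__ == "__main__":
    # Toy dataset: 1000 samples with 10 balanced integer labels.
    labels = [i % 10 for i in range(1000)]
    for frac in (0.0, 0.5, 1.0):
        parts = partition_with_label_skew(labels, num_partitions=2, skew_fraction=frac)
        hists = label_histograms(labels, parts)
        print(frac, [dict(sorted(h.items())) for h in hists])
```

Comparing the per-partition label histograms at different `skew_fraction` values gives a quick check that the intended degree of skew was actually produced before any training run.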

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1910.00189 [cs.LG]
	(or arXiv:1910.00189v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1910.00189
Journal reference:	International Conference on Machine Learning (ICML), 2020

Submission history

From: Kevin Hsieh
[v1] Tue, 1 Oct 2019 03:52:47 UTC (992 KB)
[v2] Wed, 19 Aug 2020 00:58:47 UTC (2,148 KB)
