Code for paper: DivideMix: Learning with Noisy Labels as Semi-supervised Learning

LiJunnan1992/DivideMix

DivideMix: Learning with Noisy Labels as Semi-supervised Learning

PyTorch code for the following paper at ICLR 2020:
Title: DivideMix: Learning with Noisy Labels as Semi-supervised Learning
Authors: Junnan Li, Richard Socher, Steven C.H. Hoi
Institute: Salesforce Research

Abstract
Deep neural networks are known to be annotation-hungry. Numerous efforts have been devoted to reducing the annotation cost when learning with deep networks. Two prominent directions include learning with noisy labels and semi-supervised learning by exploiting unlabeled data. In this work, we propose DivideMix, a novel framework for learning with noisy labels by leveraging semi-supervised learning techniques. In particular, DivideMix models the per-sample loss distribution with a mixture model to dynamically divide the training data into a labeled set with clean samples and an unlabeled set with noisy samples, and trains the model on both the labeled and unlabeled data in a semi-supervised manner. To avoid confirmation bias, we simultaneously train two diverged networks where each network uses the dataset division from the other network. During the semi-supervised training phase, we improve the MixMatch strategy by performing label co-refinement and label co-guessing on labeled and unlabeled samples, respectively. Experiments on multiple benchmark datasets demonstrate substantial improvements over state-of-the-art methods.
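
The dataset division described in the abstract can be sketched in plain Python. This is a minimal sketch, not the repository's implementation: it assumes the mixture model is a two-component one-dimensional Gaussian mixture fitted by EM to min-max normalised per-sample losses, treats the lower-mean component as "clean", and keeps samples whose clean probability exceeds a threshold tau = 0.5. The helper names clean_probability and co_divide, the EM settings, and the threshold are hypothetical choices made for illustration.

```python
import math
from typing import List, Tuple


def _log_gauss(x: float, mean: float, var: float) -> float:
    # Log-density of a 1-D Gaussian N(mean, var) evaluated at x.
    return -0.5 * ((x - mean) ** 2 / var + math.log(2 * math.pi * var))


def _responsibilities(x, means, vars_, weights):
    # E-step: posterior of each of the two components for every sample,
    # computed with log-sum-exp so tiny densities do not underflow to 0/0.
    resp = []
    for xi in x:
        logp = [math.log(weights[k]) + _log_gauss(xi, means[k], vars_[k]) for k in range(2)]
        m = max(logp)
        p = [math.exp(v - m) for v in logp]
        s = sum(p)
        resp.append([pk / s for pk in p])
    return resp


def clean_probability(losses: List[float], iters: int = 20, reg_var: float = 5e-4) -> List[float]:
    """Hypothetical helper: fit a two-component 1-D GMM to per-sample losses with EM
    and return each sample's posterior under the low-mean (assumed clean) component."""
    lo, hi = min(losses), max(losses)
    x = [(l - lo) / (hi - lo + 1e-12) for l in losses]  # min-max normalise to [0, 1]
    means, vars_, weights = [0.0, 1.0], [0.1, 0.1], [0.5, 0.5]  # one low, one high component
    for _ in range(iters):
        resp = _responsibilities(x, means, vars_, weights)
        # M-step: re-estimate mixture weights, means and regularised variances.
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            weights[k] = nk / len(x)
            means[k] = sum(r[k] * xi for r, xi in zip(resp, x)) / nk
            vars_[k] = sum(r[k] * (xi - means[k]) ** 2 for r, xi in zip(resp, x)) / nk + reg_var
    clean = 0 if means[0] <= means[1] else 1
    return [r[clean] for r in _responsibilities(x, means, vars_, weights)]


def co_divide(losses_a: List[float], losses_b: List[float], tau: float = 0.5) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: build the labeled (clean) index sets and swap them,
    so network A trains on the division made by network B, and vice versa."""
    w_a, w_b = clean_probability(losses_a), clean_probability(losses_b)
    labeled_for_a = [i for i, w in enumerate(w_b) if w > tau]
    labeled_for_b = [i for i, w in enumerate(w_a) if w > tau]
    return labeled_for_a, labeled_for_b


# Toy per-sample losses: most samples are fitted easily, three have large losses.
losses = [0.05, 0.10, 0.08, 0.12, 2.30, 2.50, 0.07, 2.10]
print([round(w, 3) for w in clean_probability(losses)])
```

Fitting a mixture to the loss distribution, rather than applying a fixed loss cutoff, lets the split adapt as losses shrink over training; the samples left out of each labeled set would form that network's unlabeled set.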

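The semi-supervised phase can likewise be sketched on plain probability vectors. This is an illustrative sketch under stated assumptions, not the authors' code: label co-refinement is assumed to blend a labeled sample's given label with the network's own prediction averaged over augmentations, weighted by that sample's clean probability w from the other network's division; label co-guessing is assumed to average both networks' predictions for an unlabeled sample; both results are temperature-sharpened, and the final step is MixMatch-style mixup. The function names and the values T = 0.5 and alpha = 4.0 are assumptions chosen for illustration.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def sharpen(p: Sequence[float], T: float = 0.5) -> Vector:
    # Temperature sharpening: p_c^(1/T) / sum_c' p_c'^(1/T); T < 1 makes p peakier.
    powered = [pc ** (1.0 / T) for pc in p]
    total = sum(powered)
    return [v / total for v in powered]


def mean_probs(preds: Sequence[Sequence[float]]) -> Vector:
    # Element-wise mean of several probability vectors (over augmentations or networks).
    return [sum(col) / len(preds) for col in zip(*preds)]


def co_refine(one_hot: Sequence[float], w: float,
              own_preds: Sequence[Sequence[float]], T: float = 0.5) -> Vector:
    # Labeled sample (assumed rule): w * label + (1 - w) * averaged own prediction,
    # where w is the clean probability from the other network, then sharpen.
    p = mean_probs(own_preds)
    return sharpen([w * y + (1.0 - w) * pc for y, pc in zip(one_hot, p)], T)


def co_guess(preds_net1: Sequence[Sequence[float]],
             preds_net2: Sequence[Sequence[float]], T: float = 0.5) -> Vector:
    # Unlabeled sample (assumed rule): average both networks' predictions over
    # all augmentations, then sharpen the result.
    return sharpen(mean_probs(list(preds_net1) + list(preds_net2)), T)


def mixup(x1: Sequence[float], y1: Sequence[float], x2: Sequence[float],
          y2: Sequence[float], alpha: float = 4.0, rng=random) -> Tuple[Vector, Vector, float]:
    # MixMatch-style mixup: lam ~ Beta(alpha, alpha), then lam = max(lam, 1 - lam)
    # so the mixed example stays closer to (x1, y1).
    lam = rng.betavariate(alpha, alpha)
    lam = max(lam, 1.0 - lam)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


# A labeled sample whose given label (class 1) is only moderately trusted (w = 0.2).
label = [0.0, 1.0, 0.0]
preds = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1]]  # two augmentations of the same input
print(co_refine(label, w=0.2, own_preds=preds))
```

With w close to 1 the refined target stays at the given label; with w close to 0 it falls back to the network's own (sharpened) prediction, which is how a likely-noisy label gets down-weighted.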

Experiments
First, please create a folder named checkpoint to store the results.
mkdir checkpoint
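Next, run the training script for your dataset, substituting the dataset name into the script name and pointing --data_path at your data: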
python Train_{dataset_name}.py --data_path path-to-your-data

Cite DivideMix
If you find the code useful in your research, please consider citing our paper:

@inproceedings{
    li2020dividemix,
    title={DivideMix: Learning with Noisy Labels as Semi-supervised Learning},
    author={Junnan Li and Richard Socher and Steven C.H. Hoi},
    booktitle={International Conference on Learning Representations},
    year={2020},
}

License
This project is licensed under the terms of the MIT license.
