
Commit

refactor the code
duducheng committed Nov 14, 2020
1 parent e95ff2b commit 1b29608
Showing 8 changed files with 143 additions and 81 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
tmp*
.DS_Store

# Byte-compiled / optimized / DLL files
28 changes: 12 additions & 16 deletions README.md
@@ -1,5 +1,5 @@
# MedMNIST
## [arXiv Preprint](https://arxiv.org/abs/2010.14925) | [Project Page](https://medmnist.github.io/) | [Dataset](https://www.dropbox.com/sh/upxrsyb5v8jxbso/AADOV0_6pC9Tb3cIACro1uUPa?dl=0)
## [arXiv Preprint](https://arxiv.org/abs/2010.14925) | [Project Page](https://medmnist.github.io/) | [Dataset](#dataset)
[Jiancheng Yang](https://jiancheng-yang.com/), Rui Shi, [Bingbing Ni](https://scholar.google.com/citations?user=eUbmKwYAAAAJ), [Bilian Ke](https://scholar.google.com/citations?user=2cX5y8kAAAAJ)

We present *MedMNIST*, a collection of 10 pre-processed open medical datasets. MedMNIST is standardized to perform classification tasks on lightweight 28 x 28 images, which requires no background knowledge. Covering the primary data modalities in medical image analysis, it is diverse in data scale (from 100 to 100,000 samples) and tasks (binary/multi-class, ordinal regression and multi-label). MedMNIST can be used for educational purposes, rapid prototyping, multi-modal machine learning or AutoML in medical image analysis. Moreover, the MedMNIST Classification Decathlon is designed to benchmark AutoML algorithms on all 10 datasets.
@@ -23,38 +23,34 @@ Please note that this dataset is **NOT** intended for clinical use.
* [`dataset.py`](medmnist/dataset.py): data loaders of MedMNIST.
* [`models.py`](medmnist/models.py): *ResNet-18* and *ResNet-50* models.
* [`evaluator.py`](medmnist/evaluator.py): evaluation metrics.
* [`environ.py`](medmnist/environ.py): roots.
* [`train.py`](train.py): the training script.

# Requirements
The code requires only a common Python environment for machine learning; basically, it was tested with
* Python 3 (Anaconda 3.6.3 specifically)
* PyTorch\==0.3.1
* numpy\==1.18.5, pandas\==0.25.3, scikit-learn\==0.22.2
* numpy\==1.18.5, pandas\==0.25.3, scikit-learn\==0.22.2, tqdm

Higher versions should also work (perhaps with minor modifications).
Higher (or lower) versions should also work (perhaps with minor modifications).


# Dataset

Our MedMNIST dataset is available on [Dropbox](https://www.dropbox.com/sh/upxrsyb5v8jxbso/AADOV0_6pC9Tb3cIACro1uUPa?dl=0).
You can download the dataset(s) via any of the following free channels:

* [zenodo.org](https://doi.org/10.5281/zenodo.4269852) (recommended): You could also use our code to download the datasets from zenodo.org automatically.
* [Google Drive](https://drive.google.com/drive/folders/1Tl_SP-ffDQg-jDG_EWPlWKgZTmGbvFXU?usp=sharing)
* [百度网盘](https://pan.baidu.com/s/1bgPbESbLOlUSu4QC-4O46g) (code: gx6i)

The dataset contains ten subsets, and each subset (e.g., `pathmnist.npz`) consists of `train_images`, `train_labels`, `val_images`, `val_labels`, `test_images` and `test_labels`.
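
As a minimal sketch, a downloaded subset can be inspected directly with NumPy (assuming `pathmnist.npz` has been placed in the current directory):

```python
import numpy as np

# Minimal sketch: inspect one downloaded subset directly with NumPy.
# Assumes pathmnist.npz has been downloaded into the current directory.
npz_file = np.load("pathmnist.npz")

print(sorted(npz_file.files))  # the six arrays listed above

train_images = npz_file["train_images"]  # (n_train, 28, 28) or (n_train, 28, 28, 3)
train_labels = npz_file["train_labels"]  # (n_train, 1) for single-label tasks
print(train_images.shape, train_labels.shape)
```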

# How to run the experiments

* Download Dataset [MedMNIST](https://www.dropbox.com/sh/upxrsyb5v8jxbso/AADOV0_6pC9Tb3cIACro1uUPa?dl=0).

* Modify the paths

Specify `dataroot` and `outputroot` in [./medmnist/environ.py](./medmnist/environ.py)

`dataroot` is the root where you save our `npz` datasets

`outputroot` is the root where you want to save testing results
* Download the dataset (see the [Dataset](#dataset) section above).

* Run our [`train.py`](./train.py) script in terminal.
* Run the demo script [`train.py`](./train.py) in a terminal.

First, change directory to where train.py locates. Then, use command `python train.py xxxmnist` to run the experiments, where `xxxmnist` is subset of our MedMNIST (e.g., `pathmnist`).
First, change directory to where `train.py` is located. Then, run `python train.py xxxmnist input_root output_root`, where `xxxmnist` is a subset of our MedMNIST (e.g., `pathmnist`), `input_root` is the path to the data files, and `output_root` is the folder in which to save the results.
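
The argument handling inside `train.py` is not part of this diff; a hypothetical sketch of the command-line interface described above (argument names and help texts are assumptions) could look like this:

```python
import argparse

# Hypothetical sketch of the CLI described above; the actual train.py in
# this commit may handle its arguments differently.
parser = argparse.ArgumentParser(description="Run a MedMNIST experiment.")
parser.add_argument("data_name", help="subset name, e.g. pathmnist")
parser.add_argument("input_root", help="folder containing the .npz data files")
parser.add_argument("output_root", help="folder in which to save the results")
args = parser.parse_args()

print(args.data_name, args.input_root, args.output_root)
```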

# Citation
If you find this project useful, please cite our paper as:
59 changes: 51 additions & 8 deletions medmnist/dataset.py
@@ -1,8 +1,8 @@
from medmnist import environ
import os
import json
import numpy as np
from torch.utils.data import Dataset
from PIL import Image
from torch.utils.data import Dataset


INFO = "medmnist/medmnist.json"
@@ -12,15 +12,27 @@ class MedMNIST(Dataset):

flag = ...

def __init__(self, split='train', transform=None, target_transform=None):
def __init__(self, root, split='train', transform=None, target_transform=None, download=False):
''' dataset
:param split: 'train', 'val' or 'test', select dataset
:param split: 'train', 'val' or 'test', select subset
:param transform: data transformation
:param target_transform: target transformation
'''

npz_file = np.load(os.path.join(environ.dataroot,"{}.npz".format(self.flag)))
with open(INFO, 'r') as f:
self.info = json.load(f)[self.flag]

self.root = root

if download:
self.download()

if not os.path.exists(os.path.join(self.root, "{}.npz".format(self.flag))):
raise RuntimeError('Dataset not found.' +
' You can use download=True to download it')

npz_file = np.load(os.path.join(self.root, "{}.npz".format(self.flag)))

self.split = split
self.transform = transform
@@ -51,6 +63,37 @@ def __getitem__(self, index):
def __len__(self):
return self.img.shape[0]

def __repr__(self):
'''Adapted from torchvision.
'''
_repr_indent = 4
head = "Dataset " + self.__class__.__name__

body = ["Number of datapoints: {}".format(self.__len__())]
body.append("Root location: {}".format(self.root))
body.append("Split: {}".format(self.split))
body.append("Task: {}".format(self.info["task"]))
body.append("Number of channels: {}".format(self.info["n_channels"]))
body.append("Meaning of labels: {}".format(self.info["label"]))
body.append("Number of samples: {}".format(self.info["n_samples"]))
body.append("Description: {}".format(self.info["description"]))
body.append("License: {}".format(self.info["license"]))

if hasattr(self, "transforms") and self.transforms is not None:
body += [repr(self.transforms)]
lines = [head] + [" " * _repr_indent + line for line in body]
return '\n'.join(lines)

def download(self):
try:
from torchvision.datasets.utils import download_url
download_url(url=self.info["url"], root=self.root,
filename="{}.npz".format(self.flag), md5=self.info["MD5"])
except:
raise RuntimeError('Something went wrong when downloading! ' +
'Go to the homepage to download manually. ' +
'https://github.com/MedMNIST/MedMNIST')


class PathMNIST(MedMNIST):
flag = "pathmnist"
@@ -80,13 +123,13 @@ class BreastMNIST(MedMNIST):
flag = "breastmnist"


class OrganMNIST_Axial(MedMNIST):
class OrganMNISTAxial(MedMNIST):
flag = "organmnist_axial"


class OrganMNIST_Coronal(MedMNIST):
class OrganMNISTCoronal(MedMNIST):
flag = "organmnist_coronal"


class OrganMNIST_Sagittal(MedMNIST):
class OrganMNISTSagittal(MedMNIST):
flag = "organmnist_sagittal"
3 changes: 0 additions & 3 deletions medmnist/environ.py

This file was deleted.

2 changes: 1 addition & 1 deletion medmnist/evaluator.py
@@ -61,7 +61,7 @@ def getACC(y_true, y_score, task, threshold=0.5):
return accuracy_score(y_true, y_pre)


def save(y_true, y_score, outputpath):
def save_results(y_true, y_score, outputpath):
'''Save ground truth and scores
:param y_true: the ground truth labels, shape: (n_samples, n_classes) for multi-label, and (n_samples,) for other tasks
:param y_score: the predicted score of each class, shape: (n_samples, n_classes)
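
A hedged usage sketch of the renamed helper, with array shapes matching the docstring; the arrays and the output path below are placeholders, not values from this repository:

```python
import numpy as np
from medmnist.evaluator import save_results

# Placeholder arrays shaped as the docstring describes:
# y_true: (n_samples,) for single-label tasks, y_score: (n_samples, n_classes)
y_true = np.random.randint(0, 9, size=(100,))
y_score = np.random.rand(100, 9)

save_results(y_true, y_score, "tmp_results.csv")  # output path is only an example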
60 changes: 40 additions & 20 deletions medmnist/medmnist.json
@@ -1,7 +1,8 @@
{
"pathmnist": {
"description": "PathMNIST: A dataset based on a prior study for predicting survival from colorectal cancer histology slides, which provides a dataset NCT-CRC-HE-100K of 100,000 non-overlapping image patches from hematoxylin & eosin stained histological images, and a test dataset CRC-VAL-HE-7K of 7,180 image patches from a different clinical center. 9 types of tissues are involved, resulting a multi-class classification task. We resize the source images of 3 x 224 x 224 into 3 x 28 x 28, and split NCT-CRC-HE-100K into training and valiation set with a ratio of 9:1.",
"url": "...",
"url": "https://zenodo.org/record/4269852/files/pathmnist.npz?download=1",
"MD5": "a8b06965200029087d5bd730944a56c1",
"task": "multi-class",
"label": {
"0": "adipose",
@@ -19,11 +20,13 @@
"train": 89996,
"val": 10004,
"test": 7180
}
},
"license": "CC BY 4.0"
},
"chestmnist": {
"description": "ChestMNIST: A dataset based on NIH-ChestXray14 dataset, comprising 112,120 frontal-view X-ray images of 30,805 unique patients with the text-mined 14 disease image labels, which could be formulized as multi-label binary classification task. We use the official data split, and resize the source images of 1 x 1024 x 1024 into 1 x 28 x 28.",
"url": "...",
"url": "https://zenodo.org/record/4269852/files/chestmnist.npz?download=1",
"MD5": "02c8a6516a18b556561a56cbdd36c4a8",
"task": "multi-label, binary-class",
"label": {
"0": "atelectasis",
@@ -46,11 +49,13 @@
"train": 78468,
"val": 11219,
"test": 22433
}
},
"license": "CC0 1.0"
},
"dermamnist": {
"description": "DermaMNIST: A dataset based on HAM10000, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. The dataset consists of 10,015 dermatoscopic images labeled as 7 different categories, as a multi-class classification task. We split the images into training, validation and test set with a ratio of 7:1:2. The source images of 3 x 600 x 450 are resized into 3 x 28 x 28.",
"url": "...",
"url": "https://zenodo.org/record/4269852/files/dermamnist.npz?download=1",
"MD5": "0744692d530f8e62ec473284d019b0c7",
"task": "multi-class",
"label": {
"0": "actinic keratoses and intraepithelial carcinoma",
@@ -66,11 +71,13 @@
"train": 7007,
"val": 1003,
"test": 2005
}
},
"license": "CC BY-NC 4.0"
},
"octmnist": {
"description": "OCTMNIST: A dataset based on a prior dataset of 109,309 valid optical coherence tomography (OCT) images for retinal diseases. 4 types are involved, leading to a multi-class classification task. We split the source training set with a ratio of 9:1 into training and validation set, and use its source validation set as the test set. The source images are single-channel, and their sizes range from (384-1,536) x (277-512). We center-crop the images and resize them into 1 x 28 x 28.",
"url": "...",
"url": "https://zenodo.org/record/4269852/files/octmnist.npz?download=1",
"MD5": "c68d92d5b585d8d81f7112f81e2d0842",
"task": "multi-class",
"label": {
"0": "choroidal neovascularization",
@@ -83,11 +90,13 @@
"train": 97477,
"val": 10832,
"test": 1000
}
},
"license": "CC BY 4.0"
},
"pneumoniamnist": {
"description": "PneumoniaMNIST: A dataset based on a prior dataset of 5,856 pediatric chest X-ray images. The task is binary-class classification of pneumonia and normal. We split the source training set with a ratio of 9:1 into training and validation set, and use its source validation set as the test set. The source images are single-channel, and their sizes range from (384-2,916) x (127-2,713). We center-crop the images and resize them into 1 x 28 x 28.",
"url": "...",
"url": "https://zenodo.org/record/4269852/files/pneumoniamnist.npz?download=1",
"MD5": "28209eda62fecd6e6a2d98b1501bb15f",
"task": "binary-class",
"label": {
"0": "normal",
@@ -98,11 +107,13 @@
"train": 4708,
"val": 524,
"test": 624
}
},
"license": "CC BY 4.0"
},
"retinamnist": {
"description": "RetinaMNIST: A dataset based on DeepDRiD, a dataset of 1,600 retina fundus images. The task is ordinal regression for 5-level grading of diabetic retinopathy severity. We split the source training set with a ratio of 9:1 into training and validation set, and use the source validation set as test set. The source images of 3 x 1,736 x 1,824 are center-cropped and resized into 3 x 28 x 28",
"url": "...",
"url": "https://zenodo.org/record/4269852/files/retinamnist.npz?download=1",
"MD5": "bd4c0672f1bba3e3a89f0e4e876791e4",
"task": "ordinal regression",
"label": {
"0": "0",
@@ -116,11 +127,13 @@
"train": 1080,
"val": 120,
"test": 400
}
},
"license": "CC BY 4.0"
},
"breastmnist": {
"description": "BreastMNIST: A dataset based on a dataset of 780 breast ultrasound images. It is categorized into 3 classes: normal, benign and malignant. As we use low-resolution images, we simplify the task into binary classification by combing normal and benign as negative, and classify them against malignant as positive. We split the source dataset with a ratio of 7:1:2 into training, validation and test set. The source images of 1 x 500 x 500 are resized into 1 x 28 x 28.",
"url": "...",
"url": "https://zenodo.org/record/4269852/files/breastmnist.npz?download=1",
"MD5": "750601b1f35ba3300ea97c75c52ff8f6",
"task": "binary-class",
"label": {
"0": "malignant",
@@ -131,11 +144,13 @@
"train": 546,
"val": 78,
"test": 156
}
},
"license": "CC BY 4.0"
},
"organmnist_axial": {
"description": "OrganMNIST_Axial: A dataset based on 3D computed tomography (CT) images from Liver Tumor Segmentation Benchmark (LiTS). We use bounding-box annotations of 11 body organs from another study to obtain the organ labels. Hounsfield-Unit (HU) of the 3D images are transformed into grey scale with a abdominal window; we then crop 2D images from the center slices of the 3D bounding boxes in axial views (planes). The images are resized into 1 x 28 x 28 to perform multi-class classification of 11 body organs. 115 and 16 CT scans from the source training set are used as training and validation set, respectively. The 70 CT scans from the source test set are treated as the test set.",
"url": "...",
"url": "https://zenodo.org/record/4269852/files/organmnist_axial.npz?download=1",
"MD5": "866b832ed4eeba67bfb9edee1d5544e6",
"task": "multi-class",
"label": {
"0": "bladder",
@@ -155,11 +170,13 @@
"train": 34581,
"val": 6491,
"test": 17778
}
},
"license": "CC BY 4.0"
},
"organmnist_coronal": {
"description": "OrganMNIST_Coronal: A dataset based on 3D computed tomography (CT) images from Liver Tumor Segmentation Benchmark (LiTS). We use bounding-box annotations of 11 body organs from another study to obtain the organ labels. Hounsfield-Unit (HU) of the 3D images are transformed into grey scale with a abdominal window; we then crop 2D images from the center slices of the 3D bounding boxes in coronal views (planes). The images are resized into 1 x 28 x 28 to perform multi-class classification of 11 body organs. 115 and 16 CT scans from the source training set are used as training and validation set, respectively. The 70 CT scans from the source test set are treated as the test set.",
"url": "...",
"url": "https://zenodo.org/record/4269852/files/organmnist_coronal.npz?download=1",
"MD5": "0afa5834fb105f7705a7d93372119a21",
"task": "multi-class",
"label": {
"0": "bladder",
@@ -179,11 +196,13 @@
"train": 13000,
"val": 2392,
"test": 8268
}
},
"license": "CC BY 4.0"
},
"organmnist_sagittal": {
"description": "OrganMNIST_Sagittal: A dataset based on 3D computed tomography (CT) images from Liver Tumor Segmentation Benchmark (LiTS). We use bounding-box annotations of 11 body organs from another study to obtain the organ labels. Hounsfield-Unit (HU) of the 3D images are transformed into grey scale with a abdominal window; we then crop 2D images from the center slices of the 3D bounding boxes in sagittal views (planes). The images are resized into 1 x 28 x 28 to perform multi-class classification of 11 body organs. 115 and 16 CT scans from the source training set are used as training and validation set, respectively. The 70 CT scans from the source test set are treated as the test set.",
"url": "...",
"url": "https://zenodo.org/record/4269852/files/organmnist_sagittal.npz?download=1",
"MD5": "e5c39f1af030238290b9557d9503af9d",
"task": "multi-class",
"label": {
"0": "bladder",
@@ -203,6 +222,7 @@
"train": 13940,
"val": 2452,
"test": 8829
}
},
"license": "CC BY 4.0"
}
}
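
Since every subset entry now carries `url`, `MD5` and `license` fields, the metadata can be listed with a few lines of Python. This is a small sketch, assumed to run from the repository root so the relative path resolves:

```python
import json

# Small sketch: list the download metadata added in this commit.
# Assumes the script runs from the repository root, where
# medmnist/medmnist.json lives.
with open("medmnist/medmnist.json", "r") as f:
    info = json.load(f)

for flag, meta in info.items():
    print("{}: task={}, license={}".format(flag, meta["task"], meta["license"]))
    print("  url={}".format(meta["url"]))
    print("  MD5={}".format(meta["MD5"]))
```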
2 changes: 1 addition & 1 deletion medmnist/models.py
@@ -1,5 +1,5 @@
'''
kuangliu/pytorch-cifar
Adapted from kuangliu/pytorch-cifar.
'''

import torch.nn as nn
