A Dense Residual U-Net for Multiple Sclerosis Lesions Segmentation from Multi-Sequence 3D MR Images
A R T I C L E  I N F O

Keywords: Multiple sclerosis (MS), MS lesion segmentation, MRI, U-net, Convolutional neural networks, Deep learning, Residual blocks

A B S T R A C T

Multiple Sclerosis (MS) is an autoimmune disease that causes brain and spinal cord lesions, which magnetic resonance imaging (MRI) can detect and characterize. Recently, deep learning methods have achieved remarkable results in the automated segmentation of MS lesions from MRI data. Hence, this study proposes a novel dense residual U-Net model that combines attention gate (AG), efficient channel attention (ECA), and Atrous Spatial Pyramid Pooling (ASPP) to enhance the performance of automatic MS lesion segmentation using 3D MRI sequences. First, the convolution layers in each block of the U-Net architecture are replaced by residual blocks and connected densely. Then, AGs are exploited to capture salient features passed through the skip connections. The ECA module is appended at the end of each residual block and each downsampling block of U-Net. Later, the bottleneck of U-Net is replaced with the ASPP module to extract multi-scale contextual information. Furthermore, 3D MR images of Fluid Attenuated Inversion Recovery (FLAIR), T1-weighted (T1-w), and T2-weighted (T2-w) are exploited jointly to perform better MS lesion segmentation. The proposed model is validated on the publicly available ISBI2015 and MSSEG2016 challenge datasets. This model produced an ISBI score of 92.75, a mean Dice score of 66.88%, a mean positive predictive value (PPV) of 86.50%, and a mean lesion-wise true positive rate (LTPR) of 60.64% on the ISBI2015 testing set. Also, it achieved a mean Dice score of 67.27%, a mean PPV of 65.19%, and a mean sensitivity of 74.40% on the MSSEG2016 testing set. The results show that the proposed model outperforms some experts and some of the state-of-the-art methods reported for this task. Specifically, the best Dice score and the best LTPR on the ISBI2015 testing set are obtained by using the proposed model to segment MS lesions.
✩ Python codes are available on GitHub at https://github.com/beytullahsarica/MS-Lesion-Segmentation.
* Corresponding author.
E-mail addresses: [email protected] (B. Sarica), [email protected] (D.Z. Seker), [email protected] (B. Bayram).
https://doi.org/10.1016/j.ijmedinf.2022.104965
Received 6 December 2022; Accepted 8 December 2022; Available online 21 December 2022
1386-5056/© 2022 Elsevier B.V. All rights reserved.
B. Sarica, D.Z. Seker and B. Bayram International Journal of Medical Informatics 170 (2023) 104965
whether each central pixel/voxel is classified as a healthy region. The advantage of this technique is that it exploits better contextual information around pixels and is also preferred in medical image analysis to obtain more training samples, reducing class imbalance problems [45]. However, a longer training time and omitting global structure information are disadvantages of this approach due to repeated computations and small patch sizes. On the other hand, image-based methods use the global structure of the entire image, and they have higher computational efficiency due to one forward propagation to classify all pixels from the input image [10,45,50]. Image-based segmentation can be performed with either slice-based or 3D-based segmentation methods. Slice-based segmentation is defined as converting each 3D MRI into 2D slices using three plane orientations, then processing each slice individually as an input to a CNN. Afterward, the segmented slices are fused again to form a 3D binary segmentation. In 3D-based segmentation, a CNN with 3D kernels is employed to directly extract useful information from the original 3D image. A standard 3D segmentation is highly likely to result in overfitting issues due to the large number of parameters relative to the small datasets that are common in the medical field [10]. To handle this issue, Liu et al. [29] and Tetteh et al. [44] proposed 3D cross-hair convolution. They defined three 2D filters for each of the three orthogonal plane orientations around the voxel, and then the sum of the results obtained from each convolution is given to the central voxel. This reduces the number of training parameters, resulting in a shorter training time when compared to a standard 3D segmentation. Nevertheless, this method has three times more parameters for each layer in the network when compared to slice-based methods.

In the literature, many CNN-based methods have been proposed for the automatic segmentation of MS lesions with various input data strategies and networks. 2D patch-based CNNs from multiple images, multiple views, and multiple time points have been proposed by Birenbaum and Greenspan [8] to take advantage of longitudinal data for MS lesion segmentation. Brosch et al. [10] proposed a whole-brain segmentation approach using a 3D CNN with a single shortcut connection between the first and last layers of the network. This shortcut connection allows the proposed network to integrate high- and low-level features to obtain information on the structure of MS lesions. However, their proposed network could not benefit from mid-level features, affecting the segmentation performance [37]. A pipeline for white matter lesion segmentation using a cascade of two 3D patch-based CNNs has been proposed by Valverde et al. [46]. In their study, the second network obtained the input features from the first network, which was designed to select features. Their proposed network consisted of a 7-layer CNN model and used multi-sequence 3D patches from training images. Roy et al. [38] developed a 2D patch-based fully CNN model for MS lesion segmentation. Their network contains two pathways for the FLAIR and T1-w modalities and then concatenates the outputs of each path to generate a member function for MS lesions. A 3D fully convolutional densely connected network (FC-DenseNet) using a 3D patch-based CNN has been proposed by Hashemi et al. [18]. They also used an asymmetric similarity loss layer based on the Tversky index to handle unbalanced data issues in medical imaging.

Although the patch-based methods improve lesion segmentation and perform well, global structural information is not used; namely, the global brain structure and lesion locations are not part of the segmentation. To address this issue, whole-brain slice-based methods for MS segmentation using CNN-based algorithms have also been proposed in the literature. For example, Aslani et al. [3,4] proposed a deep end-to-end 2D encoder-decoder CNN utilizing the slice-based segmentation approach for 3D MRI data of different modalities. In their first study, they exploited a stack of 3-channel slices extracted from each plane of each corresponding modality and a modified ResNet50 architecture. Their other study used single-channel slices obtained from each plane of each modality using multiple networks separately. Their proposed network includes a multi-branch downsampling path and multi-scale feature fusion blocks to merge features from multimodal MRI data. To overcome the overfitting issue in 3D-based segmentation and the deficiency of global structure in the patch-based method, they preferred a whole-brain slice-based approach in their research. Zhang et al. [52] proposed a method using a fully convolutional densely connected network and stacked adjacent slices from the orthogonal planes (axial, sagittal, coronal) of different modalities as 2.5D input data. They took advantage of the global and local context from slices, and these slices increased training samples to make accurate segmentation as well. Kang et al. [24] presented a 3D attention context U-Net (ACU-Net), which is a novel end-to-end segmentation framework to cope with the challenge of MS lesion segmentation. To expand the perception field and guide contextual information, a 3D context-guided module was used in the encoding and decoding stages of 3D U-Net. They used a 3D spatial attention block to enhance feature representation in the skip connection phase as well. Zhang et al. [53] proposed a deep CNN model based on 3D U-Net to perform fast and accurate MS lesion segmentation that uses FLAIR, T1-w, and T2-w sequences. Anatomical information obtained using distance transformation mapping and a lesion-wise loss function were integrated to obtain anatomical structure information and improve small lesion detection, respectively. Kamraoui et al. [23] proposed DeepLesionBrain (DLB), a novel method robust to domain shift that uses a 3D CNN model. They used a spatial strategy that involves multiple compact 3D CNNs with large overlapping receptive fields to generate consensus-based segmentation robust to domain shift. To effectively combine both generic and specialized features, they trained the model using hierarchical specialization learning. To increase training data variability, they proposed a novel image quality data augmentation as well.

Additionally, Weeda et al. [49] aimed to test the CNN-based nicMSlesions software and compare it with manual and other automatic segmentation to investigate its performance using an independent dataset. They focused on five segmentation methods, which are LesionTOADS, LST-LPA default, LST-LPA adjusted-threshold, BIANCA, and nicMSlesions single-subject, respectively. Their results show that the nicMSlesions method can be easily trained with only one manual delineation and achieved better results than the others. Most of these studies have focused on accurate MS lesion segmentation using MRI data from the same domain. However, quantifying MS lesions through the use of MRI data from different centers and scanners has become essential for evaluating segmentation performance. Therefore, new deep learning-based segmentation approaches are still required for accurate MS lesion segmentation and generalization capabilities toward different centers and scanners.

This study proposes a novel fully convolutional neural network that combines U-Net, dense connections, residual blocks, AG, ECA, and ASPP modules for MS lesion segmentation on multi-sequence 3D MR images. Furthermore, the proposed model's segmentation performance and generalization ability are evaluated on multiple datasets exhibiting high variability in acquisition sites, scanner manufacturers, resolution, preprocessing, and clinical cases. A whole-brain slice-based segmentation is employed to avoid the overfitting that occurs in 3D methods and the loss of global structural information that occurs in patch-based methods [10,46,38]. In addition, a 3D reconstruction approach is presented to form a final 3D binary output from the 3D outputs generated from the predicted 2D slices for each plane orientation using a majority voting method. The main contributions of this study are summarized as follows:

• Presented a novel deep learning model that combines U-Net, dense connections, residual blocks, AG, ECA, and ASPP to improve the performance, generalization ability, and robustness of the MS lesion segmentation task.
• Multiple MRI sequences, such as FLAIR, T1-w, and T2-w, have been used jointly to obtain more features related to MS lesions, such as shape and location.
Fig. 1. Illustration of the preprocessed ISBI2015 and MSSEG2016 datasets training samples with an axial view of (a-e) FLAIR, (b-f) T1-w, (c-g) T2-w. The corre-
sponding manual delineations by two raters for the ISBI2015 are shown in (d) for Rater #1 and (e) for Rater #2. The consensus segmentation computed from the
manual segmentation for the MSSEG2016 is shown in (h).
• Data augmentation techniques were heavily used to prevent overfitting and realize a robust model.
• A hybrid loss function, the addition of focal and Dice losses, was employed to overcome the class imbalance problem in MS lesion segmentation.
• Cross-dataset validation across unseen datasets was performed to assess the generalization ability and robustness of the model.
• It has been observed that transferring weights from one dataset to another in the same domain improves performance, especially on the MSSEG2016 dataset, which contains different acquisition sites, resolutions, preprocessing, and clinical cases.
• Better results than other studies in the literature were achieved on the ISBI2015 testing set in terms of LTPR and the Dice score, one of the most important metrics for the evaluation of segmentation tasks in medical imaging.
• Better results than the experts were obtained on the MSSEG2016 testing set in most evaluation metrics, such as the Dice score and PPV.

2. Materials and methods

2.1. Datasets

In this study, two publicly available datasets, the ISBI2015 MS lesion segmentation challenge dataset (denoted as ISBI2015) and the MSSEG2016 challenge dataset (denoted as MSSEG2016), were exploited to evaluate the proposed model. Fig. 1 shows the training samples for each dataset. In addition, these datasets enable us to assess the generalization ability and robustness of the proposed model on unseen datasets since they present high variability in terms of acquisition sites, resolution, preprocessing, and clinical cases.

2.1.1. ISBI2015 dataset

The ISBI2015 dataset, which is publicly available and downloadable from the Challenge Evaluation website,1 was used to evaluate the MS lesion segmentation task in the proposed model. It has five training and fourteen testing subjects with 4 to 6 follow-up 3D scans per subject and a mean of 4.4 time-points. T1-w, T2-w, PD-w, and FLAIR sequences are provided for each time-point, with data acquired on a 3.0 Tesla MRI scanner [12]. The volumes are made up of 181 (axial and sagittal) and 217 (coronal) slices with a 1 mm cubic voxel resolution. The training set comprises 21 3D scans from five patients with white matter lesions associated with MS and has been annotated by two expert raters, while the delineated masks of the 14 testing patients, comprising 61 3D scans, are not publicly available. Rater #1 has four years of experience delineating lesions, while Rater #2 has 10 years of experience in manual segmentation and 17 years in structural MRI analysis [12]. The performance of the proposed methods can be evaluated via the ISBI2015 challenge website2 by submitting the 3D binary masks obtained from the ISBI2015 testing set. This dataset has also been widely used as a benchmark in many research studies for automatic MS lesion segmentation [51].

The provided preprocessed dataset, in which the brain is stripped from the skull using the Brain Extraction Tool (BET) [40], has been employed in this study. First, we performed intensity normalization on each 3D MRI using Kernel Density Estimation (KDE) with a Gaussian kernel. Then, slices along the three orthogonal directions were extracted from all modalities for both raters. The size of each slice (axial, sagittal, and coronal) is 181 × 217, 217 × 181, and 181 × 181, respectively. To acquire the same size for each plane view (224 × 224), a zero-padding technique was applied by centering the brain without considering its orientation. Furthermore, only the slices that contain MS lesions, including at least one lesion pixel, were kept to remove non-informative samples and excessively unbalanced data when feeding the designed model.

2.1.2. MSSEG2016 dataset

The MSSEG2016 dataset is composed of 3D MR images of 53 MS patients gathered from four different clinical centers (Center01, Center03, Center07, and Center08) and four MRI scanners (1.5T and 3T). 3D FLAIR, 3D T1-w, 3D T1-w GADO, 2D DP, and 2D T2 sequences are provided for each patient [17]. These images have been divided into two subsets: 15 patients for training and 38 patients for testing. The MRI images from Center03 are only included in the testing set to evaluate the generalization and robustness of the model. The MS lesions have been manually delineated by seven experts, and consensus mask data has been computed from their outputs for each patient. The voxel
Fig. 2. A framework for MS lesion segmentation using the proposed model with 2D slices extracted from 3D MR images of multiple sequences including the output
of the final 3D binary mask.
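The fusion step illustrated in Fig. 2 combines the three per-orientation 3D predictions into one final binary mask by voxel-wise majority voting. The function below is a minimal NumPy sketch of that idea, not the authors' released code; the array names are illustrative.

```python
import numpy as np

def majority_vote(axial: np.ndarray, sagittal: np.ndarray, coronal: np.ndarray) -> np.ndarray:
    """Fuse three binary 3D predictions (one per plane orientation) into a
    final 3D binary mask: a voxel is labeled as lesion when at least two of
    the three orientation-wise predictions mark it as lesion."""
    votes = axial.astype(np.uint8) + sagittal.astype(np.uint8) + coronal.astype(np.uint8)
    return (votes >= 2).astype(np.uint8)
```

In practice the per-orientation 2D predictions would first be re-stacked into 3D volumes of identical shape before this voting step is applied.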
size of each MRI scan ranges from 1 × 0.5 × 0.5 to 1.25 × 1.04 × 1.0 mm³ in this dataset. Raw and preprocessed MR images for each patient have been provided by the challenge organizers as well.

In this study, the provided preprocessed dataset was exploited to evaluate the generalization and robustness of the proposed model on unseen centers and scanners. For the provided preprocessed data, the non-local means algorithm for denoising, the block-matching registration approach for rigid registration, brain extraction using the volBrain platform, and the N4 algorithm for bias field correction were performed by the challenge organizers [17]. In addition to these steps, we performed intensity normalization on each 3D MRI as in the ISBI2015 dataset. Afterward, each 2D slice extracted from each 3D MRI, which has different spatial dimensions, is resized to a 224 × 224 shape using nearest-neighbor interpolation while keeping the original range of values [47], since the proposed model depends on a fixed input size. Similar to the ISBI2015 dataset, only slices having at least one lesion were chosen to remove excessively unbalanced data and non-informative samples.

2.2. Data preparation

Fig. 2 shows the whole framework of how 3D MRI data associated with MS disease are processed. Although the FLAIR sequence has a higher contrast between lesions and white matter than the others, the other modalities also provide useful features like location and shape. Thus, a stack of 2D slices was generated using the corresponding views obtained from each plane orientation of the FLAIR, T1-w, and T2-w sequences per channel. Then, the stacked 2D slices of all plane orientations were concatenated to form the training input data. For the ISBI2015 dataset, the manual segmentation masks of both raters were combined to create a single training set. The total number of 2D slices extracted from the 3D MRI data of each rater was 5197 and 5716, resulting in a total of 10913 2D slices. Later, the total input data was divided into 90% for training and 10% for validation, yielding 9821 slices for the training set and 1092 slices for the validation set before feeding the proposed model. For the MSSEG2016 dataset, a total of 5414 2D slices were obtained from the 15 3D MR images in the provided training set. Afterward, the division into training and validation sets was performed similarly to the ISBI2015 dataset, resulting in 4872 slices for the training set and 542 slices for the validation set.

Data augmentation is a process of artificially increasing the amount of data by performing random realistic transformations to make training robust, properly improving prediction accuracy and reducing overfitting. In this study, data augmentation was performed during training using Albumentations, an open-source library for image augmentations [11]. In particular, random 90-degree rotation (p=0.5), vertical flip, horizontal flip, random crop, transpose (p=0.5), shift-scale-rotate (shift_limit=0.01, scale_limit=0.04, rotate_limit=0, p=0.25), random brightness contrast (p=0.5), random gamma (p=0.25), emboss (p=0.25), blur (p=0.01, blur_limit=3), and one of elastic transform (p=0.5, alpha=120, sigma=120 × 0.05, alpha_affine=120 × 0.03), grid distortion (p=0.5), and optical distortion (p=1, distort_limit=2, shift_limit=0.5) were used.

CNN transfer learning from another domain (not medical) to medical domains is usually exploited to initialize training weights in segmentation tasks due to the data scarcity in this domain. However, in this study, training weights obtained from the ISBI2015 dataset (medical domain) were exploited to initialize training weights on the MSSEG2016 as a transfer learning strategy to improve the segmentation performance, generalization, and robustness of the model.

2.3. Network architecture details

The proposed model integrates different components into a modified U-Net architecture. U-Net was chosen since it performs well across different domains in image segmentation, especially the medical domain [37]. It is an encoder-decoder architecture for semantic segmentation with skip connections, consisting of a contracting path, a bottleneck, and an expansive path. In the contracting path, the input image is encoded into feature representations at multiple levels with convolution blocks followed by a max pooling operation. In the expansive path, upsampling operations expand the feature dimensions, which are concatenated with the corresponding features from the contracting path through the skip connections to better learn representations. The bottleneck, consisting of two 3x3 convolutions, enables propagating features from the contracting path to the expansive path. The standard U-Net architecture consists of four downsampling and four upsampling blocks, with skip connections added between each corresponding pair of blocks. Each block of U-Net is comprised of two 3x3 convolutions, each followed by a rectified linear unit (ReLU) activation. Besides, a 2x2 max pooling operation is applied in each downsampling block, while a 2x2 up-convolution is applied in each upsampling block to change the spatial dimensions of the input feature map. At the final layer of U-Net, each pixel value is predicted in the range of 0 to 1 by
employing a 1x1 convolution followed by a sigmoid activation function [37]. To improve the performance of MS lesion segmentation, the U-Net architecture was modified with batch normalization (BN), spatial dropout (SD), an exponential linear unit (ELU) as the activation function, strided convolutions for the pooling operations, and transposed convolutions for the upsampling operations. BN standardizes the input to a layer for each mini-batch, enhances convergence, and reduces overfitting. The ELU activation used in this study is given in equation (1) as follows [15]:

$$\mathrm{ELU} = f(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha(\exp(x) - 1) & \text{if } x \le 0 \end{cases} \tag{1}$$

where $\alpha$ is a positive scale factor to control negative $x$ values.

In addition to these modifications, the 3x3 convolutions in each block were replaced by a residual block that handles performance and degradation problems [20]. A full pre-activation residual unit was implemented according to He et al. [21], and the residual blocks in each block were densely connected as presented in Fig. 3. Furthermore, the residual block was modified with ELU and the ECA module. Attention gates, which guide the model's attention to salient features while ignoring irrelevant areas in the given feature map [35], were modified by adding ELU and BN. To suppress feature activations in irrelevant regions of the image, AGs are implemented at the skip connections before concatenating the features passed through the skip connections with the upsampling operations, as shown in Fig. 3. In addition, the ECA module proposed by Wang et al. [48] was incorporated into the proposed model, as given in detail in Fig. 4(a). This module was appended to the end of each downsampling block and each residual block, as shown in Fig. 3. It produces channel weights based on the features aggregated from global average pooling (GAP) by applying a fast 1D convolution operation of size k, which is adaptively determined by the channel dimension. Later, BN and a sigmoid activation function are applied, respectively. Another important component is the modified ASPP semantic segmentation module. The bottleneck of U-Net was replaced by this module, which resamples a given feature layer at multiple rates before the convolution operation [13]. ASPP has demonstrated promising results on numerous segmentation tasks by providing multi-scale information. Hence, ASPP was exploited to acquire useful multi-scale information for the MS lesion segmentation task. Fig. 4(b) shows the ASPP component implemented in this study. This component is comprised of four 3x3 convolutional layers with dilation rates of 1, 6, 12, and 18, each followed by BN. Later, the extracted feature maps from each convolutional layer are added, and a 1x1 convolution, BN with L2 regularization of gamma and beta, and ELU are applied, respectively.

2.4. Metrics

The ISBI2015 and MSSEG2016 challenges use several evaluation metrics to assess segmentation quality. The common metrics in both challenges are the Dice and PPV scores. In addition, LTPR, lesion-wise false positive rate (LFPR), and absolute volume difference (VD) metrics are provided for the ISBI2015, while the F1 score, sensitivity, and specificity metrics are provided for the MSSEG2016. The results of the ISBI2015 can be obtained via the online website by submitting predicted 3D binary masks. However, to evaluate the model performance on the MSSEG2016 testing set, the segmentation performance analyzer tool available in Anima (animaSegPerfAnalyzer)3 is used. To obtain evaluation metric results, each final 3D binary mask obtained from the majority voting method and its corresponding ground truth provided by the MSSEG2016 challenge organizers are given as input to the animaSegPerfAnalyzer tool [16].

3 Anima scripts: RRID SCR_017072 https://anima.irisa.fr/.

The Dice Similarity Coefficient (DSC) is the most commonly used metric for validating MS lesion segmentation. It is generally exploited in image segmentation tasks across different domains, including medical image analysis [34,5]. DSC is employed to measure the overall segmentation accuracy between the predicted segmentation and the ground truth, and is formulated as follows:

$$DSC = \frac{2TP}{2TP + FP + FN} \tag{2}$$

where $TP$, $FP$, and $FN$ indicate the true positive, false positive, and false negative voxels, respectively.

To compare our model with other models in the literature, LTPR and LFPR are also exploited for all experiments. LTPR is expected to be a higher percentage while LFPR is expected to be a lower percentage, and these metrics are calculated as follows:

$$LTPR = \frac{LTP}{RL} \qquad LFPR = \frac{LFP}{PL} \tag{3}$$

where $LTP$ is the number of lesions in the ground truth that overlap with a lesion in the predicted segmentation, and $RL$ denotes the total number of lesions in the ground truth. $LFP$ is the number of lesions in the predicted segmentation that do not overlap with a lesion in the ground truth reference mask, and $PL$ indicates the total number of lesions in the predicted segmentation.

The ISBI Challenge website also provides additional metrics for performance evaluation, such as PPV and VD [12]. The PPV and VD metrics can be formulated as follows:

$$PPV = \frac{TP}{TP + FP} \qquad VD = \frac{|TP_s - TP_{gt}|}{TP_{gt}} \tag{4}$$

where $TP_s$ and $TP_{gt}$ indicate the number of segmented voxels in the automatic segmentation output and the ground truth, respectively.

The ISBI challenge exploits some of these metrics in the ISBI score calculation. The total score used on the official ISBI website is calculated as follows:

$$Score = \frac{1}{|R| \cdot |S|} \sum_{R,S} \left( \frac{DSC}{8} + \frac{PPV}{8} + \frac{1 - LFPR}{4} + \frac{LTPR}{4} + \frac{Cor}{4} \right) \tag{5}$$

where $S$ is all test subjects (14 test subjects consisting of 61 3D scans), $R$ is all raters (Rater 1 and Rater 2), and $Cor$ is Pearson's correlation coefficient of the volumes. The inter-rater score was computed as 90, which means a score of 90 or higher can be considered comparable to a human rater [12].

In addition to these metrics, the F1 score, sensitivity, and specificity were employed to evaluate the segmentation performance of the proposed model on the MSSEG2016 dataset. The F1 score is a detection and lesion-wise metric that focuses on the number of lesions correctly recognized without taking into account the precision of the contours of the detected lesions. Lesion sensitivity ($S$) and lesion positive predictive value ($P$) are exploited to calculate the F1 score defined in equation (6) according to Commowick et al. [16]:

$$S = \frac{TP_G}{M} \qquad P = \frac{TP_A}{N} \qquad F1 = \frac{2SP}{S + P} \tag{6}$$

where $M$, $N$, $TP_G$, and $TP_A$ indicate the number of lesions in the ground truth, in the automatic segmentation, overlapped in the ground truth, and overlapped in the automatic segmentation, respectively.

Sensitivity ($S_e$) and specificity ($S_p$) are defined through the voxel-based overlap of the automatic segmentation ($A$) and the ground truth ($G$). These two metrics are computed as follows:

$$S_e = \frac{A \cap G}{G} \qquad S_p = \frac{B - A \cup G}{B - G} \tag{7}$$

where $B$ denotes the entire image.

2.5. Implementation details

The ISBI2015 and MSSEG2016 training sets are composed of 21 and 15 3D MR images, respectively. FLAIR, T1-w, and T2-w sequences are also provided for both datasets. To prepare the training input data, 2D slices
Fig. 3. The dense residual U-Net with AG, ECA, and ASPP using multi-planar 2D slices of FLAIR, T1-w, and T2-w sequences. The number of filters starts from 32 and doubles up to 512. In the residual block, $x_l$ and $x_{l+1}$ denote the input and output of the $l$-th unit. In the attention gate, $x_l$, $g$, and $\alpha$ denote the input feature map of layer $l$, the gating signal providing contextual information, and the attention coefficient used to scale $x_l$, respectively.
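The attention-gate mechanism described in the Fig. 3 caption (scaling the skip-connection features $x_l$ by a coefficient $\alpha$ computed from $x_l$ and the gating signal $g$) can be sketched in NumPy as below. This is a simplified illustration of the general additive scheme of Oktay et al. [35], with 1x1 convolutions written as matrix products; the paper's variant additionally uses ELU and BN, which are omitted here, and the weight names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_gate(x, g, Wx, Wg, psi):
    """Additive attention gate over (H, W, C) feature maps.
    x:   skip-connection features, shape (H, W, C)
    g:   gating signal from the coarser decoder level, shape (H, W, C)
    Wx, Wg: (C, F) weight matrices acting as 1x1 convolutions
    psi: (F,) projection producing one attention coefficient per position."""
    q = np.maximum(x @ Wx + g @ Wg, 0.0)   # additive attention followed by ReLU
    alpha = sigmoid(q @ psi)               # (H, W) coefficients in (0, 1)
    return x * alpha[..., None]            # suppress irrelevant regions of x
```

Because every attention coefficient lies in (0, 1), the gated output never exceeds the magnitude of the original skip-connection features.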
Fig. 4. a) Diagram of the ECA module. GAP is used to obtain the aggregated features from the given input. In ECA, a fast 1D convolution of size k is performed to obtain channel weights, where a mapping of the channel dimension C is used to adaptively calculate k [48]. b) Diagram of the ASPP module. From the given features, ASPP performs multi-rate convolution operations to obtain multi-scale information.
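The ECA operation in Fig. 4(a) can be illustrated with a minimal NumPy sketch: global average pooling yields a channel descriptor, a 1D convolution of adaptive size k models local cross-channel interaction, and a sigmoid produces the per-channel weights. This is a didactic approximation, not the trained module: the 1D kernel weights are fixed here (in the real model they are learned), and the BN the paper inserts before the sigmoid is omitted.

```python
import numpy as np

def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptive 1D kernel size from the channel dimension C, following the
    mapping of Wang et al. [48]: k = |(log2(C) + b) / gamma|, odd."""
    t = int(abs((np.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1

def eca(x: np.ndarray) -> np.ndarray:
    """Efficient channel attention over an (H, W, C) feature map."""
    h, w, c = x.shape
    k = eca_kernel_size(c)
    gap = x.mean(axis=(0, 1))                         # (C,) aggregated features
    padded = np.pad(gap, k // 2, mode="edge")
    kernel = np.full(k, 1.0 / k)                      # fixed weights, illustrative only
    conv = np.convolve(padded, kernel, mode="valid")  # fast 1D cross-channel conv
    weights = 1.0 / (1.0 + np.exp(-conv))             # sigmoid channel weights
    return x * weights                                # rescale channels, broadcast over H, W
```

For example, with C = 512 (the deepest filter count in Fig. 3) this mapping gives k = 5, while C = 32 gives k = 3.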
were extracted from these 3D MR images for each sequence according to the three orthogonal directions. Then, a three-channel input feature map was generated by leveraging each corresponding 2D slice obtained from the plane orientations of the three sequences, as discussed previously. Afterward, the obtained 2D slices were divided into training and validation sets with a ratio of 90 to 10 for each dataset. The proposed model was implemented in the Python language (https://www.python.org/) using Keras (https://keras.io/) running on top of TensorFlow (https://www.tensorflow.org/) [14,1]. All experiments were conducted on Google Colab Pro, which provides an NVIDIA Tesla P100 GPU with 16 GB of memory [9]. The models were trained using the Adam optimizer [25] with an initial learning rate of 1e-4 (adjusted during training with patience = 10, factor = 0.1, cooldown = 10, and min_lr = 1e-5) and a batch size of 8 over 300 epochs. BN was implemented with gamma and beta regularizers using L2 (1e-4). Class imbalance is a common problem in MS lesion segmentation, as lesions constitute only a minority of the MRI volume [50]. Therefore, a hybrid loss function, the addition of the focal and Dice losses [42,28], was exploited as the training loss function to handle the class imbalance problem. The final loss function is given in equation (8):

𝐿(𝑔𝑡, 𝑝𝑟) = (1 − (2 𝑔𝑡 𝑝𝑟 + 1) / (𝑔𝑡 + 𝑝𝑟 + 1)) + (−𝑔𝑡 𝛼 (1 − 𝑝𝑟)^𝛾 log(𝑝𝑟) − (1 − 𝑔𝑡) 𝑝𝑟^𝛾 log(1 − 𝑝𝑟))    (8)

where 𝑔𝑡 and 𝑝𝑟 indicate the ground truth and the predicted segmentation, respectively. The values of 𝛼 and 𝛾 are 0.25 and 2.0, respectively. The training dataset was divided into training and validation sets to adjust the network weights and to apply the early stopping criterion. The best model was selected based on the validation loss, and the weights of the model were saved at the end of the epoch where the validation loss was at its minimum during training. Moreover, early stopping was exploited to avoid overfitting by automatically stopping the training when the validation loss stopped improving for 50 epochs. To yield the final 3D segmentation output, the 3D binary predictions generated from the 2D slices of each plane orientation were fused using a majority voting method, as shown in Fig. 2.
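Equation (8) and the majority-vote fusion described above can be sketched as follows. This is one NumPy reading of the text, assuming the Dice term is computed over the flattened volume and the focal term is averaged per voxel, with 𝛼 = 0.25 and 𝛾 = 2.0; it is not the authors' released code.

```python
import numpy as np

def hybrid_loss(gt, pr, alpha=0.25, gamma=2.0, eps=1e-7):
    """Equation (8): Dice loss (with +1 smoothing) plus focal loss.

    gt is a binary ground-truth mask; pr holds probabilities in (0, 1).
    """
    gt = gt.ravel().astype(float)
    pr = np.clip(pr.ravel().astype(float), eps, 1.0 - eps)
    dice = 1.0 - (2.0 * np.sum(gt * pr) + 1.0) / (np.sum(gt) + np.sum(pr) + 1.0)
    focal = np.mean(-gt * alpha * (1.0 - pr) ** gamma * np.log(pr)
                    - (1.0 - gt) * pr ** gamma * np.log(1.0 - pr))
    return dice + focal

def fuse_majority(axial, sagittal, coronal):
    # Majority voting over the three per-orientation binary predictions:
    # a voxel is a lesion if at least two orientations agree.
    votes = axial.astype(int) + sagittal.astype(int) + coronal.astype(int)
    return (votes >= 2).astype(np.uint8)

gt = np.array([1, 0, 1, 0, 0, 0, 1, 0])
good = np.where(gt == 1, 0.9, 0.1)   # close prediction
poor = np.where(gt == 1, 0.1, 0.9)   # inverted prediction
fused = fuse_majority(np.array([1, 0, 1]), np.array([1, 1, 0]), np.array([0, 0, 1]))
```

With these toy arrays, the close prediction yields a much smaller loss than the inverted one, which is the behavior the hybrid loss is meant to enforce on sparse lesion masks.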
Table 1
Ablation study results with different variants of our model trained on the ISBI2015 training set and tested on the ISBI2015 testing set. To be ranked first, the ISBI score, Dice score, PPV, and LTPR are expected to have high numerical values, while the LFPR and VD are expected to have low numerical values.
Model | ISBI Score | Dice Score | PPV | LTPR | LFPR | VD
(A) Dense Residual U-Net with AG, ECA, and ASPP | 92.75 | 0.6688 | 0.8650 | 0.6064 | 0.2617 | 0.3882
(B) Dense Residual U-Net with AG and ECA | 92.63 | 0.6653 | 0.8608 | 0.6132 | 0.2810 | 0.3845
(C) Dense Residual U-Net with AG | 92.52 | 0.6653 | 0.8517 | 0.6153 | 0.2924 | 0.3847
(D) Dense Residual U-Net | 92.54 | 0.6629 | 0.8569 | 0.6174 | 0.2919 | 0.3874
(E) Dense U-Net | 92.41 | 0.6652 | 0.8485 | 0.6092 | 0.2983 | 0.3825
(F) U-Net | 92.31 | 0.6613 | 0.8434 | 0.6080 | 0.3047 | 0.3850
1 This table is sorted in descending order of the ISBI score.
2 The bold value indicates the best score among the generated models.
3 Each model will be denoted by the letters A-F in subsequent analyses.
Fig. 5. Evaluation metric results on the ISBI2015 testing set for each variant of the proposed model. (For interpretation of the colors in the figure(s), the reader is
referred to the web version of this article.)
Fig. 6. The boxplot of Dice and PPV scores for each patient across raters on the ISBI2015 testing set for the proposed model and its variants. Asterisks indicate
statistical significance (* p <= 0.05, ** p <= 0.01, *** p <= 0.001, and **** p <= 0.0001) when using a Wilcoxon test compared to the proposed model
(denoted as A).
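The asterisks in Fig. 6 come from paired Wilcoxon signed-rank tests between the proposed model and each variant. With SciPy, such a per-patient paired test can be run as below; the Dice values here are made-up placeholders, not the study's data.

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical per-patient Dice scores for the proposed model (A) and a
# variant (F); the real values come from the ISBI2015 testing set.
dice_model_a = np.array([0.71, 0.65, 0.68, 0.70, 0.66, 0.69, 0.63, 0.72, 0.67, 0.70])
dice_model_f = np.array([0.69, 0.64, 0.66, 0.69, 0.65, 0.67, 0.62, 0.70, 0.66, 0.68])

# Paired, two-sided test on the per-patient differences
stat, p_value = wilcoxon(dice_model_a, dice_model_f)
```

The significance stars in the figure then correspond to thresholds on this p-value (e.g., * for p <= 0.05).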
Fig. 7. The segmentation results of each variant of the proposed model on the ISBI2015 testing set. A slice of axial, sagittal, and coronal views of FLAIR from
timepoint 1 of test subject 9.
of 0.3653 was obtained by Maier and Handels [33], while our method achieved the second-best VD of 0.3882. In terms of the ISBI score, the best and second-best overall scores, 93.32 and 93.21, were achieved by Zhang et al. [53] and Zhang et al. [52], respectively.

3.3. MSSEG2016 dataset comparison

Table 3 summarizes the performance of the proposed model on the MSSEG2016 testing dataset for each center. This dataset was gathered from four different centers to evaluate the generalization and robustness of the model on unseen centers. Hence, the training set contains data from three centers, excluding center03. Although the testing data consist of 38 patients, one case is an outlier because five of the seven experts delineated no visible lesions. Therefore, this outlier case (Center07-Patient08) was excluded while measuring all metrics used in this study for the experts and our model. The proposed model produced the highest Dice score on the testing data of center01. In terms of F1 score and sensitivity, the proposed model achieved the best scores of 0.6556 and 0.7793 on the Center07 data, respectively. The best PPV was obtained with a mean of 0.7315 on the Center01 data. In terms of specificity, the proposed model achieved a similar score of around 0.9998 for all centers.

Table 4 summarizes the comparison of the proposed model and the experts on the MSSEG2016 testing dataset. The proposed model performed similarly to the segmentation output of the experts and even better than the results of some experts. As such, the average Dice score of the proposed model was 0.6727, whereas expert3, expert6, and expert7 produced mean Dice scores of 0.6724, 0.6717, and 0.6690, respectively. In terms of PPV, sensitivity, and specificity, our model produced mean scores of 0.6519, 0.7440, and 0.9997, respectively. The sensitivity scores of expert3, expert6, and expert7 were obtained as means of 0.7206, 0.6136, and 0.6867, respectively. In terms of specificity, the proposed model and the experts achieved similar scores. Fig. 8 presents the qualitative results of our proposed method on the MSSEG2016 testing set. Accordingly, the proposed method achieves good performance in detecting MS lesions. Additionally, it is observed that the proposed model can detect more MS
Table 2
Comparison of the results of this study with the published results of other state-of-the-art methods for the ISBI2015 dataset. To be ranked first and second, the ISBI score, Dice score, PPV, and LTPR are expected to have high numerical values, while the LFPR and VD are expected to have low numerical values.

Table 3
Results comparison of the proposed model for each center on the MSSEG2016 test set. To be ranked first, all metrics are expected to have high numerical values.

Table 4
Comparison of the results of this study with the manual delineations of the experts for the MSSEG2016 testing set. All metrics are expected to have high numerical values to be considered the best and second-best scores.
Model | Dice Score | F1 Score | PPV | Sensitivity | Specificity
Expert5 | 0.7819 | 0.8928 | 0.7359 | 0.8518 | 0.9997
Expert4 | 0.7590 | 0.8619 | 0.6837 | 0.8677 | 0.9996
Expert1 | 0.7428 | 0.8509 | 0.7046 | 0.8090 | 0.9997
Expert2 | 0.6961 | 0.8141 | 0.5912 | 0.8736 | 0.9994
Our Model | 0.6727 | 0.5883 | 0.6519 | 0.7440 | 0.9997
Expert3 | 0.6724 | 0.6782 | 0.6575 | 0.7206 | 0.9997
Expert6 | 0.6717 | 0.7544 | 0.8051 | 0.6136 | 0.9998
Expert7 | 0.6690 | 0.6561 | 0.6980 | 0.6867 | 0.9997
1 The results of the experts can be accessed at https://doi.org/10.5281/zenodo.1307653.
2 This table is sorted in descending order of the Dice score.
3 Bold and underlined values indicate the best and second-best scores among the proposed method and the experts, respectively.

lesions in the data of all centers compared to the outputs of some experts.
Fig. 9 demonstrates the barplot of the Dice score, PPV, sensitivity, and specificity for our model and the experts who delineated the structures, allowing a visual comparison as well. In addition, Fig. 10 shows the boxplot of the Dice score, F1 score, PPV, and sensitivity of each patient on the testing set for the experts and our model, together with the statistical significance test.

3.4. Cross-dataset comparison

The cross-dataset robustness and generalization ability of our proposed approach were assessed using the segmentation outputs produced by the proposed model on different datasets. First, we trained on the ISBI2015 dataset and tested on the MSSEG2016. Then, we trained on the MSSEG2016 dataset and tested on the ISBI2015 via the online challenge website by submitting 3D binary output masks. Table 5 summarizes the performance for each dataset in terms of Dice score, PPV, and sensitivity, which are evaluation metrics common to both datasets. Accordingly, the prediction results of the MSSEG2016 model on the ISBI2015 testing set achieved a mean Dice score of 0.4819, a mean PPV of 0.9450, and a mean sensitivity of 0.3540, respectively. On the other hand, the prediction results of the ISBI2015 model on the MSSEG2016 testing set obtained a mean Dice score of 0.6031, a mean PPV of 0.7011, and a mean sensitivity of 0.5797, respectively.

Additionally, we compared the segmentation results of the MSSEG2016 model on the ISBI2015 testing set with other methods in the literature. The results were obtained from the challenge website to make a fair comparison with the others. Table 6 shows the numerical details of the results compared with two previously proposed methods. The results of these two methods were obtained from the study of Kamraoui et al. [23]. For our model, the evaluation metric results of ISBI score, PPV, LTPR, LFPR, and Cor were obtained as 91.84, 0.4819, 0.9450, 0.1604, and 0.8398, respectively. As a result, the proposed model obtained the highest scores in terms of the ISBI score, PPV, LTPR, and Cor. As for the Dice score and LFPR, the highest Dice score of 0.5350 was achieved by Kamraoui et al. [23], while the best LFPR of 0.0750 was obtained by Zhang et al. [52].

4. Discussion

In this study, an automated framework for MS lesion segmentation was designed using 3D MRI data of FLAIR, T1-w, and T2-w sequences. The proposed model within this framework was developed by modifying the U-Net architecture. First, the network was modified by adding BN, SD, ELU, strided convolutions for the pooling operations, and transposed convolutions for the upsampling operations. Then, dense connections, residual blocks, AG, ECA, and ASPP were incorporated to improve overall MS lesion segmentation performance, measured by several common metrics such as the Dice score and LTPR. This model achieved advantages through the modification of the encoder-decoder network together with modified components, such as replacing the bottleneck of the network with atrous convolutions with different dilation rates to extract multi-scale contextual information from the given feature map. In the ablation study (see Table 1 and Fig. 5), the proposed model, which contains all components, outperformed its variants. Additionally, this model outperformed other proposed methods in the literature in terms
Fig. 8. The segmentation results of the proposed model on the MSSEG2016 testing set compared to the manual delineation of experts and their consensus ground
truth. A slice of axial, sagittal, and coronal views of FLAIR from each center is demonstrated.
Fig. 9. Evaluation metric results on the MSSEG2016 testing dataset for our model and experts. The figure is sorted in descending order of the Dice score.
Table 5
Cross-dataset results comparison, namely training on the ISBI2015 and testing on the MSSEG2016, or vice versa. Dice score, PPV, and sensitivity are expected to have high numerical values.
Table 6
Results comparison with other methods when trained on the MSSEG2016 dataset and tested on the ISBI2015 testing set. To be ranked first and second, the ISBI score, Dice score, PPV, LTPR, and Cor are expected to have high numerical values, while the LFPR is expected to have a low numerical value.
Fig. 10. The boxplot of Dice score, F1 score, PPV, and sensitivity evaluation metrics of each patient for experts and the proposed model on the MSSEG2016 testing
set. Asterisks and ns indicate statistical significance (* p <= 0.05, ** p <= 0.01, *** p <= 0.001, **** p <= 0.0001, and ns (non-significance)) when using a
Wilcoxon test compared to our model.
of Dice score and LTPR, according to the results obtained from ISBI2015 as given in Table 2. Moreover, this model outperformed the results of some experts on the MSSEG2016 testing data, especially for the Dice score, PPV, sensitivity, and specificity, as presented in Table 4.

Although a lower number of modalities might be sufficient to evaluate MS lesions, automated MS lesion segmentation should consider using different modalities, such as T1-w and T2-w, to improve the segmentation performance. These modalities provide additional information, such as the location and shape of MS lesions. All MS benchmark datasets provide different modalities for better segmentation. Our model achieved a competitive performance using three modalities compared to recently published state-of-the-art methods, as given in Table 2, even on unseen datasets. According to the results of the ISBI2015 testing set, the proposed model improved the Dice score by an average of 3.43% and the LTPR by an average of 13.77% compared to the second-best scores. Additionally, data augmentation strategies were realized to build a robust model, improve prediction accuracy, and reduce overfitting during training. Therefore, we applied the data augmentation methods discussed previously, rather than simple strategies (rotating and flipping), to improve the performance, generalization, and robustness of the proposed model.

Our work improved MS lesion segmentation performance through the use of different components in the encoder-decoder network. Our modifications to the U-Net and the other components used to build the proposed model obtained competitive performance on most of the evaluation metrics, as presented in Tables 1, 2, and 4. Besides, a whole-brain slice-based approach was exploited because patch-based CNNs lose spatial information about MS lesions due to patch size limitations [4,53]. The results indicated that our approach using whole-brain slices achieved competitive performance on most measures, such as Dice score and LTPR, compared to other methods and the experts, as presented in Tables 2 and 4. In particular, the results obtained from the MSSEG2016 testing set have shown that the proposed model outperformed the manual delineation of some experts (see Fig. 9). According to the results of the MSSEG2016 testing set, our model improved the Dice score by 0.5%, the PPV score by 10.26%, and the sensitivity score by 21.25% compared to the results of the experts who achieved lower scores among the evaluation metrics. Furthermore, Fig. 8 exhibits the detected lesions visually, showing that the proposed model could detect lesions better than some experts, even on an unseen center. We observed that transferring weights obtained from the ISBI2015 training set is an effective approach for training our model on the MSSEG2016 training set. Indeed, this approach improved MS lesion segmentation performance on the MSSEG2016 testing set, as given in Table 4. Another observation was that MS lesion segmentation performance improved significantly when the corresponding axial, sagittal, and coronal slices obtained from the three modalities were stacked into the input channels. Although Aslani et al. [4] emphasized that stacking corresponding modalities together into a three-channel input is not an optimal solution, the proposed model with modalities stacked along the input channel dimension enhanced the segmentation performance compared with their results presented in Table 2. This network exploited the contextual information in all plane directions and obtained useful features of the location and shape of the lesions through channel-wise stacking.

The Dice score between two raters who delineated the same structures is relatively low compared to automated methods evaluated on the ISBI2015 testing set. The average Dice score across raters is approximately 73%, and an ISBI score of 90 or higher can be considered comparable to a human rater, as mentioned in Carass et al. [12]. According to the results obtained from the official ISBI test set, our proposed model and its variants produced scores over 90, as presented in Table 1. Moreover, our model and its variants achieved a competitive segmentation performance compared to the results across raters for
other metrics, such as a mean Dice score of 0.6688, a mean LTPR of 0.6064, and a mean LFPR of 0.2617, as presented in Table 1. Indeed, rater#1 produced a mean LTPR of 0.6450 and a mean LFPR of 0.1740 on the ground truth of rater#2, while rater#2 produced a mean LTPR of 0.8260 and a mean LFPR of 0.3550 on the ground truth of rater#1 [12].

MS lesion appearance can vary significantly based on the scanner manufacturer and imaging protocol. Aggregating additional MRI data would generalize DL models for better segmentation outputs in clinical setups, since DL models are better trained with data from a large number of patients [32]. Although the segmentation of a 3D scan requires several steps, such as extracting 2D slices, processing these slices individually, and reconstructing a 3D binary output mask, 2D CNNs using 3D MRI data still achieve state-of-the-art results compared to 3D-based CNN methods. According to the results of this study, it is observed that processing 3D MRI data by converting it into 2D slices still outperformed other approaches, especially in terms of Dice score and LTPR [24,53,23]. According to Altay et al. [2], when clinical raters with different levels of experience assessed MS lesions on an MS dataset for no more than 10 minutes per study, notable variability in manual lesion counting was observed among these raters. Hence, training the proposed model with a larger dataset would reduce the variability of experts and even decrease the lesion detection time and effort, since processing a 3D scan takes around 35 seconds on a mid-range GPU in our study. This is faster than the approach of Aslani et al. [4], which segments the input image in approximately 90 seconds. Overall, this study has provided clues about recent techniques regarding accuracy that can be used to guide future research on MS lesion segmentation using multiple datasets.

5. Conclusions

In this study, we proposed a novel dense residual U-Net that combines modified AG, ECA, and ASPP modules to improve the segmentation of MS lesions. Two publicly available datasets, namely the ISBI2015 and MSSEG2016, were employed to validate the proposed model's segmentation performance, generalization ability, and robustness. 2D slices of axial, sagittal, and coronal views extracted from 3D volumetric scans of FLAIR, T1-w, and T2-w sequences were exploited jointly to obtain contextual information in all directions and complementary information related to MS lesions. The 2D slices extracted from the corresponding orientation of each modality were stacked to form a three-channel input feature map. Then, all the stacked 2D slices were aggregated to form the training input data. Additionally, for the ISBI2015 dataset, the manual delineations of the two raters were concatenated to form a single training set, which increases the number of training samples. Data augmentation methods from the Albumentations library were employed to build a robust model, improve prediction accuracy, and reduce overfitting during training. While manual delineation of MS lesions is time-consuming, costly, and subject to variability across experts, DL methods, which learn features from their input data during training, can automatically assist in MS lesion segmentation and detection, reducing cost and variability. Thus, CNN-based deep learning methods were used in this study for accurate automatic MS lesion segmentation. Furthermore, whole-brain slice-based segmentation of MS lesions gave promising results on most of the metrics in all experiments and comparisons. The proposed model was also compared with other methods that employed these two datasets. The results on the MSSEG2016 testing set were used to compare the proposed model's segmentation performance, generalization, and robustness against the results of the experts and each center. Additionally, the ISBI2015 results obtained from the official test set via the online website were exploited to make a fair comparison with other methods. The results showed promising outputs on most of the metrics when compared to the inter-rater evaluation results and the scores of other methods. We have also presented a quantitative evaluation comparing the consistency of the two datasets. Finally, the ISBI score of 92.75 is considered comparable to a human rater, as discussed previously. As such, our proposed model can be used for different segmentation problems in image analysis. In future studies, different large datasets will be investigated and used to improve the segmentation, generalization ability, and robustness of deep learning models and the proposed model, although it is challenging to obtain such datasets.

CRediT authorship contribution statement

Beytullah Sarica: Conceptualization, Methodology, Data acquisition, Software, Experiments, Analysis and interpretation of the results, Writing - original draft, and Editing.
Dursun Zafer Seker: Conceptualization, Methodology, Analysis and interpretation of the results, Writing - original draft, and Editing.
Bulent Bayram: Conceptualization, Revising - original draft, and Editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G.S. Corrado, A. Davis, J. Dean, M. Devin, et al., Tensorflow: large-scale machine learning on heterogeneous distributed systems, preprint, arXiv:1603.04467, 2016.
[2] E.E. Altay, E. Fisher, S.E. Jones, C. Hara-Cleaver, J.C. Lee, R.A. Rudick, Reliability of classifying multiple sclerosis disease activity using magnetic resonance imaging in a multiple sclerosis clinic, JAMA Neurol. 70 (2013) 338-344.
[3] S. Aslani, M. Dayan, V. Murino, D. Sona, Deep 2d encoder-decoder convolutional neural network for multiple sclerosis lesion segmentation in brain mri, in: International MICCAI Brainlesion Workshop, Springer, 2018, pp. 132-141.
[4] S. Aslani, M. Dayan, L. Storelli, M. Filippi, V. Murino, M.A. Rocca, D. Sona, Multi-branch convolutional neural network for multiple sclerosis lesion segmentation, NeuroImage 196 (2019) 1-15.
[5] H.L. Aung, B. Uzkent, M. Burke, D. Lobell, S. Ermon, Farm parcel delineation using spatio-temporal convolutional networks, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020, pp. 76-77.
[6] C. Baecher-Allan, B.J. Kaskow, H.L. Weiner, Multiple sclerosis: mechanisms and immunotherapy, Neuron 97 (2018) 742-768.
[7] B.D. Basaran, P.M. Matthews, W. Bai, New lesion segmentation for multiple sclerosis brain images with imaging and lesion-aware augmentation, Front. Neurosci. 16 (2022).
[8] A. Birenbaum, H. Greenspan, Longitudinal multiple sclerosis lesion segmentation using multi-view convolutional neural networks, in: Deep Learning and Data Labeling for Medical Applications, Springer, 2016, pp. 58-67.
[9] E. Bisong, Building Machine Learning and Deep Learning Models on Google Cloud Platform: A Comprehensive Guide for Beginners, Apress, 2019.
[10] T. Brosch, L.Y. Tang, Y. Yoo, D.K. Li, A. Traboulsee, R. Tam, Deep 3d convolutional encoder networks with shortcuts for multiscale feature integration applied to multiple sclerosis lesion segmentation, IEEE Trans. Med. Imaging 35 (2016) 1229-1239.
[11] A. Buslaev, V.I. Iglovikov, E. Khvedchenya, A. Parinov, M. Druzhinin, A.A. Kalinin, Albumentations: fast and flexible image augmentations, Information 11 (2020) 125.
[12] A. Carass, S. Roy, A. Jog, J.L. Cuzzocreo, E. Magrath, A. Gherman, J. Button, J. Nguyen, F. Prados, C.H. Sudre, et al., Longitudinal multiple sclerosis lesion segmentation: resource and challenge, NeuroImage 148 (2017) 77-102.
[13] L.C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A.L. Yuille, Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs, IEEE Trans. Pattern Anal. Mach. Intell. 40 (2017) 834-848.
[14] F. Chollet, Keras, https://github.com/fchollet/keras, 2015.
[15] D.A. Clevert, T. Unterthiner, S. Hochreiter, Fast and accurate deep network learning by exponential linear units (elus), preprint, arXiv:1511.07289, 2015.
[16] O. Commowick, A. Istace, M. Kain, B. Laurent, F. Leray, M. Simon, S.C. Pop, P. Girard, R. Ameli, J.C. Ferré, et al., Objective evaluation of multiple sclerosis lesion segmentation using a data management and processing infrastructure, Sci. Rep. 8 (2018) 1-17.
[17] O. Commowick, M. Kain, R. Casey, R. Ameli, J.C. Ferré, A. Kerbrat, T. Tourdias, F. Cervenansky, S. Camarasu-Pop, T. Glatard, et al., Multiple sclerosis lesions segmentation from multiple experts: the miccai 2016 challenge dataset, NeuroImage 244 (2021) 118589.
[18] S.R. Hashemi, S.S.M. Salehi, D. Erdogmus, S.P. Prabhu, S.K. Warfield, A. Gholipour, Asymmetric loss functions and deep densely-connected networks for highly-imbalanced medical image segmentation: application to multiple sclerosis lesion detection, IEEE Access 7 (2018) 1721-1735.
[19] M. Havaei, A. Davy, D. Warde-Farley, A. Biard, A. Courville, Y. Bengio, C. Pal, P.M. Jodoin, H. Larochelle, Brain tumor segmentation with deep neural networks, Med. Image Anal. 35 (2017) 18-31.
[20] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778.
[21] K. He, X. Zhang, S. Ren, J. Sun, Identity mappings in deep residual networks, in: European Conference on Computer Vision, Springer, 2016, pp. 630-645.
[22] S. Hitziger, W.X. Ling, T. Fritz, T. D'Albis, A. Lemke, J. Grilo, Triplanar u-net with lesion-wise voting for the segmentation of new lesions on longitudinal mri studies, Front. Neurosci. 16 (2022).
[23] R.A. Kamraoui, V.T. Ta, T. Tourdias, B. Mansencal, J.V. Manjon, P. Coupé, Deeplesionbrain: towards a broader deep-learning generalization for multiple sclerosis lesion segmentation, Med. Image Anal. 76 (2022) 102312.
[24] G. Kang, B. Hou, Y. Ma, F. Labeau, Z. Su, et al., Acu-net: a 3d attention context u-net for multiple sclerosis lesion segmentation, in: ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2020, pp. 1384-1388.
[25] D.P. Kingma, J. Ba, Adam: a method for stochastic optimization, preprint, arXiv:1412.6980, 2014.
[26] J. Kleesiek, G. Urban, A. Hubert, D. Schwarz, K. Maier-Hein, M. Bendszus, A. Biller, Deep mri brain extraction: a 3d convolutional neural network for skull stripping, NeuroImage 129 (2016) 460-469.
[27] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proc. IEEE 86 (1998) 2278-2324.
[28] T.Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollár, Focal loss for dense object detection, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2980-2988.
[29] S. Liu, D. Zhang, Y. Song, H. Peng, W. Cai, Triple-crossing 2.5d convolutional neural network for detecting neuronal arbours in 3d microscopic images, in: International Workshop on Machine Learning in Medical Imaging, Springer, 2017, pp. 185-193.
[30] X. Lladó, O. Ganiler, A. Oliver, R. Martí, J. Freixenet, L. Valls, J.C. Vilanova, L. Ramió-Torrentà, À. Rovira, Automated detection of multiple sclerosis lesions in serial brain mri, Neuroradiology 54 (2012) 787-807.
[31] X. Lladó, A. Oliver, M. Cabezas, J. Freixenet, J.C. Vilanova, A. Quiles, L. Valls, L. Ramió-Torrentà, À. Rovira, Segmentation of multiple sclerosis lesions in brain mri: a review of automated approaches, Inf. Sci. 186 (2012) 164-185.
[32] Y. Ma, C. Zhang, M. Cabezas, Y. Song, Z. Tang, D. Liu, W. Cai, M. Barnett, C. Wang,
[38] S. Roy, J.A. Butman, D.S. Reich, P.A. Calabresi, D.L. Pham, Multiple sclerosis lesion segmentation from brain mri via fully convolutional neural networks, preprint, arXiv:1803.09172, 2018.
[39] B. Sarica, D.Z. Seker, New ms lesion segmentation with deep residual attention gate u-net utilizing 2d slices of 3d mr images, Front. Neurosci. 16 (2022).
[40] S.M. Smith, Fast robust automated brain extraction, Hum. Brain Mapp. 17 (2002) 143-155.
[41] L. Steinman, Multiple sclerosis: a coordinated immunological attack against myelin in the central nervous system, Cell 85 (1996) 299-302.
[42] C.H. Sudre, W. Li, T. Vercauteren, S. Ourselin, M. Jorge Cardoso, Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations, in: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, Springer, 2017, pp. 240-248.
[43] E.M. Sweeney, R.T. Shinohara, N. Shiee, F.J. Mateen, A.A. Chudgar, J.L. Cuzzocreo, P.A. Calabresi, D.L. Pham, D.S. Reich, C.M. Crainiceanu, Oasis is automated statistical inference for segmentation, with applications to multiple sclerosis lesion segmentation in mri, NeuroImage Clin. 2 (2013) 402-413.
[44] G. Tetteh, V. Efremov, N.D. Forkert, M. Schneider, J. Kirschke, B. Weber, C. Zimmer, M. Piraud, B.H. Menze, Deepvesselnet: vessel segmentation, centerline prediction, and bifurcation detection in 3-d angiographic volumes, Front. Neurosci. 1285 (2020).
[45] K.L. Tseng, Y.L. Lin, W. Hsu, C.Y. Huang, Joint sequence learning and cross-modality convolution for 3d biomedical segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6393-6400.
[46] S. Valverde, M. Cabezas, E. Roura, S. González-Villà, D. Pareto, J.C. Vilanova, L. Ramió-Torrentà, À. Rovira, A. Oliver, X. Lladó, Improving automated multiple sclerosis lesion segmentation with a cascaded 3d convolutional neural network approach, NeuroImage 155 (2017) 159-168.
[47] S. Van der Walt, J.L. Schönberger, J. Nunez-Iglesias, F. Boulogne, J.D. Warner, N. Yager, E. Gouillart, T. Yu, scikit-image: image processing in python, PeerJ 2 (2014) e453.
[48] Q. Wang, B. Wu, P. Zhu, P. Li, W. Zuo, Q. Hu, Eca-net: efficient channel attention for deep convolutional neural networks, in: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 11531-11539.
[49] M. Weeda, I. Brouwer, M. de Vos, M. de Vries, F. Barkhof, P. Pouwels, H. Vrenken, Comparing lesion segmentation methods in multiple sclerosis: input from one manually delineated subject is sufficient for accurate lesion segmentation, NeuroImage Clin. 24 (2019) 102074.
[50] C. Zeng, L. Gu, Z. Liu, S. Zhao, Review of deep learning approaches for the segmentation of multiple sclerosis lesions on brain mri, Front. Neuroinform. 14 (2020) 610967.
[51] H. Zhang, I. Oguz, Multiple sclerosis lesion segmentation-a survey of supervised
Multiple sclerosis lesion analysis in brain magnetic resonance images: techniques cnn-based methods, in: International MICCAI Brainlesion Workshop, Springer, 2020,
and clinical applications, IEEE J. Biomed. Health Inform. (2022). pp. 11–29.
[33] O. Maier, H. Handels, MS lesion segmentation in MRI with random forests, in: Pro- [52] H. Zhang, A.M. Valcarcel, R. Bakshi, R. Chu, F. Bagnato, R.T. Shinohara, K. Hett, I.
ceedings, 2015, pp. 1–2. Oguz, Multiple sclerosis lesion segmentation with tiramisu and 2.5 d stacked slices,
[34] F. Milletari, N. Navab, S.A. Ahmadi, V-net: fully convolutional neural networks for in: International Conference on Medical Image Computing and Computer-Assisted
volumetric medical image segmentation, in: 2016 Fourth International Conference Intervention, Springer, 2019, pp. 338–346.
on 3D Vision (3DV), IEEE, 2016, pp. 565–571. [53] H. Zhang, J. Zhang, C. Li, E.M. Sweeney, P. Spincemaille, T.D. Nguyen, S.A. Gau-
[35] O. Oktay, J. Schlemper, L.L. Folgoc, M. Lee, M. Heinrich, K. Misawa, K. Mori, S. thier, Y. Wang, M. Marcille, All-net: anatomical information lesion-wise loss func-
McDonagh, N.Y. Hammerla, B. Kainz, et al., Attention u-net: learning where to look tion integrated into neural network for multiple sclerosis lesion segmentation, Neu-
for the pancreas, preprint, arXiv:1804.03999, 2018. roImage Clin. 32 (2021) 102854.
[36] L.A. Rolak, Multiple sclerosis: it’s not the disease you thought it was, Clin. Med. Res.
1 (2003) 57–60.
[37] O. Ronneberger, P. Fischer, T. Brox, U-net: convolutional networks for biomedical
image segmentation, in: International Conference on Medical Image Computing and
Computer-Assisted Intervention, Springer, 2015, pp. 234–241.