Boosting Multiple Sclerosis Lesion Segmentation Through Attention
Keywords: Multiple sclerosis (MS); Magnetic resonance imaging (MRI); Fully convolutional neural network; Attention; Lesion segmentation; Medical image analysis; In Silico Trials (IST)

Magnetic resonance imaging is a fundamental tool to reach a diagnosis of multiple sclerosis and to monitor its progression. Although several attempts have been made to segment multiple sclerosis lesions using artificial intelligence, fully automated analysis is not yet available. State-of-the-art methods rely on slight variations of standard segmentation architectures (e.g. U-Net). However, recent research has demonstrated how exploiting temporal-aware features and attention mechanisms can provide a significant boost to traditional architectures. This paper proposes a framework that exploits an augmented U-Net architecture with a convolutional long short-term memory layer and an attention mechanism, able to segment and quantify multiple sclerosis lesions detected in magnetic resonance images. Quantitative and qualitative evaluation on challenging examples demonstrated how the method outperforms previous state-of-the-art approaches, reporting an overall Dice score of 89% and also demonstrating robustness and generalization ability on never-before-seen test samples from a new dedicated dataset under construction.
∗ Corresponding author.
E-mail address: [email protected] (A. Rondinella).
https://doi.org/10.1016/j.compbiomed.2023.107021
Received 19 January 2023; Received in revised form 11 April 2023; Accepted 5 May 2023
Available online 10 May 2023
0010-4825/© 2023 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Table 1
FC-DenseNet + SA + C-LSTM layers for FLAIR MS lesion segmentation: every row shows the position of a layer, how it is composed and the number of times it is used. DenseLayer and conv_block are complex blocks, detailed in the last two rows.

Position | Layer | Composed by | # layers
Downsampling | Conv2d | – | 1
Downsampling | DenseBlock | DenseLayer (×5) | 5
Downsampling | SqueezeAttentionBlock | AvgPool2d, conv_block, conv_block, Upsample | 5
Downsampling | TransitionDown | BatchNorm2d, ReLU, Conv2d, Dropout2d, MaxPool2d | 5
Bottleneck | DenseBlock | DenseLayer (×5) | 1
Bottleneck | ConvLSTM | Conv2d | 1
Upsampling | TransitionUp | ConvTranspose2d | 5
Upsampling | DenseBlock | DenseLayer (×5) | 5
Upsampling | SqueezeAttentionBlock | AvgPool2d, conv_block, conv_block, Upsample | 5
Upsampling | Conv2d | – | 1
Exit | Softmax | – | 1
– | DenseLayer | BatchNorm2d, ReLU, Conv2d, Dropout2d | –
– | conv_block | Conv2d, BatchNorm2d, ReLU, Conv2d, BatchNorm2d, ReLU | –

Total parameters = 13,242,782

Fig. 3. Preprocessing performed on the FLAIR slices. First the entire black masks were removed, then the edges were cropped; the resulting image from the preprocessing steps has dimensions 160 × 160.

3.3. Proposed model

In recent years, medical imaging researchers have demonstrated how U-Net and its customized architectures [10] provide effective results in various scenarios. The capacity of U-Net to produce detailed segmentation maps using a very limited amount of data makes it particularly helpful; this is especially relevant in the context of medical imaging, since access to large amounts of labeled data is very limited.

The state of the art employed U-Net in medical image segmentation, adding some customization to enhance the results; in our case, squeeze-and-attention blocks [14], as reported in the next sections, sensibly improve the results on MS lesion segmentation. Before introducing the overall architecture, the main blocks added to our segmentation network are described below.

Squeeze-and-attention module. Squeeze-and-attention (SA) modules [14] attempt to emphasize channels that contain informative features and to suppress the non-informative ones. This module performs a re-weighting technique that pays attention both locally and globally: locally because the convolutional operations are performed in a small pixel neighborhood, and globally because it selects which image feature maps to focus on to perform segmentation. SA extends the feature recalibration performed by squeeze-and-excitation (SE) modules [42] by not applying a fully-squeezed operation to the spatial information.

Convolutional LSTM. Convolutional LSTM [15] combines the advantages of RNN and CNN architectures. It introduces convolutional layers in place of the fully connected layers of an LSTM to enable more structure in the recurrent levels. In medical image segmentation, spatial information is essential to be able to reconstruct an entire area. For this reason, a convolutional LSTM uses the convolution operator in the recurrent connections to learn the spatial features of adjacent images.
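The channel re-weighting idea at the core of the SA/SE family of modules described above can be illustrated with a deliberately simplified, framework-free sketch (plain Python; `channel_reweight` is a hypothetical helper of ours, and the real SA block of [14] additionally keeps a convolutional spatial-attention path with learned weights):

```python
import math

def channel_reweight(feats):
    """Toy channel re-weighting in the spirit of squeeze-style attention:
    average-pool each channel globally, squash the pooled value to (0, 1)
    with a sigmoid, and rescale that channel's feature map by the weight.
    feats: list of channels, each channel an H x W list of floats."""
    out = []
    for channel in feats:
        flat = [v for row in channel for v in row]
        gap = sum(flat) / len(flat)             # global average pooling
        weight = 1.0 / (1.0 + math.exp(-gap))   # sigmoid gate per channel
        out.append([[v * weight for v in row] for row in channel])
    return out
```

Channels with a high average activation are scaled by a weight close to 1, while weak ones are damped; in the real module the per-channel weights are produced by learned convolutions rather than taken directly from the pooled values.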
Fig. 4 shows the proposed MS lesion segmentation architecture. It is built starting from a Fully Convolutional Dense Network (known as the Tiramisú network [11]) based on a modified U-Net structure [10]. As mentioned before, compared to the conventional U-Net architectures for MS lesion segmentation [43], squeeze-and-attention blocks were added to both the downsampling and upsampling paths to emphasize the more informative feature maps; also, a unidirectional convolutional LSTM [15] was inserted in the bottleneck, in order to catch the spatial correlation of sequential axial slices. These slices are processed independently in the first part of the net (convolutional operations) and combined in the network bottleneck to produce the final output. In particular, the segmentation result of the central slice is obtained by providing to the network an input with the same slice and the two adjacent ones (previous and next): it is easy to observe how lesions propagate over space with a similar shape. We chose the number of 3 slices since an MS lesion is more likely to lie within this spatial sequence, and because a longer sequence could lead not only to considering lesions with a different structure, but also to an increased training time.

The architecture consists of a downsampling path composed of a convolutional layer and five sequences composed of a dense block, a squeeze-attention block and a transition down block. The upsampling path is symmetrical to the downsampling one, and each couple of upsampling/downsampling blocks is concatenated through skip connections.

The proposed network consists of over 400 levels. As it is not possible to include a table listing all the levels, these are described using the grouping in Table 1. As mentioned earlier, the network takes FLAIR images as input because lesions are more visible in this modality, generating the lesion segmentation mask as output. Within the architecture each slice is shown in grayscale, in which every pixel value contains only information on intensity. The group of three slices is passed through a standard convolutional layer, which is needed to increase the size of the feature maps. Then they go through the downsampling path, also called the encoder, consisting of a sequence of dense blocks, squeeze-attention blocks, and transition down blocks. In the downsampling process, the spatial resolution of the images was gradually reduced and
Fig. 4. Architecture of the FLAIR MS lesion segmentation model. This model takes as input a sequence of 3 FLAIR slices and returns as output the corresponding segmentation
map.
the number of feature maps was gradually increased. The advantage of applying squeeze-attention modules is to emphasize channels that contain informative features and suppress all other non-informative ones. Specifically, it is demonstrated that the SA block introduces a pixel-group attention mechanism with a convolutional attention channel, which allows the network to selectively focus on the most significant groups of pixels in the input image, while excluding other groups. This is achieved through spatial attention, where neighboring pixels of the same class are grouped together and treated as a single unit during processing, allowing for pixel-wise prediction [14]. In particular, in the case of multiple sclerosis lesion segmentation, it is important to consider the relationships between pixels in a group, as lesions often have a distinct shape and structure. In this path, the output feature maps from each transition down level are concatenated with the output feature maps of each squeeze-and-attention level, and used as the input of the next level. The downsampling path is followed by the bottleneck, which is typically characterized by a sequence of levels that process the slices when they have the lowest possible spatial resolution. It consists of a dense block and a unidirectional convolutional LSTM layer. A 2D convolutional approach was chosen instead of 3D, as the dataset employed had a limited number of samples available. By incorporating the LSTM layer, the network can capture the spatial dependencies between adjacent slices, leading to better feature representations and improved accuracy in the final output, thereby focusing on the sequentiality of the scans instead of processing an entire scan per single step. By doing so, the sequential task has many more samples than the 3D task. At the end of the bottleneck there is the upsampling path, also called the decoder, which is symmetrical to the downsampling path and is useful for recovering the input spatial resolution that is lost during the previous path. The spatial resolution of the images is then gradually increased and the number of feature maps is gradually reduced. The main characteristic of the upsampling path is the presence of skip connections, which concatenate the feature maps at the exit of each Transition Up block with those at the same resolution coming from the downsampling path, to create the input of the next layer. The skip connections were useful to recover the spatially detailed information lost during the downsampling path. At the end of the upsampling path, there is a convolutional layer and a softmax which encodes, for each pixel, a probability for each possible class. Thus, the output of the model is the segmentation mask of the central slice of the sequence.

4. Experiments and results

4.1 Evaluation metrics

The evaluation of the model was done comparing the predicted segmentation masks with the reference ones that, as mentioned previously, were chosen from only one of the experts as ground truth.

As evaluation metrics, Dice score, sensitivity, specificity, Extra Fraction, Intersection Over Union (IOU), Positive Predictive Value (PPV) and Negative Predictive Value (NPV) were used. The Dice score [44] is defined in Eq. (1),

DSC = (2 · TP) / (2 · TP + FP + FN)    (1)

where TP, FP and FN denote the number of True Positive, False Positive and False Negative pixels, respectively. The Dice score is a metric used to measure the similarity between two classes, widely used in medical image segmentation.

Sensitivity is defined in Eq. (2),

SENS = TP / (TP + FN)    (2)

Sensitivity measures the number of positive voxels that are properly identified.

Specificity is defined in Eq. (3),

SPEC = TN / (TN + FP)    (3)

Specificity measures the number of negative voxels that are properly identified.

Extra Fraction (EF) is defined in Eq. (4),

EF = FP / (TN + FN)    (4)

Extra Fraction measures the number of segmented voxels that are not in the reference segmentation.

Intersection Over Union (IOU) is defined in Eq. (5),

IOU = TP / (TP + FN + FP)    (5)

Intersection Over Union quantifies the degree of overlap between two regions.
Fig. 5. Dice (first column) and Loss (second column) curves produced after 200 epochs by Fold1 (a, b), Fold2 (c, d), Fold3 (e, f), Fold4 (g, h), and Fold5 (i, j). The best model was selected based on the performance on the validation set. The green line indicates the peak performance in terms of Dice score on the validation set.
Positive Predictive Value (PPV) is defined in Eq. (6),

PPV = TP / (TP + FP)    (6)

Positive Predictive Value measures the number of positive voxels that are true positive results.

Negative Predictive Value (NPV) is defined in Eq. (7),

NPV = TN / (TN + FN)    (7)

Negative Predictive Value measures the number of negative voxels that are true negative results.
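For reference, Eqs. (1)–(7) can be computed from the four pixel counts with a few lines of plain Python (a sketch; `segmentation_metrics` is a hypothetical helper of ours, not code from the paper):

```python
def segmentation_metrics(tp, fp, tn, fn):
    """Evaluation metrics of Eqs. (1)-(7) from pixel counts."""
    return {
        "DSC":  2 * tp / (2 * tp + fp + fn),   # Eq. (1)
        "SENS": tp / (tp + fn),                # Eq. (2)
        "SPEC": tn / (tn + fp),                # Eq. (3)
        "EF":   fp / (tn + fn),                # Eq. (4)
        "IOU":  tp / (tp + fn + fp),           # Eq. (5)
        "PPV":  tp / (tp + fp),                # Eq. (6)
        "NPV":  tn / (tn + fn),                # Eq. (7)
    }

# e.g. 80 lesion pixels found, 10 spurious, 10 missed, 900 correct background
m = segmentation_metrics(tp=80, fp=10, tn=900, fn=10)
```

Note that DSC and IOU are monotonically related, DSC = 2 · IOU / (1 + IOU), so the two metrics rank segmentations identically.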
Fig. 6. Results of our approach trained with FLAIR images for Fold2. The image shows (a), (e), (i) the original slices, (b), (f), (j) the ground truth for each slice, (c), (g), (k) the predictions of our model, and (d), (h), (l) false-positive pixels (in red) and false-negative pixels (in green), for slices taken from three different regions of the brain, respectively.
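The red/green error overlays of Fig. 6 (d), (h), (l) follow a simple per-pixel rule; a sketch under the assumption of binary masks (the function name `error_overlay` is ours, not from the paper):

```python
def error_overlay(pred, truth):
    """Classify each pixel of a binary prediction against the ground truth:
    'FP' (red in Fig. 6) where a lesion is predicted but absent,
    'FN' (green) where a lesion is missed, '' elsewhere."""
    overlay = []
    for p_row, t_row in zip(pred, truth):
        row = []
        for p, t in zip(p_row, t_row):
            if p == 1 and t == 0:
                row.append("FP")   # false positive -> drawn in red
            elif p == 0 and t == 1:
                row.append("FN")   # false negative -> drawn in green
            else:
                row.append("")     # correct pixel, left untouched
        overlay.append(row)
    return overlay
```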
4.2 Experimental setup and hardware specification

As explained in Section 3, the experiments were performed employing the images obtained after the pre-processing phase, removing the black areas. Given the low number of patients (only 5) and scans (only 21), the tests were carried out through a 5-fold cross-validation strategy, with 17 scans to train the network, 3 scans used as validation set and one scan to test it. As done by Hashemi et al. in [34], part of the patients' scans were employed during the training phase and the remaining ones for testing.

• Fold1 includes a total of 1119 images as a training set, 197 images as a validation set and 70 images as a test set (patient 1 at T4).
• Fold2 includes a total of 1119 images as a training set, 183 images as a validation set and 84 images as a test set (patient 2 at T4).
• Fold3 includes a total of 1119 images as a training set, 200 images as a validation set and 67 images as a test set (patient 3 at T4).
• Fold4 includes a total of 1136 images as a training set, 208 images as a validation set and 42 images as a test set (patient 4 at T4).
• Fold5 includes a total of 1111 images as a training set, 225 images as a validation set and 50 images as a test set (patient 5 at T4).

The proposed approach was implemented in Python (version 3.9.7) using the PyTorch [45] package. All experiments were run on an NVIDIA Quadro RTX 6000 GPU. The network was evaluated using the Dice loss function, which considers both local and global information. Network training was performed for 200 epochs, well beyond the average convergence rate, using Stochastic Gradient Descent (SGD) [46] as optimizer with an initial learning rate of 1e−4, a weight decay equal to 1e−4 and a batch size fixed at 4. Fig. 5 shows the Dice and loss curves obtained during training considering the various folds as configurations. The best model was then selected to perform all tests based on the highest Dice value achieved on the validation set for each fold. The training computation time for 200 epochs was approximately 20 h. No data augmentation was applied during the training process. To demonstrate the absence of overfitting, a subsequent test was performed by applying random transformations to the training data, including flipping and affine transformations of the images. This experiment helped to ensure the model's convergence while avoiding overfitting. The results obtained from this additional training are very similar to those reported in Fig. 5, which suggests that the model's performance is robust.

4.3 General results

To properly evaluate the performances of the proposed approach, a set of tests was conducted, in which the employed folds contain multiple combinations of the data. The averages ± Standard Deviation (SD) of all metrics obtained on the cross-validation test folds are reported in Table 2.
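The Dice loss used for training in Section 4.2 is the soft complement of the Dice score of Eq. (1); a minimal sketch over flattened masks (plain Python; the smoothing term `eps` is a common implementation detail we assume, not a value stated in the paper):

```python
def dice_loss(pred, target, eps=1e-6):
    """1 - soft Dice over flattened masks.
    pred: per-pixel probabilities, target: binary ground-truth labels.
    eps avoids division by zero when both masks are empty."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)
```

A perfect binary prediction gives a loss of 0, while a prediction disjoint from the ground truth gives a loss close to 1.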
Table 2
Average of the evaluation metrics for the proposed approach in the different folds for the test data. The last row shows the average among all folds (Average ± SD).
Fold ID | Dice Score | Sensitivity | Specificity | IOU | EF | PPV | NPV | Accuracy
1 | 0.8448 | 0.8601 | 0.9983 | 0.7314 | 0.0016 | 0.8301 | 0.9987 | 0.9971
2 | 0.8900 | 0.8679 | 0.9987 | 0.8019 | 0.0012 | 0.9133 | 0.9980 | 0.9968
3 | 0.8855 | 0.9071 | 0.9995 | 0.7946 | 0.0004 | 0.8650 | 0.9997 | 0.9992
4 | 0.8101 | 0.7675 | 0.9997 | 0.6808 | 0.0002 | 0.8577 | 0.9995 | 0.9992
5 | 0.8190 | 0.8442 | 0.9992 | 0.6936 | 0.0007 | 0.7953 | 0.9994 | 0.9987
Mean | 0.84 ± 0.03 | 0.84 ± 0.05 | 0.99 ± 5e−4 | 0.74 ± 0.05 | 8e−4 ± 5e−4 | 0.85 ± 0.04 | 0.99 ± 7e−4 | 0.99 ± 1e−3
Table 3
Mean Dice obtained by the proposed approach, compared with Hashemi et al. [34], Feng et al. [47], Abolvardi et al. [48], Salem et al. [49], Aslani et al. [50], Afzal et al. [51], Roy et al. [52], Zhang et al. [43], Raab et al. [32], Kamraoui et al. [33] and Sarica et al. [35]. The value in bold indicates the best obtained value; note that our mean value also exceeds the state of the art. Although some of these methods proposed solutions using all or some of the MRI modalities, we report the results of all of them, as done by Hashemi et al. [34] in Table 8.
Papers | Method | Dice score | PPV | Sensitivity
Hashemi et al. [34] | Attention U-Net | 0.80 | 0.82 | 0.79
Hashemi et al. [34] | U-Net | 0.81 | 0.84 | 0.79
Feng et al. [47] | 3D U-Net | 0.68 | 0.78 | 0.64
Abolvardi et al. [48] | 3D U-Net | 0.61 | – | –
Salem et al. [49] | 2D U-Net | 0.64 | 0.79 | 0.57
Aslani et al. [50] | CNN | 0.76 | – | –
Afzal et al. [51] | CNN | 0.67 | 0.90 | 0.48
Roy et al. [52] | CNN | 0.56 | 0.60 | –
Zhang et al. [43] | FC-DenseNets | 0.64 | 0.90 | –
Raab et al. [32] | 2D U-Net | 0.77 | – | –
Kamraoui et al. [33] | 3D U-Net | 0.67 | 0.84 | –
Sarica et al. [35] | Dense-Residual U-Net | 0.66 | 0.86 | –
Our (Mean) | FC-DenseNet + SA + C-LSTM | 0.84 | 0.85 | 0.84
Our (Fold2) | FC-DenseNet + SA + C-LSTM | 0.89 | 0.91 | 0.86
From the comparison of all the results obtained in the different folds, it can be concluded that the model achieves the best result in terms of Dice score in Fold2 (89%). In general, high values are achieved for all metrics and the results are very comparable between all folds.

It is possible to visually appreciate the results of the proposed approach on the test set in Fig. 6, where the segmentation results can be observed on slices of the same patient extracted from three different regions of the brain. Fig. 6 shows slices of the same subject (a), (e), (i), the ground truth segmentations (b), (f), (j), the segmentations obtained by the proposed approach (c), (g), (k), and the false positive and false negative pixels, distinguished in red and green respectively, in (d), (h), (l).

As can be observed, the predicted lesion masks are very similar to the ground-truth masks, so the proposed approach segments most of the lesions with good accuracy. The false-positive and false-negative pixel masks confirm that most of the FLAIR MS lesions were correctly detected by the model.

The proposed approach has been compared with recent state-of-the-art solutions based on 2D/3D U-Net for MS lesion segmentation. The comparison was done through the mean Dice score between methods on the ISBI-2015 dataset. Table 3 shows the results of the state of the art, our results achieved in the best fold (Fold2 in our case) and our mean calculated considering all the involved folds.

The state-of-the-art methods used for the sake of comparison are the following: [33,47,48] (3D U-Net architecture) and [32,35,49] (2D U-Net architecture). We also included [50–52], based on CNN models, while [43] makes use of a Tiramisú network, combining slices in the three anatomical planes to capture both global and local contexts. The results of Table 3 show how our framework improves the results of the state of the art by about 7 points of Dice score. The results obtained by the method proposed in [34] are the most similar to ours; [34] makes use of two segmentation networks for MS lesions, an Attention U-Net and a U-Net, and presents the results obtained with both networks on the different MRI acquisition modalities.

The comparison between our results and [34] refers to FLAIR images, considering in both cases the results of the patient corresponding to our Fold2 as test. Furthermore, it is possible to note how the use of attention mechanisms does not give advantages to [34], as Attention U-Net has worse results than the simple U-Net. The results obtained with our method in Fold2 exceed the results obtained by [34] for both networks.

4.4 Additional test

As previously mentioned, we are in the process of building a new dataset containing FLAIR scans of multiple sclerosis patients with expert-labeled MS lesions.

To evaluate the performances of the proposed method (with the model trained on the ISBI-2015 dataset), an additional test was done employing MR FLAIR images from three patients of our in-progress dataset. As depicted in Fig. 7, the ground truths of three patients (P1, P2, and P3) were compared with the corresponding segmentations obtained by our model and the resulting error masks. The high number of false negative pixels indicates that lesion contouring is the task where our network is less accurate: the larger the lesions, the less accurate the contouring. Table 4 reports individual fold and average Dice Scores for each patient. The Dice Score performances obtained by our method differ somewhat between the training dataset and these test scans, but the mean Dice Score of 0.7730 achieved for P1 represents a satisfying result in terms of accuracy and lesion segmentation. This discrepancy may be due to differences between acquisition scanners, as a 3.0 T scanner was used for the training images (ISBI-2015), whereas a 1.5 T scanner was employed for the testing images. In addition, the reduced performance on test scans may reflect the fact that they were obtained from a single time point, whereas the scans of the training set included multiple serial acquisitions for each patient, which improved the accuracy of our automated segmentation method.

4.5 Ablation studies

In order to explain the reasons behind the chosen design of the employed architecture, some ablation studies were done. The proposed network architecture consists of a main backbone to which several modules have been added; some variants will be presented in
Fig. 7. Results obtained by the networks trained in our approach, tested on three patients of the dataset under construction. The image shows (a), (e), (i) the original slice, (b), (f), (j) the ground truth for that slice, (c), (g), (k) the prediction of our model, and (d), (h), (l) the false-positive (in red) and false-negative (in green) pixels of P1, P2 and P3, respectively.
this Section. Specifically, some tests were carried out removing parts of the network or replacing them with others, in order to obtain a better explanation of the model behavior and of the overall achieved performance. The purpose is to quantitatively measure the contribution of each part to the overall model. Starting from a specific model, i.e. the Tiramisú network architecture with squeeze-and-attention layers in the two paths and the unidirectional convolutional LSTM layer in the bottleneck (shown as FC-DenseNet + SA + C-LSTM in Table 5), three different configurations were considered: the basic Tiramisú model (FC-DenseNet in Table 5), the Tiramisú model with the addition of the unidirectional convolutional LSTM level in the bottleneck (FC-DenseNet + C-LSTM in Table 5) and the Tiramisú model with the addition of the squeeze-and-attention modules in the two network paths (FC-DenseNet + SA in Table 5). Every configuration was tested on two of the five folds described in Section 4.2, chosen on the basis of the results obtained in Section 4.3. In particular, the two folds with the best and worst Dice Scores in the test experiments were chosen: Fold2 and Fold4, respectively. As can be verified from Table 5, the ablation studies demonstrate how the SA module always improves the performances, while the C-LSTM works only if coupled with SA.

Table 4
Additional test performed on three different patients extracted from the new dataset, named P1, P2 and P3, respectively, using the best validation model for each fold. The results are reported in terms of Dice Score (DSC). The last row shows the average Dice Score across all folds (Average ± SD).

Fold ID | DSC P1 | DSC P2 | DSC P3
1 | 0.8023 | 0.6944 | 0.5646
2 | 0.7290 | 0.6248 | 0.3443
3 | 0.8016 | 0.6097 | 0.4483
4 | 0.7954 | 0.6869 | 0.5537
5 | 0.7390 | 0.5765 | 0.4021
Mean | 0.7730 ± 0.03 | 0.6384 ± 0.05 | 0.4626 ± 0.09

Table 5
Ablation studies performed on Folds 2 and 4 with different network configurations employing the ISBI-2015 dataset.

Architecture | DSC (Fold2) | DSC (Fold4) | DSC mean
FC-DenseNet [11] | 0.8875 | 0.7989 | 0.8432
FC-DenseNet + C-LSTM | 0.8639 | 0.7794 | 0.8216
FC-DenseNet + SA | 0.8939 | 0.8048 | 0.8493
FC-DenseNet + SA + C-LSTM | 0.8900 | 0.8101 | 0.8500
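As a quick arithmetic sanity check of Table 5, each "DSC mean" entry matches the average of the corresponding Fold2 and Fold4 values (the reported means appear to be truncated to four digits, hence the small tolerance):

```python
# Per-row values copied from Table 5: (DSC Fold2, DSC Fold4, reported mean)
table5 = {
    "FC-DenseNet":               (0.8875, 0.7989, 0.8432),
    "FC-DenseNet + C-LSTM":      (0.8639, 0.7794, 0.8216),
    "FC-DenseNet + SA":          (0.8939, 0.8048, 0.8493),
    "FC-DenseNet + SA + C-LSTM": (0.8900, 0.8101, 0.8500),
}
for name, (fold2, fold4, reported) in table5.items():
    # each mean should agree with the reported value to within truncation
    assert abs((fold2 + fold4) / 2 - reported) < 1e-4, name
```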
5. Conclusion and future work
on an extension of the U-net neural network. The proposed method [8] R. Khayati, M. Vafadust, F. Towhidkhah, M. Nabavi, Fully automatic segmen-
demonstrated to be more accurate than state-of-the-art methods, boost- tation of multiple sclerosis lesions in brain MR FLAIR images using adaptive
mixtures method and Markov random field model, Comput. Biol. Med. 38 (3)
ing results by exploiting a dedicated attention mechanism. It is worth
(2008) 379–390.
noting that the simple insertion of attention does not always improves [9] N. Gessert, J. Krüger, R. Opfer, A.-C. Ostwaldt, P. Manogaran, H.H. Kitzler,
results [34], whilst only a dedicated solution, as the novel one pre- S. Schippling, A. Schlaefer, Multiple sclerosis lesion activity segmentation with
sented in this paper, could be able to provide substantial improvement. attention-guided two-path CNNs, Comput. Med. Imaging Graph. 84 (2020)
The effectiveness and robustness of the technique was demonstrated 101772.
[10] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for
for the first time on patients never employed for the training of the biomedical image segmentation, in: International Conference on Medical Image
model. The high level of Dice Score, obtained by the proposed method Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234–241.
on this particular sample, is of utter importance in demonstrating the [11] S. Jégou, M. Drozdzal, D. Vazquez, A. Romero, Y. Bengio, The one hundred
generalizing capabilities of the solution, as it is not dependent to a layers tiramisu: Fully convolutional DenseNets for semantic segmentation, in:
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
specific acquisition hardware and method. To further investigate these
Workshops, 2017, pp. 11–19.
capabilities, we are continuing the acquisition campaign with the aim [12] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, Ł. Kaiser,
to have new samples and enrich the comparison dataset. I. Polosukhin, Attention is all you need, Adv. Neural Inf. Process. Syst. 30 (2017).
Furthermore, the lesion segmentation framework proposed in the [13] A. Sinha, J. Dolz, Multi-scale self-guided attention for medical image
segmentation, IEEE J. Biomed. Health Inform. 25 (1) (2020) 121–130.
paper uses recent AI methodologies to estimate the level of progression
[14] Z. Zhong, Z.Q. Lin, R. Bidart, X. Hu, I.B. Daya, Z. Li, W.-S. Zheng, J. Li, A. Wong,
of the MS disease by recognizing automatically the lesions in MRI Squeeze-and-attention networks for semantic segmentation, in: Proceedings of the
images. The obtained data about the quantitative MRI lesion load will IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp.
be used in the UISS framework, which has the aim to model and 13065–13074.
[15] X. Shi, Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong, W.-c. Woo, Convolutional
simulate the progression of MS lesions as well as to predict the immune
LSTM network: A machine learning approach for precipitation nowcasting, Adv.
response to specific treatments. A potential future direction for the Neural Inf. Process. Syst. 28 (2015).
research is to explore the possibility of replacing the recurrent layers [16] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015)
with 3D convolutions to enhance the performance of the network. 436–444.
Declaration of competing interest

All authors have participated in (a) conception and design, or analysis and interpretation of the data; (b) drafting the article or revising it critically for important intellectual content; and (c) approval of the final version. This manuscript has not been submitted to, nor is under review at, another journal or other publishing venue. The authors have no affiliation with any organization with a direct or indirect financial interest in the subject matter discussed in the manuscript.

Acknowledgments

Alessia Rondinella is a PhD candidate enrolled in the National PhD in Artificial Intelligence, XXXVII cycle, course on Health and life sciences, organized by Università Campus Bio-Medico di Roma.

Authors Francesco Guarnera, Giulia Russo, and Francesco Pappalardo were supported by the European Commission through the H2020 project “In Silico World: Lowering barriers to ubiquitous adoption of In Silico Trials” (topic SC1-DTH-06-2020, grant ID 101016503). Experiments were carried out thanks to the hardware and software granted and managed by iCTLab S.r.l., a spin-off of the University of Catania.

References

[1] H. Lassmann, Multiple sclerosis pathology, Cold Spring Harb. Perspect. Med. 8 (3) (2018) a028936.
[2] M. Filippi, A. Bar-Or, F. Piehl, P. Preziosa, A. Solari, S. Vukusic, M.A. Rocca, Multiple sclerosis, Nat. Rev. Dis. Primers 4 (43) (2018).
[3] A.J. Thompson, B.L. Banwell, F. Barkhof, W.M. Carroll, T. Coetzee, G. Comi, J. Correale, F. Fazekas, M. Filippi, M.S. Freedman, et al., Diagnosis of multiple sclerosis: 2017 revisions of the McDonald criteria, Lancet Neurol. 17 (2) (2018) 162–173.
[4] M.P. Wattjes, O. Ciccarelli, D.S. Reich, B. Banwell, N. de Stefano, C. Enzinger, F. Fazekas, M. Filippi, J. Frederiksen, C. Gasperini, et al., 2021 MAGNIMS–CMSC–NAIMS consensus recommendations on the use of MRI in patients with multiple sclerosis, Lancet Neurol. 20 (8) (2021) 653–670.
[5] P. Molyneux, D. Miller, M. Filippi, T. Yousry, E. Radü, H. Adèr, F. Barkhof, Visual analysis of serial T2-weighted MRI in multiple sclerosis: intra- and interobserver reproducibility, Neuroradiology 41 (12) (1999) 882–888.
[6] A. Shoeibi, M. Khodatars, M. Jafari, P. Moridian, M. Rezaei, R. Alizadehsani, F. Khozeimeh, J.M. Gorriz, J. Heras, M. Panahiazar, et al., Applications of deep learning techniques for automated multiple sclerosis detection using magnetic resonance imaging: A review, Comput. Biol. Med. 136 (2021) 104697.
[7] R. Khayati, M. Vafadust, F. Towhidkhah, S.M. Nabavi, A novel method for automatic determination of different stages of multiple sclerosis lesions in brain MR FLAIR images, Comput. Med. Imaging Graph. 32 (2) (2008) 124–133.
[15] X. Shi, Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong, W.-c. Woo, Convolutional LSTM network: A machine learning approach for precipitation nowcasting, Adv. Neural Inf. Process. Syst. 28 (2015).
[16] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436–444.
[17] F.L. Sips, F. Pappalardo, G. Russo, R. Bursi, In silico clinical trials for relapsing-remitting multiple sclerosis with MS TreatSim, BMC Med. Inform. Decis. Mak. 22 (6) (2022) 1–10.
[18] G. Russo, G.A.P. Palumbo, V. Di Salvatore, D. Maimone, F. Pappalardo, Computational models to predict disease course and treatment response in multiple sclerosis, in: 2021 International Conference on Electrical, Computer and Energy Technologies, ICECET, IEEE, 2021, pp. 1–5.
[19] R.A.J. Alhatemi, S. Savaş, Transfer learning-based classification comparison of stroke, Comput. Sci. 192–201.
[20] S. Savaş, N. Topaloğlu, Ö. Kazci, P. Koşar, Comparison of deep learning models in carotid artery Intima-Media thickness ultrasound images: CAIMTUSNet, Bilişim Teknol. Derg. 15 (1) (2022) 1–12.
[21] P. Singh, J. Cirrone, A data-efficient deep learning framework for segmentation and classification of histopathology images, in: Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part III, Springer, 2023, pp. 385–405.
[22] J. Wang, Y. Bao, Y. Wen, H. Lu, H. Luo, Y. Xiang, X. Li, C. Liu, D. Qian, Prior-attention residual learning for more discriminative COVID-19 screening in CT images, IEEE Trans. Med. Imaging 39 (8) (2020) 2572–2583.
[23] M.A. Napoli Spatafora, A. Ortis, S. Battiato, Mixup data augmentation for COVID-19 infection percentage estimation, in: Image Analysis and Processing. ICIAP 2022 Workshops: ICIAP International Workshops, Lecce, Italy, May 23–27, 2022, Revised Selected Papers, Part II, Springer, 2022, pp. 508–519.
[24] A. Rondinella, F. Guarnera, O. Giudice, A. Ortis, F. Rundo, S. Battiato, Attention-based convolutional neural network for CT scan COVID-19 detection, in: Proceedings of the IEEE International Conference on Acoustics Speech and Signal Processing Satellite Workshops, IEEE, 2023.
[25] E. Roura, A. Oliver, M. Cabezas, S. Valverde, D. Pareto, J.C. Vilanova, L. Ramió-Torrentà, À. Rovira, X. Lladó, A toolbox for multiple sclerosis lesion segmentation, Neuroradiology 57 (10) (2015) 1031–1043.
[26] J. Knight, A. Khademi, MS lesion segmentation using FLAIR MRI only, in: Proceedings of the 1st MICCAI Challenge on Multiple Sclerosis Lesions Segmentation Challenge using a Data Management and Processing Infrastructure-MICCAI-MSSEG, 2016, pp. 21–28.
[27] P. Schmidt, C. Gaser, M. Arsic, D. Buck, A. Förschler, A. Berthele, M. Hoshi, R. Ilg, V.J. Schmid, C. Zimmer, et al., An automated tool for detection of FLAIR-hyperintense white-matter lesions in Multiple Sclerosis, Neuroimage 59 (4) (2012) 3774–3783.
[28] O. Freifeld, H. Greenspan, J. Goldberger, Lesion detection in noisy MR brain images using constrained GMM and active contours, in: 2007 4th IEEE International Symposium on Biomedical Imaging: From Nano to Macro, IEEE, 2007, pp. 596–599.
[29] P. Schmidt, V. Pongratz, P. Küster, D. Meier, J. Wuerfel, C. Lukas, B. Bellenberg, F. Zipp, S. Groppa, P.G. Sämann, et al., Automated segmentation of changes in FLAIR-hyperintense white matter lesions in multiple sclerosis on serial magnetic resonance imaging, NeuroImage: Clin. 23 (2019) 101849.
[30] R.E. Gabr, I. Coronado, M. Robinson, S.J. Sujit, S. Datta, X. Sun, W.J. Allen, F.D. Lublin, J.S. Wolinsky, P.A. Narayana, Brain and lesion segmentation in multiple sclerosis using fully convolutional neural networks: A large-scale study, Mult. Scler. J. 26 (10) (2020) 1217–1226.
[31] G. Placidi, L. Cinque, F. Mignosi, M. Polsinelli, Ensemble CNN and uncertainty modeling to improve automatic identification/segmentation of multiple sclerosis lesions in magnetic resonance imaging, 2021, arXiv preprint arXiv:2108.11791.
[32] F. Raab, S. Wein, M. Greenlee, W. Malloni, E. Lang, A multimodal 2D convolutional neural network for multiple sclerosis lesion detection, 2022.
[33] R.A. Kamraoui, V.-T. Ta, T. Tourdias, B. Mansencal, J.V. Manjon, P. Coupé, DeepLesionBrain: Towards a broader deep-learning generalization for multiple sclerosis lesion segmentation, Med. Image Anal. 76 (2022) 102312.
[34] M. Hashemi, M. Akhbari, C. Jutten, Delve into Multiple Sclerosis (MS) lesion exploration: A modified attention U-Net for MS lesion segmentation in Brain MRI, Comput. Biol. Med. 145 (2022) 105402.
[35] B. Sarica, D.Z. Seker, B. Bayram, A dense residual U-net for multiple sclerosis lesions segmentation from multi-sequence 3D MR images, Int. J. Med. Inform. 170 (2023) 104965.
[36] A. Carass, S. Roy, A. Jog, J.L. Cuzzocreo, E. Magrath, A. Gherman, J. Button, J. Nguyen, F. Prados, C.H. Sudre, et al., Longitudinal multiple sclerosis lesion segmentation: Resource and challenge, NeuroImage 148 (2017) 77–102.
[37] A. Collignon, F. Maes, D. Delaere, D. Vandermeulen, P. Suetens, G. Marchal, Automated multi-modality image registration based on information theory, in: Information Processing in Medical Imaging, Vol. 3, 1995, pp. 263–274.
[38] W. Zheng, M.W. Chee, V. Zagorodnov, Improvement of brain segmentation accuracy by optimizing non-uniformity correction using N3, Neuroimage 48 (1) (2009) 73–83.
[39] V. Popescu, M. Battaglini, W. Hoogstrate, S.C. Verfaillie, I. Sluimer, R.A. van Schijndel, B.W. van Dijk, K.S. Cover, D.L. Knol, M. Jenkinson, et al., Optimizing parameter choice for FSL-Brain Extraction Tool (BET) on 3D T1 images in multiple sclerosis, Neuroimage 61 (4) (2012) 1484–1494.
[40] E. Roura, A. Oliver, M. Cabezas, J.C. Vilanova, A. Rovira, L. Ramió-Torrentà, X. Lladó, MARGA: Multispectral adaptive region growing algorithm for brain extraction on axial MRI, Comput. Methods Programs Biomed. 113 (2) (2014) 655–673.
[41] D.W. Shattuck, S.R. Sandor-Leahy, K.A. Schaper, D.A. Rottenberg, R.M. Leahy, Magnetic resonance image tissue classification using a partial volume model, NeuroImage 13 (5) (2001) 856–876.
[42] J. Hu, L. Shen, G. Sun, Squeeze-and-excitation networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7132–7141.
[43] H. Zhang, A.M. Valcarcel, R. Bakshi, R. Chu, F. Bagnato, R.T. Shinohara, K. Hett, I. Oguz, Multiple sclerosis lesion segmentation with tiramisu and 2.5D stacked slices, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2019, pp. 338–346.
[44] L.R. Dice, Measures of the amount of ecologic association between species, Ecology 26 (3) (1945) 297–302.
[45] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, S. Chintala, PyTorch: An imperative style, high-performance deep learning library, in: Advances in Neural Information Processing Systems 32, Curran Associates, Inc., 2019, pp. 8024–8035.
[46] S. Ruder, An overview of gradient descent optimization algorithms, 2016, arXiv preprint arXiv:1609.04747.
[47] Y. Feng, H. Pan, C. Meyer, X. Feng, A self-adaptive network for multiple sclerosis lesion segmentation from multi-contrast MRI with various imaging sequences, in: 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), IEEE, 2019, pp. 472–475.
[48] A.A. Abolvardi, L. Hamey, K. Ho-Shon, Registration based data augmentation for multiple sclerosis lesion segmentation, in: 2019 Digital Image Computing: Techniques and Applications, DICTA, IEEE, 2019, pp. 1–5.
[49] M. Salem, S. Valverde, M. Cabezas, D. Pareto, A. Oliver, J. Salvi, À. Rovira, X. Lladó, Multiple sclerosis lesion synthesis in MRI using an encoder-decoder U-NET, IEEE Access 7 (2019) 25171–25184.
[50] S. Aslani, M. Dayan, L. Storelli, M. Filippi, V. Murino, M.A. Rocca, D. Sona, Multi-branch convolutional neural network for multiple sclerosis lesion segmentation, NeuroImage 196 (2019) 1–15.
[51] H.R. Afzal, S. Luo, S. Ramadan, J. Lechner-Scott, M.R. Amin, J. Li, M.K. Afzal, Automatic and robust segmentation of multiple sclerosis lesions with convolutional neural networks, Comput. Mater. Contin. 66 (1) (2021) 977–991.
[52] S. Roy, J.A. Butman, D.S. Reich, P.A. Calabresi, D.L. Pham, Multiple sclerosis lesion segmentation from brain MRI via fully convolutional neural networks, 2018, arXiv preprint arXiv:1803.09172.