Multiclass instance segmentation optimization for fetal heart image object interpretation
Hadi Syaputra1,4, Siti Nurmaini2, Radiyati Umi Partan3, Muhammad Taufik Roseno1,4
1Doctoral Program in Engineering, Faculty of Engineering, Universitas Sriwijaya, Indralaya, Indonesia
2Intelligent System Research Group, Faculty of Computer Science, Universitas Sriwijaya, Palembang, Indonesia
3Department of Internal Medicine, Faculty of Medicine, Universitas Sriwijaya, Indralaya, Indonesia
4Computer Science Study Program, Faculty of Computer Science, Universitas Sumatera Selatan, Palembang, Indonesia
Corresponding Author:
Siti Nurmaini
Intelligent System Research Group, Faculty of Computer Science, Universitas Sriwijaya
Palembang, Indonesia
Email: siti_nurmaini@[Link]
1. INTRODUCTION
Artificial intelligence (AI) technology has developed rapidly and become an integral part of modern society, owing to its capability to reason and choose the actions or solutions most likely to achieve a set goal [1]. In recent years, AI has been widely applied across various sectors, including government [2], infrastructure [3], agriculture [4], and healthcare [5]. By leveraging this technology, companies and organizations can integrate vast amounts of data to process information and make decisions. Supporting such decision-making requires AI models built with machine learning (ML) algorithms. Among the AI methodologies developed to date, ML learns from data, for instance through neural networks, to generate knowledge that supports organizational or individual activities; in the process, it extracts key features from the data to form a model [6]–[8].
In the healthcare field, ML has been extensively used to aid medical professionals in decision-
making. Research by Pullagura et al. [9] utilized ML to enhance the accuracy of fetal heart disease
identification. Garcia-Canadilla et al. [10] conducted research employing ML to improve the evaluation of fetal heart
function by optimizing image acquisition and measurements, thereby aiding in prenatal diagnosis of fetal
heart remodeling and abnormalities. Hoodbhoy et al. [11] studied the accuracy of ML algorithm techniques
in identifying high-risk fetuses through cardiotocography. Cömert and Kocamaz [12] used ML as a
monitoring technique that provides crucial and vital information about fetal status during antepartum and
intrapartum periods, as well as classifying fetal heart rate signals. However, previous studies have shown that
ML methods have limitations when analyzing structured and limited data. In addition, ML pipelines involve complex manual stages, such as image augmentation, which makes producing actionable information for decision-making time-consuming [13].
To address the challenges of traditional ML methods, several studies have adopted a deep learning
(DL) [14] approach for analyzing and predicting medical examination outcomes, especially in image
classification and object detection to support fetal echocardiography examinations. By processing large
amounts of data, DL has demonstrated potential in enhancing accuracy and efficiency in medical image
analysis. DL methods are frequently employed in the medical field, such as in fetal cardiography image
detection [15]. One of the primary advantages of DL techniques is their ability to extract significant insights,
patterns, and information from images and videos. This is achieved through the development of algorithms
and models that enable machines to analyze, process, and make decisions based on visual data [16].
Moreover, DL techniques can identify and depict individual objects in images while providing labels for each
object, making them applicable in various fields such as object tracking [17] and medical imaging [18].
However, these studies mainly focus on classifying medical images or videos by comparing one image object with another, and DL classification can only identify a single object within an image. Overcoming this limitation requires methods that can detect multiple objects within a single image or video [19]. For example, research conducted by Sapitri et al. [20] utilized DL
for object detection in fetal ultrasound videos, identifying anatomical substructures of the fetal heart,
including i) four main chambers: left atrium (LA), right atrium (RA), left ventricle (LV), right ventricle
(RV); ii) four valves: tricuspid valve (TV), pulmonary valve (PV), mitral valve (MV), and aortic valve (AV);
and iii) one aorta (Ao).
Subsequent developments in object detection [21], [22] have enabled the identification and
categorization of every pixel in an image into meaningful object categories or areas, known as segmentation.
Segmentation techniques include semantic segmentation and instance segmentation. Research by
Rachmatullah et al. [23] used semantic segmentation to develop a model that detects objects by assigning a label to each pixel in an image, so that pixels with the same label belong to the same object class.
Simply put, semantic image segmentation is a technique used to identify specific object types within an image.
However, semantic segmentation techniques have several drawbacks, including the inability to distinguish
between individual objects in an image and difficulty identifying individual objects with similar textures
[23], [24]. In contrast, instance segmentation can provide unique labels for each individual object [25], [26].
Efforts to recognize and separate each class of objects in an image rely heavily on instance
segmentation, which in turn depends on the backbone architecture [27]. The backbone architecture plays a
crucial role in instance segmentation by providing essential feature information of the areas to be segmented
for the model [28]. Research by Nurmaini et al. [29] used ResNet as the backbone to achieve optimal instance segmentation. The application of instance segmentation in the medical
field includes automating the segmentation process and improving detection accuracy [30]. For instance, an
instance segmentation approach for fetal echocardiography can simultaneously separate the four standard
heart views and detect defects [29]. To accurately detect fetal heart abnormalities through fetal ultrasound,
all heart substructures must be recognized in normal anatomy [20]. One of the most significant limitations of ultrasound is inter-observer variability: results depend on the examining doctor's skill and the patient's condition [28]. Referring to research by Sapitri et al. [20], which examined
anatomical structure detection in fetal heart images, as well as research by Nurmaini et al. [28], which
focused on instance segmentation for the four main chambers of the fetal heart and heart disease detection,
this study expands its scope to include additional anatomical objects, namely the spine. The addition of the
spine is crucial for medical practitioners in identifying the four-chamber view (A4C) of the fetal heart in
images [31]. Therefore, the contributions of this study are the inclusion of ten anatomical objects of the fetal
heart, namely LA, RA, LV, RV, TV, PV, MV, AV, Ao, and spine, and the development of a DL approach
using instance segmentation methods for these ten anatomical structures. By developing an instance segmentation method for ten fetal heart anatomy objects and applying hyperparameter tuning to find the
optimal settings [32], [33], this study aims to significantly improve medical image analysis in the healthcare
field and pave the way for future research in detecting fetal heart disease. This approach promises more accurate segmentation of the fetal heart.
The segmentation of these ten anatomical structures was chosen based on clinical considerations as
each has an important role in the diagnosis of congenital heart defects. The four main heart chambers
(LA, RA, LV, RV) and the four valves (TV, PV, MV, AV) are the structures most frequently used in the
functional assessment of the fetal heart via ultrasonography. The structure of the Ao is important in
identifying blood outflow, while the spine helps to ensure correct anatomical orientation in the A4C.
Accurate segmentation of these structures allows early identification of various abnormalities such as septal
defects, valve stenosis, and abnormal positioning of the heart or other organs.
Figure 1. The flowchart of the AI-based models and experimental methods applied
The fetal heart examination video [34] was provided in .mp4 format, with a file size of 13.7 MB, a duration of 178 seconds, and a frame rate of 30 fps.
The entire video was converted into two-dimensional images with a resolution of 1280×720 pixels through a
frame extraction process.
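As an illustration of this frame extraction step, the following is a minimal OpenCV sketch; the file name, output directory, and sampling interval are assumptions for illustration, not details from the study.

```python
import os
import cv2  # OpenCV for video decoding and image I/O

VIDEO_PATH = "fetal_heart.mp4"   # hypothetical file name; the source video is 178 s at 30 fps
OUT_DIR = "frames"
os.makedirs(OUT_DIR, exist_ok=True)

cap = cv2.VideoCapture(VIDEO_PATH)
frame_idx, saved = 0, 0
while True:
    ok, frame = cap.read()       # each frame is a 720x1280 BGR array for this video
    if not ok:
        break                    # end of stream
    if frame_idx % 15 == 0:      # assumed sampling rate: every 15th frame (~2 per second)
        cv2.imwrite(os.path.join(OUT_DIR, "frame_%04d.png" % saved), frame)
        saved += 1
    frame_idx += 1
cap.release()
print("Extracted %d frames" % saved)
```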
2.5. Configuration
Prior to training, a hyperparameter tuning process was conducted, including the configuration of
anchor boxes, learning rate, batch size, and number of epochs. The proposed model was developed and
trained on a computer equipped with an Intel Core i3-4170 CPU @ 3.70 GHz (4 CPUs), 8 GB of RAM, and
an Nvidia GeForce GTX 1050 Ti GPU featuring 768 CUDA cores, a GPU clock speed of 1392/1506 MHz,
4 GB of GDDR5 memory, and a memory bandwidth of 112.1 GB/s. The programming language used was
Python 3.6.13, with TensorFlow 1.14.0, Keras 2.3.1, and Protobuf 3.19.6 libraries.
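As an illustration of how such a tuning grid might be enumerated, the sketch below generates 24 named configurations in the spirit of the R50_sgd_1 to R50_sgd_24 models reported later; only the 512×512 input size and 0.01 learning rate are values reported in this paper, and the remaining value sets are assumptions.

```python
from itertools import product

# Hypothetical search space: apart from the 512x512 input size and the 0.01
# learning rate reported in the results, these values are illustrative only.
image_sizes    = [256, 512]
learning_rates = [0.001, 0.01, 0.1]
momenta        = [0.9, 0.99]
epochs_list    = [10, 20]

configs = []
for i, (size, lr, mom, ep) in enumerate(
        product(image_sizes, learning_rates, momenta, epochs_list), start=1):
    configs.append({
        "name": "R50_sgd_%d" % i,  # mirrors the paper's R50_sgd_1..R50_sgd_24 naming
        "image_size": size,        # network input resolution (size x size)
        "learning_rate": lr,
        "momentum": mom,
        "batch_size": 1,           # assumed; constrained by the 4 GB GPU
        "epochs": ep,
    })

print(len(configs))  # 24 candidate configurations to train and compare
```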
where $precision_k$ is the precision value at a specific recall point $k$, and $\Delta recall_k$ is the change in recall between two adjacent recall points.

$mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i$  (2)

where $N$ is the number of classes or objects, and $AP_i$ is the AP for the $i$-th class.
Precision and recall are calculated using (3) and (4). Precision measures how many of the predicted
positive cases are truly positive, and it decreases when there are many false positives. Recall indicates how
many actual positive cases are correctly detected, and it decreases with high false negatives. Together, these
values determine AP, which is then averaged to compute mAP, giving a robust overall measure of object
detection performance.
$P = \frac{TP}{TP + FP}$  (3)

$R = \frac{TP}{TP + FN}$  (4)
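A minimal sketch of how these metrics combine, assuming per-class precision/recall points have already been collected from the detector; the function names and all numbers in the toy example are illustrative only.

```python
import numpy as np

def precision_recall(tp, fp, fn):
    """Precision (3) and recall (4) from raw detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

def average_precision(precisions, recalls):
    """AP as the sum of precision at each recall point times the change in
    recall between adjacent points; inputs must be sorted by increasing recall."""
    recalls = np.concatenate(([0.0], recalls))  # prepend 0 so diffs start from zero recall
    return float(np.sum(np.asarray(precisions) * np.diff(recalls)))

def mean_average_precision(ap_per_class):
    """mAP (2): the mean of per-class AP values over N classes."""
    return sum(ap_per_class.values()) / len(ap_per_class)

# Toy example with two classes; all numbers here are illustrative only.
ap = {
    "LV": average_precision([1.0, 0.8], [0.5, 1.0]),  # 1.0*0.5 + 0.8*0.5 = 0.90
    "RV": average_precision([1.0, 0.6], [0.4, 0.8]),  # 1.0*0.4 + 0.6*0.4 = 0.64
}
print(mean_average_precision(ap))  # (0.90 + 0.64) / 2 = 0.77
```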
Table 1 presents the results of image extraction from fetal heart examination videos, categorized into
four main groups based on the quality and presence of fetal heart structures. A total of 357 images with a
resolution of 1280×720 pixels were obtained. Most of the images contain fetal heart objects with varying
levels of clarity and object count, while others lack relevant features for further analysis. This classification
supports the selection of suitable images for the annotation and model training stages.
Visually, Figure 2 illustrates four main categories resulting from the image extraction process.
Figure 2(a) shows images containing no fetal heart object, Figure 2(b) presents images that contain a fetal heart object but are out of focus, Figure 2(c) shows images with a single clearly visible fetal heart object, and Figure 2(d) presents images that display multiple fetal heart objects within a single frame.
These categories are derived from the video-to-image conversion process and will subsequently undergo
preprocessing as part of the dataset preparation for training the segmentation model.
Following the cropping and selection process for images displaying fetal heart objects, the total
number of images was reduced to 176, which aligns with the requirements for the instance segmentation
model, as shown in Figure 3. After obtaining the fetal heart images, the next step involved scaling the images
to ensure uniform size across the dataset. The scaling process was conducted as described in the method
section, with images resized to 400×300 pixels. Following this, all normal fetal heart images were annotated
with ten labels corresponding to the anatomical features of the fetal heart. This annotation was performed
using polygon points on the fetal heart object images. The annotation process is illustrated in Figure 4.
The final annotated fetal heart images were exported in JSON file format.
Figure 2. Four image categories from the extraction process: (a) not showing any fetal heart objects, (b) showing fetal heart objects but out of focus, (c) showing a single fetal heart object, and (d) showing multiple fetal heart objects
Figure 4(a) represents the anatomical location of the fetal heart that has been determined based on
expert designation, but has not gone through the AI-based modeling stage. This identification is done
manually by the radiologist or specialist by considering the visual characteristics seen on the ultrasound
image. The location of anatomical structures in this image serves as the ground truth, which becomes the
reference in further annotation and modeling stages. Meanwhile, Figure 4(b) is the result of annotation
performed using annotation tools, where each fetal heart structure has been labeled with a color mask and
bounding box to distinguish specific areas. This annotation is an important part of preparing the dataset for
training AI-based segmentation models.
Figure 4. Annotation of fetal heart images: (a) original image with manual identification and (b) annotated image with color masks and bounding boxes
After the annotation phase is complete, the JSON annotation files are paired with the annotated
images. This combined dataset is then used to train the instance segmentation model for fetal heart image
objects. A sample of the annotation results is shown in Figure 5. Figure 5 shows the results of ground truth
annotation for segmentation of anatomical structures in fetal heart ultrasound images. Figure 5(a) displays the
original ultrasound image, while Figures 5(b) to 5(k) represent the manually annotated segmentation of
various heart structures. The structures shown include Ao, AV, LA, LV, MV, PV, RA, RV, spine, and TV.
The masking visualized in Figures 5(b) to 5(k) shows the areas identified as part of each anatomical structure
based on the ground truth annotations. These images are generated by importing the JSON annotation data into Python and visualizing it with image processing libraries such as OpenCV or Matplotlib. The process maps the JSON data into a binary image array for each anatomical structure, which is then rendered against a blue background to clarify the segmented parts.
Figure 5. Ground truth of annotation results: (a) original image, (b) Ao, (c) AV, (d) LA, (e) LV, (f) MV, (g) PV, (h) RA, (i) RV, (j) spine, and (k) TV
Table 5 presents the Mask R-CNN model evaluation results based on AP at an intersection over union (IoU) threshold of 0.50 for each fetal heart anatomy category in the training dataset, as well as the mAP as a measure of
overall model performance. Based on the results obtained, models R50_sgd_19 and R50_sgd_20 showed the
best performance with mAP of 0.2749 and 0.2641, indicating the ability to recognize various anatomical
structures more accurately than other models. Cardiac structures such as the RV, LV, RA, LA, and AV
tended to have higher AP values, indicating that the models were able to recognize these parts better than
other structures, such as TV or PV, which had lower or even zero AP values. The evaluation results also
show that there are some models with AP value=0.000 in certain categories, indicating that the model failed
to detect objects of that class in the training dataset. This could be due to various factors, such as a limited
amount of annotation data or the complexity of anatomical structures that are difficult for the model to
recognize. In addition, models such as R50_sgd_5 and R50_sgd_23 have mAP=0, indicating that they did not
successfully segment any objects in the dataset. Models with higher mAP show better performance in
detecting and labeling fetal heart structures, while models with many values of 0.000 or mAP=0 show
weaknesses in the learning process from the available data.
Figure 6 displays the mAP for various ResNet-50 models trained using the SGD optimizer with
different hyperparameter combinations. mAP is a commonly used metric to evaluate the performance of
object detection models, with higher values indicating better performance. From the graph, it is evident that
models R50_sgd_19 and R50_sgd_20 achieved the best results, with mAP values of approximately 0.27 and
0.26, respectively. This suggests that models with an input image size of 512×512 and a learning rate of 0.01
perform better in detecting objects within the dataset used. Other models, such as R50_sgd_1, R50_sgd_7,
and R50_sgd_15, also showed fairly good performance with mAP values ranging from 0.1 to 0.15. However,
their performance was still below that of models R50_sgd_19 and R50_sgd_20. Some models exhibited very
low or even zero performance, such as R50_sgd_5 and R50_sgd_23. This may be attributed to suboptimal
hyperparameter combinations for the dataset. Overall, these results highlight the importance of selecting the
appropriate image input size and learning rate to achieve optimal performance in object detection models
using the ResNet-50 architecture with the SGD optimizer.
Although Mask R-CNN is a well-established method, this study presents a novel application by
integrating instance segmentation with targeted hyperparameter optimization tailored for A4C fetal heart
ultrasound images. The combination of input resolution tuning, learning rate, and momentum on a dataset with
ten anatomical classes represents a unique contribution, as previous studies typically limited segmentation to
fewer structures or did not perform systematic model optimization. This approach addresses the complexity of
fetal cardiac imaging and demonstrates improved class-wise recognition in a clinically relevant context.
Figure 7 illustrates the AP at an IoU threshold of 0.50 for each class across various ResNet-50
models trained with the SGD optimizer. Each line in the graph represents a class, with AP values for each
model plotted as points along that line. The analysis reveals that the class Ao demonstrates significant
performance variation across models, with some models such as R50_sgd_19 and R50_sgd_20 achieving
high AP values. Other classes, including LA, LV, and RV, also show noticeable variation in performance
among the tested models. Models R50_sgd_19 and R50_sgd_20 exhibit more consistent performance across
many classes compared to others. Certain classes like spine, TV, MV, PV, and AV frequently show low or
even zero AP values in many models, indicating that detection for these classes is more challenging. Overall,
models with larger input image sizes and lower learning rates appear to deliver better and more consistent
results across various classes. The best-performing model in this evaluation is R50_sgd_19, which
demonstrates the highest performance across most classes. Out of the 24 identified models, named from
R50_sgd_1 to R50_sgd_24, the research selected two models with optimal detection performance for classes
such as Ao, LA, LV, RV, RA, TV, MV, PV, AV, and spine. The evaluation, based on mAP values, identified
the first optimal model as R50_sgd_19, which achieved the highest mAP of 0.2749, although it failed to
detect the AV class. The second model, R50_sgd_20, successfully detected all classes with a mAP of 0.2641.
Both models demonstrated strong overall performance. Table 6 presents these two optimal models based on
the results from Table 5. These models were selected due to their high mAP values and consistent
performance across most fetal heart anatomical classes.
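The selection of these two models amounts to ranking the 24 candidates by mAP, as in the sketch below; only the two reported mAP values are from Table 5, and the other entries are placeholders.

```python
# Illustrative selection of the two best models by mAP; the two values
# below are from the paper, the remaining entries are placeholders.
map_per_model = {"R50_sgd_19": 0.2749, "R50_sgd_20": 0.2641,
                 "R50_sgd_1": 0.15, "R50_sgd_5": 0.0}  # ... up to R50_sgd_24

best_two = sorted(map_per_model.items(), key=lambda kv: kv[1], reverse=True)[:2]
print(best_two)  # [('R50_sgd_19', 0.2749), ('R50_sgd_20', 0.2641)]
```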
The results of instance segmentation from the two optimal models are displayed in Figure 8. This
figure shows the segmentation results for ten classes of fetal heart objects. These segmentation outputs are
essential to evaluate the model’s ability to differentiate each anatomical structure accurately. Figure 8(a)
shows the segmentation results for several anatomical structures in a fetal echocardiographic image. The segmentation successfully identifies and labels several key parts of the image with high confidence levels: RV (0.995), LV (0.999), AV (0.985), MV (0.975), RA (1.000), PV (0.972), LA (0.999), TV (0.970), Ao (0.995), and spine (0.997). This segmentation demonstrates that the model has very high
accuracy in identifying and labeling various anatomical structures within the medical image. Each segment is
clearly delineated, and the high confidence values suggest that this model is reliable for diagnostic purposes
and further medical analysis. These results are highly favorable for medical applications, particularly in
assisting physicians with the identification and analysis of critical parts of echocardiographic images.
Figure 8(b) presents the segmentation results of several key anatomical structures in an echocardiographic
image, with extremely high confidence levels. Detailed explanations for each identified structure are as
follows. RV: this structure is identified with a confidence of 0.998, indicating that the model is highly
confident in its identification. LV: similarly, the LV is identified with a very high confidence of 0.998.
AV: this valve is identified with a confidence of 0.954. Although slightly lower than other structures, this
value remains very high. MV: marked with a confidence of 0.993, indicating nearly perfect confidence in
identifying this valve. RA: with a confidence of 0.999, the RA is identified with nearly perfect confidence.
PV: this valve is identified with a confidence of 0.945, which remains within a high confidence range.
TV: with a confidence of 0.988, the TV is segmented with very good accuracy. Ao: this structure is marked
with a confidence of 0.998, indicating highly accurate identification. Spine: the spine is segmented with a
confidence of 0.996, showing high confidence in the identification of this structure. Overall, this
segmentation demonstrates that the model excels in identifying various important anatomical structures in
echocardiographic images. With nearly perfect confidence values for most structures, the model is highly
reliable for diagnostic and further medical analysis.
Figure 8. Instance segmentation results of fetal heart objects using models: (a) R50_sgd_19 and (b) R50_sgd_20
This study shows significant improvements in fetal cardiac anatomy segmentation using the DL
method based on the ResNet architecture, optimized through hyperparameter tuning. By implementing instance segmentation, this study successfully identified ten major anatomical objects in
the fetal heart, including the four main chambers, important valves, aorta, and spine. The addition of the
spine in this segmentation provides more comprehensive information, which is beneficial in the identification
of the complete four-chambered view of the heart, aiding in the diagnosis of fetal heart health. Although this
study is superior in anatomical object coverage compared to previous studies (which only focused on four
chambers or no segmentation at all), the results show that the mAP value achieved is still relatively low. This
is due to the challenge of identifying more diverse classes of objects in the fetal heart image, as well as the
high level of noise in the video. These conditions degrade the accuracy of the model in detecting and
classifying objects accurately, which impacts the overall performance of the segmentation. In addition, the
noise factor in ultrasound images can complicate the segmentation process as it depends on the varying video
quality. Compared to previous studies that have fewer object classes, this study shows limitations in
accurately analyzing more objects under non-ideal video conditions. Nonetheless, this study still shows the potential to improve fetal heart image detection and segmentation in the future through better preprocessing techniques and higher-quality video data.
In addition to prior works focusing on image segmentation and detection, recent studies have
explored alternative approaches to fetal cardiac analysis, such as digital twin modeling and entropy-based
analysis of fetal heart rate variability (HRV). For instance, Lwin et al. [38] proposed a digital twin framework
combined with entropy measures to enhance fetal monitoring systems, demonstrating how physiological signal
analysis can complement image-based techniques. Similarly, Zin and Tin [39] applied Markov chain models to
analyze HRV, highlighting the integration of AI with physiological data for diagnostic support. While these
approaches differ in modality, they align with the broader goal of improving fetal cardiac assessment using AI,
and this study complements them by advancing structural image-based segmentation.
4. CONCLUSION
This study successfully applied Mask R-CNN for instance segmentation on fetal heart ultrasound images, identifying and labeling anatomical structures. The R50_sgd_19 and R50_sgd_20 models showed
good performance, with mAP values of 0.2749 and 0.2641, respectively. These models accurately detected
and labeled major cardiac structures including RV, LV, AV, MV, RA, PV, LA, TV, Ao, and spine with
confidence values ranging from 0.970 to 1.000, demonstrating the robustness of the models. Systematic data
preprocessing, annotation, hyperparameter optimization, and model training were critical to this success. The
results provide a valuable tool for medical practitioners, enabling more precise diagnosis and contributing
significantly to the assessment of fetal heart health. Furthermore, this research can serve as a foundation for
the integration of AI-based diagnostic support in fetal cardiology. Future research can explore advanced
architectures, dataset expansion, integration with other imaging modalities, real-time clinical applications,
and user-friendly interfaces to further improve the utility and accuracy of the model.
ACKNOWLEDGMENTS
We would like to express our sincere appreciation to Universitas Sumatera Selatan and Lembaga
Penelitian dan Pengabdian kepada Masyarakat Universitas Sumatera Selatan (LPPM) for support during the
development of this research work.
FUNDING INFORMATION
This research was supported by financial assistance from Universitas Sumatera Selatan for
publication funding.
Name of Author C M So Va Fo I R D O E Vi Su P Fu
Hadi Syaputra ✓ ✓ ✓ ✓ ✓ ✓ ✓
Siti Nurmaini ✓ ✓ ✓ ✓ ✓
Radiyati Umi Partan ✓ ✓ ✓ ✓ ✓
Muhammad Taufik Roseno ✓ ✓ ✓ ✓ ✓
INFORMED CONSENT
We have obtained informed consent from all individuals included in this study.
DATA AVAILABILITY
The authors confirm that the data supporting the findings of this study are available within the article.
REFERENCES
[1] Z. Akkus et al., “A survey of deep-learning applications in ultrasound: artificial intelligence–powered ultrasound for improving clinical
workflow,” Journal of the American College of Radiology, vol. 16, no. 9, pp. 1318–1328, 2019, doi: 10.1016/[Link].2019.06.004.
[2] R. Medaglia, J. R. G.-Garcia, and T. A. Pardo, “Artificial intelligence in government: taking stock and moving forward,” Social
Science Computer Review, vol. 41, no. 1, pp. 123–140, 2023, doi: 10.1177/08944393211034087.
[3] M. Y. A.-Kader, A. M. Ebid, K. C. Onyelowe, I. M. Mahdi, and I. A.-Rasheed, “(AI) in infrastructure projects—gap study,”
Infrastructures, vol. 7, no. 10, 2022, doi: 10.3390/infrastructures7100137.
[4] X. Liu, K. H. Ghazali, and A. A. Shah, “Sustainable oil palm resource assessment based on an enhanced deep learning method,”
Energies, vol. 15, no. 12, 2022, doi: 10.3390/en15124479.
[5] T. Davenport and R. Kalakota, “The potential for artificial intelligence in healthcare,” Future Healthcare Journal, vol. 6, no. 2,
pp. 94–98, 2019, doi: 10.7861/futurehosp.6-2-94.
[6] A. M. Hassan, J. A. Nelson, J. H. Coert, B. J. Mehrara, and J. C. Selber, “Exploring the potential of artificial intelligence in surgery:
insights from a conversation with ChatGPT,” Annals of Surgical Oncology, vol. 30, no. 7, 2023, doi: 10.1245/s10434-023-13347-0.
[7] F. Qin and J. Gu, “Artificial intelligence in plastic surgery: current developments and future perspectives,” Plastic and Aesthetic
Research, vol. 10, no. 1, 2023, doi: 10.20517/2347-9264.2022.72.
[8] S. Oh, J. H. Kim, S.-W. Choi, H. J. Lee, J. Hong, and S. H. Kwon, “Physician confidence in artificial intelligence: an online
mobile survey,” Journal of Medical Internet Research, vol. 21, no. 3, 2019, doi: 10.2196/12422.
[9] L. Pullagura, M. R. Dontha, and S. Kakumanu, “Recognition of fetal heart diseases through machine learning techniques,” Annals
of the Romanian Society for Cell Biology, vol. 25, no. 6, pp. 2601–2615, 2021.
[10] P. G.-Canadilla, S. S.-Martinez, F. Crispi, and B. Bijnens, “Machine learning in fetal cardiology: what to expect,” Fetal Diagnosis
and Therapy, vol. 47, no. 5, pp. 363–372, 2020, doi: 10.1159/000505021.
[11] Z. Hoodbhoy, M. Noman, A. Shafique, A. Nasim, D. Chowdhury, and B. Hasan, “Use of machine learning algorithms for
prediction of fetal risk using cardiotocographic data,” International Journal of Applied and Basic Medical Research, vol. 9, no. 4,
pp. 226–230, 2019, doi: 10.4103/ijabmr.IJABMR_370_18.
[12] Z. Cömert and A. Kocamaz, “A study of artificial neural network training algorithms for classification of cardiotocography signals,”
Bitlis Eren University Journal of Science and Technology, vol. 7, no. 2, pp. 93–103, 2017, doi: 10.17678/beuscitech.338085.
[13] G. Litjens et al., “A survey on deep learning in medical image analysis,” Medical Image Analysis, vol. 42, pp. 60–88, 2017,
doi: 10.1016/[Link].2017.07.005.
[14] H. Zhou et al., “A deep learning approach for medical waste classification,” Scientific Reports, vol. 12, no. 1, 2022,
doi: 10.1038/s41598-022-06146-2.
[15] M. C. Fiorentino, F. P. Villani, M. D. Cosmo, E. Frontoni, and S. Moccia, “A review on deep-learning algorithms for fetal
ultrasound-image analysis,” Medical Image Analysis, vol. 83, 2023, doi: 10.1016/[Link].2022.102629.
[16] Y. Matsuzaka and R. Yashiro, “AI-based computer vision techniques and expert systems,” AI, vol. 4, no. 1, pp. 289–302, 2023,
doi: 10.3390/ai4010013.
[17] Z. Soleimanitaleb and M. A. Keyvanrad, “Single object tracking: a survey of methods, datasets, and evaluation metrics,” arXiv-
Computer Science, pp. 1–15, 2022.
[18] A. A. Adegun, S. Viriri, and R. O. Ogundokun, “Deep learning approach for medical image analysis,” Computational Intelligence
and Neuroscience, vol. 2021, no. 1, 2021, doi: 10.1155/2021/6215281.
[19] Y. Dai, Y. Gao, and F. Liu, “TransMed: transformers advance multi-modal medical image classification,” Diagnostics, vol. 11,
no. 8, 2021, doi: 10.3390/diagnostics11081384.
[20] A. I. Sapitri et al., “Deep learning-based real time detection for cardiac objects with fetal ultrasound video,” Informatics in
Medicine Unlocked, vol. 36, 2023, doi: 10.1016/[Link].2022.101150.
[21] D. V. C. Gowda and R. Kanagavalli, “Video semantic segmentation with low latency,” TELKOMNIKA (Telecommunication
Computing Electronics and Control), vol. 22, no. 5, pp. 1147–1156, 2024, doi: 10.12928/TELKOMNIKA.v22i5.25157.
[22] S. An et al., “A category attention instance segmentation network for four cardiac chambers segmentation in fetal
echocardiography,” Computerized Medical Imaging and Graphics, vol. 93, 2021, doi: 10.1016/[Link].2021.101983.
[23] M. N. Rachmatullah, S. Nurmaini, A. I. Sapitri, A. Darmawahyuni, B. Tutuko, and Firdaus, “Convolutional neural network for
semantic segmentation of fetal echocardiography based on four-chamber view,” Bulletin of Electrical Engineering and
Informatics, vol. 10, no. 4, pp. 1987–1996, 2021, doi: 10.11591/EEI.V10I4.3060.
[24] H. Cheng et al., “Semantic segmentation method for myocardial contrast echocardiogram based on DeepLabV3+ deep learning
architecture,” Mathematical Biosciences and Engineering, vol. 20, no. 2, pp. 2081–2093, 2022, doi: 10.3934/mbe.2023096.
[25] A. M. Hafiz and G. M. Bhat, “A survey on instance segmentation: state of the art,” International Journal of Multimedia
Information Retrieval, vol. 9, no. 3, pp. 171–189, 2020, doi: 10.1007/s13735-020-00195-x.
[26] V. Iglovikov, S. Seferbekov, A. Buslaev, and A. Shvets, “TernausNetV2: fully convolutional network for instance segmentation,”
arXiv-Computer Science, pp. 4321–4325, 2018, doi: 10.1109/CVPRW.2018.00042.
[27] J. Yan, T. Yan, W. Ye, X. Lv, P. Gao, and W. Xu, “Cotton leaf segmentation with composite backbone architecture combining
convolution and attention,” Frontiers in Plant Science, vol. 14, 2023, doi: 10.3389/fpls.2023.1111175.
[28] S. Nurmaini et al., “Deep learning-based computer-aided fetal echocardiography: application to heart standard view segmentation
for congenital heart defects detection,” Sensors, vol. 21, no. 23, 2021, doi: 10.3390/s21238007.
[29] S. Nurmaini et al., “Accurate detection of septal defects with fetal ultrasonography images using deep learning-based multiclass
instance segmentation,” IEEE Access, vol. 8, pp. 196160–196174, 2020, doi: 10.1109/ACCESS.2020.3034367.
[30] A. Boccatonda, “Emergency ultrasound: is it time for artificial intelligence?,” Journal of Clinical Medicine, vol. 11, no. 13, 2022,
doi: 10.3390/jcm11133823.
[31] J. P. McGahan, “Sonography of the fetal heart: findings on the four-chamber view,” American Journal of Roentgenology,
vol. 156, no. 3, pp. 547–553, 1991, doi: 10.2214/ajr.156.3.1899755.
[32] S. Iqbal, A. N. Qureshi, A. Ullah, J. Li, and T. Mahmood, “Improving the robustness and quality of biomedical CNN models
through adaptive hyperparameter tuning,” Applied Sciences, vol. 12, no. 22, 2022, doi: 10.3390/app122211870.
[33] H. Y. Oh, M. S. Khan, S. B. Jeon, and M.-H. Jeong, “Automated detection of greenhouse structures using cascade Mask R-CNN,”
Applied Sciences, vol. 12, no. 11, 2022, doi: 10.3390/app12115553.
[34] myminifellowship, “Mastering the fetal heart: step 1,” YouTube. United States, 2015. Accessed: Mar. 31, 2023. [Online Video].
Available: [Link]
[35] B. Zhang, L. Niu, X. Zhao, and L. Zhang, “Human-centric image cropping with partition-aware and content-preserving features,”
arXiv-Computer Science, pp. 1–27, 2022.
[36] P. Skalski, “Makesense: free to use online tool for labelling photos,” Kaggle. 2023. Accessed: Mar. 25, 2023. [Online]. Available:
[Link]
[37] K. He, G. Gkioxari, P. Dollar, and R. Girshick, “Mask R-CNN,” in 2017 IEEE International Conference on Computer Vision,
2017, pp. 2980–2988, doi: 10.1109/ICCV.2017.322.
[38] T. C. Lwin, T. T. Zin, P. P. Kyaw, P. Tin, E. Kino, and T. Ikenoue, “Enhancing fetal monitoring through digital twin technology
and entropy-based fetal heart rate variability analysis,” International Journal of Innovative Computing, Information and Control,
vol. 21, no. 1, pp. 185–196, 2025, doi: 10.24507/ijicic.21.01.185.
[39] T. T. Zin and P. Tin, “Markov chain modelling for heart rate variability analysis: bridging artificial intelligence and physiological
data,” in 2023 IEEE 13th International Conference on Consumer Electronics - Berlin, 2023, pp. 163–166, doi: 10.1109/ICCE-Berlin58801.2023.10375625.
BIOGRAPHIES OF AUTHORS
Siti Nurmaini received the master’s degree in Control System from the Institut
Teknologi Bandung (ITB), Indonesia, in 1998, and the Ph.D. degree in Computer Science
from the Universiti Teknologi Malaysia (UTM), in 2011. She is currently a Professor with the
Faculty of Computer Science, Universitas Sriwijaya, Indonesia. Her research interests include
biomedical engineering, deep learning, machine learning, image processing, control systems,
and robotics. She can be contacted at email: siti_nurmaini@[Link].