
Detection and Prediction of Cardiovascular Disease Using Fundus Images with Deep Learning

2024 20th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD) | 979-8-3503-5632-8/24/$31.00 ©2024 IEEE | DOI: 10.1109/ICNC-FSKD64080.2024.10702244

Victoria Willis, Bing Zhou and Qingzhong Liu
Sam Houston State University
Huntsville, TX USA
{vrw008, bxz003, liu}@[Link]

Abstract—Early detection and risk prediction of cardiovascular diseases (CVD) would help reduce CVD-related complications and lessen their impact on a global scale. Cost-effective, noninvasive retinal fundus imaging can be used alongside deep learning to assist with early detection and risk prediction during the routine eye exams that most individuals have on a yearly basis. In this paper, deep learning methods are tested for CVD detection and risk prediction using fundus images as the sole source of data, since medical information is not always readily available for all patients. The proposed methods are implemented using convolutional neural networks (CNNs) and transfer learning. The results and a comparison between different transfer learning models are reviewed.

Keywords—Artificial Intelligence (AI), Deep Learning, cardiovascular disease (CVD), heart disease, fundus imaging, Convolutional Neural Networks (CNNs)

I. INTRODUCTION

Cardiovascular diseases (CVD), commonly referred to as heart diseases, are a set of disorders involving the heart and blood vessels. CVDs are the leading cause of death globally despite most of these diseases being preventable [1]. The most significant behavioral risk factors identified by the World Health Organization are an inactive lifestyle, an unhealthy diet, excessive consumption of alcohol, and tobacco use. Because these risk factors are relatively easy to change and improve with the guidance of doctors, it is important to detect the early signs of CVD as soon as possible so that patients can make lifestyle changes that lower the risk of heart disease symptoms and improve patient outcomes overall. Manually predicting CVD risk and detecting early signs in patients has proven to be a difficult task, if not impossible, but recent advancements in technology, mainly AI, have made early detection achievable.

Artificial Intelligence (AI), and more specifically deep learning, has been gaining traction in the medical field for years. Deep learning uses artificial neural networks that contain multiple computational layers and can detect patterns within health data and medical imaging more efficiently than a medical practitioner. A deep learning model paired with medical imaging could help lead doctors to a timelier diagnosis, improve lives, and lower the risk of CVD-related deaths worldwide.

Within medical imaging, the use of fundus images has gained popularity due to their noninvasive and cost-effective nature [2]. Fundus photography uses reflected light to capture a 2D photo of the rear of the eye, also known as the fundus. Researchers have found through various studies that many systemic diseases cause specific indicators to become visible in this part of the eye. Technological advancements have also changed the way fundus images are taken and made them easier to obtain; the ongoing development of handheld and smartphone-based fundus cameras will further increase the accessibility of this service [3].

With most individuals having a yearly routine eye exam, this would be the perfect opportunity to identify individuals at risk for CVD and start them on a plan to prevent further symptoms and health-related issues. This paper proposes and implements a deep learning model using convolutional neural networks and transfer learning to detect and predict the risk of heart diseases. The model does so by identifying features and abnormalities in fundus images that indicate an individual is at risk for, or in the early stages of, CVD so that preventative measures can be taken. Some deep learning methods have been proposed by other researchers, but many use existing patient health data alongside fundus images. Complete patient health data is not always readily available, so the proposed method uses only fundus image data.

In the next section, related methodologies and research are described in detail, followed by Section III, which discusses the problem statement and why it should be further researched. Section IV presents the proposed method for solving that problem, while Section V covers the implementation and results. To conclude, Section VI discusses future work and ways to improve upon this research.

Authorized licensed use limited to: SRM Institute of Science and Technology Kattankulathur. Downloaded on June 05,2025 at [Link] UTC from IEEE Xplore. Restrictions apply.
II. LITERATURE STUDY

Blood vessels are the main anatomical structure that can be observed in fundus images, and they can be used to determine whether a patient is developing several different health issues such as diabetic retinopathy (DR), hypertension, and more. Evidence of a correlation between retinal vascular changes and coronary heart disease is increasing, as is evidence that retinal changes are associated with other cardiovascular risk factors, even before clinical diagnosis of disease [4].

Exudates, microaneurysms, and hemorrhages are visible pathological signs of diabetic retinopathy [5]. Figure 1 is an example of how diabetic retinopathy can be discovered by analyzing fundus imaging. A higher number of visible indicators of DR signifies a higher risk of CVD. In fact, Simó, Rafael et al. found that the presence of DR and its severity confer a higher risk of subclinical CVD than factors contained in contemporary risk equations such as blood pressure, LDL cholesterol, and HbA1c [6]. With this knowledge, it is important to keep DR-specific features and abnormalities extracted from retinal microvasculature and optic disc characteristics in mind when developing a deep learning model for predicting CVD.

Figure 1. Marked Features & Abnormalities on Fundus Image that Indicate Presence of DR

Lee, Y.C. et al. developed a multimodal deep learning model aimed at predicting current CVDs using two independent datasets. The multimodal model works by combining a convolutional neural network (CNN) and a deep neural network (DNN) trained on two different sources of data: fundus images and clinical risk factors such as age, sex, cholesterol, and hypertension. While they reported strong prediction performance under the AUROC (with 95% confidence intervals), their focus was on helping diagnose patients with easily accessible medical information [7].

III. PROBLEM STATEMENT

When making a medical diagnosis it is important to have as much health information and background about a patient as possible, but a patient's medical history isn't always easily accessible and readily available. The main goal of this project is to predict current CVDs and detect their risk probability using only fundus images, thus avoiding the need to have the patient's full clinical history at hand and enabling an expeditious diagnosis with the help of a medical practitioner.

An additional goal of this project is to increase awareness and education about CVD by giving patients a way to easily access their medical information within a user-friendly portal, encouraging individuals to play an active role in their cardiovascular health and helping reduce the mortality rate of CVD globally. Raising awareness of CVD and its risk factors alone could have an exponential impact and decrease CVD diagnoses, events, and related symptoms.

IV. PROPOSED METHODS

Due to the lack of readily available medical information, the proposed deep learning methods below use convolutional neural networks combined with transfer learning to detect and predict CVD using only fundus imaging as the source of data. Data augmentation also plays a big role, given the inherent imbalance of the data used to train the model.

A. Dataset & Preprocessing

Most publicly available fundus imaging datasets aren't large enough to properly train a CNN, and there are currently no CVD-related fundus imaging datasets. Because of this, the dataset chosen for this study is a diabetic retinopathy (DR) fundus image dataset, due to DR's strong correlation with CVD as reported by Simó, Rafael et al. The DR dataset can be found on Kaggle as part of the Diabetic Retinopathy Detection competition and contains approximately 33,000 fundus images.

When analyzing the dataset, we noticed high-resolution photos with large black borders. Both could drastically increase the time it would take to train a CNN, with the model trying to learn from the large, unimportant black area. The excess black border around each fundus image was therefore cropped to reduce the time the model spends on each image and lower the overall training time. Due to the high resolution of the images in the dataset, after cropping we resized the images to be no larger than 1024 pixels in both height and width. The tools used for cropping and resizing were OpenCV and Python, inspired by Dey, K [8].

To further increase the size of the dataset, data augmentation was applied when the images were imported into the models via a data generator. Horizontal flip, rotation range, zoom range, and shear range are examples of some of the augmentations used, in addition to the preprocessing functions for each of the pretrained models to ensure the input data is

regularized. A few examples of the augmented images and preprocessing inputs are shown in Figure 2.

Figure 2. Data Augmentation Example

The DR dataset originally comes divided into 5 categories that signify different severities of DR. To make this dataset usable for our CVD detection, severities 0, 1, and 2 were combined to create a low-risk category, and severities 3 and 4 were combined to create a high-risk category. Following the creation of the low-risk and high-risk categories, the dataset was split into 80% training, 10% validation, and 10% testing. For the prediction portion of this research, the original 5 DR categories were used to produce the probability of each CVD risk level.

B. Transfer Learning

Transfer learning is a powerful method that applies the knowledge gained from pre-trained CNNs to a new set of data and tasks. Pre-trained models are employed as the feature extractor, and by fine-tuning the rest of the neural network on a new dataset they can be reused for many different tasks. Utilizing transfer learning in the development of this project allows for faster and more accurate model training, even with limited labeled data, while requiring fewer computational resources.

The pretrained models selected for comparison in this simulation are InceptionV3, EfficientNetV2, and a Vision Transformer (ViT-Keras L16). When utilizing a pretrained model, it is vital to apply the same preprocessing functions to the new data that were used when the model was originally trained. Each of these models has a different preprocessing function, which means they may pick up on different features of a fundus image.

Convolutional neural networks such as InceptionV3 and EfficientNetV2 were the go-to for computer vision tasks until recent years. Vision transformers (ViTs) are newer and pose a competitive alternative to convolutional neural networks. ViT models have been shown to outperform state-of-the-art CNNs in terms of computational efficiency and accuracy when pretrained on large amounts of data [9]. Vision transformers work by breaking down the input image into a series of patches, shown in Figure 3, and using attention mechanisms to analyze features and capture both local and global relationships within images.

Figure 3. ViT Patches

The simulation will provide a comparison of the pretrained CNN models and the ViT model for CVD detection and prediction. The evaluation metrics used to compare the models are described in the next section.

C. Evaluation Metrics

Accuracy should never be the go-to metric when evaluating the performance of a model, due to the potential class imbalance and bias that come along with it. When training a model, it is important for each class to be balanced. Class imbalance can have a negative impact on the performance of the model and can lead to the model becoming biased towards predicting the majority class. The dataset acquired for this method is severely imbalanced. There are several ways to combat imbalanced classes: oversampling, undersampling, and implementing class weights. Calculating appropriate class weights based on the dataset helps the model train by giving more importance to the minority class or classes. For this method, we use the Inverse Class Frequency method to calculate the class weights. The Inverse Class Frequency formula is as follows:

w_c = N / (C × n_c)

where N is the total number of samples, C is the number of classes, and n_c is the number of samples in class c.

The main metric used to evaluate the models, given the class imbalance, is the weighted F1 score. The weighted F1 score averages the per-class F1 scores, weighting each by its class's share of the samples, which makes it appropriate for an imbalanced dataset. Additional metrics include the F1 score, precision, and recall. The standard F1 score equation is as follows:

F1 = 2 × (Precision × Recall) / (Precision + Recall)

In addition to the metrics above, a confusion matrix for each model is provided and used for performance evaluation. A confusion matrix displays the number of true positives, true negatives, false positives, and false negatives a model makes when predicting on a test dataset. This visual summarizes model performance and is especially helpful for seeing which class the model struggles to predict.
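The class-weighting and weighted-F1 computations described above can be sketched in a few lines of Python. The paper names the Inverse Class Frequency method but does not spell out its normalization, so the N / (C × n_c) form used here (the same heuristic as scikit-learn's "balanced" class weights) is our assumption; the function names are ours as well.

```python
from collections import Counter

def inverse_class_frequency_weights(labels):
    """w_c = N / (C * n_c): total samples over (number of classes
    times samples in class c). Rare classes get larger weights."""
    counts = Counter(labels)
    n, c = len(labels), len(counts)
    return {cls: n / (c * n_c) for cls, n_c in counts.items()}

def weighted_f1(f1_per_class, support):
    """Weighted F1: average the per-class F1 scores, each weighted
    by that class's share of the samples (its support)."""
    total = sum(support.values())
    return sum(f1_per_class[c] * support[c] / total for c in support)

# A 90/10 imbalanced binary dataset: the minority class is upweighted.
labels = [0] * 90 + [1] * 10
weights = inverse_class_frequency_weights(labels)  # class 1 gets ~9x class 0
```

With these weights passed to the loss (for example via Keras's class_weight argument), each minority-class sample here contributes roughly nine times as much to the gradient as a majority-class sample.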

V. SIMULATION & RESULTS
Deep learning techniques used across both the detection and prediction models include Global Average Pooling (GAP), normalization, and early stopping. GAP is applied to the pretrained model's output to reduce the overfitting that can occur when utilizing large pretrained models such as InceptionV3; it also greatly reduces the number of trainable parameters. Layer normalization is used in place of batch normalization for the ViT because vision transformers work by finding relationships between all features globally, so being batch dependent could negatively affect the model's training process and add fluctuations. Early stopping is utilized to save the best model and its weights when the validation loss stops improving. Using a batch size of 32 and an input size of (299, 299, 3), matching that of the pretrained models, we were able to achieve impressive results using fundus imaging alone.
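GAP and early stopping are simple enough to sketch outside any framework. The functions below are a framework-free illustration of both ideas; the patience value and the names are ours, as the paper does not give these details.

```python
import numpy as np

def global_average_pool(feature_maps):
    """Collapse an (H, W, C) feature tensor to (C,) by averaging each
    channel's spatial activations -- far fewer downstream parameters
    than flattening the whole tensor."""
    return feature_maps.mean(axis=(0, 1))

class EarlyStopping:
    """Track the best validation loss and keep the weights that
    produced it; signal a stop after `patience` epochs without
    improvement."""
    def __init__(self, patience=5):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_weights = None
        self.wait = 0

    def update(self, val_loss, weights):
        """Call once per epoch; returns True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss, self.best_weights, self.wait = val_loss, weights, 0
            return False
        self.wait += 1
        return self.wait >= self.patience
```

In Keras these correspond to the GlobalAveragePooling2D layer and the EarlyStopping callback with restore_best_weights=True.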
A. CVD Detection
After various tests, the model architecture that showed the best performance for the binary classification and CVD detection is shown in Figure 4. Sigmoid is the last activation function used because it produces a number between 0 and 1. This allows the model to detect whether the fundus image belongs to class 0 (high-risk) or class 1 (low-risk), which can help medical practitioners detect heart disease.
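To make the sigmoid decision concrete, here is a minimal NumPy forward pass through a dense head of the kind shown in Figure 4. The weights are illustrative placeholders rather than trained values, and the helper names are ours.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def detection_head(features, weights, biases):
    """ReLU dense layers ending in a single sigmoid unit.
    Output near 0 -> class 0 (high-risk), near 1 -> class 1 (low-risk)."""
    x = features
    for w, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(0.0, x @ w + b)             # Dense + ReLU
    p = sigmoid(x @ weights[-1] + biases[-1])      # Dense(1) + sigmoid
    return p.item(), int(p.item() >= 0.5)          # (probability, class)

# Tiny illustrative head: 2 pooled features -> 2 hidden units -> 1 output.
w_hidden = np.eye(2)                 # placeholder weights
w_out = np.array([[1.0], [1.0]])
prob, cls = detection_head(np.array([1.0, 2.0]),
                           [w_hidden, w_out],
                           [np.zeros(2), np.zeros(1)])
```

The prediction model in Figure 7 differs only in its head: Dense(5) with a softmax produces a probability for each of the five severity levels instead of a single sigmoid score.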

Pretrained Model
GlobalAveragePool
Batch/Layer Normalization
Dense 128 (activation: relu)
Batch/Layer Normalization
Dense 64 (activation: relu)
Dense 32 (activation: relu)
Dense 1 (activation: sigmoid)

Figure 4. Detection Model

Each pretrained model performed about the same, at around an 82-84% weighted F1-score. InceptionV3 led by a small margin at 84%, while EfficientNetV2 predicted a higher number of true positives, indicating better learning of the high-risk class, with a lower weighted F1-score of 82%. The Keras-ViT L16 model performed the best in terms of precision, recall, and the number of true negatives predicted, but its weighted F1-score came out in the middle at 83%. All of the metrics, including precision, recall, and F1-score, can be seen in Figure 5. The confusion matrices, normalized across the true label, can be seen in Figure 6 for model comparison.

Figure 5. Detection Metrics

Figure 6. Detection Matrices

B. CVD Prediction

The model architecture for the multiclass classification used for prediction of the risk level associated with CVD is shown below in Figure 7. Softmax is used as the last activation for the output to predict the probability of each of the 5 severity levels 0-4, with 4 being the most at risk for developing a CVD and showing symptoms.

Pretrained Model
GlobalAveragePool
Batch/Layer Normalization
Dense 256 (activation: relu)
Dense 5 (activation: softmax)

Figure 7. Prediction Model

The InceptionV3 model performed the best of the three, with the highest weighted F1-score of 54% as well as an overall accuracy of 47.87%. EfficientNetV2 was next with a 51% weighted F1-score and an overall accuracy of 44.45%. The Keras-ViT performed the worst, with a weighted F1 score of 50% and an overall accuracy of 42.18%. These metrics can be found below in Figure 7. Both the InceptionV3

and EfficientNetV2 models struggled to classify severity 2 images but were more accurate with the other severities, as seen in the confusion matrices in Figure 8. The ViT was the only model to strongly classify severity 1 images but had issues with the remaining severities, although its predictions all surround the true classes.

Figure 7. Prediction Metrics

Figure 8. Prediction Matrices

These predictive models produce probabilities that can assist medical practitioners with informing patients of their CVD risk level. By knowing their CVD risk level early on, before an official diagnosis is made, individuals may be convinced to implement relatively easy lifestyle changes to reduce their overall risk of developing a CVD and minimize the symptoms that come along with it. Figure 9 shows an example of the different risk levels that the InceptionV3 model output for a severity 4, high-risk fundus image. The model correctly predicted the individual as being at severity 4, informing the medical practitioner that the individual should be encouraged to actively monitor for symptoms and make the appropriate adjustments to their lifestyle to reduce their risk of developing a life-threatening CVD.

Figure 9. CVD Risk using InceptionV3 Model

C. GradCAM Heatmap & ViT Attention Map

Extracting features from the retinal microvasculature and optic disc characteristics is what holds the valuable predictive information. Understanding which sections of an image the CNN has deemed important when classifying a patient as at-risk will help guide healthcare professionals in making informed decisions. To provide these visual explanations, we need to see what the CNN focuses on as well as which features it finds important during the classification process.

Gradient class activation mapping (Grad-CAM) is used to provide a visual aid indicating why the model classified an individual as at-risk. Grad-CAM finds the importance of a certain class in the model by taking the gradient of the class score and weighing it against the output of the final convolutional layer. Figure 10 shows examples of the Grad-CAM heat map technique provided by Chollet, F. and modified for use with our InceptionV3 and EfficientNetV2 models [10].

Figure 10. Heatmap

Vision transformers work differently than CNNs and have a self-attention mechanism within the architecture. The attention map of a ViT visualizes the attention weights calculated between the patches of the image. In Figure 11, an example of an attention map is provided to show the difference between Grad-CAM used on the other models and the output of the attention mechanism inside a vision transformer's architecture [11].

Figure 11. ViT Attention Map

D. Patient Portal Website

Heatmaps and other visual aids, such as the ViT attention maps, produced by the models can provide a visual representation of the data and patterns within patient data that will help medical practitioners make decisions about diagnosis and prevention strategies. These visual aids can also be made available in a user-friendly portal easily accessible by patients, with the hope that this will increase awareness and education about cardiovascular health and encourage patients to play an active role in improving their own cardiovascular health by making lifestyle changes and spreading awareness to others.

The tool used to create the website for medical practitioners and their patients is Visual Studio, along with C#, [Link], and CSS. The home page advertises the advanced computer vision technology that assists medical practitioners with detecting and predicting early signs of CVD. It can also be used as an educational resource that offers information about cardiovascular health and preventative measures. The patient portal displaying the patient's fundus imaging and the related doctor's notes can be seen in Figure 12. It provides patients with a secure place to view past appointment information, including fundus imaging and related heatmaps, as well as an opportunity to review doctor's notes relating to their CVD risk and what lifestyle changes can be made to reduce their overall risk level.

Figure 12. Patient Portal

VI. CONCLUSION AND FUTURE WORK

Technology, and AI itself, is constantly improving and gaining traction in everyday life, including the medical field. While it shows great potential in aiding medical diagnosis, such as CVD detection and prediction using fundus imaging, it is still too early to trust and rely solely on AI. As technology continues to expand, along with our knowledge of AI and machine learning, the expeditious medical diagnoses provided could help save lives and lower the risk of heart disease worldwide. The models tested in this paper demonstrated that CVD can be detected and predicted using fundus imaging alone, but due to the medical nature of the task and the potential lives at stake, additional research and improvements are needed.

A model's accuracy and performance are based on the dataset it is given to train on. Machine learning models and vision transformers require a large dataset to become as accurate as possible. The lack of large medical datasets needed for machine learning and computer vision tasks is a factor that must be considered when continuing to research their role in the medical field. Access to a larger dataset would also allow researchers to create their own machine learning models without having to rely on transfer learning. This would let them design the best model architecture, use the best preprocessing functions to emphasize the predictive features of the fundus images, and increase the overall accuracy of their models.

Task-specific datasets are also important to have. This paper focuses on CVD prediction and detection but uses a DR dataset due to the absence of an available heart disease dataset. Partnering with medical researchers to develop a CVD-specific dataset would increase the accuracy of the models tested and provide a better base for continuing research into lowering heart disease as the leading cause of death globally using fundus imaging.

REFERENCES

[1] World Health Organization. “Cardiovascular Diseases.” World Health Organization, 11 June 2021, [Link]/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds).
[2] Goutam, Balla, et al. “A Comprehensive Review of Deep
Learning Strategies in Retinal Disease Diagnosis Using
Fundus Images.” IEEE Access, vol. 10, 2022, pp. 57796–
57823, [Link]
[3] Das, Susmit, et al. “Feasibility and Clinical Utility of Handheld Fundus Cameras for Retinal Imaging.” Eye, vol. 37, pp. 274–279, 12 Jan. 2022, [Link] Accessed 12 Sept. 2023.
[4] McClintic, Benjamin R., et al. “The Relationship between
Retinal Microvascular Abnormalities and Coronary Heart
Disease: A Review.” The American Journal of Medicine,
vol. 123, no. 4, 1 Apr. 2010, pp. 374.e1–374.e7,
[Link]/pmc/articles/PMC2922900/,
[Link] Accessed
13 Aug. 2020.
[5] Barriada, Rubén G, and David Masip. “An Overview of
Deep-Learning-Based Methods for Cardiovascular Risk
Assessment with Retinal Images.” Diagnostics (Basel),
vol. 13, no. 1, 26 Dec. 2022, pp. 68–68,
[Link]/pmc/articles/PMC9818382/,
[Link] Accessed 5
July 2023.
[6] Simó, Rafael, et al. “Diabetic Retinopathy as an
Independent Predictor of Subclinical Cardiovascular
Disease: Baseline Results of the PRECISED Study.” BMJ
Open Diabetes Research and Care, vol. 7, no. 1, 1 Dec.
2019, p. e000845, [Link]/content/7/1/e000845,
[Link] Accessed
10 Sep. 2023.
[7] Lee, Y.C., Cha, J., Shim, I. et al. “Multimodal deep
learning of fundus abnormalities and traditional risk
factors for cardiovascular risk prediction”. npj Digit. Med.
6, 14 (2023). [Link]
[8] Kim, Kyoung Min, et al. “Development of a Fundus
Image-Based Deep Learning Diagnostic Tool for Various
Retinal Diseases.” Journal of Personalized Medicine, vol.
11, no. 5, 21 Apr. 2021, p. 321,
[Link]/pmc/articles/PMC8142986/,
[Link] Accessed 22 Sept.
2023.
[9] Dey, K. (2023, February 26). Create contour and crop
images using opencv. Kaggle.
[Link]
contour-and-crop-images-using-opencv
[10] Chollet, F. (2020, April 26). Keras Documentation: Grad-
cam class activation visualization. Keras.
[Link]
[11] Salama, K. (2021, January 18). Keras documentation:
Image Classification with Vision Transformer. Keras.
[Link]
h_vision_transformer/
