PROJECT REPORT
Clinical Test Analysis for Alzheimer's
Disease Using Machine Learning Models
BT21BTECH11008
SRI CHARVI SALAPU
ABSTRACT
Alzheimer’s disease (AD) is a progressive neurological disorder that causes
cognitive decline, impairment in daily functioning, and memory loss. It is
characterized by neurofibrillary tangles in the brain and the accumulation of
amyloid plaques, which lead to the death of neurons and hence the breakdown of
neural connections. This project analyzes the clinical test data of patients
over a three-year period to identify, using machine learning, the significant
clinical features that contribute to the progression of Alzheimer’s disease,
and to build classifiers from those features.
➢ ABOUT AD
Alzheimer’s disease is a progressive neurodegenerative disorder that
affects behavior, memory, and the ability to think and perform everyday activities.
It is caused by the accumulation of amyloid plaques and neurofibrillary tangles in
the brain, leading to a breakdown of neural connections. Its main symptoms are
memory loss, cognitive decline, behavioral changes, language problems (trouble
finding the right words to form sentences and hold conversations),
disorientation, and impaired judgment.
The main risk factors for AD are age, genetics, lifestyle, and heart health.
➢ Elaboration on AD, CN, MCI.
In the context of this project, the patients have been classified into
three classes: AD, CN, and MCI.
1. AD (Alzheimer’s Disease):
Individuals with AD typically experience difficulty with language,
gradual memory loss, and behavioral changes.
2. CN (Cognitively Normal):
CN individuals exhibit normal cognitive functionality
without any symptoms or significant impairment; this class often
serves as the baseline comparison class.
3. MCI (Mild Cognitive Impairment):
MCI individuals experience mild but noticeable cognitive decline,
more than expected for the individual’s age.
➢ Data Analysis: Box Plots and Statistical Tests
For Visualization and Statistical Comparison of Clinical Features
➢ Box-Plots validated with t-test for Group AD.
➢ Box-Plots validated with t-test for Group MCI.
➢ Box-Plots validated with t-test for Group CN.
➢ Interpretation of the boxplots:
The boxplots give a visual representation of how the clinical
features change over time and how they differ among the classes
(AD, CN, MCI).
From the plots we can observe that in the AD group there are noticeable
deviations in several clinical features from the initial screening (sc) to the
36-month (m36) visit, which indicates progression of Alzheimer’s disease.
In the CN and MCI groups the changes in clinical features are less
pronounced than in the AD group, but some slight variations can still
be seen, particularly in the MCI group, indicating mild or early-stage
cognitive impairment, while the AD group exhibits more significant
changes, suggesting a more advanced stage of cognitive decline.
The t-test provides the statistical analysis: a quantitative measure of the
differences for each clinical feature, focusing specifically on the AD group.
In statistical analysis, the p-value determines the significance of
the results, where the null hypothesis states that there is no effect
or difference. A low p-value (p < 0.05) is strong evidence against the
null hypothesis, meaning the observed effect is statistically significant;
a high p-value (p > 0.05) fails to reject the null hypothesis, meaning
the observed difference is not statistically significant.
The low p-values obtained for all the clinical features in the AD group
between the sc and m36 visits indicate that the changes observed in the
boxplots are not due to random variation but reflect true changes in the
clinical features over time. These significant variations confirm the
progression of AD and the importance of these features in monitoring it.
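The kind of paired comparison described above can be sketched as follows. This is a minimal example with synthetic data (the patient counts, score values, and variable names sc/m36 are hypothetical, not taken from the project’s dataset), assuming SciPy is available:

```python
from scipy import stats
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: the same clinical score measured for 30 AD
# patients at screening (sc) and at the 36-month visit (m36).
sc = rng.normal(loc=20.0, scale=4.0, size=30)
m36 = sc + rng.normal(loc=5.0, scale=2.0, size=30)  # scores worsen over time

# Paired t-test: the null hypothesis is that the mean sc-to-m36 change is zero.
t_stat, p_value = stats.ttest_rel(sc, m36)

print(f"t = {t_stat:.3f}, p = {p_value:.3g}")
if p_value < 0.05:
    print("Reject the null hypothesis: the change is statistically significant.")
```

A paired test (`ttest_rel`) is used here because the same patients are measured at both visits; an independent-samples test (`ttest_ind`) would apply when comparing two different groups.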
➢ Developing classifiers for classes using ML models.
After the visualization and statistical analysis using box plots
and t-tests, we observed significant changes in the clinical features that
contribute to Alzheimer’s disease progression. Using this information, we can
now apply different machine learning models to the data, perform feature
selection, and train the models to develop robust classifiers that
distinguish between the AD, CN, and MCI classes.
We used supervised machine learning models because we have
labeled data (AD, CN, and MCI) and our objective is to classify the
progression of Alzheimer’s disease into specific categories.
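The supervised setup described above starts by holding out unseen data for testing. A minimal sketch, with a random stand-in for the clinical dataset (the array shapes and label strings are illustrative assumptions), assuming scikit-learn:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the clinical dataset: rows are patient records,
# columns are clinical features, and y holds the diagnosis labels.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))
y = rng.choice(["AD", "CN", "MCI"], size=300)

# Hold out unseen data for testing; stratify keeps class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)  # (225, 5) (75, 5)
```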
➢ MODELS USED:
● Classification models:
1. K-Nearest Neighbors (KNN):
KNN is a classification algorithm that classifies new data
points based on the majority class of their K nearest neighbors.
The value of K is a hyperparameter that determines the number of
neighbors to consider.
2. Logistic Regression:
Logistic regression is a linear classification algorithm widely used for
binary classification tasks. It models the relationship between the
dependent variable and one or more independent variables
using the sigmoid function, and estimates the probability that a given
input belongs to a particular class.
3. Support Vector Machine (SVM):
SVM is a powerful classification algorithm that finds the hyperplane that
best separates the data into classes. Using the kernel trick, it
performs both linear and non-linear classification by transforming the
input space into higher dimensions.
● Ensemble Models:
Ensemble learning involves combining multiple models to improve
performance. Tree-based ensembles use decision trees as
base learners.
1. Random Forest:
A random forest classifier combines multiple decision trees to create
a robust classification model. It works by constructing a set of
decision trees, each trained on a random subset of the data and
features; at prediction time, each tree in the forest independently
classifies the input, and the final output is determined by majority
voting (or by averaging the predictions of the individual trees).
2. XGBoost:
XGBoost is a scalable and efficient implementation of the gradient
boosting framework, designed to push the limits of computing resources.
It builds the individual trees sequentially, where each new tree
corrects the errors made by the previous trees, and it uses
regularization to avoid overfitting.
3. Gradient Boosting:
The gradient boosting classifier is an ensemble model that sequentially
builds models (usually decision trees) and combines them to create
a strong classifier. It trains each new model to focus on reducing
the errors of the combined model, gradually improving the overall
prediction.
● Tree based models:
1. Decision Tree Classifier:
It is a non-linear classifier that splits the data into subsets based on
feature values using a tree structure, making a decision at
each node. It splits the dataset into subsets based on the values of
different features and assigns class labels to the leaf nodes. Decision
trees are interpretable, easy to understand, and can handle multi-class
classification tasks.
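The models listed above can all be trained through the same fit/score interface. A minimal sketch with synthetic data, assuming scikit-learn (XGBoost is omitted here only because it requires the separate xgboost package; the other six models mirror the list above, and all hyperparameters shown are illustrative defaults, not the project’s tuned settings):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in data; the real project uses clinical features
# labeled AD/CN/MCI.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = rng.integers(0, 3, size=300)  # 0=AD, 1=CN, 2=MCI
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(kernel="rbf"),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

# Fit each model and record its test accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: {scores[name]:.2f}")
```

On the random labels used here the accuracies are near chance; the point is only the uniform training loop that lets all the classifiers be compared on the same split.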
➢ Feature Importance analysis:
We performed feature importance analysis using the various supervised
machine learning models to identify the most significant clinical
features that contribute to the progression of AD.
In our feature importance analysis, the CDRSB and MMSE features
are purposefully excluded because of their fundamental role in defining
the classification of the AD, CN, and MCI groups; they are recognised
as primary diagnostic assessments. Excluding them ensures
that the classifiers are robust and provide information about
disease progression beyond the primary diagnostic criteria.
Below are the bar graphs visualizing the feature importance
analysis.
While each model may rank different features as the most important,
we can identify some common features that are consistently highly
ranked across multiple models.
The consistently important features across the various models are:
1. FAQ: Frequently among the highest-ranked features in all models,
which indicates that it is a crucial feature.
2. ADAS13: An important feature in Random Forest, XGBoost, and
Gradient Boosting.
3. LDELTOTAL: A highly ranked feature in XGBoost, Gradient
Boosting, and Decision Tree.
4. ADAS11: Frequently near the top for Logistic Regression, SVM,
and Gradient Boosting.
5. ADASQ4: Important in Logistic Regression, XGBoost, and SVM.
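For the tree-based models, a ranking like the one above can be read directly from the fitted model. A minimal sketch with synthetic data, assuming scikit-learn (the data values are random; only the feature names echo the report):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Feature names echoing the report; CDRSB and MMSE are deliberately
# absent from this list, as described above.
features = ["FAQ", "ADAS13", "LDELTOTAL", "ADAS11", "ADASQ4"]

rng = np.random.default_rng(1)
X = rng.normal(size=(200, len(features)))
y = rng.integers(0, 3, size=200)

forest = RandomForestClassifier(random_state=1).fit(X, y)

# Rank features by impurity-based importance. Linear models (Logistic
# Regression, linear SVM) would use |coef_| instead of feature_importances_.
ranked = sorted(zip(features, forest.feature_importances_),
                key=lambda p: p[1], reverse=True)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

The importances sum to 1, so each value can be read as that feature’s share of the model’s total split quality.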
➢ Visualization of feature importance analysis:
➢ METRICS USED FOR MODEL COMPARISON:
1. Train accuracy score: It measures the accuracy of the model by
comparing its predicted labels with the actual labels of the
training data, telling us how well the model fits the training data.
A very high training accuracy paired with a much lower test accuracy
is a sign of overfitting.
2. Test accuracy score: It measures the accuracy of the model on
unseen data, using a separate dataset that was not used during
training.
3. Precision: Precision is the ratio of true positives to the
sum of true positives and false positives. It measures how well the
model avoids false positives; high precision means the model makes
few mistakes when predicting the positive class.
4. Recall: Also called sensitivity, it is the ratio of true
positives to the sum of true positives and false negatives. High
recall indicates that the model captures most of the positive
instances, with few false negatives.
5. F1 Score: It is the harmonic mean of precision and recall, which
provides a single metric that balances both concerns. A high
F1 score means the model has both high precision and high recall.
6. ROC-AUC: It measures the model’s ability to distinguish between
classes by evaluating the trade-off between the true positive rate
and the false positive rate. A high ROC-AUC score indicates better
performance in distinguishing between positive and negative classes.
7. Confusion matrix: It is a matrix showing the number of true
positives, true negatives, false positives, and false negatives. It
provides detailed insight into how the model’s predictions compare to
the actual values and helps identify the types of errors the model makes.
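All of these metrics are available from scikit-learn. A minimal sketch on hand-made predictions for a 3-class problem (the label vectors and the stand-in probability matrix are illustrative assumptions, not the project’s results):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, roc_auc_score)

# Hypothetical predictions for a 3-class problem (0=AD, 1=CN, 2=MCI).
y_true = np.array([0, 0, 1, 1, 2, 2, 0, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2, 0, 1, 2, 0])

acc = accuracy_score(y_true, y_pred)
# Macro averaging treats the three classes equally in a multi-class report.
prec = precision_score(y_true, y_pred, average="macro")
rec = recall_score(y_true, y_pred, average="macro")
f1 = f1_score(y_true, y_pred, average="macro")
cm = confusion_matrix(y_true, y_pred)

print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
print(cm)

# ROC-AUC needs class probabilities; one-vs-rest extends it to multi-class.
# Rows sum to 1 here (0.8 for the predicted class, 0.1 for the others),
# as a crude stand-in for a real predict_proba output.
y_score = np.eye(3)[y_pred] * 0.7 + 0.1
auc = roc_auc_score(y_true, y_score, multi_class="ovr")
print(f"ROC-AUC (OvR) = {auc:.3f}")
```

The diagonal of the confusion matrix counts the correct predictions per class, so its trace divided by the total equals the accuracy.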
➢ Model comparison results:
Model                 Train accuracy   Test accuracy   Precision   Recall   F1 score
Logistic Regression   76.63%           78.57%          0.80        0.79     0.79
Decision Tree         99.30%           66.33%          0.67        0.66     0.67
Random Forest         99.30%           75.26%          0.75        0.73     0.74
SVM                   76.63%           79.34%          0.80        0.77     0.78
XGBoost               97.83%           72.96%          0.73        0.71     0.72
Gradient Boosting     86.72%           76.28%          0.76        0.74     0.75
K-Neighbors           81.55%           75.00%          0.74        0.73     0.73
➢ ROC-AUC Curves:
➢ CONFUSION MATRIX:
➢ CONCLUSION:
The table above presents the results of the various classification models
tested on the dataset. The following points can be deduced from their
ROC curves, accuracy, precision, recall, and F1 scores.
● SVM and Logistic Regression show the highest overall performance,
demonstrating the best balance between training accuracy (76.63% for both)
and test accuracy (79.34% for SVM and 78.57% for Logistic Regression), with
high ROC-AUC values indicating robust generalization ability.
● Gradient Boosting is also a strong performer, with a test accuracy of
76.28% and balanced precision and recall scores across classes; its
ROC-AUC of 0.911 suggests effective class separation.
● Decision Tree and Random Forest exhibit signs of overfitting, with high
training accuracies (99.30%) compared to their test accuracies (66.33% and
75.26%). The Decision Tree’s ROC-AUC of 0.738 indicates poor discrimination
ability, while Random Forest’s 0.903 is considerably better despite the
overfitting.
● XGBoost and K-Neighbors have reasonable test accuracy and ROC-AUC
scores, but they do not outperform SVM.
In summary, the results show that SVM, Logistic Regression, and Gradient
Boosting are the top-performing models for our dataset in terms of accuracy,
precision, recall, and F1 scores across the classes, but further analysis is
needed to determine the optimal model based on the preferred balance between
precision and recall specific to this dataset.