Task-by-Task Guide: Build and deploy a stroke prediction model using R
If you'd like a little more support while completing this project, explore this step-by-step
resource to get additional hints and resources to help you along each task of this project.
Task 1 - Import data and perform data preprocessing
In this first task, you will load the data and perform data cleaning, data transformation, feature
engineering, and missing data imputation. The goal of this task is to prepare the data set so that it is
ready for building prediction models in the next task.
Note: You will spend most of your time on this task, so take the time to do it well: the quality of
your data determines the quality of your model and predictions.
I have provided general hints on how to complete this task.
As you may know, R lives in a world of packages. Although I have installed some packages for
you, you may need to install more because this project is open-ended. After installing packages,
you need to load them using the library() function.
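For instance, a minimal sketch of this setup (the package list below is only an example of what you might use for this project, not a requirement):

```r
# Run once to install any extra packages you need.
install.packages(c("tidyverse", "skimr", "mice", "tidymodels"))

# Load the packages in each session before using them.
library(tidyverse)   # data manipulation and ggplot2 visualization
library(skimr)       # quick data summaries
library(mice)        # multiple imputation
library(tidymodels)  # modeling framework
```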
Here are some useful resources to help you with this section of the task:
R Packages: A Beginner's Tutorial
Importing data into R tutorial
Hint
You may decide to use base R functions for data exploration, such as summary(), or advanced
functions, such as the skim() function from the skimr package. You can create data visualizations to
explore the data using base R functions or the ggplot2 package.
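As an illustration, assuming the data has been read into a data frame called `stroke_data` (a hypothetical name), a first look might be:

```r
# e.g. stroke_data <- readr::read_csv("healthcare-dataset-stroke-data.csv")
summary(stroke_data)      # base R overview of each column
skimr::skim(stroke_data)  # richer summary: types, missingness, distributions
```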
Also, in this step, it will be useful to clean or preprocess the data, including recoding variables,
changing N/A or Unknown to missing values, and converting variables to the correct data types
(such as numeric, factor, etc.). Functions from the tidyverse family of packages, including dplyr and
tidyr functions such as mutate() and mutate_at(), will be useful here.
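For example, a sketch of these cleaning steps, assuming the column names (bmi, smoking_status, stroke) used in the common stroke data set; adjust them to match your data:

```r
library(dplyr)

stroke_clean <- stroke_data %>%
  mutate(
    bmi            = na_if(bmi, "N/A"),           # recode "N/A" strings to NA
    smoking_status = na_if(smoking_status, "Unknown")
  ) %>%
  mutate(
    bmi    = as.numeric(bmi),                     # convert to the correct type
    stroke = factor(stroke, levels = c(1, 0))     # event level first for yardstick
  ) %>%
  mutate(across(where(is.character), as.factor))  # remaining strings -> factors
```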
In this section, you might ask yourself questions such as, "What variables are correlated?" and
"Which variables provide valuable information for model building?" before analyzing the data and
creating visuals to showcase your findings.
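As one illustration, a simple ggplot2 visual comparing a predictor across outcome groups (avg_glucose_level is an assumed column name from the common stroke data set):

```r
library(ggplot2)

ggplot(stroke_clean, aes(x = stroke, y = avg_glucose_level)) +
  geom_boxplot() +
  labs(
    x = "Stroke (1 = yes, 0 = no)",
    y = "Average glucose level",
    title = "Glucose level by stroke outcome"
  )
```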
Here are some useful resources to help you with this section of the task:
Four R packages for Automated Exploratory Data Analysis you might have missed
Exploratory Data Analysis
Data visualization with ggplot2
Hint
Creating new features requires sufficient evidence, which usually comes from your data exploration.
There are different techniques to impute missing values, such as multiple imputation. The tidyr and
mice packages provide very interesting ways to deal with missing values.
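For instance, a minimal sketch of multiple imputation with mice (the settings shown are illustrative):

```r
library(mice)

# m = 5 imputed data sets is the default; mice picks a method per column type.
imp <- mice(stroke_clean, m = 5, seed = 123)
stroke_imputed <- complete(imp, 1)  # extract one completed data set
```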
Here are some useful resources to help you with this section of the task:
Feature Engineering for Machine Learning in R
Read my article on Handling Missing Values in R using Tidyr
Getting Started with Multiple Imputation in R
Multiple imputation with the mice package
Tutorial on 5 Powerful R Packages used for imputing missing values
When you are satisfied that your data is tidy and ready for model building, please proceed to the
next task.
Task 2 - Build prediction models
Hint
There are different classification algorithms you can try, including logistic regression, naive Bayes,
K-nearest neighbors, (linear and/or kernel) support vector machines, decision trees, random
forests, and XGBoost. In this task, you are allowed to use any approach since this is open-ended.
However, there are some steps you should consider, including splitting the data into training and
testing sets, preprocessing the predictors, specifying and fitting models, and tuning
hyperparameters with cross-validation.
Note: As I said, you can use any approach you desire; there are no right or wrong answers.
However, I have found tidymodels useful for tying many of these steps together.
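As an illustration, here is a sketch of one possible tidymodels pipeline, using logistic regression as an example model; the recipe steps and model choice are assumptions, not the required approach:

```r
library(tidymodels)

set.seed(123)
stroke_split <- initial_split(stroke_imputed, prop = 0.8, strata = stroke)
train_set    <- training(stroke_split)
test_set     <- testing(stroke_split)

stroke_rec <- recipe(stroke ~ ., data = train_set) %>%
  step_rm(any_of("id")) %>%                  # drop identifier-type columns
  step_dummy(all_nominal_predictors()) %>%   # encode factors
  step_normalize(all_numeric_predictors())   # scale numeric predictors

log_spec <- logistic_reg() %>% set_engine("glm")

stroke_wf  <- workflow() %>% add_recipe(stroke_rec) %>% add_model(log_spec)
stroke_fit <- fit(stroke_wf, data = train_set)
```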
Here are some useful resources to help you with this task:
A Gentle Introduction to tidymodels
Tidymodels: tidy machine learning in R
Your First Machine Learning Project in R Step-By-Step
A complete guide to fit Machine Learning models in R
Task 3 - Evaluate and select prediction models
After fitting the models with the best hyperparameters on the training set and validating them on
the cross-validation set, the next step is to evaluate and select prediction models based on
evaluation metrics, including accuracy, sensitivity (recall), specificity, F1 score, and the AUC-ROC
curve. Again, there are no right or wrong answers, since the decision to select a model is based on
your analysis.
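For example, a sketch of computing these metrics with the yardstick package, assuming the `stroke_fit` workflow and `test_set` from the previous task (with stroke = 1 as the first factor level):

```r
library(tidymodels)

preds <- predict(stroke_fit, test_set) %>%
  bind_cols(predict(stroke_fit, test_set, type = "prob")) %>%
  bind_cols(test_set)

class_metrics <- metric_set(accuracy, sens, spec, f_meas)
class_metrics(preds, truth = stroke, estimate = .pred_class)

roc_auc(preds, truth = stroke, .pred_1)  # probability column for the event
autoplot(roc_curve(preds, truth = stroke, .pred_1))
```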
Hint
A few common questions to ask are:
● Which model has the highest score across evaluation metrics?
● What is important in this case study? Do you want a model that is highly sensitive or highly
specific for stroke prediction?
● Are there evaluation metrics such as an F-score or balanced accuracy that are better for this
case study?
● How spread out are the confidence intervals around each evaluation metric?
Here are some useful resources to help you with this task:
Tidymodels: tidy machine learning in R
Metrics to evaluate classification models
Compare Models And Select The Best Using The Caret R Package
Your First Machine Learning Project in R Step-By-Step
A complete guide to fit Machine Learning models in R
Task 4 - Deploy the prediction model
Hint
There are several ways to deploy your model in R, for example via a REST API using Vetiver or via
a simple web page using R Shiny. The choice of which to use is up to you.
Note: Vetiver is relatively new compared with R Shiny, and both have their pros and cons. For
example, you can deploy your pinned `vetiver_model()` via a Plumber API, which can be hosted in a
variety of ways, whereas R Shiny only provides the option of deploying your model behind a web
page.
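For instance, a minimal sketch of the Vetiver route (the model name and temporary board are illustrative; in practice you would use a persistent board such as board_folder() or Posit Connect):

```r
library(vetiver)
library(pins)
library(plumber)

v <- vetiver_model(stroke_fit, model_name = "stroke_model")

model_board <- board_temp()
vetiver_pin_write(model_board, v)  # pin (version) the model

pr() %>%
  vetiver_api(v) %>%               # adds a POST /predict endpoint
  pr_run(port = 8080)
```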
Here are some useful resources to help you with this task:
Read about Vetiver on Posit website
Read these three articles on creating a deployable vetiver model, publishing and versioning your
model, and deploying your model as a REST API
Mastering Shiny: Your first Shiny app
R Shiny for Data Science Tutorial – Build Interactive Data-Driven Web Apps
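If you prefer the R Shiny route instead, here is a minimal hypothetical sketch; the two inputs shown are illustrative only, since a real app must collect every predictor your fitted workflow expects:

```r
library(shiny)

ui <- fluidPage(
  titlePanel("Stroke risk prediction"),
  numericInput("age", "Age", value = 50, min = 0, max = 110),
  numericInput("glucose", "Average glucose level", value = 100),
  textOutput("prediction")
)

server <- function(input, output) {
  output$prediction <- renderText({
    # Assumes `stroke_fit` is available, e.g. loaded with readRDS().
    new_obs <- tibble::tibble(age = input$age, avg_glucose_level = input$glucose)
    paste("Predicted class:", predict(stroke_fit, new_obs)$.pred_class)
  })
}

shinyApp(ui, server)
```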
Task 5 - Findings and Conclusions
Finally, we can wrap up the project. You can write a conclusion about your process and any key
findings.
Hint