
Machine Learning

This document provides details about a machine learning project done by Internshala to predict diabetes. It involves building a model using various diagnostic measurements like number of pregnancies, insulin level, BMI etc. from a dataset to predict if a patient has diabetes or not. The main steps involved collecting and preparing the data, selecting and training a random forest classifier model, evaluating the model, tuning parameters and making predictions. The model was able to predict diabetes with an accuracy of around 76%. Various Python libraries like Pandas, NumPy, Scikit-learn were used to develop the model.


Industrial Training by Internshala

MACHINE LEARNING
Internshala Details
Internshala is a technology company on a mission to
equip students with relevant skills & practical
exposure to help them get the best possible start to
their careers. Imagine a world full of freedom and
possibilities. A world where you can discover your
passion and turn it into your career. A world where
you graduate fully assured, confident, and prepared
to stake a claim on your place in the world.
What is Machine Learning?
Machine learning is a branch of artificial
intelligence (AI) and computer science which
focuses on the use of data and algorithms to
imitate the way that humans learn, gradually
improving its accuracy.
Machine learning is an important component
of the growing field of data science. Through
the use of statistical methods, algorithms are
trained to make classifications or predictions,
uncovering key insights within data mining
projects.
TRAIN AND TEST FITTING (Test-Train Splitting):
The scikit-learn (sklearn) package is used to split the data into
training and test sets. df.iloc is the pandas indexer used to store
the feature data (Pregnancies, BMI, Insulin, ...) in x and the label,
or outcome, in y. The split then gives x train and y train, and
x test and y test. The dataset is divided by percentage, with 80% as
the training set and the rest as the test set. Calling x_train.head()
shows how the feature data is stored, and y_train.head() shows how
the label or outcome is stored.
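The split described above can be sketched as follows. This is a minimal illustration, not the project's notebook: the DataFrame is filled with random stand-in values (an assumption, since the real file is not part of this document), and `test_size=0.2` and `random_state=42` are illustrative choices.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in table: 8 feature columns plus Outcome, random values.
columns = ["Pregnancies", "Glucose", "Blood Pressure", "Skin Thickness",
           "Insulin", "BMI", "Diabetes Pedigree Function", "Age", "Outcome"]
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(50, 9)), columns=columns)
df["Outcome"] = rng.integers(0, 2, size=50)

X = df.iloc[:, :-1]  # every column except the last: the features
y = df.iloc[:, -1]   # the last column: the Outcome label

# Hold out 20% of the rows for testing; the seed makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

print(X_train.head())  # how the feature rows are stored
print(y_train.head())  # how the labels are stored
```

Selecting by position with `iloc` works here only because Outcome is the last column; selecting by column name would be the safer choice if the column order could change.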
Problem Statement
This is a classification problem of supervised machine learning.
The objective is to predict whether or not a patient has
diabetes, based on certain diagnostic measurements included
in the dataset.
0 – Absence of Diabetes
1 – Presence of Diabetes

Diabetes Prediction Project


Details of Project
In this Diabetes Prediction using Machine Learning Project, the main
objective is to predict whether the person has Diabetes or not based
on various features like Number of Pregnancies, Insulin Level, Age,
BMI. The data set used in this project was taken from the .
“This dataset is originally from the National Institute of Diabetes
and Digestive and Kidney Diseases. The objective of the dataset is to
diagnostically predict whether or not a patient has diabetes, based
on certain diagnostic measurements included in the dataset. Several
constraints were placed on the selection of these instances from a
larger database.” The dataset has 9 columns as shown below:
•Pregnancies – Number of times pregnant
•Glucose – Plasma glucose concentration at 2 hours in an oral glucose tolerance test
•Blood Pressure – Diastolic blood pressure (mm Hg)
•Skin Thickness – Triceps skinfold thickness (mm)
•Insulin – 2-Hour serum insulin (mu U/ml)
•BMI – Body mass index (weight in kg/(height in m)^2)
•Diabetes Pedigree Function – Diabetes pedigree function
•Age – Age (years)
•Outcome – Class variable (0 or 1); 268 of 768 are 1, the others are 0
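The Outcome counts given above (268 of 768 rows are 1) give a yardstick for reading any accuracy figure: a model that always predicts 0 would already be right on the remaining rows. The short calculation below works this out; the variable names are illustrative only.

```python
# Class counts as stated in the dataset description.
total = 768
positive = 268                # Outcome == 1
negative = total - positive   # Outcome == 0

positive_rate = positive / total
majority_baseline = negative / total  # accuracy of always predicting 0

print(f"positive rate:     {positive_rate:.3f}")
print(f"majority baseline: {majority_baseline:.3f}")
```

So a classifier on this data is only informative to the extent that it beats roughly 65% accuracy.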
Resources Used
STEPS INVOLVED -> LIBRARY AND PACKAGES USED
The 7 Steps of Machine Learning provides the following general
framework of steps in supervised machine learning:
•Data Collection
•Data Preparation
•Choosing a model
•Training the model
•Evaluating the model
•Parameter tuning
•Making prediction
Libraries Used : -
1.Python 3.7
2.pandas
3.numpy
4.seaborn
5.matplotlib
6.scikit-learn
Package installation:
pip install numpy
pip install pandas
pip install seaborn
pip install scikit-learn
pip install matplotlib
Steps Involved In The Project
Technical Aspect : -
1.Training a machine learning model using scikit-learn.
2.Building and hosting a Flask web app.
3.A user has to enter details like Number of Pregnancies, Insulin Level, Age, BMI etc.
4.Once it gets all the field information, the prediction is displayed on a new page.
Import required libraries, Import diabetes dataset.
pandas is used to read the dataset, numpy (numerical Python) is used for mathematical
calculation, and matplotlib and seaborn are used to plot graphs.
ALGORITHM:
From the scikit-learn package we import the random forest classifier (RFC) for making
predictions. The RFC is the algorithm that we train and test. model.fit trains it on
x train and y train. The trained model then predicts the output variable for x test,
which tests the algorithm. Finally, metrics is imported to check the accuracy score.
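The ALGORITHM step (import the random forest classifier, fit on the training split, predict on the test split, check the accuracy score) can be sketched as below. The data here is synthetic and the hyperparameters (`n_estimators=100`, `random_state=42`) are assumptions for illustration; the source does not state which settings the project used, so the printed accuracy says nothing about the project's 76%.

```python
import numpy as np
from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption): 200 rows, 8 numeric features.
# The label is 1 when the first two features sum to more than zero.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Train on the training split only.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Predict the held-out rows and compare against their true labels.
y_pred = model.predict(X_test)
accuracy = metrics.accuracy_score(y_test, y_pred)
print(f"accuracy: {accuracy:.2f}")
```

Scoring on rows the model never saw during fit is what makes the accuracy figure an estimate of performance on new patients rather than a measure of memorization.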

DATA ANALYSIS :

The shape attribute shows how many rows and columns the data has. info is used to
check whether each column's data type is int or float. isnull().sum() is used to check
whether null values are present: the sum counts the nulls in each column, so each and
every attribute is checked.

SUMMARIZE THE RESULTS:

In this notebook we predicted diabetes from medical records with an accuracy of
approximately 76%.
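The inspection calls named under DATA ANALYSIS can be tried on a small table. The sketch below uses a toy DataFrame with made-up values and one deliberately missing entry (an assumption for illustration, not the project's data) so that the null check has something to report.

```python
import numpy as np
import pandas as pd

# Toy table with one missing BMI value; not the project's dataset.
df = pd.DataFrame({
    "Glucose": [148, 85, 183, 89],
    "BMI": [33.6, 26.6, np.nan, 28.1],
    "Outcome": [1, 0, 1, 0],
})

print(df.shape)   # (rows, columns)
df.info()         # column names, non-null counts, and dtypes (int64 / float64)

null_counts = df.isnull().sum()  # number of missing values per column
print(null_counts)
```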
Conclusion and Summary
You know that Machine Learning is a technique of training machines to perform activities a human brain
can do, albeit a bit faster and better than an average human being. Today we have seen that machines can
beat human champions in games such as Chess and Go, which are considered very complex. You have
seen that machines can be trained to perform human activities in several areas and can aid humans in living
better lives.
Machine Learning can be Supervised or Unsupervised. If you have a smaller amount of data that is clearly
labelled for training, opt for Supervised Learning. Unsupervised Learning would generally give better
performance and results for large data sets. If you have a huge data set easily available, go for deep learning
techniques. You have also learned about Reinforcement Learning and Deep Reinforcement Learning. You now
know what Neural Networks are, along with their applications and limitations.
Finally, when it comes to developing machine learning models of your own, you looked at the choices
of development languages, IDEs and platforms. The next thing you need to do is start learning and
practicing each machine learning technique. The subject is vast in breadth, but in depth each topic can be
learned in a few hours. The topics are independent of one another. Take one topic at a time, learn it,
practice it and implement its algorithm(s) in a language of your choice. This is the best way to start
studying Machine Learning. Practicing one topic at a time, you will soon acquire the breadth that is
eventually required of a Machine Learning expert.
Good Luck!
