Simple Linear Regression - Assign3

This document outlines the steps to perform simple linear regression using scikit-learn on a dataset containing employee salary hike and churn rate data. It describes: 1) Importing packages and classes for linear regression. 2) Providing and exploring the dataset to use for regression. 3) Creating a linear regression model and fitting it to the dataset. 4) Performing various transformations on the data like log, exponential and polynomial to reduce errors and obtain the best fit model. 5) Choosing the best model based on the lowest RMSE value calculated for each transformation and reporting the final model results.

Uploaded by

Sravani Adapa


Simple Linear Regression With scikit-learn

There are five basic steps when you’re implementing linear regression:

1. Import the packages and classes you need.
2. Provide data to work with and, if needed, apply appropriate transformations.
3. Create a regression model and fit it with existing data.
4. Check the results of model fitting to know whether the model is satisfactory.
5. Apply the model for predictions.

These steps are more or less general for most of the regression approaches and implementations.

Problem Statement:

An organization wants an early estimate of its employee churn-out rate. The HR department
has provided data on employees' salary hikes and the churn-out rate for a financial year.
The analytics team must analyze this data, predict employee churn, and present the statistics.

Approach: Build a simple linear regression model with the target variable ‘Churn_out_rate’.
Apply the necessary transformations and record the RMSE and correlation coefficient values
for each transformation model.

Step 1: Import packages and classes

The first step is to import the package numpy and the class LinearRegression from sklearn.linear_model:

import numpy as np
from sklearn.linear_model import LinearRegression

Now you have all the functionality you need to implement linear regression.

The fundamental data type of NumPy is the array type called numpy.ndarray. The rest of this article
uses the term array to refer to instances of the type numpy.ndarray.
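
As a small illustration of working with arrays, the sketch below builds a one-dimensional array and reshapes it into a single column, which is the two-dimensional layout scikit-learn estimators expect for their inputs. The values are made up for illustration and are not taken from the assignment's data.

```python
import numpy as np

# Hypothetical values, for illustration only (not the assignment's data).
x = np.array([1.0, 2.0, 3.0, 4.0])

print(type(x).__name__)   # ndarray
print(x.shape)            # (4,)

# scikit-learn estimators expect a 2-D input of shape (n_samples, n_features),
# so a single regressor is reshaped into one column.
X = x.reshape(-1, 1)
print(X.shape)            # (4, 1)
```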

The class sklearn.linear_model.LinearRegression will be used to perform linear and polynomial
regression and make predictions accordingly.

Step 2: Provide data

The second step is defining the data to work with: the input (regressor, 𝑥) and the output (response, 𝑦).

The file calories_consumed.csv is imported, and exploratory data analysis is performed on the data.
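
A minimal sketch of this step with pandas is shown below. Because the CSV file itself is not reproduced here, a few made-up rows are read from an in-memory string instead; the names cal_data, weight, and calories follow the formula used later in the text, while the values are purely illustrative assumptions.

```python
import io
import pandas as pd

# Stand-in for the CSV file: made-up rows, for illustration only.
csv_text = """weight,calories
60,1500
70,1900
80,2300
90,2800
100,3400
"""
cal_data = pd.read_csv(io.StringIO(csv_text))

# Basic exploratory checks
print(cal_data.shape)          # number of rows and columns
print(cal_data.isna().sum())   # missing values per column
print(cal_data.describe())     # count, mean, std, min, quartiles, max
print(cal_data.corr())         # pairwise Pearson correlation
```

With the real file, the io.StringIO stand-in would be replaced by the path to the CSV.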

Step 3: Create a model and fit it

The next step is to create a linear regression model and fit it using the existing data.

Let’s create an instance of the class LinearRegression, which will represent the simple linear regression model:

model = LinearRegression()

This statement creates the variable model as an instance of LinearRegression. You can pass several
optional parameters to LinearRegression, such as fit_intercept.
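
The following sketch shows what fitting such a model might look like. The data is made up so that it lies exactly on the line y = 2x + 1, which makes the fitted slope and intercept easy to verify; it is not the assignment's data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data lying exactly on y = 2x + 1 (illustration only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0]).reshape(-1, 1)
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

model = LinearRegression()   # fit_intercept=True by default
model.fit(x, y)

print(model.coef_)        # slope, approximately [2.]
print(model.intercept_)   # intercept, approximately 1.0
print(model.score(x, y))  # R^2 on the fitting data, approximately 1.0
print(model.predict(np.array([[6.0]])))  # approximately [13.]
```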

Next, statsmodels.formula.api is imported to build an ordinary least squares (OLS) model on the data:

import statsmodels.formula.api as smf
model1 = smf.ols('calories ~ weight', data=cal_data).fit()

The regression line is plotted after obtaining the predicted values. After the scatter plot is
drawn, the root mean squared error (RMSE) is calculated.
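
A hedged sketch of the OLS fit and the RMSE calculation follows. Only the names cal_data, calories, and weight come from the text; the rows are made up for illustration, and plotting is omitted.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Made-up rows; only the names cal_data, calories and weight come from the text.
cal_data = pd.DataFrame({
    "weight":   [60, 70, 80, 90, 100],
    "calories": [1500, 1900, 2300, 2800, 3400],
})

model1 = smf.ols("calories ~ weight", data=cal_data).fit()
pred1 = model1.predict(cal_data)   # fitted values on the same rows

# RMSE = sqrt(mean((actual - predicted)^2))
rmse1 = np.sqrt(np.mean((cal_data["calories"] - pred1) ** 2))

print(model1.params)   # Intercept and slope for weight
print(rmse1)
```

Calling model1.summary() would print the full regression table, including R-squared and coefficient p-values.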

To reduce the error and obtain the best-fit line, transformations are applied to the data.

Log transformation

In the log transformation, the transformation is applied to the x data:

#x=log(weight),y=calories

A scatter plot is drawn, and the correlation coefficient between the transformed input and the
output is obtained. model2 is built on the transformed data, the new regression line is plotted,
and the new RMSE is calculated.
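
The log transformation described above can be sketched as follows. The data is made up, and scikit-learn's LinearRegression is used here as one possible way to fit the model; the original may have used statsmodels instead.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up positive data, for illustration only.
weight = np.array([60.0, 70.0, 80.0, 90.0, 100.0])
calories = np.array([1500.0, 1900.0, 2300.0, 2800.0, 3400.0])

log_weight = np.log(weight)   # x = log(weight), y unchanged

# Pearson correlation between the transformed input and the output
r = np.corrcoef(log_weight, calories)[0, 1]

model2 = LinearRegression().fit(log_weight.reshape(-1, 1), calories)
pred2 = model2.predict(log_weight.reshape(-1, 1))

rmse2 = np.sqrt(np.mean((calories - pred2) ** 2))
print(r, rmse2)
```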

Exponential transformation

In the exponential transformation, the transformation is applied to the y data:

#x=weight,y=log(calories)

A scatter plot is drawn, and the correlation coefficient between the input and the transformed
output is obtained. model3 is built on the transformed data, the new regression line is plotted,
and the new RMSE is calculated.
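
A sketch of the exponential model under the same assumptions (made-up data, scikit-learn as the fitting tool) is given below. Since the model is fitted on log(y), the predictions are passed through np.exp before computing the RMSE, so that the error is on the original scale and comparable with the other models.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up positive data, for illustration only.
weight = np.array([60.0, 70.0, 80.0, 90.0, 100.0])
calories = np.array([1500.0, 1900.0, 2300.0, 2800.0, 3400.0])

log_calories = np.log(calories)   # x unchanged, y = log(calories)

# Pearson correlation between the input and the transformed output
r = np.corrcoef(weight, log_calories)[0, 1]

model3 = LinearRegression().fit(weight.reshape(-1, 1), log_calories)
pred3_log = model3.predict(weight.reshape(-1, 1))

# Back-transform to the original scale before measuring the error
pred3 = np.exp(pred3_log)
rmse3 = np.sqrt(np.mean((calories - pred3) ** 2))
print(r, rmse3)
```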

Polynomial transformation

x=s_hike, x^2=s_hike*s_hike, y=log(churn)

PolynomialFeatures is imported from sklearn.preprocessing to build the polynomial regression.
The new regression line is plotted, and the RMSE is obtained from this model.
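
One way the polynomial step might look is sketched below. The names s_hike and churn follow the text, but the values are invented for illustration, and degree 2 is assumed from the x^2 term mentioned above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Invented values; only the names s_hike and churn come from the text.
s_hike = np.array([1580.0, 1620.0, 1680.0, 1740.0, 1800.0, 1870.0])
churn = np.array([92.0, 85.0, 78.0, 70.0, 65.0, 60.0])

# degree=2 without the bias column yields two features: s_hike and s_hike^2
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(s_hike.reshape(-1, 1))

# Fit on y = log(churn), then back-transform predictions with exp
model4 = LinearRegression().fit(X_poly, np.log(churn))
pred4 = np.exp(model4.predict(X_poly))

rmse4 = np.sqrt(np.mean((churn - pred4) ** 2))
print(X_poly.shape, rmse4)
```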

The best model is chosen by comparing the RMSE values of all the transformations above.

The models and their respective RMSE values are tabulated.

From these observations, the exponential model is taken as the best.
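
The comparison could be tabulated along these lines. This is only a sketch: the data is made up, the helper rmse is a hypothetical name, and np.polyfit is used as a compact stand-in for the fitted models, so the ranking it prints says nothing about the assignment's actual results.

```python
import numpy as np
import pandas as pd

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    diff = np.asarray(actual) - np.asarray(predicted)
    return float(np.sqrt(np.mean(diff ** 2)))

# Made-up data, for illustration only.
x = np.array([60.0, 70.0, 80.0, 90.0, 100.0])
y = np.array([1500.0, 1900.0, 2300.0, 2800.0, 3400.0])

# np.polyfit(a, b, 1) returns [slope, intercept] of a straight-line fit.
predictions = {
    "linear": np.polyval(np.polyfit(x, y, 1), x),
    "log": np.polyval(np.polyfit(np.log(x), y, 1), np.log(x)),
    "exp": np.exp(np.polyval(np.polyfit(x, np.log(y), 1), x)),
}

# All RMSE values are computed on the original scale of y.
table = pd.DataFrame({
    "model": list(predictions),
    "RMSE": [rmse(y, p) for p in predictions.values()],
}).sort_values("RMSE").reset_index(drop=True)

best = table.loc[0, "model"]   # model with the lowest RMSE
print(table)
print(best)
```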

Step 4: Get results

Once you have your model fitted, you can get the results to check whether the model works
satisfactorily and interpret it.

The summary of the final model is examined.

The final model is then fitted on a train/test split of the data, and its predictions are observed.

The final RMSE value is recorded.
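
A minimal sketch of the train/test step is shown below, assuming the exponential form (fit on log(y), predict with exp) since the text reports it as the best model. The data is synthetic, and the 80/20 split and random_state are illustrative choices rather than values from the original.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data with an exponential trend plus small noise (illustration only).
rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 40)
y = np.exp(0.3 * x + rng.normal(0.0, 0.05, size=x.size))

# Hold out 20% of the rows for testing (an assumed split ratio).
X_train, X_test, y_train, y_test = train_test_split(
    x.reshape(-1, 1), y, test_size=0.2, random_state=0
)

# Exponential model: fit on log(y), back-transform predictions with exp.
final_model = LinearRegression().fit(X_train, np.log(y_train))
train_pred = np.exp(final_model.predict(X_train))
test_pred = np.exp(final_model.predict(X_test))

rmse_train = np.sqrt(np.mean((y_train - train_pred) ** 2))
rmse_test = np.sqrt(np.mean((y_test - test_pred) ** 2))
print(rmse_train, rmse_test)
```

Comparing the train and test RMSE gives a rough check on overfitting: a test error far above the train error suggests the model does not generalize well.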
