Implementation of the Simple Linear Regression Algorithm using Python
Step-1: Data Pre-processing
First, we will import the three essential libraries, which will help us load the dataset,
plot the graphs, and create the Simple Linear Regression model.
In [1]: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Load the dataset
In [2]: Thomas_df = pd.read_csv("Iris (1).csv")
Thomas_df
Out[2]:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
0 1 5.1 3.5 1.4 0.2 Iris-setosa
1 2 4.9 3.0 1.4 0.2 Iris-setosa
2 3 4.7 3.2 1.3 0.2 Iris-setosa
3 4 4.6 3.1 1.5 0.2 Iris-setosa
4 5 5.0 3.6 1.4 0.2 Iris-setosa
... ... ... ... ... ... ...
145 146 6.7 3.0 5.2 2.3 Iris-virginica
146 147 6.3 2.5 5.0 1.9 Iris-virginica
147 148 6.5 3.0 5.2 2.0 Iris-virginica
148 149 6.2 3.4 5.4 2.3 Iris-virginica
149 150 5.9 3.0 5.1 1.8 Iris-virginica
150 rows × 6 columns
DataFrame columns
In [3]: Thomas_df.columns
Out[3]: Index(['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm',
'Species'],
dtype='object')
DataFrame head
Definition and usage: the head() method returns a specified number of rows from the
top of the DataFrame (five by default).
In [4]: Thomas_df.head()
Out[4]:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
0 1 5.1 3.5 1.4 0.2 Iris-setosa
1 2 4.9 3.0 1.4 0.2 Iris-setosa
2 3 4.7 3.2 1.3 0.2 Iris-setosa
3 4 4.6 3.1 1.5 0.2 Iris-setosa
4 5 5.0 3.6 1.4 0.2 Iris-setosa
Data frame tail
definition and [Link] tail() meathod returns a specified number of rows,strings from the
bottom
In [5]: Thomas_df.tail()
Out[5]:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
145 146 6.7 3.0 5.2 2.3 Iris-virginica
146 147 6.3 2.5 5.0 1.9 Iris-virginica
147 148 6.5 3.0 5.2 2.0 Iris-virginica
148 149 6.2 3.4 5.4 2.3 Iris-virginica
149 150 5.9 3.0 5.1 1.8 Iris-virginica
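Both methods accept an explicit row count; a small illustration (not part of the original run):

Thomas_df.head(3)   # first three rows
Thomas_df.tail(3)   # last three rows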
Shape
The shape attribute returns the DataFrame's dimensions as (rows, columns).
In [6]: Thomas_df.shape
Out[6]: (150, 6)
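Since Step-1 is data pre-processing, it is worth confirming the dataset is clean before modelling. A minimal sketch of the usual checks (added here, not in the original notebook):

Thomas_df.info()               # column dtypes and non-null counts
print(Thomas_df.isna().sum())  # missing values per column; the Iris dataset has none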
After that, we need to extract the dependent and independent variables from the
dataset: SepalLengthCm is the dependent variable (y), and the remaining numeric
columns (SepalWidthCm, PetalLengthCm, PetalWidthCm) are the independent variables (x).
In [7]: columns = Thomas_df.select_dtypes(include=['number']).columns
x = Thomas_df[columns].drop(columns=['Id', 'SepalLengthCm'])
y = Thomas_df['SepalLengthCm']
In [8]: x
Out[8]:
SepalWidthCm PetalLengthCm PetalWidthCm
0 3.5 1.4 0.2
1 3.0 1.4 0.2
2 3.2 1.3 0.2
3 3.1 1.5 0.2
4 3.6 1.4 0.2
... ... ... ...
145 3.0 5.2 2.3
146 2.5 5.0 1.9
147 3.0 5.2 2.0
148 3.4 5.4 2.3
149 3.0 5.1 1.8
150 rows × 3 columns
In [9]: y
Out[9]: 0 5.1
1 4.9
2 4.7
3 4.6
4 5.0
...
145 6.7
146 6.3
147 6.5
148 6.2
149 5.9
Name: SepalLengthCm, Length: 150, dtype: float64
Split dataset
In [10]: from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
In [11]: x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=1/3, random_state=0)  # the random_state value was truncated in the export; 0 is an assumed placeholder
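With test_size=1/3, the 150 rows are divided into 100 training and 50 test samples, which the shapes confirm (a sketch added for illustration):

print(x_train.shape, x_test.shape)  # expected: (100, 3) (50, 3)
print(y_train.shape, y_test.shape)  # expected: (100,) (50,)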
In [12]: x_train
Out[12]:
SepalWidthCm PetalLengthCm PetalWidthCm
69 2.5 3.9 1.1
135 3.0 6.1 2.3
56 3.3 4.7 1.6
80 2.4 3.8 1.1
123 2.7 4.9 1.8
... ... ... ...
9 3.1 1.5 0.1
103 2.9 5.6 1.8
67 2.7 4.1 1.0
117 3.8 6.7 2.2
47 3.2 1.4 0.2
100 rows × 3 columns
In [13]: x_test
Out[13]:
SepalWidthCm PetalLengthCm PetalWidthCm
114 2.8 5.1 2.4
62 2.2 4.0 1.0
33 4.2 1.4 0.2
107 2.9 6.3 1.8
7 3.4 1.5 0.2
100 3.3 6.0 2.5
40 3.5 1.3 0.3
86 3.1 4.7 1.5
76 2.8 4.8 1.4
71 2.8 4.0 1.3
134 2.6 5.6 1.4
51 3.2 4.5 1.5
73 2.8 4.7 1.2
54 2.8 4.6 1.5
63 2.9 4.7 1.4
37 3.1 1.5 0.1
78 2.9 4.5 1.5
90 2.6 4.4 1.2
45 3.0 1.4 0.3
16 3.9 1.3 0.4
121 2.8 4.9 2.0
66 3.0 4.5 1.5
24 3.4 1.9 0.2
8 2.9 1.4 0.2
126 2.8 4.8 1.8
22 3.6 1.0 0.2
44 3.8 1.9 0.4
97 2.9 4.3 1.3
93 2.3 3.3 1.0
26 3.4 1.6 0.4
137 3.1 5.5 1.8
84 3.0 4.5 1.5
27 3.5 1.5 0.2
127 3.0 4.9 1.8
132 2.8 5.6 2.2
59 2.7 3.9 1.4
18 3.8 1.7 0.3
83 2.7 5.1 1.6
61 3.0 4.2 1.5
92 2.6 4.0 1.2
112 3.0 5.5 2.1
2 3.2 1.3 0.2
141 3.1 5.1 2.3
43 3.5 1.6 0.6
10 3.7 1.5 0.2
60 2.0 3.5 1.0
116 3.0 5.5 1.8
144 3.3 5.7 2.5
119 2.2 5.0 1.5
108 2.5 5.8 1.8
In [14]: y_train
Out[14]: 69 5.6
135 7.7
56 6.3
80 5.5
123 6.3
...
9 4.9
103 6.3
67 5.8
117 7.7
47 4.6
Name: SepalLengthCm, Length: 100, dtype: float64
In [15]: y_test
Out[15]: 114 5.8
62 6.0
33 5.5
107 7.3
7 5.0
100 6.3
40 5.0
86 6.7
76 6.8
71 6.1
134 6.1
51 6.4
73 6.1
54 6.5
63 6.1
37 4.9
78 6.0
90 5.5
45 4.8
16 5.4
121 5.6
66 5.6
24 4.8
8 4.4
126 6.2
22 4.6
44 5.1
97 6.2
93 5.0
26 5.0
137 6.4
84 5.4
27 5.2
127 6.1
132 6.4
59 5.2
18 5.7
83 6.0
61 5.9
92 5.8
112 6.8
2 4.7
141 6.9
43 5.0
10 5.4
60 5.0
116 6.5
144 6.7
119 6.0
108 6.7
Name: SepalLengthCm, dtype: float64
Step-2: Fitting the Simple Linear Regression Model to
the Training Set
In [16]: from sklearn.linear_model import LinearRegression
In [17]: model = LinearRegression()
[Link](x_train, y_train)
Out[17]: LinearRegression()
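Once fitted, the model's learned parameters can be inspected; a hedged sketch (these lines are an addition, not original notebook output):

print(model.intercept_)  # learned bias term
print(model.coef_)       # one coefficient each for SepalWidthCm, PetalLengthCm, PetalWidthCm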
Step-3: Prediction of the Test Set Results
In [18]: y_pred = model.predict(x_test)
x_pred = model.predict(x_train)
In [19]: y_pred
Out[19]: array([5.90763683, 5.64942975, 5.46582046, 7.32357947, 5.03281158,
6.85611663, 4.87129874, 6.41802591, 6.37416826, 5.82257278,
6.86804939, 6.3264746 , 6.43639669, 6.14881748, 6.36031157,
4.9112593 , 6.13496079, 6.07563694, 4.62980368, 5.05668896,
6.0320937 , 6.19879872, 5.34359009, 4.63592727, 6.09432214,
4.77201433, 5.45901878, 6.1194946 , 5.1694053 , 4.97058315,
6.82969833, 6.19879872, 5.09664952, 6.29969264, 6.43603302,
5.61107868, 5.37359105, 6.40349114, 5.96571485, 5.76485843,
6.5559758 , 4.74974645, 6.16911217, 4.89449802, 5.2243254 ,
5.13328074, 6.7658604 , 6.62303275, 6.07656836, 6.67975459])
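Each prediction is simply the intercept plus the dot product of the coefficients with that row's features. A sketch verifying the first test prediction by hand (added for illustration):

first_row = x_test.iloc[0].to_numpy()
manual = model.intercept_ + np.dot(model.coef_, first_row)
print(manual, y_pred[0])  # the two values should agree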
In [20]: x_pred
Out[20]: array([5.6932874 , 6.8822205 , 6.47574026, 5.55175484, 6.10817882,
6.53729061, 5.73968598, 5.98823605, 6.55182538, 6.39285346,
6.38418894, 4.9189924 , 6.19360655, 5.91412409, 5.57563221,
5.63105897, 6.34645488, 7.2820094 , 5.2243254 , 4.53664286,
6.3489958 , 6.17559944, 5.18820084, 5.5312679 , 6.57341517,
4.84129777, 6.45508189, 4.99403576, 4.88515543, 5.68809522,
7.12532506, 6.04179997, 4.76972674, 6.86675431, 6.69911787,
6.71909815, 6.50599455, 5.36585796, 5.11050621, 6.4428347 ,
6.90287887, 4.10524349, 6.59370987, 4.69976521, 5.91827451,
6.54211911, 4.74974645, 5.08279283, 7.13886734, 4.94899336,
4.62207058, 5.36746746, 5.50338309, 6.85450712, 7.41061669,
5.01895489, 4.9112593 , 4.95511696, 6.21104591, 6.19106563,
4.67205183, 4.91447831, 6.1194946 , 4.89288852, 7.24842576,
5.23324324, 7.83589247, 6.25490357, 5.54963868, 6.22199801,
5.18275533, 7.43059698, 5.2182018 , 4.98283033, 7.03439075,
4.89127902, 6.67913759, 5.9002674 , 5.75100175, 5.51630836,
6.38833936, 6.18296886, 6.19038754, 6.44734879, 4.85515446,
5.54402174, 6.48762378, 6.19360655, 5.03281158, 6.35257847,
6.02794328, 6.34967389, 5.8141616 , 4.94126027, 5.08440233,
4.9112593 , 6.77971708, 6.04631406, 7.92905329, 4.82744108])
Step-4: Visualizing the Training and Test Set Results
In [21]: import matplotlib.pyplot as plt
plt.figure(figsize=(10, 7))
plt.scatter(x_train.iloc[:, 0], y_train, color="blue", label="Actual values")
# With three features the fitted model is a plane rather than a line, so the
# predictions are drawn as points against the first feature.
plt.scatter(x_train.iloc[:, 0], x_pred, color="red", label="Predicted values")
plt.title("Training set")
plt.xlabel("Sepal Width (cm)")
plt.ylabel("Sepal Length (cm)")
plt.legend()
plt.show()

plt.figure(figsize=(10, 7))
plt.scatter(x_test.iloc[:, 0], y_test, color="blue", label="Actual values")
plt.scatter(x_test.iloc[:, 0], y_pred, color="red", label="Predicted values")
plt.title("Testing set")
plt.xlabel("Sepal Width (cm)")
plt.ylabel("Sepal Length (cm)")
plt.legend()
plt.show()

mae = mean_absolute_error(y_test, y_pred)
print("Here is the Linear Regression Mean Absolute Error:", mae)
Here is the Linear Regression Mean Absolute Error: 0.25316544984473643
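MAE can be complemented with other standard regression metrics; a sketch using scikit-learn (an addition, assuming the same y_test and y_pred):

from sklearn.metrics import mean_squared_error, r2_score
rmse = mean_squared_error(y_test, y_pred) ** 0.5  # root mean squared error
r2 = r2_score(y_test, y_pred)                     # fraction of variance explained
print("RMSE:", rmse, "R^2:", r2)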