Linear Regression
There is one thing to be careful of, which is that if you normalise the training and testing
sets separately in this way then a datapoint that is in both sets will end up being different
in the two, since the mean and variance are probably different in the two sets. For this
reason it is a good idea to normalise the dataset before splitting it into training and testing.
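As a minimal sketch (assuming the data has been loaded into an array called pima, as in the Pima example, with the inputs in the first eight columns and the class label in the last one), normalising everything before the split might look like:

import numpy as np

# Normalise the input columns of the whole dataset (zero mean, unit variance
# per column) before splitting, so that every datapoint is transformed identically
pima[:,:8] = (pima[:,:8] - pima[:,:8].mean(axis=0)) / pima[:,:8].std(axis=0)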
Normalisation can be done without knowing anything about the dataset in advance.
However, there is often useful preprocessing that can be done by looking at the data. For
example, in the Pima dataset, column 0 is the number of times that the person has been
pregnant (did I mention that all the subjects were female?) and column 7 is the age of
the person. Taking the pregnancy variable first, there are relatively few subjects that were
pregnant 8 or more times, so rather than having the number there, maybe they should be
replaced by an 8 for any of these values. Equally, the age would be better quantised into
a set of ranges such as 21–30, 31–40, etc. (the minimum age is 21 in the dataset). This
can be done using the np.where function again, as in this code snippet. If you make these
changes and similar ones for the other values, then you should be able to get massively
better results.
pima[np.where(pima[:,0]>8),0] = 8
pima[np.where(pima[:,7]<=30),7] = 1
pima[np.where((pima[:,7]>30) & (pima[:,7]<=40)),7] = 2
#You need to finish this data processing step
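One possible way to finish the quantisation (a sketch; the remaining age bands here are chosen arbitrarily, and the text leaves this step as an exercise):

pima[np.where((pima[:,7]>40) & (pima[:,7]<=50)),7] = 3
pima[np.where((pima[:,7]>50) & (pima[:,7]<=60)),7] = 4
pima[np.where(pima[:,7]>60),7] = 5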
The last thing that we can do for now is to perform a basic form of feature selection and
to try training the classifier with a subset of the inputs by missing out different features
one at a time and seeing if they make the results better. If missing out one feature does
improve the results, then leave it out completely and try missing out others as well. This is
a simplistic way of testing for correlation between the output and each of the features. We
will see better methods when we look at covariance in Section 2.4.2. We can also consider
methods of dimensionality reduction, which produce lower dimensional representations of
the data that still include the relevant information; see Chapter 6 for more details.
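A minimal sketch of the leave-one-feature-out loop described above (score here is a placeholder for whatever train-and-evaluate routine you are using, such as training the Perceptron and measuring its error on the test set; it is not part of the book's code):

import numpy as np

def leave_one_out_features(trainin, traintgt, testin, testtgt, score):
    # score(trainin, traintgt, testin, testtgt) should train on the training
    # set, evaluate on the test set, and return an error measure
    print("All features:", score(trainin, traintgt, testin, testtgt))
    for f in range(np.shape(trainin)[1]):
        err = score(np.delete(trainin, f, axis=1), traintgt,
                    np.delete(testin, f, axis=1), testtgt)
        print("Without feature", f, ":", err)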
Now that we have seen how to use the Perceptron on a better example than the logic
functions, we will look at another linear method, but coming from statistics, rather than
neural networks.
Classification problems can be recast as regression problems by using an indicator variable that is 1
for examples in the class and 0 for all of the others. Since classification can be replaced by
regression using these methods, we’ll think about regression here.
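As a minimal sketch of what such an indicator variable looks like (the labels array here is made up for illustration, not part of the book's code):

import numpy as np

labels = np.array([0, 2, 1, 2, 0])        # made-up class labels
indicator = np.where(labels == 2, 1, 0)   # 1 for class 2, 0 for the others
# indicator is now array([0, 1, 0, 1, 0])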
The only real difference between the Perceptron and more statistical approaches is in
the way that the problem is set up. For regression we are making a prediction about an
unknown value y (such as the indicator variable for classes or a future value of some data)
by computing some function of known values x_i. We are thinking about straight lines, so
the output y is going to be a sum of the x_i values, each multiplied by a constant parameter:

y = \sum_{i=0}^{M} \beta_i x_i.

The β_i define a straight line (plane in 3D, hyperplane in higher dimensions)
that goes through (or at least near) the datapoints. Figure 3.13 shows this in two and three
dimensions.
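In code, this prediction is simply a dot product between the parameter vector and the input vector; for instance (a sketch with made-up numbers):

import numpy as np

beta = np.array([0.5, 0.5, -0.25])   # made-up parameters
x = np.array([1.0, 0.0, -1.0])       # one datapoint (the final -1 is the bias input)
y = np.dot(beta, x)                  # predicted output: 0.75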
The question is how we define the line (plane or hyperplane in higher dimensions) that
best fits the data. The most common solution is to try to minimise the distance between
each datapoint and the line that we fit. We can measure the distance between a point and a
line by defining another line that goes through the point and hits the line. School geometry
tells us that this second line will be shortest when it hits the line at right angles, and then
we can use Pythagoras’ theorem to work out that distance. Now, we can try to minimise an error
function that measures the sum of all these distances. If we ignore the square roots, and
just minimise the sum-of-squares of the errors, then we get the most common minimisation,
which is known as least-squares optimisation. What we are doing is choosing the parameters
in order to minimise the squared difference between the prediction and the actual data
value, summed over all of the datapoints. That is, we have:
\sum_{j=0}^{N} \left( t_j - \sum_{i=0}^{M} \beta_i x_{ij} \right)^2 . \qquad (3.21)
Writing this in matrix form as (t − Xβ)^T (t − Xβ) (which is a scalar) and minimising it with
respect to β gives the solution β = (X^T X)^{−1} X^T t (assuming that the matrix
X^T X can be inverted). Now, for a given input vector z, the prediction is zβ. The inverse of
a matrix X is the matrix X^{−1} that satisfies X X^{−1} = I, where I is the identity matrix, the matrix
that has 1s on the leading diagonal and 0s everywhere else. The inverse of a matrix only
exists if the matrix is square (has the same number of rows as columns) and its determinant
is non-zero.
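For reference, this solution follows from the standard least-squares derivation (ordinary matrix calculus, nothing specific to this chapter):

E(\beta) = (t - X\beta)^T (t - X\beta), \qquad
\frac{\partial E}{\partial \beta} = -2 X^T (t - X\beta) = 0
\;\Rightarrow\; X^T X \beta = X^T t
\;\Rightarrow\; \beta = (X^T X)^{-1} X^T t .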
Computing this is very simple in Python, using the np.linalg.inv() function in
NumPy. In fact, the entire function can be written as:
import numpy as np

def linreg(inputs, targets):
    # Add the bias input of -1 to every datapoint
    inputs = np.concatenate((inputs, -np.ones((np.shape(inputs)[0], 1))), axis=1)
    # Solve the normal equations: beta = (X^T X)^{-1} X^T t
    beta = np.dot(np.dot(np.linalg.inv(np.dot(np.transpose(inputs), inputs)),
                         np.transpose(inputs)), targets)
    # Predictions on the training data
    outputs = np.dot(inputs, beta)
    return beta
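To see what this produces, we can call it on the OR data that was used for the Perceptron. A minimal sketch (assuming the version of linreg above, which returns beta):

inputs = np.array([[0,0],[0,1],[1,0],[1,1]])
targets = np.array([[0],[1],[1],[1]])
beta = linreg(inputs,targets)
# Rebuild the bias-augmented inputs to compute the predictions
inputs_bias = np.concatenate((inputs,-np.ones((np.shape(inputs)[0],1))),axis=1)
print(np.dot(inputs_bias,beta))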
This prints:

[[ 0.25]
 [ 0.75]
 [ 0.75]
 [ 1.25]]
It might not be clear what this means, but if we threshold the outputs by setting every value
less than 0.5 to 0 and every value above 0.5 to 1, then we get the correct answer. Using it
on the XOR function shows that this is still a linear method:
[[ 0.5]
[ 0.5]
[ 0.5]
[ 0.5]]
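The thresholding mentioned above is just another np.where call, for example:

outputs = np.where(outputs<0.5,0,1)

For the XOR outputs every value is 0.5, so no threshold can separate the two classes, which is exactly the limitation of a linear method.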
A better test of linear regression is to find a real regression dataset. The UCI database
is useful here as well. We will look at the auto-mpg dataset, which consists of a collection of
datapoints about various cars (weight, horsepower, etc.), with the aim being to
predict the fuel efficiency in miles per gallon (mpg). This dataset has one problem. There are
missing values in it (labelled with question marks ‘?’). The np.loadtxt() method doesn’t
like these, and we don’t know what to do with them, anyway, so after downloading the
dataset, manually edit the file and delete every line that contains a ‘?’. The linear
regressor can’t do much with the names of the cars either, but since they appear in quotes
(") we will tell np.loadtxt that they are comments, using:
auto = np.loadtxt('/Users/srmarsla/Book/Datasets/auto-mpg/auto-mpg.data.txt', comments='"')
You should now separate the data into training and testing sets, and then use the training
set to recover the β vector. Then you use that to get the predicted values on the test set.
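One possible way to do the split (a sketch; the exact split is up to you, and it assumes that mpg is in column 0 of the loaded array, which is how the raw auto-mpg file is laid out):

# Targets are the mpg values (column 0); the remaining columns are the inputs.
# Use alternate rows for training and testing (any reasonable split will do).
trainin = auto[::2, 1:]
traintgt = auto[::2, 0:1]
testin = auto[1::2, 1:]
testtgt = auto[1::2, 0:1]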
However, the confusion matrix isn’t much use now, since there are no classes to enable
us to analyse the results. Instead, we will use the sum-of-squares error, which consists of
computing the differences between the predictions and the true values, squaring them so that
they are all positive, and then adding them up, as is used in the definition of the linear
regressor. Obviously, small values of this measure are good. It can be computed using:
# Fit the regression on the training set
beta = linreg.linreg(trainin, traintgt)
# Add the bias input to the test set, exactly as the training code does
testin = np.concatenate((testin, -np.ones((np.shape(testin)[0], 1))), axis=1)
# Predictions on the test set and the sum-of-squares error
testout = np.dot(testin, beta)
error = np.sum((testout - testtgt)**2)
Now you can test out whether normalising the data helps, and perform feature selection
as we did for the Perceptron. There are other more advanced linear statistical methods.
One of them, Linear Discriminant Analysis, will be considered in Section 6.1 once we have
built up the understanding we need.
FURTHER READING
If you are interested in real brains and want to know more about them, then there are
plenty of popular science books that should interest you, including:
• Susan Greenfield. The Human Brain: A Guided Tour. Orion, London, UK, 2001.
• S. Aamodt and S. Wang. Welcome to Your Brain: Why You Lose Your Car Keys but
Never Forget How to Drive and Other Puzzles of Everyday Life. Bloomsbury, London,
UK, 2008.
If you are looking for something a bit more formal, then the following is a good place to
start (particularly the ‘Roadmaps’ at the beginning):
• Michael A. Arbib, editor. The Handbook of Brain Theory and Neural Networks, 2nd
edition, MIT Press, Cambridge, MA, USA, 2002.
• W.S. McCulloch and W. Pitts. A logical calculus of the ideas immanent in nervous activity.
Bulletin of Mathematical Biophysics, 5:115–133, 1943.