Lecture 7: Linear Regression

The document provides an overview of linear regression, including its application in predicting prices, such as gold and housing prices. It discusses the concept of cost functions, specifically L1 and L2 cost functions, and methods for optimizing these functions, such as gradient descent. Additionally, it touches on feature scaling and non-linear cases, suggesting the use of feature mapping to handle non-linear relationships.


Data Science Boot Camp
Sibt ul Hussain
http://sites.google.com/SibtulHussain

Linear Regression

The majority of the content is borrowed from multiple online resources.

Predict Gold Prices Over the Next Day

[Line chart: gold price (y-axis, 35,000–43,000) against day number (x-axis, 0–200)]
Housing Prices

[Scatter plot: Price in 1000s of dollars on the y-axis vs. Size in feet² (500–3000) on the x-axis]

Regression Problem
Predict real-valued output

Training set of housing prices:

Size in feet² (x)    Price ($) in 1000s (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …

Notation:
m = number of training examples
x's = "input" variable / features
y's = "output" variable / "target" variable

Training Set:

Size in feet² (x)    Price ($) in 1000s (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …

Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$

$\theta_0, \theta_1$: Parameters

How to choose the $\theta$'s?

[Three example plots of $h_\theta(x)$ for different choices of $\theta_0, \theta_1$; both axes run 0–3 in each panel]

Idea: Choose $\theta_0, \theta_1$ so that $h_\theta(x)$ is close to $y$ for our training examples $(x, y)$.

Score Function (or Hypothesis)

• The score function, or hypothesis, is used to generate the output for a given input x. In linear regression our hypothesis is a linear function of the input: $h_\theta(x) = \theta_0 + \theta_1 x$.

Cost Function
• The cost function is used to evaluate our hypothesis, i.e., how good our chosen hypothesis is.
▫ For instance, in the case of linear regression the cost function can be the L1 or L2 cost defined on the next slides.

• The goal of learning thus reduces to searching the hypothesis space for the best possible hypothesis (a hypothesis that optimizes our cost function).
• In other words, the cost function specifies the purpose of the learning algorithm.

Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$

Parameters: $\theta_0, \theta_1$

Cost Function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$
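The slides contain no code; as an illustration, here is a minimal NumPy sketch of this cost function, using the four training examples from the table above (the 1/(2m) scaling matches the definition above):

```python
import numpy as np

def cost_l2(theta0, theta1, x, y):
    """J(theta0, theta1) = (1 / 2m) * sum((h_theta(x_i) - y_i)^2)."""
    m = len(y)
    residuals = theta0 + theta1 * x - y   # h_theta(x) - y for every example
    return (residuals ** 2).sum() / (2 * m)

# Training set from the slide: size in feet^2 vs. price in $1000s.
x = np.array([2104.0, 1416.0, 1534.0, 852.0])
y = np.array([460.0, 232.0, 315.0, 178.0])
print(cost_l2(0.0, 0.2, x, y))   # cost of the hypothesis h(x) = 0.2 * x
```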

Possible Cost Functions for LR

• Absolute (or L1) Cost Function: $J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left| h_\theta(x^{(i)}) - y^{(i)} \right|$

• Properties:
▫ Penalty for positive and negative deviations is the same.
▫ Penalty grows only linearly with the deviation: the marginal penalty is the same whether the error is small or large.
▫ Difficult to differentiate (non-differentiable at zero).
▫ Convex

Possible Cost Functions for LR

• L2 Cost Function: $J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

• Properties:
▫ Penalty for positive and negative deviations is the same.
▫ Penalty for large deviations is large compared to small deviations (quadratic growth).
▫ Easy to differentiate.
▫ Convex
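To make the contrast concrete, this small sketch (not from the slides) compares how the two costs penalize individual deviations of different sizes:

```python
import numpy as np

deviations = np.array([0.1, 1.0, 10.0])
print(np.abs(deviations))   # L1 penalty grows linearly:      [ 0.1   1.   10. ]
print(deviations ** 2)      # L2 penalty grows quadratically: [ 0.01  1.  100. ]
```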

L2 Cost Function

How to Optimize Cost Function

• Random Search
• Define a finite interval for the values of $\theta$.
• Iterate over the interval:
▫ Evaluate the cost function, $J(\theta)$.
▫ Cache the results.
• Choose the values of the parameters that give the optimum value of the cost function (see the sketch below).
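A minimal sketch of this random-search procedure, assuming the L2 cost and a single input feature; the interval [-1, 1] and the trial count are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def cost_l2(theta, x, y):
    """L2 cost for the hypothesis h(x) = theta[0] + theta[1] * x."""
    residuals = theta[0] + theta[1] * x - y
    return (residuals ** 2).sum() / (2 * len(y))

def random_search(x, y, n_trials=10_000, low=-1.0, high=1.0):
    """Sample random parameter pairs in [low, high] and keep the best one."""
    best_theta, best_cost = None, np.inf
    for _ in range(n_trials):
        theta = rng.uniform(low, high, size=2)   # a random (theta0, theta1)
        c = cost_l2(theta, x, y)                 # evaluate the cost function
        if c < best_cost:                        # cache the best result so far
            best_theta, best_cost = theta, c
    return best_theta, best_cost
```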

How to Optimize Cost Function

• Derivation Approach:
▫ Compute the derivative of J w.r.t. each variable.
▫ Set all the derivatives equal to zero, i.e. $\frac{\partial J}{\partial \theta_j} = 0$ for all $j$, and solve the resulting system of linear equations to obtain the optimum values of the parameters.
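For the L2 cost this system has a closed-form solution, the normal equations $X^\top X \theta = X^\top y$; a minimal NumPy sketch using the housing data from the table above:

```python
import numpy as np

x = np.array([2104.0, 1416.0, 1534.0, 852.0])
y = np.array([460.0, 232.0, 315.0, 178.0])

# Prepend a column of ones so theta[0] plays the role of the intercept.
X = np.column_stack([np.ones_like(x), x])

# Setting dJ/dtheta_j = 0 for the L2 cost yields X^T X theta = X^T y.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)   # [theta0, theta1]
```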

How to Optimize Cost Function

• Gradient Descent

How?

How to Optimize Cost Function

• Gradient Descent: repeat until convergence
$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$ (simultaneously for all $j$), where $\alpha$ is the learning rate.
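A minimal sketch of batch gradient descent on the L2 cost for a single feature; the learning rate and iteration count are illustrative, and the feature is scaled first (as recommended later in the deck) so that alpha = 0.1 converges:

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, n_iters=1000):
    """Batch gradient descent on the L2 cost for h(x) = theta0 + theta1 * x."""
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(n_iters):
        err = theta0 + theta1 * x - y        # h_theta(x) - y for all examples
        grad0 = err.sum() / m                # dJ/dtheta0
        grad1 = (err * x).sum() / m          # dJ/dtheta1
        theta0 -= alpha * grad0              # simultaneous update of
        theta1 -= alpha * grad1              # both parameters
    return theta0, theta1

x = np.array([2104.0, 1416.0, 1534.0, 852.0])
y = np.array([460.0, 232.0, 315.0, 178.0])
x = (x - x.mean()) / x.std()                 # scale the feature first
print(gradient_descent(x, y))
```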

Using Multiple Input Features
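The body of these slides did not survive extraction; the standard extension is a hypothesis that is linear in several features, $h_\theta(x) = \theta_0 + \theta_1 x_1 + \dots + \theta_n x_n = \theta^\top x$ (with $x_0 = 1$). A vectorized sketch, where the second feature (number of bedrooms) is a hypothetical addition for illustration:

```python
import numpy as np

def gradient_descent_multi(X, y, alpha=0.1, n_iters=1000):
    """Vectorized gradient descent; X must carry a leading column of 1s."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        grad = X.T @ (X @ theta - y) / m     # gradient of the L2 cost
        theta -= alpha * grad                # update all thetas at once
    return theta

# Housing data with a hypothetical bedrooms feature added for illustration.
X = np.array([[1.0, 2104.0, 5.0],
              [1.0, 1416.0, 3.0],
              [1.0, 1534.0, 3.0],
              [1.0,  852.0, 2.0]])
y = np.array([460.0, 232.0, 315.0, 178.0])
X[:, 1:] = (X[:, 1:] - X[:, 1:].mean(axis=0)) / X[:, 1:].std(axis=0)  # scale
print(gradient_descent_multi(X, y))
```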

Remember: Debugging Trick

• Always plot your cost function J(θ) against the number of iterations inside your gradient-descent loop. If gradient descent is working, J(θ) should decrease after every iteration.
• If the value of J(θ) is increasing, you probably need a smaller α: you are overshooting the minimum.
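A minimal sketch of this diagnostic, recording J(θ) at every iteration and plotting it against the iteration count (matplotlib assumed available):

```python
import numpy as np
import matplotlib.pyplot as plt

def gd_with_history(X, y, alpha, n_iters=200):
    """Gradient descent that records J(theta) at every iteration."""
    m, n = X.shape
    theta, history = np.zeros(n), []
    for _ in range(n_iters):
        err = X @ theta - y
        history.append((err ** 2).sum() / (2 * m))   # J(theta) this iteration
        theta -= alpha * (X.T @ err) / m
    return theta, history

x = np.array([2104.0, 1416.0, 1534.0, 852.0])
X = np.column_stack([np.ones_like(x), (x - x.mean()) / x.std()])
y = np.array([460.0, 232.0, 315.0, 178.0])

_, history = gd_with_history(X, y, alpha=0.1)
plt.plot(history)                 # should decrease after every iteration
plt.xlabel("iteration")
plt.ylabel("J(theta)")
plt.show()
```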

Feature Scaling

Always remember: feature scaling can make your convergence faster.
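The slides do not say which scaling scheme is used; a common choice is standardization (mean normalization), sketched below. The bedrooms column is again a hypothetical example feature:

```python
import numpy as np

def standardize(X):
    """Scale each feature: x_j := (x_j - mu_j) / sigma_j."""
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    return (X - mu) / sigma, mu, sigma    # keep mu/sigma to scale new inputs

X = np.array([[2104.0, 5.0],
              [1416.0, 3.0],
              [1534.0, 3.0],
              [ 852.0, 2.0]])
X_scaled, mu, sigma = standardize(X)
print(X_scaled.mean(axis=0))   # ~[0, 0] after scaling
print(X_scaled.std(axis=0))    # ~[1, 1] after scaling
```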

What about Non-Linear Cases ?



Feature Mapping (A Simple Trick)

• We will map our features to higher dimensions using a simple trick.
• For example, you are given only the feature x, but you can expand it by including higher-order polynomials of x, i.e. $x, x^2, x^3, \ldots$ (see the sketch below).
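A minimal sketch of this mapping for a single feature:

```python
import numpy as np

def poly_features(x, degree):
    """Map a single feature x to the columns [x, x^2, ..., x^degree]."""
    return np.column_stack([x ** d for d in range(1, degree + 1)])

x = np.array([1.0, 2.0, 3.0])
print(poly_features(x, 3))
# [[ 1.  1.  1.]
#  [ 2.  4.  8.]
#  [ 3.  9. 27.]]
```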

Different Mappings can be used

Non-Linear Case
• Algorithm (an end-to-end sketch follows below):
▫ Expand each feature to include the non-linear mapping.
▫ Learn the set of parameters using gradient descent.

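An end-to-end sketch of this algorithm on toy data (the quadratic ground truth and all hyperparameters are illustrative assumptions): expand the feature, scale the expanded features, then run gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 50)
y = 1.0 + 2.0 * x - 0.5 * x ** 2 + rng.normal(0.0, 0.3, size=x.shape)

# Step 1: expand the single feature with a polynomial mapping (x, x^2, x^3).
P = np.column_stack([x ** d for d in range(1, 4)])

# Step 2: scale the expanded features so gradient descent converges quickly.
P = (P - P.mean(axis=0)) / P.std(axis=0)
X = np.column_stack([np.ones_like(x), P])    # add the intercept column

# Step 3: learn the parameters with gradient descent on the L2 cost.
theta = np.zeros(X.shape[1])
for _ in range(2000):
    theta -= 0.1 * X.T @ (X @ theta - y) / len(y)
print(theta)   # parameters of the hypothesis, now non-linear in x
```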