Lecture 05 - Linear Regression
Statistics
Lecture 5: Linear Regression
Recap
normal distribution
• arises when adding more and more discrete events
– example: measuring a physical quantity n times
• has a mean, median and mode
• symmetric: 50% of the values are higher than the mean and 50% are lower
https://www.mathsisfun.com/data/standard-normal-distribution.html
N(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}
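As a quick, minimal sketch (not from the original slides), the density above can be evaluated directly in Python; the function name normal_pdf and the example values are illustrative only.

```python
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    # N(x) = 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)^2 / (2*sigma^2))
    return np.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * np.sqrt(2.0 * np.pi))

# Example: the standard normal density peaks at x = mu with value ~0.3989
print(normal_pdf(0.0))
print(normal_pdf(1.0, mu=0.0, sigma=2.0))
```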
The Standard Normal distribution
shifting and rescaling the normal distribution so that the mean = 0 and the standard deviation = 1
• Standardize normal distribution:
– subtract the mean
– divide by the standard deviation
• Standardize by z
z = \frac{x - \mu}{\sigma}
https://www.mathsisfun.com/data/standard-normal-distribution.html
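As a small sketch (not part of the slides), standardizing a hypothetical set of measurements with the z-score above; the sample values are made up for illustration.

```python
import numpy as np

# Hypothetical repeated measurements of a physical quantity (made-up values)
x = np.array([9.8, 10.1, 10.4, 9.9, 10.3])

# Standardize: subtract the mean, divide by the standard deviation
z = (x - x.mean()) / x.std()

print(z)                  # z-score of each measurement
print(z.mean(), z.std())  # approximately 0 and 1 after standardizing
```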
The Standard Normal distribution
In more detail
The central limit theorem
The CLT http://www.value-at-risk.net/central-limit-theorem/
Linear Regression
• Goal:
– error reduction
– predicting/forecasting
– calibration
y = ax + b (figure: Wikipedia)
Linear Regression
Finding linear trends in data – how to
• plotting/fitting a line to data that are linearly related:
y = ax + b
• most common method:
– least squares: minimizing the sum of the squared differences between the fitted line and the actual values
R^2 = \sum_i \left[ y_i - f(x_i, a_1, a_2, \ldots, a_n) \right]^2, \qquad \frac{\partial R^2}{\partial a_i} = 0
http://onlinestatbook.com/2/regression/intro.html
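To illustrate the criterion above, here is a minimal sketch (made-up data, not from the slides) that minimizes the sum of squared residuals numerically with scipy.optimize.minimize; the closed-form solution on the next slide gives the same answer.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical, roughly linear data (made-up values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Sum of squared residuals for the straight-line model f(x) = a*x + b
def rss(params):
    a, b = params
    return np.sum((y - (a * x + b)) ** 2)

# Minimizing numerically is equivalent to setting the partial derivatives to zero
result = minimize(rss, x0=[0.0, 0.0])
a_fit, b_fit = result.x
print(a_fit, b_fit)
```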
Linear Regression
Least squares
• After a bit of math, the equations for a least-squares straight-line fit are:
– slope: a = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}
– intercept: b = \bar{y} - a \bar{x}
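A minimal sketch (made-up data, not from the slides) applying the closed-form slope and intercept above, with numpy.polyfit as an independent cross-check.

```python
import numpy as np

# Hypothetical, roughly linear data (made-up values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form least-squares solution from the slide
a = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - a * x.mean()
print(f"slope a = {a:.3f}, intercept b = {b:.3f}")

# Cross-check with numpy's built-in degree-1 polynomial fit: returns [slope, intercept]
print(np.polyfit(x, y, 1))
```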
Linear Regression
Least Squares
• coefficient of determination: R^2 gives an idea of how good the fit is (see the numerical sketch after the bullets below):
R^2 = \frac{SS_{\mathrm{regression}}}{SS_{\mathrm{total}}} = \left( \frac{1}{n} \, \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sigma_x \sigma_y} \right)^2
• values range: 0 ≤ R^2 ≤ 1
• R^2 = 0: the dependent variable cannot be predicted by the model
• R^2 = 1: the dependent variable can be predicted without error
• R^2 between 0 and 1 indicates to what extent the dependent variable can be predicted
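A short numerical sketch (same made-up data as above, not from the slides) computing R^2 both from the correlation form on this slide and from 1 − SS_residual/SS_total; for a least-squares line the two agree.

```python
import numpy as np

# Hypothetical data and the least-squares fit from the previous sketch
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
a = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - a * x.mean()

# R^2 via the correlation form on the slide (np.std gives the population sigma)
r = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) * x.std() * y.std())
r_squared = r ** 2

# Cross-check: R^2 = 1 - SS_residual / SS_total
ss_res = np.sum((y - (a * x + b)) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
print(r_squared, 1.0 - ss_res / ss_tot)  # both close to 1 for these data
```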
Linear Regression
Least Squares – some things to note
• Excel can calculate a trendline for you, BUT make sure the data are plotted as a scatter (XY) plot!
Linear Regression
Example
The sales of a company (in million dollars) for each year are shown in the
table below.