
Lecture 05 - Linear Regression

This document discusses linear regression and the standard normal distribution. It explains that linear regression finds the linear trend in data by fitting a line described by the equation y=ax+b. It describes the least squares method which minimizes the sum of the squares of the differences between observed and predicted values to calculate the slope (a) and intercept (b). The coefficient of determination, R^2, measures how well the linear model fits the data. It also discusses how the central limit theorem implies that the average of samples from a non-normal distribution will be approximately normally distributed.

Uploaded by Ramona Cirstian
Copyright © All Rights Reserved

Statistics
Lecture 5: Linear Regression
Recap
normal distribution
•  arises when more and more discrete random events are added together
–  example: measuring a physical quantity n times
•  mean, median and mode are equal
•  symmetric: 50% of the values are higher than the mean and 50% are lower
https://www.mathsisfun.com/data/standard-normal-distribution.html

$$N(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

1/04/16 2
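A minimal Python sketch of the density formula above, using only the standard library. The function name `normal_pdf` and the values µ = 10, σ = 2 are illustrative choices, not from the lecture. It checks the symmetry about the mean and the peak height 1/(σ√(2π)).

```python
import math

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density N(x) with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

mu, sigma = 10.0, 2.0
# Symmetry about the mean: N(mu - d) equals N(mu + d).
print(normal_pdf(mu - 1.5, mu, sigma), normal_pdf(mu + 1.5, mu, sigma))
# The density peaks at x = mu with height 1 / (sigma * sqrt(2 * pi)).
print(normal_pdf(mu, mu, sigma), 1.0 / (sigma * math.sqrt(2.0 * math.pi)))
```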
The Standard Normal distribution
shifting the normal distribution to mean = 0 and standard deviation = 1
•  Standardize a normal distribution:
–  subtract the mean
–  divide by the standard deviation
•  The standardized value is called z (the z-score):

$$z = \frac{x-\mu}{\sigma}$$

https://www.mathsisfun.com/data/standard-normal-distribution.html
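The two standardization steps can be sketched in Python as follows. The helper name `z_score` and the measurement values are hypothetical, chosen only for illustration; the mean and standard deviation are estimated from the data with the population formulas (dividing by n). After standardizing, the values have mean 0 and standard deviation 1.

```python
import statistics

def z_score(x: float, mu: float, sigma: float) -> float:
    """Standardize x: subtract the mean, then divide by the standard deviation."""
    return (x - mu) / sigma

# Hypothetical repeated measurements of a physical quantity (illustrative data only).
measurements = [9.8, 10.1, 10.0, 9.7, 10.4]
mu = statistics.fmean(measurements)
sigma = statistics.pstdev(measurements)  # population standard deviation (divides by n)
z = [z_score(x, mu, sigma) for x in measurements]

# The standardized values have mean ~0 and standard deviation ~1.
print(statistics.fmean(z), statistics.pstdev(z))
```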

The Standard Normal distribution
In more detail

The central limit theorem
The CLT http://www.value-at-risk.net/central-limit-theorem/

•  the distribution of an average tends to be Normal, even when the distribution from which the average is computed is decidedly non-Normal.
•  this is the foundation for many statistical procedures, including Quality Control Charts: the distribution of the phenomenon under study does not have to be Normal, because its average will be.
•  this normal distribution has the same mean as the parent distribution, and a variance equal to the variance of the parent divided by the sample size n. The average x̄ is therefore standardized with σ/√n:
$$z = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}}$$
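A small simulation sketch (not from the lecture) of the statement above. It assumes an exponential parent distribution with rate 1 (mean 1, variance 1) as the "decidedly non-Normal" example; the helper `sample_means`, the sample size n = 30 and the 5000 trials are arbitrary illustrative choices. The averages should have a mean close to the parent mean 1 and a variance close to 1/n.

```python
import random
import statistics

def sample_means(n, trials, seed=0):
    """Return `trials` averages, each of n draws from an exponential distribution (rate 1)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(trials)]

# Parent distribution: exponential with rate 1, i.e. mean 1 and variance 1 (strongly skewed).
n = 30
means = sample_means(n, trials=5000)
print("mean of the averages:    ", statistics.fmean(means))      # should be close to 1
print("variance of the averages:", statistics.pvariance(means))  # should be close to 1/n
print("predicted variance 1/n:  ", 1 / n)
```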
Linear Regression
Finding linear trends in data
•  plotting/fitting a line to data that are linearly related:
y = ax + b
–  a – slope
–  b – intercept

•  Goals:
–  error reduction
–  prediction/forecasting
–  calibration
(wikipedia)

Linear Regression
Finding linear trends in data – how to
•  plotting/fitting a line to data that are linearly related

y = ax + b
•  most common method:
–  least squares method: minimize the sum of the squares of the differences between the fitted line and the actual values
$$R^2 = \sum_i \left[y_i - f(x_i, a_1, a_2, \ldots, a_n)\right]^2, \qquad \frac{\partial R^2}{\partial a_i} = 0$$
(here R² denotes the sum of squared residuals, not the coefficient of determination)
http://onlinestatbook.com/2/regression/intro.html
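For the straight line f(x) = ax + b, the "bit of math" behind the least-squares formulas can be filled in as follows. This derivation is our own reconstruction, not taken from the lecture; as on the slide, R² here denotes the sum of squared residuals, not the coefficient of determination. Setting both partial derivatives to zero gives the intercept first, and substituting it gives the slope.

```latex
\begin{aligned}
R^2(a,b) &= \sum_i \left(y_i - a x_i - b\right)^2 \\
\frac{\partial R^2}{\partial b} &= -2\sum_i \left(y_i - a x_i - b\right) = 0
  \;\Rightarrow\; b = \bar{y} - a\bar{x} \\
\frac{\partial R^2}{\partial a} &= -2\sum_i x_i\left(y_i - a x_i - b\right) = 0
  \;\Rightarrow\; \sum_i x_i\left[(y_i-\bar{y}) - a(x_i-\bar{x})\right] = 0 \\
\text{since } \textstyle\sum_i (y_i-\bar{y}) = \sum_i (x_i-\bar{x}) = 0:\quad
a &= \frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sum_i (x_i-\bar{x})^2}
\end{aligned}
```

The last step subtracts x̄ times the (zero) sum of the bracket, which replaces x_i by (x_i − x̄) in front of the bracket.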

Linear Regression
Least squares
•  After a bit of math, the least-squares straight-line fit is given by:

–  slope: $a = \dfrac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sum_i (x_i-\bar{x})^2}$

–  intercept: $b = \bar{y} - a\bar{x}$
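The slope and intercept formulas translate directly into code. This is a minimal standard-library sketch; `least_squares_line` is a hypothetical helper name, and the input checks are our own additions (the slope is undefined when all x values are equal).

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y = a*x + b."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    a = sxy / sxx          # slope
    b = y_bar - a * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return a, b

# Points that lie exactly on y = 2x + 1 recover a = 2 and b = 1.
print(least_squares_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (2.0, 1.0)
```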

Linear Regression
Least Squares
•  coefficient of determination: R² indicates how well the line fits the data:
$$R^2 = \frac{SS_{\text{regression}}}{SS_{\text{total}}} = \left(\frac{1}{n}\,\frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sigma_x\,\sigma_y}\right)^2$$
(σ_x, σ_y: standard deviations of x and y, computed with 1/n)

•  values range: 0 ≤ R² ≤ 1
•  R² = 0: the dependent variable cannot be predicted by the model
•  R² = 1: the dependent variable can be predicted without error
•  R² between 0 and 1 indicates to what extent the dependent variable can be predicted
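A sketch that computes R² in the two ways written on the slide, assuming simple (one-predictor) least-squares regression: as SS_regression / SS_total, and as the squared correlation with standard deviations computed with 1/n. The function names and the five data points are illustrative only. For a straight-line least-squares fit the two results agree.

```python
import math

def _centered_sums(xs, ys):
    """Return (x_bar, y_bar, Sxy, Sxx, Syy) for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)  # this is SS_total
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    return x_bar, y_bar, sxy, sxx, syy

def r_squared(xs, ys):
    """R^2 = SS_regression / SS_total for the least-squares line y = a*x + b."""
    x_bar, y_bar, sxy, sxx, syy = _centered_sums(xs, ys)
    a = sxy / sxx
    b = y_bar - a * x_bar
    ss_regression = sum((a * x + b - y_bar) ** 2 for x in xs)
    return ss_regression / syy

def r_squared_from_correlation(xs, ys):
    """R^2 as the squared correlation, using standard deviations computed with 1/n."""
    n = len(xs)
    _, _, sxy, sxx, syy = _centered_sums(xs, ys)
    sigma_x = math.sqrt(sxx / n)
    sigma_y = math.sqrt(syy / n)
    return (sxy / n / (sigma_x * sigma_y)) ** 2

xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]  # illustrative data
print(r_squared(xs, ys), r_squared_from_correlation(xs, ys))  # both approximately 0.6
```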

Linear Regression
Least Squares – some things to note
•  Excel calculates the trendline, BUT use a scatter plot!

•  What if there is no linear relation between x and y?
–  try to transform it into a linear relation

•  How good is your fit?
–  R² > 0.8; otherwise, go back to the drawing board
–  Keep your sample size in mind – don't fit a line through 2 or 3 points!
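A sketch of the transformation idea, assuming (for illustration only) data that follow an exponential law y = c·e^(kx). Taking the natural logarithm gives ln y = ln c + kx, which is linear in x, so the ordinary least-squares formulas apply to the pairs (x, ln y). The helper `fit_line` and the values c = 3, k = 0.5 are hypothetical. Note that fitting on the log scale minimizes squared errors in ln y, not in y.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = a*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a = sxy / sxx
    return a, y_bar - a * x_bar

# Hypothetical data following y = 3 * exp(0.5 * x): clearly not a straight line in x.
xs = [0, 1, 2, 3, 4]
ys = [3 * math.exp(0.5 * x) for x in xs]

# ln(y) = ln(3) + 0.5 * x is linear in x, so fit a straight line to (x, ln y).
k, ln_c = fit_line(xs, [math.log(y) for y in ys])
c = math.exp(ln_c)  # transform the intercept back to the original scale
print(k, c)  # approximately 0.5 and 3.0
```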

Linear Regression
Example
The sales of a company (in million dollars) for each year are shown in the
table below.

x (year)                       2012   2013   2014   2015   2016
y (sales, million dollars)       12     19     29     37     45

a) Find the least squares regression line y = ax + b.

b) Use the least squares regression line as a model to estimate the sales of the company in 2019.
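A worked check of the example with the least-squares formulas from this lecture. By hand: x̄ = 2014, ȳ = 142/5 = 28.4, Σ(x − x̄)(y − ȳ) = 84 and Σ(x − x̄)² = 10, so a = 8.4 and b = 28.4 − 8.4·2014 = −16889.2, giving y = 8.4x − 16889.2. The 2019 estimate is 8.4·2019 − 16889.2 = 70.4 million dollars. The short Python sketch below repeats the same arithmetic; keep in mind that 2019 lies three years beyond the data, so the estimate is an extrapolation.

```python
years = [2012, 2013, 2014, 2015, 2016]
sales = [12, 19, 29, 37, 45]  # million dollars

n = len(years)
x_bar = sum(years) / n   # 2014
y_bar = sum(sales) / n   # approximately 28.4
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, sales))  # approximately 84
sxx = sum((x - x_bar) ** 2 for x in years)                            # 10
a = sxy / sxx          # a) slope: approximately 8.4 million dollars per year
b = y_bar - a * x_bar  # a) intercept: approximately -16889.2

# b) Use the line as a model for 2019 (an extrapolation beyond the observed years).
estimate_2019 = a * 2019 + b
print(f"y = {a:.1f}x + ({b:.1f}); 2019 estimate = {estimate_2019:.1f} million dollars")
```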
