Linear Regression and Correlation
Correlation
Learning objectives
• Interpret a scatter diagram
Example
Subject   Body weight (kg)   Plasma volume (l)
1         58.0               2.75
2         70.0               2.86
3         74.0               3.37
4         63.5               2.76
5         62.0               2.62
6         70.5               3.49
7         71.0               3.05
8         66.0               3.12
Scatter plot
[Figure: scatter plot of plasma volume (l) against body weight (kg)]
Linear regression equation
Linear regression line
y = a + bx
[Figure: the line y = a + bx, with the intercept a and the slope b marked]
Linear regression
• The constant 'a' is the intercept, the point at which the
line crosses the y-axis.
– value of y when x = 0
Linear regression
$$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$
• Numerator = $\sum xy - (\sum x)(\sum y)/n$
• Denominator = $\sum x^2 - (\sum x)^2/n$
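As a check on the slope formula, the following Python sketch computes b twice: once from deviations about the means and once from the equivalent raw sums. It uses the eight body weight and plasma volume pairs from the example. The variable names (weights, volumes, sxy, sxx) are illustrative choices, not taken from the slides.

```python
# Body weight (kg) and plasma volume (l) for the eight subjects in the example.
weights = [58.0, 70.0, 74.0, 63.5, 62.0, 70.5, 71.0, 66.0]
volumes = [2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12]

n = len(weights)
x_bar = sum(weights) / n
y_bar = sum(volumes) / n

# Deviation form: sum of cross-products over sum of squared x deviations.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(weights, volumes))
sxx = sum((x - x_bar) ** 2 for x in weights)
b_deviation = sxy / sxx

# Raw-sum form: the same two quantities computed from plain sums.
sxy_raw = sum(x * y for x, y in zip(weights, volumes)) - sum(weights) * sum(volumes) / n
sxx_raw = sum(x * x for x in weights) - sum(weights) ** 2 / n
b_raw = sxy_raw / sxx_raw

print(round(b_deviation, 6), round(b_raw, 6))
```

Both routes should agree up to floating-point rounding. The raw-sum form is convenient for hand calculation, while the deviation form is usually less prone to rounding error.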
Linear regression
$$a = \bar{y} - b\bar{x}$$
where $\bar{y} = \sum y / n$ and $\bar{x} = \sum x / n$
$n = 8$
$\sum x = 535$
$\sum x^2 = 35983.5$
$\sum y = 24.02$
$\sum y^2 = 72.798$
$\sum xy = 1615.295$
Example
$$b = \frac{1615.295 - (535)(24.02)/8}{35983.5 - (535)^2/8} = \frac{8.96}{205.38} = 0.043615$$
and
$$a = 3.0025 - 0.043615 \times 66.875 = 0.0857$$
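The arithmetic above can be reproduced with a short Python sketch that starts from the summary sums quoted in the example. The variable names are illustrative only, and the numerator and denominator are carried at full precision rather than rounded to 8.96 and 205.38.

```python
# Summary sums quoted in the example (n = 8 subjects).
n = 8
sum_x, sum_x2 = 535.0, 35983.5
sum_y = 24.02
sum_xy = 1615.295

# Slope: corrected sum of cross-products over corrected sum of squares of x.
numerator = sum_xy - sum_x * sum_y / n      # about 8.96
denominator = sum_x2 - sum_x ** 2 / n       # about 205.38
b = numerator / denominator

# Intercept: the fitted line passes through the point of means (x_bar, y_bar).
a = sum_y / n - b * (sum_x / n)

print(f"b = {b:.6f}, a = {a:.4f}")
```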
Example 1
Regression line is given by:
Plasma volume (l) = 0.0857 + 0.0436 × body weight (kg)
Interpretation of slope
For every 1 kg increase in body weight, plasma volume
increases by 0.04 litres on average.
Simple linear regression
[Figure: plasma volume (l) against body weight (kg)]
[Figure: SBP (mm Hg) against age (years), with the fitted line SBP = 81.54 + 1.222 × Age]
Prediction
• Can use regression equation to predict the value
of y for a particular value of x
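A minimal sketch of prediction, assuming the fitted coefficients from the body weight example (a = 0.0857, b = 0.043615). The function name predict_plasma_volume and the 60 kg input are hypothetical, chosen only for illustration; 60 kg lies inside the observed weight range of 58 to 74 kg.

```python
A = 0.0857      # intercept from the worked example (litres)
B = 0.043615    # slope from the worked example (litres per kg)

def predict_plasma_volume(weight_kg: float) -> float:
    """Return the predicted mean plasma volume (l) for a body weight in kg."""
    return A + B * weight_kg

# Hypothetical subject weighing 60 kg.
print(round(predict_plasma_volume(60.0), 2))
```

Predictions for weights well outside the observed range rely on the straight-line relationship continuing to hold there, which the data cannot confirm.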
Sampling error in the regression line
• Higher levels of plasma volume are associated
with higher values of weight
Correlation
• Linear regression - straight line summarizing the
relationship between two variables.
– Does not tell how closely the data lie on a straight
line.
• Correlation quantifies the degree to which two
continuous variables are related, provided that the
relationship is linear
– i.e. the closeness with which the points lie along the
straight line
Correlation coefficient
• Denote the true underlying population
correlation between X and Y by ρ (rho)
$$r = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sqrt{\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]\left[\sum y^2 - \frac{(\sum y)^2}{n}\right]}}$$
Correlation coefficient
From Example 1, correlation coefficient for
the association between body weight and
plasma volume is given by
$$r = \frac{8.96}{\sqrt{205.38 \times 0.678}} = 0.76$$
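The same value can be recovered from the summary sums of Example 1 with a short Python sketch. The variable names (sxy, sxx, syy) are illustrative and not taken from the slides; only the standard library is used.

```python
import math

# Summary sums from Example 1 (n = 8 subjects).
n = 8
sum_x, sum_x2 = 535.0, 35983.5
sum_y, sum_y2 = 24.02, 72.798
sum_xy = 1615.295

# Corrected sums of cross-products and squares.
sxy = sum_xy - sum_x * sum_y / n     # about 8.96
sxx = sum_x2 - sum_x ** 2 / n        # about 205.38
syy = sum_y2 - sum_y ** 2 / n        # about 0.678

# Sample correlation coefficient.
r = sxy / math.sqrt(sxx * syy)
print(round(r, 2))
```

Because r is the numerator of the slope divided by the square root of the product of the two corrected sums of squares, it always has the same sign as b.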
Inference about unknown population correlation
• We can make inference about the
unknown population correlation ρ using
the sample correlation coefficient r
The test statistic is