Module 2 MAT 350
MATHEMATICS DEPARTMENT
MODULE 2-2018
Content
5. Random Vectors and Random Matrices
Definition of Vectors and Matrices
Properties of Matrices
Types of Special Matrices
Identity Matrix
Idempotent Matrix
Orthogonal Matrix
Derivative of a Quadratic Form
Expectation
Variance-Covariance Matrix
Properties of Covariance
Multivariate Normal Distribution
6. Regression Models
Simple Linear Regression Models
Least Squares Estimation of 𝛽0 and 𝛽1.
Properties of 𝛽0 and 𝛽1
Confidence Intervals and Hypothesis Testing
Multiple Regression Models
Least Squares Estimation
The F-Test of all Slopes
Prediction
The General Linear Hypothesis
Residual Analysis
Weighted Regression
The General Least Squares Model
Multicollinearity
7. Analysis of Variance and Covariance
Analysis of Variance
Assumptions of Analysis of Variance
Completely Randomised Design
Randomised Block Design
The Latin Square Design
Contrast Estimation
Least Significant Difference Method (LSD)
Analysis of Covariance
Model
The ANCOVA Table
Removal of the Effect of the Thickness
Unit 5
Random Vectors and Random Matrices
When dealing with multiple random variables it is sometimes useful to use vector and matrix notation. This makes the formulas more compact and lets us use facts from linear algebra. A random vector or random matrix is a vector or matrix whose elements are random variables. A random variable is a function defined for each element of a sample space. In this chapter the focus is on vectors and matrices, for example, how to write n random variables $(x_1, x_2, x_3, \ldots, x_n)$ in column vector form.
A matrix is written in terms of its entries as $A = (a_{ij})$, where $a_{ij}$ is the element in row $i$ and column $j$.
$$(A + B)^T = A^T + B^T$$
$$(AB)^T = B^TA^T$$
$$(AB)^{-1} = B^{-1}A^{-1}$$
$$(ABC)^{-1} = C^{-1}B^{-1}A^{-1}$$
$$(A^{-1})^T = (A^T)^{-1}$$
$$I_3 = \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}, \qquad I_n = \begin{pmatrix} 1 & \cdots & 0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & 1 \end{pmatrix}$$
5.3.4 Idempotent Matrix
A square matrix $A$ is idempotent if $A^2 = A$.
Example: let $A = \begin{pmatrix} \frac{1}{2} & -\frac{1}{2}\\ -\frac{1}{2} & \frac{1}{2} \end{pmatrix}$; then
$$A^2 = \begin{pmatrix} \frac{1}{2} & -\frac{1}{2}\\ -\frac{1}{2} & \frac{1}{2} \end{pmatrix}\begin{pmatrix} \frac{1}{2} & -\frac{1}{2}\\ -\frac{1}{2} & \frac{1}{2} \end{pmatrix} = \begin{pmatrix} \frac{1}{2} & -\frac{1}{2}\\ -\frac{1}{2} & \frac{1}{2} \end{pmatrix} = A,$$
so $A$ is idempotent.
5.3.5 Orthogonal Matrix
A square matrix $A$ is orthogonal if $A^TA = AA^T = I$, that is, $A^{-1} = A^T$. (By contrast, $A$ is symmetric if $A = A^T$, i.e. $a_{ij} = a_{ji}$.)
A column vector or column matrix is an $m \times 1$ matrix,
$$X = \begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_m \end{bmatrix}.$$
Similarly, a row vector or row matrix is a $1 \times m$ matrix, that is, a matrix consisting of a single row of $m$ elements:
$$X^T = [x_1 \; x_2 \; \cdots \; x_m]$$
Note
Let $X^T = [x_1 \; x_2 \; \cdots \; x_n]$ and $Y^T = [y_1 \; y_2 \; \cdots \; y_n]$ be two $n \times 1$ vectors. Then
$$X^TY = x_1y_1 + x_2y_2 + \cdots + x_ny_n$$
and
$$X + Y = \begin{bmatrix} x_1 + y_1\\ x_2 + y_2\\ \vdots\\ x_n + y_n \end{bmatrix}$$
5.4.2 Rank of a Matrix
The rank of a matrix is the maximum number of linearly independent column vectors
in the matrix. To find the rank of a matrix we simply transform the matrix to its row
echelon form and count the number of non-zero rows.
The nullity of a matrix A is the number of columns in the row echelon form of A that do not contain a leading entry. That is, $\text{Nullity of } A = \text{Number of Columns} - \text{Rank}(A)$.
The trace of a square matrix is the sum of its diagonal elements; for a $3 \times 3$ matrix, $Tr(A) = a_{11} + a_{22} + a_{33}$.
A quadratic form in $x_1, x_2, \ldots, x_n$ is
$$Q(X) = X^TAX = \sum_{i=1}^n\sum_{j=1}^n a_{ij}x_ix_j$$
where the $a_{ij}$ are constants. $A$ is called the matrix of the quadratic form.
Example
If $Q(X) = 3x_1^2 + 8x_1x_2 + 5x_2^2$, find $A$ such that $Q(X) = X^TAX$.
Solution
$$X = \begin{pmatrix} x_1\\ x_2 \end{pmatrix}, \qquad X^T = (x_1 \; x_2)$$
$$A = \begin{pmatrix} a & \frac{1}{2}b\\ \frac{1}{2}b & c \end{pmatrix},$$
where $a$ is the coefficient of $x_1^2 = 3$, $c$ is the coefficient of $x_2^2 = 5$ and $b$ is the coefficient of $x_1x_2 = 8$:
$$A = \begin{pmatrix} 3 & \frac{1}{2} \times 8\\ \frac{1}{2} \times 8 & 5 \end{pmatrix} = \begin{pmatrix} \mathbf{3} & \mathbf{4}\\ \mathbf{4} & \mathbf{5} \end{pmatrix}$$
$$Q(X) = X^TAX = (x_1 \; x_2)\begin{pmatrix} 3 & 4\\ 4 & 5 \end{pmatrix}\begin{pmatrix} x_1\\ x_2 \end{pmatrix}$$
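The construction above can be checked numerically. Below is a minimal Python sketch (assuming NumPy is available; the test point is arbitrary and purely illustrative):

```python
import numpy as np

# Matrix of the quadratic form Q(x) = 3x1^2 + 8x1x2 + 5x2^2
A = np.array([[3.0, 4.0],
              [4.0, 5.0]])
x = np.array([2.0, -1.0])                        # arbitrary test point
Q_matrix = x @ A @ x                             # X^T A X
Q_direct = 3*x[0]**2 + 8*x[0]*x[1] + 5*x[1]**2   # expanded form
print(Q_matrix, Q_direct)                        # both print 1.0
```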
5.5 Derivative of a Quadratic Form
For a function $f(x)$ of the variables $x_1, x_2, \ldots, x_r$, the derivative of $f$ with respect to the vector $X$ is the vector of partial derivatives
$$\frac{df(x)}{dX} = \begin{pmatrix} \frac{df(x)}{dx_1}\\ \frac{df(x)}{dx_2}\\ \frac{df(x)}{dx_3}\\ \vdots\\ \frac{df(x)}{dx_r} \end{pmatrix}$$
Example
Consider the function $f(x) = 6x_1^2 + x_2^2 - 2x_1x_2 + 2x_3^2$. Find $\frac{df(x)}{dX}$.
Solution
$$\frac{df(x)}{dX} = \begin{pmatrix} \frac{d}{dx_1}(6x_1^2 + x_2^2 - 2x_1x_2 + 2x_3^2)\\ \frac{d}{dx_2}(6x_1^2 + x_2^2 - 2x_1x_2 + 2x_3^2)\\ \frac{d}{dx_3}(6x_1^2 + x_2^2 - 2x_1x_2 + 2x_3^2) \end{pmatrix} = \begin{pmatrix} \mathbf{12x_1 - 2x_2}\\ \mathbf{2x_2 - 2x_1}\\ \mathbf{4x_3} \end{pmatrix}$$
5.5.3 Theorem
Let $Q(X) = X^TAX$ where $A$ is a symmetric matrix of constants. Then $\frac{dQ(X)}{dX} = 2AX$.
Proof
$X^TAX = \sum_{i=1}^r\sum_{j=1}^r a_{ij}X_iX_j$; let $r = 3$:
$$X^TAX = \sum_{i=1}^3\sum_{j=1}^3 a_{ij}X_iX_j = \sum_{i=1}^3 (a_{i1}X_iX_1 + a_{i2}X_iX_2 + a_{i3}X_iX_3)$$
$$\frac{d(X^TAX)}{dX_1} = \frac{d}{dX_1}\big(a_{11}X_1^2 + a_{12}X_1X_2 + a_{13}X_1X_3 + a_{22}X_2^2 + a_{21}X_2X_1 + a_{23}X_2X_3 + a_{31}X_3X_1 + a_{32}X_3X_2 + a_{33}X_3^2\big)$$
$$\frac{d(X^TAX)}{dX_1} = 2a_{11}X_1 + a_{12}X_2 + a_{13}X_3 + a_{21}X_2 + a_{31}X_3$$
Since the matrix $A$ is symmetric, $a_{21}X_2 = a_{12}X_2$ and $a_{31}X_3 = a_{13}X_3$. That is,
$$\frac{d(X^TAX)}{dX_1} = 2a_{11}X_1 + 2a_{12}X_2 + 2a_{13}X_3 = 2(a_{11}X_1 + a_{12}X_2 + a_{13}X_3),$$
which is twice the first row of $AX$. The same argument applies to $X_2$ and $X_3$, and stacking the three partial derivatives gives
$$\therefore \frac{d(X^TAX)}{dX} = 2AX$$
5.5.3 Example
Consider the function $f(x) = 6x_1^2 + x_2^2 - 2x_1x_2 + 2x_3^2$.
(a) Write $f(x)$ in quadratic form
(b) Find $\frac{d}{dX}(f(x))$ and express the answer in the form $2AX$.
Solutions
(a) $f(x) = X^TAX$, where $X^T = (x_1 \; x_2 \; x_3)$, $X = \begin{pmatrix} x_1\\ x_2\\ x_3 \end{pmatrix}$ and
$$A = \begin{pmatrix} a & \frac{1}{2}b & \frac{1}{2}d\\ \frac{1}{2}b & c & \frac{1}{2}e\\ \frac{1}{2}d & \frac{1}{2}e & f \end{pmatrix}$$
where $a$ = coefficient of $x_1^2 = 6$, $b$ = coefficient of $x_1x_2 = -2$, $c$ = coefficient of $x_2^2 = 1$, $d$ = coefficient of $x_1x_3 = 0$, $e$ = coefficient of $x_2x_3 = 0$ and $f$ = coefficient of $x_3^2 = 2$.
$$A = \begin{pmatrix} 6 & \frac{1}{2} \times (-2) & \frac{1}{2} \times 0\\ \frac{1}{2} \times (-2) & 1 & \frac{1}{2} \times 0\\ \frac{1}{2} \times 0 & \frac{1}{2} \times 0 & 2 \end{pmatrix} = \begin{pmatrix} \mathbf{6} & \mathbf{-1} & \mathbf{0}\\ \mathbf{-1} & \mathbf{1} & \mathbf{0}\\ \mathbf{0} & \mathbf{0} & \mathbf{2} \end{pmatrix}$$
$$\therefore f(x) = X^TAX = (x_1 \; x_2 \; x_3)\begin{pmatrix} 6 & -1 & 0\\ -1 & 1 & 0\\ 0 & 0 & 2 \end{pmatrix}\begin{pmatrix} x_1\\ x_2\\ x_3 \end{pmatrix}$$
(b)
$$\frac{dQ(X)}{dX} = \frac{d(X^TAX)}{dX} = \begin{pmatrix} \frac{d}{dx_1}(6x_1^2 + x_2^2 - 2x_1x_2 + 2x_3^2)\\ \frac{d}{dx_2}(6x_1^2 + x_2^2 - 2x_1x_2 + 2x_3^2)\\ \frac{d}{dx_3}(6x_1^2 + x_2^2 - 2x_1x_2 + 2x_3^2) \end{pmatrix} = \begin{pmatrix} 12 & -2 & 0\\ -2 & 2 & 0\\ 0 & 0 & 4 \end{pmatrix}\begin{pmatrix} x_1\\ x_2\\ x_3 \end{pmatrix}$$
Factorising out 2 in the matrix above we have
$$\frac{dQ(X)}{dX} = 2\begin{pmatrix} 6 & -1 & 0\\ -1 & 1 & 0\\ 0 & 0 & 2 \end{pmatrix}\begin{pmatrix} x_1\\ x_2\\ x_3 \end{pmatrix} = 2AX$$
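The theorem can be confirmed numerically on this example by comparing $2AX$ with a finite-difference gradient of $f$. A minimal sketch, assuming NumPy:

```python
import numpy as np

A = np.array([[6.0, -1.0, 0.0],
              [-1.0, 1.0, 0.0],
              [0.0,  0.0, 2.0]])
x = np.array([1.0, 2.0, 3.0])                 # arbitrary point
grad_theorem = 2 * A @ x                      # the 2AX of the theorem

f = lambda v: v @ A @ v                       # f(x) = x^T A x
h = 1e-6
grad_fd = np.array([(f(x + h*e) - f(x - h*e)) / (2*h) for e in np.eye(3)])
print(grad_theorem)                           # [ 8.  2. 12.]
print(grad_fd)                                # agrees to ~1e-6
```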
5.5.4 Example
Consider the matrix $A = \begin{pmatrix} -2 & 2\\ 2 & 2 \end{pmatrix}$. Find the associated quadratic form.
Solution
$$f(x) = X^TAX = (x_1 \; x_2)\begin{pmatrix} -2 & 2\\ 2 & 2 \end{pmatrix}\begin{pmatrix} x_1\\ x_2 \end{pmatrix} = -2x_1^2 + 4x_1x_2 + 2x_2^2$$
5.6 Expectation
The expectation of a random vector or random matrix is taken element by element:
$$E(Y) = \begin{pmatrix} E(y_1)\\ E(y_2)\\ \vdots\\ E(y_n) \end{pmatrix}, \qquad E(X) = \begin{bmatrix} E(x_{11}) & E(x_{12}) & \cdots & E(x_{1n})\\ E(x_{21}) & E(x_{22}) & \cdots & E(x_{2n})\\ \vdots & \vdots & \ddots & \vdots\\ E(x_{n1}) & E(x_{n2}) & \cdots & E(x_{nn}) \end{bmatrix}$$
5.6.1 Property
Let $X$ and $Y$ be random matrices of the same dimension and let $A$ and $B$ be conformable matrices of constants. Then
$$E(X + Y) = E(X) + E(Y) \qquad\text{and}\qquad E(AXB) = A\,E(X)\,B$$
5.6.2 Variance-Covariance Matrix
The variance-covariance matrix of a random vector $Y$ is
$$Var(Y) = \begin{pmatrix} Var(y_1) & Cov(y_1,y_2) & \cdots & Cov(y_1,y_n)\\ Cov(y_2,y_1) & Var(y_2) & \cdots & Cov(y_2,y_n)\\ \vdots & \vdots & \ddots & \vdots\\ Cov(y_n,y_1) & Cov(y_n,y_2) & \cdots & Var(y_n) \end{pmatrix}$$
where $Cov(y_i, y_i) = Var(y_i)$ for $i = 1, 2, 3, \ldots, n$.
5.6.3 Properties of Covariance
Proof that $Var(Y) = E\left[(Y - \mu_y)(Y - \mu_y)^T\right]$. Starting from the right-hand side,
$$E\left[(Y-\mu_y)(Y-\mu_y)^T\right] = E\left[\begin{pmatrix} y_1-\mu_1\\ y_2-\mu_2\\ \vdots\\ y_n-\mu_n \end{pmatrix}\begin{pmatrix} y_1-\mu_1 & y_2-\mu_2 & \cdots & y_n-\mu_n \end{pmatrix}\right]$$
$$= E\begin{pmatrix} (y_1-\mu_1)^2 & (y_1-\mu_1)(y_2-\mu_2) & \cdots & (y_1-\mu_1)(y_n-\mu_n)\\ (y_2-\mu_2)(y_1-\mu_1) & (y_2-\mu_2)^2 & \cdots & (y_2-\mu_2)(y_n-\mu_n)\\ \vdots & \vdots & \ddots & \vdots\\ (y_n-\mu_n)(y_1-\mu_1) & (y_n-\mu_n)(y_2-\mu_2) & \cdots & (y_n-\mu_n)^2 \end{pmatrix}$$
$$= \begin{pmatrix} Var(y_1) & Cov(y_1,y_2) & \cdots & Cov(y_1,y_n)\\ Cov(y_2,y_1) & Var(y_2) & \cdots & Cov(y_2,y_n)\\ \vdots & \vdots & \ddots & \vdots\\ Cov(y_n,y_1) & Cov(y_n,y_2) & \cdots & Var(y_n) \end{pmatrix} = Var(Y)$$
5.6.4 Theorem
Let $C^T = (c_1, c_2, c_3, \ldots, c_n)$ be a vector of constants. Then the linear combination $C^TY = c_1y_1 + c_2y_2 + c_3y_3 + \cdots + c_ny_n$ has mean $C^T\mu_y$ and variance $C^T\Sigma_yC$.
Proof
(a) $E(C^TY) = C^TE(Y) = \mathbf{C^T\mu_y}$
(b)
$$Var(C^TY) = E\left[(C^TY - C^T\mu_y)(C^TY - C^T\mu_y)^T\right] = E\left[C^T(Y - \mu_y)(Y - \mu_y)^TC\right]$$
$$= C^TE\left[(Y - \mu_y)(Y - \mu_y)^T\right]C = C^T\,Var(Y)\,C$$
Hence, $\mathbf{Var(C^TY) = C^T\Sigma_yC}$.
Example
Let $X \sim N_3(\mu, \Sigma)$ be a normally distributed random vector with mean $\mu^T = (3 \;\; 1 \;\; 4)$ and variance-covariance matrix
$$\Sigma = \begin{pmatrix} 1 & -2 & 0\\ -2 & 5 & 0\\ 0 & 0 & 2 \end{pmatrix}$$
(i) Are $X_1$ and $X_3$ independent?
(ii) Find the distribution of $Y = 3x_1 - 2x_2 + x_3$
Solutions
(i) $Cov(x_1, x_3) = 0$. Since $X$ is multivariate normal and $Cov(x_1, x_3) = 0$, $X_1$ and $X_3$ are independent.
(ii) $Y = 3x_1 - 2x_2 + x_3$, so $C^T = (3 \;\; -2 \;\; 1)$ and $X = \begin{pmatrix} x_1\\ x_2\\ x_3 \end{pmatrix}$.
Mean of Y
$$E(Y) = E(C^TX) = C^TE(X) = (3 \;\; -2 \;\; 1)\begin{pmatrix} 3\\ 1\\ 4 \end{pmatrix} = 9 - 2 + 4 = \mathbf{11}$$
Variance of Y
$$Var(Y) = Var(C^TX) = C^T\Sigma C = (3 \;\; -2 \;\; 1)\begin{pmatrix} 1 & -2 & 0\\ -2 & 5 & 0\\ 0 & 0 & 2 \end{pmatrix}\begin{pmatrix} 3\\ -2\\ 1 \end{pmatrix} = (3 \;\; -2 \;\; 1)\begin{pmatrix} 7\\ -16\\ 2 \end{pmatrix} = \mathbf{55}$$
$$\therefore \mathbf{Y \sim N(11, 55)}$$
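The arithmetic of this example is easy to reproduce with NumPy (a short sketch using the $\mu$, $\Sigma$ and $C$ above):

```python
import numpy as np

mu = np.array([3.0, 1.0, 4.0])
Sigma = np.array([[1.0, -2.0, 0.0],
                  [-2.0, 5.0, 0.0],
                  [0.0,  0.0, 2.0]])
c = np.array([3.0, -2.0, 1.0])     # Y = 3x1 - 2x2 + x3

print(c @ mu)                      # 11.0  -> E(Y) = C^T mu
print(c @ Sigma @ c)               # 55.0  -> Var(Y) = C^T Sigma C
```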
5.6.5 Theorem
If 𝑍 = 𝐴𝑌 where 𝐴 is a matrix of constants, then
(a) 𝐸(𝑍) = 𝐴𝜇𝑦
(b) 𝑉𝑎𝑟(𝑍) = 𝐴 ∑𝑦 𝐴𝑇
Proof
(a) 𝐸(𝑍) = 𝐸(𝐴𝑌) = 𝐴𝐸(𝑌) = 𝐴𝜇𝑦
(b) $Var(Z) = A\Sigma_yA^T$:
$$Var(Z) = E\left[(AY - A\mu_y)(AY - A\mu_y)^T\right] = A\,E\left[(Y - \mu_y)(Y - \mu_y)^T\right]A^T = A\Sigma_yA^T$$
Example
Let $Z = (Z_1 \; Z_2 \; Z_3)^T$ with
$$E(Z) = \mu = \begin{pmatrix} 1\\ 2\\ 3 \end{pmatrix} \qquad\text{and}\qquad Var(Z) = \begin{pmatrix} 3 & 2 & 1\\ 2 & 2 & 1\\ 1 & 1 & 1 \end{pmatrix}$$
Let
$$Y_1 = Z_1 + 2Y_3, \qquad Y_2 = Z_1 + Z_2 - Z_3, \qquad Y_3 = 3Z_1 + Z_2 + Z_3$$
(a) Find the mean and variance of $X = \frac{1}{7}(Y_1 + Y_2 + Y_3)$
(b) Find $E(Y)$ and $Var(Y)$ where $Y = (Y_1 \; Y_2 \; Y_3)^T$
Solutions
$$X = \frac{1}{7}(Y_1 + Y_2 + Y_3) = \frac{1}{7}(Z_1 + 2Y_3 + Z_1 + Z_2 - Z_3 + 3Z_1 + Z_2 + Z_3)$$
$$= \frac{1}{7}\big(Z_1 + 2(3Z_1 + Z_2 + Z_3) + Z_1 + Z_2 - Z_3 + 3Z_1 + Z_2 + Z_3\big) = \frac{1}{7}(11Z_1 + 4Z_2 + 2Z_3)$$
That is,
$$X = C^TZ = \frac{1}{7}(11 \;\; 4 \;\; 2)\begin{pmatrix} Z_1\\ Z_2\\ Z_3 \end{pmatrix}$$
(a)
(i)
$$E(X) = C^TE(Z) = \frac{1}{7}(11 \;\; 4 \;\; 2)\begin{pmatrix} 1\\ 2\\ 3 \end{pmatrix} = \frac{11 + 8 + 6}{7}$$
$$\therefore \mathbf{E(X) = \frac{25}{7}}$$
(ii)
$$Var(X) = Var(C^TZ) = C^T\Sigma_zC = \frac{1}{7}(11 \;\; 4 \;\; 2)\begin{pmatrix} 3 & 2 & 1\\ 2 & 2 & 1\\ 1 & 1 & 1 \end{pmatrix}\frac{1}{7}\begin{pmatrix} 11\\ 4\\ 2 \end{pmatrix}$$
$$\therefore \mathbf{Var(X) = \frac{635}{49}}$$
(b) $Y = (Y_1 \; Y_2 \; Y_3)^T$. In matrix form,
$$Y = AZ = \begin{pmatrix} 7 & 2 & 2\\ 1 & 1 & -1\\ 3 & 1 & 1 \end{pmatrix}\begin{pmatrix} Z_1\\ Z_2\\ Z_3 \end{pmatrix}$$
(since $Y_1 = Z_1 + 2Y_3 = Z_1 + 2(3Z_1 + Z_2 + Z_3) = 7Z_1 + 2Z_2 + 2Z_3$).
(i)
$$E(Y) = AE(Z) = \begin{pmatrix} 7 & 2 & 2\\ 1 & 1 & -1\\ 3 & 1 & 1 \end{pmatrix}\begin{pmatrix} 1\\ 2\\ 3 \end{pmatrix} = \begin{pmatrix} 17\\ 0\\ 8 \end{pmatrix}$$
(ii)
$$Var(Y) = Var(AZ) = A\,Var(Z)\,A^T = \begin{pmatrix} 7 & 2 & 2\\ 1 & 1 & -1\\ 3 & 1 & 1 \end{pmatrix}\begin{pmatrix} 3 & 2 & 1\\ 2 & 2 & 1\\ 1 & 1 & 1 \end{pmatrix}\begin{pmatrix} 7 & 1 & 3\\ 2 & 1 & 1\\ 2 & -1 & 1 \end{pmatrix}$$
$$\therefore Var(Y) = \begin{pmatrix} \mathbf{251} & \mathbf{36} & \mathbf{112}\\ \mathbf{36} & \mathbf{6} & \mathbf{16}\\ \mathbf{112} & \mathbf{16} & \mathbf{50} \end{pmatrix}$$
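Both parts of this example follow from $E(AZ) = A\mu$ and $Var(AZ) = A\Sigma A^T$, which a short NumPy sketch confirms:

```python
import numpy as np

mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[3.0, 2.0, 1.0],
                  [2.0, 2.0, 1.0],
                  [1.0, 1.0, 1.0]])
A = np.array([[7.0, 2.0,  2.0],
              [1.0, 1.0, -1.0],
              [3.0, 1.0,  1.0]])

print(A @ mu)                      # [17.  0.  8.]
print(A @ Sigma @ A.T)             # [[251. 36. 112.], [36. 6. 16.], [112. 16. 50.]]

c = np.array([11.0, 4.0, 2.0]) / 7.0   # part (a): X = C^T Z
print(c @ mu, c @ Sigma @ c)           # 25/7 and 635/49
```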
5.6.5 Theorem
If $Y$ is a random vector with mean vector $\mu_y$ and non-singular variance-covariance matrix $\Sigma_y$, and the $k$ components of $Y$ are jointly normally distributed, then
$$Q(Y) = (Y - \mu_y)^T\Sigma_y^{-1}(Y - \mu_y)$$
follows a Chi-Square distribution with $k$ degrees of freedom.
Proof
For each symmetric matrix $A$ there exists a non-singular matrix $P$ such that $P^TAP$ is a diagonal matrix. Since $\Sigma_y$ is symmetric, $P$ can be chosen so that $P^T\Sigma_yP = I$; the transformed components are then independent standard normal variables and $Q(Y)$ is the sum of their $k$ squares.
For comparison, the univariate normal density is
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{1}{2\sigma^2}(x-\mu)^2}, \qquad -\infty < x < \infty$$
and the multivariate normal density is
$$f(y) = \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}\,e^{-\frac{1}{2}(Y-\mu_y)^T\Sigma^{-1}(Y-\mu_y)}$$
Tutorial Sheet 5
1. Let 𝑌 = (𝑦1 𝑦2 𝑦3 )𝑇 be a random vector with mean 𝜇 = (𝜇1 𝜇2 𝜇3 )𝑇 and
Variance-Covariance matrix
$$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13}\\ \sigma_{21} & \sigma_{22} & \sigma_{23}\\ \sigma_{31} & \sigma_{32} & \sigma_{33} \end{pmatrix}$$
Consider the linear combination
𝑊1 = 𝑌1 + 𝑌2 + 𝑌3
𝑊2 = 𝑌1 + 𝑌2
𝑊3 = 𝑌1 − 𝑌2 − 𝑌3
(i) State the above linear combination in matrix notation
(ii) Find $E(U)$ and $Var(U)$ of $U = W_1 + W_2 + W_3$
(iii) Find 𝐸(𝑊) and 𝑉𝑎𝑟(𝑊) of 𝑊 = (𝑊1 , 𝑊2 , 𝑊3 )𝑇
2. Let 𝑌 = (𝑦1 𝑦2 𝑦3 )𝑇 be a random vector with mean 𝜇 = (1 2 3)𝑇 and
Variance-Covariance matrix
$$\Sigma = \begin{pmatrix} 1 & 0 & 1\\ 0 & 2 & -1\\ 1 & -1 & 3 \end{pmatrix}$$
Consider the linear combination
𝑈1 = 𝑌1 − 𝑌2 + 2𝑌3
𝑈2 = 𝑌1 + 𝑌3
(i) State the above linear combination in matrix notation
(ii) Find 𝐸(𝑈) and 𝑉𝑎𝑟(𝑈)
3. Let X be (𝑁3 (𝜇, ∑)) a random normally distributed vector with mean
$\mu^T = (3 \;\; 1 \;\; 4)$ and variance-covariance matrix
$$\Sigma = \begin{pmatrix} 1 & 2 & 0\\ 2 & 5 & 0\\ 0 & 0 & 2 \end{pmatrix}$$
Find the distribution of
𝑈1 = 𝑋1 − 𝑋2 + 3𝑈2
𝑈2 = 2𝑋1 + 𝑋3
𝑈3 = 𝑋1 + 𝑋2 + 𝑋3
4. Consider the function 𝑓(𝑥) = 𝑥12 + 2𝑥22 + 2𝑥32 − 2𝑥1 𝑥2 − 2𝑥1 𝑥3
(i) Express the quadratic form as product of matrices
(ii) Find the vector form of $\frac{d(f(x))}{dX}$ and express the answer in the form $2AX$.
5. Consider the function 𝑓(𝑥) = 𝑥12 + 2𝑥22 − 7𝑥32 + 𝑥42 − 4𝑥1 𝑥2 + 8𝑥1 𝑥3 − 6𝑥3 𝑥4
(i) Express the quadratic form as product of matrices
(ii) Find the vector form of $\frac{d(f(x))}{dX}$ and express the answer in the form $2AX$.
6. Write down the quadratic form associated with the given matrices;
$$\text{(i)}\;\begin{pmatrix} 1 & 2 & -1\\ 2 & 1 & 3\\ -1 & 3 & 1 \end{pmatrix} \qquad \text{(ii)}\;\begin{pmatrix} 0 & 2 & -4 & 2\\ 2 & 3 & 1 & 0\\ -4 & 1 & 2 & 1\\ 2 & 0 & 1 & 7 \end{pmatrix}$$
7. Prove that if $B$ is a symmetric idempotent matrix and $P$ is orthogonal, then $P^TBP$ is also a symmetric idempotent matrix.
8. Let A be an $n \times n$ matrix. Prove that if A is idempotent, then $\det(A)$ is equal to either 0 or 1.
9. Prove that if 𝐶 = 𝐴 + 𝐵, then 𝑇𝑟(𝐶) = 𝑇𝑟(𝐴) + 𝑇𝑟(𝐵). Assume that A,B,C are all
𝑛 × 𝑛 square matrices.
Unit 6
Regression Models
Introduction
The focus in regression is to determine the relationship between a set of variables (to
determine the relationship between the dependent variable and the independent
variable or variables). For example, in a chemical process, we might be interested in
the relationship between the output of the process, the temperature at which it occurs,
and the amount of catalyst employed. Knowledge of such a relationship would enable
us to predict the output for various values of temperature and amount of catalyst. In
this chapter, we will focus on Simple Linear Regression Models and Multiple
Regression Models.
6.1.1 Model
The following is the model for simple linear regression;
𝑌𝑖 = 𝛽0 + 𝛽1 𝑋𝑖 + 𝜀𝑖
Where
𝑖 = 1,2, … , 𝑛
𝑌𝑖 is the dependent variable
𝑋𝑖 is the independent variable
𝛽0 and 𝛽1 are the regression coefficients
𝜀𝑖 is the random error.
6.1.2 Assumptions
The following are the assumptions about the random error;
𝐸(𝜀𝑖 ) = 0
𝑉𝑎𝑟(𝜀𝑖 ) = 𝜎 2
𝜀𝑖 is normally distributed
𝜀𝑖 are independent.
Recall from Second Year that minimising $S = \sum_{i=1}^n(Y_i - \hat{Y}_i)^2$, where $\hat{Y}_i = \beta_0 + \beta_1X_i$, gives
(a) $\hat\beta_0 = \bar{Y} - \hat\beta_1\bar{X}$
(b)
$$\hat\beta_1 = \frac{\sum_{i=1}^n x_iy_i - \frac{\sum_{i=1}^n x_i\sum_{i=1}^n y_i}{n}}{\sum_{i=1}^n x_i^2 - \frac{\left(\sum_{i=1}^n x_i\right)^2}{n}}$$
From the regression model $Y_i = \beta_0 + \beta_1X_i + \varepsilon_i$ and the assumptions about the random error ($\varepsilon_i$), the following can be shown:
(a) $E(\hat\beta_1) = \beta_1$
(b) $Var(\hat\beta_0) = \dfrac{\sigma^2\sum_{i=1}^n x_i^2}{n\sum_{i=1}^n(X_i - \bar{X})^2}$
(c) $E(\hat\beta_0) = \beta_0$
(d) $Var(\hat\beta_1) = \dfrac{\sigma^2}{\sum_{i=1}^n(X_i - \bar{X})^2}$
(e) $Cov(\hat\beta_0, \hat\beta_1) = \dfrac{-\sigma^2\bar{X}}{\sum_{i=1}^n(X_i - \bar{X})^2} = -\bar{X}\,Var(\hat\beta_1)$
Proof
Simplifying $\hat\beta_1$: the denominator is
$$\sum_{i=1}^n x_i^2 - \frac{(\sum_{i=1}^n x_i)^2}{n} = \sum_{i=1}^n x_i^2 - n\bar{x}^2 = \sum_{i=1}^n (x_i - \bar{x})^2$$
and the numerator is
$$\sum_{i=1}^n x_iy_i - \frac{\sum_{i=1}^n x_i\sum_{i=1}^n y_i}{n} = \sum_{i=1}^n x_iy_i - n\bar{x}\bar{y} = \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})$$
Therefore,
$$\hat\beta_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}$$
Note that
$$\sum_{i=1}^n (x_i - \bar{x})\bar{y} = \sum_{i=1}^n (x_i - \bar{x})\bar{x} = \sum_{i=1}^n (y_i - \bar{y})\bar{x} = 0$$
$$\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^n (x_i - \bar{x})y_i = \sum_{i=1}^n (y_i - \bar{y})x_i$$
$$\sum_{i=1}^n (x_i - \bar{x})^2 = \sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x}) = \sum_{i=1}^n (x_i - \bar{x})x_i$$
(a) $E(\hat\beta_1) = \beta_1$
$$E(\hat\beta_1) = E\left(\frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}\right) = E\left(\frac{\sum (x_i - \bar{x})y_i}{\sum (x_i - \bar{x})x_i}\right) = \frac{\sum (x_i - \bar{x})E(y_i)}{\sum (x_i - \bar{x})x_i}$$
Since $E(y_i) = E(\beta_0 + \beta_1X_i + \varepsilon_i) = \beta_0 + \beta_1x_i$,
$$E(\hat\beta_1) = \frac{\sum (x_i - \bar{x})(\beta_0 + \beta_1x_i)}{\sum (x_i - \bar{x})x_i} = \frac{\beta_0\sum (x_i - \bar{x}) + \beta_1\sum (x_i - \bar{x})x_i}{\sum (x_i - \bar{x})x_i} = \frac{0 + \beta_1\sum (x_i - \bar{x})x_i}{\sum (x_i - \bar{x})x_i}$$
$$\therefore \mathbf{E(\hat\beta_1) = \beta_1}$$
(b) $Var(\hat\beta_0) = \dfrac{\sigma^2\sum x_i^2}{n\sum(x_i - \bar{x})^2}$
Since $\hat\beta_0 = \bar{Y} - \hat\beta_1\bar{x}$,
$$Var(\hat\beta_0) = Var(\bar{Y}) + \bar{x}^2Var(\hat\beta_1) - 2\bar{x}\,Cov(\bar{Y}, \hat\beta_1)$$
Now $Var(\bar{Y}) = \dfrac{\sigma^2}{n}$, and since $\hat\beta_1 = \dfrac{\sum(x_i - \bar{x})y_i}{\sum(x_i - \bar{x})^2}$,
$$Cov(\bar{Y}, \hat\beta_1) = Cov\left(\frac{\sum y_i}{n}, \frac{\sum(x_i - \bar{x})y_i}{\sum(x_i - \bar{x})^2}\right) = \frac{\sigma^2\sum(x_i - \bar{x})}{n\sum(x_i - \bar{x})^2} = 0$$
so, using $Var(\hat\beta_1) = \dfrac{\sigma^2}{\sum(x_i - \bar{x})^2}$,
$$Var(\hat\beta_0) = \frac{\sigma^2}{n} + \frac{\bar{x}^2\sigma^2}{\sum(x_i - \bar{x})^2} = \sigma^2\left(\frac{\sum(x_i - \bar{x})^2 + n\bar{x}^2}{n\sum(x_i - \bar{x})^2}\right)$$
Expanding the numerator,
$$\sum(x_i - \bar{x})^2 + n\bar{x}^2 = \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2 + n\bar{x}^2 = \sum x_i^2 - 2n\bar{x}^2 + 2n\bar{x}^2 = \sum x_i^2$$
$$\therefore \mathbf{Var(\hat\beta_0) = \frac{\sigma^2\sum_{i=1}^n x_i^2}{n\sum_{i=1}^n (x_i - \bar{x})^2}}$$
Try (c), (d) and (e).
6.1.5.1 Confidence Intervals
A $100(1-\alpha)\%$ confidence interval for $\beta_1$ is
$$\hat\beta_1 \pm t_{\frac{\alpha}{2},\,n-2}\sqrt{\frac{MSE}{\sum_{i=1}^n(x_i - \bar{x})^2}}$$
From the sample variance, $S^2 = \dfrac{\sum_{i=1}^n(x_i - \bar{x})^2}{n-1}$, that is $\sum_{i=1}^n(x_i - \bar{x})^2 = (n-1)S^2$, so
$$\hat\beta_1 \pm t_{\frac{\alpha}{2},\,n-2}\sqrt{\frac{MSE}{(n-1)S^2}}$$
The sample variance $S^2$ is calculated using the calculator and $MSE = \dfrac{SS_{Res}}{n-2}$.
A $100(1-\alpha)\%$ confidence interval for $\beta_0$ is
$$\hat\beta_0 \pm t_{\frac{\alpha}{2},\,n-2}\left(\sqrt{MSE}\right)\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{(n-1)S^2}}$$
Example
The following table shows X and Y where X is the weight of fish in tones and Y is the
amount of fish being sold in Kwacha.
X 7.23 8.53 9.82 10.26 8.96 12.27 10.28 4.45 1.78 4 3.3 4.3 0.8 0.5
Y 190 160 134 129 172 197 167 239 542 372 245 376 454 410
Solutions
$n = 14$, $\sum y_i = 3787$, $\sum y_i^2 = 1257465$, $\sum x_i = 86.48$, $\sum x_i^2 = 732.4876$, $\sum x_iy_i = 17562.8$, $S^2 = 15.3$
Calculations
$$\bar{y} = \frac{3787}{14} = 270.5, \qquad \bar{x} = \frac{86.48}{14} = 6.177$$
$$\hat\beta_1 = \frac{\sum x_iy_i - \frac{\sum x_i\sum y_i}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}} = \frac{17562.8 - \frac{86.48 \times 3787}{14}}{732.4876 - \frac{(86.48)^2}{14}} = \mathbf{-29.402}$$
$$\hat\beta_0 = \bar{Y} - \hat\beta_1\bar{X} = 270.5 - (-29.402 \times 6.177) = 452.116$$
$$SST = \sum_{i=1}^n y_i^2 - \frac{(\sum_{i=1}^n y_i)^2}{n} = 1257465 - \frac{(3787)^2}{14} = \mathbf{233081.5}$$
$$SS_{Reg} = \frac{\left(\sum x_iy_i - \frac{(\sum x_i)(\sum y_i)}{n}\right)^2}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}} = \frac{\left(17562.8 - \frac{3787 \times 86.48}{14}\right)^2}{732.4876 - \frac{(86.48)^2}{14}} = \mathbf{171413.9}$$
$$\mathbf{SS_{Res}(Error)} = SST - SS_{Reg} = 233081.5 - 171413.9 = \mathbf{61667.6}$$
ANOVA Table

Source SS df MS F
Regression 171413.9 1 171413.9 33.4
Error or Residue 61667.6 12 5138.97
Total 233081.5 13

$$MSE = \frac{SS_{Res}}{n-2} = \frac{61667.6}{12} = \mathbf{5138.97}$$
95% confidence interval for $\beta_1$:
$$\hat\beta_1 \pm t_{\frac{\alpha}{2},\,n-2}\sqrt{\frac{MSE}{(n-1)S^2}} = -29.402 \pm t_{0.975,12}\sqrt{\frac{5138.97}{(14-1)15.3}}$$
$$= -29.402 \pm (2.1788)(5.083)$$
$$= -29.402 \pm 11.075$$
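The whole calculation can be reproduced from the summary statistics alone. A sketch in Python (assuming NumPy and SciPy; the standard error here is computed from $\sum(x_i-\bar{x})^2$ directly, which agrees with the module's $(n-1)S^2$ shortcut up to rounding):

```python
import numpy as np
from scipy import stats

# Summary statistics from the fish example
n, Sy, Syy = 14, 3787.0, 1257465.0
Sx, Sxx, Sxy = 86.48, 732.4876, 17562.8

Sxx_c = Sxx - Sx**2 / n                  # sum of squares about the mean
b1 = (Sxy - Sx*Sy/n) / Sxx_c             # slope
b0 = Sy/n - b1 * Sx/n                    # intercept
SST = Syy - Sy**2 / n
SSReg = (Sxy - Sx*Sy/n)**2 / Sxx_c
MSE = (SST - SSReg) / (n - 2)

t = stats.t.ppf(0.975, n - 2)            # t_{0.975, 12}
se_b1 = np.sqrt(MSE / Sxx_c)
print(b1, b0)                            # ~ -29.40, 452.1
print(b1 - t*se_b1, b1 + t*se_b1)        # 95% CI for beta_1
```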
6.1.5.2 Hypothesis Testing
Here the focus is on how to conduct a hypothesis test to determine whether there is a
significant linear relationship between an independent variable X and a dependent
variable Y. The test focuses on the slope of the regression line 𝑌 = 𝛽0 + 𝛽1 𝑋. Where
𝛽0 is a constant, 𝛽1 is the slope (regression coefficient), X is the value of the
independent variable and Y is the value of the dependent variable.
Hypothesis for $\beta_1$
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
Calculation for $\beta_1$
$$T = \frac{\hat\beta_1 - \beta_1}{\sqrt{\frac{MSE}{(n-1)S^2}}}$$
Hypothesis for $\beta_0$
$H_0: \beta_0 = 0$
$H_1: \beta_0 \neq 0$
Calculation for $\beta_0$
$$T = \frac{\hat\beta_0 - \beta_0}{\sqrt{MSE\left(\frac{1}{n} + \frac{\bar{X}^2}{(n-1)S^2}\right)}}$$
Decision Rule
Reject $H_0$ if $|T| > t_{\frac{\alpha}{2},\,n-2}$.
Example
A statistician in the United States of America wanted to find out the relationship
between skin cancer mortality and state latitude. The response variable Y is the
mortality rate (number of deaths per 10 million people) of white males due to malignant
skin melanoma from 1950-1959. The predictor variable X is the latitude (degrees
North) at the centre of each of 49 states of the United States. A subset of the data was
summarised and looks like this:
𝑛 = 49, ∑𝑛𝑖=1 𝑦𝑖 = 7491, ∑𝑛𝑖=1 𝑦𝑖2 = 1198843, ∑𝑛𝑖=1 𝑥𝑖 =1937.1 , ∑𝑛𝑖=1 𝑥𝑖2 = 77599.19,
∑𝑛𝑖=1 𝑥𝑖 𝑦𝑖 = 290039
Solutions
Calculations
$$\bar{y} = \frac{7491}{49} = \mathbf{152.88}, \qquad \bar{x} = \frac{1937.1}{49} = \mathbf{39.53}$$
$$\hat\beta_1 = \frac{\sum x_iy_i - \frac{\sum x_i\sum y_i}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}} = \frac{290039 - \frac{1937.1 \times 7491}{49}}{77599.19 - \frac{(1937.1)^2}{49}} = \mathbf{-5.98}$$
$$\hat\beta_0 = \bar{Y} - \hat\beta_1\bar{X} = 152.88 - (-5.98 \times 39.53) = \mathbf{389.27}$$
$$SST = \sum_{i=1}^n y_i^2 - \frac{(\sum_{i=1}^n y_i)^2}{n} = 1198843 - \frac{(7491)^2}{49} = \mathbf{53637.27}$$
$$SS_{Reg} = \frac{\left(\sum x_iy_i - \frac{(\sum x_i)(\sum y_i)}{n}\right)^2}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}} = \frac{\left(290039 - \frac{1937.1 \times 7491}{49}\right)^2}{77599.19 - \frac{(1937.1)^2}{49}} = \mathbf{36464.2}$$
$$\mathbf{SS_{Res}(Error)} = SST - SS_{Reg} = 53637.27 - 36464.2 = \mathbf{17173.07}$$
ANOVA Table

Source SS df MS F*
Regression 36464.2 1 36464.2 99.8
Error or Residue 17173.07 47 365.38
Total 53637.27 48

$$\hat\sigma^2 = \frac{1}{n-1}\left[\sum_{i=1}^n x_i^2 - \frac{(\sum_{i=1}^n x_i)^2}{n}\right] = \frac{1}{49-1}\left[77599.19 - \frac{(1937.1)^2}{49}\right] = \mathbf{21.26}$$
$$S^2 = \frac{n-1}{n}\hat\sigma^2 = \frac{49-1}{49} \times 21.26 = \mathbf{20.83}$$
95% confidence interval for $\beta_0$:
$$\hat\beta_0 \pm t_{\frac{\alpha}{2},\,n-2}\sqrt{MSE\left(\frac{1}{n} + \frac{\bar{X}^2}{(n-1)S^2}\right)} = 389.27 \pm t_{0.975,47}\sqrt{365.38\left(\frac{1}{49} + \frac{39.53^2}{(49-1)20.83}\right)}$$
$$= 389.27 \pm (2.01)(24.05)$$
$$= 389.27 \pm 48.3405$$
We are 95% confident that the population intercept is between 340.93 and 437.61.
That is, we can be confident that the mean mortality rate at a latitude of 0 degrees
north is between 340.93 and 437.61 deaths per 10 million people.
Calculation for $\beta_1$
$$T = \frac{\hat\beta_1 - \beta_1}{\sqrt{\frac{MSE}{(n-1)S^2}}} = \frac{-5.98 - 0}{\sqrt{\frac{365.38}{(49-1)20.83}}} = \frac{-5.98}{\sqrt{0.365}}$$
$$\mathbf{T = -9.9}$$
Decision Rule
Reject $H_0$ if $|T| > t_{0.975,47} = 2.01$.
Conclusion
Since $|-9.9| = 9.9 > t_{0.975,47} = 2.01$, we reject $H_0$ at the 5% level of significance and conclude that $\beta_1 \neq 0$ (there is a relationship between state latitude and skin cancer mortality).
(b) Hypothesis for $\beta_0$
$H_0: \beta_0 = 0$
$H_1: \beta_0 \neq 0$
Calculation for $\beta_0$
$$T = \frac{\hat\beta_0 - \beta_0}{\sqrt{MSE\left(\frac{1}{n} + \frac{\bar{X}^2}{(n-1)S^2}\right)}} = \frac{389.27 - 0}{\sqrt{365.38\left(\frac{1}{49} + \frac{39.53^2}{(49-1)20.83}\right)}} = \frac{389.27}{\sqrt{578.5}}$$
$$T = 16.2$$
Decision Rule
Reject $H_0$ if $|T| > t_{0.975,47} = 2.01$.
Conclusion
Since $|16.2| = 16.2 > t_{0.975,47} = 2.01$, we reject $H_0$ at the 5% level of significance and conclude that $\beta_0 \neq 0$ (the mean mortality rate at a latitude of 0 degrees is not 0).
The Coefficient of Determination is $R^2 = \dfrac{SS_{Reg}}{SST} \times 100\%$. From the above example,
$$R^2 = \frac{36464.2}{53637.27} \times 100 = \mathbf{68\%}$$
6.2 Multiple Regression Models
Multiple regression is an extension of simple linear regression in which more than one
independent variable (X) is used to predict a single dependent variable (Y). A multiple
regression equation expresses a linear relationship between a response variable y and two or
more predictor variables (𝑥1 , 𝑥2 , 𝑥3 , … , 𝑥𝑘 ).
Model
The general form of a multiple regression model is given below:
$$Y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \cdots + \beta_kx_k + \varepsilon$$
where $Y$ is the dependent variable, $x_1, \ldots, x_k$ are the independent variables, $\beta_0, \beta_1, \ldots, \beta_k$ are the regression coefficients and $\varepsilon$ is the random error.
The method of least squares may be used to estimate the regression coefficients in the multiple regression model. The objective is to find $\beta_0, \beta_1, \beta_2, \ldots, \beta_k$ by minimizing $L = \sum_{i=1}^n \varepsilon_i^2 = \sum_{i=1}^n (y_i - \hat{y}_i)^2$ with respect to $\beta_0, \beta_1, \ldots, \beta_k$. Let $x_{ij}$ denote the $i^{th}$ observation of variable $x_j$; then the multiple regression model becomes
$$y_i = \beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + \cdots + \beta_kx_{ik} + \varepsilon_i = \beta_0 + \sum_{j=1}^k \beta_jx_{ij} + \varepsilon_i$$
$$L = \sum_{i=1}^n\left(y_i - \beta_0 - \sum_{j=1}^k \beta_jx_{ij}\right)^2$$
Setting the partial derivatives to zero gives
$$\frac{\partial L}{\partial\beta_0} = -2\sum_{i=1}^n\left(y_i - \hat\beta_0 - \sum_{j=1}^k \hat\beta_jx_{ij}\right) = 0$$
$$\frac{\partial L}{\partial\beta_j} = -2\sum_{i=1}^n\left(y_i - \hat\beta_0 - \sum_{j=1}^k \hat\beta_jx_{ij}\right)x_{ij} = 0, \qquad j = 1, 2, \ldots, k$$
Simplifying further (for $k = 2$) we have the normal equations
$$n\hat\beta_0 + \hat\beta_1\sum_{i=1}^n x_{i1} + \hat\beta_2\sum_{i=1}^n x_{i2} = \sum_{i=1}^n y_i$$
$$\hat\beta_0\sum_{i=1}^n x_{i1} + \hat\beta_1\sum_{i=1}^n x_{i1}^2 + \hat\beta_2\sum_{i=1}^n x_{i1}x_{i2} = \sum_{i=1}^n x_{i1}y_i$$
$$\hat\beta_0\sum_{i=1}^n x_{i2} + \hat\beta_1\sum_{i=1}^n x_{i1}x_{i2} + \hat\beta_2\sum_{i=1}^n x_{i2}^2 = \sum_{i=1}^n x_{i2}y_i$$
Example 6.2.1.1
Use the following data to fit the model 𝛽0 + 𝛽1 𝑥1 + 𝛽2 𝑥2 .
$n = 25$, $\sum_{i=1}^{25} y_i = 725.82$, $\sum_{i=1}^{25} x_{i1} = 206$, $\sum_{i=1}^{25} x_{i2} = 8294$, $\sum_{i=1}^{25} x_{i1}x_{i2} = 77177$, $\sum_{i=1}^{25} x_{i1}^2 = 2396$, $\sum_{i=1}^{25} x_{i2}^2 = 3531848$, $\sum_{i=1}^{25} x_{i2}y_i = 274811.31$, $\sum_{i=1}^{25} x_{i1}y_i = 8008.37$.
Solution
25𝛽̂0 + 206𝛽̂1 + 8294𝛽̂2 = 725.82
206𝛽̂0 + 2396𝛽̂1 + 77177𝛽̂2 = 8008.37
8294𝛽̂0 + 77177𝛽̂1 + 3531848𝛽̂2 = 274811.31
The solution to the above set of equations is;
$$\hat\beta_0 = 2.266378726, \quad \hat\beta_1 = 2.744204729, \quad \hat\beta_2 = 0.012521625$$
Therefore, the fitted regression equation is
$$\mathbf{\hat{y} = 2.266378726 + 2.744204729x_1 + 0.012521625x_2}$$
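The same three normal equations can be solved directly as a linear system. A minimal NumPy sketch using the sums of this example:

```python
import numpy as np

# Normal equations of Example 6.2.1.1 written as M @ beta = v
M = np.array([[25.0,   206.0,   8294.0],
              [206.0,  2396.0,  77177.0],
              [8294.0, 77177.0, 3531848.0]])
v = np.array([725.82, 8008.37, 274811.31])

beta = np.linalg.solve(M, v)
print(beta)     # ~ [2.2664, 2.7442, 0.012522]
```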
In matrix notation the model is
$$Y = X\beta + \varepsilon$$
where $Y$ is the $n \times 1$ vector of responses, $X$ is the $n \times (k+1)$ design matrix (a column of ones followed by the regressor columns), $\beta$ is the $(k+1) \times 1$ vector of coefficients and $\varepsilon$ is the $n \times 1$ vector of errors.
$$L = (y - X\beta)^T(y - X\beta) = y^Ty - 2\beta^TX^Ty + \beta^TX^TX\beta$$
Differentiating $L$ with respect to $\beta$ and equating the derivative to 0 we have
$$\frac{dL}{d\beta} = -2X^Ty + 2X^TX\hat\beta = 0$$
$$X^TX\hat\beta = X^Ty$$
Multiplying both sides by $(X^TX)^{-1}$,
$$\therefore \mathbf{\hat\beta = (X^TX)^{-1}X^Ty}$$
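In code, the estimator is one line once the design matrix is built. A sketch (assuming NumPy; the small data set is made up for illustration), solving the normal equations rather than forming the inverse explicitly:

```python
import numpy as np

def ols_beta(X, y):
    # Least squares estimate beta-hat = (X^T X)^{-1} X^T y
    return np.linalg.solve(X.T @ X, X.T @ y)

# illustrative data: y roughly 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
X = np.column_stack([np.ones_like(x), x])   # column of ones + regressor
print(ols_beta(X, y))                       # ~ [0.15, 1.94]
```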
6.2.2 Theorem
Suppose that $X^T = \begin{pmatrix} 1 & 1 & 1 & \cdots & 1\\ x_1 & x_2 & x_3 & \cdots & x_n \end{pmatrix}$ and $y^T = (y_1 \; y_2 \; \cdots \; y_n)$. Then
(a) $\sigma^2(X^TX)^{-1} = \begin{pmatrix} Var(\hat\beta_0) & Cov(\hat\beta_0, \hat\beta_1)\\ Cov(\hat\beta_0, \hat\beta_1) & Var(\hat\beta_1) \end{pmatrix}$
(b) $(X^TX)^{-1}X^Ty = \hat\beta = \begin{pmatrix} \hat\beta_0\\ \hat\beta_1 \end{pmatrix}$
Proof
$$X^Ty = \begin{pmatrix} 1 & 1 & \cdots & 1\\ x_1 & x_2 & \cdots & x_n \end{pmatrix}\begin{pmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^n y_i\\ \sum_{i=1}^n x_iy_i \end{pmatrix}$$
$$X^TX = \begin{pmatrix} 1 & 1 & \cdots & 1\\ x_1 & x_2 & \cdots & x_n \end{pmatrix}\begin{pmatrix} 1 & x_1\\ 1 & x_2\\ \vdots & \vdots\\ 1 & x_n \end{pmatrix} = \begin{pmatrix} n & \sum_{i=1}^n x_i\\ \sum_{i=1}^n x_i & \sum_{i=1}^n x_i^2 \end{pmatrix}$$
$$|X^TX| = n\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2$$
$$(X^TX)^{-1} = \frac{1}{\det(X^TX)}\begin{pmatrix} \sum x_i^2 & -\sum x_i\\ -\sum x_i & n \end{pmatrix} = \frac{1}{n\sum x_i^2 - (\sum x_i)^2}\begin{pmatrix} \sum x_i^2 & -\sum x_i\\ -\sum x_i & n \end{pmatrix}$$
(a)
$$\sigma^2(X^TX)^{-1} = \begin{pmatrix} \dfrac{\sigma^2\sum x_i^2}{n\sum x_i^2 - (\sum x_i)^2} & \dfrac{-\sigma^2\sum x_i}{n\sum x_i^2 - (\sum x_i)^2}\\[2ex] \dfrac{-\sigma^2\sum x_i}{n\sum x_i^2 - (\sum x_i)^2} & \dfrac{n\sigma^2}{n\sum x_i^2 - (\sum x_i)^2} \end{pmatrix}$$
Dividing each numerator and denominator by $n$ and using $n\sum x_i^2 - (\sum x_i)^2 = n\sum(x_i - \bar{x})^2$,
$$\sigma^2(X^TX)^{-1} = \begin{pmatrix} \dfrac{\sigma^2\sum x_i^2}{n\sum(x_i - \bar{x})^2} & \dfrac{-\sigma^2\bar{x}}{\sum(x_i - \bar{x})^2}\\[2ex] \dfrac{-\sigma^2\bar{x}}{\sum(x_i - \bar{x})^2} & \dfrac{\sigma^2}{\sum(x_i - \bar{x})^2} \end{pmatrix}$$
which, by the results of Section 6.1, is
$$\therefore \sigma^2(X^TX)^{-1} = \begin{pmatrix} \mathbf{Var(\hat\beta_0)} & \mathbf{Cov(\hat\beta_0, \hat\beta_1)}\\ \mathbf{Cov(\hat\beta_0, \hat\beta_1)} & \mathbf{Var(\hat\beta_1)} \end{pmatrix}$$
(b) $(X^TX)^{-1}X^Ty = \hat\beta = \begin{pmatrix} \hat\beta_0\\ \hat\beta_1 \end{pmatrix}$
$$(X^TX)^{-1}X^Ty = \frac{1}{n\sum x_i^2 - (\sum x_i)^2}\begin{pmatrix} \sum x_i^2 & -\sum x_i\\ -\sum x_i & n \end{pmatrix}\begin{pmatrix} \sum y_i\\ \sum x_iy_i \end{pmatrix} = \begin{pmatrix} \dfrac{\sum y_i\sum x_i^2 - \sum x_i\sum x_iy_i}{n\sum x_i^2 - (\sum x_i)^2}\\[2ex] \dfrac{n\sum x_iy_i - \sum x_i\sum y_i}{n\sum x_i^2 - (\sum x_i)^2} \end{pmatrix}$$
For the second component, dividing the numerator and denominator by $n$ gives
$$\frac{\sum x_iy_i - \frac{\sum x_i\sum y_i}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}} = \hat\beta_1$$
For the first component, adding and subtracting $\frac{(\sum x_i)^2\sum y_i}{n}$ in the numerator,
$$\frac{\sum y_i\sum x_i^2 - \frac{(\sum x_i)^2\sum y_i}{n} + \frac{(\sum x_i)^2\sum y_i}{n} - \sum x_i\sum x_iy_i}{n\sum x_i^2 - (\sum x_i)^2} = \frac{\sum y_i\left(\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right) - \sum x_i\left(\sum x_iy_i - \frac{\sum x_i\sum y_i}{n}\right)}{n\left(\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right)}$$
$$= \bar{y} - \bar{x}\cdot\frac{\sum x_iy_i - \frac{\sum x_i\sum y_i}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}} = \bar{y} - \hat\beta_1\bar{x} = \hat\beta_0$$
$$\therefore (X^TX)^{-1}X^Ty = \hat\beta = \begin{pmatrix} \mathbf{\hat\beta_0}\\ \mathbf{\hat\beta_1} \end{pmatrix}$$
6.2.3 Theorem
If $Y = X\beta + \varepsilon$, $\varepsilon_i \sim N(0, \sigma^2)$, $r = Y - \hat{Y}$ and (Hat Matrix) $H = X(X^TX)^{-1}X^T$, then
(a) $\hat{Y} = HY$
(b) $H^T = H$
(c) $H^2 = H$
(d) $r = (I - H)Y$
(e) $\hat\beta \sim N(\beta, \sigma^2(X^TX)^{-1})$
(f) $\hat{Y} \sim N(X\beta, \sigma^2H)$
(g) $r \sim N(0, \sigma^2(I - H))$
(h) $E(r^Tr) = \sigma^2(n - p - 1)$
Proofs
(a) $\hat{Y} = HY$
Since $\hat\beta = (X^TX)^{-1}X^TY$ and $\hat{Y} = X\hat\beta$,
$$\hat{Y} = X(X^TX)^{-1}X^TY = HY$$
(b) $H^T = H$
$$H^T = \left(X(X^TX)^{-1}X^T\right)^T = X\left((X^TX)^{-1}\right)^TX^T = X\left((X^TX)^T\right)^{-1}X^T = X(X^TX)^{-1}X^T = H$$
(c) $H^2 = H$
$$H^2 = \left(X(X^TX)^{-1}X^T\right)\left(X(X^TX)^{-1}X^T\right) = X(X^TX)^{-1}(X^TX)(X^TX)^{-1}X^T = X(X^TX)^{-1}X^T = \mathbf{H}$$
(d) $r = (I - H)Y$
$$r = Y - \hat{Y} = Y - HY = (I - H)Y$$
Thus, $\mathbf{r = (I - H)Y}$.
(e) $\hat\beta \sim N(\beta, \sigma^2(X^TX)^{-1})$ where $\hat\beta = (X^TX)^{-1}X^TY$. Since $\hat\beta$ is a linear function of the normal vector $Y$, it is normally distributed with
Mean
$$E(\hat\beta) = (X^TX)^{-1}X^TE(Y) = (X^TX)^{-1}X^TE(X\beta + \varepsilon) = (X^TX)^{-1}X^TX\beta = \mathbf{\beta}$$
Variance
$$Var(\hat\beta) = (X^TX)^{-1}X^T\,Var(Y)\,\left((X^TX)^{-1}X^T\right)^T = (X^TX)^{-1}X^TX(X^TX)^{-1}\sigma^2 = \mathbf{\sigma^2(X^TX)^{-1}}$$
(f) $\hat{Y} \sim N(X\beta, \sigma^2H)$
Mean
$$E(\hat{Y}) = E(HY) = HE(Y) = X(X^TX)^{-1}X^TX\beta = \mathbf{X\beta}$$
Variance
$$Var(\hat{Y}) = Var(HY) = H\,Var(Y)\,H^T = \sigma^2HH^T = \sigma^2H^2 = \mathbf{\sigma^2H}$$
Thus, $\mathbf{\hat{Y} \sim N(X\beta, \sigma^2H)}$.
(h) $E(r^Tr) = \sigma^2(n - p - 1)$
$$E(r^Tr) = E\left[\left((I-H)Y\right)^T(I-H)Y\right] = E\left[Y^T(I-H)^T(I-H)Y\right] = E\left[Y^T(I-H)Y\right]$$
since $(I-H)$ is symmetric and idempotent. Using $E(Y^TAY) = tr(A\,Var(Y)) + E(Y)^TA\,E(Y)$ with $Var(Y) = \sigma^2I$, $E(Y) = X\beta$ and $(I-H)X = X - X(X^TX)^{-1}X^TX = 0$,
$$E(r^Tr) = \sigma^2\,tr(I - H) = \sigma^2\left(n - tr(H)\right)$$
and $tr(H) = tr\left(X(X^TX)^{-1}X^T\right) = tr\left((X^TX)^{-1}X^TX\right) = tr(I_{p+1}) = p + 1$, where $p$ is the number of regressors. Hence
$$E(r^Tr) = \sigma^2(n - p - 1)$$
Example 6.2.1
The average monthly electric power consumption (Y) at a certain manufacturing
plant is considered to be dependent on the average temperature (𝑥1 ) and the
number of working days in a month (𝑥2 ). Consider the one year monthly data given
in the table below;
𝑥1 20 26 41 55 60 67 75 79 70 55 45 33
𝑥2 23 21 24 25 24 26 25 25 24 25 25 23
𝑦 210 206 260 244 271 285 270 265 234 241 258 230
With a column of ones, $x_1$ and $x_2$ as the columns of $X$,
$$X^TX = \begin{pmatrix} 12 & 626 & 290\\ 626 & 36776 & 15336\\ 290 & 15336 & 7028 \end{pmatrix}, \qquad X^TY = \begin{pmatrix} 2974\\ 159011\\ 72166 \end{pmatrix}$$
Procedure to get the inverse of the matrix $X^TX$ using an fx-991MS calculator:
Mode-Mode-Mode-MAT. Then Shift-4-Dim(1) and choose 3 (for a $3 \times 3$ matrix), pressing 3 = and then 3 = again. Enter the entries of the matrix $X^TX$, pressing = after each entry. Then press Shift-4-MAT(3), choose C(3) for the $3 \times 3$ matrix, and press $x^{-1}$ = to display the inverse.
$$(X^TX)^{-1} = \begin{pmatrix} 51.16997994 & 0.105362232 & -2.341367299\\ 0.105362232 & 0.0005189824426 & -0.005480102741\\ -2.341367299 & -0.005480102741 & 0.108713627 \end{pmatrix}$$
$$(X^TX)^{-1}X^TY = \begin{pmatrix} 51.16997994 & 0.105362232 & -2.341367299\\ 0.105362232 & 0.0005189824426 & -0.005480102741\\ -2.341367299 & -0.005480102741 & 0.108713627 \end{pmatrix}\begin{pmatrix} 2974\\ 159011\\ 72166 \end{pmatrix} = \begin{pmatrix} -33.838285\\ 0.394100741\\ 10.8046419 \end{pmatrix}$$
$$\therefore \hat\beta = \begin{pmatrix} \hat\beta_0\\ \hat\beta_1\\ \hat\beta_2 \end{pmatrix} = \begin{pmatrix} -33.838285\\ 0.394100741\\ 10.8046419 \end{pmatrix}$$
Model
The fitted model is
$$\hat{y} = -33.838285 + 0.394100741x_1 + 10.8046419x_2$$
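As an alternative to the calculator procedure, the estimate can be obtained from the raw table with NumPy (a sketch; the data are exactly those of the example):

```python
import numpy as np

x1 = np.array([20, 26, 41, 55, 60, 67, 75, 79, 70, 55, 45, 33], dtype=float)
x2 = np.array([23, 21, 24, 25, 24, 26, 25, 25, 24, 25, 25, 23], dtype=float)
y  = np.array([210, 206, 260, 244, 271, 285, 270, 265, 234, 241, 258, 230],
              dtype=float)

X = np.column_stack([np.ones_like(x1), x1, x2])   # design matrix
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)     # ~ [-33.84, 0.394, 10.80]
```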
Example 6.2.2
Fit a polynomial 𝑌 = 𝛽0 + 𝛽1 𝑥 + 𝛽2 𝑥 2 + 𝜀 to the following data;
𝑋 1 2 3 4 5 6 7 8 9 10
𝑌 20.6 30.8 55 71.4 97.3 131.8 156.3 197.3 238.7 291.7
Solution
𝑥 1 2 3 4 5 6 7 8 9 10
𝑥2 1 4 9 16 25 36 49 64 81 100
𝑦 20.6 30.8 55 71.4 97.3 131.8 156.3 197.3 238.7 291.7
$$X^TY = \begin{pmatrix} 1 & 1 & \cdots & 1\\ 1 & 2 & \cdots & 10\\ 1 & 4 & \cdots & 100 \end{pmatrix}\begin{pmatrix} 20.6\\ 30.8\\ \vdots\\ 291.7 \end{pmatrix} = \begin{pmatrix} 1290.9\\ 9547.9\\ 77749.1 \end{pmatrix}$$
$$(X^TX)^{-1} = \begin{pmatrix} 1.383333333 & -0.525 & 0.041666666\\ -0.525 & 0.241287878 & -0.020833333\\ 0.041666666 & -0.020833333 & 0.001893939394 \end{pmatrix}$$
$$(X^TX)^{-1}X^TY = \begin{pmatrix} 1.383333333 & -0.525 & 0.041666666\\ -0.525 & 0.241287878 & -0.020833333\\ 0.041666666 & -0.020833333 & 0.001893939394 \end{pmatrix}\begin{pmatrix} 1290.9\\ 9547.9\\ 77749.1 \end{pmatrix} = \begin{pmatrix} 12.64328107\\ 6.29713961\\ 2.125002326 \end{pmatrix}$$
$$\hat\beta_0 = 12.64328107, \quad \hat\beta_1 = 6.29713961, \quad \hat\beta_2 = 2.125002326$$
Thus, the estimated quadratic equation is
$$\hat{y} = 12.64328107 + 6.29713961x + 2.125002326x^2$$
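A polynomial fit is just a multiple regression on $x$ and $x^2$, so the same machinery applies. A NumPy sketch on this data:

```python
import numpy as np

x = np.arange(1.0, 11.0)
y = np.array([20.6, 30.8, 55, 71.4, 97.3, 131.8, 156.3, 197.3, 238.7, 291.7])

X = np.column_stack([np.ones_like(x), x, x**2])   # quadratic design matrix
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)     # ~ [12.64, 6.30, 2.125]
```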
6.3 The F-Test of all Slopes
$$SST = Y^TY - \frac{(\sum_{i=1}^n y_i)^2}{n}$$
$$SS_{Reg} = \hat\beta^TX^TY - \frac{(\sum_{i=1}^n y_i)^2}{n}$$
$$SS_{Res} = SST - SS_{Reg}$$
Hypothesis
𝐻0 : 𝛽1 = 𝛽2 = ⋯ = 𝛽𝑝−1 = 0 (there is no relationship)
𝐻1 : 𝛽𝑖 ≠ 0, for at least one value of 𝑖 (there is a relationship)
ANOVA Table

Source SS df MS F*
Regression SSReg k MSReg = SSReg/k F* = MSReg/MSE
Error or Residue SSRes n − p MSE = SSRes/(n − p)
Total SST n − 1

Where $p$ is the number of both the independent and dependent variables ($p = k + 1$), $k$ is the number of independent variables and $n$ is the sample size.
Decision Rule
Reject $H_0$ if $F^* > f^{\alpha}_{(p-1),(n-k-1)}$
Example 6.3.1
The table below shows the pull strength of a wire bond in a semiconductor manufacturing process, together with the wire length and die height. (The full table of 25 observations is not reproduced here; its summary statistics are given in the solution.)
Test the hypothesis at $\alpha = 0.05$ that pull strength is linearly related to either wire length or die height, or to both.
Solutions
$n = 25$, $\sum x_{i1} = 206$, $\sum x_{i1}^2 = 2396$, $\sum x_{i1}x_{i2} = 77177$, $\sum x_{i2} = 8294$, $\sum x_{i2}^2 = 3531848$, $\sum y_i = 725.82$, $\sum y_i^2 = 27178.5316$
$$X^TX = \begin{pmatrix} 25 & 206 & 8294\\ 206 & 2396 & 77177\\ 8294 & 77177 & 3531848 \end{pmatrix}, \qquad X^TY = \begin{pmatrix} 725.82\\ 8008.47\\ 274816.71 \end{pmatrix}$$
$$(X^TX)^{-1}X^TY = \begin{pmatrix} 0.214652616 & -0.007490914218 & -0.0003403890868\\ -0.007490914218 & 0.001670763131 & -0.00001891781403\\ -0.0003403890868 & -0.00001891781403 & 0.000001495876159 \end{pmatrix}\begin{pmatrix} 725.82\\ 8008.47\\ 274816.71 \end{pmatrix}$$
$$\therefore \hat\beta = \begin{pmatrix} \mathbf{2.263791003}\\ \mathbf{2.744269642}\\ \mathbf{0.012526518} \end{pmatrix}$$
$$SST = Y^TY - \frac{(\sum y_i)^2}{n} = 27178.5316 - \frac{(725.82)^2}{25} = \mathbf{6105.945}$$
$$SS_{Reg} = \hat\beta^TX^TY - \frac{(\sum y_i)^2}{n} = (2.263791003 \;\; 2.744269642 \;\; 0.012526518)\begin{pmatrix} 725.82\\ 8008.47\\ 274816.71 \end{pmatrix} - \frac{(725.82)^2}{25}$$
$$= 27063.35769 - \frac{(725.82)^2}{25} = \mathbf{5990.771}$$
$$SS_{Res} = SST - SS_{Reg} = 6105.945 - 5990.771 = \mathbf{115.174}$$
ANOVA Table
Source 𝑆𝑆 𝑑𝑓 𝑀𝑆 𝐹∗
Regression 5990.771 2 2995.3856 𝟓𝟕𝟐. 𝟐
Error or
Residue 115.174 22 5.235
Total 6105.945 24
Hypothesis
$H_0: \beta_1 = \beta_2 = 0$ (pull strength is not linearly related to either wire length or die height, or to both).
$H_1: \beta_j \neq 0$ for at least one $j$ (pull strength is linearly related to at least one of the two).
Decision Rule
Reject $H_0$ if $F^* > f^{\alpha=0.05}_{2,22} = 3.4434$
Conclusion
Since $F^* = 572.2 > f^{\alpha=0.05}_{2,22} = 3.4434$, we reject $H_0$ at the 5% level of significance and conclude that pull strength is linearly related to either wire length or die height, or to both.
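The F statistic and its p-value follow directly from the sums of squares. A sketch using SciPy:

```python
from scipy import stats

SSReg, SSRes, k, n = 5990.771, 115.174, 2, 25
F = (SSReg / k) / (SSRes / (n - k - 1))
p = stats.f.sf(F, k, n - k - 1)      # upper-tail p-value
print(F, p)                          # F ~ 572.2, p far below 0.05
```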
6.4 Tests of Individual Regression Coefficients and Subsets of
Coefficients
We are frequently interested in testing hypotheses on the individual regression
coefficients. Such tests would be useful in determining the potential value of each of
the regressor variables in the regression model.
Hypothesis
$H_0: \beta_j = 0$
$H_1: \beta_j \neq 0$
Calculations
$$t = \frac{\hat\beta_j - \beta_j}{SE(\hat\beta_j)}, \qquad SE(\hat\beta_j) = \sqrt{MSE(C_{jj})}$$
$$t = \frac{\hat\beta_j - \beta_j}{\sqrt{MSE(C_{jj})}}$$
where $C_{jj}$ is the diagonal element of $(X^TX)^{-1}$ corresponding to $\hat\beta_j$.
Decision Rule
Reject $H_0$ if $|t| > t_{\frac{\alpha}{2},\,n-k-1}$
Example 6.4.1
Consider the wire bond pull strength data from example 6.3.1 and suppose that we
want to test the hypothesis that the regression coefficient for 𝑥2 (die height) is zero.
Solution
ANOVA Table
Source 𝑆𝑆 𝑑𝑓 𝑀𝑆 𝐹∗
Regression 5990.771 2 2995.3856 𝟓𝟕𝟐. 𝟐
Error or
Residue 115.174 22 5.235
Total 6105.945 24
2.263791003
𝛽̂ = (2.744269642)
0.012526518
Hypothesis
𝐻0 : 𝛽2 = 0 (The variable 𝑥2 (die height) does not contribute significantly to the model).
𝐻1 : 𝛽2 ≠ 0 (The variable 𝑥2 (die height) contributes significantly to the model).
Calculations
$$t = \frac{\hat\beta_2 - \beta_2}{SE(\hat\beta_2)} = \frac{\hat\beta_2 - \beta_2}{\sqrt{MSE(C_{22})}}$$
$C_{22} = 0.0000015$ (the diagonal element of $(X^TX)^{-1}$ corresponding to $\hat\beta_2$)
$$t = \frac{0.012526518 - 0}{\sqrt{0.0000015(5.235)}} = \mathbf{4.47}$$
Decision Rule
Reject 𝐻0 if |𝑡| > 𝑡0.975,22 = 2.07
Conclusion
Since $|t| = 4.47 > t_{0.975,22} = 2.07$, we reject $H_0$ at the 5% level of significance and conclude that the variable $x_2$ (die height) contributes significantly to the model.
Example 6.4.2
A collector of antique grandfather clocks knows that the price received for the clocks
increases linearly with the age of the clocks. Moreover, the collector hypothesizes
that the auction price of the clocks will increase linearly as the number of bidders
increases. A sample of 32 auction prices of grandfather clocks, along with their age
and the number of bidders, is summarised below;
$n = 32$, $\sum y_i = 42460$, $\sum y_i^2 = 61138902$, $\sum x_{i1} = 4638$, $\sum x_{i2} = 305$, $\sum x_{i1}x_{i2} = 43594$, $\sum x_{i1}^2 = 695486$, $\sum x_{i2}^2 = 3157$, $\sum x_{i1}y_i = 6397869$, $\sum x_{i2}y_i = 418386$.
Where 𝑥1 is the age, 𝑥2 is the number of bidders and 𝑦 is the auction price.
(a) Test the hypothesis that the mean auction price of a clock increases as the
number of bidders increases. Use 𝛼 = 0.05.
(b) Form a 90% confidence interval for 𝛽1 and interpret the result.
Solutions
$$X^TX = \begin{pmatrix} 32 & 4638 & 305\\ 4638 & 695486 & 43594\\ 305 & 43594 & 3157 \end{pmatrix}, \qquad X^TY = \begin{pmatrix} 42460\\ 6397869\\ 418386 \end{pmatrix}$$
$$(X^TX)^{-1}X^TY = \begin{pmatrix} 1.695446553 & -0.007730243607 & -0.057053835\\ -0.007730243607 & 0.00004593937769 & 0.0001124621695\\ -0.057053835 & 0.0001124621695 & 0.004275813752 \end{pmatrix}\begin{pmatrix} 42460\\ 6397869\\ 418386 \end{pmatrix}$$
$$\therefore \hat\beta = \begin{pmatrix} \hat\beta_0\\ \hat\beta_1\\ \hat\beta_2 \end{pmatrix} = \begin{pmatrix} \mathbf{-1338.951106}\\ \mathbf{12.7405741}\\ \mathbf{85.95300626} \end{pmatrix}$$
$$SST = Y^TY - \frac{(\sum y_i)^2}{n} = 61138902 - \frac{(42460)^2}{32} = \mathbf{4799789.5}$$
$$SS_{Reg} = \hat\beta^TX^TY - \frac{(\sum y_i)^2}{n} = (-1338.951106 \;\; 12.7405741 \;\; 85.95300626)\begin{pmatrix} 42460\\ 6397869\\ 418386 \end{pmatrix} - \frac{(42460)^2}{32}$$
$$= 60622194.59 - \frac{(42460)^2}{32} = \mathbf{4283082.09}$$
$$SS_{Res} = SST - SS_{Reg} = 4799789.5 - 4283082.09 = \mathbf{516707.41}$$
ANOVA Table
Source 𝑆𝑆 𝑑𝑓 𝑀𝑆 𝐹∗
Regression 4283082.09 2 2141541.045 𝟏𝟐𝟎. 𝟏𝟗
Error or
Residue 516707.41 29 17817.4969
Total 4799789.5 31
Hypothesis
(a) $H_0: \beta_2 = 0$ (the mean auction price of a clock does not increase as the number of bidders increases).
$H_1: \beta_2 > 0$ (the mean auction price of a clock increases as the number of bidders increases).
Decision Rule
Reject $H_0$ if $t > t_{0.95,29} = 1.6991$
Calculation
$$t = \frac{\hat\beta_2 - \beta_2}{\sqrt{MSE(C_{22})}} = \frac{85.95300626 - 0}{\sqrt{17817.4969(0.004275813752)}} = \frac{85.953}{8.728} = \mathbf{9.85}$$
Conclusion
Since $t = 9.85 > t_{0.95,29} = 1.6991$, we reject $H_0$ at the 5% level of significance and conclude that the mean auction price of a clock increases as the number of bidders increases.
(b) A 90% confidence interval for $\beta_1$:
$$\hat\beta_1 \pm t_{\frac{\alpha}{2},\,n-k-1}\sqrt{MSE(C_{11})}$$
$$12.7405741 \pm (1.6991)\sqrt{17817.4969(0.00004593937769)}$$
$$12.7405741 \pm 1.5372158$$
$$(\mathbf{11.2}, \mathbf{14.3})$$
Interpretation
Thus, we are 90% confident that $\beta_1$ falls between 11.2 and 14.3. Since $\beta_1$ is the slope of the line relating auction price ($y$) to age of the clock ($x_1$), we conclude that the price increases by between \$11.2 and \$14.3 for every 1-year increase in age, holding the number of bidders ($x_2$) constant.
6.4 Multicollinearity
Multicollinearity refers to a situation in which two or more explanatory (independent)
variables in a multiple regression model are highly linearly related. More commonly,
the issue of multicollinearity arises when there is an approximate linear relationship
among two or more independent variables. Multicollinearity can lead to wider
confidence intervals and less reliable probability values (P-values) for the independent
variables.
Causes of Multicollinearity
Multicollinearity occurs when the variables are highly correlated to each other.
Multicollinearity can result from the repetition of the same kind of variable.
It is caused by the inclusion of a variable which is computed from other
variables in the data set.
How to deal with Multicollinearity
The following are the solutions;
Remove some of the highly correlated independent variables.
Perform an analysis designed for highly correlated variables, such as principal components analysis.
How to Check for Multicollinearity
The effects of multicollinearity may be easily demonstrated. The diagonal elements of the matrix $C = (X^TX)^{-1}$ can be written as
$$C_{jj} = \frac{1}{(1 - R_j^2)}, \qquad j = 1, 2, \ldots, k$$
where $R_j^2$ is the coefficient of multiple determination resulting from regressing $x_j$ on the other $k - 1$ regressor variables. Clearly, the stronger the linear dependency of $x_j$ on the remaining regressor variables, and hence the stronger the multicollinearity, the larger the value of $R_j^2$ will be. Recall that $Var(\hat\beta_j) = \sigma^2C_{jj}$. Therefore, we say that the variance of $\hat\beta_j$ is "inflated" by the quantity $\frac{1}{(1 - R_j^2)}$. Consequently, we define the variance inflation factor as
$$VIF(\beta_j) = \frac{1}{(1 - R_j^2)}, \qquad j = 1, 2, \ldots, k$$
Example
Consider the wire bond pull strength data from example 6.3.1:
$n = 25$, $\sum x_{i1} = 206$, $\sum x_{i1}^2 = 2396$, $\sum x_{i1}x_{i2} = 77177$, $\sum x_{i2} = 8294$, $\sum x_{i2}^2 = 3531848$
Calculate
(a) $VIF(\beta_1)$
(b) $VIF(\beta_2)$
Solutions
To calculate the variance inflation factors for $\beta_1$ and $\beta_2$ we need $R_1^2$ and $R_2^2$, the coefficients of determination for the models in which $x_1$ is regressed on the remaining variable and $x_2$ is regressed on the remaining variable.
Regressing $x_1$ on $x_2$:
$$SST = 2396 - \frac{(206)^2}{25} = \mathbf{698.56}$$
$$SS_{Reg} = \frac{\left(\sum x_{i1}x_{i2} - \frac{(\sum x_{i1})(\sum x_{i2})}{n}\right)^2}{\sum x_{i2}^2 - \frac{(\sum x_{i2})^2}{n}} = \frac{\left(77177 - \frac{206 \times 8294}{25}\right)^2}{3531848 - \frac{(8294)^2}{25}} = \mathbf{100.0311115}$$
$$R_1^2 = \frac{SS_{Reg}}{SST} = \frac{100.0311115}{698.56} = \mathbf{0.143196162}$$
$$VIF(\beta_1) = \frac{1}{(1 - R_1^2)} = \frac{1}{(1 - 0.143196162)} = 1.167128293 = \mathbf{1.2}$$
Regressing $x_2$ on $x_1$:
$$SST = 3531848 - \frac{(8294)^2}{25} = \mathbf{780230.56}$$
$$SS_{Reg} = \frac{\left(77177 - \frac{206 \times 8294}{25}\right)^2}{2396 - \frac{(206)^2}{25}} = \mathbf{111726.0223}$$
$$R_2^2 = \frac{SS_{Reg}}{SST} = \frac{111726.0223}{780230.56} = \mathbf{0.143196162}$$
$$VIF(\beta_2) = \frac{1}{(1 - R_2^2)} = \frac{1}{(1 - 0.143196162)} = 1.167128293 = \mathbf{1.2}$$
Conclusion
Since both 𝑉𝐼𝐹1and 𝑉𝐼𝐹2 are small, there is no problem with multicollinearity.
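With two regressors, the VIF reduces to the squared correlation between them, which can be checked from the summary sums (a sketch, assuming NumPy):

```python
import numpy as np

n = 25
Sx1, Sx1sq = 206.0, 2396.0
Sx2, Sx2sq, Sx1x2 = 8294.0, 3531848.0, 77177.0

Sxx1 = Sx1sq - Sx1**2 / n            # corrected sums of squares
Sxx2 = Sx2sq - Sx2**2 / n
Sxy  = Sx1x2 - Sx1 * Sx2 / n
r2 = Sxy**2 / (Sxx1 * Sxx2)          # R_1^2 = R_2^2 when k = 2
print(r2, 1.0 / (1.0 - r2))          # ~ 0.1432, VIF ~ 1.17
```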
Example
Consider the wire bond pull strength data from example 6.3.1, construct a 95%
prediction interval on the wire bond pull strength when the wire length is 𝑥1 = 8 and
the die height is 𝑥2 = 275.
Solution
$$X_0^T = (1 \;\; 8 \;\; 275), \qquad \hat{y}_0 = X_0^T\hat\beta = (1 \;\; 8 \;\; 275)\begin{pmatrix} 2.263791003\\ 2.744269642\\ 0.012527811 \end{pmatrix}$$
$$\mathbf{\hat{y}_0 = 27.66}$$
$$(X^TX)^{-1} = \begin{pmatrix} 0.214652616 & -0.007490914218 & -0.0003403890868\\ -0.007490914218 & 0.001670763131 & -0.00001891781403\\ -0.0003403890868 & -0.00001891781403 & 0.000001495876159 \end{pmatrix}$$
$$X_0^T(X^TX)^{-1}X_0 = (1 \;\; 8 \;\; 275)\begin{pmatrix} 0.061118303\\ 0.0006727919718\\ -0.00008036565532 \end{pmatrix} = \mathbf{0.0444}$$
The 95% prediction interval is then
$$\hat{y}_0 \pm t_{0.975,22}\sqrt{MSE\left(1 + X_0^T(X^TX)^{-1}X_0\right)} = 27.66 \pm 2.07\sqrt{5.235(1 + 0.0444)}$$
$$= 27.66 \pm 4.84 = (\mathbf{22.82}, \mathbf{32.50})$$
Standardized Residuals
Standardized residuals are often more useful than the ordinary residuals when assessing residual magnitude and are given by
$$d_i = \frac{e_i}{\sqrt{MSE}}$$
Studentized residuals scale each residual by its own standard error:
$$r_i = \frac{e_i}{\sqrt{MSE(1 - h_{ii})}}$$
where $h_{ii} = X_i^T(X^TX)^{-1}X_i$ is the $i^{th}$ diagonal element of the hat matrix $H$.
Example
Consider the wire bond pull strength data from example 6.3.1
(a) Calculate the standardized residuals corresponding to 𝑒15 and 𝑒17 .
(b) Calculate studentized residuals corresponding to 𝑒15 and 𝑒17 .
Solutions
$$(X^TX)^{-1} = \begin{pmatrix} 0.214652616 & -0.007490914218 & -0.0003403890868\\ -0.007490914218 & 0.001670763131 & -0.00001891781403\\ -0.0003403890868 & -0.00001891781403 & 0.000001495876159 \end{pmatrix}$$
ANOVA Table

Source SS df MS F*
Regression 5990.771 2 2995.3856 572.2
Error or Residue 115.174 22 5.235
Total 6105.945 24

$$e_i = y_i - \hat{y}_i, \qquad \hat{y}_i = 2.263791003 + 2.744269642x_1 + 0.012527811x_2$$
For $e_{15}$: $y = 21.65$, $x_1 = 4$, $x_2 = 205$:
$$\hat{y}_{15} = 2.263791003 + 2.744269642(4) + 0.012527811(205) = 15.81$$
$$e_{15} = 21.65 - 15.81 = \mathbf{5.84}$$
For $e_{17}$: $y = 69$, $x_1 = 20$, $x_2 = 600$:
$$\hat{y}_{17} = 2.263791003 + 2.744269642(20) + 0.012527811(600) = 64.666$$
$$e_{17} = 69 - 64.666 = \mathbf{4.33}$$
(a) $d_i = \dfrac{e_i}{\sqrt{MSE}}$:
$$d_{15} = \frac{5.84}{\sqrt{5.235}} = \mathbf{2.55}, \qquad d_{17} = \frac{4.33}{\sqrt{5.235}} = \mathbf{1.89}$$
(b) $r_i = \dfrac{e_i}{\sqrt{MSE(1 - h_{ii})}}$:
With $X_{15}^T = (1 \;\; 4 \;\; 205)$,
$$h_{15} = (1 \;\; 4 \;\; 205)(X^TX)^{-1}\begin{pmatrix} 1\\ 4\\ 205 \end{pmatrix} = \mathbf{0.0737}$$
With $X_{17}^T = (1 \;\; 20 \;\; 600)$,
$$h_{17} = (1 \;\; 20 \;\; 600)(X^TX)^{-1}\begin{pmatrix} 1\\ 20\\ 600 \end{pmatrix} = \mathbf{0.2593}$$
$$r_{15} = \frac{5.84}{\sqrt{5.235(1 - 0.0737)}} = \mathbf{2.65}, \qquad r_{17} = \frac{4.33}{\sqrt{5.235(1 - 0.2593)}} = \mathbf{2.2}$$
Weighted Regression
When the error variances are unequal, weighted least squares minimises $\sum_{i=1}^n w_i(y_i - \beta_0 - \beta_1x_i)^2$, giving the weighted normal equations
$$\beta_0\sum_{i=1}^n w_i + \beta_1\sum_{i=1}^n w_ix_i = \sum_{i=1}^n w_iy_i$$
$$\beta_0\sum_{i=1}^n w_ix_i + \beta_1\sum_{i=1}^n w_ix_i^2 = \sum_{i=1}^n w_ix_iy_i$$
Each weight is inversely proportional to the error variance (here $w_i = \frac{1}{x_i}$). An observation with small error variance has a large weight and contains relatively more information than an observation with large error variance (small weight).
Tutorial Sheet 6
1. The following data represent the relationship between the number of alignment
errors and the number of missing rivets for 10 different aircrafts.
$$X^TY = \begin{pmatrix} 320\\ 36768\\ 9965 \end{pmatrix}, \qquad SST = \mathbf{7420}$$
9965
(a) Use the results above to compute 𝛽̂ and fit the model 𝑌 = 𝛽0 + 𝛽1 𝑥1 + 𝛽2 𝑥2 + 𝜀
(b) Write the analysis of variance table.
(c) Construct a 95% confidence interval for each regression coefficient in the model.
(d) Compute t statistics to test simple hypothesis for regression coefficient is equal to
0. State the hypothesis for 𝛽1 and 𝛽2 in words and write the conclusion for each
regression coefficient.
𝑥1 1.6 15.5 22 43 33 40
𝑥2 851 816 1058 1201 1357 1115
𝑦 293 230 172 91 113 125
$n = 10$, $\sum y_i = 1916$, $\sum x_{i1} = 223$, $\sum x_{i2} = 553$, $\sum x_{i1}x_{i2} = 12352$, $\sum x_{i1}^2 = 5200.9$, $\sum x_{i2}^2 = 31729$, $\sum x_{i2}y_i = 104736.8$, $\sum x_{i1}y_i = 43550.8$, $\sum y_i^2 = 371595.6$
Number of Bars of Soap Coupons Redeemed Proportion on Sale
31 22 0.87
24 15 0.79
17 9 0.41
14 3 0.36
13 4 0.38
26 15 0.69
10 8 0.1
33 24 0.76
17 10 0.47
28 21 0.82
33 27 0.85
42 31 0.95
11 1 0.09
18 9 0.44
12 4 0.17
(a) Which variable is the dependent variable and which ones are the independent
variables?
(b) Construct a multiple regression model.
(c) Interpret the coefficients of the model.
(d) Write the ANOVA table
(e) Calculate the standardized residuals corresponding to 𝑒13
(f) Calculate studentized residuals corresponding to 𝑒13 .
6. The following data represent travel times in a downtown area of a certain city. The
independent, or input, variable is the distance to be travelled.
Distance (X): 0.5 1 1.5 2 3 4 5 6 8 10
Time (Y): 15 15.1 16.5 19.9 27.7 29.7 26.7 35.9 42 49.4
Calculate the weighted least square and fit the weighted least squares regression
equation.
Unit 7
Analysis of Variance and
Covariance
7.1 Analysis of Variance
Analysis of Variance (ANOVA) is a collection of statistical models and their associated estimation procedures (such as the variation among and between groups) used to test differences between two or more means. ANOVA helps, when conducting research, to decide whether or not to reject the null hypothesis. Experiments involve collecting and analysing data from comparative investigations. There are two types of comparative investigation, namely observational studies and experimental studies. An observational study involves drawing conclusions about the population based on a sample where the independent variable is not under the control of the researcher, for example because of ethical concerns. An experimental study involves the random assignment of participants to different groups.
7.1.2 Terminologies
Experimental Unit: This is the physical entity which can be assigned at
random to a treatment.
Confounding: Inability of the investigator to distinguish between two
explanatory variables (𝑥1 𝑎𝑛𝑑 𝑥2 ) as the cause of the response 𝑦. In other
words, Confounding occurs when the experimental controls do not allow the
experimenter to reasonably eliminate plausible alternative explanations for an
observed relationship between independent and dependent variables.
Interaction: Refers to a situation where the relationship between one
explanatory variable and the response variable is affected by the value of
another explanatory variable. In other words, an interaction may arise when
considering the relationship among three or more variables, and describes a
situation in which the simultaneous influence of two variables on a third is not
additive. Most commonly, interactions are considered in the context of
regression analyses.
Blocking: Involves forming groups of units in which one or more explanatory variables are held fixed while different treatments are applied to the units within the group. Blocking:
Prevents confounding due to those explanatory variables held fixed within each block
Improves the precision of the conclusions.
Replication: Involves applying each treatment to more than one unit in the sample. Replication helps:
Reduce sampling error
Estimate the precision of our conclusions
Randomization: Is a process of assigning the treatments to the sample units
at random. Randomization helps to reduce the risk of confounding by unknown
explanatory variables.
7.1.3 Assumptions
The observations within each treatment group are independent and normally distributed, with a common variance $\sigma^2$ across groups.
7.1.4 Contrast Estimation
A contrast of the treatment means is a linear combination
$$C = \sum_{i=1}^k c_i\mu_i \quad\text{where}\quad \sum_{i=1}^k c_i = 0, \qquad\text{e.g. } \mu_1 - \mu_2 \;\text{ or }\; \frac{\mu_1 + \mu_2}{2} - \mu_3$$
We estimate $C = \sum_{i=1}^k c_i\mu_i$ by $\hat{C} = \sum_{i=1}^k c_i\bar{y}_{i.}$. Now,
$$Var(\hat{C}) = Var\left(\sum_{i=1}^k c_i\bar{y}_{i.}\right) = \sum_{i=1}^k c_i^2\frac{\sigma^2}{n} = \frac{\sigma^2}{n}\sum_{i=1}^k c_i^2 \approx \frac{MSE}{n}\sum_{i=1}^k c_i^2$$
A $100(1-\alpha)\%$ confidence interval for $C$ is
$$\sum_{i=1}^k c_i\bar{y}_{i.} \pm t_{nk-k,\,\alpha/2}\sqrt{\frac{MSE}{n}\sum_{i=1}^k c_i^2}$$
Example
An investigation was conducted to compare five brands of AAA alkaline batteries. The
response variable was the watt-hours delivered by the battery under the test
conditions. Ten batteries of each brand (A, B, C, D, and E) were purchased from a
single store and tested in a random order. The data is shown below:
A B C D E Total
1.37 1.24 1.27 1.19 1.28 6.35
1.35 1.25 1.27 1.22 1.24 6.33
1.34 1.18 1.23 1.14 1.21 6.10
1.37 1.18 1.25 1.21 1.26 6.27
1.33 1.19 1.25 1.18 1.23 6.18
1.39 1.23 1.29 1.23 1.25 6.39
1.31 1.27 1.27 1.17 1.26 6.28
1.34 1.22 1.21 1.18 1.27 6.22
1.32 1.20 1.22 1.19 1.27 6.20
1.34 1.20 1.27 1.23 1.26 6.30
Total 13.46 12.16 12.53 11.94 12.53 62.62
(a) Write down a model for the data and estimate all the parameters in your model
(b) Are there any significant differences among battery brands?
(c) Brands A and B are relatively high priced. Is there any evidence that these two
brands have more average power?
Solutions
Model
From the information above, the data are from a completely randomised design (balanced one-way analysis of variance).
(a) $y_{ij} = \mu + \tau_i + \varepsilon_{ij}$, where $i = 1,2,3,4,5$, $j = 1,2,3,\ldots,10$ and $\varepsilon_{ij} \sim N(0, \sigma^2)$.
$y_{ij}$ is the $j$th observation from the $i$th battery brand and $\tau_i$ is the $i$th battery brand effect.
The treatment totals and means are
$$y_{1.} = 13.46,\; \bar{y}_{1.} = 1.346; \quad y_{2.} = 12.16,\; \bar{y}_{2.} = 1.216; \quad y_{3.} = 12.53,\; \bar{y}_{3.} = 1.253; \quad y_{4.} = 11.94,\; \bar{y}_{4.} = 1.194; \quad y_{5.} = 12.53,\; \bar{y}_{5.} = 1.253$$
(b)
$$SS_{Trt} = \sum_{i=1}^k \frac{y_{i.}^2}{n} - \frac{y_{..}^2}{nk} = \frac{13.46^2 + 12.16^2 + 12.53^2 + 11.94^2 + 12.53^2}{10} - \frac{62.62^2}{50} = 78.56026 - 78.425288 = \mathbf{0.134972}$$
$$SST = \sum_{i=1}^k\sum_{j=1}^n y_{ij}^2 - \frac{y_{..}^2}{nk} = 1.37^2 + 1.35^2 + \cdots + 1.26^2 - \frac{62.62^2}{50} = \mathbf{0.166512}$$
$$SSE = SST - SS_{Trt} = 0.166512 - 0.134972 = 0.03154$$
ANOVA Table

Source SS df MS F*
Treatment 0.134972 4 0.033743 48.14
Error 0.03154 45 0.000701
Total 0.166512 49

At the 5% level of significance, $2.53 < F(4, 45) < 2.61$ and $F^* = 48.14 > F(4, 45)$, so we reject $H_0$ and conclude that there are significant differences among the battery brands.
(c) In this case, the contrast is given by
$$C = \frac{\mu_1 + \mu_2}{2} - \frac{\mu_3 + \mu_4 + \mu_5}{3}$$
$$\hat{C} = \sum_{i=1}^k c_i\bar{y}_{i.} = \frac{1}{2}(1.346 + 1.216) - \frac{1}{3}(1.253 + 1.194 + 1.253) = 1.281 - 1.2333 = 0.048$$
$$Var(\hat{C}) = \frac{MSE}{n}\sum_{i=1}^k c_i^2 = \frac{0.0007}{10}\left[\left(\frac{1}{2}\right)^2 + \left(\frac{1}{2}\right)^2 + \left(-\frac{1}{3}\right)^2 + \left(-\frac{1}{3}\right)^2 + \left(-\frac{1}{3}\right)^2\right] = 0.00005833$$
$$H_0: C = 0, \qquad H_1: C > 0$$
$$T = \frac{\hat{C} - 0}{Se(\hat{C})} = \frac{0.048 - 0}{0.007637626} = \mathbf{6.2847}$$
Since $T = 6.2847 > t_{0.95,45} \approx 1.68$, we reject $H_0$ and conclude that the high-priced brands A and B deliver more average power.
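The one-way ANOVA can be verified directly from the reconstructed brand columns (a sketch using SciPy):

```python
from scipy import stats

A = [1.37, 1.35, 1.34, 1.37, 1.33, 1.39, 1.31, 1.34, 1.32, 1.34]
B = [1.24, 1.25, 1.18, 1.18, 1.19, 1.23, 1.27, 1.22, 1.20, 1.20]
C = [1.27, 1.27, 1.23, 1.25, 1.25, 1.29, 1.27, 1.21, 1.22, 1.27]
D = [1.19, 1.22, 1.14, 1.21, 1.18, 1.23, 1.17, 1.18, 1.19, 1.23]
E = [1.28, 1.24, 1.21, 1.26, 1.23, 1.25, 1.26, 1.27, 1.27, 1.26]

F, p = stats.f_oneway(A, B, C, D, E)
print(F, p)      # F ~ 48, p << 0.05: the brands differ
```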
7.1.5 Least Significant Difference (LSD) Method
For a completely randomised design the least significant difference for comparing treatments $i$ and $j$ is
$$LSD = t_{\frac{\alpha}{2},\,k(n-1)}\sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$
which, for equal sample sizes $n_i = n_j = n$, becomes
$$LSD = t_{\frac{\alpha}{2},\,k(n-1)}\sqrt{\frac{2MSE}{n}}$$
For a randomised block design the error degrees of freedom change and
$$LSD = t_{\frac{\alpha}{2},\,(k-1)(n-1)}\sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$
Example 2
An Engineer is interested in determining if the cotton weight in fibre affects the tensile
strength. She runs an experiment with five levels of cotton weight percentage and gets
the following results:
(i) Is the tensile strength the same for all the five levels of cotton weight? Use
𝛼 = 0.05
(ii) Use the LSD method to make comparisons between pairs of means. Use
𝛼 = 0.05
Solutions
Weight of Cotton (%) Tensile Strength Total
15 7 7 15 11 9 49
20 12 17 12 18 18 77
25 14 18 18 19 19 88
30 19 25 22 19 23 108
35 7 10 11 15 11 54
Total 376
$$SST = \sum_{i=1}^k\sum_{j=1}^n y_{ij}^2 - \frac{y_{..}^2}{nk} = 7^2 + 12^2 + \cdots + 11^2 - \frac{376^2}{25} = 6292 - \frac{376^2}{25} = \mathbf{636.96}$$
$$SS_{Trt} = \sum_{i=1}^k \frac{y_{i.}^2}{n} - \frac{y_{..}^2}{nk} = \frac{49^2 + 77^2 + 88^2 + 108^2 + 54^2}{5} - \frac{376^2}{25} = \mathbf{475.76}$$
ANOVA table

Source of Variation SS df MS F*
Treatment 475.76 4 118.94 14.7568
Error 161.2 20 8.06
Total 636.96 24
Conclusion
Since $F^* = 14.7568 > f^{0.05}_{4,20} = 2.87$, we reject $H_0$ at the 5% level of significance and conclude that the tensile strength is not the same for all five levels of cotton weight.
(ii)
$$LSD = t_{\frac{\alpha}{2},\,k(n-1)}\sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)} = t_{0.975,20}\sqrt{\frac{2(8.06)}{5}} = 2.086(1.795550055) = 3.745517415 = \mathbf{3.75}$$
𝑦̅1. = 9.8, 𝑦̅2. = 15.4, 𝑦̅3. = 17.6, 𝑦̅4. = 21.6 𝑎𝑛𝑑 𝑦̅5. = 10.8
Ordering means:
|𝑦̅5. − 𝑦̅1. | = 1 < 𝐿𝑆𝐷 𝑇ℎ𝑒 𝑑𝑖𝑓𝑓𝑒𝑟𝑒𝑛𝑐𝑒 𝑏𝑒𝑡𝑤𝑒𝑒𝑛 𝜇1 𝑎𝑛𝑑 𝜇5 𝑖𝑠 𝑛𝑜𝑡 𝑠𝑖𝑔𝑛𝑖𝑓𝑖𝑐𝑎𝑛𝑡
|𝑦̅2. − 𝑦̅5. | = 4.6 > 𝐿𝑆𝐷 𝑇ℎ𝑒 𝑑𝑖𝑓𝑓𝑒𝑟𝑒𝑛𝑐𝑒 𝑏𝑒𝑡𝑤𝑒𝑒𝑛 𝜇2 𝑎𝑛𝑑 𝜇5 𝑖𝑠 𝑠𝑖𝑔𝑛𝑖𝑓𝑖𝑐𝑎𝑛𝑡
|𝑦̅3. − 𝑦̅2. | = 2.2 < 𝐿𝑆𝐷 𝑇ℎ𝑒 𝑑𝑖𝑓𝑓𝑒𝑟𝑒𝑛𝑐𝑒 𝑏𝑒𝑡𝑤𝑒𝑒𝑛 𝜇2 𝑎𝑛𝑑 𝜇3 𝑖𝑠 𝑛𝑜𝑡 𝑠𝑖𝑔𝑛𝑖𝑓𝑖𝑐𝑎𝑛𝑡
Conclusion
The tensile strength of fibre at 15% cotton weight is not significantly different from
the tensile strength of fibre at 35% cotton weight.
The tensile strength of fibre at 20% and 25% cotton weight are not significantly
different
The tensile strength of fibre at 35% cotton weight is significantly different from the
tensile strength of fibre at 20% cotton weight.
The tensile strength of fibre at 30% cotton weight is significantly different from the
rest.
7.1.6 The Latin Square Design (Three Factor Analysis)
Column
Year 𝐹1 𝐹2 𝐹3 𝐹4
1 𝐴 𝐵 𝐶 𝐷
Row 2 𝐶 𝐴 𝐷 𝐵
3 𝐵 𝐷 𝐴 𝐶
4 𝐷 𝐶 𝐵 𝐴
Model
𝑦𝑖𝑗𝑘 = 𝜇 + 𝛼𝑖 + 𝛽𝑗 + 𝜏𝑘 + 𝜀𝑖𝑗𝑘
Where
$i = 1, 2, 3, \ldots, r$ indexes the rows, $j = 1, 2, 3, \ldots, r$ the columns and $k = 1, 2, 3, \ldots, r$ the treatments; $\alpha_i$ is the $i$th row effect, $\beta_j$ is the $j$th column effect and $\tau_k$ is the $k$th treatment effect.
$\varepsilon_{ijk}$ is the error made in the $i$th row and the $j$th column under the $k$th treatment, with $\varepsilon_{ijk} \sim N(0, \sigma^2)$, and
$$\sum_{i=1}^r \alpha_i = 0, \qquad \sum_{j=1}^r \beta_j = 0, \qquad \sum_{k=1}^r \tau_k = 0$$
Calculations
$$SST = \sum_{i=1}^r\sum_{j=1}^r\sum_{k=1}^r y_{ijk}^2 - \frac{(y_{...})^2}{r^2}$$
$$SSR = \sum_{i=1}^r \frac{y_{i..}^2}{r} - \frac{(y_{...})^2}{r^2}, \qquad SSC = \sum_{j=1}^r \frac{y_{.j.}^2}{r} - \frac{(y_{...})^2}{r^2}, \qquad SS_{Trt} = \sum_{k=1}^r \frac{y_{..k}^2}{r} - \frac{(y_{...})^2}{r^2}$$
$$SSE = SST - SSR - SSC - SS_{Trt}$$
ANOVA Table
𝑆𝑜𝑢𝑟𝑐𝑒 𝑆𝑆 𝑑𝑓 𝑀𝑆 𝐹∗
Rows 𝑆𝑆𝑅 𝑟−1 𝑀𝑆𝑅 𝐹1
Columns 𝑆𝑆𝐶 𝑟−1 𝑀𝑆𝐶 𝐹2
Treatment 𝑆𝑆𝑇𝑟𝑡 𝑟−1 𝑀𝑆𝑇𝑟𝑡 𝐹3
Error 𝑆𝑆𝐸 (𝑟 − 1)(𝑟 − 2) 𝑀𝑆𝐸
Total 𝑆𝑆𝑇 𝑟2 − 1
Decision Rule
𝛼
Reject 𝐻0 if 𝐹𝑖∗ > 𝑓(𝑟−1),((𝑟−1)(𝑟−2))
Hypothesis
𝐻0 : 𝜏1 = 𝜏2 = 𝜏3 = ⋯ = 𝜏𝑘 = 0
𝐻1 : 𝜏𝑖 ≠ 𝜏𝑗 for some 𝑖 ≠ 𝑗
Example
Course
Time Period Algebra Geometry Statistics Calculus Total
1 𝐴 84 𝐵 79 𝐶 63 𝐷 97 323
2 𝐵 91 𝐶 82 𝐷 80 𝐴 93 346
3 𝐶 59 𝐷 70 𝐴 77 𝐵 80 286
4 𝐷 75 𝐴 91 𝐵 75 𝐶 68 309
Total 309 322 295 338 𝟏𝟐𝟔𝟒
(a) Test the hypothesis that different professors have no effect on the grades.
(b) Use LSD method to compare one grade by the 4 professors.
(c) Is it true that professor 𝐶 gives different grades compared to other professors?
Solutions
(a) Hypothesis
$H_0: \tau_1 = \tau_2 = \tau_3 = \tau_4 = 0$ (the professors have no effect on the grades)
$H_1: \tau_k \neq 0$ for some $k$
Calculations
$$A = y_{..1} = 84 + 91 + 77 + 93 = \mathbf{345}$$
$$B = y_{..2} = 91 + 79 + 75 + 80 = \mathbf{325}$$
$$C = y_{..3} = 59 + 82 + 63 + 68 = \mathbf{272}$$
$$D = y_{..4} = 75 + 70 + 80 + 97 = \mathbf{322}$$
$$SST = \sum_i\sum_j\sum_k y_{ijk}^2 - \frac{(y_{...})^2}{r^2} = 84^2 + 91^2 + \cdots + 68^2 - \frac{(1264)^2}{4^2} = \mathbf{1738}$$
$$SSR = \sum_{i=1}^r \frac{y_{i..}^2}{r} - \frac{(y_{...})^2}{r^2} = \frac{323^2 + 346^2 + 286^2 + 309^2}{4} - \frac{(1264)^2}{16} = \mathbf{474.5}$$
$$SSC = \frac{309^2 + 322^2 + 295^2 + 338^2}{4} - \frac{(1264)^2}{16} = \mathbf{252.5}$$
$$SS_{Trt} = \sum_{k=1}^r \frac{y_{..k}^2}{r} - \frac{(y_{...})^2}{r^2} = \frac{345^2 + 325^2 + 272^2 + 322^2}{4} - \frac{(1264)^2}{16} = \mathbf{723.5}$$
$$SSE = SST - SSR - SSC - SS_{Trt} = 1738 - 474.5 - 252.5 - 723.5 = \mathbf{287.5}$$
ANOVA Table

Source SS df MS F*
Rows 474.5 3 158.16667 F1 = 3.301
Columns 252.5 3 84.16667 F2 = 1.757
Treatment 723.5 3 241.16667 F3 = 5.033
Error 287.5 6 47.91667
Total 1738 15
Decision Rule
𝛼=0.05
Reject 𝐻0 if 𝐹3∗ > 𝑓3,6 = 4.75
Conclusion
𝛼=0.05
Since 𝐹3∗ = 5.033 > 𝑓3,6 = 4.75, we reject 𝐻0 at 5% level of significance and
conclude that different professors have effect on the grades.
(b) Using the Least Significant Difference (LSD) Method
$$LSD = t_{0.975,6}\sqrt{\frac{2MSE}{n}} = 2.447\sqrt{\frac{2(47.91667)}{4}} = \mathbf{11.98}$$
The treatment means are $\bar{y}_A = 86.25$, $\bar{y}_B = 81.25$, $\bar{y}_C = 68$ and $\bar{y}_D = 80.5$. Comparing the means:
$|\bar{y}_A - \bar{y}_C| = 18.25 > LSD$: the difference between professors A and C is significant
$|\bar{y}_B - \bar{y}_C| = 13.25 > LSD$: the difference between professors B and C is significant
$|\bar{y}_D - \bar{y}_C| = 12.5 > LSD$: the difference between professors D and C is significant
$|\bar{y}_A - \bar{y}_B| = 5 < LSD$, $|\bar{y}_A - \bar{y}_D| = 5.75 < LSD$, $|\bar{y}_B - \bar{y}_D| = 0.75 < LSD$: professors A, B and D do not differ significantly
Conclusion
Professor C's grades differ significantly from those of the other three professors.
(c) Using the Contrast Method
$$\hat{C} = \frac{\bar{y}_A + \bar{y}_B + \bar{y}_D}{3} - \bar{y}_C = \frac{86.25 + 81.25 + 80.5}{3} - 68 = \mathbf{14.667}$$
$$Var(\hat{C}) = \frac{MSE}{n}\sum_{i=1}^k c_i^2 = \frac{47.91667}{4}\left[\left(\frac{1}{3}\right)^2 + \left(\frac{1}{3}\right)^2 + \left(\frac{1}{3}\right)^2 + (-1)^2\right] = 15.972$$
$$\hat{C} \pm t_{0.975,6}\sqrt{Var(\hat{C})} = 14.667 \pm 2.447\sqrt{15.972} = 14.667 \pm 9.779 = (\mathbf{4.89}, \mathbf{24.45})$$
Conclusion
Since the interval does not contain 0, professor C gives significantly different (lower) grades compared to the other professors.
7.2 Analysis of Covariance
Assumptions
For each independent variable, the relationship between the dependent variable ($y$) and the covariate ($x$) is linear.
The covariate is independent of the treatment effects (i.e. the covariate and the independent variables are independent).
Model
$$y_{ij} = \mu + \tau_i + \beta(x_{ij} - \bar{x}_{..}) + \varepsilon_{ij}$$
where $i = 1, 2, 3, \ldots, k$ and $j = 1, 2, 3, \ldots, n$; $\tau_i$ is the $i$th treatment effect, $\beta$ is the regression coefficient of $y$ on the covariate $x$, and $\varepsilon_{ij} \sim N(0, \sigma^2)$.
Decision Rule
𝛼
Reject 𝐻0 if 𝐹 ∗ > 𝑓(𝑘−1),(𝑁−𝑘−1)
Hypothesis
𝐻0 : 𝜏1 = 𝜏2 = 𝜏3 = ⋯ = 𝜏𝑘 = 0
𝐻1 : 𝜏𝑖 ≠ 𝜏𝑗 for some 𝑖 ≠ 𝑗
$$SST_Y = \sum_{i=1}^K\sum_{j=1}^n y_{ij}^2 - \frac{y_{..}^2}{N}, \qquad SSTrt_Y = \sum_{i=1}^k \frac{y_{i.}^2}{n} - \frac{y_{..}^2}{N}$$
$$SST_X = \sum_{i=1}^K\sum_{j=1}^n x_{ij}^2 - \frac{x_{..}^2}{N}, \qquad SSTrt_X = \sum_{i=1}^k \frac{x_{i.}^2}{n} - \frac{x_{..}^2}{N}$$
$$SST_{XY} = \sum_{i=1}^K\sum_{j=1}^{n_i} x_{ij}y_{ij} - \frac{(x_{..})(y_{..})}{N}, \qquad SSTrt_{XY} = \sum_{i=1}^k \frac{(x_{i.})(y_{i.})}{n_i} - \frac{(x_{..})(y_{..})}{N}$$
Summary
Source 𝑋 𝑌 𝑋𝑌
treatment 𝑆𝑆𝑇𝑟𝑡𝑋 𝑆𝑆𝑇𝑟𝑡𝑌 𝑆𝑆𝑇𝑟𝑡𝑋𝑌
Error 𝑆𝑆𝐸𝑋 𝑆𝑆𝐸𝑌 𝑆𝑆𝐸𝑋𝑌
Total 𝑆𝑆𝑇𝑋 𝑆𝑆𝑇𝑌 𝑆𝑆𝑇𝑋𝑌
$$SST_{(adj)} = SST_Y - \frac{(SST_{XY})^2}{SST_X}$$
$$SSE_{(adj)} = SSE_Y - \frac{(SSE_{XY})^2}{SSE_X}$$
$$SSTrt_{(adj)} = SST_{(adj)} - SSE_{(adj)}$$
Or
$$SSTrt_{(adj)} = SST_Y - SSE_Y + \frac{(SSE_{XY})^2}{SSE_X} - \frac{(SST_{XY})^2}{SST_X}$$
ANCOVA Table
Source 𝑆𝑆 𝑑𝑓 𝑀𝑆 𝐹∗
Treatment 𝑆𝑆𝑇𝑟𝑡(𝑎𝑑𝑗) 𝑘−1 𝑀𝑆𝑇𝑟𝑡(𝑎𝑑𝑗) 𝑀𝑆𝑇𝑟𝑡(𝑎𝑑𝑗)
𝑀𝑆𝐸(𝑎𝑑𝑗)
Error 𝑆𝑆𝐸(𝑎𝑑𝑗) 𝑁−𝑘−1 𝑀𝑆𝐸(𝑎𝑑𝑗)
Total 𝑆𝑆𝑇(𝑎𝑑𝑗) 𝑁−2
Example
An engineer is studying the effect of the cutting speed on the rate of metal removal in a machining operation. However, the rate of metal removal is also related to the hardness of the test specimen. Five observations are taken at each cutting speed. The amount of metal removed ($y$) and the hardness of the specimen ($x$) are shown in the table below;
Cutting Speed
1000 1200 1400
𝑦 𝑥 𝑦 𝑥 𝑦 𝑥
68 120 112 165 118 175
90 140 94 140 82 132
98 150 65 120 73 124
77 125 74 125 92 141
88 136 85 133 80 130
Total 𝟒𝟐𝟏 𝟔𝟕𝟏 𝟒𝟑𝟎 𝟔𝟖𝟑 𝟒𝟒𝟓 𝟕𝟎𝟐
(a) Obtain the sum of squares and products for single factor analysis of covariance.
(b) Compute the adjusted totals, standard errors and adjusted treatment means.
(c) Analyse the data and test the hypothesis that the rate of metal removal is also
related to the hardness of the test specimen. Take 𝛼 = 5%.
(d) Use the Analysis of Variance approach to test the hypothesis that the rate of metal
removal is also related to the hardness of the test specimen. Take 𝛼 = 5%.
(e) Remove the effect of the hardness (the covariate).
Solutions
(a)
$$SST_Y = \sum_{i=1}^K\sum_{j=1}^n y_{ij}^2 - \frac{y_{..}^2}{N} = 68^2 + 90^2 + \cdots + 80^2 - \frac{(421 + 430 + 445)^2}{15} = \mathbf{3173.6}$$
$$SSTrt_Y = \sum_{i=1}^k \frac{y_{i.}^2}{n} - \frac{y_{..}^2}{N} = \frac{421^2 + 430^2 + 445^2}{5} - \frac{(1296)^2}{15} = \mathbf{58.8}$$
$$SST_X = \sum_{i=1}^K\sum_{j=1}^n x_{ij}^2 - \frac{x_{..}^2}{N} = 120^2 + 140^2 + \cdots + 130^2 - \frac{(2056)^2}{15} = \mathbf{3556.93}$$
$$SSTrt_X = \sum_{i=1}^k \frac{x_{i.}^2}{n} - \frac{x_{..}^2}{N} = \frac{671^2 + 683^2 + 702^2}{5} - \frac{(2056)^2}{15} = \mathbf{97.73}$$
$$SST_{XY} = \sum_{i=1}^K\sum_{j=1}^{n_i} x_{ij}y_{ij} - \frac{(x_{..})(y_{..})}{N} = (68)(120) + (90)(140) + \cdots + (80)(130) - \frac{(2056)(1296)}{15} = \mathbf{3307.6}$$
$$SSTrt_{XY} = \sum_{i=1}^k \frac{(x_{i.})(y_{i.})}{n_i} - \frac{(x_{..})(y_{..})}{N} = \frac{(671)(421) + (683)(430) + (702)(445)}{5} - \frac{(2056)(1296)}{15} = \mathbf{75.8}$$
The error rows follow by subtraction, e.g. $SSE_X = SST_X - SSTrt_X = 3459.2$.
Summary
Source 𝑋 𝑌 𝑋𝑌
Treatment 97.73 58.8 75.8
Error 3459.2 3114.8 3231.8
Total 3556.93 3173.6 3307.6
(b)
$$SST_{(adj)} = SST_Y - \frac{(SST_{XY})^2}{SST_X} = 3173.6 - \frac{(3307.6)^2}{3556.93} = \mathbf{97.85}$$
$$SSE_{(adj)} = SSE_Y - \frac{(SSE_{XY})^2}{SSE_X} = 3114.8 - \frac{(3231.8)^2}{3459.2} = \mathbf{95.45}$$
Or
$$SSTrt_{(adj)} = SST_Y - SSE_Y + \frac{(SSE_{XY})^2}{SSE_X} - \frac{(SST_{XY})^2}{SST_X} = 3173.6 - 3114.8 + \frac{(3231.8)^2}{3459.2} - \frac{(3307.6)^2}{3556.93} = \mathbf{2.4}$$
(c)
ANCOVA Table

Source SS df MS F*
Treatment 2.4 2 1.2 0.138
Error 95.45 11 8.677
Total 97.85 13

Hypothesis
$H_0: \tau_1 = \tau_2 = \tau_3 = 0$ (cutting speed has no effect on the rate of metal removal, after adjusting for the hardness of the specimen)
$H_1: \tau_i \neq \tau_j$ for some $i \neq j$
Decision Rule
Reject $H_0$ if $F^* > f^{\alpha=0.05}_{2,11} = 3.98$
Conclusion
Since $F^* = 0.138 < f^{\alpha=0.05}_{2,11} = 3.98$, we fail to reject $H_0$ at the 5% level of significance: once the hardness of the specimen is accounted for, the cutting speed has no significant effect on the rate of metal removal. In other words, the variation in the rate of metal removal is explained by the hardness of the test specimen.
(d) Using the Analysis of Variance approach (unadjusted sums of squares for $y$):

Source SS df MS F*
Treatment 58.8 2 29.4 0.113
Error 3114.8 12 259.6
Total 3173.6 14

Conclusion
Since $F^* = 0.113 < f^{\alpha=0.05}_{2,12} = 3.88$, we fail to reject $H_0$ at the 5% level of significance and reach the same conclusion: cutting speed has no significant effect on the rate of metal removal.
(e) The estimated regression coefficient of $y$ on the covariate is
$$\hat\beta = \frac{SSE_{XY}}{SSE_X} = \frac{3231.8}{3459.2} = \mathbf{0.9343}$$
The effect of the hardness is removed by replacing each observation with the adjusted value $y_{ij} - \hat\beta(x_{ij} - \bar{x}_{..})$.
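The adjusted sums of squares and the ANCOVA F statistic follow mechanically from the six totals in the summary table. A sketch in Python:

```python
# ANCOVA sums from the machining example
SSTy, SSTx, SSTxy = 3173.6, 3556.93, 3307.6
SSEy, SSEx, SSExy = 3114.8, 3459.2, 3231.8

SST_adj = SSTy - SSTxy**2 / SSTx
SSE_adj = SSEy - SSExy**2 / SSEx
SSTrt_adj = SST_adj - SSE_adj
F = (SSTrt_adj / 2) / (SSE_adj / 11)      # df: k-1 = 2, N-k-1 = 11
beta = SSExy / SSEx                       # slope on the covariate
print(SST_adj, SSE_adj, SSTrt_adj)        # ~ 97.85, 95.45, 2.40
print(F, beta)                            # ~ 0.138, 0.9343
```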