
MUKUBA UNIVERSITY

MATHEMATICS DEPARTMENT

MATHEMATICAL AND APPLIED


STATISTICS
MAT 350

MODULE 2-2018

$$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_{nn} \end{pmatrix} \qquad\qquad Var(\hat{\beta}_0) = \frac{\sigma^2 \sum_{i=1}^{n} x_i^2}{n \sum_{i=1}^{n}(X_i - \bar{X})^2}$$

Content
5. Random Vectors and Random Matrices
 Definition of Vectors and Matrices
 Properties of Matrices
 Types of Special Matrices
 Identity Matrix
 Idempotent Matrix
 Orthogonal Matrix
 Derivative of a Quadratic Form
 Expectation
 Variance-Covariance Matrix
 Properties of Covariance
 Multivariate Normal Distribution
6. Regression Models
 Simple Linear Regression Models
 Least Squares Estimation of 𝛽0 and 𝛽1.
 Properties of 𝛽0 and 𝛽1
 Confidence Intervals and Hypothesis Testing
 Multiple Regression Models
 Least Squares Estimation
 The F-Test of all Slopes
 Prediction
 The General Linear Hypothesis
 Residual Analysis
 Weighted Regression
 The General Least Squares Model
 Multicollinearity
7. Analysis of Variance and Covariance
 Analysis of Variance
 Assumptions of Analysis of Variance
 Completely Randomised Design
 Randomised Block Design

 The Latin Square Design
 Contrast Estimation
 Least Significant Difference Method (LSD)
 Analysis of Covariance
 Model
 The ANCOVA Table
 Removal of the Effect of the Thickness

Unit 5
Random Vectors and Random Matrices
When dealing with multiple random variables it is often useful to use vector and
matrix notation. This makes the formulas more compact and lets us use facts from
linear algebra. A random vector or random matrix is a vector or matrix whose elements
are random variables. A random variable is a function defined for each element of a
sample space. In this chapter the focus is on vectors and matrices. For example, how
to write n random variables (𝑥1 , 𝑥2 , 𝑥3 , … , 𝑥𝑛 ) in column vector form.

5.1.1 Definition (Rectangular Array of Elements)


A matrix $A_{n\times m} = (a_{ij})$ is said to be a rectangular array of elements if A has
n rows and m columns, and can be written as

$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1m} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2m} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3m} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nm} \end{pmatrix}$$

5.1.2 Definition (Transpose of a Matrix)

If $A = (a_{ij})$ is an $n \times m$ matrix, the transpose of A, denoted by $A^T$, is the $m \times n$ matrix $A^T = (a_{ji})$.

5.2 Properties of Matrices


Let A, B and C be matrices of conformable dimensions (and invertible where inverses appear) and let λ be any scalar, then
 𝐴 + 𝐵 = 𝐵 + 𝐴 (Commutative Law)
 𝐴 + (𝐵 + 𝐶) = (𝐴 + 𝐵) + 𝐶 (Associative Law of Addition)
 𝐴(𝐵𝐶) = (𝐴𝐵)𝐶 (Associative Law of Multiplication)
 𝜆(𝐴 + 𝐵) = 𝜆𝐴 + 𝜆𝐵
 (𝐴𝑇 )𝑇 = 𝐴

 (𝐴 + 𝐵)𝑇 = 𝐴𝑇 + 𝐵 𝑇
 (𝐴𝐵)𝑇 = 𝐵 𝑇 𝐴𝑇
 (𝐴𝐵)−1 = 𝐵 −1 𝐴−1
 (𝐴𝐵𝐶)−1 = 𝐶 −1 𝐵 −1 𝐴−1
 (𝐴−1 )𝑇 = (𝐴𝑇 )−1

5.3 Types of Special Matrices


5.3.1 Identity Matrix
Let A be any 𝑛 × 𝑛 matrix. The 𝑛 × 𝑛 matrix I is said to be an identity matrix if and
only if 𝐴𝐼 = 𝐼𝐴 = 𝐴. That is

$$I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad I_n = \begin{pmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{pmatrix}$$

5.3.2 Determinant of a Matrix


If A is an 𝑛 × 𝑛 (square) matrix, the determinant is a value that can be computed from the
elements of the matrix. The determinant of a matrix A is denoted by det(𝐴) or |𝐴|. For
example, for a 2 × 2 matrix,

$$|A| = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc$$

5.3.3 Inverse of a Matrix


If A is an 𝑛 × 𝑛 matrix and there exists a matrix C such that 𝐴𝐶 = 𝐶𝐴 = 𝐼 (so that A is
not a singular matrix), then the matrix C is called the inverse of A, denoted by 𝐴−1.

5.3.4 Idempotent Matrix

An 𝑛 × 𝑛 matrix P is said to be an idempotent matrix if 𝑃² = 𝑃 (and hence 𝑃ⁿ = 𝑃 for every positive integer n).

Example: let $A = \begin{pmatrix} \tfrac{1}{2} & -\tfrac{1}{2} \\ -\tfrac{1}{2} & \tfrac{1}{2} \end{pmatrix}$, then $A^2 = \begin{pmatrix} \tfrac{1}{2} & -\tfrac{1}{2} \\ -\tfrac{1}{2} & \tfrac{1}{2} \end{pmatrix}\begin{pmatrix} \tfrac{1}{2} & -\tfrac{1}{2} \\ -\tfrac{1}{2} & \tfrac{1}{2} \end{pmatrix} = \begin{pmatrix} \tfrac{1}{2} & -\tfrac{1}{2} \\ -\tfrac{1}{2} & \tfrac{1}{2} \end{pmatrix} = A$.

5.3.5 Orthogonal Matrix

A matrix P is said to be orthogonal if 𝑃𝑃ᵀ = 𝑃ᵀ𝑃 = 𝐼, or equivalently 𝑃⁻¹ = 𝑃ᵀ.

Example

Let $P = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$, then $PP^T = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$
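The two examples above can be checked numerically. Below is a minimal sketch in Python, assuming the numpy library is available (the angle theta is an arbitrary illustrative value).

import numpy as np

# Idempotent example from 5.3.4: A^2 should equal A
A = np.array([[0.5, -0.5],
              [-0.5, 0.5]])
print(np.allclose(A @ A, A))                 # True

# Orthogonal example from 5.3.5: P P^T should be the identity and P^-1 = P^T
theta = 0.7
P = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.allclose(P @ P.T, np.eye(2)))       # True
print(np.allclose(np.linalg.inv(P), P.T))    # True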

5.3.6 Symmetric Matrix

A symmetric matrix is a square matrix that is equal to its transpose. That is

𝐴 = 𝐴𝑇 (𝑎𝑖𝑗 = 𝑎𝑗𝑖 )

5.4.1 Definition (Column Vector and Row Vector)

A column vector or column matrix is an 𝑚 × 1 matrix, that is, a matrix consisting of a
single column of m elements,

$$X = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{pmatrix}$$

Similarly, a row vector or row matrix is a 1 × 𝑚 matrix, that is, a matrix consisting of a
single row of m elements,

$$X = \begin{pmatrix} x_1 & x_2 & \cdots & x_m \end{pmatrix}$$

Note
Let $X^T = (x_1 \; x_2 \; \cdots \; x_n)$ and $Y^T = (y_1 \; y_2 \; \cdots \; y_n)$ be two n-dimensional vectors, then

$$X^T Y = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n \quad \text{(a scalar)}$$

$$X + Y = \begin{pmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{pmatrix}, \qquad \lambda X^T = (\lambda x_1 \;\; \lambda x_2 \;\; \cdots \;\; \lambda x_n)$$

5.4.2 Rank of a Matrix

The rank of a matrix is the maximum number of linearly independent column vectors
in the matrix. To find the rank of a matrix we simply transform the matrix to its row
echelon form and count the number of non-zero rows.

5.4.3 Nullity of a Matrix

The nullity of a matrix A is the number of columns in the row echelon form of A that do
not contain a leading entry. Equivalently, 𝑁𝑢𝑙𝑙𝑖𝑡𝑦 𝑜𝑓 𝐴 = 𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝐶𝑜𝑙𝑢𝑚𝑛𝑠 − 𝑅𝑎𝑛𝑘 𝑜𝑓 𝐴.
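As an illustration of rank and nullity, the following Python sketch (assuming numpy is available; the matrix A is a made-up example) counts the linearly independent columns and applies the relation Nullity = Number of Columns − Rank.

import numpy as np

# Hypothetical 3x4 matrix used only for illustration
A = np.array([[1., 2., 3., 4.],
              [2., 4., 6., 8.],    # a multiple of row 1, so it adds no rank
              [0., 1., 1., 0.]])

rank = np.linalg.matrix_rank(A)
nullity = A.shape[1] - rank        # nullity = number of columns - rank
print(rank, nullity)               # 2 2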

5.4.4 Trace of a Matrix


The trace of an 𝑛 × 𝑛 square matrix A, denoted by 𝑇𝑟(𝐴), is defined to be the sum of
the elements on the main diagonal. A useful property is 𝑇𝑟(𝐴𝐵) = 𝑇𝑟(𝐵𝐴).
Example

Find the trace of the matrix $A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}$

Solution
𝑇𝑟(𝐴) = 𝑎11 + 𝑎22 + 𝑎33

5.5 Matrix Algebra


5.5.1 Quadratic Form
A quadratic form in n variables $X^T = (x_1 \; x_2 \; \cdots \; x_n)$ is a function of the form

$$Q(X) = X^T A X = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij} x_i x_j$$

where the $a_{ij}$ are constants. $A$ is called the matrix of the quadratic form.

Example
If 𝑄(𝑋) = 3𝑥₁² + 8𝑥₁𝑥₂ + 5𝑥₂², find A such that 𝑄(𝑋) = 𝑋ᵀ𝐴𝑋.

Solution

$X = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ and $X^T = (x_1 \;\; x_2)$

$A = \begin{pmatrix} a & \tfrac{1}{2}b \\ \tfrac{1}{2}b & c \end{pmatrix}$, where 𝑎 is the coefficient of 𝑥₁², namely 3, 𝑐 is the coefficient of 𝑥₂², namely 5, and 𝑏 is the coefficient of 𝑥₁𝑥₂, namely 8.

$$A = \begin{pmatrix} 3 & \tfrac{1}{2}\times 8 \\ \tfrac{1}{2}\times 8 & 5 \end{pmatrix} = \begin{pmatrix} 3 & 4 \\ 4 & 5 \end{pmatrix}$$

$$Q(X) = X^T A X = (x_1 \;\; x_2)\begin{pmatrix} 3 & 4 \\ 4 & 5 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$
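The matrix found above can be verified numerically: for any test point, 𝑋ᵀ𝐴𝑋 should reproduce Q(X). A small Python sketch, assuming numpy is available (the test point is arbitrary):

import numpy as np

# Matrix of the quadratic form Q(x) = 3x1^2 + 8x1x2 + 5x2^2 (example above)
A = np.array([[3., 4.],
              [4., 5.]])

x = np.array([2., -1.])                     # arbitrary test point
q_matrix = x @ A @ x                        # X^T A X
q_direct = 3*x[0]**2 + 8*x[0]*x[1] + 5*x[1]**2
print(q_matrix, q_direct)                   # both give 1.0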

5.5.2 Definition (Derivative of a Quadratic Form)


Let 𝑓(𝑥) be a function of 𝑟 independent variables given by
$X^T = (x_1 \; x_2 \; \cdots \; x_r)$, then the derivative of 𝑓(𝑥) with respect to the vector 𝑋 is

$$\frac{df(x)}{dX} = \begin{pmatrix} \dfrac{\partial f(x)}{\partial x_1} \\ \dfrac{\partial f(x)}{\partial x_2} \\ \vdots \\ \dfrac{\partial f(x)}{\partial x_r} \end{pmatrix}$$

Example
Consider the function 𝑓(𝑥) = 6𝑥₁² + 𝑥₂² − 2𝑥₁𝑥₂ + 2𝑥₃². Find $\dfrac{df(x)}{dX}$.

Solution

$$\frac{df(x)}{dX} = \begin{pmatrix} \dfrac{\partial}{\partial x_1}(6x_1^2 + x_2^2 - 2x_1x_2 + 2x_3^2) \\ \dfrac{\partial}{\partial x_2}(6x_1^2 + x_2^2 - 2x_1x_2 + 2x_3^2) \\ \dfrac{\partial}{\partial x_3}(6x_1^2 + x_2^2 - 2x_1x_2 + 2x_3^2) \end{pmatrix} = \begin{pmatrix} 12x_1 - 2x_2 \\ 2x_2 - 2x_1 \\ 4x_3 \end{pmatrix}$$

5.5.3 Theorem
Let 𝑄(𝑋) = 𝑋ᵀ𝐴𝑋 where 𝐴 is a symmetric matrix of constants, then $\dfrac{dQ(X)}{dX} = 2AX$.

Proof

$X^T A X = \sum_{i=1}^{r}\sum_{j=1}^{r} a_{ij} X_i X_j$. Let 𝑟 = 3, then

$$X^T A X = \sum_{i=1}^{3}\sum_{j=1}^{3} a_{ij} X_i X_j = \sum_{i=1}^{3}(a_{i1}X_iX_1 + a_{i2}X_iX_2 + a_{i3}X_iX_3)$$

$$X^T A X = a_{11}X_1^2 + a_{12}X_1X_2 + a_{13}X_1X_3 + a_{21}X_2X_1 + a_{22}X_2^2 + a_{23}X_2X_3 + a_{31}X_3X_1 + a_{32}X_3X_2 + a_{33}X_3^2$$

Differentiating with respect to $X_1$ we have

$$\frac{\partial(X^T A X)}{\partial X_1} = 2a_{11}X_1 + a_{12}X_2 + a_{13}X_3 + a_{21}X_2 + a_{31}X_3$$

Since A is symmetric, $a_{21} = a_{12}$ and $a_{31} = a_{13}$, so

$$\frac{\partial(X^T A X)}{\partial X_1} = 2a_{11}X_1 + 2a_{12}X_2 + 2a_{13}X_3 = 2(a_{11}X_1 + a_{12}X_2 + a_{13}X_3)$$

which is twice the first row of 𝐴𝑋. The same argument applied to $X_2$ and $X_3$ gives the second and third rows, and therefore

$$\therefore \frac{dQ(X)}{dX} = \frac{d(X^T A X)}{dX} = 2AX$$

5.5.3 Example
Consider the function 𝑓(𝑥) = 6𝑥12 + 𝑥22 − 2𝑥1 𝑥2 + 2𝑥32 .
(a) Write 𝑓(𝑥) in quadratic form
𝑑
(b) Find 𝑑𝑋 (𝑓(𝑥)) and express the answer in form of 2𝐴𝑋.

Solutions

(a) 𝑓(𝑥) = 𝑋ᵀ𝐴𝑋

where $X^T = (x_1 \;\; x_2 \;\; x_3)$, $X = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}$ and

$$A = \begin{pmatrix} a & \tfrac{1}{2}b & \tfrac{1}{2}d \\ \tfrac{1}{2}b & c & \tfrac{1}{2}e \\ \tfrac{1}{2}d & \tfrac{1}{2}e & f \end{pmatrix}$$

where 𝑎 = coefficient of 𝑥₁² = 6, 𝑏 = coefficient of 𝑥₁𝑥₂ = −2, 𝑐 = coefficient of 𝑥₂² = 1, 𝑑 = coefficient of 𝑥₁𝑥₃ = 0, 𝑒 = coefficient of 𝑥₂𝑥₃ = 0, 𝑓 = coefficient of 𝑥₃² = 2.

$$A = \begin{pmatrix} 6 & \tfrac{1}{2}\times(-2) & \tfrac{1}{2}\times 0 \\ \tfrac{1}{2}\times(-2) & 1 & \tfrac{1}{2}\times 0 \\ \tfrac{1}{2}\times 0 & \tfrac{1}{2}\times 0 & 2 \end{pmatrix} = \begin{pmatrix} 6 & -1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}$$

$$\therefore f(x) = X^T A X = (x_1 \;\; x_2 \;\; x_3)\begin{pmatrix} 6 & -1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}$$

(b) $\dfrac{dQ(X)}{dX} = \dfrac{d(X^T A X)}{dX}$

$$\frac{dQ(X)}{dX} = \begin{pmatrix} \dfrac{\partial}{\partial x_1}(6x_1^2 + x_2^2 - 2x_1x_2 + 2x_3^2) \\ \dfrac{\partial}{\partial x_2}(6x_1^2 + x_2^2 - 2x_1x_2 + 2x_3^2) \\ \dfrac{\partial}{\partial x_3}(6x_1^2 + x_2^2 - 2x_1x_2 + 2x_3^2) \end{pmatrix} = \begin{pmatrix} 12x_1 - 2x_2 \\ 2x_2 - 2x_1 \\ 4x_3 \end{pmatrix} = \begin{pmatrix} 12 & -2 & 0 \\ -2 & 2 & 0 \\ 0 & 0 & 4 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}$$

Factorising out 2 in the matrix above we have

$$\frac{dQ(X)}{dX} = 2\begin{pmatrix} 6 & -1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = 2AX$$
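A quick numerical check of part (b), assuming numpy is available (the test point is arbitrary): the gradient built from the partial derivatives should equal 2𝐴𝑋.

import numpy as np

# Matrix of the quadratic form f(x) = 6x1^2 + x2^2 - 2x1x2 + 2x3^2 (example above)
A = np.array([[6., -1., 0.],
              [-1., 1., 0.],
              [0., 0., 2.]])

x = np.array([1., 2., 3.])                  # arbitrary test point
grad_formula = 2 * A @ x                    # theorem 5.5.3: dQ/dX = 2AX
grad_direct = np.array([12*x[0] - 2*x[1],
                        2*x[1] - 2*x[0],
                        4*x[2]])
print(np.allclose(grad_formula, grad_direct))   # True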

5.5.4 Example
Consider the matrix A given by $A = \begin{pmatrix} -2 & 2 \\ 2 & 2 \end{pmatrix}$. Find the corresponding quadratic form.

Solution

$$f(x) = X^T A X = (x_1 \;\; x_2)\begin{pmatrix} -2 & 2 \\ 2 & 2 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = -2x_1^2 + 4x_1x_2 + 2x_2^2$$

5.6 Expectation

Let 𝑌 be an n-dimensional random vector and 𝑋 be an 𝑛 × 𝑛 random matrix. The expectation is taken element by element:

$$E(Y) = \begin{pmatrix} E(y_1) \\ E(y_2) \\ \vdots \\ E(y_n) \end{pmatrix}, \qquad E(X) = \begin{pmatrix} E(x_{11}) & E(x_{12}) & \cdots & E(x_{1n}) \\ E(x_{21}) & E(x_{22}) & \cdots & E(x_{2n}) \\ \vdots & \vdots & \ddots & \vdots \\ E(x_{n1}) & E(x_{n2}) & \cdots & E(x_{nn}) \end{pmatrix}$$

5.6.1 Property
Let X and Y be a random matrix of the same dimension and let A and B be conformable
matrices of constants then,

 𝐸(𝑋 + 𝑌) = 𝐸(𝑋) + 𝐸(𝑌)


 𝐸(𝐴𝑋𝐵) = 𝐴𝐸(𝑋)𝐵

5.6.2 Variance-Covariance Matrix

If 𝑦1 and 𝑦2 are random variables with means 𝜇1 and 𝜇2 , then

 𝑉𝑎𝑟(𝑌1 ) = 𝐸(𝑦1 − 𝜇1 )2 = 𝐸(𝑦12 ) − 𝜇12


 𝐶𝑜𝑣(𝑌1 , 𝑌2 ) = 𝐸(𝑦1 − 𝜇1 )(𝑦2 − 𝜇2 ) = 𝐸(𝑦1 𝑦2 ) − 𝜇1 𝜇2

5.6.3 Theorem

Let $Y^T = (y_1, y_2, y_3, \ldots, y_n)$ be a random vector with $E(Y) = \mu = (\mu_1, \mu_2, \mu_3, \ldots, \mu_n)^T$, then
$Var(Y)$ is a matrix whose diagonal elements are $Var(Y_i)$, $i = 1, 2, 3, \ldots, n$, and whose off-diagonal elements are $Cov(Y_i, Y_j)$.

Proof

$$Var(Y) = \begin{pmatrix} Var(y_1) & Cov(y_1, y_2) & \cdots & Cov(y_1, y_n) \\ Cov(y_2, y_1) & Var(y_2) & \cdots & Cov(y_2, y_n) \\ \vdots & \vdots & \ddots & \vdots \\ Cov(y_n, y_1) & Cov(y_n, y_2) & \cdots & Var(y_n) \end{pmatrix}$$

and $Cov(y_i, y_i) = Var(y_i)$.

Therefore, $Var(Y)$ is a matrix with diagonal elements $Var(Y_i)$, $i = 1, 2, 3, \ldots, n$.

5.6.3 Properties of Covariance

 $Cov(y_i, y_j) = Cov(y_j, y_i)$, and $Cov(y_i, y_i) = Var(y_i)$ when $i = j$ (so $Var(Y)$ is a symmetric matrix)
 If $y_i$ and $y_j$ are independent then $Cov(y_i, y_j) = 0$, $i \neq j$
 $Var(Y) = E(Y - \mu_y)(Y - \mu_y)^T$

Proof that $Var(Y) = E(Y - \mu_y)(Y - \mu_y)^T$

R.H.S

$$E(Y - \mu_y)(Y - \mu_y)^T = E\left[\begin{pmatrix} y_1 - \mu_1 \\ y_2 - \mu_2 \\ \vdots \\ y_n - \mu_n \end{pmatrix}\begin{pmatrix} y_1 - \mu_1 & y_2 - \mu_2 & \cdots & y_n - \mu_n \end{pmatrix}\right]$$

$$= \begin{pmatrix} E(y_1-\mu_1)^2 & E(y_1-\mu_1)(y_2-\mu_2) & \cdots & E(y_1-\mu_1)(y_n-\mu_n) \\ E(y_2-\mu_2)(y_1-\mu_1) & E(y_2-\mu_2)^2 & \cdots & E(y_2-\mu_2)(y_n-\mu_n) \\ \vdots & \vdots & \ddots & \vdots \\ E(y_n-\mu_n)(y_1-\mu_1) & E(y_n-\mu_n)(y_2-\mu_2) & \cdots & E(y_n-\mu_n)^2 \end{pmatrix}$$

$$= \begin{pmatrix} Var(y_1) & Cov(y_1,y_2) & \cdots & Cov(y_1,y_n) \\ Cov(y_2,y_1) & Var(y_2) & \cdots & Cov(y_2,y_n) \\ \vdots & \vdots & \ddots & \vdots \\ Cov(y_n,y_1) & Cov(y_n,y_2) & \cdots & Var(y_n) \end{pmatrix} = Var(Y)$$

The above matrix can also be written as

$$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_{nn} \end{pmatrix} \quad \text{where } \Sigma = Var(Y) \text{ and } \sigma_{ij} = \begin{cases} Var(y_i), & i = j \\ Cov(y_i, y_j), & i \neq j \end{cases}$$

5.6.4 Theorem
Let $C^T = (c_1, c_2, c_3, \ldots, c_n)$ be a vector of constants, then the linear combination
$C^T Y = c_1 y_1 + c_2 y_2 + c_3 y_3 + \cdots + c_n y_n$ has mean $C^T \mu_y$ and variance $C^T \Sigma_y C$.

Proof
(a) $E(C^T Y) = C^T E(Y) = C^T \mu_y$
(b) $Var(C^T Y) = E(C^T Y - C^T \mu_y)(C^T Y - C^T \mu_y)^T = E\,C^T(Y - \mu_y)(Y - \mu_y)^T C = C^T E(Y - \mu_y)(Y - \mu_y)^T C = C^T Var(Y)\,C$

Hence, $Var(C^T Y) = C^T \Sigma_y C$.

Example

Let $X \sim N_3(\mu, \Sigma)$ be a normally distributed random vector with mean $\mu^T = (3 \;\; 1 \;\; 4)$ and Variance-Covariance matrix

$$\Sigma = \begin{pmatrix} 1 & -2 & 0 \\ -2 & 5 & 0 \\ 0 & 0 & 2 \end{pmatrix}$$

(i) Are $X_1$ and $X_3$ independent?
(ii) Find the distribution of $Y = 3X_1 - 2X_2 + X_3$

Solutions
(i) $Cov(X_1, X_3) = 0$. Since $Cov(X_1, X_3) = 0$ and X is multivariate normal, $X_1$ and $X_3$ are independent.
(ii) $Y = 3X_1 - 2X_2 + X_3 = C^T X$ with

$$C^T = (3 \;\; -2 \;\; 1), \qquad X = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}$$

Mean of Y

$$E(Y) = C^T E(X) = (3 \;\; -2 \;\; 1)\begin{pmatrix} 3 \\ 1 \\ 4 \end{pmatrix} = 9 - 2 + 4 = 11$$

Variance of Y

$$Var(Y) = Var(C^T X) = C^T \Sigma\, C = (3 \;\; -2 \;\; 1)\begin{pmatrix} 1 & -2 & 0 \\ -2 & 5 & 0 \\ 0 & 0 & 2 \end{pmatrix}\begin{pmatrix} 3 \\ -2 \\ 1 \end{pmatrix} = (3 \;\; -2 \;\; 1)\begin{pmatrix} 7 \\ -16 \\ 2 \end{pmatrix} = 55$$

$$\therefore Y \sim N(11,\, 55)$$
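The mean and variance above follow directly from Theorem 5.6.4 and can be checked with a short Python sketch (assuming the numpy library is available):

import numpy as np

# Example above: X ~ N3(mu, Sigma), Y = 3X1 - 2X2 + X3 = c^T X
mu = np.array([3., 1., 4.])
Sigma = np.array([[1., -2., 0.],
                  [-2., 5., 0.],
                  [0., 0., 2.]])
c = np.array([3., -2., 1.])

mean_Y = c @ mu                    # c^T mu
var_Y = c @ Sigma @ c              # c^T Sigma c
print(mean_Y, var_Y)               # 11.0 55.0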

15
5.6.5 Theorem
If 𝑍 = 𝐴𝑌 where 𝐴 is a matrix of constants, then
(a) 𝐸(𝑍) = 𝐴𝜇𝑦
(b) 𝑉𝑎𝑟(𝑍) = 𝐴 ∑𝑦 𝐴𝑇

Proof
(a) 𝐸(𝑍) = 𝐸(𝐴𝑌) = 𝐴𝐸(𝑌) = 𝐴𝜇𝑦
(b) 𝑉𝑎𝑟(𝑍) = 𝐴 ∑𝑦 𝐴𝑇
𝑇
𝑉𝑎𝑟(𝑍) = 𝐸(𝐴𝑌 − 𝐴𝜇𝑦 )(𝐴𝑌 − 𝐴𝜇𝑦 )
𝑇
𝑉𝑎𝑟(𝑍) = 𝐴𝐸(𝑌 − 𝜇𝑦 )(𝑌 − 𝜇𝑦 ) 𝐴𝑇

Hence, 𝑽𝒂𝒓(𝒁) = 𝑨𝑽𝒂𝒓(𝒀)𝑨𝑻 = 𝑨 ∑𝒚 𝑨𝑻

Example

Let 𝑍1 𝑍2 𝑍3 be random variables with mean vector and Variance-Covariance


matrix

1 3 2 1
𝐸(𝑍) = 𝜇 = (2) and 𝑉𝑎𝑟(𝑍) = (2 2 1) where 𝑍 = (𝑍1 𝑍2 𝑍3 )𝑇
3 1 1 1

Let

𝑌1 = 𝑍1 + 2𝑌3

𝑌2 = 𝑍1 + 𝑍2 − 𝑍3

𝑌3 = 3𝑍1 + 𝑍2 + 𝑍3

1
(a) Find the mean and variance of 𝑋 = 7 (𝑌1 + 𝑌2 + 𝑌3 )
𝑌1
(b) Find 𝐸(𝑌) and 𝑉𝑎𝑟(𝑌) where 𝑌 = (𝑌1 𝑌2 𝑌3 )𝑇 = (𝑌2 )
𝑌3

Solutions

$$X = \frac{1}{7}(Y_1 + Y_2 + Y_3)$$

$$X = \frac{1}{7}\big(Z_1 + 2Y_3 + Z_1 + Z_2 - Z_3 + 3Z_1 + Z_2 + Z_3\big) = \frac{1}{7}\big(Z_1 + 2(3Z_1 + Z_2 + Z_3) + Z_1 + Z_2 - Z_3 + 3Z_1 + Z_2 + Z_3\big)$$

$$X = \frac{1}{7}(11Z_1 + 4Z_2 + 2Z_3) = C^T Z \quad \text{with} \quad C^T = \frac{1}{7}(11 \;\; 4 \;\; 2)$$

(a)
(i) $E(X) = C^T E(Z) = \dfrac{1}{7}(11 \;\; 4 \;\; 2)\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} = \dfrac{11 + 8 + 6}{7}$

$$\therefore E(X) = \frac{25}{7}$$

(ii) $Var(X) = Var(C^T Z) = C^T \Sigma_z C = \dfrac{1}{7}(11 \;\; 4 \;\; 2)\begin{pmatrix} 3 & 2 & 1 \\ 2 & 2 & 1 \\ 1 & 1 & 1 \end{pmatrix}\dfrac{1}{7}\begin{pmatrix} 11 \\ 4 \\ 2 \end{pmatrix}$

$$\therefore Var(X) = \frac{635}{49}$$

(b) $Y = (Y_1 \;\; Y_2 \;\; Y_3)^T = \begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \end{pmatrix}$

$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \end{pmatrix} = \begin{pmatrix} Z_1 + 2Y_3 \\ Z_1 + Z_2 - Z_3 \\ 3Z_1 + Z_2 + Z_3 \end{pmatrix} = \begin{pmatrix} Z_1 + 2(3Z_1 + Z_2 + Z_3) \\ Z_1 + Z_2 - Z_3 \\ 3Z_1 + Z_2 + Z_3 \end{pmatrix} = \begin{pmatrix} 7Z_1 + 2Z_2 + 2Z_3 \\ Z_1 + Z_2 - Z_3 \\ 3Z_1 + Z_2 + Z_3 \end{pmatrix}$$

$$Y = \begin{pmatrix} 7 & 2 & 2 \\ 1 & 1 & -1 \\ 3 & 1 & 1 \end{pmatrix}\begin{pmatrix} Z_1 \\ Z_2 \\ Z_3 \end{pmatrix}$$

(i) 𝑌 = 𝐴𝑍
𝐸(𝑌) = 𝐴𝐸(𝑍)
7 2 2 𝑍1
𝐸(𝑌) = (1 1 −1) 𝐸 (𝑍2 )
3 1 1 𝑍3
7 2 2 1
𝐸(𝑌) = (1 1 −1) (2)
3 1 1 3
17
∴ 𝐸(𝑌) = ( 0 )
8
(ii) 𝑉𝑎𝑟(𝑌) = 𝑉𝑎𝑟(𝐴𝑍) = 𝐴𝑉𝑎𝑟(𝑍)𝐴𝑇 = 𝐴 ∑𝑧 𝐴𝑇

𝑉𝑎𝑟(𝑌) = 𝐴 ∑ 𝐴𝑇
𝑧
7 2 2 3 2 1 7 1 3
𝑉𝑎𝑟(𝑌) = (1 1 −1) (2 2 1) (2 1 1)
3 1 1 1 1 1 2 −1 1

𝟐𝟓𝟏 𝟑𝟔 𝟏𝟏𝟐
∴ 𝑽𝒂𝒓(𝒀) = ( 𝟑𝟔 𝟔 𝟏𝟔 )
𝟏𝟏𝟐 𝟏𝟔 𝟓𝟎
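The results 𝐸(𝑌) = 𝐴𝜇 and 𝑉𝑎𝑟(𝑌) = 𝐴𝛴𝐴ᵀ can be reproduced numerically with the sketch below (assuming the numpy library is available):

import numpy as np

# Example above: Y = AZ with E(Z) = mu and Var(Z) = Sigma
mu = np.array([1., 2., 3.])
Sigma = np.array([[3., 2., 1.],
                  [2., 2., 1.],
                  [1., 1., 1.]])
A = np.array([[7., 2., 2.],
              [1., 1., -1.],
              [3., 1., 1.]])

E_Y = A @ mu                    # A mu          -> [17, 0, 8]
Var_Y = A @ Sigma @ A.T         # A Sigma A^T   -> [[251, 36, 112], [36, 6, 16], [112, 16, 50]]
print(E_Y)
print(Var_Y)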

5.6.5 Theorem
If Y is a random vector with mean vector $\mu_y$ and non-singular Variance-Covariance
matrix $\Sigma_y$, and Y has a k-dimensional multivariate normal distribution, then

$$Q(Y) = (Y - \mu_y)^T \Sigma_y^{-1} (Y - \mu_y)$$

follows a Chi-Square distribution with k degrees of freedom.

Proof
For the symmetric, positive definite matrix $\Sigma_y$ there exists a non-singular matrix P such that
$P^T \Sigma_y P = I$, and hence $P P^T = \Sigma_y^{-1}$.

Let $W = P^T(Y - \mu_y)$, then

$$W^T W = (Y - \mu_y)^T P P^T (Y - \mu_y) = (Y - \mu_y)^T \Sigma_y^{-1} (Y - \mu_y)$$

Moreover W has mean 0 and variance $P^T \Sigma_y P = I$, so $W_i \sim iid\; N(0, 1)$ with $W = (w_1 \;\; w_2 \;\; \cdots \;\; w_k)^T$. Therefore

$$W^T W = \sum_{i=1}^{k} W_i^2$$

which is a sum of squares of k independent standard normal variables. Thus, $Q(Y) = (Y - \mu_y)^T \Sigma_y^{-1}(Y - \mu_y)$ follows a Chi-Square distribution with k degrees
of freedom.

5.7 Multivariate Normal Distribution


The multivariate normal distribution is a generalization of the univariate normal
distribution which has the density function

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(x-\mu)^2}, \qquad -\infty < x < \infty$$

In the 𝑝-dimensional case the density function becomes

$$f(y) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(y-\mu_y)^T \Sigma^{-1}(y-\mu_y)}$$

Within the mean vector 𝜇 there are p (independent) parameters

Tutorial Sheet 5
1. Let 𝑌 = (𝑦1 𝑦2 𝑦3 )𝑇 be a random vector with mean 𝜇 = (𝜇1 𝜇2 𝜇3 )𝑇 and
Variance-Covariance matrix
𝜎11 𝜎12 𝜎13
∑= ( 21 𝜎22 𝜎23 )
𝜎
𝜎31 𝜎32 𝜎33
Consider the linear combination
𝑊1 = 𝑌1 + 𝑌2 + 𝑌3
𝑊2 = 𝑌1 + 𝑌2
𝑊3 = 𝑌1 − 𝑌2 − 𝑌3
(i) State the above linear combination in matrix notation
(ii) Find 𝐸(𝑈) and 𝑉𝑎𝑟(𝑈) of 𝑈 = 𝑊1 + 𝑊2 + 𝑊3
(iii) Find 𝐸(𝑊) and 𝑉𝑎𝑟(𝑊) of 𝑊 = (𝑊1 , 𝑊2 , 𝑊3 )𝑇
2. Let 𝑌 = (𝑦1 𝑦2 𝑦3 )𝑇 be a random vector with mean 𝜇 = (1 2 3)𝑇 and
Variance-Covariance matrix
1 0 1
∑= (0 2 −1)
1 −1 3
Consider the linear combination
𝑈1 = 𝑌1 − 𝑌2 + 2𝑌3
𝑈2 = 𝑌1 + 𝑌3
(i) State the above linear combination in matrix notation
(ii) Find 𝐸(𝑈) and 𝑉𝑎𝑟(𝑈)
3. Let X be (𝑁3 (𝜇, ∑)) a random normally distributed vector with mean
1 2 0
𝜇 𝑇 = (3 1 4) and Variance-Covariance matrix ∑ = (2 5 0)
0 0 2
Find the distribution of
𝑈1 = 𝑋1 − 𝑋2 + 3𝑈2
𝑈2 = 2𝑋1 + 𝑋3
𝑈3 = 𝑋1 + 𝑋2 + 𝑋3

4. Consider the function 𝑓(𝑥) = 𝑥12 + 2𝑥22 + 2𝑥32 − 2𝑥1 𝑥2 − 2𝑥1 𝑥3
(i) Express the quadratic form as product of matrices
𝑑(𝑓(𝑥))
(ii) Find the vector form of and express the answer in form of 2𝐴𝑋.
𝑑𝑋

5. Consider the function 𝑓(𝑥) = 𝑥12 + 2𝑥22 − 7𝑥32 + 𝑥42 − 4𝑥1 𝑥2 + 8𝑥1 𝑥3 − 6𝑥3 𝑥4
(i) Express the quadratic form as product of matrices
𝑑(𝑓(𝑥))
(ii) Find the vector form of and express the answer in form of 2𝐴𝑋.
𝑑𝑋

6. Write down the quadratic form associated with the given matrices;
0 2 −4 2
1 2 −1
(i) ( 2 1 3 ) (ii) ( 2 3 1 0)
−4 1 2 1
−1 3 1
2 0 1 7
7. Prove that if B is symmetric idempotent matrix then 𝑃𝑇 𝐵𝑃 is also symmetric
idempotent matrix.
8. Let A be an 𝑛 × 𝑛 matrix. Prove that if A is idempotent, then det(𝐴) is equal to
either 0 or 1.
9. Prove that if 𝐶 = 𝐴 + 𝐵, then 𝑇𝑟(𝐶) = 𝑇𝑟(𝐴) + 𝑇𝑟(𝐵). Assume that A,B,C are all
𝑛 × 𝑛 square matrices.

Unit 6
Regression Models
Introduction

The focus in regression is to determine the relationship between a set of variables (to
determine the relationship between the dependent variable and the independent
variable or variables). For example, in a chemical process, we might be interested in
the relationship between the output of the process, the temperature at which it occurs,
and the amount of catalyst employed. Knowledge of such a relationship would enable
us to predict the output for various values of temperature and amount of catalyst. In
this chapter, we will focus on Simple Linear Regression Models and Multiple
Regression Models.

6.1 Simple Linear Regression Models


Simple linear regression is a statistical method that looks at the relationship between
two continuous (quantitative) variables. One variable denoted by X represents the
predictor, explanatory or independent variable while the other variable denoted by Y
represents the response, outcome or dependent variable.

6.1.1 Model
The following is the model for simple linear regression;
𝑌𝑖 = 𝛽0 + 𝛽1 𝑋𝑖 + 𝜀𝑖
Where
𝑖 = 1,2, … , 𝑛
𝑌𝑖 is the dependent variable
𝑋𝑖 is the independent variable

𝛽0 and 𝛽1 are the regression coefficients
𝜀𝑖 is the random error.

6.1.2 Assumptions
The following are the assumptions about the random error;
 𝐸(𝜀𝑖 ) = 0
 𝑉𝑎𝑟(𝜀𝑖 ) = 𝜎 2
 𝜀𝑖 is normally distributed
 𝜀𝑖 are independent.

6.1.3 Least Squares Estimation of 𝜷𝟎 and 𝜷𝟏

Recall from Second Year that minimising $S = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$, where $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$, gives

(a) $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$

(b) $\hat{\beta}_1 = \dfrac{\sum_{i=1}^{n} x_i y_i - \dfrac{\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n}}{\sum_{i=1}^{n} x_i^2 - \dfrac{(\sum_{i=1}^{n} x_i)^2}{n}}$

And if 𝐸(𝜀𝑖) = 0 is true for all 𝑖 = 1, 2, 3, …, n, then

$$E(Y_i) = E(\beta_0 + \beta_1 X_i + \varepsilon_i) = \beta_0 + \beta_1 X_i \quad \text{and} \quad Var(Y_i) = Var(\beta_0 + \beta_1 X_i + \varepsilon_i) = \sigma^2$$

6.1.4 Properties of 𝜷𝟎 and 𝜷𝟏

From the regression model $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ and the assumptions about the
random error $(\varepsilon_i)$, the following can be shown;

(a) $E(\hat{\beta}_1) = \beta_1$

(b) $Var(\hat{\beta}_0) = \dfrac{\sigma^2 \sum_{i=1}^{n} x_i^2}{n \sum_{i=1}^{n}(X_i - \bar{X})^2}$

(c) $E(\hat{\beta}_0) = \beta_0$

(d) $Var(\hat{\beta}_1) = \dfrac{\sigma^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$

(e) $Cov(\hat{\beta}_0, \hat{\beta}_1) = \dfrac{-\sigma^2 \bar{X}}{\sum_{i=1}^{n}(X_i - \bar{X})^2} = -\bar{X}\, Var(\hat{\beta}_1)$

Proof
Simplifying $\hat{\beta}_1 = \dfrac{\sum_{i=1}^{n} x_i y_i - \frac{\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n}}{\sum_{i=1}^{n} x_i^2 - \frac{(\sum_{i=1}^{n} x_i)^2}{n}}$ we have;

$$\sum_{i=1}^{n} x_i^2 - \frac{(\sum_{i=1}^{n} x_i)^2}{n} = \sum_{i=1}^{n} x_i^2 - n\left(\frac{\sum_{i=1}^{n} x_i}{n}\right)^2 = \sum_{i=1}^{n} x_i^2 - n\bar{X}^2 = \sum_{i=1}^{n}(X_i - \bar{X})^2$$

$$\sum_{i=1}^{n} x_i y_i - \frac{\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n} = \sum_{i=1}^{n} x_i y_i - \bar{y}\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y} = \sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$

Therefore, $\hat{\beta}_1 = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$

Note that
 $\sum_{i=1}^{n}(x_i - \bar{x})\,\bar{y} = \sum_{i=1}^{n}(x_i - \bar{x})\,\bar{x} = \sum_{i=1}^{n}(y_i - \bar{y})\,\bar{x} = 0$
 $\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n}(x_i - \bar{x})\,y_i = \sum_{i=1}^{n}(y_i - \bar{y})\,x_i$
 $\sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x}) = \sum_{i=1}^{n}(x_i - \bar{x})\,x_i$

(a) 𝐸(𝛽̂1 ) = 𝛽1

∑ni=1(𝑥𝑖 − 𝑥
̅) (𝑦𝑖 − 𝑦
̅)
𝐸(𝛽̂1 ) = 𝐸 ( )
∑ni=1(𝑥𝑖 − 𝑥
̅ )2

∑ni=1(𝑥𝑖 − 𝑥̅)𝑦𝑖 − ∑ni=1(𝑥𝑖 − 𝑥̅)𝑦̅


̂
𝐸(𝛽1 ) = 𝐸 ( )
∑ni=1(𝑥𝑖 − 𝑥̅ )2

∑ni=1(𝑥𝑖 − 𝑥̅ ) 𝑦𝑖 − 0
𝐸(𝛽̂1 ) = 𝐸 ( )
∑ni=1(𝑥𝑖 − 𝑥̅ )2

∑ni=1(𝑥𝑖 − 𝑥
̅ ) 𝑦𝑖
̂
𝐸(𝛽1 ) = 𝐸 ( n )
∑i=1(𝑥𝑖 − 𝑥̅ )2

∑ni=1(𝑥𝑖 − 𝑥̅ )𝑦𝑖
̂
𝐸(𝛽1 ) = 𝐸 ( n )
∑i=1(𝑥𝑖 − 𝑥̅ )(𝑥𝑖 − 𝑥̅ )

∑ni=1(𝑥𝑖 − 𝑥̅ )𝑦𝑖
𝐸(𝛽̂1 ) = 𝐸 ( n )
∑i=1(𝑥𝑖 − 𝑥̅ )𝑥𝑖

∑ni=1(𝑥𝑖 − 𝑥̅ )𝐸(𝑦𝑖 )
𝐸(𝛽̂1 ) =
∑ni=1(𝑥𝑖 − 𝑥̅ )𝑥𝑖

∑ni=1(𝑥𝑖 − 𝑥̅ )𝐸(𝛽0 + 𝛽1 𝑋𝑖 + 𝜀𝑖 )
𝐸(𝛽̂1 ) =
∑ni=1(𝑥𝑖 − 𝑥̅ )𝑥𝑖

∑ni=1(𝑥𝑖 − 𝑥̅ )(𝛽0 + 𝛽1 𝑥𝑖 )
𝐸(𝛽̂1 ) =
∑ni=1(𝑥𝑖 − 𝑥̅ )𝑥𝑖

𝛽0 ∑ni=1(𝑥𝑖 − 𝑥̅ ) + 𝛽1 ∑ni=1(𝑥𝑖 − 𝑥̅ ) 𝑥𝑖
𝐸(𝛽̂1 ) =
∑ni=1(𝑥𝑖 − 𝑥̅ )𝑥𝑖

Note that 𝛽0 ∑ni=1(𝑥𝑖 − 𝑥̅ ) = 0

0 + 𝛽1 ∑ni=1(𝑥𝑖 − 𝑥̅ ) 𝑥𝑖
𝐸(𝛽̂1 ) =
∑ni=1(𝑥𝑖 − 𝑥̅ )𝑥𝑖

𝛽1 ∑ni=1(𝑥𝑖 − 𝑥̅ ) 𝑥𝑖
𝐸(𝛽̂1 ) =
∑ni=1(𝑥𝑖 − 𝑥̅ )𝑥𝑖

𝛽1 ∑ni=1(𝑥𝑖 − 𝑥̅ ) 𝑥𝑖
𝐸(𝛽̂1 ) =
∑ni=1(𝑥𝑖 − 𝑥̅ )𝑥𝑖

̂ 𝟏 ) = 𝜷𝟏
∴ 𝑬(𝜷

(b) $Var(\hat{\beta}_0) = \dfrac{\sigma^2 \sum_{i=1}^{n} x_i^2}{n \sum_{i=1}^{n}(x_i - \bar{x})^2}$

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{x}$$

$$Var(\hat{\beta}_0) = Var(\bar{Y} - \hat{\beta}_1 \bar{x}) = Var(\bar{y}) + \bar{x}^2 Var(\hat{\beta}_1) - 2\bar{x}\, Cov(\bar{y}, \hat{\beta}_1)$$

Now $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ and $\hat{\beta}_1 = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})\,y_i}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$, and since $\sum_{i=1}^{n}(x_i - \bar{x}) = 0$ and the $y_i$ are independent with common variance $\sigma^2$,

$$Cov(\bar{y}, \hat{\beta}_1) = \frac{\sigma^2 \sum_{i=1}^{n}(x_i - \bar{x})}{n \sum_{i=1}^{n}(x_i - \bar{x})^2} = 0$$

Also $Var(\bar{y}) = \dfrac{\sigma^2}{n}$ and $Var(\hat{\beta}_1) = \dfrac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$, so

$$Var(\hat{\beta}_0) = \frac{\sigma^2}{n} + \bar{x}^2 Var(\hat{\beta}_1)$$

𝜎2 𝜎2
𝑉𝑎𝑟(𝛽̂0 ) = 2
̅ 𝑛
+𝑥
𝑛 ∑𝑖=1(𝑥𝑖 − 𝑥̅ )2
𝜎 2 ∑𝑛𝑖=1(𝑥𝑖 − 𝑥̅ )2 + 𝑛𝜎 2 𝑥̅2
𝑉𝑎𝑟(𝛽̂0 ) =
𝑛 ∑𝑛𝑖=1(𝑥𝑖 − 𝑥̅ )2
∑𝑛𝑖=1(𝑥𝑖 − 𝑥̅ )2 + 𝑛𝑥̅2
̂ 2
𝑉𝑎𝑟(𝛽0 ) = 𝜎 ( )
𝑛 ∑𝑛𝑖=1(𝑥𝑖 − 𝑥̅ )2
∑𝑛𝑖=1(𝑥𝑖2 − 2𝑥̅ 𝑥𝑖 + 𝑥̅ 2 ) + 𝑛𝑥̅2
𝑉𝑎𝑟(𝛽̂0 ) = 𝜎 ( 2
)
𝑛 ∑𝑛𝑖=1(𝑥𝑖 − 𝑥̅ )2
∑𝑛𝑖=1 𝑥𝑖2 − 2𝑥̅ ∑𝑛𝑖=1 𝑥𝑖 + ∑𝑛𝑖=1 𝑥̅ 2 + 𝑛𝑥̅2
̂ 2
𝑉𝑎𝑟(𝛽0 ) = 𝜎 ( )
𝑛 ∑𝑛𝑖=1(𝑥𝑖 − 𝑥̅ )2
∑𝑛𝑖=1 𝑥𝑖2 − 2𝑥̅ × 𝑛𝑥̅ + 𝑛𝑥̅ 2 + 𝑛𝑥̅2
̂ 2
𝑉𝑎𝑟(𝛽0 ) = 𝜎 ( )
𝑛 ∑𝑛𝑖=1(𝑥𝑖 − 𝑥̅ )2
∑𝑛𝑖=1 𝑥𝑖2 − 2𝑛𝑥̅ 2 + 𝑛𝑥̅ 2 + 𝑛𝑥̅2
𝑉𝑎𝑟(𝛽̂0 ) = 𝜎 ( 2
)
𝑛 ∑𝑛𝑖=1(𝑥𝑖 − 𝑥̅ )2
∑𝑛𝑖=1 𝑥𝑖2 − 0
̂ 2
𝑉𝑎𝑟(𝛽0 ) = 𝜎 ( 𝑛 )
𝑛 ∑𝑖=1(𝑥𝑖 − 𝑥̅ )2
∑𝑛𝑖=1 𝑥𝑖2
̂ 2
𝑉𝑎𝑟(𝛽0 ) = 𝜎 ( 𝑛 )
𝑛 ∑𝑖=1(𝑥𝑖 − 𝑥̅ )2

𝝈𝟐 ∑𝒏𝒊=𝟏 𝒙𝟐𝒊
̂ 𝟎) =
∴ 𝑽𝒂𝒓(𝜷
𝒏 ∑𝒏𝒊=𝟏(𝑿𝒊 − 𝒙 ̅)𝟐

The proofs of (c), (d) and (e) are left as exercises.

6.1.5 Confidence Interval and Hypothesis Testing

6.1.5.1 Confidence Interval

Under the assumptions of the simple linear regression model, a (1 − 𝛼)100%


confidence interval for the parameter 𝛽1 is

$$\hat{\beta}_1 \pm t_{\frac{\alpha}{2},\, n-2}\sqrt{\frac{MSE}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}$$

From the sample variance, $S^2 = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$, that is $\sum_{i=1}^{n}(x_i - \bar{x})^2 = (n-1)S^2$, so the interval can be written as

$$\hat{\beta}_1 \pm t_{\frac{\alpha}{2},\, n-2}\sqrt{\frac{MSE}{(n-1)S^2}}$$

The sample variance $S^2$ is obtained from the calculator and $MSE = \dfrac{SS_{Res}}{n-2}$.

A $(1-\alpha)100\%$ confidence interval for the parameter $\beta_0$ is

$$\hat{\beta}_0 \pm t_{\frac{\alpha}{2},\, n-2}\,\sqrt{MSE}\,\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{(n-1)S^2}}$$

Example

The following table shows X and Y where X is the weight of fish in tones and Y is the
amount of fish being sold in Kwacha.

X 7.23 8.53 9.82 10.26 8.96 12.27 10.28 4.45 1.78 4 3.3 4.3 0.8 0.5

Y 190 160 134 129 172 197 167 239 542 372 245 376 454 410

Find a 95% confidence interval for 𝛽1.

Solutions

Using the calculator;

𝑛 = 14, ∑𝑛𝑖=1 𝑦𝑖 = 3787, ∑𝑛𝑖=1 𝑦𝑖2 = 1257465, ∑𝑛𝑖=1 𝑥𝑖 = 86.48 , ∑𝑛𝑖=1 𝑥𝑖2 = 732.4876,
∑𝑛𝑖=1 𝑥𝑖 𝑦𝑖 = 17562.8 , 𝑆 2 = 15.3

Calculations

3787 86.48
𝑦̅ = = 270.5, 𝑥̅ = = 6.177
14 14

∑ni=1 xi ∑ni=1 yi
∑ni=1 xi yi −
β̂1 = n
n
(∑ x )2
∑ni=1 xi2 − i=1 i
n
(86.48 × 3787)
17562.8 −
β̂1 = 14 = −𝟐𝟗. 𝟒𝟎𝟐
(86.48)2
732.4876 − 14
𝛽0 = 𝑌̅ − 𝛽̂1 𝑋̅
𝛽0 = 270.5 − (−29.402 × 6.177) = 452.116

𝑛
(∑𝑛𝑖=1 𝑦𝑖 )2
𝑆𝑆𝑇 = ∑ 𝑦𝑖2 −
𝑛
𝑖=1

(3787)2
𝑆𝑆𝑇 = 1257465 − = 𝟐𝟑𝟑𝟎𝟖𝟏. 𝟓
14
(∑𝑛𝑖=1 𝑥𝑖 )(∑𝑛𝑖=1 𝑦𝑖 ) 2
(∑𝑛𝑖=1 𝑥𝑖 𝑦𝑖− )
𝑛
SSReg =
(∑𝑛 𝑥 )2
∑𝑛𝑖=1 𝑥𝑖2 − 𝑖=1 𝑖
𝑛
3787 × 86.48 2
(17562.8 − )
SSReg = 14 = 𝟏𝟕𝟏𝟒𝟏𝟑. 𝟗
(86.48)2
732.4876 − 14

𝑺𝑺𝑹𝒆𝒔 (𝑬𝒓𝒓𝒐𝒓) = 𝑆𝑆𝑇 − SSReg = 233081.5 − 171413.9 = 𝟔𝟏𝟔𝟔𝟕. 𝟔

ANOVA Table

Source      | SS     | df   | MS           | F
Regression  | SSReg  | k    | SSReg/k      | (SSReg/k)/(SSRes/(n−2))
Residual    | SSRes  | n−2  | SSRes/(n−2)  |
Total       | SST    | n−1  |              |

ANOVA Table (for the fish data)

Source      | SS        | df | MS        | F
Regression  | 171413.9  | 1  | 171413.9  | 33.36
Residual    | 61667.6   | 12 | 5138.97   |
Total       | 233081.5  | 13 |           |

$$MSE = \frac{SS_{Res}}{n-2} = \frac{61667.6}{12} = 5138.97$$

95% confidence interval for 𝛽1 is

$$\hat{\beta}_1 \pm t_{\frac{\alpha}{2},\, n-2}\sqrt{\frac{MSE}{(n-1)s^2}} = -29.402 \pm t_{0.975,\,12}\sqrt{\frac{5138.97}{(14-1)(15.3)}} = -29.402 \pm (2.1788)(5.083) = -29.402 \pm 11.075$$

$$(-40.48,\; -18.33)$$
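For readers who prefer software to the calculator, the interval can be reproduced from the summary statistics with the following Python sketch (assuming the numpy and scipy libraries are available); small differences from the hand calculation are due to rounding.

import numpy as np
from scipy import stats

# Summary statistics from the fish example above
n = 14
sum_x, sum_x2 = 86.48, 732.4876
sum_y, sum_y2 = 3787.0, 1257465.0
sum_xy = 17562.8

Sxx = sum_x2 - sum_x**2 / n                # = sum (x_i - xbar)^2
Sxy = sum_xy - sum_x * sum_y / n
beta1 = Sxy / Sxx                          # about -29.40
beta0 = sum_y / n - beta1 * sum_x / n      # about 452.1

SST = sum_y2 - sum_y**2 / n
SSReg = Sxy**2 / Sxx
MSE = (SST - SSReg) / (n - 2)              # about 5139

t_crit = stats.t.ppf(0.975, df=n - 2)      # two-sided 95%, 12 df
half_width = t_crit * np.sqrt(MSE / Sxx)
print(beta1 - half_width, beta1 + half_width)   # about (-40.5, -18.3)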

6.1.5.2 Hypothesis Testing

Here the focus is on how to conduct a hypothesis test to determine whether there is a
significant linear relationship between an independent variable X and a dependent
variable Y. The test focuses on the slope of the regression line 𝑌 = 𝛽0 + 𝛽1 𝑋. Where
𝛽0 is a constant, 𝛽1 is the slope (regression coefficient), X is the value of the
independent variable and Y is the value of the dependent variable.

Hypothesis for 𝜷𝟏

𝐻0 : 𝛽1 = 0

𝐻1 : 𝛽1 ≠ 0

Calculation for 𝜷𝟏

$$T = \frac{\hat{\beta}_1 - \beta_1}{\sqrt{\dfrac{MSE}{(n-1)S^2}}}$$

Hypothesis for 𝜷𝟎

$H_0: \beta_0 = 0$

$H_1: \beta_0 \neq 0$

Calculation for 𝜷𝟎

$$T = \frac{\hat{\beta}_0 - \beta_0}{\sqrt{MSE\left(\dfrac{1}{n} + \dfrac{\bar{X}^2}{(n-1)S^2}\right)}}$$

Decision Rule

Reject $H_0$ if $|t| > t_{\frac{\alpha}{2},\, n-2}$

Example

A statistician in the United States of America wanted to find out the relationship
between skin cancer mortality and state latitude. The response variable Y is the
mortality rate (number of deaths per 10 million people) of white males due to malignant
skin melanoma from 1950-1959. The predictor variable X is the latitude (degrees
North) at the centre of each of 49 states the United States. A subset of the data was
summarised and looks like this:

𝑛 = 49, ∑𝑛𝑖=1 𝑦𝑖 = 7491, ∑𝑛𝑖=1 𝑦𝑖2 = 1198843, ∑𝑛𝑖=1 𝑥𝑖 =1937.1 , ∑𝑛𝑖=1 𝑥𝑖2 = 77599.19,
∑𝑛𝑖=1 𝑥𝑖 𝑦𝑖 = 290039

(a) Find a 95% confidence interval for 𝛽0


(b) Test the hypothesis that 𝛽1 = 0 at 𝛼 = 0.05
(c) Test the hypothesis that 𝛽0 = 0 at 𝛼 = 0.05

Solutions

Calculations

7491 1937.1
𝑦̅ = = 𝟏𝟓𝟐. 𝟖𝟖 𝑥̅ = = 𝟑𝟗. 𝟓𝟑
49 49

∑ni=1 xi ∑ni=1 yi
∑ni=1 xi yi −
β̂1 = n
n
(∑ x )2
∑ni=1 xi2 − i=1 i
n
(1937.1 × 7491)
290039 −
β̂1 = 49 = −𝟓. 𝟗𝟖
(1937.1)2
77599.19 − 49
𝛽0 = 𝑌̅ − 𝛽̂1 𝑋̅
𝛽0 = 152.88 −(−5.98 × 39.53) = 𝟑𝟖𝟗. 𝟐𝟕

𝑛
(∑𝑛𝑖=1 𝑦𝑖 )2
𝑆𝑆𝑇 = ∑ 𝑦𝑖2 −
𝑛
𝑖=1

(7491)2
𝑆𝑆𝑇 = 1198843 − = 𝟓𝟑𝟔𝟑𝟕. 𝟐𝟕
49

(∑𝑛𝑖=1 𝑥𝑖 )(∑𝑛𝑖=1 𝑦𝑖 ) 2
(∑𝑛𝑖=1 𝑥𝑖 𝑦𝑖− )
𝑛
SSReg =
(∑𝑛 𝑥 )2
∑𝑛𝑖=1 𝑥𝑖2 − 𝑖=1 𝑖
𝑛
1937.1 × 7491 2
(290039 − )
SSReg = 49 = 𝟑𝟔𝟒𝟔𝟒. 𝟐
(1937.1)2
77599.19 − 49
𝑺𝑺𝑹𝒆𝒔 (𝑬𝒓𝒓𝒐𝒓) = 𝑆𝑆𝑇 − SSReg = 53637.27 − 36464.2 = 𝟏𝟕𝟏𝟕𝟑. 𝟎𝟕

ANOVA Table

𝑆𝑜𝑢𝑟𝑐𝑒 𝑆𝑆 𝑑𝑓 𝑀𝑆 𝐹∗

Regression 36464.2 1 36464.2 99.8

Residue 17173.07 47 365.38

Total 53637.27 48

𝑛
1 (∑𝑛𝑖=1 𝑥𝑖 )2
2
𝜎̂ = [∑ 𝑥𝑖2 − ]
𝑛−1 𝑛
𝑖=1

1 (1937.1)2
𝜎̂ 2 = [77599.19 − ] = 𝟐𝟏. 𝟐𝟔
49 − 1 49

𝑛−1 2
𝑆2 = 𝜎̂
𝑛

49 − 1
𝑆2 = × 21.26 = 𝟐𝟎. 𝟖𝟑
49

(a) 95% confidence interval for 𝛽0

1 𝑋̅ 2
𝛽̂0 ± (𝑡𝛼 , 𝑛 − 2) √𝑀𝑆𝐸 ( + )
2 𝑛 (𝑛 − 1)𝑆 2

1 39.532
= 389.27 ± (𝑡0.975 , 47)√365.38 ( + )
49 (49 − 1)20.83

= 389.27 ± (2.01)(24.05)

= 389.27 ± (48.3405)

= (𝟑𝟒𝟎. 𝟗𝟑, 𝟒𝟑𝟕. 𝟔𝟏)

We are 95% confident that the population intercept is between 340.93 and 437.61.
That is, we can be confident that the mean mortality rate at a latitude of 0 degrees
north is between 340.93 and 437.61 deaths per 10 million people.

(a) Hypothesis for 𝜷𝟏

𝐻0 : 𝛽1 = 0 (There is no relationship between state latitude and skin cancer mortality)

𝐻1 : 𝛽1 ≠ 0 (There is a relationship between state latitude and skin cancer mortality)

Calculation for 𝜷𝟏

𝛽̂1 − 𝛽1
𝑇=
𝑀𝑆𝐸

(𝑛 − 1)𝑆 2
−5.98 − 0
𝑇=
365.38

(49 − 1)20.83
−5.98
𝑇=
√0.365
𝑻 = −𝟗. 𝟗

Decision Rule

Reject 𝐻0 if |𝑡| > 𝑡0.975 , 47 = 2.01

Conclusion

Since |−9.9| = 9.9 > 𝑡0.975 , 47 = 2.01, we reject 𝐻0 at 5% level of significance and
conclude that 𝛽1 ≠ 0 (There is a relationship between state latitude and skin cancer
mortality.

(b) Hypothesis for 𝜷𝟎

𝐻0 : 𝛽0 = 0 (The mean mortality rate at a latitude of 0 degrees is 0)

𝐻1 : 𝛽0 ≠ 0 (The mean mortality rate at a latitude of 0 degrees is not 0)

Calculation for 𝜷𝟎

𝛽̂0 − 𝛽0
𝑇=
1 𝑋̅ 2
√𝑀𝑆𝐸 ( + )
𝑛 (𝑛 − 1)𝑆 2

389.27 − 0
𝑇=
1 39.532
√365.38 ( + )
49 (49 − 1)20.83

389.27
𝑇=
√578.5

𝑇 =16.2

Decision Rule

Reject 𝐻0 if |𝑡| > 𝑡0.975 , 47 = 2.01

Conclusion

Since |16.2| = 16.2 > 𝑡0.975 , 47 = 2.01, we reject 𝐻0 at 5% level of significance and
conclude that 𝛽0 ≠ 0 (The mean mortality rate at a latitude of 0 degrees is not 0).
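Both test statistics can be reproduced from the summary statistics with the sketch below (assuming the numpy and scipy libraries are available). It uses Σ(xᵢ − x̄)² directly rather than the rounded value of S², so the results differ slightly from the hand calculation.

import numpy as np
from scipy import stats

# Summary statistics from the skin cancer mortality example above
n = 49
sum_x, sum_x2 = 1937.1, 77599.19
sum_y, sum_y2 = 7491.0, 1198843.0
sum_xy = 290039.0

Sxx = sum_x2 - sum_x**2 / n
Sxy = sum_xy - sum_x * sum_y / n
beta1 = Sxy / Sxx                          # about -5.98
xbar, ybar = sum_x / n, sum_y / n
beta0 = ybar - beta1 * xbar                # about 389.2

SST = sum_y2 - sum_y**2 / n
SSReg = Sxy**2 / Sxx
MSE = (SST - SSReg) / (n - 2)              # about 365

t_beta1 = beta1 / np.sqrt(MSE / Sxx)                         # about -10.0
t_beta0 = beta0 / np.sqrt(MSE * (1/n + xbar**2 / Sxx))       # about 16.3
t_crit = stats.t.ppf(0.975, df=n - 2)                        # about 2.01
print(t_beta1, t_beta0, t_crit)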

Coefficient of Determination (𝑹𝟐 )

The coefficient of determination is a measure of the amount of variability in the data


accounted for by the regression model.

$$R^2 = \frac{SS_{Reg}}{SS_T} \times 100\%$$

From the above example, $R^2 = \dfrac{36464.2}{53637.27} \times 100\% = 68\%$.

6.2 Multiple Regression Models

Multiple regression is an extension of simple linear regression in which more than one
independent variable (X) is used to predict a single dependent variable (Y). A multiple
regression equation expresses a linear relationship between a response variable y and two or
more predictor variables (𝑥1 , 𝑥2 , 𝑥3 , … , 𝑥𝑘 ).

Model
The general form of a multiple regression model is given below
𝑌 = 𝛽0 + 𝛽1 𝑥1 + 𝛽2 𝑥2 + ⋯ + 𝛽𝑘 𝑥𝑘 + 𝜀

Where

𝑦 is the dependent variable

𝑥1 , 𝑥2 , 𝑥3 , … , 𝑥𝑘 are the independent variables

𝛽1 , 𝛽2 , 𝛽3 , … , 𝛽𝑘 determines the contribution of the independent variables


𝑥1 , 𝑥2 , 𝑥3 , … , 𝑥𝑘 .

6.2.1 Least Squares Estimation

The method of least squares may be used to estimate the regression coefficients in
the multiple regression model. The objective is to find $\beta_0, \beta_1, \beta_2, \ldots, \beta_k$ by minimizing
$L = \sum_{i=1}^{n}\varepsilon_i^2 = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$ with respect to $\beta_0, \beta_1, \ldots, \beta_k$. Let $x_{ij}$ denote the $i^{th}$ observation of variable $x_j$; the multiple regression model then becomes

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i = \beta_0 + \sum_{j=1}^{k}\beta_j x_{ij} + \varepsilon_i$$

The least squares function is

$$L = \sum_{i=1}^{n}\left(y_i - \beta_0 - \sum_{j=1}^{k}\beta_j x_{ij}\right)^2$$

Differentiating 𝐿 with respect to 𝛽0 and equating the derivative to 0 we have

$$\frac{\partial L}{\partial \beta_0} = -2\sum_{i=1}^{n}\left(y_i - \hat{\beta}_0 - \sum_{j=1}^{k}\hat{\beta}_j x_{ij}\right) = 0$$

$$\therefore \sum_{i=1}^{n}\left(y_i - \hat{\beta}_0 - \sum_{j=1}^{k}\hat{\beta}_j x_{ij}\right) = 0, \qquad j = 1, 2, \ldots, k$$

Simplifying further we have

$$\sum_{i=1}^{n} y_i - n\hat{\beta}_0 - \sum_{i=1}^{n}\sum_{j=1}^{k}\hat{\beta}_j x_{ij} = 0$$

$$n\hat{\beta}_0 + \hat{\beta}_1\sum_{i=1}^{n} x_{i1} + \hat{\beta}_2\sum_{i=1}^{n} x_{i2} + \cdots + \hat{\beta}_k\sum_{i=1}^{n} x_{ik} = \sum_{i=1}^{n} y_i$$

In particular, for $k = 2$,

$$n\hat{\beta}_0 + \hat{\beta}_1\sum_{i=1}^{n} x_{i1} + \hat{\beta}_2\sum_{i=1}^{n} x_{i2} = \sum_{i=1}^{n} y_i$$

Following the same procedure for $\beta_1$ and $\beta_2$ we have;

$$\hat{\beta}_0\sum_{i=1}^{n} x_{i1} + \hat{\beta}_1\sum_{i=1}^{n} x_{i1}^2 + \hat{\beta}_2\sum_{i=1}^{n} x_{i1}x_{i2} = \sum_{i=1}^{n} x_{i1}y_i$$

$$\hat{\beta}_0\sum_{i=1}^{n} x_{i2} + \hat{\beta}_1\sum_{i=1}^{n} x_{i1}x_{i2} + \hat{\beta}_2\sum_{i=1}^{n} x_{i2}^2 = \sum_{i=1}^{n} x_{i2}y_i$$

Example 6.2.1.1
Use the following data to fit the model 𝛽0 + 𝛽1 𝑥1 + 𝛽2 𝑥2 .

𝑛 = 25, ∑25 25 25
𝑖=1 𝑦𝑖 = 725.82 ∑𝑖=1 𝑥𝑖1 = 206, ∑𝑖=1 𝑥𝑖2 = 8294

∑25 25 2 25 2
𝑖=1 𝑥𝑖1 𝑥𝑖2 =77177, ∑𝑖=1 𝑥𝑖1 =2396, ∑𝑖=1 𝑥𝑖2 =3,531,848

∑25 25
𝑖=1 𝑥𝑖2 𝑦𝑖 = 274,811.31 ∑𝑖=1 𝑥𝑖1 𝑦𝑖 = 8,008.37.

Solution
25𝛽̂0 + 206𝛽̂1 + 8294𝛽̂2 = 725.82
206𝛽̂0 + 2396𝛽̂1 + 77177𝛽̂2 = 8008.37
8294𝛽̂0 + 77177𝛽̂1 + 3531848𝛽̂2 = 274811.31
The solution to the above set of equations is;
$\hat{\beta}_0 = 2.266378726$, $\hat{\beta}_1 = 2.744204729$, $\hat{\beta}_2 = 0.012521625$
Therefore, the fitted regression equation is

$$\hat{y} = 2.266378726 + 2.744204729\,x_1 + 0.012521625\,x_2$$
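The system of normal equations can also be solved directly with software. A minimal Python sketch, assuming the numpy library is available:

import numpy as np

# Normal equations from Example 6.2.1.1, written as (X^T X) beta = X^T y
XtX = np.array([[25.,   206.,    8294.],
                [206.,  2396.,   77177.],
                [8294., 77177.,  3531848.]])
Xty = np.array([725.82, 8008.37, 274811.31])

beta = np.linalg.solve(XtX, Xty)
print(beta)          # approximately [2.2664, 2.7442, 0.0125]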

Using Matrix Approach


In fitting a multiple regression model, it is much easier to calculate the parameters
𝛽1 , 𝛽2 , 𝛽3 , … , 𝛽𝑘 using the matrix approach.

The model 𝑌 = 𝛽0 + 𝛽1 𝑥𝑖1 + 𝛽2 𝑥𝑖2 + ⋯ + 𝛽𝑘 𝑥𝑖𝑘 + 𝜀𝑖 is a system of 𝑛 equations that


can be expressed in matrix notation as

𝑌 = 𝑋𝛽 + 𝜀

Where

$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}, \quad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$

In general, 𝑦 is an $(n \times 1)$ vector of the observations, $X$ is an $(n \times p)$ matrix of the levels
of the independent variables, $\beta$ is a $(p \times 1)$ vector of the regression coefficients, and
$\varepsilon$ is an $(n \times 1)$ vector of random errors. We wish to find the vector of least squares
estimators $\hat{\beta}$ that minimizes

$$L = \sum_{i=1}^{n}\varepsilon_i^2 = \varepsilon^T\varepsilon = (y - X\beta)^T(y - X\beta) = y^T y - 2\beta^T X^T y + \beta^T X^T X \beta$$

Differentiating 𝐿 with respect to 𝛽 and equating the derivative to 0 we have

$$\frac{\partial L}{\partial \beta} = -2X^T y + 2X^T X\beta = 0$$

$$X^T X \hat{\beta} = X^T y$$

$$(X^T X)^{-1} X^T X \hat{\beta} = (X^T X)^{-1} X^T y$$

$$\therefore \hat{\beta} = (X^T X)^{-1} X^T y$$

6.2.2 Theorem
Suppose that $X^T = \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ x_1 & x_2 & x_3 & \cdots & x_n \end{pmatrix}$ and $y^T = (y_1 \;\; y_2 \;\; \cdots \;\; y_n)$, i.e. the simple linear regression model in matrix form. Then

(a) $\sigma^2(X^T X)^{-1} = \begin{pmatrix} Var(\hat{\beta}_0) & Cov(\hat{\beta}_0, \hat{\beta}_1) \\ Cov(\hat{\beta}_0, \hat{\beta}_1) & Var(\hat{\beta}_1) \end{pmatrix}$

(b) $(X^T X)^{-1} X^T y = \hat{\beta} = \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{pmatrix}$

Proof
$$X^T y = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{pmatrix}\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_i y_i \end{pmatrix}$$

$$X^T X = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{pmatrix}\begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix} = \begin{pmatrix} n & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \end{pmatrix}$$

$$|X^T X| = n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2$$

$$(X^T X)^{-1} = \frac{1}{n\sum_{i=1}^{n} x_i^2 - (\sum_{i=1}^{n} x_i)^2}\begin{pmatrix} \sum_{i=1}^{n} x_i^2 & -\sum_{i=1}^{n} x_i \\ -\sum_{i=1}^{n} x_i & n \end{pmatrix}$$

(a) Multiplying by $\sigma^2$ and using $n\sum_{i=1}^{n} x_i^2 - (\sum_{i=1}^{n} x_i)^2 = n\sum_{i=1}^{n}(x_i - \bar{X})^2$,

$$\sigma^2 (X^T X)^{-1} = \begin{pmatrix} \dfrac{\sigma^2\sum_{i=1}^{n} x_i^2}{n\sum_{i=1}^{n}(x_i - \bar{X})^2} & \dfrac{-\sigma^2\bar{X}}{\sum_{i=1}^{n}(x_i - \bar{X})^2} \\ \dfrac{-\sigma^2\bar{X}}{\sum_{i=1}^{n}(x_i - \bar{X})^2} & \dfrac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar{X})^2} \end{pmatrix}$$

which, by the properties in 6.1.4, is

$$\therefore \sigma^2 (X^T X)^{-1} = \begin{pmatrix} Var(\hat{\beta}_0) & Cov(\hat{\beta}_0, \hat{\beta}_1) \\ Cov(\hat{\beta}_0, \hat{\beta}_1) & Var(\hat{\beta}_1) \end{pmatrix}$$

(b)

$$(X^T X)^{-1} X^T y = \frac{1}{n\sum_{i=1}^{n} x_i^2 - (\sum_{i=1}^{n} x_i)^2}\begin{pmatrix} \sum_{i=1}^{n} x_i^2 & -\sum_{i=1}^{n} x_i \\ -\sum_{i=1}^{n} x_i & n \end{pmatrix}\begin{pmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_i y_i \end{pmatrix} = \begin{pmatrix} \dfrac{\sum_{i=1}^{n} y_i \sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} x_i y_i}{n\sum_{i=1}^{n} x_i^2 - (\sum_{i=1}^{n} x_i)^2} \\ \dfrac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - (\sum_{i=1}^{n} x_i)^2} \end{pmatrix}$$

The second element is

$$\frac{\sum_{i=1}^{n} x_i y_i - \frac{\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n}}{\sum_{i=1}^{n} x_i^2 - \frac{(\sum_{i=1}^{n} x_i)^2}{n}} = \hat{\beta}_1$$

For the first element, adding and subtracting $(\sum_{i=1}^{n} x_i)^2\sum_{i=1}^{n} y_i$ in the numerator and dividing numerator and denominator by $n$ gives

$$\frac{\frac{\sum_{i=1}^{n} y_i}{n}\left(\sum_{i=1}^{n} x_i^2 - \frac{(\sum_{i=1}^{n} x_i)^2}{n}\right) - \frac{\sum_{i=1}^{n} x_i}{n}\left(\sum_{i=1}^{n} x_i y_i - \frac{\sum_{i=1}^{n} x_i\sum_{i=1}^{n} y_i}{n}\right)}{\sum_{i=1}^{n} x_i^2 - \frac{(\sum_{i=1}^{n} x_i)^2}{n}} = \bar{y} - \hat{\beta}_1\bar{X} = \hat{\beta}_0$$

$$\therefore (X^T X)^{-1} X^T y = \hat{\beta} = \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{pmatrix}$$

6.2.3 Theorem
If $Y = X\beta + \varepsilon$ with $\varepsilon_i \sim N(0, \sigma^2)$, $r = Y - \hat{Y}$ and (Hat Matrix) $H = X(X^T X)^{-1}X^T$, then
(a) $\hat{Y} = HY$
(b) $H^T = H$
(c) $H^2 = H$
(d) $r = (I - H)Y$
(e) $\hat{\beta} \sim N(\beta, \sigma^2(X^T X)^{-1})$
(f) $\hat{Y} \sim N(X\beta, \sigma^2 H)$
(g) $r \sim N(0, \sigma^2(I - H))$
(h) $E(r^T r) = \sigma^2(n - p - 1)$

Proofs
(a) $\hat{Y} = HY$
Since $H = X(X^T X)^{-1}X^T$ and $\hat{\beta} = (X^T X)^{-1}X^T Y$, then
$\hat{Y} = X\hat{\beta} = X(X^T X)^{-1}X^T Y = \big(X(X^T X)^{-1}X^T\big)Y$
$\therefore \hat{Y} = HY$
(b) 𝐻 𝑇 = 𝐻
𝐻 𝑇 = (𝑋(𝑋 𝑇 𝑋)−1 𝑋 𝑇 )𝑇
𝐻 𝑇 = (𝑋 𝑇 (𝑋 𝑇 𝑋)−1 𝑋)

𝐻 𝑇 = (𝑋(𝑋 𝑇 𝑋)−1 𝑋 𝑇 ) = 𝐻
(c) 𝐻 2 = 𝐻
𝐻 2 = (𝑋(𝑋 𝑇 𝑋)−1 𝑋 𝑇 )2 = (𝑋(𝑋 𝑇 𝑋)−1 𝑋 𝑇 )(𝑋(𝑋 𝑇 𝑋)−1 𝑋 𝑇 )
𝐻 2 = 𝑋(𝑋 𝑇 𝑋)−1 𝑋 𝑇 (𝑋 𝑇 𝑋)−1 𝑋 𝑇 𝑋 and (𝑋 𝑇 𝑋)−1 𝑋 𝑇 𝑋 = 𝑰
𝐻 2 = 𝑋(𝑋 𝑇 𝑋)−1 𝑋 𝑇 𝐼 = 𝑋(𝑋 𝑇 𝑋)−1 𝑋 𝑇 = 𝑯
(d) 𝑟 = (1 − 𝐻)𝑌
𝑟 = (𝑌 − 𝑌̂) where 𝑌̂ = (𝑋(𝑋 𝑇 𝑋)−1 𝑋 𝑇 )𝑌
𝑟 = (𝑌 − (𝑋(𝑋 𝑇 𝑋)−1 𝑋 𝑇 )𝑌). Factorising 𝑌 we have
𝑟 = (1 − (𝑋(𝑋 𝑇 𝑋)−1 𝑋 𝑇 ))𝑌. 𝑋(𝑋 𝑇 𝑋)−1 𝑋 𝑇 = 𝐻
𝑟 = (1 − 𝐻)𝑌
Thus, 𝒓 = (𝟏 − 𝑯)𝒀
(e) 𝛽̂ ~𝑁(𝛽, 𝜎 2 (𝑋 𝑇 𝑋)−1 ) where 𝛽̂ = (𝑋 𝑇 𝑋)−1 𝑋 𝑇 𝑌
𝛽̂ follows the normal distribution with mean 𝛽 and variance 𝜎 2 (𝑋 𝑇 𝑋)−1
Mean
𝐸(𝛽̂ ) = 𝐸((𝑋 𝑇 𝑋)−1 𝑋 𝑇 𝑌)
𝐸(𝛽̂ ) = (𝑋 𝑇 𝑋)−1 𝑋 𝑇 𝐸(𝑌). Where 𝑌 = 𝑋𝛽 + 𝜀
𝐸(𝛽̂ ) = (𝑋 𝑇 𝑋)−1 𝑋 𝑇 𝐸(𝑋𝛽 + 𝜀 )
𝐸(𝛽̂ ) = (𝑋 𝑇 𝑋)−1 𝑋 𝑇 𝑋𝛽 and (𝑋 𝑇 𝑋)−1 𝑋 𝑇 𝑋 = 𝑰
𝐸(𝛽̂ ) = 𝐼𝛽 = 𝜷
Variance
𝑉𝑎𝑟(𝛽̂ ) = 𝑉𝑎𝑟((𝑋 𝑇 𝑋)−1 𝑋 𝑇 𝑌)
𝑉𝑎𝑟(𝛽̂ ) = ((𝑋 𝑇 𝑋)−1 𝑋 𝑇 )2 𝑉𝑎𝑟(𝑌)
𝑉𝑎𝑟(𝛽̂ ) = (𝑋 𝑇 𝑋)−1 𝑋 𝑇 ((𝑋 𝑇 𝑋)−1 𝑋 𝑇 )𝑇 𝜎 2
𝑉𝑎𝑟(𝛽̂ ) = (𝑋 𝑇 𝑋)−1 𝑋 𝑇 𝑋(𝑋 𝑇 𝑋)−1 𝜎 2 and 𝑋 𝑇 𝑋(𝑋 𝑇 𝑋)−1 = 𝑰
𝑉𝑎𝑟(𝛽̂ ) = (𝑋 𝑇 𝑋)−1 𝐼𝜎 2 = 𝝈𝟐 (𝑿𝑻 𝑿)−𝟏

̂ ~𝑵(𝜷, 𝝈𝟐 (𝑿𝑻 𝑿)−𝟏 )


∴𝜷

(f) 𝑌̂~𝑁(𝑋𝛽, 𝜎 2 𝐻)
𝑌 = 𝑋𝛽 + 𝜀
𝑌̂ = 𝐻𝑌
Mean

𝐸(𝑌̂) = 𝐸(𝐻𝑌)
𝐸(𝑌̂) = 𝐸(𝑋(𝑋 𝑇 𝑋)−1 𝑋 𝑇 𝑌)
𝐸(𝑌̂) = 𝐸(𝑋 𝑇 𝑋(𝑋 𝑇 𝑋)−1 𝑌). 𝑋 𝑇 𝑋(𝑋 𝑇 𝑋)−1 = 𝐼
𝐸(𝑌̂) = 𝐸(𝐼𝑌) = 𝐸(𝑋𝛽 + 𝜀) = 𝑋𝛽 + 0 = 𝑿𝜷
Variance
𝑉𝑎𝑟(𝑌̂) = 𝑉𝑎𝑟(𝐻𝑌)
𝑉𝑎𝑟(𝑌̂) = 𝑉𝑎𝑟(𝑋(𝑋 𝑇 𝑋)−1 𝑋 𝑇 𝑌)
𝑉𝑎𝑟(𝑌̂) = 𝑉𝑎𝑟(𝑋(𝑋 𝑇 𝑋)−1 𝑋 𝑇 𝑌)
𝑉𝑎𝑟(𝑌̂) = 𝑋(𝑋 𝑇 𝑋)−1 𝑋 𝑇 𝑉𝑎𝑟(𝑌)(𝑋(𝑋 𝑇 𝑋)−1 𝑋 𝑇 )𝑇
𝑉𝑎𝑟(𝑌̂) = 𝑋(𝑋 𝑇 𝑋)−1 𝑋 𝑇 𝑋(𝑋 𝑇 𝑋)−1 𝑋 𝑇 𝜎 2 . 𝑋 𝑇 𝑋(𝑋 𝑇 𝑋)−1 = 𝐼
𝑉𝑎𝑟(𝑌̂) = 𝑋(𝑋 𝑇 𝑋)−1 𝐼𝑋 𝑇 𝜎 2
𝑉𝑎𝑟(𝑌̂) = 𝑋(𝑋 𝑇 𝑋)−1 𝑋 𝑇 𝜎 2 and 𝑋(𝑋 𝑇 𝑋)−1 𝑋 𝑇 = 𝐻
𝑉𝑎𝑟(𝑌̂) = 𝝈𝟐 𝑯
̂ ~𝑵(𝑿𝜷, 𝝈𝟐 𝑯)
Thus, 𝒀

(g) 𝑟~𝑁(0, 𝜎 2 (1 − 𝐻))


Mean
𝑟 = (𝑌 − 𝑌̂)
𝑟 = (𝑌 − 𝐻𝑌)
𝑟 = (1 − 𝐻)𝑌
𝐸(𝑟) = (1 − 𝐻)𝐸(𝑌)
𝐸(𝑟) = (1 − 𝐻)𝐸(𝑋𝛽 + 𝜀)
𝐸(𝑟) = (1 − 𝐻)𝑋𝛽 = (𝑋𝛽 − 𝑋𝛽𝐻) where 𝑋 𝑇 𝑋(𝑋 𝑇 𝑋)−1 = 𝐼
𝑋 and 𝛽 are matrices while 𝜎 2 is a constant number.
𝐸(𝑟) = (𝑋𝛽 − 𝑋𝛽) = 𝟎
Variance
𝑉𝑎𝑟(𝑟) = 𝑉𝑎𝑟((1 − 𝐻)𝑌) = (1 − 𝐻)𝑉𝑎𝑟(𝑌)(1 − 𝐻)𝑇
𝑉𝑎𝑟(𝑟) = (1 − 𝐻)(1 − 𝐻)𝑇 𝜎 2 . (1 − 𝐻) is idempotent.
That is, (1 − 𝐻)(1 − 𝐻)𝑇 = (𝟏 − 𝑯)
𝑉𝑎𝑟(𝑟) = (𝟏 − 𝑯)𝝈𝟐

∴ 𝒓~𝑵 (𝟎, 𝝈𝟐 (𝟏 − 𝑯))

(h) 𝐸(𝑟 𝑇 𝑟) = 𝜎 2 (𝑛 − 𝑝 − 1)
𝑇
𝐸(𝑟 𝑇 𝑟) = 𝐸 {((𝑌 − 𝐻𝑌)) ((𝑌 − 𝐻𝑌))}
𝑇
𝐸(𝑟 𝑇 𝑟) = 𝐸 {((1 − 𝐻)) (𝑌)𝑇 𝑌((1 − 𝐻))}
𝑇
𝐸(𝑟 𝑇 𝑟) = 𝐸 {((1 − 𝐻)) (1 − 𝐻)(𝑌)𝑇 𝑌}

(1 − 𝐻) is idempotent. That is, (1 − 𝐻)(1 − 𝐻)𝑇 = (𝟏 − 𝑯)


𝐸(𝑟 𝑇 𝑟) = 𝐸{(1 − 𝐻)(𝑌)𝑇 𝑌}
𝐸(𝑟 𝑇 𝑟) = {(1 − 𝐻)𝐸(𝑌)𝑇 𝑌}
𝐸(𝑟 𝑇 𝑟) = {(1 − 𝐻)𝐸(𝑋𝛽 + 𝜀)𝑇 (𝑋𝛽 + 𝜀)}
𝐸(𝑟 𝑇 𝑟) = {(1 − 𝐻)𝐸(𝑋 𝑇 𝛽 𝑇 𝑋𝛽 + 𝑋 𝑇 𝛽 𝑇 𝜀 + 𝑋𝛽𝜀 + 𝜀 𝑇 𝜀)}
𝐸(𝑟 𝑇 𝑟) = {(1 − 𝐻)(𝑋 𝑇 𝛽 𝑇 𝑋𝛽 + 𝑋 𝑇 𝛽 𝑇 𝐸(𝜀) + 𝑋𝛽𝐸(𝜀) + 𝐸(𝜀 𝑇 𝜀))}
𝐸(𝑟 𝑇 𝑟) = {(1 − 𝐻)(𝑋 𝑇 𝛽 𝑇 𝑋𝛽 + 𝑋 𝑇 𝛽 𝑇 × 0 + 𝑋𝛽 × 0 + 𝐸(𝜀 𝑇 𝜀))}
𝐸(𝑟 𝑇 𝑟) = {(1 − 𝐻)(𝑋 𝑇 𝛽 𝑇 𝑋𝛽 + 𝐸(𝜀 𝑇 𝜀))}
𝐸(𝑟 𝑇 𝑟) = (1 − 𝐻)(𝑋 𝑇 𝛽 𝑇 𝑋𝛽) + (1 − 𝐻)(𝐸(𝜀 𝑇 𝜀))
Note: (1 − 𝐻)(𝑋 𝑇 𝛽 𝑇 𝑋𝛽) = 0 since 𝐻(𝑋 𝑇 𝛽 𝑇 𝑋𝛽) = 𝑋 𝑇 𝛽 𝑇 𝑋𝛽 and (𝐸(𝜀 𝑇 𝜀)) = 𝜎 2
𝐸(𝑟 𝑇 𝑟) = (1 − 𝐻)𝜎 2
Now we need to get the 𝑡𝑟(1 − 𝐻)

𝑡𝑟(1) = 𝑛 and 𝑡𝑟(𝐻) = 𝑡𝑟(𝑋 𝑇 (𝑋 𝑇 𝑋)−1 𝑋) = 𝑝 + 1) where 𝑝 is the number of


independent variables.
𝐸(𝑟 𝑇 𝑟) = (1 − 𝐻)𝜎 2
𝐸(𝑟 𝑇 𝑟) = (𝑛 − (𝑝 + 1))𝜎 2
𝑬(𝒓𝑻 𝒓) = (𝒏 − 𝒑 − 𝟏)𝝈𝟐 .
(𝒏 − (𝒑 + 𝟏)) is the degrees of freedom for error in multiple regression
analysis.

Example 6.2.1
The average monthly electric power consumption (Y) at a certain manufacturing
plant is considered to be dependent on the average temperature (𝑥1 ) and the
number of working days in a month (𝑥2 ). Consider the one year monthly data given
in the table below;

𝑥1 20 26 41 55 60 67 75 79 70 55 45 33
𝑥2 23 21 24 25 24 26 25 25 24 25 25 23
𝑦 210 206 260 244 271 285 270 265 234 241 258 230

Determine the least-square estimates of the associated linear regression


coefficients and fit the model.
Solutions
1 20 23 210
1 1 . . . 1 1. 26 21 206
𝑇 . . .
𝑋 = (20 26 . . . 33), 𝑋 = ,𝑌=
.. .. .. ..
23 21 . . . 23
(1 33 23) (230)

Procedures to get the matrix 𝑿𝑻 𝑿 and 𝑿𝑻 𝒀:


Using 𝑓𝑥 − 991𝑀𝑆 calculator to get the 3 × 3 matrix procedures:
Mode-Mode-REG-Lin. Enter the set of numbers (𝑋, 𝑌) then press 𝑀 + for each
entered set of numbers. After that, shift-1 to get 𝑛, ∑𝑛𝑖=1 𝑥𝑖1 𝑎𝑛𝑑 ∑𝑛𝑖=1 𝑥𝑖1
2
then next to
get ∑𝑛𝑖=1 𝑥𝑖1 𝑥𝑖2 , ∑𝑛𝑖=1 𝑥𝑖2 , ∑𝑛𝑖=1 𝑥𝑖2,
2

𝑛, = 12, ∑𝑛𝑖=1 𝑥𝑖1 = 626, ∑𝑛𝑖=1 𝑥𝑖1


2
= 36776 , ∑𝑛𝑖=1 𝑥𝑖1 𝑥𝑖2 = 15336, ∑𝑛𝑖=1 𝑥𝑖2 = 290,
∑𝑛𝑖=1 𝑥𝑖2
2
= 7028

1 20 23
1 1 . . . 1 1. 26 21 12 626 290
. .
𝑋 𝑇 𝑋 = (20 26 . . . 33) = (626 36776 15336)
.. .. ..
23 21 . . . 23 290 15336 7028
(1 33 23)

210
1 1 . . . 1 206 2974
.
𝑋 𝑇 𝑌 = (20 26 . . . 33) = (159011)
..
23 21 . . . 23 72166
( 230 )
Procedures to get the inverse of the matrix 𝑿𝑻 𝑿:
Using 𝑓𝑥 − 991𝑀𝑆 calculator to get the 3 × 3 inverse matrix procedures:
Mode-Mode-Mode-MAT. After that, shift-4- Dimension (1) then choose (3 for 3 ×
3 matrix )-then press 3= again 3=. Enter the entries of the matrix 𝑋 𝑇 𝑋 then
press = for each entered entry. After that press shift-4-MAT(3) then choose C (3)
representing 3 × 3 matrix-𝑥 −1 = then next to get the 3 × 3 inverse matrix.

51.16997994 0.105362232 −2.341367299
(𝑋 𝑇 𝑋)−1 = ( 0.105362232 0.0005189824426 −0.005480102741)
−2.341367299 −0.005480102741 0.108713627

(𝑋 𝑇 𝑋)−1 𝑋 𝑇 𝑌
51.16997994 0.105362232 −2.341367299 2974
= ( 0.105362232 0.0005189824426 −0.005480102741) (159011)
−2.341367299 −0.005480102741 0.108713627 72166
−33.838285
𝑇 −1 𝑇
(𝑋 𝑋) 𝑋 𝑌 = (0.394100741)
10.8046419
−33.838285 𝛽0
∴ (𝑿𝑻 −𝟏 𝑻 ̂
𝑿) 𝑿 𝒀 = 𝜷 = (0.394100741) = (𝛽1 )
10.8046419 𝛽2
𝛽̂0 = −33.838285, 𝛽̂1 = 0.394100741, 𝛽̂2 = 10.8046419

Model
𝑌 = −33.838285 + 0.394100741𝑥1 + 10.8046419𝑥2 + 𝜀𝑖

𝑦̂ = −33.838285 + 0.394100741𝑥1 + 10.8046419𝑥2
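The same coefficients can be obtained without the calculator by building the design matrix and solving the normal equations in Python (a sketch assuming the numpy library is available):

import numpy as np

# Monthly data from Example 6.2.1: temperature (x1), working days (x2), power use (y)
x1 = np.array([20, 26, 41, 55, 60, 67, 75, 79, 70, 55, 45, 33], dtype=float)
x2 = np.array([23, 21, 24, 25, 24, 26, 25, 25, 24, 25, 25, 23], dtype=float)
y  = np.array([210, 206, 260, 244, 271, 285, 270, 265, 234, 241, 258, 230], dtype=float)

X = np.column_stack([np.ones_like(x1), x1, x2])   # design matrix with intercept column
beta = np.linalg.solve(X.T @ X, X.T @ y)          # (X^T X)^-1 X^T y
print(beta)        # approximately [-33.84, 0.394, 10.80]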

Example 6.2.2
Fit a polynomial 𝑌 = 𝛽0 + 𝛽1 𝑥 + 𝛽2 𝑥 2 + 𝜀 to the following data;
𝑋 1 2 3 4 5 6 7 8 9 10
𝑌 20.6 30.8 55 71.4 97.3 131.8 156.3 197.3 238.7 291.7

Solution
𝑥 1 2 3 4 5 6 7 8 9 10
𝑥2 1 4 9 16 25 36 49 64 81 100
𝑦 20.6 30.8 55 71.4 97.3 131.8 156.3 197.3 238.7 291.7

𝑛, = 10, ∑𝑛𝑖=1 𝑥𝑖1 = 55, ∑𝑛𝑖=1 𝑥𝑖1


2
= 385 , ∑𝑛𝑖=1 𝑥𝑖1 𝑥𝑖2 = 3025, ∑𝑛𝑖=1 𝑥𝑖2 = 385,
∑𝑛𝑖=1 𝑥𝑖2
2
= 25333
1 1 1
1 1 . . . 1 1. 2. 4. 10 55 385
𝑇
𝑋 𝑋 = (1 2 . . . 10 ) = ( 55 385 3025 )
.. .. ..
1 4 . . . 100 385 3025 25333
(1 10 100)

20.6
1 1 . . . 1 30.8 1290.9
.
𝑋 𝑇 𝑌 = (1 2 . . . 10 ) = ( 9547.9 )
..
1 4 . . . 100 77749.1
(291.7)
1.383333333 −0.525 0.041666666
(𝑋 𝑇 𝑋)−1 = ( −0.525 0.241287878 −0.020833333 )
0.041666666 −0.020833333 0.001893939394
1.383333333 −0.525 0.041666666 1290.9
(𝑋 𝑇 𝑋)−1 𝑋 𝑇 𝑌 = ( −0.525 0.241287878 −0.020833333 ) ( 9547.9 )
0.041666666 −0.020833333 0.001893939394 77749.1
12.64328107
𝑇 −1 𝑇
(𝑋 𝑋) 𝑋 𝑌 = ( 6.29713961 )
2.125002326
12.64328107 𝛽0
𝑻 −𝟏 𝑻 ̂
∴ (𝑿 𝑿) 𝑿 𝒀 = 𝜷 = ( 6.29713961 ) = (𝛽1 )
2.125002326 𝛽2
𝛽̂0 = 12.64328107, 𝛽̂1 = 6.29713961, 𝛽̂2 = 2.125002326
Thus, the estimated quadratic equation is
𝑌 = 12.64328107 + 6.29713961𝑥 + 2.125002326𝑥 2 + 𝜀𝑖

𝑦̂ = 12.64328107 + 6.29713961𝑥 + 2.125002326𝑥 2

6.3 F-Test of all Slopes


The F-test for linear regression tests whether any of the independent variables in a
multiple linear regression model are significant.
Calculations

$$SS_T = Y^T Y - \frac{(\sum_{i=1}^{n} y_i)^2}{n}, \qquad SS_{Reg} = \hat{\beta}^T X^T Y - \frac{(\sum_{i=1}^{n} y_i)^2}{n}, \qquad SS_{Res} = SS_T - SS_{Reg}$$
Hypothesis
𝐻0 : 𝛽1 = 𝛽2 = ⋯ = 𝛽𝑝−1 = 0 (there is no relationship)
𝐻1 : 𝛽𝑖 ≠ 0, for at least one value of 𝑖 (there is a relationship)

ANOVA Table

Source            | SS     | df     | MS             | F*
Regression        | SSReg  | k      | SSReg/k        | (SSReg/k)/(SSRes/(n−k−1))
Error (Residual)  | SSRes  | n−k−1  | SSRes/(n−k−1)  |
Total             | SST    | n−1    |                |

Where 𝑘 is the number of independent variables (so the number of estimated parameters is 𝑝 = 𝑘 + 1 and 𝑛 − 𝑘 − 1 = 𝑛 − 𝑝) and 𝑛 is the sample size.

Decision Rule

Reject $H_0$ if $F^* > f^{\alpha}_{k,\; n-k-1}$

Example 6.3.1

The table below shows pull strength of a wire bond in a semiconductor manufacturing
process, wire length and die height.

Observations Pull Wire Die Number 𝑦 𝑥1 𝑥2


Strength Strength Height
Number
𝑦
𝑥1 𝑥2

1 9.95 2 50 14 11.66 2 360

2 24.45 8 110 15 21.65 4 205

3 31.75 11 120 16 17.89 4 400

4 35.00 10 550 17 69.00 20 600

5 25.02 8 295 18 10.30 1 585

6 16.86 4 200 19 34.93 10 540

7 14.38 2 375 20 46.59 15 250

8 9.60 2 52 21 44.88 15 290

9 24.35 9 100 22 54.12 16 510

10 27.50 8 300 23 56.63 17 590

11 17.08 4 412 24 22.13 6 100

12 37.00 11 400 25 21.15 5 400

13 41.95 12 500

Test the hypothesis at 𝛼 = 0.05 that pull strength is linearly related to either wire
length or die height, or to both.

Solutions

𝑛, = 25, ∑25 25 2 25 25
𝑖=1 𝑥𝑖1 = 206, ∑𝑖=1 𝑥𝑖1 = 2396 , ∑𝑖=1 𝑥𝑖1 𝑥𝑖2 = 77177, ∑𝑖=1 𝑥𝑖2 =8294,

∑25 2 25 25 2
𝑖=1 𝑥𝑖2 = 3531848, ∑𝑖=1 𝑦𝑖 = 725.82, ∑𝑖=1 𝑦𝑖 = 27178.5316

1 2 50
1 1 . . . 1 1. 3. 110 25 206 8294
𝑇 .
𝑋 𝑋=(2 8 . . . 5 ) = ( 206 2396 77177 )
.. .. ..
50 110 . . . 400 8294 77177 3531848
(1 5 400)

9.95
1 1 .
. . 1 24.45 725.82
𝑇 .
𝑋 𝑌=(2 8 . . 5 )
. = ( 8008.47 )
..
50 110 . . . 400 274816.71
( 21.15 )

0.214652616 −0.007490914218 −0.0003403890868


(𝑋 𝑇 𝑋) −1
= ( −0.007490914218 0.001670763131 −0.00001891781403)
−0.0003403890868 −0.00001891781403 0.000001495876159

(𝑋 𝑇 𝑋)−1 𝑋 𝑇 𝑌
0.214652616 −0.007490914218 −0.0003403890868 725.82
= ( −0.007490914218 0.001670763131 −0.00001891781403 ) ( 8008.47 )
−0.0003403890868 −0.00001891781403 0.000001495876159 274816.71

2.263791003
= (2.744269642)
0.012527811
𝟐. 𝟐𝟔𝟑𝟕𝟗𝟏𝟎𝟎𝟑
̂ = (𝟐. 𝟕𝟒𝟒𝟐𝟔𝟗𝟔𝟒𝟐)
∴ (𝑿𝑻 𝑿)−𝟏 𝑿𝑻 𝒀 = 𝜷
𝟎. 𝟎𝟏𝟐𝟓𝟐𝟔𝟓𝟏𝟖

(∑𝑛𝑖=1 𝑦𝑖 )2
𝑆𝑆𝑇 = 𝑌 𝑇 𝑌 −
𝑛
(725.82)2
𝑆𝑆𝑇 = 27178.5316 − = 𝟔𝟏𝟎𝟓. 𝟗𝟒𝟓
25
(∑𝑛𝑖=1 𝑦𝑖 )2
𝑆𝑆𝑅𝑒𝑔 = 𝛽̂ 𝑇 𝑋 𝑇 𝑌 −
𝑛
725.82 (725.82)2
𝑆𝑆𝑅𝑒𝑔 = (2.263791003 2.744269642 )
0.012527811 ( 8008.47 ) −
25
274816.71

(725.82)2
𝑆𝑆𝑅𝑒𝑔 = 27063.35769 − = 𝟓𝟗𝟗𝟎. 𝟕𝟕𝟏
25
𝑆𝑆𝑅𝑒𝑠 = 𝑆𝑆𝑇 − 𝑆𝑆𝑅𝑒𝑔= 6105.945 − 5990.771 = 𝟏𝟏𝟓. 𝟏𝟕𝟒

ANOVA Table
Source 𝑆𝑆 𝑑𝑓 𝑀𝑆 𝐹∗
Regression 5990.771 2 2995.3856 𝟓𝟕𝟐. 𝟐

Error or
Residue 115.174 22 5.235

Total 6105.945 24

Hypothesis
$H_0: \beta_1 = \beta_2 = 0$ (Pull strength is not linearly related to either wire length or die height).

$H_1: \beta_j \neq 0$ for at least one $j$ (Pull strength is linearly related to wire length or die height, or to both).

Decision Rule

Reject $H_0$ if $F^* > f^{0.05}_{2,\,22} = 3.4434$

Conclusion

Since $F^* = 572.2 > f^{0.05}_{2,\,22} = 3.4434$, we reject $H_0$ at the 5% level of significance and
conclude that pull strength is linearly related to wire length or die height, or to both.
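The ANOVA quantities and the F statistic can be reproduced from the summary matrices with the following Python sketch (assuming the numpy library is available):

import numpy as np

# Wire bond pull strength data (Example 6.3.1), using the summary matrices above
n = 25
XtX = np.array([[25.,   206.,    8294.],
                [206.,  2396.,   77177.],
                [8294., 77177.,  3531848.]])
XtY = np.array([725.82, 8008.47, 274816.71])
sum_y, sum_y2 = 725.82, 27178.5316

beta = np.linalg.solve(XtX, XtY)

SST = sum_y2 - sum_y**2 / n
SSReg = beta @ XtY - sum_y**2 / n          # beta^T X^T Y - (sum y)^2 / n
SSRes = SST - SSReg

k = 2                                      # number of independent variables
F = (SSReg / k) / (SSRes / (n - k - 1))
print(SSReg, SSRes, F)                     # about 5990.8, 115.2, 572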

6.4 Tests of Individual Regression Coefficients and Subsets of
Coefficients
We are frequently interested in testing hypotheses on the individual regression
coefficients. Such tests would be useful in determining the potential value of each of
the regressor variables in the regression model.

Hypothesis
𝐻0 : 𝛽𝑗 = 0
𝐻1 : 𝛽𝑗 ≠ 0

Calculations

$$t = \frac{\hat{\beta}_j - \beta_j}{SE(\hat{\beta}_j)}, \qquad SE(\hat{\beta}_j) = \sqrt{MSE\; C_{jj}}$$

where $C_{jj}$ is the diagonal element of $(X^T X)^{-1}$ corresponding to $\hat{\beta}_j$.

Decision Rule
Reject $H_0$ if $|t| > t_{\frac{\alpha}{2},\, n-k-1}$

Example 6.4.1
Consider the wire bond pull strength data from example 6.3.1 and suppose that we
want to test the hypothesis that the regression coefficient for 𝑥2 (die height) is zero.

Solution

ANOVA Table
Source 𝑆𝑆 𝑑𝑓 𝑀𝑆 𝐹∗
Regression 5990.771 2 2995.3856 𝟓𝟕𝟐. 𝟐

Error or
Residue 115.174 22 5.235

Total 6105.945 24

0.214652616 −0.007490914218 −0.0003403890868


(𝑋 𝑇 𝑋)−1 = ( −0.007490914218 0.001670763131 −0.00001891781403)
−0.0003403890868 −0.00001891781403 0.000001495876159

2.263791003
𝛽̂ = (2.744269642)
0.012526518

Hypothesis
𝐻0 : 𝛽2 = 0 (The variable 𝑥2 (die height) does not contribute significantly to the model).
𝐻1 : 𝛽2 ≠ 0 (The variable 𝑥2 (die height) contributes significantly to the model).

Calculations
̂2 −𝛽2
𝛽
𝑡= ̂𝑗 )
𝑆𝐸(𝛽

̂2 −𝛽2
𝛽
𝑡= .
√𝑀𝑆𝐸(𝐶22 )

The main diagonal element of the (𝑋 𝑇 𝑋)−1 matrix corresponding to 𝛽̂2 is


𝐶22 = 0.000001495876

𝐶22 = 0.0000015
0.012526518−0
𝑡= = 𝟒. 𝟒𝟕
√0.0000015(5.235)

Decision Rule
Reject 𝐻0 if |𝑡| > 𝑡0.975,22 = 2.07

Conclusion

Since |𝑡| = 4.47 > 𝑡0.975,22 = 2.07, we reject 𝐻0 at the 5% level of significance and conclude
that the variable 𝑥2 (die height) contributes significantly to the model.
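The t statistic can be reproduced as follows (a Python sketch assuming the numpy and scipy libraries are available):

import numpy as np
from scipy import stats

# Example 6.4.1: t-test for beta_2 (die height) in the wire bond model
beta2 = 0.012526518
C22 = 0.000001495876          # diagonal element of (X^T X)^-1 for beta_2
MSE = 5.235
n, k = 25, 2

t = beta2 / np.sqrt(MSE * C22)
t_crit = stats.t.ppf(0.975, df=n - k - 1)
print(t, t_crit)              # about 4.48 and 2.07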

Example 6.4.2
A collector of antique grandfather clocks knows that the price received for the clocks
increases linearly with the age of the clocks. Moreover, the collector hypothesizes
that the auction price of the clocks will increase linearly as the number of bidders
increases. A sample of 32 auction prices of grandfather clocks, along with their age
and the number of bidders, is summarised below;
𝑛 = 32, ∑32 32 2 32 32
𝑖=1 𝑦𝑖 = 42460 ∑𝑖=1 𝑦𝑖 = 61138902, ,∑𝑖=1 𝑥𝑖1 = 4638, ∑𝑖=1 𝑥𝑖2 =305

∑32 32 2 32 2
𝑖=1 𝑥𝑖1 𝑥𝑖2 =43594, ∑𝑖=1 𝑥𝑖1 = 695486, ∑𝑖=1 𝑥𝑖2 =3157,

∑32 32
𝑖=1 𝑥𝑖1 𝑦𝑖 = 6397869, ∑𝑖=1 𝑥𝑖2 𝑦𝑖 =418386.

Where 𝑥1 is the age, 𝑥2 is the number of bidders and 𝑦 is the auction price.
(a) Test the hypothesis that the mean auction price of a clock increases as the
number of bidders increases. Use 𝛼 = 0.05.
(b) Form a 90% confidence interval for 𝛽1 and interpret the result.

Solutions
32 4638 305
𝑋 𝑇 𝑋 = (4638 695486 43594)
305 43594 3157

42460
𝑋 𝑇 𝑌 = (6397869)
418386

1.695446553 −0.007730243607 −0.057053835


(𝑋 𝑇 𝑋) −1
= (−0.007730243607 0.00004593937769 0.0001124621695)
−0.057053835 0.0001124621695 0.004275813752

(𝑋 𝑇 𝑋)−1 𝑋 𝑇 𝑌
1.695446553 −0.007730243607 −0.057053835 42460
= (−0.007730243607 0.00004593937769 0.0001124621695 6397869)
) (
−0.057053835 0.0001124621695 0.004275813752 418386

−1338.951106
= ( 12.7405741 )
85.95300626
−𝟏𝟑𝟑𝟖. 𝟗𝟓𝟏𝟏𝟎𝟔 𝛽̂0
̂ = ( 𝟏𝟐. 𝟕𝟒𝟎𝟓𝟕𝟒𝟏 )=(𝛽̂1 )
∴ (𝑿𝑻 𝑿)−𝟏 𝑿𝑻 𝒀 = 𝜷
𝟖𝟓. 𝟗𝟓𝟑𝟎𝟎𝟔𝟐𝟔 𝛽̂2

(∑𝑛𝑖=1 𝑦𝑖 )2
𝑇
𝑆𝑆𝑇 = 𝑌 𝑌 −
𝑛
(42460)2
𝑆𝑆𝑇 = 61138902 − = 𝟒𝟕𝟗𝟗𝟕𝟖𝟗. 𝟓
32
(∑𝑛𝑖=1 𝑦𝑖 )2
𝑆𝑆𝑅𝑒𝑔 = 𝛽̂ 𝑇 𝑋 𝑇 𝑌 −
𝑛
42460 (42460)2
𝑆𝑆𝑅𝑒𝑔 = (−1338.951106 12.7405741 ) (
85.95300626 6397869 ) −
32
418386
(42460)2
𝑆𝑆𝑅𝑒𝑔 = 60622194.59 − = 𝟒𝟐𝟖𝟑𝟎𝟖𝟐. 𝟎𝟗
32
𝑆𝑆𝑅𝑒𝑠 = 𝑆𝑆𝑇 − 𝑆𝑆𝑅𝑒𝑔=4799789.5 − 4283082.09 =516707.41

ANOVA Table
Source 𝑆𝑆 𝑑𝑓 𝑀𝑆 𝐹∗
Regression 4283082.09 2 2141541.045 𝟏𝟐𝟎. 𝟏𝟗

Error or
Residue 516707.41 29 17817.4969

Total 4799789.5 31

Hypothesis
(a) $H_0: \beta_2 = 0$ (The mean auction price of a clock does not increase as the number of bidders increases).

$H_1: \beta_2 > 0$ (The mean auction price of a clock increases as the number of bidders increases).

Decision Rule
Reject $H_0$ if $t > t_{0.95,\,29} = 1.6991$
Calculation
̂2 −𝛽2
𝛽
𝑡= .
√𝑀𝑆𝐸(𝐶22 )

The main diagonal element of the (𝑋 𝑇 𝑋)−1 matrix corresponding to 𝛽̂2 is


𝐶22 = 0.004275813752
85.95300626−0
𝑡= = 𝟗. 𝟖𝟓
√17817.4969(0.004275813752)

Conclusion

Since 𝑡 = 9.85 > 𝑡0.95,29 = 1.6991, we reject 𝐻0 at the 5% level of significance and
conclude that the mean auction price of a clock increases as the number of bidders
increases.

(b) 90% confidence interval for 𝛽1 and interpret the result.

𝛽̂1 ± (𝑡𝛼 , 𝑛 − 𝑘 − 1) √𝑀𝑆𝐸(𝐶11 )
2

The main diagonal element of the (𝑋 𝑇 𝑋)−1 matrix corresponding to 𝛽̂1 is


𝐶11 = 0.00004593937769

𝛽̂1 ± (𝑡𝛼 , 𝑛 − 𝑘 − 1) √𝑀𝑆𝐸(𝐶11 )


2

12.7405741 ± (𝑡0.95 , 29)√17817.4969(0.00004593937769)

12.7405741 ± (1.6991)√17817.4969(0.00004593937769)
12.7405741 ± (1.5372158)
(𝟏𝟏. 𝟐, 𝟏𝟒. 𝟑)

Interpretation
Thus, we are 90% confident that 𝛽1 falls between 11.2 and 14.3. Since 𝛽1 is the slope
of the line relating auction price (y) to age of the clock (𝑥1), we conclude that price
increases between $11.2 and $14.3 for every 1-year increase in age, holding the number
of bidders (𝑥2) constant.

6.4 Multicollinearity
Multicollinearity refers to a situation in which two or more explanatory (independent)
variables in a multiple regression model are highly linearly related. More commonly,
the issue of multicollinearity arises when there is an approximate linear relationship
among two or more independent variables. Multicollinearity can lead to wider
confidence intervals and less reliable probability values (P-values) for the independent
variables.
Causes of Multicollinearity
 Multicollinearity occurs when the variables are highly correlated to each other.
 Multicollinearity can result from the repetition of the same kind of variable.
 It is caused by the inclusion of a variable which is computed from other
variables in the data set.
How to deal with Multicollinearity
The following are the solutions;
 Remove some of the highly correlated independent variables.
 Perform an analysis designed for highly correlated variables such as principal
components analysis.

58
How to Check for Multicollinearity
The effects of multicollinearity may be easily demonstrated. The diagonal elements of
the matrix 𝐶 = (𝑋 𝑇 𝑋)−1 can be written as
1
𝐶𝑗𝑗 = , 𝑗 = 1,2, … 𝑘
(1−𝑅𝑗2 )

Where 𝑅𝑗2 is the coefficient of multiple determination resulting from regressing 𝑥𝑗 on the
other 𝑘 − 1 regressor variables. Clearly, the stronger the linear dependency of 𝑥𝑗 on
the remaining regressor variables, and hence the stronger the multicollinearity, the
larger the value of 𝑅𝑗2 will be. Recall that 𝑉𝑎𝑟(𝛽̂𝑗 ) = 𝜎 2 𝐶𝑗𝑗 Therefore, we say that the
1
variance of 𝛽̂𝑗 is “inflated’’ by the quantity . Consequently, we define the variance
(1−𝑅𝑗2 )

inflation factor for 𝛽𝑗 as


1
𝑉𝐼𝐹(𝛽𝑗 ) = , 𝑗 = 1,2, … 𝑘
(1−𝑅𝑗2 )

These factors are an important measure of the extent to which multicollinearity is


present.
The presence of multicollinearity can be detected in several ways. Two of the more
easily understood of these will be discussed briefly.
 The variance inflation factors, defined in the equation above are very useful
measures of multicollinearity. The larger the variance inflation factor, the more
severe the multicollinearity. Some authors have suggested that if any variance
inflation factor exceeds 10, multicollinearity is a problem. Other authors
consider this value too liberal and suggest that the variance inflation factors
should not exceed 4 or 5.
 If the F-test for significance of regression is significant, but tests on the
individual regression coefficients are not significant, multicollinearity may be
present.

Example
Consider the wire bond pull strength data from example 6.3.1

Observations Pull Wire Die Number 𝑦 𝑥1 𝑥2


Strength Strength Height
Number
𝑦
𝑥1 𝑥2

1 9.95 2 50 14 11.66 2 360

2 24.45 8 110 15 21.65 4 205

3 31.75 11 120 16 17.89 4 400

4 35.00 10 550 17 69.00 20 600

5 25.02 8 295 18 10.30 1 585

6 16.86 4 200 19 34.93 10 540

7 14.38 2 375 20 46.59 15 250

8 9.60 2 52 21 44.88 15 290

9 24.35 9 100 22 54.12 16 510

10 27.50 8 300 23 56.63 17 590

11 17.08 4 412 24 22.13 6 100

12 37.00 11 400 25 21.15 5 400

13 41.95 12 500

𝑛, = 25, ∑25 25 2 25 25
𝑖=1 𝑥𝑖1 = 206, ∑𝑖=1 𝑥𝑖1 = 2396 , ∑𝑖=1 𝑥𝑖1 𝑥𝑖2 = 77177, ∑𝑖=1 𝑥𝑖2 =8294,

∑25 2
𝑖=1 𝑥𝑖2 = 3531848

Calculate
(a) 𝑉𝐼𝐹(𝛽1 )
(b) 𝑉𝐼𝐹(𝛽2 )

Solutions
To calculate the variance inflation factor for 𝛽1 and 𝛽2, we need to calculate 𝑅12 and
𝑅22 . 𝑅12 and 𝑅22 are the coefficients of determinations for the model when 𝑥1 is
regressed on the remaining variables and when 𝑥2 is regressed on the remaining
variables.

(a) Coefficient of Determination (𝑹𝟐𝟏 )


𝑥𝑖1 becomes 𝑦 and 𝑥𝑖2 becomes 𝑥
𝑛
(∑𝑛𝑖=1 𝑦𝑖 )2
𝑆𝑆𝑇 = ∑ 𝑦𝑖2 −
𝑛
𝑖=1

(206)2
𝑆𝑆𝑇 = 2396 − = 𝟔𝟗𝟖. 𝟓𝟔
25
(∑𝑛𝑖=1 𝑥𝑖 )(∑𝑛𝑖=1 𝑦𝑖 ) 2
(∑𝑛𝑖=1 𝑥𝑖 𝑦𝑖 − )
𝑛
SSReg =
(∑𝑛 𝑥 )2
∑𝑛𝑖=1 𝑥𝑖2 − 𝑖=1 𝑖
𝑛
206 × 8294 2
(77177 − )
SSReg = 25 = 𝟏𝟎𝟎. 𝟎𝟑𝟏𝟏𝟏𝟏𝟓
(8294)2
3531848 −
25
𝑆𝑆𝑅𝑒𝑔
𝑅12 =
𝑆𝑆𝑇
100.0311115
𝑅12 = = 𝟎. 𝟏𝟒𝟑𝟏𝟗𝟔𝟏𝟔𝟐
698.56
1
𝑉𝐼𝐹(𝛽1 ) =
(1 − 𝑅12 )
1
𝑉𝐼𝐹(𝛽1 ) = = 1.167128293 = 𝟏. 𝟐
(1 − 0.143196162)

(b) Coefficient of Determination (𝑹𝟐𝟐 )


𝑥𝑖2 becomes 𝑦 and 𝑥𝑖1 becomes 𝑥
𝑛
(∑𝑛𝑖=1 𝑦𝑖 )2
𝑆𝑆𝑇 = ∑ 𝑦𝑖2 −
𝑛
𝑖=1

(8294)2
𝑆𝑆𝑇 = 3531848 − = 𝟕𝟖𝟎𝟐𝟑𝟎. 𝟓𝟔
25

(∑𝑛𝑖=1 𝑥𝑖 )(∑𝑛𝑖=1 𝑦𝑖 ) 2
(∑𝑛𝑖=1 𝑥𝑖 𝑦𝑖 − )
𝑛
SSReg =
(∑𝑛 𝑥 )2
∑𝑛𝑖=1 𝑥𝑖2 − 𝑖=1 𝑖
𝑛
206 × 8294 2
(77177 − )
S SReg = 25 = 𝟏𝟏𝟏𝟕𝟐𝟔. 𝟎𝟐𝟐𝟑
(206)2
2396 −
25
$$R_2^2 = \frac{SS_{Reg}}{SS_T} = \frac{111726.0223}{780230.56} = 0.143196162$$

$$VIF(\beta_2) = \frac{1}{1 - R_2^2} = \frac{1}{1 - 0.143196162} = 1.167128293 \approx 1.2$$

Conclusion
Since both $VIF(\beta_1)$ and $VIF(\beta_2)$ are small, there is no problem with multicollinearity.
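Because there are only two regressors, 𝑅₁² = 𝑅₂² and both VIFs coincide. The following Python sketch (assuming the numpy library is available) reproduces the value from the corrected sums of squares and cross-products:

import numpy as np

# VIF for the wire bond data: regress x1 on x2 (and vice versa) using the summary sums
n = 25
sum_x1, sum_x1sq = 206., 2396.
sum_x2, sum_x2sq = 8294., 3531848.
sum_x1x2 = 77177.

S11 = sum_x1sq - sum_x1**2 / n       # corrected sum of squares of x1
S22 = sum_x2sq - sum_x2**2 / n
S12 = sum_x1x2 - sum_x1 * sum_x2 / n

R2 = S12**2 / (S11 * S22)            # with one regressor, R^2 of x1 on x2 = R^2 of x2 on x1
VIF = 1.0 / (1.0 - R2)
print(R2, VIF)                       # about 0.143 and 1.17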

6.5 Prediction of New Observations


A regression model can be used to predict new or future observations on the response
variable Y corresponding to particular values of the independent variables, say
$x_{01}, x_{02}, x_{03}, \ldots, x_{0k}$. A point estimate of the future observation $Y_0$ at the point
$x_{01}, x_{02}, x_{03}, \ldots, x_{0k}$ is $\hat{y}_0 = X_0^T\hat{\beta}$.
A $100(1-\alpha)\%$ prediction interval for this future observation is

$$\hat{y}_0 \pm t_{\frac{\alpha}{2},\, n-p}\sqrt{MSE\left(1 + X_0^T(X^T X)^{-1}X_0\right)}$$

Example
Consider the wire bond pull strength data from example 6.3.1, construct a 95%
prediction interval on the wire bond pull strength when the wire length is 𝑥1 = 8 and
the die height is 𝑥2 = 275.

Solution
𝑋0ᵀ = (1  8  275) and 𝛽̂ = (2.263791003, 2.744269642, 0.012527811)ᵀ

𝑦̂0 = 𝑋0ᵀ𝛽̂ = 2.263791003 + 8(2.744269642) + 275(0.012527811) = 27.66
0.214652616 −0.007490914218 −0.0003403890868
(𝑋 𝑇 𝑋)−1 = ( −0.007490914218 0.001670763131 −0.00001891781403)
−0.0003403890868 −0.00001891781403 0.000001495876159

𝑋0𝑇 (𝑋 𝑇 𝑋)−1 𝑋0
0.214652616 −0.007490914218 −0.0003403890868 1
= (1 8 275) ( −0.007490914218 0.001670763131 −0.00001891781403) ( 8 )
−0.0003403890868 −0.00001891781403 0.000001495876159 275

0.061118303
𝑋0𝑇 (𝑋 𝑇 𝑋)−1 𝑋0 = (1 8 275 ) ( 0.0006727919718 ) = 𝟎. 𝟎𝟒𝟒𝟒
−0.00008036565532

95% prediction interval

𝑦̂0 ± 𝑡𝛼/2, 𝑛−𝑝 √(𝑀𝑆𝐸(1 + 𝑋0ᵀ(𝑋ᵀ𝑋)⁻¹𝑋0))

27.66 ± 𝑡0.975, 22 √(5.235(1 + 0.0444))

27.66 ± 2.0739√5.235(1 + 0.0444)


27.66 ± 2.0739(2.33825447)

(𝟐𝟐. 𝟖𝟏, 𝟑𝟐. 𝟓𝟏)
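The interval above can be verified numerically. The sketch below (assuming NumPy and SciPy are available) plugs the fitted coefficients, (𝑋ᵀ𝑋)⁻¹ and 𝑀𝑆𝐸 quoted in this example into the prediction interval formula.

```python
import numpy as np
from scipy import stats

beta_hat = np.array([2.263791003, 2.744269642, 0.012527811])
xtx_inv = np.array([
    [ 0.214652616,      -0.007490914218,   -0.0003403890868],
    [-0.007490914218,    0.001670763131,   -0.00001891781403],
    [-0.0003403890868,  -0.00001891781403,  0.000001495876159],
])
mse, n, p = 5.235, 25, 3                 # error mean square, sample size, parameters

x0 = np.array([1.0, 8.0, 275.0])         # wire length 8, die height 275
y0_hat = x0 @ beta_hat                   # point prediction, about 27.66
h0 = x0 @ xtx_inv @ x0                   # X0'(X'X)^-1 X0, about 0.0444

t_crit = stats.t.ppf(0.975, df=n - p)    # t(0.975, 22) = 2.0739
half_width = t_crit * np.sqrt(mse * (1 + h0))
print(round(y0_hat, 2), round(y0_hat - half_width, 2), round(y0_hat + half_width, 2))
# 27.66 22.81 32.51
```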

6.6 Residual Analysis


The residuals from the multiple regression model, defined by 𝑒𝑖 = 𝑦𝑖 − 𝑦̂𝑖 play an
important role in judging model adequacy just as they do in simple linear regression.
It is also helpful to plot the residuals against variables not presently in the model that
are possible candidates for inclusion. Patterns in these plots may indicate that the
model may be improved by adding the candidate variable.

Standardized Residuals
Standardized residuals are often more useful than the ordinary residuals when
assessing residual magnitude, and are given by 𝑑𝑖 = 𝑒𝑖/√𝑀𝑆𝐸.

Some analysts prefer to plot standardized residuals instead of ordinary residuals,


because the standardized residuals are scaled so that their standard deviation is
approximately unity. Consequently, large residuals (that may indicate possible outliers
or unusual observations) will be more obvious from inspection of the residual plots.
Many regression computer programs compute other types of scaled residuals. One of
the most popular is the studentized residual

𝑟𝑖 = 𝑒𝑖/√(𝑀𝑆𝐸(1 − ℎ𝑖𝑖))

where ℎ𝑖𝑖 = 𝑋𝑖ᵀ(𝑋ᵀ𝑋)⁻¹𝑋𝑖 is the 𝑖th diagonal element of the hat matrix 𝑋(𝑋ᵀ𝑋)⁻¹𝑋ᵀ.

Example
Consider the wire bond pull strength data from example 6.3.1
(a) Calculate the standardized residuals corresponding to 𝑒15 and 𝑒17 .
(b) Calculate studentized residuals corresponding to 𝑒15 and 𝑒17 .

Solutions
0.214652616 −0.007490914218 −0.0003403890868
(𝑋 𝑇 𝑋)−1 = ( −0.007490914218 0.001670763131 −0.00001891781403)
−0.0003403890868 −0.00001891781403 0.000001495876159

𝑑𝑖 = 𝑒𝑖/√𝑀𝑆𝐸

ANOVA Table
Source 𝑆𝑆 𝑑𝑓 𝑀𝑆 𝐹∗
Regression 5990.771 2 2995.3856 𝟓𝟕𝟐. 𝟐

Error or
Residue 115.174 22 5.235

Total 6105.945 24

𝑒𝑖 = 𝑦𝑖 − 𝑦̂𝑖, where 𝑦̂𝑖 = 2.263791003 + 2.744269642𝑥1 + 0.012527811𝑥2

For 𝑒15: 𝑖 = 15, 𝑦 = 21.65, 𝑥1 = 4, 𝑥2 = 205
𝑦̂15 = 2.263791003 + 2.744269642(4) + 0.012527811(205) = 15.81
𝑒15 = 21.65 − 15.81 = 5.84

For 𝑒17: 𝑖 = 17, 𝑦 = 69, 𝑥1 = 20, 𝑥2 = 600
𝑦̂17 = 2.263791003 + 2.744269642(20) + 0.012527811(600) = 64.666
𝑒17 = 69 − 64.666 = 4.33

(a) 𝑑𝑖 = 𝑒𝑖/√𝑀𝑆𝐸

𝑑15 = 5.84/√5.235 = 2.55

𝑑17 = 4.33/√5.235 = 1.89

(b) 𝑟𝑖 = 𝑒𝑖/√(𝑀𝑆𝐸(1 − ℎ𝑖𝑖)), where ℎ𝑖𝑖 = 𝑋𝑖ᵀ(𝑋ᵀ𝑋)⁻¹𝑋𝑖


0.214652616 −0.007490914218 −0.0003403890868
(𝑋 𝑇 𝑋)−1 = ( −0.007490914218 0.001670763131 −0.00001891781403)
−0.0003403890868 −0.00001891781403 0.000001495876159

𝑋15ᵀ = (1  4  205)

ℎ15 = (1  4  205)(𝑋ᵀ𝑋)⁻¹(1  4  205)ᵀ = (1  4  205)(0.114909196, −0.00468601357, −0.0001094057303)ᵀ = 0.0737

𝑋17ᵀ = (1  20  600)

ℎ17 = (1  20  600)(𝑋ᵀ𝑋)⁻¹(1  20  600)ᵀ = (1  20  600)(−0.13939912, 0.014573659, 0.000178780328)ᵀ = 0.2593

𝑟15 = 5.84/√(5.235(1 − 0.0737)) = 2.65

𝑟17 = 4.33/√(5.235(1 − 0.2593)) = 2.2
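The same quantities can be recomputed in a few lines. The sketch below (NumPy assumed) evaluates the ordinary residuals, the standardized residuals 𝑑𝑖 and the studentized residuals 𝑟𝑖 for observations 15 and 17 using the fitted equation, 𝑀𝑆𝐸 and (𝑋ᵀ𝑋)⁻¹ given above.

```python
import numpy as np

beta_hat = np.array([2.263791003, 2.744269642, 0.012527811])
xtx_inv = np.array([
    [ 0.214652616,      -0.007490914218,   -0.0003403890868],
    [-0.007490914218,    0.001670763131,   -0.00001891781403],
    [-0.0003403890868,  -0.00001891781403,  0.000001495876159],
])
mse = 5.235

# Observation number: (y, x1, x2)
obs = {15: (21.65, 4.0, 205.0), 17: (69.00, 20.0, 600.0)}

for i, (y, x1, x2) in obs.items():
    xi = np.array([1.0, x1, x2])
    e = y - xi @ beta_hat                  # ordinary residual e_i
    d = e / np.sqrt(mse)                   # standardized residual d_i
    h = xi @ xtx_inv @ xi                  # leverage h_ii
    r = e / np.sqrt(mse * (1 - h))         # studentized residual r_i
    print(i, round(e, 2), round(d, 2), round(h, 4), round(r, 2))
# 15: 5.84  2.55  0.0737  2.65
# 17: 4.33  1.89  0.2593  2.20
```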

6.7 Weighted Regression


The method of ordinary least squares assumes that the errors have constant variance
(which is called homoscedasticity). The method of weighted least squares can be
used when this assumption of constant error variance is violated (which is called
heteroscedasticity). The model under consideration is
𝑌 = 𝑋𝛽 + 𝜀
where 𝜀 is assumed to be (multivariate) normally distributed with mean vector 0 and a
non-constant variance-covariance matrix. The weighted least squares estimates of 𝛽0
and 𝛽1 solve the normal equations

𝛽0 ∑𝑤𝑖 + 𝛽1 ∑𝑤𝑖𝑥𝑖 = ∑𝑤𝑖𝑦𝑖

𝛽0 ∑𝑤𝑖𝑥𝑖 + 𝛽1 ∑𝑤𝑖𝑥𝑖² = ∑𝑤𝑖𝑥𝑖𝑦𝑖

(all sums taken over 𝑖 = 1, … , 𝑛). Each weight is inversely proportional to the error
variance (here 𝑤𝑖 = 1/𝑥𝑖). An observation with small error variance has a large weight
and contains relatively more information than an observation with large error variance
(small weight).
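A minimal sketch of solving the weighted normal equations numerically is shown below (NumPy assumed). The small (𝑥, 𝑦) data set and the rule 𝑤𝑖 = 1/𝑥𝑖 are purely illustrative, chosen to mimic errors whose variance grows with 𝑥.

```python
import numpy as np

# Illustrative data: the spread of y about the line grows with x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 4.3, 5.8, 8.4, 9.7, 12.5, 13.6, 16.9])
w = 1.0 / x                       # weights inversely proportional to the error variance

# Weighted normal equations:
#   b0*sum(w)   + b1*sum(w*x)   = sum(w*y)
#   b0*sum(w*x) + b1*sum(w*x^2) = sum(w*x*y)
A = np.array([[w.sum(),       (w * x).sum()],
              [(w * x).sum(), (w * x ** 2).sum()]])
b = np.array([(w * y).sum(), (w * x * y).sum()])
beta0, beta1 = np.linalg.solve(A, b)
print(round(beta0, 3), round(beta1, 3))   # weighted least squares intercept and slope
```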

Tutorial Sheet 6
1. The following data represent the relationship between the number of alignment
errors and the number of missing rivets for 10 different aircrafts.

Missing Rivets (X) 13 15 10 22 30 7 25 16 20 15


Alignment Errors (Y) 7 7 5 12 15 2 13 9 11 8

(a) Fit the polynomial 𝑌 = 𝛽0 + 𝛽1𝑥 + 𝛽2𝑥² + 𝜀 to the data above.


(b) Test the hypothesis at 𝛼 = 0.05 that 𝛽0 = 1

2. A regression model is to be developed for predicting the ability of soil to absorb


chemical contaminants. Ten observations have been taken on a soil absorption
index (𝑦) and two regressors:
𝑥1 = amount of extractable iron ore and 𝑥2 = amount of bauxite. We wish to fit the
model 𝑌 = 𝛽0 + 𝛽1 𝑥1 + 𝛽2 𝑥2 + 𝜀 .Some necessary quantities are:
(𝑋ᵀ𝑋)⁻¹ = ( 1.17991     −0.00730982    0.00073006
             −            0.000079799  −0.000123713
             −            −             0.00046576 )   (omitted entries follow by symmetry)

𝑋ᵀ𝑌 = (320, 36768, 9965)ᵀ,  𝑆𝑆𝑇 = 7420

(a) Use the results above to compute 𝛽̂ and fit the model 𝑌 = 𝛽0 + 𝛽1 𝑥1 + 𝛽2 𝑥2 + 𝜀
(b) Write the analysis of variance table.
(c) Construct a 95% confidence interval for each regression coefficient in the model.

(d) Compute t statistics to test simple hypothesis for regression coefficient is equal to
0. State the hypothesis for 𝛽1 and 𝛽2 in words and write the conclusion for each
regression coefficient.

3. A study was performed on wear of bearing 𝑌 and its relationship to


𝑥1 = Oil Viscosity and 𝑥2 = Load. The following data was obtained

𝑥1 1.6 15.5 22 43 33 40
𝑥2 851 816 1058 1201 1357 1115
𝑦 293 230 172 91 113 125

(a) Fit a multiple regression model to these data.


(b) Use ANOVA approach to test the hypothesis that wear of bearing is either
related to oil viscosity or load, or to both. Use 𝛼 = 0.05
(c) Construct a 95% confidence interval for each of 𝛽0, 𝛽1 and 𝛽2.

4. A study was performed to investigate whether the shear strength of soil 𝑦 is related to
depth in feet 𝑥1 and moisture content 𝑥2. The following data was collected;

𝑛 = 10, ∑𝑦𝑖 = 1916, ∑𝑥𝑖1 = 223, ∑𝑥𝑖2 = 553, ∑𝑥𝑖1𝑥𝑖2 = 12352,
∑𝑥𝑖1² = 5200.9, ∑𝑥𝑖2² = 31729, ∑𝑥𝑖2𝑦𝑖 = 104736.8, ∑𝑥𝑖1𝑦𝑖 = 43550.8, ∑𝑦𝑖² = 371595.6
(all sums taken over 𝑖 = 1, … , 10)

(a) Estimate the parameters and fit the model.


(b) Calculate the t-test on each regression coefficient. What are your conclusions?
Use 𝛼 = 0.05.
(c) Find a 95% prediction interval when 𝑥1 = 200 and 𝑥2 = 50.
(d) Calculate the coefficient of determination (𝑅 2 ).

5. A bar soap manufacturer was interested in predicting the number of bars


purchased by 15 consumers over a 6 month period. They also counted the number
of coupons they redeemed and the proportion of their purchase bought when the
item was on sale. Results are summarised in the table below;

Number of Bars of Soap Coupons Redeemed Proportion on Sale

31 22 0.87

24 15 0.79

17 9 0.41

14 3 0.36

13 4 0.38

26 15 0.69

10 8 0.1

33 24 0.76

17 10 0.47

28 21 0.82

33 27 0.85

42 31 0.95

11 1 0.09

18 9 0.44

12 4 0.17

(a) Which variable is the dependent variable and which ones are the independent
variables?
(b) Construct a multiple regression model.
(c) Interpret the coefficients of the model.
(d) Write the ANOVA table
(e) Calculate the standardized residuals corresponding to 𝑒13
(f) Calculate studentized residuals corresponding to 𝑒13 .

6. The following data represent travel times in a downtown area of a certain city. The
independent, or input, variable is the distance to be travelled.

Distance (X)  0.5   1    1.5   2     3     4     5     6     8    10
Time (Y)      15    15.1 16.5  19.9  27.7  29.7  26.7  35.9  42   49.4

Calculate the weighted least square and fit the weighted least squares regression
equation.

7. An X matrix consists of two independent variables constructed in the following way;


𝑥1 is the sequence of numbers 20 to 29 and repeated.
𝑥2 is 𝑥1 minus 25 with the first and eleventh observations changed to −4 (from −5)
to avoid complete linear dependency.
Calculate
(a) Variance Inflation Factors (𝑉𝐼𝐹1 )
(b) Variance Inflation Factors (𝑉𝐼𝐹2 )

Unit 7
Analysis of Variance and
Covariance
7.1 Analysis of Variance
Analysis of Variance (ANOVA) is a collection of statistical models and their
associated estimation procedures (such as the variation among and between
groups) used to test differences between two or more means. ANOVA helps a
researcher decide whether or not to reject the null hypothesis. Experimentation
involves collecting and analysing data from comparative investigations. There
are two types of comparative investigation, namely observational studies and
experimental studies. An observational study involves drawing conclusions about a
population based on a sample, where the independent variable is not under the
control of the researcher, for example because of ethical concerns. An experimental
study involves the random assignment of participants to different groups.

7.1.1 Assumptions of Analysis of Variance


The following are the assumptions of Analysis of Variance;
 Independence: Individual observations are mutually independent. The value
of one observation must not influence the value of other observations.
 Additivity: Data adhere to an additive statistical model comprising fixed effects
and random errors: 𝑦𝑖𝑗 = 𝜇 + 𝛼𝑖 + 𝜀𝑖𝑗
 Normality: The random errors are normally distributed.
 Homogeneous Variance: Random errors have homogeneous variance.

7.1.2 Terminologies
 Experimental Unit: This is the physical entity which can be assigned at
random to a treatment.
 Confounding: Inability of the investigator to distinguish between two
explanatory variables (𝑥1 𝑎𝑛𝑑 𝑥2 ) as the cause of the response 𝑦. In other
words, Confounding occurs when the experimental controls do not allow the
experimenter to reasonably eliminate plausible alternative explanations for an
observed relationship between independent and dependent variables.
 Interaction: Refers to a situation where the relationship between one
explanatory variable and the response variable is affected by the value of
another explanatory variable. In other words, an interaction may arise when
considering the relationship among three or more variables, and describes a
situation in which the simultaneous influence of two variables on a third is not
additive. Most commonly, interactions are considered in the context of
regression analyses.
 Blocking: Involves forming groups of units in which one or more explanatory
variables are held fixed while different treatments are applied to the units within the
group.
 Blocking prevents confounding due to those explanatory variables held fixed
within each block
 Improves the precision of the conclusions.

 Replication: Involves applying each treatment to more than one unit in the sample.
Replication helps;
 Reduce sample error
 Estimate the precision of our conclusions
 Randomization: Is a process of assigning the treatments to the sample units
at random. Randomization helps to reduce the risk of confounding by unknown
explanatory variables.

7.1.3 Assumptions

The following are the assumptions of analysis of variance:

 Independence: Individual observations are mutually independent. The value
of one observation must not influence the value of the other observations. All
experimental units must be independent and each experimental unit must
contribute one response value.
 Additivity: The data adhere to an additive statistical model comprising fixed
effects and random errors 𝑦𝑖𝑗 = 𝜇 + 𝛼𝑖 + 𝜀𝑖𝑗
 Normality: Random errors are normally distributed. 𝜀𝑖𝑗 ~ 𝑁(0, 𝜎 2 ).
 Homogeneous Variance: Random errors have homogeneous variance. We
assume that the random errors have the same variance 𝜎² across all
treatment groups.

7.1.4 Contrast Estimation

In statistics, particularly in analysis of variance and linear regression, a contrast is a


linear combination of parameters whose coefficients add up to zero. For, example,

𝐶 = ∑𝑐𝑖𝜇𝑖 where ∑𝑐𝑖 = 0, e.g. 𝜇1 − 𝜇2, (𝜇1 + 𝜇2)/2 − 𝜇3

The contrast 𝐶 = ∑𝑐𝑖𝜇𝑖 is estimated by 𝐶̂ = ∑𝑐𝑖𝑦̅𝑖.

Now,

𝑉𝑎𝑟(𝐶̂) = 𝑉𝑎𝑟(∑𝑐𝑖𝑦̅𝑖.) = (𝜎²/𝑛)∑𝑐𝑖², estimated by (𝑀𝑆𝐸/𝑛)∑𝑐𝑖²

Thus a 100(1 − 𝛼)% confidence interval for 𝐶 = ∑𝑐𝑖𝜇𝑖 is given by

∑𝑐𝑖𝑦̅𝑖. ± 𝑡𝑛𝑘−𝑘, 𝛼/2 √((𝑀𝑆𝐸/𝑛)∑𝑐𝑖²)

(all sums taken over 𝑖 = 1, … , 𝑘)

Example

An investigation was conducted to compare five brands of AAA alkaline batteries. The
response variable was the watt – hour delivered by the battery under the test
conditions. Ten batteries of each brand (A, B, C, D, and E) were purchased from a
single store and tested in a random order. The data is shown below:

        A      B      C      D      E      Total
       1.37   1.24   1.27   1.19   1.28    6.35
       1.35   1.25   1.27   1.22   1.24    6.33
       1.34   1.18   1.23   1.14   1.21    6.10
       1.37   1.18   1.25   1.21   1.26    6.27
       1.33   1.19   1.25   1.18   1.23    6.18
       1.39   1.23   1.29   1.23   1.25    6.39
       1.31   1.27   1.27   1.17   1.26    6.28
       1.34   1.22   1.21   1.18   1.27    6.22
       1.32   1.20   1.22   1.19   1.27    6.20
       1.34   1.20   1.27   1.23   1.26    6.30
Total  13.46  12.16  12.53  11.94  12.53   62.62

(a) Write down a model for the data and estimate all the parameters in your model
(b) Are there any significant differences among battery brands?
(c) Brands A and B are relatively high priced. Is there any evidence that these two
brands have more average power?

Solutions

Model

From the information above, the data correspond to a completely randomised design
(balanced one-way analysis of variance).

(a) 𝑦𝑖𝑗 = 𝜇 + 𝜏𝑖 + 𝜀𝑖𝑗 , 𝑤ℎ𝑒𝑟𝑒 𝑖 = 1,2,3,4,5. 𝑎𝑛𝑑 𝑗 = 1,2,3, … ,10 𝑎𝑛𝑑 𝜀𝑖𝑗 ~ 𝑁(0, 𝜎 2 )
𝑦𝑖𝑗 𝑖𝑠 𝑡ℎ𝑒 𝑗𝑡ℎ 𝑜𝑏𝑠𝑒𝑟𝑣𝑎𝑡𝑖𝑜𝑛 𝑓𝑟𝑜𝑚 𝑡ℎ𝑒 𝑖𝑡ℎ 𝑏𝑎𝑡𝑡𝑒𝑟𝑦 𝑏𝑟𝑎𝑛𝑑 𝑎𝑛𝑑 𝜏𝑖 𝑖𝑠 𝑡ℎ𝑒 𝑖𝑡ℎ 𝑏𝑎𝑡𝑡𝑒𝑟𝑦 𝑏𝑟𝑎𝑛𝑑 𝑒𝑓𝑓𝑒𝑐𝑡.

From the table below, we have the following parameter estimates:

𝑦1. = 13.46  𝑦̅1. = 1.346, 𝑦2. = 12.16  𝑦̅2. = 1.216

𝑦3. = 12.53  𝑦̅3. = 1.253, 𝑦4. = 11.94  𝑦̅4. = 1.194

𝑦5. = 12.53  𝑦̅5. = 1.253, 𝑦.. = 62.62  𝑦̅.. = 1.2524

𝜏̂ 𝑖 = 𝑦̅𝑖. − 𝑦̅..  𝜏̂1 = 𝑦̅1. − 𝑦̅.. = 0.0936 = 𝜏̂𝐴

𝜏̂2 = 𝑦̅2. − 𝑦̅.. = −0.0364 = 𝜏̂𝐵

𝜏̂3 = 𝑦̅3. − 𝑦̅.. = 0.0006 = 𝜏̂𝐶

𝜏̂ 4 = 𝑦̅4. − 𝑦̅.. = −0.0584 = 𝜏̂ 𝐷

𝜏̂5 = 𝑦̅5. − 𝑦̅.. = 0.0006 = 𝜏̂ 𝐸

(b) 𝐻0 : 𝜏1 = 𝜏2 = 𝜏3 = 𝜏4 = 𝜏5 = 0 against 𝐻1 : 𝜏𝑖 ≠ 0 for at least one 𝑖.

𝑆𝑆𝑇𝑟𝑡 = ∑ 𝑦𝑖.²/𝑛 − 𝑦..²/(𝑛𝑘)
     = (13.46² + 12.16² + 12.53² + 11.94² + 12.53²)/10 − 62.62²/50
     = 78.56026 − 78.425288
     = 0.134972

𝑆𝑆𝑇 = ∑∑ 𝑦𝑖𝑗² − 𝑦..²/(𝑛𝑘) = 1.37² + 1.35² + ⋯ + 1.26² − 62.62²/50 = 0.166512

𝑆𝑆𝐸 = 𝑆𝑆𝑇 − 𝑆𝑆𝑇𝑟𝑡 = 𝟎. 𝟎𝟑𝟏𝟓𝟒

The ANOVA table below illustrates this information

Source of Variation SS df MS F - ratio


Treatment 0.134972 4 0.03374 48.143
Error 0.03154 45 0.0007
Total 0.166512 49

𝐴𝑡 5% 𝑙𝑒𝑣𝑒𝑙 𝑜𝑓 𝑠𝑖𝑔𝑛𝑖𝑓𝑖𝑐𝑎𝑛𝑐𝑒, 2.53 < 𝐹(4, 45) < 2.61  𝐹 ∗ > 𝐹(4, 45)

Alternatively,

𝑝 − 𝑣𝑎𝑙𝑢𝑒 = 𝑃(𝐹4,45 > 48.143) < 0.01  𝑝 − 𝑣𝑎𝑙𝑢𝑒 < 0.05

Based on these results, we reject H0 at 5% level of significance and conclude that


there are significant differences among the battery brands.

(c) In this case, the contrast is given by

𝐶 = (𝜇1 + 𝜇2)/2 − (𝜇3 + 𝜇4 + 𝜇5)/3

 𝐶̂ = ∑𝑐𝑖𝑦̅𝑖. = (1/2)(1.346 + 1.216) − (1/3)(1.253 + 1.194 + 1.253)
   = 1.281 − 1.2333
   = 0.048

𝑉𝑎𝑟(𝐶̂) = (𝑀𝑆𝐸/𝑛)∑𝑐𝑖² = (0.0007/10)[(1/2)² + (1/2)² + (−1/3)² + (−1/3)² + (−1/3)²]
       = 0.00005833

𝑆𝑒(𝐶̂ ) = √0.00005833 = 0.007637626

The hypotheses are;

𝐻0 : 𝐶 = 0, 𝐻1 : 𝐶 > 0

𝑇 = (𝐶̂ − 0)/𝑆𝑒(𝐶̂) = (0.048 − 0)/0.007637626 = 6.2847

𝑝-value = 𝑃(𝑇45 > 6.2847) < 0.05

 𝑝-value < 0.05, and so we reject 𝐻0 at the 5% level of significance.

Therefore, brands A and B have more average power.
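The whole of this example — the treatment totals, the ANOVA table and the contrast test — can be reproduced with a short script. The sketch below (NumPy and SciPy assumed) uses the battery data as tabulated above; small differences from the hand results come from rounding of 𝑀𝑆𝐸 and 𝐶̂ in the text.

```python
import numpy as np
from scipy import stats

# Watt-hours delivered, by brand (10 batteries per brand)
data = {
    "A": [1.37, 1.35, 1.34, 1.37, 1.33, 1.39, 1.31, 1.34, 1.32, 1.34],
    "B": [1.24, 1.25, 1.18, 1.18, 1.19, 1.23, 1.27, 1.22, 1.20, 1.20],
    "C": [1.27, 1.27, 1.23, 1.25, 1.25, 1.29, 1.27, 1.21, 1.22, 1.27],
    "D": [1.19, 1.22, 1.14, 1.21, 1.18, 1.23, 1.17, 1.18, 1.19, 1.23],
    "E": [1.28, 1.24, 1.21, 1.26, 1.23, 1.25, 1.26, 1.27, 1.27, 1.26],
}
y = np.array(list(data.values()))     # shape (k, n), k = 5 brands, n = 10 batteries
k, n = y.shape

grand = y.sum()
sst = (y ** 2).sum() - grand ** 2 / (n * k)                    # 0.166512
sstrt = (y.sum(axis=1) ** 2 / n).sum() - grand ** 2 / (n * k)  # 0.134972
sse = sst - sstrt
mse = sse / (k * (n - 1))
F = (sstrt / (k - 1)) / mse                                    # about 48.1
p_value = stats.f.sf(F, k - 1, k * (n - 1))

# Contrast: (mu_A + mu_B)/2 - (mu_C + mu_D + mu_E)/3
c = np.array([0.5, 0.5, -1 / 3, -1 / 3, -1 / 3])
c_hat = c @ y.mean(axis=1)                                     # about 0.048
se_c = np.sqrt(mse / n * (c ** 2).sum())
t_stat = c_hat / se_c                                          # about 6.2
print(round(F, 1), round(p_value, 6), round(c_hat, 4), round(t_stat, 2))
```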

7.1.5 Least Significance Difference (LSD) Method


When an analysis of variance (ANOVA) gives a significant result, this
indicates that at least one group differ from the other groups. Yet, the test
does not indicate which group differs. Least Significance Difference (LSD)
helps to compute the smallest significant difference between two means. A
pair of means 𝜇𝑖 and 𝜇𝑗 are significantly different if
|𝑦̅.𝑖 − 𝑦̅.𝑗 | > 𝐿𝑆𝐷 for 𝑖 ≠ 𝑗
For Randomised Design

𝐿𝑆𝐷 = 𝑡𝛼/2, 𝑘(𝑛−1) √(𝑀𝑆𝐸(1/𝑛𝑖 + 1/𝑛𝑗))

If the design is balanced

𝐿𝑆𝐷 = 𝑡𝛼/2, 𝑘(𝑛−1) √(2𝑀𝑆𝐸/𝑛)

For Block Design

𝐿𝑆𝐷 = 𝑡𝛼/2, (𝑘−1)(𝑛−1) √(𝑀𝑆𝐸(1/𝑛𝑖 + 1/𝑛𝑗))

Example 2

An Engineer is interested in determining if the cotton weight in fibre affects the tensile
strength. She runs an experiment with five levels of cotton weight percentage and gets
the following results:

Weight of Cotton (%) Tensile Strength


15 7 7 15 11 9
20 12 17 12 18 18
25 14 18 18 19 19
30 19 25 22 19 23
35 7 10 11 15 11

(i) Is the tensile strength the same for all the five levels of cotton weight? Use
𝛼 = 0.05
(ii) Use the LSD method to make comparisons between pairs of means. Use
𝛼 = 0.05

Solutions

𝐻0 : 𝜇1 = 𝜇2 = 𝜇3 = 𝜇4 = 𝜇5 𝑎𝑛𝑑 𝐻1 : 𝑁𝑜𝑡 𝑎𝑙𝑙 𝜇𝑖′𝑠 𝑎𝑟𝑒 𝑒𝑞𝑢𝑎𝑙

Weight of Cotton (%) Tensile Strength Total
15 7 7 15 11 9 49
20 12 17 12 18 18 77
25 14 18 18 19 19 88
30 19 25 22 19 23 108
35 7 10 11 15 11 54
Total 376
𝑆𝑆𝑇 = ∑∑ 𝑦𝑖𝑗² − 𝑦..²/(𝑛𝑘) = 7² + 12² + ⋯ + 11² − 376²/25 = 6292 − 376²/25 = 636.96

𝑆𝑆𝑇𝑟𝑡 = ∑ 𝑦𝑖.²/𝑛 − 𝑦..²/(𝑛𝑘) = (49² + 77² + 88² + 108² + 54²)/5 − 376²/25 = 475.76

𝑆𝑆𝐸 = 636.96 − 475.76 = 161.2

ANOVA table

𝑆𝑜𝑢𝑟𝑐𝑒 𝑜𝑓 𝑉𝑎𝑟𝑖𝑎𝑡𝑖𝑜𝑛 𝑆𝑆 𝑑𝑓 𝑀𝑆 𝐹∗
𝑇𝑟𝑒𝑎𝑡𝑚𝑒𝑛𝑡 475.76 4 118.94 14.7568
𝐸𝑟𝑟𝑜𝑟 161.2 20 8.06
𝑇𝑜𝑡𝑎𝑙 636.96 24

Conclusion

Since 𝐹∗ = 14.7568 > 𝑓4,20 = 2.87, we reject 𝐻0 at 5% level of significance and
conclude that the tensile strength is not the same for the 5 levels of cotton weight (i.e.
the difference in tensile strength is statistically significant).

(b) In this case;

𝐿𝑆𝐷 = 𝑡𝛼/2, 𝑘(𝑛−1) √(𝑀𝑆𝐸(1/𝑛𝑖 + 1/𝑛𝑗))

    = 𝑡0.975, 20 √(2(8.06)/5)

    = 2.086(1.795550055)

    = 3.745517415

    = 3.75

Means for each of the five levels of cotton weight are:

𝑦̅1. = 9.8, 𝑦̅2. = 15.4, 𝑦̅3. = 17.6, 𝑦̅4. = 21.6 𝑎𝑛𝑑 𝑦̅5. = 10.8

Ordering means:

𝑦̅1. 𝑦̅5. 𝑦̅2. 𝑦̅3. 𝑦̅4.

9.8 10.8 15.4 17.6 21.6

|𝑦̅5. − 𝑦̅1. | = 1 < 𝐿𝑆𝐷  𝑇ℎ𝑒 𝑑𝑖𝑓𝑓𝑒𝑟𝑒𝑛𝑐𝑒 𝑏𝑒𝑡𝑤𝑒𝑒𝑛 𝜇1 𝑎𝑛𝑑 𝜇5 𝑖𝑠 𝑛𝑜𝑡 𝑠𝑖𝑔𝑛𝑖𝑓𝑖𝑐𝑎𝑛𝑡

|𝑦̅2. − 𝑦̅5. | = 4.6 > 𝐿𝑆𝐷  𝑇ℎ𝑒 𝑑𝑖𝑓𝑓𝑒𝑟𝑒𝑛𝑐𝑒 𝑏𝑒𝑡𝑤𝑒𝑒𝑛 𝜇2 𝑎𝑛𝑑 𝜇5 𝑖𝑠 𝑠𝑖𝑔𝑛𝑖𝑓𝑖𝑐𝑎𝑛𝑡

|𝑦̅3. − 𝑦̅2. | = 2.2 < 𝐿𝑆𝐷  𝑇ℎ𝑒 𝑑𝑖𝑓𝑓𝑒𝑟𝑒𝑛𝑐𝑒 𝑏𝑒𝑡𝑤𝑒𝑒𝑛 𝜇2 𝑎𝑛𝑑 𝜇3 𝑖𝑠 𝑛𝑜𝑡 𝑠𝑖𝑔𝑛𝑖𝑓𝑖𝑐𝑎𝑛𝑡

|𝑦̅4. − 𝑦̅3. | = 4 > 𝐿𝑆𝐷  𝑇ℎ𝑒 𝑑𝑖𝑓𝑓𝑒𝑟𝑒𝑛𝑐𝑒 𝑏𝑒𝑡𝑤𝑒𝑒𝑛 𝜇3 𝑎𝑛𝑑 𝜇4 𝑖𝑠 𝑠𝑖𝑔𝑛𝑖𝑓𝑖𝑐𝑎𝑛𝑡

Conclusion

 The tensile strength of fibre at 15% cotton weight is not significantly different from
the tensile strength of fibre at 35% cotton weight.
 The tensile strength of fibre at 20% and 25% cotton weight are not significantly
different
 The tensile strength of fibre at 35% cotton weight is significantly different from the
tensile strength of fibre at 20% cotton weight.
 The tensile strength of fibre at 30% cotton weight is significantly different from the
rest.
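The ANOVA table, the LSD value and all pairwise comparisons for this example can be checked with the sketch below (NumPy and SciPy assumed).

```python
import numpy as np
from itertools import combinations
from scipy import stats

# Tensile strength for the five cotton weight percentages (5 observations each)
groups = {
    15: [7, 7, 15, 11, 9],
    20: [12, 17, 12, 18, 18],
    25: [14, 18, 18, 19, 19],
    30: [19, 25, 22, 19, 23],
    35: [7, 10, 11, 15, 11],
}
y = np.array(list(groups.values()), dtype=float)   # shape (k, n)
k, n = y.shape

grand = y.sum()
sst = (y ** 2).sum() - grand ** 2 / (n * k)                    # 636.96
sstrt = (y.sum(axis=1) ** 2 / n).sum() - grand ** 2 / (n * k)  # 475.76
mse = (sst - sstrt) / (k * (n - 1))                            # 8.06
F = (sstrt / (k - 1)) / mse                                    # about 14.76

# Least significant difference for a balanced design
lsd = stats.t.ppf(0.975, k * (n - 1)) * np.sqrt(2 * mse / n)   # about 3.75
means = dict(zip(groups, y.mean(axis=1)))
print(round(F, 2), round(lsd, 2))
for a, b in combinations(groups, 2):
    diff = abs(means[a] - means[b])
    verdict = "significant" if diff > lsd else "not significant"
    print(f"{a}% vs {b}%: |difference| = {diff:.1f} -> {verdict}")
```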

7.1.6 The Latin Square Design (Three Factor Analysis)

A Latin square design is an example of an incomplete block design with two blocking
factors and a single treatment factor, all having the same number of levels. Only a
single treatment is applied within each combination of the blocking variables. For
example, suppose you are interested in comparing 4 varieties of wheat using 4
different fertilizers over a period of 4 years, as shown in the table below;

Column
Year 𝐹1 𝐹2 𝐹3 𝐹4
1 𝐴 𝐵 𝐶 𝐷
Row 2 𝐶 𝐴 𝐷 𝐵
3 𝐵 𝐷 𝐴 𝐶
4 𝐷 𝐶 𝐵 𝐴

4 Varieties 𝐴, 𝐵, 𝐶 𝑎𝑛𝑑 𝐷 (Treatments)

4 Fertilizers 𝐹1 , 𝐹2 , 𝐹3 and 𝐹4 (Columns)

4 Years 1,2,3 and 4 (Rows)

Model

𝑦𝑖𝑗𝑘 = 𝜇 + 𝛼𝑖 + 𝛽𝑗 + 𝜏𝑘 + 𝜀𝑖𝑗𝑘

Where

𝑖 = 1,2,3, … , 𝑟

𝑗 = 1,2,3, … , 𝑟

𝑘 = 1,2,3, … , 𝑟

𝜇 is the overall mean

𝛼𝑖 is the effect in the 𝑖 𝑡ℎ column

𝛽𝑗 is the effect in the 𝑗 𝑡ℎ row

𝜏𝑘 is the effect in the 𝑘 𝑡ℎ treatment

𝜀𝑖𝑗𝑘 is the error made in the 𝑖 𝑡ℎ column and in the 𝑗 𝑡ℎ row under the 𝑘 𝑡ℎ treatment
and 𝜀𝑖𝑗𝑘 ~𝑁(0, 𝜎 2 )

∑𝛼𝑖 = 0, ∑𝛽𝑗 = 0, ∑𝜏𝑘 = 0 (each sum running from 1 to 𝑟)

Calculations

𝑆𝑆𝑇 = ∑∑∑ 𝑦𝑖𝑗𝑘² − 𝑦…²/𝑟²

𝑆𝑆𝑅 = ∑ 𝑦𝑖..²/𝑟 − 𝑦…²/𝑟²   (row totals)

𝑆𝑆𝐶 = ∑ 𝑦.𝑗.²/𝑟 − 𝑦…²/𝑟²   (column totals)

𝑆𝑆𝑇𝑟𝑡 = ∑ 𝑦..𝑘²/𝑟 − 𝑦…²/𝑟²   (treatment totals)

𝑆𝑆𝐸 = 𝑆𝑆𝑇 − (𝑆𝑆𝑅 + 𝑆𝑆𝐶 + 𝑆𝑆𝑇𝑟𝑡)

ANOVA Table

𝑆𝑜𝑢𝑟𝑐𝑒 𝑆𝑆 𝑑𝑓 𝑀𝑆 𝐹∗
Rows 𝑆𝑆𝑅 𝑟−1 𝑀𝑆𝑅 𝐹1
Columns 𝑆𝑆𝐶 𝑟−1 𝑀𝑆𝐶 𝐹2
Treatment 𝑆𝑆𝑇𝑟𝑡 𝑟−1 𝑀𝑆𝑇𝑟𝑡 𝐹3
Error 𝑆𝑆𝐸 (𝑟 − 1)(𝑟 − 2) 𝑀𝑆𝐸
Total 𝑆𝑆𝑇 𝑟2 − 1

Decision Rule
Reject 𝐻0 if 𝐹𝑖∗ > 𝑓𝛼, (𝑟−1), (𝑟−1)(𝑟−2)

Hypothesis

𝐻0 : 𝜏1 = 𝜏2 = 𝜏3 = ⋯ = 𝜏𝑘 = 0

𝐻1 : 𝜏𝑖 ≠ 𝜏𝑗 for some 𝑖 ≠ 𝑗

Example

The Mathematics department of a large university wishes to evaluate the teaching
capabilities of 4 professors. In order to eliminate any effects due to different
Mathematics courses and different times of the day, it was decided to conduct the
experiment using a Latin square design in which the letters 𝐴, 𝐵, 𝐶 and 𝐷
represent the 4 professors. The data in the table below show the grades assigned by
these professors.

Course
Time Period Algebra Geometry Statistics Calculus Total
1 𝐴 84 𝐵 79 𝐶 63 𝐷 97 323
2 𝐵 91 𝐶 82 𝐷 80 𝐴 93 346
3 𝐶 59 𝐷 70 𝐴 77 𝐵 80 286
4 𝐷 75 𝐴 91 𝐵 75 𝐶 68 309
Total 309 322 295 338 𝟏𝟐𝟔𝟒

(a) Test the hypothesis that different professors have no effect on the grades.
(b) Use the LSD method to compare the mean grades given by the 4 professors.
(c) Is it true that professor 𝐶 gives different grades compared to other professors?

Solutions

(a) Hypothesis

𝐻0 : 𝜏1 = 𝜏2 = 𝜏3 = 𝜏4 = 0 (Different professors have no effect on the grades)

𝐻1 : 𝜏𝑖 ≠ 𝜏𝑗 for some 𝑖 ≠ 𝑗 (Different professors have effect on the grades)

Calculations

𝐴 = 𝑦.1 = 84 + 91 + 77 + 93 = 𝟑𝟒𝟓

𝐵 = 𝑦.2 = 91 + 79 + 75 + 80 = 𝟑𝟐𝟓

𝐶 = 𝑦.3 = 59 + 82 + 63 + 68 = 𝟐𝟕𝟐

𝐷 = 𝑦.4 = 75 + 70 + 80 + 97 = 𝟑𝟐𝟐
𝑆𝑆𝑇 = ∑∑∑ 𝑦𝑖𝑗𝑘² − 𝑦…²/𝑟²

𝑆𝑆𝑇 = 84² + 91² + ⋯ + 68² − (1264)²/4² = 1738

𝑆𝑆𝑅 = ∑ 𝑦𝑖..²/𝑟 − 𝑦…²/𝑟²

𝑆𝑆𝑅 = (323² + 346² + 286² + 309²)/4 − (1264)²/4² = 474.5

𝑆𝑆𝐶 = ∑ 𝑦.𝑗.²/𝑟 − 𝑦…²/𝑟²

𝑆𝑆𝐶 = (309² + 322² + 295² + 338²)/4 − (1264)²/4² = 252.5

𝑆𝑆𝑇𝑟𝑡 = ∑ 𝑦..𝑘²/𝑟 − 𝑦…²/𝑟²

𝑆𝑆𝑇𝑟𝑡 = (345² + 325² + 272² + 322²)/4 − (1264)²/4² = 723.5

𝑆𝑆𝐸 = 𝑆𝑆𝑇 − (𝑆𝑆𝑅 + 𝑆𝑆𝐶 + 𝑆𝑆𝑇𝑟𝑡)

𝑆𝑆𝐸 = 1738 − (474.5 + 252.5 + 723.5) = 287.5

ANOVA Table

𝑆𝑜𝑢𝑟𝑐𝑒 𝑆𝑆 𝑑𝑓 𝑀𝑆 𝐹∗
Rows 474.5 3 158.16667 𝐹1 = 3.301
Columns 252.5 3 84.16667 𝐹2 = 1.757
Treatment 723.5 3 241.16667 𝐹3 = 5.033
Error 287.5 6 47.91667
Total 1738 15

Decision Rule

Reject 𝐻0 if 𝐹3∗ > 𝑓0.05, 3, 6 = 4.75

Conclusion

Since 𝐹3∗ = 5.033 > 𝑓0.05, 3, 6 = 4.75, we reject 𝐻0 at 5% level of significance and
conclude that different professors have an effect on the grades.

Note

 If it was time we would have tested rows (𝐹1 ).


 If it was course we would have tested columns (𝐹2 )

(b) Using Least Significance Difference (LSD) Method

𝐿𝑆𝐷 = 𝑡0.05/2, 6 √(2𝑀𝑆𝐸/𝑛)

𝐿𝑆𝐷 = 𝑡0.975, 6 √(2(47.91667)/4)

𝐿𝑆𝐷 = 2.447 √(2(47.91667)/4)

𝐿𝑆𝐷 = 11.977

Means for each of the 4 professors:

𝑦̅1. = 86.25, 𝑦̅2. = 81.25, 𝑦̅3. = 68, 𝑦̅4. = 80.5

Ordering means:

𝑦̅1. 𝑦̅2. 𝑦̅3. 𝑦̅4.


86.25 81.25 68 80.5

Conclusion

 Professors 𝐴, 𝐵 and 𝐷 give similar grades. This is because
|𝑦̅4. − 𝑦̅1. | < 𝐿𝑆𝐷, |𝑦̅4. − 𝑦̅2. | < 𝐿𝑆𝐷 and |𝑦̅2. − 𝑦̅1. | < 𝐿𝑆𝐷 (not significantly
different).

 Professor 𝐶 gives different grades. This is because


|𝑦̅4. − 𝑦̅3. | > 𝐿𝑆𝐷, |𝑦̅2. − 𝑦̅3. | > 𝐿𝑆𝐷 and |𝑦̅3 − 𝑦̅1. | > 𝐿𝑆𝐷 (Significantly
different).

(c) Using Contrast Method
𝐶 = (𝜇1 + 𝜇2 + 𝜇4)/3 − 𝜇3, estimated by

𝐶̂ = (𝑦̅1. + 𝑦̅2. + 𝑦̅4. )/3 − 𝑦̅3. = (86.25 + 81.25 + 80.5)/3 − 68 = 14.667

𝑉𝑎𝑟(𝐶̂ ) = (𝑀𝑆𝐸/𝑛) ∑𝑐𝑖² = (47.91667/4)[(1/3)² + (1/3)² + (1/3)² + (−1)²] = 15.972


95% 𝐶. 𝐼 𝑓𝑜𝑟 𝐶̂

𝐶̂ ± (𝑡0.975,6 )√𝑉𝑎𝑟(𝐶̂ )

(14.667 ± (2.447)√(15.972))

(14.667 ± 9.779)
(𝟒. 𝟖𝟗, 𝟐𝟒. 𝟒𝟓)

The hypotheses are;

𝐻0 : 𝐶 = 0 (Professor 𝐶 gives same grades compared to other professors).

𝐻1 : 𝐶 ≠ 0 (Professor 𝐶 gives different grades compared to other professors).

Conclusion

Since 0 ∉ (4.89, 24.45), we reject 𝐻0 at 5% level of significance and conclude that


professor 𝐶 gives different grades compared to other professors.
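Parts (a) and (b) of this example can be reproduced numerically. The sketch below (NumPy and SciPy assumed) encodes the 4 × 4 layout, computes the sums of squares in the ANOVA table, and the LSD value used for the pairwise comparisons.

```python
import numpy as np
from scipy import stats

# Grades: rows = time periods, columns = courses; letters identify the professor
grades = np.array([[84, 79, 63, 97],
                   [91, 82, 80, 93],
                   [59, 70, 77, 80],
                   [75, 91, 75, 68]], dtype=float)
prof = np.array([["A", "B", "C", "D"],
                 ["B", "C", "D", "A"],
                 ["C", "D", "A", "B"],
                 ["D", "A", "B", "C"]])
r = 4
correction = grades.sum() ** 2 / r ** 2

sst = (grades ** 2).sum() - correction                             # 1738
ssr = (grades.sum(axis=1) ** 2 / r).sum() - correction             # 474.5  (time periods)
ssc = (grades.sum(axis=0) ** 2 / r).sum() - correction             # 252.5  (courses)
trt_totals = {p: grades[prof == p].sum() for p in "ABCD"}          # A=345, B=325, C=272, D=322
sstrt = sum(t ** 2 for t in trt_totals.values()) / r - correction  # 723.5  (professors)
sse = sst - (ssr + ssc + sstrt)                                    # 287.5

mse = sse / ((r - 1) * (r - 2))
F_trt = (sstrt / (r - 1)) / mse                                    # about 5.03
p_value = stats.f.sf(F_trt, r - 1, (r - 1) * (r - 2))

lsd = stats.t.ppf(0.975, (r - 1) * (r - 2)) * np.sqrt(2 * mse / r)  # about 11.98
print(round(F_trt, 3), round(p_value, 3), round(lsd, 2), trt_totals)
```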

7.2 Analysis of Covariance (ANCOVA)

Analysis of Covariance is a general linear model which blends analysis of variance
and simple linear regression. ANCOVA evaluates whether the population means of a
dependent variable are equal across the levels of a categorical independent
variable called the treatment.

Assumptions

ANCOVA has the same assumptions as any linear model.

 For each independent variable, the relationship between the dependent variable
(𝑦) and the covariate (𝑥) is linear.
 The covariate is independent of the treatment effects (𝑖. 𝑒 the covariate and the
independent variables are independent).

Differences between ANCOVA and ANOVA

 ANCOVA has a covariate while ANOVA does not have.


 ANOVA uses both linear and non-linear models while ANCOVA uses a general
linear model.
 ANCOVA removes the effect of the thickness while ANOVA does not.

Model

𝑦𝑖𝑗 = 𝜇 + 𝜏𝑖 + 𝛽(𝑋𝑖𝑗 − 𝑋̅𝑖 ) + 𝜀𝑖𝑗

Where

𝑖 = 1,2,3, … , 𝑘

𝑗 = 1,2,3, … , 𝑛

 𝑦𝑖𝑗 is the 𝑗 𝑡ℎ observation under the 𝑖 𝑡ℎ treatment.


 𝜇 is the grand mean.
 𝜏𝑖 is the effect in the 𝑖 𝑡ℎ treatment.
 𝛽 is the linear regression coefficient indicating the dependency of 𝑦𝑖𝑗 on 𝑥𝑖𝑗 .
 𝑥𝑖𝑗 is the 𝑗 𝑡ℎ observation of the covariate under the 𝑖 𝑡ℎ group.
 𝑋̅𝑖 is the 𝑖 𝑡ℎ group mean.
 𝜀𝑖𝑗 is the random error and 𝜀𝑖𝑗 ~𝑁(0, 𝜎 2 )

Decision Rule
Reject 𝐻0 if 𝐹∗ > 𝑓𝛼, (𝑘−1), (𝑁−𝑘−1)

Hypothesis

𝐻0 : 𝜏1 = 𝜏2 = 𝜏3 = ⋯ = 𝜏𝑘 = 0

𝐻1 : 𝜏𝑖 ≠ 𝜏𝑗 for some 𝑖 ≠ 𝑗

Calculations (Sum of Squares and Products)

𝑆𝑆𝑇𝑌 = ∑∑ 𝑦𝑖𝑗² − 𝑦…²/𝑁

𝑆𝑆𝑇𝑟𝑡𝑌 = ∑ 𝑦.𝑖²/𝑛 − 𝑦…²/𝑁

𝑆𝑆𝐸𝑌 = 𝑆𝑆𝑇𝑌 − 𝑆𝑆𝑇𝑟𝑡𝑌

𝑆𝑆𝑇𝑋 = ∑∑ 𝑥𝑖𝑗² − 𝑥…²/𝑁

𝑆𝑆𝑇𝑟𝑡𝑋 = ∑ 𝑥.𝑖²/𝑛 − 𝑥…²/𝑁

𝑆𝑆𝐸𝑋 = 𝑆𝑆𝑇𝑋 − 𝑆𝑆𝑇𝑟𝑡𝑋

𝑆𝑆𝑇𝑋𝑌 = ∑∑ 𝑥𝑖𝑗𝑦𝑖𝑗 − (𝑥…)(𝑦…)/𝑁

𝑆𝑆𝑇𝑟𝑡𝑋𝑌 = ∑ (𝑥.𝑖)(𝑦.𝑖)/𝑛 − (𝑥…)(𝑦…)/𝑁

𝑆𝑆𝐸𝑋𝑌 = 𝑆𝑆𝑇𝑋𝑌 − 𝑆𝑆𝑇𝑟𝑡𝑋𝑌

Summary

Source 𝑋 𝑌 𝑋𝑌
treatment 𝑆𝑆𝑇𝑟𝑡𝑋 𝑆𝑆𝑇𝑟𝑡𝑌 𝑆𝑆𝑇𝑟𝑡𝑋𝑌
Error 𝑆𝑆𝐸𝑋 𝑆𝑆𝐸𝑌 𝑆𝑆𝐸𝑋𝑌
Total 𝑆𝑆𝑇𝑋 𝑆𝑆𝑇𝑌 𝑆𝑆𝑇𝑋𝑌

Calculations (Adjusted treatment means, errors and totals)

𝑆𝑆𝑇(𝑎𝑑𝑗) = 𝑆𝑆𝑇𝑦 − (𝑆𝑆𝑇𝑥𝑦)²/𝑆𝑆𝑇𝑥

𝑆𝑆𝐸(𝑎𝑑𝑗) = 𝑆𝑆𝐸𝑦 − (𝑆𝑆𝐸𝑥𝑦)²/𝑆𝑆𝐸𝑥

𝑆𝑆𝑇𝑟𝑡(𝑎𝑑𝑗) = 𝑆𝑆𝑇(𝑎𝑑𝑗) − 𝑆𝑆𝐸(𝑎𝑑𝑗)

Or

𝑆𝑆𝑇𝑟𝑡(𝑎𝑑𝑗) = 𝑆𝑆𝑇𝑦 − 𝑆𝑆𝐸𝑦 + (𝑆𝑆𝐸𝑥𝑦)²/𝑆𝑆𝐸𝑥 − (𝑆𝑆𝑇𝑥𝑦)²/𝑆𝑆𝑇𝑥

ANCOVA Table

Source 𝑆𝑆 𝑑𝑓 𝑀𝑆 𝐹∗
Treatment 𝑆𝑆𝑇𝑟𝑡(𝑎𝑑𝑗) 𝑘−1 𝑀𝑆𝑇𝑟𝑡(𝑎𝑑𝑗) 𝑀𝑆𝑇𝑟𝑡(𝑎𝑑𝑗)
𝑀𝑆𝐸(𝑎𝑑𝑗)
Error 𝑆𝑆𝐸(𝑎𝑑𝑗) 𝑁−𝑘−1 𝑀𝑆𝐸(𝑎𝑑𝑗)
Total 𝑆𝑆𝑇(𝑎𝑑𝑗) 𝑁−2

Example

An engineer is studying the effect of the cutting speed on the rate of metal
removal in a machining operation. However, the rate of metal removal is also related to
the hardness of the test specimen. Five observations are taken at each cutting speed.
The amount of metal removed (𝑦) and the hardness of the specimen (𝑥) are shown in
the table below;

Cutting Speed
1000 1200 1400
𝑦 𝑥 𝑦 𝑥 𝑦 𝑥
68 120 112 165 118 175
90 140 94 140 82 132
98 150 65 120 73 124
77 125 74 125 92 141
88 136 85 133 80 130
Total 𝟒𝟐𝟏 𝟔𝟕𝟏 𝟒𝟑𝟎 𝟔𝟖𝟑 𝟒𝟒𝟓 𝟕𝟎𝟐

(a) Obtain the sum of squares and products for single factor analysis of covariance.
(b) Compute the adjusted totals, standard errors and adjusted treatment means.
(c) Analyse the data and test the hypothesis that the rate of metal removal is also
related to the hardness of the test specimen. Take 𝛼 = 5%.
(d) Use the Analysis of Variance approach to test the hypothesis that the rate of metal
removal is also related to the hardness of the test specimen. Take 𝛼 = 5%.
(e) Remove the effect of the thickness.

Solutions

(a)

𝑆𝑆𝑇𝑌 = ∑∑ 𝑦𝑖𝑗² − 𝑦…²/𝑁

𝑆𝑆𝑇𝑌 = 68² + 90² + ⋯ + 80² − (421 + 430 + 445)²/15 = 3173.6

𝑆𝑆𝑇𝑟𝑡𝑌 = ∑ 𝑦.𝑖²/𝑛 − 𝑦…²/𝑁

𝑆𝑆𝑇𝑟𝑡𝑌 = (421² + 430² + 445²)/5 − (421 + 430 + 445)²/15 = 58.8

𝑆𝑆𝐸𝑌 = 3173.6 − 58.8 = 3114.8

𝑆𝑆𝑇𝑋 = ∑∑ 𝑥𝑖𝑗² − 𝑥…²/𝑁

𝑆𝑆𝑇𝑋 = 120² + 140² + ⋯ + 130² − (671 + 683 + 702)²/15 = 3556.93

𝑆𝑆𝑇𝑟𝑡𝑋 = ∑ 𝑥.𝑖²/𝑛 − 𝑥…²/𝑁

𝑆𝑆𝑇𝑟𝑡𝑋 = (671² + 683² + 702²)/5 − (671 + 683 + 702)²/15 = 97.73

𝑆𝑆𝐸𝑋 = 3556.93 − 97.73 = 3459.2

𝑆𝑆𝑇𝑋𝑌 = ∑∑ 𝑥𝑖𝑗𝑦𝑖𝑗 − (𝑥…)(𝑦…)/𝑁

𝑆𝑆𝑇𝑋𝑌 = (68)(120) + (90)(140) + ⋯ + (80)(130) − (2056)(1296)/15 = 3307.6

𝑆𝑆𝑇𝑟𝑡𝑋𝑌 = ∑ (𝑥.𝑖)(𝑦.𝑖)/𝑛 − (𝑥…)(𝑦…)/𝑁

𝑆𝑆𝑇𝑟𝑡𝑋𝑌 = ((421)(671) + (430)(683) + (445)(702))/5 − (2056)(1296)/15 = 75.8

𝑆𝑆𝐸𝑋𝑌 = 3307.6 − 75.8 = 3231.8

Summary

Source 𝑋 𝑌 𝑋𝑌
Treatment 97.73 58.8 75.8
Error 3459.2 3114.8 3231.8
Total 3556.93 3173.6 3307.6

(b) 𝑆𝑆𝑇(𝑎𝑑𝑗) = 𝑆𝑆𝑇𝑦 − (𝑆𝑆𝑇𝑥𝑦)²/𝑆𝑆𝑇𝑥 = 3173.6 − (3307.6)²/3556.93 = 97.85

𝑆𝑆𝐸(𝑎𝑑𝑗) = 𝑆𝑆𝐸𝑦 − (𝑆𝑆𝐸𝑥𝑦)²/𝑆𝑆𝐸𝑥 = 3114.8 − (3231.8)²/3459.2 = 95.45

𝑆𝑆𝑇𝑟𝑡(𝑎𝑑𝑗) = 𝑆𝑆𝑇(𝑎𝑑𝑗) − 𝑆𝑆𝐸(𝑎𝑑𝑗) = 97.85 − 95.45 = 2.4

Or

𝑆𝑆𝑇𝑟𝑡(𝑎𝑑𝑗) = 𝑆𝑆𝑇𝑦 − 𝑆𝑆𝐸𝑦 + (𝑆𝑆𝐸𝑥𝑦)²/𝑆𝑆𝐸𝑥 − (𝑆𝑆𝑇𝑥𝑦)²/𝑆𝑆𝑇𝑥
          = 3173.6 − 3114.8 + (3231.8)²/3459.2 − (3307.6)²/3556.93 = 2.4

(c)

ANCOVA Table

Source 𝑆𝑆 𝑑𝑓 𝑀𝑆 𝐹∗
Treatment 2.4 2 1.2 0.138
Error 95.45 11 8.677
Total 97.85 13

Decision Rule

Reject 𝐻0 if 𝐹∗ > 𝑓0.05, 2, 11 = 3.98

Hypothesis

𝐻0 : 𝜇1 = 𝜇2 = 𝜇3 (The rate of metal removal is also related to the hardness of the test
specimen).

𝐻1 : 𝜇𝑖 ≠ 𝜇𝑗 for some 𝑖 ≠ 𝑗 (The rate of metal removal is not related to the hardness of the test
specimen).

Conclusion

Since 𝐹∗ = 0.138 < 𝑓0.05, 2, 11 = 3.98, we fail to reject 𝐻0 at 5% level of significance and
conclude that the rate of metal removal is also related to the hardness of the test
specimen.

(d) ANOVA Table

Source 𝑆𝑆 𝑑𝑓 𝑀𝑆 𝐹∗
Treatment 58.8 2 29.4 0.113
Error 3114.8 12 259.6
Total 3173.6 14

Conclusion

Since 𝐹∗ = 0.113 < 𝑓0.05, 2, 12 = 3.88, we fail to reject 𝐻0 at 5% level of significance and
conclude that the rate of metal removal is also related to the hardness of the test
specimen.

(e) Removing effect of the thickness


𝑦̅.1 = 421/5 = 84.2,  𝑦̅.2 = 430/5 = 86,  𝑦̅.3 = 445/5 = 89

𝑥̅.1 = 671/5 = 134.2,  𝑥̅.2 = 683/5 = 136.6,  𝑥̅.3 = 702/5 = 140.4

𝑋̅ = (671 + 683 + 702)/15 = 137.06667 ≈ 137.1

𝛽 = 𝑆𝑆𝐸𝑋𝑌/𝑆𝑆𝐸𝑋 = 3231.8/3459.2 = 0.9343

𝑦̅.𝑖(𝑎𝑑𝑗) = 𝑦̅.𝑖 − 𝛽(𝑥̅.𝑖 − 𝑥̅… )


𝑦̅.1(𝑎𝑑𝑗) = 84.2 − 0.9343(134.2 − 137.1) = 𝟖𝟔. 𝟗
𝑦̅.2(𝑎𝑑𝑗) = 86 − 0.9343(136.6 − 137.1) = 𝟖𝟔. 𝟓
𝑦̅.3(𝑎𝑑𝑗) = 89 − 0.9343(140.4 − 137.1) = 𝟖𝟓. 𝟗

Hence we have removed the effect of the thickness.
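All of the quantities in this ANCOVA example — the sums of squares and products, the adjusted sums of squares, the F ratio and the adjusted treatment means — can be reproduced with the sketch below (NumPy assumed). The adjusted means come out as about 86.9, 86.4 and 85.9; the small difference from the 86.5 quoted above is due to rounding 𝑋̅ to 137.1 in the hand calculation.

```python
import numpy as np

# Metal removed (y) and specimen hardness (x) at the three cutting speeds
y = np.array([[ 68,  90,  98,  77,  88],
              [112,  94,  65,  74,  85],
              [118,  82,  73,  92,  80]], dtype=float)
x = np.array([[120, 140, 150, 125, 136],
              [165, 140, 120, 125, 133],
              [175, 132, 124, 141, 130]], dtype=float)
k, n = y.shape
N = k * n

def total_and_trt(a, b):
    """Total and treatment sums of (cross) products for arrays a, b of shape (k, n)."""
    cf = a.sum() * b.sum() / N                            # correction factor
    total = (a * b).sum() - cf
    trt = (a.sum(axis=1) * b.sum(axis=1) / n).sum() - cf
    return total, trt

STy, Trty = total_and_trt(y, y)        # 3173.6, 58.8
STx, Trtx = total_and_trt(x, x)        # 3556.93, 97.73
STxy, Trtxy = total_and_trt(x, y)      # 3307.6, 75.8
SEy, SEx, SExy = STy - Trty, STx - Trtx, STxy - Trtxy

SST_adj = STy - STxy ** 2 / STx        # 97.85
SSE_adj = SEy - SExy ** 2 / SEx        # 95.45
SSTrt_adj = SST_adj - SSE_adj          # 2.4
F = (SSTrt_adj / (k - 1)) / (SSE_adj / (N - k - 1))       # about 0.14

beta = SExy / SEx                      # 0.9343
adj_means = y.mean(axis=1) - beta * (x.mean(axis=1) - x.mean())
print(round(F, 3), round(beta, 4), np.round(adj_means, 1))
```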
