Regression Analysis: Terminology and Notation: The PRF (Population Regression Function)
Abbott
Consider the generic version of the simple (two-variable) linear regression model.
$$Y_i = f(X_i) + u_i = \beta_0 + \beta_1 X_i + u_i$$
• Observable Variables:
Yi ≡ the value of the dependent variable Y for the i-th member of the population
Xi ≡ the value of the explanatory variable (regressor) X for the i-th member of the population
• Unobservable Variable:
ui ≡ the random error term for the i-th member of the population
The true population values of the regression coefficients β0 and β1 are unknown.
PRE: $Y_i = f(X_i) + u_i = \beta_0 + \beta_1 X_i + u_i$
• The variables of the regression model are Yi, Xi, and ui.
Yi and Xi are the observable variables; their values can be observed or measured.
• β0 and β1 are the parameters of the regression model, together with any unknown
parameters of the probability distribution of the random error term ui.
The true population values of the regression coefficients β0 and β1 are unknown.
• The PRE (population regression equation) for the simple linear regression
model:
$$Y_i = \underbrace{f(X_i)}_{\text{PRF}} + \underbrace{u_i}_{\text{random error}} = \beta_0 + \beta_1 X_i + u_i \qquad \text{(1a)}$$
$f(X_i) = \beta_0 + \beta_1 X_i$
= the PRF (population regression function) for the i-th population member
$u_i = Y_i - f(X_i) = Y_i - \beta_0 - \beta_1 X_i$
= the random error for the i-th population member
$\beta_0, \beta_1$ = the unknown regression coefficients
Number of regression coefficients = K = 2.
Number of slope coefficients = K − 1 = 2 − 1 = 1.
• Sample Data: A random sample of N members of the population for which the
observed values of Y and X are measured. Each sample observation is of the form
(Yi, Xi), i = 1, ..., N.
• The SRE (sample regression equation) for the simple linear regression model:
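$$Y_i = \underbrace{\hat{f}(X_i)}_{\text{SRF}} + \underbrace{\hat{u}_i}_{\text{residual}} = \hat{Y}_i + \hat{u}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{u}_i \qquad \text{(1b)}$$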
$\hat{f}(X_i) = \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$
= the SRF (sample regression function) for sample observation i
$\hat{u}_i = Y_i - \hat{f}(X_i) = Y_i - \hat{Y}_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i$
= the residual for sample observation i
$\hat{\beta}_0, \hat{\beta}_1$ = estimators or estimates of the regression coefficients β0 and β1
• The PRE (population regression equation) for the multiple linear regression
model is:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i$$
• Sample Data: A random sample of N members of the population for which the
observed values of Y and X1, X2, …, Xk are measured. Each sample observation
is of the form (Yi, X1i, X2i, …, Xki), i = 1, …, N.
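• The SRE (sample regression equation) for the multiple linear regression model:
$$Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \cdots + \hat{\beta}_k X_{ki} + \hat{u}_i$$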
♦ A three-variable linear regression model has two regressors; its PRE is written
as
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i.$$
♦ A four-variable linear regression model has three regressors; its PRE is written
as
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i.$$
♦ The general multiple linear regression model has K − 1 regressors; its PRE is
written as
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i.$$
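As a minimal sketch of how the PRF part of the general PRE is evaluated, the Python function below computes β0 + β1X1i + ... + βkXki for one population member and counts the coefficients. The function name `prf`, the coefficient values, and the regressor values are hypothetical and chosen only for illustration; none of them come from these notes.

```python
def prf(beta, x):
    """Return beta[0] + beta[1]*x[0] + ... + beta[k]*x[k-1].

    beta holds the intercept followed by the k slope coefficients;
    x holds the k regressor values for one population member.
    """
    if len(beta) != len(x) + 1:
        raise ValueError("need exactly one more coefficient than regressors")
    return beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))


# Hypothetical four-variable model: three regressors, four coefficients.
beta = [10.0, 0.5, -2.0, 1.5]
x_i = [100.0, 3.0, 4.0]

K = len(beta)            # number of regression coefficients
num_slopes = K - 1       # number of slope coefficients (regressors)
prf_value = prf(beta, x_i)

print(f"K = {K}, slope coefficients = {num_slopes}, PRF value = {prf_value}")
```

The observed Yi would differ from this PRF value by the unobservable error ui, which the function deliberately leaves out.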
The PRE (population regression equation) for this model can be written as
$$Y_i = \beta_0 + \beta_1 X_i + u_i \qquad \text{(1)}$$
We assume that the weekly disposable incomes of these families take only 10
distinct values -- i.e., X takes only the 10 distinct values
Xi = 80, 100, 120, 140, 160, 180, 200, 220, 240, 260.
Table 2.1: Population data points (Yi, Xi) for the population of 60 families.
Xi values →             80   100   120   140   160   180   200   220   240   260
Yi values ↓             55    65    79    80   102   110   120   135   137   150
                        60    70    84    93   107   115   136   137   145   152
                        65    74    90    95   110   120   140   140   155   175
                        70    80    94   103   116   130   144   152   165   178
                        75    85    98   108   118   135   145   157   175   180
                        --    88    --   113   125   140    --   160   189   185
                        --    --    --   115    --    --    --   162    --   191
Sum of Yi values       325   462   445   707   678   750   685  1043   966  1211
Number of Yi values      5     6     5     7     6     6     5     7     6     7
♦ The first column gives the conditional distribution of Y for Xi = 80; five
families in the population have weekly disposable income equal to 80 dollars.
♦ The fifth column gives the conditional distribution of Y for Xi = 160; six
families in the population have weekly disposable income equal to 160 dollars.
♦ The tenth (last) column gives the conditional distribution of Y for Xi = 260;
seven families in the population have weekly disposable income equal to 260
dollars.
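The layout of Table 2.1 can be sketched in Python as a mapping from each income level Xi to the list of Y values observed at that level. The variable names (`population`, `counts`, `sums`) are illustrative choices; the numbers are copied from the table, and the checks simply reproduce its "Sum" and "Number" rows.

```python
# Table 2.1: weekly income X (key) -> weekly consumption Y of each family (list).
population = {
    80:  [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
    140: [80, 93, 95, 103, 108, 113, 115],
    160: [102, 107, 110, 116, 118, 125],
    180: [110, 115, 120, 130, 135, 140],
    200: [120, 136, 140, 144, 145],
    220: [135, 137, 140, 152, 157, 160, 162],
    240: [137, 145, 155, 165, 175, 189],
    260: [150, 152, 175, 178, 180, 185, 191],
}

# Number of families and sum of Y within each conditional distribution.
counts = {x: len(ys) for x, ys in population.items()}
sums = {x: sum(ys) for x, ys in population.items()}
total_families = sum(counts.values())

for x in sorted(population):
    print(f"X = {x}: {counts[x]} families, sum of Y = {sums[x]}")
print(f"Population size: {total_families}")
```

Each list is one column of the table, that is, one conditional distribution of Y for a given Xi.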
• Notation: p(Y | Xi) = the conditional probability of Y given X = Xi.
Table 2.2: Conditional probabilities p(Y | Xi) for the data in Table 2.1.
Xi values →             80   100   120   140   160   180   200   220   240   260
p(Y | Xi) ↓            1/5   1/6   1/5   1/7   1/6   1/6   1/5   1/7   1/6   1/7
                       1/5   1/6   1/5   1/7   1/6   1/6   1/5   1/7   1/6   1/7
                       1/5   1/6   1/5   1/7   1/6   1/6   1/5   1/7   1/6   1/7
                       1/5   1/6   1/5   1/7   1/6   1/6   1/5   1/7   1/6   1/7
                       1/5   1/6   1/5   1/7   1/6   1/6   1/5   1/7   1/6   1/7
                        --   1/6    --   1/7   1/6   1/6    --   1/7   1/6   1/7
                        --    --    --   1/7    --    --    --   1/7    --   1/7
Sum of Yi values       325   462   445   707   678   750   685  1043   966  1211
Number of Yi values      5     6     5     7     6     6     5     7     6     7
E(Y | Xi)               65    77    89   101   113   125   137   149   161   173
The probability of observing any one family whose weekly disposable income
is Xi = 80 equals 1/5: e.g.,
$p(Y = 55 \mid X_i = 80) = 1/5$.
$p(Y = 60 \mid X_i = 80) = 1/5$.
$p(Y = 65 \mid X_i = 80) = 1/5$.
$p(Y = 70 \mid X_i = 80) = 1/5$.
$p(Y = 75 \mid X_i = 80) = 1/5$.
The probability of observing any one family whose weekly disposable income
is Xi = 160 equals 1/6: e.g.,
$p(Y = 102 \mid X_i = 160) = 1/6$.
$p(Y = 110 \mid X_i = 160) = 1/6$.
For each of the 10 population values of Xi, we can compute from Tables 2.1 and 2.2
the corresponding conditional mean value of the population values of Y.
• Notation:
$E(Y \mid X_i) = E(Y \mid X = X_i)$
= the population conditional mean of Y for X = Xi
= the “expected value of Y given that X takes the specific value Xi”
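• Definition:
$$E(Y \mid X_i) = E(Y \mid X = X_i) = \sum_{X = X_i} p(Y \mid X_i)\, Y$$
where p(Y | Xi) is the conditional probability of Y for X = Xi, and the summation
is over all population values of Y for which X = Xi.
In words, the above formula for E(Y | Xi) = E(Y | X = Xi) says that for the value
Xi of X, the conditional mean of Y is the sum of the population values of Y at
X = Xi, each weighted by its conditional probability p(Y | Xi).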
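• Illustrative Calculations of E(Y | Xi):
$$E(Y \mid X_i = 80) = \tfrac{1}{5}(55) + \tfrac{1}{5}(60) + \tfrac{1}{5}(65) + \tfrac{1}{5}(70) + \tfrac{1}{5}(75) = \frac{55 + 60 + 65 + 70 + 75}{5} = \frac{325}{5} = 65$$
$$E(Y \mid X_i = 160) = \tfrac{1}{6}(102) + \tfrac{1}{6}(107) + \tfrac{1}{6}(110) + \tfrac{1}{6}(116) + \tfrac{1}{6}(118) + \tfrac{1}{6}(125) = \frac{102 + 107 + 110 + 116 + 118 + 125}{6} = \frac{678}{6} = 113$$
$$E(Y \mid X_i = 260) = \tfrac{1}{7}(150) + \tfrac{1}{7}(152) + \tfrac{1}{7}(175) + \tfrac{1}{7}(178) + \tfrac{1}{7}(180) + \tfrac{1}{7}(185) + \tfrac{1}{7}(191) = \frac{150 + 152 + 175 + 178 + 180 + 185 + 191}{7} = \frac{1211}{7} = 173$$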
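The three calculations above can be reproduced with a short Python sketch that applies the definition directly: each Y value at a given Xi is weighted by the equal probability 1/n, where n is the number of families at that income level. The helper name `conditional_mean` is an illustrative choice, and exact fractions are used only to avoid floating-point rounding.

```python
from fractions import Fraction

# The three columns of Table 2.1 used in the illustrative calculations.
columns = {
    80:  [55, 60, 65, 70, 75],
    160: [102, 107, 110, 116, 118, 125],
    260: [150, 152, 175, 178, 180, 185, 191],
}


def conditional_mean(y_values):
    """Sum of p(Y | Xi) * Y with equal probabilities p = 1/n."""
    p = Fraction(1, len(y_values))   # each family is equally likely
    return sum(p * y for y in y_values)


means = {x: conditional_mean(ys) for x, ys in columns.items()}

for x, m in means.items():
    print(f"E(Y | X = {x}) = {m}")
```

Because the probabilities are equal within a column, the weighted sum reduces to the ordinary arithmetic mean of that column.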
Table 2.3
   Xi    E(Y | Xi)
   80        65
  100        77
  120        89
  140       101
  160       113
  180       125
  200       137
  220       149
  240       161
  260       173
Table 2.3 tabulates the relationship between E(Y | Xi) and Xi for this particular
population of 60 families.
$$\Delta X_i = 20 \;\Rightarrow\; \Delta E(Y \mid X_i) = 12 \;\Rightarrow\; \frac{\Delta E(Y \mid X_i)}{\Delta X_i} = \frac{12}{20} = 0.60.$$
$$E(Y \mid X_i) = \beta_0 + \beta_1 X_i.$$
$$\beta_0 = 17 \quad \text{and} \quad \beta_1 = 0.60.$$
$$E(Y \mid X_i) = \beta_0 + \beta_1 X_i = 17 + 0.60\, X_i.$$
$$f(X_i) = E(Y \mid X_i) = \beta_0 + \beta_1 X_i = 17 + 0.60\, X_i.$$
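One way to check these coefficient values is to recover them from the conditional means in Table 2.3. The sketch below computes the slope between each pair of adjacent income levels, backs out the intercept from the first point, and confirms that the resulting line passes through all ten conditional means. Variable and function names are illustrative; the data are the Table 2.3 values.

```python
import math

# Table 2.3: income level X -> conditional mean E(Y | X).
table_2_3 = {80: 65, 100: 77, 120: 89, 140: 101, 160: 113,
             180: 125, 200: 137, 220: 149, 240: 161, 260: 173}

xs = sorted(table_2_3)

# Slope between each pair of adjacent X values: change in E(Y|X) over change in X.
slopes = [(table_2_3[b] - table_2_3[a]) / (b - a) for a, b in zip(xs, xs[1:])]

beta1 = slopes[0]                            # constant slope of the PRF
beta0 = table_2_3[xs[0]] - beta1 * xs[0]     # intercept from the first point


def prf(x):
    """Population regression function implied by Table 2.3."""
    return beta0 + beta1 * x


print(f"beta0 = {beta0:.2f}, beta1 = {beta1:.2f}")
```

The fact that every adjacent slope is the same is what makes the PRF exactly linear in this constructed population.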
where
• Figure 2.1 Plot of Population Data Points, Conditional Means E(Y|X), and
the Population Regression Function PRF
[Figure 2.1 not reproduced. Vertical axis: weekly consumption expenditure, $; horizontal axis: weekly income, $. Legend: Y; Fitted values (PRF = E(Y|X)).]
1. The small dots in Figure 2.1 constitute a scatterplot of the population values
of Y and X for the population of 60 families:
Each small dot corresponds to a single population data point of the form
(Yi, Xi) i = 1, 2, ..., 60.
2. The solid line in Figure 2.1 is the population regression line for the
population of 60 families.
Each pair of population values (E(Y | Xi), Xi) is represented by a large
square dot in Figure 2.1.
This population regression line is the locus of the 10 points in Table 2.3 -- i.e.,
it connects the 10 points of the form (E(Y | Xi), Xi), i = 1, ..., 10.
• Definition: The unobservable random error term for the i-th population
member is denoted as ui and defined as
$$u_i = Y_i - E(Y \mid X_i) \quad \forall\, i.$$
For each population member -- for each of the 60 families in our hypothetical
population -- the random error term ui equals the deviation of that population
member's individual Yi value from the population conditional mean value of Y for
the corresponding value Xi of X.
Terminology: The random error term ui is also known as the stochastic error term,
the random disturbance term, or the stochastic disturbance term.
$$Y_i = E(Y \mid X_i) + u_i = \beta_0 + \beta_1 X_i + u_i \quad \text{since } E(Y \mid X_i) = \beta_0 + \beta_1 X_i.$$
(1) $E(Y \mid X_i) = \beta_0 + \beta_1 X_i$
= the population conditional mean of Y for X = Xi
= the mean weekly consumption expenditure for all families in
the population who have weekly disposable income X = Xi.
(2) $u_i$ = the random error term for the i-th population member
$= Y_i - E(Y \mid X_i)$
= the deviation of family i’s weekly consumption expenditure Yi
from the population mean value E(Y | Xi) of all families in the
population that have the same weekly disposable income X = Xi.
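To make the decomposition concrete, the sketch below computes the random error terms for the five families with weekly income Xi = 80, using the Y values from Table 2.1. The variable names are illustrative. Each Yi is split into the conditional mean plus its own deviation from that mean.

```python
# Families with weekly disposable income X = 80 (first column of Table 2.1).
x_i = 80
y_values = [55, 60, 65, 70, 75]

# Population conditional mean E(Y | X = 80).
cond_mean = sum(y_values) / len(y_values)

# Random error term for each family: u_i = Y_i - E(Y | X_i).
errors = [y - cond_mean for y in y_values]

# Average of the error terms within this income group.
mean_error = sum(errors) / len(errors)

for y, u in zip(y_values, errors):
    print(f"Y = {y}: E(Y | X = {x_i}) = {cond_mean}, u = {u:+}")
print(f"Mean of u within the group: {mean_error}")
```

Within the group the deviations are symmetric around zero here (−10, −5, 0, 5, 10), so they average to zero.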
Implication 2: The population conditional mean value of the random error terms for
each population value Xi of X equals 0 -- i.e.,
$$E(u_i \mid X_i) = 0 \quad \forall\, i.$$
Proof:
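The steps of the proof are not reproduced at this point in the notes. A sketch of the standard argument, offered here as a reconstruction rather than the author's own wording, uses only the definition of ui and the fact that E(Y | Xi) is a fixed number once Xi is given:

```latex
\begin{align*}
E(u_i \mid X_i) &= E\big[\, Y_i - E(Y \mid X_i) \;\big|\; X_i \,\big]
    && \text{definition of } u_i \\
  &= E(Y_i \mid X_i) - E(Y \mid X_i)
    && E(Y \mid X_i) \text{ is constant given } X_i \\
  &= 0
    && \text{the two conditional means coincide.}
\end{align*}
```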
The random error terms represent all the unknown and unobservable
variables other than X that determine the individual population values Yi of
the dependent variable Y.
$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i \qquad (i = 1, \ldots, N)$$
where
1. The sample observations {(Yi, Xi): i = 1, ..., N} are typically a small subset of
the parent population of all population data points (Yi, Xi).
Sample size N is much smaller than the number of population data points.
2. Each random sample from a given population yields one estimate of the PRF
-- i.e., one estimate of the numerical value of β0, and one estimate of the
numerical value of β1.
• Important Point 2: Each random sample from the same population yields a
different SRF -- i.e., a different numerical value of $\hat{\beta}_0$ and a different numerical
value of $\hat{\beta}_1$.
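A small simulation can illustrate this point. The sketch below repeatedly draws a sample from the Table 2.1 population and fits a line to each. Two things here are assumptions, not statements from the notes: the sampling scheme (one family drawn at random at each of the 10 income levels) and the estimator (ordinary least squares, since no estimator has been named at this point). The seed and the number of samples are arbitrary.

```python
import random

# Table 2.1: weekly income X -> weekly consumption Y of each family.
population = {
    80:  [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
    140: [80, 93, 95, 103, 108, 113, 115],
    160: [102, 107, 110, 116, 118, 125],
    180: [110, 115, 120, 130, 135, 140],
    200: [120, 136, 140, 144, 145],
    220: [135, 137, 140, 152, 157, 160, 162],
    240: [137, 145, 155, 165, 175, 189],
    260: [150, 152, 175, 178, 180, 185, 191],
}


def ols(xs, ys):
    """Least-squares intercept and slope for a two-variable regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


rng = random.Random(0)       # fixed seed so the run is repeatable
xs = sorted(population)

estimates = []
for _ in range(5):
    # Assumed sampling scheme: one randomly chosen family per income level.
    ys = [rng.choice(population[x]) for x in xs]
    estimates.append(ols(xs, ys))

for s, (b0, b1) in enumerate(estimates, start=1):
    print(f"Sample {s}: intercept = {b0:.2f}, slope = {b1:.4f}")
```

Each pass through the loop is one random sample and therefore one SRF; the printed intercepts and slopes vary from sample to sample even though the population never changes.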
     Sample 1           Sample 2
   Xi      Yi         Xi      Yi
   80      70         80      55
  100      65        100      88
  120      90        120      90
  140      95        140      80
  160     110        160     118
  180     115        180     120
  200     120        200     145
  220     140        220     135
  240     155        240     145
  260     150        260     175
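Because the two samples contain different Yi values for the 10 Xi values, they
will yield different SRFs -- a different numerical value of $\hat{\beta}_0$ and a different
numerical value of $\hat{\beta}_1$.
The SRF for Sample 1 is
$$\hat{Y}_i = 24.46 + 0.5091\, X_i$$
where the Sample 1 coefficient estimates are $\hat{\beta}_0(1) = 24.46$ and $\hat{\beta}_1(1) = 0.5091$.
The SRF for Sample 2 is
$$\hat{Y}_i = 17.17 + 0.5761\, X_i$$
where the Sample 2 coefficient estimates are $\hat{\beta}_0(2) = 17.17$ and $\hat{\beta}_1(2) = 0.5761$.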
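The reported estimates can be checked numerically. The sketch below assumes they were obtained by ordinary least squares, which the notes do not state at this point; under that assumption the standard two-variable formulas applied to the Sample 1 and Sample 2 data give values that agree with the reported ones to within rounding. The function name `ols` is an illustrative choice.

```python
x = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
sample_1 = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
sample_2 = [55, 88, 90, 80, 118, 120, 145, 135, 145, 175]


def ols(xs, ys):
    """Least-squares intercept and slope for a two-variable regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: sum of cross-deviations over sum of squared X deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    b1 = sxy / sxx
    # Intercept: forces the fitted line through the point of sample means.
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0_1, b1_1 = ols(x, sample_1)
b0_2, b1_2 = ols(x, sample_2)

print(f"SRF1: Yhat = {b0_1:.4f} + {b1_1:.4f} X")
print(f"SRF2: Yhat = {b0_2:.4f} + {b1_2:.4f} X")
```

Under this OLS assumption the Sample 1 intercept works out to about 24.4545, slightly below the 24.46 quoted above; the gap is small enough to be a rounding difference. The Sample 2 slope is the larger of the two.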
• Figure 2.2 Plot of Sample Data Points and Sample Regression Functions
for Random Samples 1 and 2
SRF1 is the flatter regression line, SRF2 is the steeper regression line.
Important Points:
(1) Neither of these SRFs is identical to the true PRF. Each is merely an
approximation to the true PRF.
(2) How good an approximation any SRF provides to the true PRF depends on
how the SRF is constructed from sample data -- i.e., on the properties of the
coefficient estimators $\hat{\beta}_0$ and $\hat{\beta}_1$.
[Figure 2.2 not reproduced. Vertical axis: weekly consumption expenditure, $; horizontal axis: weekly income, $. Legend: Y1, Y2.]
where
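• Interpretation of the SRE: The SRE represents each sample value of Y -- each Yi
value -- as the sum of two components:
(1) the estimated (or predicted) value of Y for each sample value Xi of X, i.e.,
$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$;
(2) the residual corresponding to that sample value, i.e., $\hat{u}_i = Y_i - \hat{Y}_i$.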
Compare the Population and Sample Regression Equations: the PRE and SRE
$$\text{PRE:} \quad Y_i = f(X_i) + u_i = E(Y_i \mid X_i) + u_i = \beta_0 + \beta_1 X_i + u_i$$
$$\text{SRE:} \quad Y_i = \hat{Y}_i + \hat{u}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{u}_i$$
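The two decompositions can be compared numerically for a single sample. The sketch below uses the Sample 1 data and the PRF 17 + 0.60 X given earlier in these notes, and it assumes the SRF is fitted by ordinary least squares (an assumption, since the estimator is not specified here). For each observation it splits the same Yi two ways: into PRF value plus error ui, and into fitted value plus residual ûi.

```python
x = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]   # Sample 1


def prf(x_i):
    """Population regression function for the 60-family population."""
    return 17 + 0.60 * x_i


def ols(xs, ys):
    """Least-squares intercept and slope (assumed estimator)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    sxx = sum((a - x_bar) ** 2 for a in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


b0_hat, b1_hat = ols(x, y)

# PRE: Y_i = E(Y | X_i) + u_i      SRE: Y_i = Yhat_i + uhat_i
errors = [yi - prf(xi) for xi, yi in zip(x, y)]
fitted = [b0_hat + b1_hat * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

for xi, yi, u, uh in zip(x, y, errors, residuals):
    print(f"X = {xi}: Y = {yi}, error u = {u:+.2f}, residual uhat = {uh:+.2f}")
```

The errors are measured from the PRF, which is normally unknown; the residuals are measured from the SRF, which is computed from the sample. The two sets of deviations differ at every Xi because the SRF and the PRF are different lines.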
[Figure not reproduced. It marks a sample data point (Yi, Xi) together with the SRF and PRF lines; the vertical axis shows Yi, Ŷi, and E(Y | Xi), and the horizontal axis shows Xi.]
At X = Xi: