Two-Variable Regression, Interval Estimation and Hypothesis Testing
• The theory of estimation consists of two parts: point estimation and interval
estimation. We have discussed point estimation thoroughly in the previous
two chapters. In this chapter we first consider interval estimation and then
take up the topic of hypothesis testing, a topic related to interval estimation.
INTERVAL ESTIMATION: SOME BASIC IDEAS
• where the se (βˆ2) now refers to the estimated standard error. Therefore,
instead of using the normal distribution, we can use the t distribution to
establish a confidence interval for β2 as follows:
• Pr (−tα/2 ≤ t ≤ tα/2) = 1 − α (5.3.3)
• where tα/2 is the value of the t variable obtained from the t distribution at the α/2
level of significance with n − 2 df; it is often called the critical t value at the α/2
level of significance.
• Substituting (5.3.2) into (5.3.3) yields
• Pr [βˆ2 − tα/2 se(βˆ2) ≤ β2 ≤ βˆ2 + tα/2 se(βˆ2)] = 1 − α (5.3.4)
• which provides a 100(1 − α)% confidence interval for β2.
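As a numerical sketch of this interval, the following uses SciPy's t distribution together with the slope estimate, standard error, and degrees of freedom from the food expenditure regression reported later in this chapter:

```python
from scipy import stats

# Estimates from the food expenditure example (se and df as reported there)
beta2_hat = 0.4368   # estimated slope coefficient
se_beta2 = 0.0783    # estimated standard error of the slope
df = 53              # n - 2 degrees of freedom
alpha = 0.05         # for a 95% confidence interval

# Critical t value at the alpha/2 level of significance
t_crit = stats.t.ppf(1 - alpha / 2, df)

# 100(1 - alpha)% confidence interval for beta2, as in (5.3.4)
lower = beta2_hat - t_crit * se_beta2
upper = beta2_hat + t_crit * se_beta2
print(f"critical t = {t_crit:.4f}")
print(f"95% CI for beta2: ({lower:.4f}, {upper:.4f})")
```

Since the interval excludes zero, it agrees with the t test discussed later: the null hypothesis β2 = 0 would be rejected at the 5 percent level.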
• How “good” is the fitted model? We need some criteria with which to
answer this question.
• First, are the signs of the estimated coefficients in accordance with
theoretical or prior expectations? For example, in the income–consumption
model the slope coefficient should be positive.
• Second, if theory says that the relationship should be not only positive but
also statistically significant, is it? In our example it is: the p value of the
estimated t value is extremely small.
• Third, how well does the regression model explain variation in the
consumption expenditure? One can use r2 to answer this question, which in
our example is very high.
• There is one assumption that we would like to check, the normality of the
disturbance term, ui.
• Normality Tests
• Although several tests of normality are discussed in the literature, we will
consider just three:
• (1) histogram of residuals;
• (2) normal probability plot (NPP), a graphical device; and
• (3) the Jarque–Bera test.
• Histogram of Residuals.
• A histogram of residuals is a simple graphical device that is used to learn
something about the shape of the probability density function (PDF) of a
random variable.
• If you mentally superimpose the bell-shaped normal distribution curve on
the histogram, you will get some idea of whether a normal PDF
approximation may be appropriate.
• Normal Probability Plot.
• A comparatively simple graphical device is the normal probability plot
(NPP). If the variable is in fact from a normal population, the NPP will be
approximately a straight line. The NPP is shown in Figure 5.7. We see that the
residuals from our illustrative example are approximately normally
distributed, because a straight line seems to fit the data reasonably well.
• Jarque–Bera (JB) Test of Normality.
• The JB test of normality is an asymptotic, or large-sample, test. It is also
based on the OLS residuals. This test first computes the skewness and
kurtosis measures of the OLS residuals and uses the following test statistic:
• JB = n[S2 / 6 + (K − 3)2 / 24] (5.12.1)
• where n = sample size, S = skewness coefficient, and K = kurtosis coefficient.
For a normally distributed variable, S = 0 and K = 3. In that case the value
of the JB statistic is expected to be 0.
• Under the null hypothesis that the residuals are normally distributed, the JB
statistic asymptotically follows the chi-square distribution with 2 df.
• If the computed p value of the JB statistic in an application is sufficiently
low, which will happen if the value of the statistic is very different from 0,
one can reject the hypothesis that the residuals are normally distributed. But if
the p value is reasonably high, which will happen if the value of the statistic
is close to zero, we do not reject the normality assumption.
• The sample size in our consumption–income example is rather small. If we
mechanically apply the JB formula to our example, the JB statistic turns
out to be 0.7769. The p value of obtaining such a value from the chi-square
distribution with 2 df is about 0.68, which is quite high. In other words, we
may not reject the normality assumption for our example. Of course, bear
in mind the warning about the sample size.
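As a sketch, the JB statistic in (5.12.1) can be computed directly from the OLS residuals; the function name `jarque_bera` below is our own, and SciPy supplies the chi-square tail probability:

```python
import numpy as np
from scipy import stats

def jarque_bera(residuals):
    """Compute the JB statistic (5.12.1) and its chi-square (2 df) p value."""
    e = np.asarray(residuals, dtype=float)
    e = e - e.mean()                  # OLS residuals have mean zero anyway
    n = len(e)
    s2 = np.mean(e**2)
    S = np.mean(e**3) / s2**1.5       # skewness coefficient
    K = np.mean(e**4) / s2**2         # kurtosis coefficient
    jb = n * (S**2 / 6 + (K - 3)**2 / 24)
    p_value = stats.chi2.sf(jb, df=2)
    return jb, p_value

# For the chapter's reported JB value of 0.7769, the p value is about 0.68:
p = stats.chi2.sf(0.7769, df=2)
print(f"p value = {p:.2f}")  # prints: p value = 0.68
```

A high p value such as this one means the hypothesis of normally distributed residuals is not rejected, subject to the small-sample caveat noted above.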
A CONCLUDING EXAMPLE
• Let us return to Example 3.2 about food expenditure in India. Using the
data given in (3.7.2) and adopting the format of (5.11.1), we obtain the
following expenditure equation:
• FoodExpˆi = 94.2087 + 0.4368 TotalExpi
•   se = (50.8563) (0.0783)
•   t  = (1.8524)  (5.5770)
•   p  = (0.0695)  (0.0000)*
• r2 = 0.3698; df = 53
• F1,53 = 31.1034 (p value = 0.0000)*
• As expected, there is a positive relationship between expenditure on food
and total expenditure. If total expenditure went up by a rupee, on average,
expenditure on food increased by about 44 paise.
• If total expenditure were zero, the average expenditure on food would be
about 94 rupees.
• The r2 value of about 0.37 means that 37 percent of the variation in food
expenditure is explained by total expenditure, a proxy for income.
• Suppose we want to test the null hypothesis that there is no relationship
between food expenditure and total expenditure, that is, the true slope
coefficient β2 = 0.
• The estimated value of β2 is 0.4368. If the null hypothesis were true, what is
the probability of obtaining a value of 0.4368? Under the null hypothesis,
we observe from (5.12.2) that the t value is 5.5770 and the p value of
obtaining such a t value is practically zero. In other words, we can reject the
null hypothesis. But suppose the null hypothesis were that β2 = 0.5. Now
what? Using the t test we obtain:
• t = (0.4368 − 0.5) / 0.0783 = −0.8071
• The probability of obtaining a |t | of 0.8071 or greater exceeds 20 percent.
Hence we do not reject the hypothesis that the true β2 is 0.5.
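This t test under H0: β2 = 0.5 can be sketched with SciPy, using the estimates reported above:

```python
from scipy import stats

beta2_hat = 0.4368   # estimated slope
se_beta2 = 0.0783    # its standard error
df = 53              # n - 2 degrees of freedom

# t statistic under the null hypothesis beta2 = 0.5
t_stat = (beta2_hat - 0.5) / se_beta2

# two-sided p value: probability of a |t| this large or larger
p_value = 2 * stats.t.sf(abs(t_stat), df)
print(f"t = {t_stat:.4f}, p = {p_value:.4f}")
```

The p value is well above any conventional significance level, so the null hypothesis β2 = 0.5 is not rejected, in agreement with the text.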
• Notice that, under the null hypothesis that the true slope coefficient is zero,
the F value is 31.1034. Under the same null hypothesis, we obtained a t
value of 5.5770. If we square this value, we obtain 31.1029, which is about
the same as the F value, again showing the close relationship between the t
and the F statistics.
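This relationship, t² = F for a two-variable regression, can be checked numerically with SciPy's t and F distributions; the two tests also return the same p value:

```python
from scipy import stats

t_val = 5.5770
f_val = t_val**2                        # 31.1029, essentially the reported F

# Two-sided t test with 53 df and F test with (1, 53) df give the same p value,
# because a squared t variable with n df is an F variable with (1, n) df.
p_t = 2 * stats.t.sf(t_val, 53)
p_f = stats.f.sf(f_val, 1, 53)
print(f"t^2 = {f_val:.4f}")
print(f"p (t test) = {p_t:.8f}, p (F test) = {p_f:.8f}")
```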
• Using the estimated residuals from the regression, what can we say about
the probability distribution of the error term? The information is given in
Figure 5.8. As the figure shows, the residuals from the food expenditure
regression seem to be symmetrically distributed.
• Application of the Jarque–Bera test shows that the JB statistic is about
0.2576, and the probability of obtaining such a statistic under the normality
assumption is about 88 percent. Therefore, we do not reject the hypothesis
that the error terms are normally distributed. But keep in mind that the
sample size of 55 observations may not be large enough.