Lecture 7 - BYU Department of Economics

Econ 388 R. Butler 2014 revisions Lecture 7
I. One more assumption about the error (Part III, normality). It is
Assumption 5: the errors are normally distributed.
The regression model can be conveniently summarized as follows (using matrix algebra):
Y = Xβ + u
u ~ N(0, σ²I)
where the last notation stands for "u is normally distributed with mean vector 0 and
variance-covariance matrix σ²I". Why do we make this final assumption?
There is a very useful rule that states the following (Wooldridge, appendix B):
The sum of independent, identically distributed normal random variables is
also a normal random variable.
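This rule is easy to check informally by simulation. The sketch below uses only Python's standard library and hypothetical parameters: summing two independent N(1, 2²) draws should produce something close to N(2, √8²).

```python
import random
import statistics

# Monte Carlo sketch: the sum of two i.i.d. N(1, 2^2) draws
# should itself be (approximately) N(2, sqrt(8)^2).
random.seed(42)
sums = [random.gauss(1, 2) + random.gauss(1, 2) for _ in range(50_000)]

print(statistics.mean(sums))   # close to 1 + 1 = 2
print(statistics.stdev(sums))  # close to sqrt(2^2 + 2^2) = sqrt(8), about 2.83
```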
This rule leads to the genealogy of normality in regression:
1. It starts with the error term: ui.
2. Then, by the useful summation rule, normality spreads to Yi because of the
regression specification Yi = β0 + β1X1i + ui, even though we condition on the values of
X.
3. And because Yi is normal, the estimated slope coefficients will also be
normally distributed:
β̂1 = Σi=1..n (Xi − X̄)(Yi − Ȳ) / Σi=1..n (Xi − X̄)²
   = [Σi (Xi − X̄)Yi − Ȳ·Σi (Xi − X̄)] / Σi (Xi − X̄)²
   = Σi (Xi − X̄)Yi / Σi (Xi − X̄)²   (since Σi (Xi − X̄) = 0),
or, since the estimated slope coefficient, β̂1, is a linear sum of the normal random variables Yi,
β̂1 = Σi ai Yi ,  where  ai = (Xi − X̄) / Σj (Xj − X̄)² ,
then ˆ1 will also be normally distributed (we show this here for the case of simple
regression; but from the matrix specification above, it is also clearly the case in the multiple
regression model. In summary: normality of ui  normality of Yi  normality of ̂1
result: ˆ ~ N (  ,  2ˆ )
k
k
k
We expect, because of natural sampling variability, that β̂k will vary from
sample to sample. So if we want to test a theory about βk, then we need to take into
account this natural sampling variability, and ask what the distribution of β̂k would look
like if the null hypothesis were true. Then we ask what the probability is that we
achieved a given sample outcome, assuming the null hypothesis is true. Our testing
procedure does NOT test whether the null hypothesis is true (such a test would be
"unconditional" in the sense that we would have to know something about the "objective"
state of the hypothesis, presumably because we could examine the whole population and
not just a sample). Our tests are based on how likely our sample result was IF the null
hypothesis were true (hence, it's a "conditional" test that asks: if the null hypothesis were
true, is this an outcome that we would expect to observe?).
II. 3 tests on individual slope coefficients (β̂k)
A. one sided: we think that the slope is negative (in the alternative hypothesis)
null: Ho: βk ≥ 0
alternative: Ha: βk < 0
what is the critical value, c, for this test?
B. one sided: we think that the slope is positive (in the alternative hypothesis)
null: Ho: βk ≤ 0
alternative: Ha: βk > 0
what is the critical value, c, for this test?
C. two sided: we think that the slope is not zero (the alternative hypothesis is that the
effect is important, but we are not sure whether it is positive or negative)
null: Ho: βk = 0
alternative: Ha: βk ≠ 0
what are the critical values, c, for this test?
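For large samples, these critical values can be read off the standard normal table. A small sketch using Python's standard library (the z approximation to the t-distribution, at the 5 percent significance level):

```python
from statistics import NormalDist

z = NormalDist()  # standard normal, a large-sample stand-in for the t-distribution

# Case A (Ha: beta_k < 0): reject Ho if the t-statistic < -c_one
# Case B (Ha: beta_k > 0): reject Ho if the t-statistic > +c_one
# Case C (Ha: beta_k != 0): reject Ho if |t-statistic| > c_two
c_one = z.inv_cdf(0.95)   # one-sided 5% critical value, about 1.645
c_two = z.inv_cdf(0.975)  # two-sided 5% critical value, about 1.96

print(round(c_one, 3), round(c_two, 2))
```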
D. type I and type II errors with tests on the value of slope coefficients
Suppose: Ho: βj = 1 vs. Ha: βj = 2, and we know (the standard error angel has
visited us) that S.E.(β̂j) = .3 in either case. 1) We employ the usual 95% confidence
interval against a type I error, constructing the cutoff value for accepting the null
hypothesis (Ho). So far, we have supposed Ho is true. 2) Next, to get the type II error,
assume the alternative hypothesis were true, and see – given the critical cutoff for the
type I error that we computed supposing that Ho were true – what the size of the type II
error would be.
1) Find the critical value under Ho:
Z-score95 = (Critical − Hypothesized value) / S.E.
or 1.645 = (Critical − 1) / .3, so Critical = 1.4935
2) Find the likelihood of a type II error (given Ha is true).
So given our critical cutoff (1.4935) computed from our test for type I error, the type II
error would be
Z = (1.4935 − 2.00) / .3 = −1.6883
From the standard normal table this is a probability of about .0454
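The two steps above can be reproduced with Python's standard library (same numbers: Ho: βj = 1, Ha: βj = 2, S.E. = .3; the exact inverse CDF is used rather than the rounded 1.645, so results may differ from the table values in the last decimal):

```python
from statistics import NormalDist

z = NormalDist()
se, null_value, alt_value = 0.3, 1.0, 2.0

# Step 1: one-sided 5% critical cutoff, computed under Ho (beta_j = 1)
critical = null_value + z.inv_cdf(0.95) * se   # about 1.4935

# Step 2: type II error = P(estimate falls below the cutoff | Ha true, beta_j = 2)
type_II = z.cdf((critical - alt_value) / se)   # about .046

print(round(critical, 4), round(type_II, 4))
```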
E. Zero is not sacred: we can generalize tests about estimated coefficients to
"cutoff" values other than 0.
example: null: Ho: β1 ≥ 1
alternative: Ha: β1 < 1 for the wealth/income coefficient
in a consumption regression: Consumption = β0 + β1 Permanent Income + u
III. t-tests: some examples
A. the empirical rule (an 'approximation' for normal and t-distributions). For bell-shaped
(mound-shaped or normal) distributions, knowing the mean and standard deviation tells
you quite a bit about the distribution. Other common shapes are skewed-to-the-right and
skewed-to-the-left distributions. Occasionally, when living near campus, you also see
bimodal age distributions in the wards (the newly wed and nearly dead wards).
The empirical rule (appendix C on the standard normal distribution "rule of thumb") for
mound-shaped (normal or bell-shaped) distributions:
1. Approx. 68% of the observations will be within 1 standard deviation of the mean.
2. Approx. 95% of the observations will be within 2 standard deviations of the mean.
3. Approx. 99.7% of the observations will be within 3 standard deviations of the mean.
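The empirical rule is easy to verify by simulation. A sketch using only Python's standard library:

```python
import random

random.seed(0)
draws = [random.gauss(0, 1) for _ in range(100_000)]

def share_within(k):
    """Fraction of draws within k standard deviations of the mean (0 here)."""
    return sum(abs(x) <= k for x in draws) / len(draws)

print(share_within(1))  # about .68
print(share_within(2))  # about .95
print(share_within(3))  # about .997
```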
The ESSENCE OF MOST STATISTICAL TESTING is to find out whether the
difference between a statistic and its hypothesized value is smaller than two
standard deviations—if so, it's not statistically significant (that is, you can't reject the
null hypothesis). An equivalent way of expressing this is to say that the probability
(significance level) is greater than .05 (using the 5 percent level as the cutoff value in
evaluating significance). That is, roughly,
|statistic − hypothesized value| / (standard deviation) < 2
when a statistic is not significant at the 5 percent level.
Suppose our statistic of interest is the ith regression coefficient, βi, and we hypothesize
that its associated variable (xi) has no effect on the outcome, which is to say, that the null
hypothesis is βi = 0. Letting β̂i be the statistic (the estimated value of βi), and letting
the standard deviation of β̂i be denoted σβ̂i (the standard deviation of an estimated
coefficient is called the "standard error"), the test for significance under the null
hypothesis is given as
(β̂i − 0) / σβ̂i .
If β̂i is more than two standard deviations away from zero, we reject the null hypothesis
(in this case, we would reject the hypothesis that the coefficient is zero—i.e., we reject
the implicit idea in this null hypothesis that xi can be dropped from the right-hand-side
regressors).
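The two-standard-deviation rule can be sketched with hypothetical numbers (the estimate and standard error below are invented for illustration, not taken from any regression in these notes):

```python
# Hypothetical estimate and standard error from some regression output
beta_hat = 0.25
std_error = 0.10

t_stat = (beta_hat - 0) / std_error  # test statistic for Ho: beta_i = 0
reject_null = abs(t_stat) > 2        # rough two-standard-deviation rule

print(t_stat, reject_null)
```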
****MATH MOMENT**********************************************
When we actually go to test a hypothesis, we don't know σ (the true standard deviation),
and we have to estimate its value by s (the standard deviation we estimate from our
sample). But estimating the standard deviation (instead of simply knowing it) introduces
some additional uncertainty about the size of the "empirical rule" intervals. The t-distribution
captures this additional uncertainty: it is bell shaped like the standard normal distribution
with its mean value at zero, but it has thicker tails than the standard normal. So, for
example, for a small sample of n=20 (where the degrees of freedom = 19), the 95 percent
confidence interval would be plus or minus 2.093 (see the t-distribution table in Wooldridge)
rather than 1.96 for the normal (which is approximately 2), where the variance is known
and doesn't have to be estimated.
As n gets large, the t-distribution approaches the normal distribution in shape (see the
distributions in the back of the Wooldridge book). For large n (30 or more), the computer
will still compute a confidence interval based on the t-distribution, but it will be very
close to the standard normal, or z-score, distribution. You can think of them as roughly
equivalent: the t-distribution is slightly "fatter" than the normal distribution since it
accounts for the estimation of the standard deviation.
Often, we will ignore this difference in the course, since our sample sizes make the
Z-score (standard normal) approximation good enough.
B. one sided, MEAP93.raw data (chapter 4 in Wooldridge)
Case A-type example: the impact of school size on student performance
IV. p-values: the likelihood of getting the sample outcome you obtained IF the null
hypothesis were true. (See the glossary in Wooldridge for an alternative definition.)
That is what is reported in STATA and SAS (that is, the null hypothesis that these
programs automatically test is that the respective coefficient βi = 0, assuming a
two-tailed test).
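A sketch of the large-sample (z-approximation) version of the two-tailed p-value that such programs report, using a hypothetical t-statistic:

```python
from statistics import NormalDist

z = NormalDist()
t_stat = 2.1  # hypothetical t-statistic from a regression printout

# Two-tailed p-value, z approximation: P(|Z| > |t_stat|) under Ho: beta_i = 0
p_value = 2 * (1 - z.cdf(abs(t_stat)))

print(round(p_value, 4))  # about .036
```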
V. practical significance does not equal statistical significance
VI. Confidence intervals (random intervals constructed around β̂k to capture the true βk 95
percent of the time—or 90 percent, or 99 percent, depending on the application)
A. Crude intervals from the empirical rule (an OK guide in many cases, particularly with
large samples—for small samples, we probably need to construct the confidence interval
using the t-table):
1. 68% within one standard deviation
2. 95% within two standard deviations
3. 99.7% within three standard deviations
More precise confidence intervals, for small samples and for obsessive people, use the
t-tables.
95% confidence interval for βk = β̂k ± c·se(β̂k), where c is the value corresponding to the
97.5th percentile of the t-distribution (p. 849, the "1-tailed, .025 significance level"
column).
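A sketch with hypothetical numbers (β̂k = .52 and se = .12 are invented for illustration; c = 2.093 is the t-table value for 19 degrees of freedom mentioned earlier):

```python
# Hypothetical regression output: slope estimate and its standard error
beta_hat = 0.52
se = 0.12
c = 2.093  # 97.5th percentile of the t-distribution with 19 degrees of freedom

lower = beta_hat - c * se
upper = beta_hat + c * se
print(round(lower, 4), round(upper, 4))  # the 95% confidence interval
```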
VII. TESTING A SINGLE LINEAR COMBINATION OF THE PARAMETERS:
Utah data: testing whether an additional year of schooling in high school and an
additional year in college have an equivalent effect on the log of a worker's wage
(following the example in Wooldridge, chapter four, part one). Recall the example in
"C. Transformations," lecture 4.
ln wage = β0 + β1 H.S. + β2 Coll + u
let θ = β1 − β2
so Ho: θ = 0
(recall β1 = θ + β2)
ln wage = β0 + (θ + β2) H.S. + β2 Coll + u
        = β0 + θ H.S. + β2 (H.S. + Coll) + u
So a t-test of whether the coefficient on H.S. (that is, θ) equals zero is a test of
Ho: β1 = β2; if we fail to reject θ = 0, we fail to reject β1 = β2.
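The reparameterization can be sanity-checked numerically: with any coefficient values, the original and rewritten right-hand sides agree once θ = β1 − β2 (the numbers below are hypothetical, not estimates from the Utah data):

```python
# Hypothetical coefficients (not estimates from the Utah data)
b0, b1, b2 = 0.30, 0.08, 0.10
theta = b1 - b2  # the single linear combination being tested

for hs, coll in [(12, 0), (12, 2), (10, 4)]:
    original = b0 + b1 * hs + b2 * coll
    rewritten = b0 + theta * hs + b2 * (hs + coll)
    # The two parameterizations give identical fitted values
    assert abs(original - rewritten) < 1e-12

print("parameterizations agree")
```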
VIII. TESTING MULTIPLE LINEAR COMBINATIONS OF PARAMETERS: F-tests or
Chow tests
A. testing subsets of regressors within a regime
B. testing parameters between regimes