Econ 399 Chapter 4c

4.3 Confidence Intervals
-Using our CLM assumptions, we can construct
CONFIDENCE INTERVALS or CONFIDENCE
INTERVAL ESTIMATES of the form:
CI = β̂j ± t*·se(β̂j)
-Given a significance level α (which is used to
determine t*), we construct 100(1- α)%
confidence intervals
-Given random samples, 100(1- α)% of our
confidence intervals contain the true value βj
-we don’t know whether an individual confidence
interval contains the true value
4.3 Confidence Intervals
-Confidence intervals are similar to 2-tailed tests
in that α/2 is in each tail when finding t*
-if our hypothesis test and confidence interval
use the same α:
1) we cannot reject the null hypothesis (at the
given significance level) that βj = aj if aj is
within the confidence interval
2) we can reject the null hypothesis (at the given
significance level) that βj = aj if aj is not within
the confidence interval
4.3 Confidence Example
-Going back to our Pepsi example, we now look
at geekiness:
Ĉool = 4.3 + 0.3 Geek + 0.5 Pepsi
       (2.1)  (0.25)     (0.21)
R² = 0.62    N = 43
-From before our 2-sided t* with α=0.01 was
t*=2.704, therefore our 99% CI is:
CI = β̂j ± t*·se(β̂j)
CI = 0.3 ± 2.704(0.25)
CI = [-0.376, 0.976]
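-A minimal Python sketch of this calculation (assuming SciPy is available); the coefficient, standard error, and degrees of freedom are taken from the example above:
```python
# Minimal sketch: 99% confidence interval for the Geek coefficient.
# Values (estimate, se, n, k) come from the slide's example regression.
from scipy import stats

b_hat, se = 0.3, 0.25      # estimated coefficient and its standard error
n, k = 43, 2               # sample size and number of slope coefficients
df = n - k - 1             # degrees of freedom = 40

alpha = 0.01
t_star = stats.t.ppf(1 - alpha / 2, df)   # two-tailed critical value, about 2.704

ci = (b_hat - t_star * se, b_hat + t_star * se)
print(ci)   # approximately (-0.376, 0.976)
```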
4.3 Confidence Intervals
-Remember that a CI is only as good as the 6
CLM assumptions:
1) Omitted variables cause the estimates (the β̂j's)
to be unreliable
-CI is not valid
2) If heteroskedasticity is present, standard error
is not a valid estimate of standard deviation
-CI is not valid
3) If normality fails, CI MAY not be valid if our
sample size is too small
4.4 Complicated Single Tests
-In this section we will see how to test a single
hypothesis involving more than one βj
-Take again our coolness regression:
Ĉool = 4.3 + 0.3 Geek + 0.5 Pepsi
       (2.1)  (0.25)     (0.21)
R² = 0.62    N = 43
-If we wonder if geekiness has more impact on
coolness than Pepsi consumption:
H0: β1 = β2
Ha: β1 > β2
4.4 Complicated Single Tests
-This test is similar to our one coefficient tests,
but our standard error will be different
-We can rewrite our hypotheses for clarity:
H0: β1 − β2 = 0
Ha: β1 − β2 > 0
-We can reject the null hypothesis if the
estimated difference between β̂1 and β̂2
is positive enough
4.4 Complicated Single Tests
-Our new t statistic becomes:
t = (β̂1 − β̂2) / se(β̂1 − β̂2)
-And our test continues as before:
1) Calculate t
2) Pick α and calculate t*
3) Reject H0 if t > t*
4.4 Complicated Standard Errors
-The standard error in this test is more
complicated than before
-If we simply subtract standard errors, we
may end up with a negative value
-this is theoretically impossible
-se must always be positive since it
estimates standard deviations
4.4 Complicated Standard Errors
-Using the properties of variances, we know that:
Var(β̂1 − β̂2) = Var(β̂1) + Var(β̂2) − 2Cov(β̂1, β̂2)
-Where the variances are always added and the
covariance always subtracted
-transferring to standard deviation, this becomes:
se(β̂1 − β̂2) = √( [se(β̂1)]² + [se(β̂2)]² − 2s12 )
-Where s12 is an estimate of the covariance
between coefficients
-s12 can either be calculated using matrix algebra
or be supplied by econometrics programs
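-As a rough sketch of how a program supplies this, the snippet below fits the model on simulated data (the variable names cool, geek, and pepsi are only illustrative) and builds se(β̂1 − β̂2) from the estimated covariance matrix:
```python
# Sketch: se(beta1_hat - beta2_hat) from the estimated covariance matrix.
# The data below are simulated purely for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 43
df = pd.DataFrame({"geek": rng.normal(size=n), "pepsi": rng.normal(size=n)})
df["cool"] = 4.3 + 0.3 * df["geek"] + 0.5 * df["pepsi"] + rng.normal(size=n)

results = smf.ols("cool ~ geek + pepsi", data=df).fit()

V = results.cov_params()                   # variance-covariance matrix of the estimates
var_diff = (V.loc["geek", "geek"] + V.loc["pepsi", "pepsi"]
            - 2 * V.loc["geek", "pepsi"])  # Var(b1) + Var(b2) - 2*Cov(b1, b2)
se_diff = np.sqrt(var_diff)
t_stat = (results.params["geek"] - results.params["pepsi"]) / se_diff

# Most packages can also report this test directly:
print(results.t_test("geek - pepsi = 0"))
```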
4.4 Complicated Standard Errors
-To see how to find this standard error, take our
typical regression:
y = β0 + β1x1 + β2x2 + β3x3 + u
-and consider the related equation where
θ = β1 − β2, or β1 = θ + β2:
y = β0 + (θ + β2)x1 + β2x2 + β3x3 + u
y = β0 + θx1 + β2(x1 + x2) + β3x3 + u
-where x1 and x2 could be related concepts (e.g.,
sleep time and naps) and x3 could be relatively
unrelated (e.g., study time)
4.4 Complicated Standard Errors
-By running this new regression, we can find the
standard error for our hypothesis test
-using an econometric program is easier
-Empirically:
1) β̂0 and se(β̂0) are the same for both
regressions
2) β̂2 and β̂3 are the same for both regressions
3) only the coefficient on x1 changes: it now
estimates θ = β1 − β2, so its reported standard
error is exactly se(β̂1 − β̂2)
-given this new standard error, t tests and CIs are
created as normal (a short code sketch of this
trick follows below)
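-A short sketch of the substitution trick on simulated data (all names and numbers are illustrative); the coefficient on x1 in the reparameterized regression is θ̂ = β̂1 − β̂2 and its reported standard error is the one we need:
```python
# Sketch of the substitution trick: regress y on x1 and (x1 + x2), so the
# coefficient on x1 is theta = beta1 - beta2 and its se is se(b1_hat - b2_hat).
# Simulated data; variable names are illustrative only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({"x1": rng.normal(size=n),
                   "x2": rng.normal(size=n),
                   "x3": rng.normal(size=n)})
df["y"] = 1.0 + 0.8 * df["x1"] + 0.5 * df["x2"] + 0.3 * df["x3"] + rng.normal(size=n)

df["x1_plus_x2"] = df["x1"] + df["x2"]
theta_reg = smf.ols("y ~ x1 + x1_plus_x2 + x3", data=df).fit()

print(theta_reg.params["x1"])   # theta_hat = b1_hat - b2_hat
print(theta_reg.bse["x1"])      # se(b1_hat - b2_hat), usable for t tests and CIs
```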
4.5 Testing Multiple Restrictions
-Thus far we have tested whether a SINGLE
variable is significant, or how two different
variables' impacts compare
-In this section we will test whether a SET of
variables is jointly significant, i.e., whether it
has a partial effect on the dependent variable
-Even though a group of variables may be
individually insignificant, they may be
significant as a group due to multicollinearity
4.5 Testing Multiple Restrictions
-Consider our general true model and an example
measuring reading week utility (rwu):
y = β0 + β1x1 + β2x2 + β3x3 + u
rwu = β0 + β1 ski + β2 trips + β3 homework + u
-we want to test the hypothesis that β1 and β2
equal zero at the same time, that x1 and x2
have no partial effect simultaneously:
H0: β1 = 0, β2 = 0
-in our example, we are testing that positive
activities have no effect on r.w. utility
4.5 Testing Multiple Restrictions
-our null hypothesis had two EXCLUSION
RESTRICTIONS
-this set of MULTIPLE RESTRICTIONS is tested
using a MULTIPLE HYPOTHESIS TEST or JOINT
HYPOTHESIS TEST
-the alternate hypothesis is unique:
Ha: H0 is not true
-note that we CANNOT use individual t tests to
test this multiple restriction; we need to test
the restriction jointly
4.5 Testing Multiple Restrictions
-to test joint significance, we need to use SSR
and R squared values obtained from two
different regressions
-we know that SSR increases and R2 decreases
when variables are dropped from the model
-in order to conduct our test, we need to regress
two models:
1) An UNRESTRICTED model with all of the
variables
2) A RESTRICTED MODEL that excludes the
variables in the test
4.5 Testing Multiple Restrictions
-Given a hypothesis test with q restrictions, we
have the following regressions:
H0: βk−q+1 = 0, ..., βk = 0   (4.35)
y = β0 + β1x1 + β2x2 + ... + βkxk + u   (4.34)
y = β0 + β1x1 + β2x2 + ... + βk−qxk−q + u   (4.36)
-Where 4.34 is the UNRESTRICTED MODEL giving
us SSRur and 4.36 is the RESTRICTED MODEL
giving us SSRr
4.5 Testing Multiple Restrictions
-These SSR values combine to give us our F
STATISTIC or TEST F STATISTIC:
F = [(SSRr − SSRur)/q] / [SSRur/(n − k − 1)]   (4.37)
-Where q is the number of restrictions in the null
hypothesis and q=numerator degrees of freedom
-n-k-1=denominator degrees of freedom (the
denominator is the unbiased estimator of σ2)
-since SSRr≥SSRur, F is always positive
4.5 Testing Multiple Restrictions
-One can think of our test F stat as measuring
the relative increase in SSR when moving from
the unrestricted model to the restricted one
-a large F indicates that the excluded variables
have much explanatory power
-using Ho and our CLM assumptions, we know
that F has an F distribution with q, n-k-1
degrees of freedom: F~Fq, n-k-1
-we obtain F* from F tables and reject Ho if:
F > F*
4.5 Multiple Example
-Given our previous example of reading week
utility, a restricted and unrestricted model give
us:
Unrestricted: r̂wu = 15.9 + 2.0 ski + 3.0 trips + 0.5 homework
                    (4.3)   (0.9)     (1.3)       (0.12)
              N = 572    SSRur = 141
Restricted:   r̂wu = 17.6 + 0.6 homework
                    (6.3)   (0.17)
              N = 572    SSRr = 175
-Which correspond to the hypotheses:
H0: β1 = 0, β2 = 0
Ha: H0 is not true
4.5 Multiple Example
-We use these SSR to construct a test statistic:
F = [(SSRr − SSRur)/q] / [SSRur/(n − k − 1)]
F = [(175 − 141)/2] / [141/(572 − 3 − 1)] ≈ 68.5
-given α=0.05, F*(2, 568) ≈ 3.00
-since F>F*, reject H0 at a 95% confidence level;
positive activities have an impact on reading
week utility
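-A quick sketch of the same calculation (assuming SciPy), using the SSR values from the example:
```python
# Sketch of the joint F test, using the SSR values from the slide's example.
from scipy import stats

ssr_r, ssr_ur = 175.0, 141.0   # restricted and unrestricted SSRs
n, k, q = 572, 3, 2            # observations, slope coefficients, restrictions

F = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))   # roughly 68.5
F_star = stats.f.ppf(0.95, q, n - k - 1)              # critical value, about 3.01

print(F, F_star, F > F_star)   # F > F*: reject H0 at alpha = 0.05
```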
4.5 Multiple Notes
-Once the degrees of freedom in F’s denominator
reach about 120, the F distribution is no longer
sensitive to it
-hence the infinity entry in the F table
-if H0 is rejected, the variables in question are
JOINTLY (STATISTICALLY) SIGNIFICANT at the
given alpha level
-if H0 is not rejected the variables in question are
JOINTLY INSIGNIFICANT at the alpha level
-an F test can fail to reject even when some
individual t tests reject; due to multicollinearity,
the opposite pattern is also possible
4.5 F, t’s secret identity?
-the F statistic can also be used to test
significance of a single variable
-in this case, q=1
-it can be shown that F = t² in this case
-that is, t²(n−k−1) ~ F(1, n−k−1)
-this only applies to two-sided tests
-therefore the t statistic is more flexible since it
allows for one-sided tests
-the t statistic is always best suited for testing a
single hypothesis
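-A quick numerical check of this relationship (the degrees of freedom below are just an example):
```python
# Numerical check of F = t^2 for a single (two-sided) restriction.
from scipy import stats

df = 40                              # example denominator degrees of freedom
t_star = stats.t.ppf(0.975, df)      # two-sided 5% t critical value
f_star = stats.f.ppf(0.95, 1, df)    # 5% F critical value with (1, df) dof

print(t_star ** 2, f_star)           # the two values coincide
```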
4.5 F tests and abuse
-we have already seen where individually
insignificant variables may be jointly significant
due to multicollinearity
-a significant variable can also prove to be jointly
insignificant if grouped with enough
insignificant variables
-an insignificant variable can also prove to be
significant if grouped with significant variables
-therefore t tests are much better than F tests at
determining individual significance
4.5 R2 and F
-While SSR can be large, R2 is bounded, often
making it an easier way to calculate F:
F = [(R²ur − R²r)/q] / [(1 − R²ur)/(n − k − 1)]   (4.41)
-Which is also called the R-SQUARED FORM OF
THE F STATISTIC
-since R²ur ≥ R²r, F is still always positive
-this form is NOT valid for testing all linear
restrictions (as seen later)
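-A minimal sketch of the R-squared form; the R² values and sample size below are hypothetical, purely to illustrate the formula:
```python
# Sketch of the R-squared form of the F statistic (eq. 4.41).
# The R-squared values and sample size here are hypothetical.
r2_ur, r2_r = 0.62, 0.55   # unrestricted and restricted R-squared
n, k, q = 43, 2, 1         # observations, slope coefficients, restrictions

F = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / (n - k - 1))
print(F)   # compare against the F* critical value with (q, n-k-1) df
```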
4.5 F and p-values
-similar to t-tests, F tests can produce p-values
which are defined as:
p-value = P(F(q, n−k−1) > F)   (4.43)
-the p-value is the “probability of observing a
value of F at least as large as we did, given that
the null hypothesis is true”
-a small p-value is therefore evidence against H0
-as before, reject H0 if p < α
-p-values can give us a more complete view of
significance
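-As a sketch (assuming SciPy), the p-value of the earlier reading-week F statistic could be computed as:
```python
# Sketch: p-value for an F test, P(F_{q, n-k-1} > observed F).
from scipy import stats

F, q, df_denom = 68.5, 2, 568         # values from the reading-week example
p_value = stats.f.sf(F, q, df_denom)  # survival function = upper-tail probability
print(p_value)                        # effectively zero, so reject H0 for any usual alpha
```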
4.5 Overall significance
-Often it is valid to test if the model is significant
overall
-the hypothesis that NONE of the explanatory
variables have an effect on y is given as:
H0: β1 = β2 = ... = βk = 0   (4.44)
-as before with multiple restrictions, we compare
against the restricted model:
y = β0 + u   (4.45)
4.5 Overall significance
-Since our restricted model has no independent
variables, its R2 is zero and our F formula
simplifies to:
F = [R²/k] / [(1 − R²)/(n − k − 1)]   (4.46)
-Which is only valid for this special test
-this test determines the OVERALL
SIGNIFICANCE OF THE REGRESSION
-if this test fails, we need to find other
explanatory variables
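-A small sketch on simulated data: this overall F statistic is what regression packages typically report by default (statsmodels exposes it as fvalue), and it matches the R-squared formula above:
```python
# Sketch: overall significance F statistic from R-squared (eq. 4.46),
# compared with the value a package reports. Simulated data for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 100
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
df["y"] = 1.0 + 0.5 * df["x1"] + 0.2 * df["x2"] + rng.normal(size=n)

res = smf.ols("y ~ x1 + x2", data=df).fit()
k = 2
F_manual = (res.rsquared / k) / ((1 - res.rsquared) / (n - k - 1))
print(F_manual, res.fvalue)   # the two agree; res.f_pvalue gives the p-value
```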
4.5 Testing General Linear Restrictions
-Sometimes economic theory (generally using
elasticity) requires us to test complicated joint
restrictions, such as:
H0: β1 = 0, β2 = 1, β3 = 2
-Which expects our model:
y = β0 + β1x1 + β2x2 + β3x3 + u
-To be of the form:
y = β0 + 0·x1 + 1·x2 + 2·x3 + u
4.5 Testing General Linear Restrictions
-We rewrite this expected model to obtain a
restricted model:
y − x2 − 2x3 = β0 + u
-We then calculate the F statistic using the SSR
formula
-note that since the dependent variable changes
between the two models, the R2 F formula is
not valid in this case
-note that the number of restrictions (q) is simply
equal to the number of equal signs (=) in the null hypothesis
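-A sketch of this procedure on simulated data (variable names and numbers are illustrative only): the restricted model regresses the transformed dependent variable on a constant, and the SSR form of F is then applied:
```python
# Sketch of an F test for the general restrictions H0: b1 = 0, b2 = 1, b3 = 2,
# using the SSR form of the F statistic. Data are simulated for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(2)
n = 300
df = pd.DataFrame({"x1": rng.normal(size=n),
                   "x2": rng.normal(size=n),
                   "x3": rng.normal(size=n)})
df["y"] = 2.0 + 0.0 * df["x1"] + 1.0 * df["x2"] + 2.0 * df["x3"] + rng.normal(size=n)

# Unrestricted model: y on x1, x2, x3
ur = smf.ols("y ~ x1 + x2 + x3", data=df).fit()

# Restricted model: impose the restrictions and regress (y - x2 - 2*x3) on a constant
df["y_tilde"] = df["y"] - df["x2"] - 2 * df["x3"]
r = smf.ols("y_tilde ~ 1", data=df).fit()

q, dof = 3, int(ur.df_resid)                 # 3 restrictions, n - k - 1
F = ((r.ssr - ur.ssr) / q) / (ur.ssr / dof)
print(F, stats.f.sf(F, q, dof))              # compare with ur.f_test("x1=0, x2=1, x3=2")
```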
4.6 Reporting Regression Results
-When reporting single regressions, the proper
reporting method is:
ln(Tâste)i = 3.7 + 0.2 ln(Time)i + 1.4 ln(Skill)i
             (0.9)  (0.15)         (0.78)
R² = 0.41    N = 143
-where R2, estimated coefficients, and N MUST
be reported (note also the ^ and i’s)
-either standard errors or t-values must also be
reported (se is more robust for tests other than
βk = 0)
-SSR and standard error of the regression can
also be reported
4.6 Reporting Regression Results
-When multiple, related regressions are run
(often to test for joint significance), the results
can be expressed in table format, as seen on the
next slide
-whether a simple or table reporting method is
done, the meanings and scaling of all the
included variables must always be explained in a
proper project
e.g., price: average price, measured weekly, in
American dollars
College: Dummy Variable. 0 if no college
education, 1 if college education
4.6 Reporting Regression Results
Dependent variable: Midterm readiness
Ind. variables      (1)        (2)
Study Time          0.47
                    (0.12)
Intellect           1.89       2.36
                    (1.7)      (1.4)
Intercept           2.5        2.8
                    (0.03)     (0.02)
Observations        33         33
R2                  0.48       0.34