Research Method
Lecture 3 (Ch4)
Inferences
Sampling distribution of the OLS estimators
We have learned that MLR.1 through MLR.4 guarantee that the OLS estimators are unbiased.
In addition, we have learned that, by adding MLR.5, we can estimate the variance of the OLS estimators.
However, in order to conduct hypothesis tests, we need to know the sampling distribution of the OLS estimators.
To do so, we introduce one more assumption.
Assumption MLR.6 (Normality)
(i) The population error u is independent of the explanatory variables x1, x2, ..., xk, and
(ii) $u \sim \mathrm{Normal}(0, \sigma^2)$.
Classical linear model assumptions
MLR.1 through MLR.6 are called the classical linear model (CLM) assumptions.
Note that MLR.6(i) automatically implies MLR.4 (provided E(u) = 0, which we always assume), but MLR.4 does not necessarily imply MLR.6(i). In this sense, MLR.4 is redundant under the CLM assumptions. However, to emphasize that we are making an additional assumption, MLR.1 through MLR.6 are called the CLM assumptions.
Theorem 4.1
Conditional on X, we have
(a) $\hat{\beta}_j \sim \mathrm{Normal}\big(\beta_j,\ \mathrm{Var}(\hat{\beta}_j)\big)$
and
(b) $(\hat{\beta}_j - \beta_j)/\mathrm{sd}(\hat{\beta}_j) \sim \mathrm{Normal}(0, 1)$
Proof: See the front board.
Hypothesis testing
Consider the following multiple linear regression:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$
Now, I present a well-known theorem.
Theorem 4.2: t distribution for the standardized estimators
Under MLR.1 through MLR.6 (the CLM assumptions), we have
$\frac{\hat{\beta}_j - \beta_j}{\mathrm{se}(\hat{\beta}_j)} \sim t_{\,n-k-1}$
This means that the standardized coefficient follows a t distribution with n - k - 1 degrees of freedom.
Proof: See the front board.
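As a supplement (not part of the original slides), here is a minimal Python sketch with an entirely hypothetical model and sample size. It simulates data satisfying MLR.1 through MLR.6 and checks that the standardized OLS estimator behaves like a t variable with n - k - 1 degrees of freedom:

# Minimal Monte Carlo sketch of Theorem 4.2 (hypothetical model and numbers).
# Under MLR.1-MLR.6, (beta1_hat - beta1) / se(beta1_hat) ~ t with n-k-1 df.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k = 30, 2                          # small n makes the t shape visible
beta = np.array([1.0, 0.5, -0.3])     # hypothetical true (intercept, beta1, beta2)

t_stats = []
for _ in range(5000):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
    y = X @ beta + rng.normal(size=n)          # MLR.6: normal, homoskedastic errors
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                      # OLS estimates
    resid = y - X @ b
    sigma2_hat = resid @ resid / (n - k - 1)   # unbiased variance estimator
    se_b1 = np.sqrt(sigma2_hat * XtX_inv[1, 1])
    t_stats.append((b[1] - beta[1]) / se_b1)

# Simulated 5% and 95% quantiles should be close to the t(n-k-1) quantiles.
print(np.quantile(t_stats, [0.05, 0.95]))
print(stats.t.ppf([0.05, 0.95], df=n - k - 1))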
One-sided test
A one-sided test has the following form.
The null hypothesis: H0: βj = 0
The alternative hypothesis: H1: βj > 0
Test procedure
1. Set the significance level α. Typically, it is set at 0.05.
2. Compute the t-statistic under H0, that is,
t-stat = $(\hat{\beta}_j - \beta_j)/\mathrm{se}(\hat{\beta}_j) = \hat{\beta}_j/\mathrm{se}(\hat{\beta}_j)$
(Note: under H0, βj = 0, so the statistic simplifies to the last expression.)
3. Find the cutoff value $t_{n-k-1,\alpha}$, the point that leaves probability α in the upper tail of the t distribution with n - k - 1 degrees of freedom. This cutoff is illustrated below.
[Figure: t distribution with n - k - 1 degrees of freedom; the area to the right of $t_{n-k-1,\alpha}$ is α, and the area to the left is 1 - α.]
4. Reject the null hypothesis if the t-statistic falls in the rejection region, as illustrated below.
Illustration of the rejection decision:
[Figure: t distribution with n - k - 1 degrees of freedom; the rejection region is the area to the right of $t_{n-k-1,\alpha}$, which has probability α; the remaining area is 1 - α.]
If the t-statistic falls in the rejection region, you reject the null hypothesis. Otherwise, you fail to reject the null hypothesis.
Note: if you want to test whether βj is negative, you use the following null and alternative hypotheses:
H0: βj = 0
H1: βj < 0
Then the rejection region is on the negative side; nothing else changes.
[Figure: t distribution with n - k - 1 degrees of freedom; the rejection region is the area to the left of $-t_{n-k-1,\alpha}$, which has probability α.]
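The procedure above can be sketched in a few lines of Python; the estimate, standard error, and degrees of freedom below are purely hypothetical, and scipy's t.ppf plays the role of the t table:

# One-sided t test sketch with hypothetical numbers (not from the lecture's data).
from scipy import stats

beta_hat, se, df, alpha = 0.21, 0.08, 120, 0.05   # hypothetical estimate, se, n-k-1, level

t_stat = beta_hat / se                 # Step 2: t statistic under H0: beta_j = 0
c = stats.t.ppf(1 - alpha, df)         # Step 3: cutoff t_{n-k-1, alpha}
print(t_stat > c)                      # Step 4: for H1: beta_j > 0, reject when t-stat > c
print(t_stat < -c)                     # for H1: beta_j < 0, reject when t-stat < -t_{n-k-1, alpha}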
Example
The following output shows the estimated log salary equation for 338 Japanese economists. (Estimation is done in STATA.) The estimated regression is
log(salary) = β0 + β1(female) + δ(other variables) + u
STATA output for the salary regression. (In the upper panel, the Model SS is the SSE, the Residual SS is the SSR, and the Total SS is the SST; the t column in the lower panel reports the t-statistics.)

      Source |         SS    df          MS         Number of obs =    338
       Model | 19.7186266    11  1.79260242         F( 11,   326) =  60.94
    Residual | 9.58915481   326  .029414585         Prob > F      = 0.0000
       Total | 29.3077814   337   .08696671         R-squared     = 0.6728
                                                    Adj R-squared = 0.6618
                                                    Root MSE      = .17151

      lsalary |      Coef.   Std. Err.       t   P>|t|    [95% Conf. Interval]
       female |  -.0725573    .0258508   -2.81   0.005    -.1234127    -.021702
     fullprof |   .3330248    .0505602    6.59   0.000     .2335594    .4324903
    assocprof |   .1665502    .0397755    4.19   0.000     .0883011    .2447994
   experience |   .0214346    .0042789    5.01   0.000     .0130168    .0298524
 experiencesq |  -.0003603    .0000925   -3.90   0.000    -.0005423   -.0001783
  evermarried |   .0847564     .027398    3.09   0.002     .0308573    .1386556
        kids6 |   .0051719    .0224497    0.23   0.818    -.0389927    .0493364
    phdabroad |   .0442625    .0310316    1.43   0.155     -.016785      .10531
     extgrant |   .0001081    .0000506    2.14   0.033     8.56e-06    .0002076
     privuniv |   .1675923    .0199125    8.42   0.000     .1284191    .2067654
     phdoffer |   .0751014    .0202832    3.70   0.000      .035199    .1150039
        _cons |   6.200925    .0412649  150.27   0.000     6.119746    6.282104
Q1. Test whether female salary is lower than male salary at the 5% significance level (i.e., α = 0.05). That is, test
H0: β1 = 0
H1: β1 < 0
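One way to carry out Q1 numerically (a sketch, using the female row of the STATA output above, with degrees of freedom 338 - 11 - 1 = 326):

# Q1: H0: beta1 = 0 vs H1: beta1 < 0 at the 5% level (left-tailed test).
from scipy import stats

beta_hat, se = -0.0725573, 0.0258508   # female row of the STATA output
n, k, alpha = 338, 11, 0.05

t_stat = beta_hat / se                 # about -2.81, as in the output
c = stats.t.ppf(alpha, n - k - 1)      # left-tail cutoff -t_{326,0.05}, about -1.65
print(t_stat, c, t_stat < c)           # True -> reject H0 in favor of beta1 < 0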
Two-sided test
A two-sided test has the following form.
The null hypothesis: H0: βj = 0
The alternative hypothesis: H1: βj ≠ 0
Test procedure
1. Set the significance level α. Typically, it is set at 0.05.
2. Compute the t-statistic under H0, that is,
t-stat = $(\hat{\beta}_j - \beta_j)/\mathrm{se}(\hat{\beta}_j) = \hat{\beta}_j/\mathrm{se}(\hat{\beta}_j)$
(Note: under H0, βj = 0, so the statistic simplifies to the last expression.)
3. Find the cutoff value $t_{n-k-1,\alpha/2}$. This cutoff is illustrated below.
[Figure: t distribution with n - k - 1 degrees of freedom; each tail beyond $\pm t_{n-k-1,\alpha/2}$ has probability α/2, and together the two tails form the rejection region; the central area is 1 - α.]
4. Reject the null hypothesis if the t-statistic falls in the rejection region.
When you reject the null hypothesis H0: βj = 0 against the two-sided alternative H1: βj ≠ 0, we say that the variable xj is statistically significant.
Exercise
Consider again the following regression:
log(salary) = β0 + β1(female) + δ(other variables) + u
This time, test whether the coefficient on female is equal to zero using a two-sided test at the 5% significance level. That is, test
H0: β1 = 0
H1: β1 ≠ 0
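A sketch of the two-sided version of the same calculation, again taking the female coefficient and standard error from the output:

# Exercise: H0: beta1 = 0 vs H1: beta1 != 0 at the 5% level (two-sided test).
from scipy import stats

beta_hat, se = -0.0725573, 0.0258508   # female row of the STATA output
n, k, alpha = 338, 11, 0.05

t_stat = beta_hat / se
c = stats.t.ppf(1 - alpha / 2, n - k - 1)   # two-sided cutoff t_{326,0.025}, about 1.97
print(abs(t_stat) > c)                      # True -> female is statistically significant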
(The STATA output for this regression is the same salary regression table shown earlier.)
The p-value
The p-value is the smallest significance level (α) at which the coefficient would be judged statistically significant, i.e., at which H0: βj = 0 would be rejected. STATA computes this value for you automatically. Take a look at the salary regression again.
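Here is a sketch of how the P>|t| entries can be reproduced by hand: for a two-sided test the p-value is 2·P(T_{n-k-1} > |t|). Using the female row gives approximately the 0.005 that STATA reports.

# Two-sided p-value for the female coefficient, using numbers from the output.
from scipy import stats

t_stat = -0.0725573 / 0.0258508        # about -2.81
df = 338 - 11 - 1                      # 326
p_value = 2 * stats.t.sf(abs(t_stat), df)
print(p_value)                         # about 0.005, matching the P>|t| column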
(Again, this is the same salary regression output as before; the p-values are reported in the P>|t| column.)
Other hypotheses about βj
You can test other hypotheses, such as βj = 1 or βj = -1. Consider the null hypothesis
H0: βj = a
Then all you have to do is compute the t-statistic as
t-stat = $(\hat{\beta}_j - a)/\mathrm{se}(\hat{\beta}_j)$
The rest of the test procedure is exactly the same.
Consider the following regression results (standard errors in parentheses):
log(crime) = -6.63 + 1.27 log(enroll)
             (1.03)  (0.11)
n = 97, R² = 0.585
Now, test whether the coefficient on log(enroll) is equal to 1 using a two-sided test at the 5% significance level.
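A sketch of this calculation with the reported numbers (here a = 1, and the degrees of freedom are 97 - 1 - 1 = 95):

# Test H0: beta1 = 1 vs H1: beta1 != 1 for the log(enroll) coefficient.
from scipy import stats

beta_hat, se, a = 1.27, 0.11, 1.0
n, k, alpha = 97, 1, 0.05

t_stat = (beta_hat - a) / se                 # about 2.45
c = stats.t.ppf(1 - alpha / 2, n - k - 1)    # about 1.99
print(t_stat, c, abs(t_stat) > c)            # True -> reject H0: beta1 = 1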
The F-test: testing general linear restrictions
You are often interested in more complicated hypothesis tests. First, I will show you some examples of such tests using the salary regression example.
Example 1: Modified salary equation.
log(salary) = β0 + β1(female) + β2(female)×(Exp>20) + δ(other variables) + u
where (Exp>20) is a dummy variable equal to one for those with more than 20 years of experience.
It is then easy to show that the gender salary gap among those with more than 20 years of experience is given by β1 + β2, so you want to test
H0: β1 + β2 = 0    H1: β1 + β2 ≠ 0
Example 2: More on the modified salary equation.
log(salary) = β0 + β1(female) + β2(female)×(Exp) + δ(other variables) + u
where Exp is years of experience.
Then, if you want to test whether there is a gender salary gap at 5 years of experience, you test
H0: β1 + 5β2 = 0
H1: β1 + 5β2 ≠ 0
Example 3: The price of houses.
log(price) = β0 + β1(assessed price) + β2(lot size) + β3(square footage) + β4(# bedrooms) + u
Then you may be interested in
H0: β1 = 1, β2 = 0, β3 = 0, β4 = 0
H1: H0 is not true.
Note that in this case there are 4 equations (restrictions) in H0.
The procedure for the F-test
Linear restrictions are tested using an F-test. The general procedure can be explained using the following example:
y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + u -------------- (1)
Suppose you want to test
H0: β1 = 1, β2 = β3, β4 = 0
H1: H0 is not true
First, plug the values of the coefficients hypothesized under H0 into equation (1). Then you get
y = β0 + 1·x1 + β2x2 + β2x3 + 0·x4 + u,
which can be rewritten as
(y - x1) = β0 + β2(x2 + x3) + u ---------------------- (2)
Equation (2) is called the restricted model; the original equation (1) is called the unrestricted model. In the restricted model, the dependent variable is (y - x1), and there is now only one explanatory variable, (x2 + x3).
Now I can describe the testing procedure.
Step 1: Estimate the unrestricted model (1) and compute its sum of squared residuals. Call this SSRur.
Step 2: Estimate the restricted model (2) and compute its sum of squared residuals. Call this SSRr.
Step 3: Compute the F statistic as
$F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)}$
where q is the number of equations (restrictions) in H0. Here q is the numerator degrees of freedom and (n - k - 1) is the denominator degrees of freedom.
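The steps above can be sketched with simulated, hypothetical data; the helper below estimates each model by OLS and returns its SSR, and the restricted model is the transformed equation (2):

# Sketch of Steps 1-3 with simulated, hypothetical data for model (1),
# testing H0: beta1 = 1, beta2 = beta3, beta4 = 0 (q = 3 restrictions).
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1, x2, x3, x4 = rng.normal(size=(4, n))
y = 2.0 + 1.0 * x1 + 0.5 * x2 + 0.5 * x3 + 0.0 * x4 + rng.normal(size=n)  # H0 holds here

def ssr(y, X):
    # OLS sum of squared residuals, with an intercept added to X
    X = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return e @ e

ssr_ur = ssr(y, np.column_stack([x1, x2, x3, x4]))     # unrestricted model (1)
ssr_r = ssr(y - x1, x2 + x3)                           # restricted model (2)

q, k = 3, 4
F = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))    # Step 3
print(ssr_ur, ssr_r, F)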
Step 4: It is known that the F statistic follows the F distribution with (q, n - k - 1) degrees of freedom. That is,
$F \sim F_{q,\,n-k-1}$
(q is the numerator degrees of freedom and n - k - 1 is the denominator degrees of freedom.)
Step 5: Set the significance level α. (Usually, it is set at 0.05.)
Step 6: Find the cutoff value c such that $P(F_{q,n-k-1} > c) = \alpha$. This is illustrated below.
Step 7: Reject H0 if the F statistic falls in the rejection region.
[Figure: density of $F_{q,n-k-1}$; the rejection region is the area to the right of the cutoff c, which has probability α; the remaining area is 1 - α.]
The cutoff values can be found in a table of critical values of the F distribution, shown below.
[Table of critical values of the F distribution, © 2009 South-Western/Cengage Learning.]
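Instead of reading c off the table, it can also be computed directly; a sketch, continuing the simulated example above (q = 3, n - k - 1 = 195):

# Cutoff c with P(F_{q, n-k-1} > c) = alpha, instead of looking it up in the table.
from scipy import stats

q, df2, alpha = 3, 195, 0.05          # degrees of freedom from the simulated example above
c = stats.f.ppf(1 - alpha, q, df2)
print(c)                              # reject H0 whenever the F statistic exceeds c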
Example
log(salary) = β0 + β1(female) + β2(female)×(Exp>20) + δ(other variables) + u ----- (1)
Now, let us test the following:
H0: β1 + β2 = 0
H1: β1 + β2 ≠ 0
Under H0, β2 = -β1, so the restricted model is
log(salary) = β0 + β1[(female) - (female)×(Exp>20)] + δ(other variables) + u ------------ (2)
The following output shows the estimated results for the unrestricted and restricted models.
The unrestricted model (the Residual SS is SSRur):

      Source |         SS    df          MS         Number of obs =    338
       Model | 19.7668781    12  1.64723984         F( 12,   325) =  56.11
    Residual | 9.54090327   325  .029356625         Prob > F      = 0.0000
       Total | 29.3077814   337   .08696671         R-squared     = 0.6745
                                                    Adj R-squared = 0.6624
                                                    Root MSE      = .17134

      lsalary |      Coef.   Std. Err.       t   P>|t|    [95% Conf. Interval]
       female |  -.0873223    .0282769   -3.09   0.002    -.1429511    -.0316935
 female_exp20 |   .0824519    .0643129    1.28   0.201    -.0440702     .2089739
     fullprof |   .3418992    .0509825    6.71   0.000     .2416019     .4421966
    assocprof |   .1713122    .0399095    4.29   0.000     .0927986     .2498259
   experience |   .0202163    .0043791    4.62   0.000     .0116014     .0288312
 experiencesq |  -.0003409    .0000936   -3.64   0.000    -.0005251    -.0001566
  evermarried |   .0888877      .02756    3.23   0.001     .0346692     .1431062
        kids6 |   .0051322    .0224276    0.23   0.819    -.0389894     .0492537
    phdabroad |   .0459212     .031028    1.48   0.140    -.0151199     .1069623
     extgrant |   .0000922     .000052    1.77   0.078    -.0000102     .0001946
     privuniv |   .1661488    .0199247    8.34   0.000     .1269512     .2053464
     phdoffer |   .0747895    .0202647    3.69   0.000     .0349231      .114656
        _cons |   6.205193    .0413584  150.03   0.000     6.123829     6.286557
The restricted model (the Residual SS is SSRr; the regressor f_minus_fe20 is the constructed variable (female) - (female)×(Exp>20)):

      Source |         SS    df          MS         Number of obs =    338
       Model | 19.7666765    11  1.79697059         F( 11,   326) =  61.40
    Residual | 9.54110486   326  .029267193         Prob > F      = 0.0000
       Total | 29.3077814   337   .08696671         R-squared     = 0.6745
                                                    Adj R-squared = 0.6635
                                                    Root MSE      = .17108

      lsalary |      Coef.   Std. Err.       t   P>|t|    [95% Conf. Interval]
 f_minus_fe20 |  -.0872393     .028216   -3.09   0.002    -.1427477    -.0317308
     fullprof |   .3424814    .0504192    6.79   0.000     .2432934     .4416694
    assocprof |   .1716157    .0396806    4.32   0.000     .0935533     .2496781
   experience |   .0201523    .0043037    4.68   0.000     .0116856     .0286189
 experiencesq |  -.0003397    .0000925   -3.67   0.000    -.0005217    -.0001577
  evermarried |   .0891255    .0273684    3.26   0.001     .0352845     .1429665
        kids6 |   .0051464    .0223927    0.23   0.818    -.0389061     .0491989
    phdabroad |   .0461026    .0309036    1.49   0.137    -.0146931     .1068982
     extgrant |    .000091      .00005    1.82   0.070    -7.40e-06     .0001894
     privuniv |   .1659818    .0197922    8.39   0.000     .1270452     .2049183
     phdoffer |   .0748104    .0202322    3.70   0.000     .0350082     .1146126
        _cons |   6.205144    .0412911  150.28   0.000     6.123913     6.286374
Since there is only one equation in H0, q = 1, and (n - k - 1) = (338 - 12 - 1) = 325.
F = [(9.54110486 - 9.54090327)/1]/[9.54090327/325] ≈ 0.0069
The 5% cutoff value for the F(1, 325) distribution is about 3.87. Since the F statistic does not fall in the rejection region, we fail to reject the null hypothesis. In other words, we did not find evidence of a gender salary gap among those with more than 20 years of experience.
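A sketch that reproduces this calculation from the two SSRs in the output, with the cutoff and p-value computed by scipy rather than read from a table:

# F test of H0: beta1 + beta2 = 0 from the reported SSRs (q = 1, n - k - 1 = 325).
from scipy import stats

ssr_r, ssr_ur = 9.54110486, 9.54090327
q, df2 = 1, 338 - 12 - 1

F = ((ssr_r - ssr_ur) / q) / (ssr_ur / df2)
c = stats.f.ppf(0.95, q, df2)          # about 3.87
p = stats.f.sf(F, q, df2)              # about 0.93
print(F, c, p)                         # F is far below c -> fail to reject H0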
In fact, STATA performs the F-test automatically. After estimating the unrestricted model (the same output shown above), type the test command:

. test female + female_exp20 = 0

 ( 1)  female + female_exp20 = 0

       F(  1,   325) =    0.01
            Prob > F =    0.9340

The reported F statistic (0.01) and its p-value (0.9340) match the manual calculation, so we again fail to reject H0.
F-test for a special case: the exclusion restrictions
Consider the following model:
y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + u ------- (1)
Often you would like to test whether a subset of the coefficients are all equal to zero. This type of restriction is called the "exclusion restrictions".
Suppose you want to test whether β2, β3, and β4 are jointly equal to zero. Then you test
H0: β2 = 0, β3 = 0, β4 = 0
H1: H0 is not true.
In this special type of F-test, the unrestricted and restricted equations look like
y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + u ------- (1)
y = β0 + β1x1 + u ------- (2)
In this special case, the F statistic also has the following representation:
$F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)} = \frac{(R^2_{ur} - R^2_r)/q}{(1 - R^2_{ur})/(n-k-1)}$
Proof: See the front board.
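A brief sketch of why the two expressions coincide (since the restricted and unrestricted models here have the same dependent variable, they share the same SST, and SSR = (1 - R²)·SST for each model):

$SSR_r - SSR_{ur} = \left[(1 - R^2_r) - (1 - R^2_{ur})\right] SST = (R^2_{ur} - R^2_r)\, SST, \qquad SSR_{ur} = (1 - R^2_{ur})\, SST.$

Substituting these into the SSR form of the F statistic and cancelling SST gives the R² form.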
When we reject this type of null hypothesis, we say that x2, x3, and x4 are jointly significant.
Example of a test of exclusion restrictions
Suppose you are estimating a salary equation for baseball players:
log(salary) = β0 + β1(years in league) + β2(average games played) + β3(batting average) + β4(homeruns) + β5(runs batted in) + u
Do batting average, homeruns, and runs batted in matter for salary after years in league and average games played are controlled for? To answer this question, you test
H0: β3 = 0, β4 = 0, β5 = 0
H1: H0 is not true.
Unrestricted model

Variable                Coefficient    Standard error
Years in league         0.0689***      0.0121
Average games played    0.0126***      0.0026
Batting average         0.00098        0.0011
Homeruns                0.0144         0.016
Runs batted in          0.0108         0.0072
Constant                11.19***       0.29

# obs = 353,  R-squared = 0.6278,  SSR = 183.186

As can be seen, batting average, homeruns, and runs batted in do not have statistically significant t-statistics at the 5% level.
Restricted model

Variable                Coefficient    Standard error
Years in league         0.0713***      0.0125
Average games played    0.0202***      0.0013
Constant                11.22***       0.11

# obs = 353,  R-squared = 0.5971,  SSR = 198.311

The F statistic is
F = [(198.311 - 183.186)/3]/[183.186/(353 - 5 - 1)] ≈ 9.55
The 5% cutoff value is about 2.60, so we reject the null hypothesis at the 5% significance level. This is a reminder that even if each coefficient is individually insignificant, the variables may be jointly significant.
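A sketch that reproduces the F statistic for this example from the reported SSRs and, equivalently, from the two R-squareds (q = 3, n - k - 1 = 347):

# Exclusion-restriction F test for the baseball example (q = 3, n - k - 1 = 347).
from scipy import stats

ssr_ur, ssr_r = 183.186, 198.311
r2_ur, r2_r = 0.6278, 0.5971
q, df2 = 3, 353 - 5 - 1

F_ssr = ((ssr_r - ssr_ur) / q) / (ssr_ur / df2)        # SSR form
F_r2 = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df2)      # R-squared form
c = stats.f.ppf(0.95, q, df2)                          # about 2.6
print(F_ssr, F_r2, c)                                  # both about 9.5, well above c -> reject H0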