Uploaded by Marcello Davico

Econometrics solutions

Chapter 1 Economic Questions and Data
1.1 Multiple Choice
1) Analyzing the behavior of unemployment rates across U.S. states in March of 2006 is an example of using
A) time series data.
B) panel data.
C) cross-sectional data.
D) experimental data.
Answer: C
2) Studying inflation in the United States from 1970 to 2006 is an example of using
A) randomized controlled experiments.
B) time series data.
C) panel data.
D) cross-sectional data.
Answer: B
3) Analyzing the effect of minimum wage changes on teenage employment across the 48 contiguous U.S. states
from 1980 to 2004 is an example of using
A) time series data.
B) panel data.
C) having a treatment group vs. a control group, since only teenagers receive minimum wages.
D) cross-sectional data.
Answer: B
4) Panel data
A) is also called longitudinal data.
B) is the same as time series data.
C) studies a group of people at a point in time.
D) typically uses control and treatment groups.
Answer: A
5) Econometrics can be defined as follows with the exception of
A) the science of testing economic theory.
B) fitting mathematical economic models to real-world data.
C) a set of tools used for forecasting future values of economic variables.
D) measuring the height of economists.
Answer: D
6) To provide quantitative answers to policy questions
A) it is typically sufficient to use common sense.
B) you should interview the policy makers involved.
C) you should examine empirical evidence.
D) is typically impossible since policy questions are not quantifiable.
Answer: C
7) An example of a randomized controlled experiment is when
A) households receive a tax rebate in one year but not the other.
B) one U.S. state increases minimum wages and an adjacent state does not, and employment differences are
observed.
C) random variables are controlled for by holding constant other factors.
D) some 5th graders in a specific elementary school are allowed to use computers at school while others are
not, and their end-of-year performance is compared holding constant other factors.
Answer: D
Stock/Watson 2e -- CVC2 8/23/06 -- Page 1
8) Ideal randomized controlled experiments in economics are
A) often performed in practice.
B) often used by the Federal Reserve to study the effects of monetary policy.
C) useful because they give a definition of a causal effect.
D) sometimes used by universities to determine who graduates in four years rather than five.
Answer: C
9) Most economic data are obtained
A) through randomized controlled experiments.
B) by calibration methods.
C) through textbook examples typically involving ten observation points.
D) by observing real-world behavior.
Answer: D
10) One of the primary advantages of using econometrics over typical results from economic theory is that
A) it potentially provides you with quantitative answers for a policy problem rather than simply suggesting
the direction (positive/negative) of the response.
B) it teaches you how to use statistical packages.
C) you learn how to invert a 4 by 4 matrix.
D) all of the above.
Answer: A
11) In a randomized controlled experiment
A) there is a control group and a treatment group.
B) you control for the effect that random numbers are not truly randomly generated
C) you control for random answers
D) the control group receives treatment on even days only.
Answer: A
12) Economists do not use experimental data more frequently for all of the following reasons, except that
real-world experiments
A) cannot be executed in economics.
B) with humans are difficult to administer.
C) are often unethical.
D) have flaws relative to ideal randomized controlled experiments.
Answer: A
13) The most frequently used experimental or observational data in econometrics are of the following type:
A) cross-sectional data.
B) randomly generated data.
C) time series data.
D) panel data.
Answer: A
14) In the graph below, the vertical axis represents average real GDP growth for 65 countries over the period
1960-1995, and the horizontal axis shows the average trade share within these countries.
This is an example of
A) cross-sectional data.
B) experimental data.
C) a time series.
D) longitudinal data.
Answer: A
15) The accompanying graph is an example of
A) cross-sectional data.
B) experimental data.
C) a time series.
D) longitudinal data.
Answer: A
16) The accompanying graph is an example of
A) experimental data.
B) cross-sectional data.
C) a time series.
D) longitudinal data.
Answer: C
1.2 Essays
1) Give at least three examples from economics where each of the following type of data can be used:
cross-sectional data, time series data, and panel data.
Answer: Answers will vary by student. At this level of economics, students most likely have heard of the
following uses of cross-sectional data: earnings functions, growth equations, the effect of class size
reduction on student performance (in this chapter), demand functions (in this chapter: cigarette
consumption); time series: the Phillips curve (in this chapter), consumption functions, Okun's law; panel
data: various U.S. state panel studies on road fatalities (in this book), unemployment rate and
unemployment benefits variations, growth regressions (across states and countries), and crime and
abortion (Freakonomics).
Chapter 2 Review of Probability
2.1 Multiple Choice
1) The probability of an outcome
A) is the number of times that the outcome occurs in the long run.
B) equals M × N, where M is the number of occurrences and N is the population size.
C) is the proportion of times that the outcome occurs in the long run.
D) equals the sample mean divided by the sample standard deviation.
Answer: C
2) The probability of an event A or B (Pr(A or B)) to occur equals
A) Pr(A) × Pr(B).
B) Pr(A) + Pr(B) if A and B are mutually exclusive.
C) Pr(A) / Pr(B).
D) Pr(A) + Pr(B) even if A and B are not mutually exclusive.
Answer: B
3) The cumulative probability distribution shows the probability
A) that a random variable is less than or equal to a particular value.
B) of two or more events occurring at once.
C) of all possible events occurring.
D) that a random variable takes on a particular value given that another event has happened.
Answer: A
4) The expected value of a discrete random variable
A) is the outcome that is most likely to occur.
B) can be found by determining the 50% value in the c.d.f.
C) equals the population median.
D) is computed as a weighted average of the possible outcome of that random variable, where the weights
are the probabilities of that outcome.
Answer: D
5) Let Y be a random variable. Then var(Y) equals
A) $\sqrt{E[(Y - \mu_Y)^2]}$.
B) $E[\,|Y - \mu_Y|\,]$.
C) $E[(Y - \mu_Y)^2]$.
D) $E[(Y - \mu_Y)]$.
Answer: C
6) The skewness of the distribution of a random variable Y is defined as follows:
A) $\dfrac{E[(Y^3 - \mu_Y)]}{\sigma_Y^2}$
B) $E[(Y - \mu_Y)^3]$
C) $\dfrac{E[(Y^3 - \mu_Y^3)]}{\sigma_Y^3}$
D) $\dfrac{E[(Y - \mu_Y)^3]}{\sigma_Y^3}$
Answer: D
7) The skewness is most likely positive for one of the following distributions:
A) The grade distribution at your college or university.
B) The U.S. income distribution.
C) SAT scores in English.
D) The height of 18 year old females in the U.S.
Answer: B
8) The kurtosis of a distribution is defined as follows:
A) $\dfrac{E[(Y - \mu_Y)^4]}{\sigma_Y^4}$
B) $\dfrac{E[(Y^4 - \mu_Y^4)]}{\sigma_Y^2}$
C) $\dfrac{\text{skewness}}{\mathrm{var}(Y)}$
D) $E[(Y - \mu_Y)^4]$
Answer: A
9) For a normal distribution, the skewness and kurtosis measures are as follows:
A) 1.96 and 4
B) 0 and 0
C) 0 and 3
D) 1 and 2
Answer: C
10) The conditional distribution of Y given X = x, Pr(Y = y | X = x), is
A) $\dfrac{\Pr(Y = y)}{\Pr(X = x)}$.
B) $\sum_{i=1}^{l} \Pr(X = x_i, Y = y)$.
C) $\dfrac{\Pr(X = x, Y = y)}{\Pr(Y = y)}$.
D) $\dfrac{\Pr(X = x, Y = y)}{\Pr(X = x)}$.
Answer: D
11) The conditional expectation of Y given X, E(Y | X = x), is calculated as follows:
A) $\sum_{i=1}^{k} y_i \Pr(X = x_i \mid Y = y)$
B) $E[E(Y \mid X)]$
C) $\sum_{i=1}^{k} y_i \Pr(Y = y_i \mid X = x)$
D) $\sum_{i=1}^{l} E(Y \mid X = x_i) \Pr(X = x_i)$
Answer: C
12) Two random variables X and Y are independently distributed if all of the following conditions hold, with the
exception of
A) Pr(Y = y | X = x) = Pr(Y = y).
B) knowing the value of one of the variables provides no information about the other.
C) if the conditional distribution of Y given X equals the marginal distribution of Y.
D) E(Y) = E[E(Y | X)].
Answer: D
13) The correlation between X and Y
A) cannot be negative since variances are always positive.
B) is the covariance squared.
C) can be calculated by dividing the covariance between X and Y by the product of the two standard
deviations.
D) is given by $\mathrm{corr}(X, Y) = \dfrac{\mathrm{cov}(X, Y)}{\mathrm{var}(X)\,\mathrm{var}(Y)}$.
Answer: C
14) Two variables are uncorrelated in all of the cases below, with the exception of
A) being independent.
B) having a zero covariance.
C) $|\sigma_{XY}| \le \sqrt{\sigma_X^2 \sigma_Y^2}$.
D) E(Y | X) = 0.
Answer: C
15) var(aX + bY) =
A) $a^2\sigma_X^2 + b^2\sigma_Y^2$.
B) $a^2\sigma_X^2 + 2ab\,\sigma_{XY} + b^2\sigma_Y^2$.
C) $\sigma_{XY} + \mu_X \mu_Y$.
D) $a\sigma_X^2 + b\sigma_Y^2$.
Answer: B
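The expression for var(aX + bY) follows from expanding the square inside the expectation. A short derivation, assuming the usual notation ($\mu$ for means, $\sigma_{XY}$ for the covariance of X and Y):

```latex
\begin{aligned}
\operatorname{var}(aX + bY)
  &= E\big[\big(a(X-\mu_X) + b(Y-\mu_Y)\big)^2\big] \\
  &= a^2 E[(X-\mu_X)^2] + 2ab\,E[(X-\mu_X)(Y-\mu_Y)] + b^2 E[(Y-\mu_Y)^2] \\
  &= a^2\sigma_X^2 + 2ab\,\sigma_{XY} + b^2\sigma_Y^2 .
\end{aligned}
```

When X and Y are uncorrelated the middle term vanishes, leaving only the two variance terms.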
16) To standardize a variable you
A) subtract its mean and divide by its standard deviation.
B) integrate the area below two points under the normal distribution.
C) add and subtract 1.96 times the standard deviation to the variable.
D) divide it by its standard deviation, as long as its mean is 1.
Answer: A
17) Assume that Y is normally distributed $N(\mu, \sigma^2)$. Moving from the mean ($\mu$) 1.96 standard deviations to the left
and 1.96 standard deviations to the right, then the area under the normal p.d.f. is
A) 0.67
B) 0.05
C) 0.95
D) 0.33
Answer: C
18) Assume that Y is normally distributed $N(\mu, \sigma^2)$. To find $\Pr(c_1 \le Y \le c_2)$, where $c_1 < c_2$ and
$d_i = \dfrac{c_i - \mu}{\sigma}$, you need to calculate $\Pr(d_1 \le Z \le d_2) =$
A) $\Phi(d_2) - \Phi(d_1)$
B) $\Phi(1.96) - \Phi(1.96)$
C) $\Phi(d_2) - (1 - \Phi(d_1))$
D) $1 - (\Phi(d_2) - \Phi(d_1))$
Answer: A
19) If variables with a multivariate normal distribution have covariances that equal zero, then
A) the correlation will most often be zero, but does not have to be.
B) the variables are independent.
C) you should use the $\chi^2$ distribution to calculate probabilities.
D) the marginal distribution of each of the variables is no longer normal.
Answer: B
20) The Student t distribution is
A) the distribution of the sum of m squared independent standard normal random variables.
B) the distribution of a random variable with a chi-squared distribution with m degrees of freedom, divided
by m.
C) always well approximated by the standard normal distribution.
D) the distribution of the ratio of a standard normal random variable, divided by the square root of an
independently distributed chi-squared random variable with m degrees of freedom divided by m.
Answer: D
21) When there are $\infty$ degrees of freedom, the t distribution
A) can no longer be calculated.
B) equals the standard normal distribution.
C) has a bell shape similar to that of the normal distribution, but with “fatter” tails.
D) equals the $\chi^2$ distribution.
Answer: B
22) The sample average is a random variable and
A) is a single number and as a result cannot have a distribution.
B) has a probability distribution called its sampling distribution.
C) has a probability distribution called the standard normal distribution.
D) has a probability distribution that is the same as for the Y1 ,..., Yn i.i.d. variables.
Answer: B
23) To infer the political tendencies of the students at your college/university, you sample 150 of them. Only one of
the following is a simple random sample: You
A) make sure that the proportion of minorities are the same in your sample as in the
entire student body.
B) call every fiftieth person in the student directory at 9 a.m. If the person does not answer the phone, you
pick the next name listed, and so on.
C) go to the main dining hall on campus and interview students randomly there.
D) have your statistical package generate 150 random numbers in the range from 1 to the total number of
students in your academic institution, and then choose the corresponding names in the student telephone
directory.
Answer: D
24) The variance of $\bar{Y}$, $\sigma_{\bar{Y}}^2$, is given by the following formula:
A) $\sigma_Y^2$.
B) $\dfrac{\sigma_Y}{\sqrt{n}}$.
C) $\dfrac{\sigma_Y^2}{n}$.
D) $\dfrac{\sigma_Y^2}{\sqrt{n}}$.
Answer: C
25) The mean of the sample average $\bar{Y}$, $E(\bar{Y})$, is
A) $\dfrac{1}{n}\bar{Y}$.
B) $\mu_Y$.
C) $\dfrac{\mu_Y}{n}$.
D) $\dfrac{\sigma_Y}{\mu_Y}$ for n > 30.
Answer: B
26) In econometrics, we typically do not rely on exact or finite sample distributions because
A) we have approximately an infinite number of observations (think of re-sampling).
B) variables typically are normally distributed.
C) the covariances of Yi, Yj are typically not zero.
D) asymptotic distributions can be counted on to provide good approximations to the exact sampling
distribution (given the number of observations available in most cases).
Answer: D
27) Consistency for the sample average $\bar{Y}$ can be defined as follows, with the exception of
A) $\bar{Y}$ converges in probability to $\mu_Y$.
B) $\bar{Y}$ has the smallest variance of all estimators.
C) $\bar{Y} \xrightarrow{p} \mu_Y$.
D) the probability of $\bar{Y}$ being in the range $\mu_Y \pm c$ becomes arbitrarily close to one as n increases for any
constant c > 0.
Answer: B
28) The central limit theorem states that
A) the sampling distribution of $\dfrac{\bar{Y} - \mu_Y}{\sigma_{\bar{Y}}}$ is approximately normal.
B) $\bar{Y} \xrightarrow{p} \mu_Y$.
C) the probability that $\bar{Y}$ is in the range $\mu_Y \pm c$ becomes arbitrarily close to one as n increases for any constant
c > 0.
D) the t distribution converges to the F distribution for approximately n > 30.
Answer: A
29) The central limit theorem
A) states conditions under which a variable involving the sum of Y1 ,..., Yn i.i.d. variables becomes the
standard normal distribution.
B) postulates that the sample mean $\bar{Y}$ is a consistent estimator of the population mean $\mu_Y$.
C) only holds in the presence of the law of large numbers.
D) states conditions under which a variable involving the sum of Y1 ,..., Yn i.i.d. variables becomes the
Student t distribution.
Answer: A
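30) The covariance inequality states that
A) $0 \le \sigma_{XY}^2 \le 1$.
B) $\sigma_{XY}^2 \le \sigma_X^2 \sigma_Y^2$.
C) $\sigma_{XY}^2$ … $\sigma_X^2$ … $\sigma_Y^2$.
D) $\sigma_{XY}^2 \le \dfrac{\sigma_X^2}{\sigma_Y^2}$.
Answer: B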
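31) $\sum_{i=1}^{n} (a x_i + b y_i + c) =$
A) $a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} y_i + n \times c$
B) $a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} y_i + c$
C) $a\bar{x} + b\bar{y} + n \times c$
D) $a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} y_i$
Answer: A
32) $\sum_{i=1}^{n} (a x_i + b)$
A) $n \times a \times \bar{x} + n \times b$
B) n(a+b)
C)
D)
Answer: A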
33) Assume that you assign the following subjective probabilities for your final grade in your econometrics course
(the standard GPA scale of 4 = A to 0 = F applies):
Grade   Probability
A       0.20
B       0.50
C       0.20
D       0.08
F       0.02
The expected value is:
A) 3.0
B) 3.5
C) 2.78
D) 3.25
Answer: C
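The expected grade is the probability-weighted average of the grade points. A minimal sketch of that calculation; the dictionary names are illustrative and not part of the question:

```python
# Grade points on the 4 = A ... 0 = F scale, paired with the subjective probabilities.
grade_points = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}
probabilities = {"A": 0.20, "B": 0.50, "C": 0.20, "D": 0.08, "F": 0.02}

# A valid probability distribution must sum to one.
total_probability = sum(probabilities.values())

# E(grade) = sum over grades of (grade points x probability).
expected_grade = sum(grade_points[g] * probabilities[g] for g in grade_points)
print(round(total_probability, 10), round(expected_grade, 2))
```

The weighted sum is 0.80 + 1.50 + 0.40 + 0.08 + 0 = 2.78.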
34) The mean and variance of a Bernoulli random variable are given as
A) cannot be calculated
B) np and np(1-p)
C) p and p(1-p)
D) p and (1- p)
Answer: C
35) Consider the following linear transformation of a random variable: $y = \dfrac{x - \mu_x}{\sigma_x}$, where $\mu_x$ is the mean of x and $\sigma_x$
is the standard deviation. Then the expected value and the standard deviation of Y are given as
A) 0 and 1
B) 1 and 1
C) Cannot be computed because Y is not a linear function of X
D) $\mu_x$ and $\sigma_x$
Answer: A
2.2 Essays and Longer Questions
1) Think of the situation of rolling two dice and let M denote the sum of the number of dots on the two dice. (So M
is a number between 2 and 12.)
(a) In a table, list all of the possible outcomes for the random variable M together with its probability
distribution and cumulative probability distribution. Sketch both distributions.
(b) Calculate the expected value and the standard deviation for M.
(c) Looking at the sketch of the probability distribution, you notice that it resembles a normal distribution.
Should you be able to use the standard normal distribution to calculate probabilities of events? Why or why
not?
Answer: (a)
Outcome (sum of dots)                 2      3      4      5      6      7      8      9      10     11     12
Probability distribution              0.028  0.056  0.083  0.111  0.139  0.167  0.139  0.111  0.083  0.056  0.028
Cumulative probability distribution   0.028  0.083  0.167  0.278  0.417  0.583  0.722  0.833  0.917  0.972  1.000
(b) 7.0; 2.42.
(c) You cannot use the normal distribution (without continuity correction) to calculate probabilities of
events, since M is discrete while the normal distribution is continuous, so the probability it assigns to any
single outcome equals zero.
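The distribution of M can be checked by enumerating the 36 equally likely outcomes of the two dice. A minimal sketch using exact fractions; the variable names are illustrative:

```python
from fractions import Fraction
from itertools import product
from math import sqrt

# Enumerate the 36 equally likely (die1, die2) outcomes and tally the sum M.
pmf = {}
for d1, d2 in product(range(1, 7), repeat=2):
    pmf[d1 + d2] = pmf.get(d1 + d2, 0) + Fraction(1, 36)

# Cumulative distribution: running total of the probabilities.
cdf, running = {}, Fraction(0)
for m in sorted(pmf):
    running += pmf[m]
    cdf[m] = running

# Population mean and standard deviation of M.
mean = sum(m * p for m, p in pmf.items())
variance = sum((m - mean) ** 2 * p for m, p in pmf.items())
std_dev = sqrt(variance)

for m in sorted(pmf):
    print(m, f"{float(pmf[m]):.3f}", f"{float(cdf[m]):.3f}")
print(float(mean), round(std_dev, 2))
```

The same `pmf` dictionary answers simple event questions directly, for example `pmf[2] + pmf[10]` for Pr(M = 2 or M = 10).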
2) What is the probability of the following outcomes?
(a) Pr(M = 7)
(b) Pr(M = 2 or M = 10)
(c) Pr(M = 4 or M ≠ 4)
(d) Pr(M = 6 and M = 9)
(e) Pr(M < 8)
(f) Pr(M = 6 or M > 10)
Answer: (a) 0.167, or 6/36 = 1/6;
(b) 0.111, or 4/36 = 1/9;
(c) 1;
(d) 0;
(e) 0.583;
(f) 0.222, or 8/36 = 2/9.
3) Probabilities and relative frequencies are related in that the probability of an outcome is the proportion of the
time that the outcome occurs in the long run. Hence concepts of joint, marginal, and conditional probability
distributions stem from related concepts of frequency distributions.
You are interested in investigating the relationship between the age of heads of households and weekly
earnings of households. The accompanying data gives the number of occurrences grouped by age and income.
You collect data from 1,744 individuals and think of these individuals as a population that you want to
describe, rather than a sample from which you want to infer behavior of a larger population. After sorting the
data, you generate the accompanying table:
Joint Absolute Frequencies of Age and Income, 1,744 Households

                         Age of head of household
                         X1            X2            X3            X4            X5
Household Income         16-under 20   20-under 25   25-under 45   45-under 65   65 and >
Y1 $0-under $200         80            76            130           86            24
Y2 $200-under $400       13            90            346           140           8
Y3 $400-under $600       0             19            251           101           6
Y4 $600-under $800       1             11            110           55            1
Y5 $800 and >            1             1             108           84            2

The median of the income group of $800 and above is $1,050.
(a) Calculate the joint relative frequencies and the marginal relative frequencies. Interpret one of each of these.
Sketch the cumulative income distribution.
(b) Calculate the conditional relative income frequencies for the two age categories 16-under 20, and 45-under
65. Calculate the mean household income for both age categories.
(c) If household income and age of head of household were independently distributed, what would you expect
these two conditional relative income distributions to look like? Are they similar here?
(d) Your textbook has given you a primary definition of independence that does not involve conditional
relative frequency distributions. What is that definition? Do you think that age and income are independent
here, using this definition?
Answer: (a) The joint relative frequencies and marginal relative frequencies are given in the accompanying table.
5.2 percent of the individuals are between the age of 20 and 24, and make between $200 and under $400.
21.6 percent of the individuals earn between $400 and under $600.
Joint Relative and Marginal Frequencies of Age and Income, 1,744 Households

                         Age of head of household
                         X1            X2            X3            X4            X5
Household Income         16-under 20   20-under 25   25-under 45   45-under 65   65 and >    Total
Y1 $0-under $200         0.046         0.044         0.075         0.049         0.014       0.227
Y2 $200-under $400       0.007         0.052         0.198         0.080         0.005       0.342
Y3 $400-under $600       0.000         0.011         0.144         0.058         0.003       0.216
Y4 $600-under $800       0.001         0.006         0.063         0.032         0.001       0.102
Y5 $800 and >            0.001         0.001         0.062         0.048         0.001       0.112

(b) The mean household income for the 16-under 20 age category is roughly $144. It is approximately
$489 for the 45-under 65 age category.
Conditional Relative Frequencies of Income and
Age 16-under 20, and 45-under 65, 1,744 Households

                         Age of head of household
                         X1            X4
Household Income         16-under 20   45-under 65
Y1 $0-under $200         0.842         0.185
Y2 $200-under $400       0.137         0.300
Y3 $400-under $600       0.000         0.217
Y4 $600-under $800       0.011         0.118
Y5 $800 and >            0.011         0.180

(c) They would have to be identical, which they clearly are not.
(d) Pr(Y = y, X = x) = Pr(Y = y) Pr(X = x). We can check this by multiplying two marginal probabilities to
see if this results in the joint probability. For example, Pr(Y = Y3 ) = 0.216 and Pr(X = X3 ) = 0.542,
resulting in a product of 0.117, which does not equal the joint probability of 0.144. Given that we are
looking at the data as a population, not a sample, we do not have to test how “close” 0.117 is to 0.144.
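The conditional relative frequencies, the two conditional mean incomes, and the independence check can all be reproduced from the joint absolute frequencies. A minimal sketch; it assumes each bounded income group is represented by its midpoint ($100, $300, $500, $700) and the open-ended group by the stated median of $1,050, and the function names are illustrative:

```python
# Joint absolute frequencies: rows are income groups Y1..Y5, columns are age groups X1..X5.
counts = [
    [80, 76, 130, 86, 24],
    [13, 90, 346, 140, 8],
    [0, 19, 251, 101, 6],
    [1, 11, 110, 55, 1],
    [1, 1, 108, 84, 2],
]
# Representative income per group: midpoints (an assumption) and the stated $1,050 median.
income_levels = [100, 300, 500, 700, 1050]

total = sum(sum(row) for row in counts)

def conditional_income_distribution(age_col):
    """Relative income frequencies within one age column."""
    column = [row[age_col] for row in counts]
    col_total = sum(column)
    return [c / col_total for c in column]

def conditional_mean_income(age_col):
    """Mean income within one age column, using the representative income levels."""
    dist = conditional_income_distribution(age_col)
    return sum(p * y for p, y in zip(dist, income_levels))

marginal_income = [sum(row) / total for row in counts]
marginal_age = [sum(row[j] for row in counts) / total for j in range(5)]

print([round(p, 3) for p in conditional_income_distribution(0)])  # age 16-under 20
print([round(p, 3) for p in conditional_income_distribution(3)])  # age 45-under 65
print(round(conditional_mean_income(0)), round(conditional_mean_income(3)))

# Independence check for (Y3, X3): product of the marginals versus the joint frequency.
product_of_marginals = marginal_income[2] * marginal_age[2]
joint_y3_x3 = counts[2][2] / total
print(round(product_of_marginals, 3), round(joint_y3_x3, 3))
```

Under independence every column of conditional frequencies would equal the marginal income distribution, so comparing any two columns is enough to reject it here.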
4) Math and verbal SAT scores are each normally distributed, N(500, 10000).
(a) What fraction of students scores above 750? Above 600? Between 420 and 530? Below 480? Above 530?
(b) If the math and verbal scores were independently distributed, which is not the case, then what would be the
distribution of the overall SAT score? Find its mean and variance.
(c) Next, assume that the correlation coefficient between the math and verbal scores is 0.75. Find the mean and
variance of the resulting distribution.
(d) Finally, assume that you had chosen 25 students at random who had taken the SAT exam. Derive the
distribution for their average math SAT score. What is the probability that this average is above 530? Why is
this so much smaller than your answer in (a)?
Answer: (a) Pr(Y>750) = 0.0062; Pr(Y>600) = 0.1587; Pr(420<Y<530) = 0.4061; Pr(Y<480) = 0.4207; Pr(Y>530) =
0.3821.
(b) The distribution would be N(1000, 20000), using equations (2.29) and (2.31) in the textbook. Note that
the standard deviation is now roughly 141 rather than 200.
(c) Given the correlation coefficient, the distribution is now N(1000, 35000), which has a standard
deviation of approximately 187.
(d) The distribution for the average math SAT score is N(500, 400). Pr(Y > 530) = 0.0668. This probability
is smaller because the sample mean has a smaller standard deviation (20 rather than 100).
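These probabilities can be reproduced with the standard library's `statistics.NormalDist`. A minimal sketch; the variable and function names are illustrative:

```python
from math import sqrt
from statistics import NormalDist

score = NormalDist(mu=500, sigma=100)  # variance 10000 -> standard deviation 100

# (a) Tail and interval probabilities for a single score.
p_above_750 = 1 - score.cdf(750)
p_above_600 = 1 - score.cdf(600)
p_420_to_530 = score.cdf(530) - score.cdf(420)
p_below_480 = score.cdf(480)
p_above_530 = 1 - score.cdf(530)

# (b), (c) Variance of the sum: var(M) + var(V) + 2 * corr * sd(M) * sd(V).
def variance_of_sum(corr):
    return 100**2 + 100**2 + 2 * corr * 100 * 100

var_independent = variance_of_sum(0.0)
var_correlated = variance_of_sum(0.75)

# (d) Sample average of n = 25 i.i.d. scores: N(500, 10000 / 25).
average = NormalDist(mu=500, sigma=sqrt(10000 / 25))
p_avg_above_530 = 1 - average.cdf(530)

print(round(p_above_750, 4), round(p_above_600, 4), round(p_420_to_530, 4))
print(round(p_below_480, 4), round(p_above_530, 4))
print(var_independent, var_correlated, round(p_avg_above_530, 4))
```

Dividing the variance by n = 25 shrinks the standard deviation from 100 to 20, which is why a sample average above 530 is far less likely than a single score above 530.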
5) The following problem is frequently encountered in the case of a rare disease, say AIDS, when determining the
probability of actually having the disease after testing positive for HIV. (This is often known as the accuracy
of the test given that you have the disease.) Let us set up the problem as follows: Y = 0 if you tested negative
using the ELISA test for HIV, Y = 1 if you tested positive; X = 1 if you have HIV, X = 0 if you do not have HIV.
Assume that 0.1 percent of the population has HIV and that the accuracy of the test is 0.95 in both cases of (i)
testing positive when you have HIV, and (ii) testing negative when you do not have HIV. (The actual ELISA
test is 99.7 percent accurate when you have HIV, and 98.5 percent accurate when you do not have
HIV.)
(a) Assuming arbitrarily a population of 10,000,000 people, use the accompanying table to first enter the
column totals.
                 Test Positive (Y=1)   Test Negative (Y=0)   Total
HIV (X=1)
No HIV (X=0)
Total                                                        10,000,000
(b) Use the conditional probabilities to fill in the joint absolute frequencies.
(c) Fill in the marginal absolute frequencies for testing positive and negative. Determine the conditional
probability of having HIV when you have tested positive. Explain this surprising result.
(d) The previous problem is an application of Bayes’ theorem, which converts Pr(Y = y | X = x) into Pr(X = x | Y =
y). Can you think of other examples where Pr(Y = y | X = x) ≠ Pr(X = x | Y = y)?
Answer: (a)
                 Test Positive (Y=1)   Test Negative (Y=0)   Total
HIV (X=1)                                                    10,000
No HIV (X=0)                                                 9,990,000
Total                                                        10,000,000
(b)
                 Test Positive (Y=1)   Test Negative (Y=0)   Total
HIV (X=1)        9,500                 500                   10,000
No HIV (X=0)     499,500               9,490,500             9,990,000
Total                                                        10,000,000
(c)
                 Test Positive (Y=1)   Test Negative (Y=0)   Total
HIV (X=1)        9,500                 500                   10,000
No HIV (X=0)     499,500               9,490,500             9,990,000
Total            509,000               9,491,000             10,000,000
Pr(X=1 | Y=1) = 0.0187. Although the test is quite accurate, there are very few people who have HIV
(10,000), and many who do not have HIV (9,990,000). A small percentage of that large number
(499,500/9,990,000) is large when compared to the higher percentage of the smaller number
(9,500/10,000).
(d) Answers will vary by student. Perhaps a nice illustration is the probability to be a male given that you
play on the college/university men’s varsity team, versus the probability to play on the college/university
men’s varsity team given that you are a male student.
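The frequency table in the HIV testing problem is Bayes' theorem carried out with counts. A minimal sketch of the same arithmetic; the names `prevalence`, `sensitivity`, and `specificity` are labels chosen here for the three probabilities given in the question, not terms the question itself uses:

```python
population = 10_000_000
prevalence = 0.001   # Pr(X = 1): share of the population with HIV
sensitivity = 0.95   # Pr(Y = 1 | X = 1): test positive given HIV
specificity = 0.95   # Pr(Y = 0 | X = 0): test negative given no HIV

# Row totals of the table.
infected = round(population * prevalence)
healthy = population - infected

# Joint absolute frequencies.
true_positives = round(infected * sensitivity)
false_negatives = infected - true_positives
true_negatives = round(healthy * specificity)
false_positives = healthy - true_negatives

# Column total for "test positive" and the conditional probability Pr(X = 1 | Y = 1).
all_positives = true_positives + false_positives
p_hiv_given_positive = true_positives / all_positives
print(all_positives, round(p_hiv_given_positive, 4))
```

Raising `prevalence` while holding the test accuracy fixed shows how strongly the result depends on the rarity of the disease.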
6) You have read about the so-called catch-up theory by economic historians, whereby nations that are further
behind in per capita income grow faster subsequently. If this is true systematically, then eventually laggards
will reach the leader. To put the theory to the test, you collect data on relative (to the United States) per capita
income for two years, 1960 and 1990, for 24 OECD countries. You think of these countries as a population you
want to describe, rather than a sample from which you want to infer behavior of a larger population. The
relevant data for this question is as follows:
        Y        X1       X2       Y × X1   Y^2       X1^2     X2^2
        0.023    0.770    1.030    0.018    0.00053   0.593    1.0609
        0.014    1.000    1.000    0.014    0.00020   1.000    1.0000
        ….       ….       ….       ….       ….        ….       ….
        0.041    0.200    0.450    0.008    0.00168   0.040    0.2025
        0.033    0.130    0.230    0.004    0.00109   0.017    0.0529
Sum     0.625    13.220   17.800   0.294    0.01877   8.529    13.9164
where X1 and X2 are per capita income relative to the United States in 1960 and 1990 respectively, and Y is the
average annual growth rate in X over the 1960-1990 period. Numbers in the last row represent sums of the
columns above.
(a) Calculate the variance and standard deviation of X1 and X2 . For a catch-up effect to be present, what
relationship must the two standard deviations show? Is this the case here?
(b) Calculate the correlation between Y and X1. What sign must the correlation coefficient have for there to be
evidence of a catch-up effect? Explain.
Answer: (a) The variances of X1 and X2 are 0.0520 and 0.0298 respectively, with standard deviations of 0.2279
and 0.1726. For the catch-up effect to be present, the standard deviation would have to shrink over time.
This is the case here.
(b) The correlation coefficient is –0.88. It has to be negative for there to be evidence of a catch-up effect. If
countries that were relatively ahead in the initial period and in terms of per capita income grow by
relatively less over time, then eventually the laggards will catch up.
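Because the 24 countries are treated as a population, the variances and the correlation can be computed directly from the column sums, dividing by n rather than n - 1. A minimal sketch; the function and variable names are illustrative. Since the sums in the table are rounded, the correlation computed this way comes out near -0.9, close to but not exactly the reported -0.88:

```python
from math import sqrt

n = 24  # OECD countries, treated as a population
# Column sums taken from the last row of the data table.
sum_y, sum_x1, sum_x2 = 0.625, 13.220, 17.800
sum_y_x1 = 0.294
sum_y_sq, sum_x1_sq, sum_x2_sq = 0.01877, 8.529, 13.9164

def population_variance(sum_v, sum_v_sq):
    """var(V) = E(V^2) - [E(V)]^2, dividing by n because the data are a population."""
    return sum_v_sq / n - (sum_v / n) ** 2

var_x1 = population_variance(sum_x1, sum_x1_sq)
var_x2 = population_variance(sum_x2, sum_x2_sq)
var_y = population_variance(sum_y, sum_y_sq)

# cov(Y, X1) = E(Y * X1) - E(Y) * E(X1)
cov_y_x1 = sum_y_x1 / n - (sum_y / n) * (sum_x1 / n)
corr_y_x1 = cov_y_x1 / sqrt(var_y * var_x1)

print(round(var_x1, 4), round(sqrt(var_x1), 4))
print(round(var_x2, 4), round(sqrt(var_x2), 4))
print(round(corr_y_x1, 2))
```

The standard deviation falls from 1960 to 1990 and the correlation between growth and initial income is strongly negative, which is the pattern the catch-up theory predicts.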
7) Following Alfred Nobel’s will, there are five Nobel Prizes awarded each year. These are for outstanding
achievements in Chemistry, Physics, Physiology or Medicine, Literature, and Peace. In 1968, the Bank of
Sweden added a prize in Economic Sciences in memory of Alfred Nobel. You think of the data as describing a
population, rather than a sample from which you want to infer behavior of a larger population. The
accompanying table lists the joint probability distribution between recipients in economics and the other five
prizes, and the citizenship of the recipients, based on the 1969-2001 period.
Joint Distribution of Nobel Prize Winners in Economics and Non-Economics
Disciplines, and Citizenship, 1969-2001

                                             U.S. Citizen   Non-U.S. Citizen
                                             (Y = 0)        (Y = 1)            Total
Economics Nobel Prize (X = 0)                0.118          0.049              0.167
Physics, Chemistry, Medicine, Literature,
and Peace Nobel Prize (X = 1)                0.345          0.488              0.833
Total                                        0.463          0.537              1.00

(a) Compute E(Y) and interpret the resulting number.
(b) Calculate and interpret E(Y | X = 1) and E(Y | X = 0).
(c) A randomly selected Nobel Prize winner reports that he is a non-U.S. citizen. What is the probability that
this genius has won the Economics Nobel Prize? A Nobel Prize in the other five disciplines?
(d) Show what the joint distribution would look like if the two categories were independent.
Answer: (a) E(Y) = 0.537. 53.7 percent of Nobel Prize winners were non-U.S. citizens.
(b) E(Y | X = 1) = 0.586. 58.6 percent of Nobel Prize winners in non-economics disciplines were non-U.S.
citizens. E(Y | X = 0) = 0.293. 29.3 percent of the Economics Nobel Prize winners were non-U.S. citizens.
(c) There is a 9.1 percent chance that he has won the Economics Nobel Prize, and a 90.9 percent chance
that he has won a Nobel Prize in one of the other five disciplines.
(d)
Joint Distribution of Nobel Prize Winners in Economics and Non-Economics Disciplines,
and Citizenship, 1969-2001, under assumption of independence

                                             U.S. Citizen   Non-U.S. Citizen
                                             (Y = 0)        (Y = 1)            Total
Economics Nobel Prize (X = 0)                0.077          0.090              0.167
Physics, Chemistry, Medicine, Literature,
and Peace Nobel Prize (X = 1)                0.386          0.447              0.833
Total                                        0.463          0.537              1.00

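For a two-by-two joint distribution, the conditional expectations, the reverse conditional probability, and the table under independence all follow from the four cell probabilities. A minimal sketch using the Nobel Prize table; the dictionary layout and function names are illustrative:

```python
# Joint probabilities keyed by (x, y): x = 0 Economics, x = 1 other prizes;
# y = 0 U.S. citizen, y = 1 non-U.S. citizen.
joint = {(0, 0): 0.118, (0, 1): 0.049, (1, 0): 0.345, (1, 1): 0.488}

def marginal_x(x):
    return sum(p for (xi, _), p in joint.items() if xi == x)

def marginal_y(y):
    return sum(p for (_, yi), p in joint.items() if yi == y)

def expected_y_given_x(x):
    """E(Y | X = x) = sum over y of y * Pr(X = x, Y = y) / Pr(X = x)."""
    return sum(y * p for (xi, y), p in joint.items() if xi == x) / marginal_x(x)

# (a) Unconditional expectation of Y.
expected_y = sum(y * p for (_, y), p in joint.items())

# (c) Pr(X = 0 | Y = 1): Economics prize given non-U.S. citizenship.
p_econ_given_non_us = joint[(0, 1)] / marginal_y(1)

# (d) Under independence each cell is the product of its two marginals.
independent = {(x, y): marginal_x(x) * marginal_y(y) for (x, y) in joint}

print(round(expected_y, 3), round(expected_y_given_x(1), 3), round(expected_y_given_x(0), 3))
print(round(p_econ_given_non_us, 3))
print({cell: round(p, 3) for cell, p in independent.items()})
```

Swapping in the cell probabilities of any other two-by-two table in this section gives the corresponding conditional expectations and independence benchmark.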
8) A few years ago the news magazine The Economist listed some of the stranger explanations used in the past to
predict presidential election outcomes. These included whether or not the hemlines of women’s skirts went up
or down, stock market performances, baseball World Series wins by an American League team, etc. Thinking
about this problem more seriously, you decide to analyze whether or not the presidential candidate for a
certain party did better if his party controlled the House. Accordingly you collect data for the last 34
presidential elections. You think of this data as comprising a population which you want to describe, rather
than a sample from which you want to infer behavior of a larger population. You generate the accompanying
table:
Joint Distribution of Presidential Party Affiliation and Party Control
of House of Representatives, 1860-1996

                                Democratic Control   Republican Control
                                of House (Y = 0)     of House (Y = 1)     Total
Democratic President (X = 0)    0.412                0.030                0.441
Republican President (X = 1)    0.176                0.382                0.559
Total                           0.588                0.412                1.00

(a) Interpret one of the joint probabilities and one of the marginal probabilities.
(b) Compute E(X). How does this differ from E(X | Y = 0)? Explain.
(c) If you picked one of the Republican presidents at random, what is the probability that during his term the
Democrats had control of the House?
(d) What would the joint distribution look like under independence? Check your results by calculating the two
conditional distributions and compare these to the marginal distribution.
Answer: (a) 38.2 percent of the presidents were Republicans and were in the White House while Republicans
controlled the House of Representatives. 44.1 percent of all presidents were Democrats.
(b) E(X) = 0.559. 55.9 percent of the presidents were Republicans. E(X | Y = 0) = 0.299. 29.9 percent of
those presidents who were in office while Democrats had control of the House of Representatives were
Republicans. E(X) gives you the unconditional expected value, while E(X | Y = 0) is the conditional
expected value: it conditions on those periods during which Democrats had control of the House
of Representatives, and ignores the other periods.
(c) Pr(Y = 0 | X = 1) = 0.176/0.559 ≈ 0.315. Roughly 31.5 percent of the Republican presidents were in
office while Democrats had control of the House of Representatives.
(d)
Joint Distribution of Presidential Party Affiliation and Party Control of House of
Representatives, 1860-1996, under the Assumption of Independence

                                Democratic Control   Republican Control
                                of House (Y = 0)     of House (Y = 1)     Total
Democratic President (X = 0)    0.259                0.182                0.441
Republican President (X = 1)    0.329                0.230                0.559
Total                           0.588                0.412                1.00

Pr(X = 0 | Y = 0) = 0.259/0.588 = 0.440 (there is a small rounding error).
Pr(Y = 1 | X = 1) = 0.230/0.559 = 0.411 (there is a small rounding error).
9) The expectations augmented Phillips curve postulates
$p = \pi - f(u - \bar{u})$,
where p is the actual inflation rate, $\pi$ is the expected inflation rate, and u is the unemployment rate, with a bar
($\bar{u}$) indicating equilibrium (the NAIRU, the Non-Accelerating Inflation Rate of Unemployment). Under the
assumption of static expectations ($\pi = p_{-1}$), i.e., that you expect this period’s inflation rate to hold for the next
period (“the sun shines today, it will shine tomorrow”), then the prediction is that inflation will accelerate if the
unemployment rate is below its equilibrium level. The accompanying table below displays information on
accelerating annual inflation and unemployment rate differences from the equilibrium rate (cyclical
unemployment), where the latter is approximated by a five-year moving average. You think of this data as a
population which you want to describe, rather than a sample from which you want to infer behavior of a larger
population. The data is collected from United States quarterly data for the period 1964:1 to 1995:4.
Joint Distribution of Accelerating Inflation and Cyclical Unemployment,
1964:1-1995:4

                          (u – ū) > 0    (u – ū) ≤ 0    Total
                          (Y = 0)        (Y = 1)
p – p_{–1} > 0 (X = 0)    0.156          0.383          0.539
p – p_{–1} ≤ 0 (X = 1)    0.297          0.164          0.461
Total                     0.453          0.547          1.00
(a) Compute E(Y) and E(X), and interpret both numbers.
(b) Calculate E(Y | X = 1) and E(Y | X = 0). If there was independence between cyclical unemployment and
acceleration in the inflation rate, what would you expect the relationship between the two expected values to
be? Given that the two means are different, is this sufficient to assume that the two variables are independent?
(c) What is the probability of inflation to increase if there is positive cyclical unemployment? Negative cyclical
unemployment?
(d) You randomly select one of the 58 quarters when there was positive cyclical unemployment ((u – ū) > 0).
What is the probability there was decelerating inflation during that quarter?
Answer: (a) E(Y) = 0.547. 54.7 percent of the quarters saw unemployment at or below its equilibrium level (negative cyclical unemployment).
E(X) = 0.461. 46.1 percent of the quarters saw decreasing inflation rates.
(b) E(Y | X = 1) = 0.356; E(Y | X = 0) = 0.711. Under independence you would expect the two conditional expectations to be the
same. In general, independence in means does not imply statistical independence, although the reverse
is true.
(c) There is a 34.4 percent probability of inflation to increase if there is positive cyclical unemployment.
There is a 70 percent probability of inflation to increase if there is negative cyclical unemployment.
(d) There is a 65.6 percent probability of inflation to decelerate when there is positive cyclical
unemployment.
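The calculations in this answer can be reproduced directly from the joint distribution table. The following is a minimal Python sketch and not part of the original solutions; the dictionary layout and the helper names (`marginal`, `cond_mean_y`, `cond_prob_x`) are illustrative assumptions.

```python
# Joint probabilities Pr(X = x, Y = y) from the table in question 9, keyed by (x, y).
# X = 1: inflation did not accelerate; Y = 1: unemployment at or below equilibrium.
joint = {(0, 0): 0.156, (0, 1): 0.383, (1, 0): 0.297, (1, 1): 0.164}


def marginal(joint, axis, value):
    """Marginal probability Pr(X = value) for axis 0, or Pr(Y = value) for axis 1."""
    return sum(p for key, p in joint.items() if key[axis] == value)


def cond_mean_y(joint, x):
    """E(Y | X = x): sum over y of y * Pr(X = x, Y = y), divided by Pr(X = x)."""
    return sum(y * p for (xi, y), p in joint.items() if xi == x) / marginal(joint, 0, x)


def cond_prob_x(joint, x, y):
    """Pr(X = x | Y = y) = Pr(X = x, Y = y) / Pr(Y = y)."""
    return joint[(x, y)] / marginal(joint, 1, y)


# For a 0/1 variable the expected value equals the probability of the value 1.
e_x = marginal(joint, 0, 1)
e_y = marginal(joint, 1, 1)

print(f"E(X) = {e_x:.3f}, E(Y) = {e_y:.3f}")
print(f"E(Y | X = 1) = {cond_mean_y(joint, 1):.3f}")
print(f"E(Y | X = 0) = {cond_mean_y(joint, 0):.3f}")
print(f"Pr(X = 0 | Y = 0) = {cond_prob_x(joint, 0, 0):.3f}")
print(f"Pr(X = 1 | Y = 0) = {cond_prob_x(joint, 1, 0):.3f}")
```

Keying the table by (x, y) keeps the marginal and conditional calculations as one-line sums, which mirrors the definitions used in the answer.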
10) The accompanying table shows the joint distribution between the change of the unemployment rate in an
election year and the share of the candidate of the incumbent party since 1928. You think of this data as a
population which you want to describe, rather than a sample from which you want to infer behavior of a larger
population.
Joint Distribution of Unemployment Rate Change and Incumbent Party’s Vote
Share in Total Vote Cast for the Two Major-Party Candidates,
1928-2000

                  (Incumbent – 50%) > 0    (Incumbent – 50%) ≤ 0    Total
                  (Y = 0)                  (Y = 1)
Δu > 0 (X = 0)    0.053                    0.211                    0.264
Δu ≤ 0 (X = 1)    0.579                    0.157                    0.736
Total             0.632                    0.368                    1.00
(a) Compute and interpret E(Y) and E(X).
(b) Calculate E(Y | X = 1) and E(Y | X = 0). Did you expect these to be very different?
(c) What is the probability that the unemployment rate decreases in an election year?
(d) Conditional on the unemployment rate decreasing, what is the probability that an incumbent will lose the
election?
(e) What would the joint distribution look like under independence?
Answer: (a) E(Y) = 0.368; E(X) = 0.736. The probability that the incumbent party receives 50% or less of the votes
cast for the two major-party candidates is 0.368. The probability of observing falling unemployment
rates during the election year is 73.6 percent.
(b) E(Y | X = 1) = 0.213; E(Y | X = 0) = 0.799. A student who believes that incumbents will attempt to
manipulate the economy to win elections will answer affirmatively here.
(c) Pr(X = 1) = 0.736.
(d) Pr(Y = 1 | X = 1) = 0.213.
(e)
Joint Distribution of Unemployment Rate Change and Incumbent Party’s Vote
Share in Total Vote Cast for the Two Major-Party Candidates,
1928-2000 under Assumption of Statistical Independence

                  (Incumbent – 50%) > 0    (Incumbent – 50%) ≤ 0    Total
                  (Y = 0)                  (Y = 1)
Δu > 0 (X = 0)    0.167                    0.097                    0.264
Δu ≤ 0 (X = 1)    0.465                    0.271                    0.736
Total             0.632                    0.368                    1.00
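Under independence, each cell of the joint distribution is the product of the corresponding marginal probabilities. A minimal Python sketch of that construction follows; the function name `independent_joint` and the dictionary layout are illustrative assumptions, while the marginal probabilities are the totals given in the question.

```python
def independent_joint(px, py):
    """Joint probabilities under independence: Pr(X = x, Y = y) = Pr(X = x) * Pr(Y = y)."""
    return {(x, y): px[x] * py[y] for x in px for y in py}


# Marginal distributions taken from the "Total" row and column of the original table.
px = {0: 0.264, 1: 0.736}  # X: change in the unemployment rate
py = {0: 0.632, 1: 0.368}  # Y: incumbent party's vote share relative to 50%

table = independent_joint(px, py)
for (x, y), p in sorted(table.items()):
    print(f"Pr(X = {x}, Y = {y}) = {p:.3f}")
```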
11) The accompanying table lists the joint distribution of unemployment in the United States in 2001 by
demographic characteristics (race and age).
Joint Distribution of Unemployment by Demographic Characteristics,
United States, 2001

                            White      Black and Other    Total
                            (Y = 0)    (Y = 1)
Age 16-19 (X = 0)           0.13       0.05               0.18
Age 20 and above (X = 1)    0.60       0.22               0.82
Total                       0.73       0.27               1.00
(a) What is the percentage of unemployed white teenagers?
(b) Calculate the conditional distribution of age for each of the categories "white" and "black and other".
(c) Given your answer in the previous question, how do you reconcile it with the 60% probability of
finding an unemployed adult white person, compared with only 22% for the category "black and other"?
Answer: (a) Pr(Y = 0, X = 0) = 0.13.
(b)
Conditional Distribution of Unemployment by Demographic
Characteristics, United States, 2001

                            White      Black and Other
                            (Y = 0)    (Y = 1)
Age 16-19 (X = 0)           0.18       0.19
Age 20 and above (X = 1)    0.82       0.81
Total                       1.00       1.00
(c) The original table showed the joint probability distribution, while the table in (b) presented the
conditional probability distribution.
12) From the Stock and Watson website (http://www.pearsonhighered.com/stock_watson), download the chapter 8 CPS data
set (ch8_cps.xls) into a spreadsheet program such as Excel. For the exercise, use the first 500 observations only.
Using data for average hourly earnings only (ahe), describe the earnings distribution. Use summary statistics,
such as the mean, median, variance, and skewness. Produce a frequency distribution (“histogram”) using
reasonable earnings class sizes.
Answer:
ahe
Mean                  19.79
Standard Error        0.51
Median                16.83
Mode                  19.23
Standard Deviation    11.49
Sample Variance       131.98
Kurtosis              0.23
Skewness              0.96
Range                 58.44
Minimum               2.14
Maximum               60.58
Sum                   9897.45
Count                 500
The mean is $19.79. The median ($16.83) is lower than the average, suggesting that the mean is
being pulled up by individuals with fairly high average hourly earnings. This is confirmed by
the skewness measure, which is positive, and therefore suggests a distribution with a long tail to
the right. The variance is 131.96 (in dollars squared), while the standard deviation is $11.49.
To generate the frequency distribution in Excel, you first have to settle on the number of class
intervals. Once you have decided on these, then the minimum and maximum in the data
suggests the class width. In Excel, you then define “bins” (the upper limits of the class intervals).
Sturges’s formula can be used to suggest the number of class intervals (1 + 3.31 log(n), with log to base 10), which
gives about 9.9 here and so suggests 9 or 10 intervals. Instead I settled for 8 intervals with a class width of $8;
minimum wages in California are currently $8 and approximately the same in other U.S. states.
The table produces the absolute frequencies, and relative frequencies can be calculated in a
straightforward way.
bins    Frequency    rel. freq.
8       50           0.1
16      187          0.374
24      115          0.23
32      68           0.136
40      38           0.076
48      33           0.066
56      8            0.016
66      1            0.002
More    0
Substitution of the relative frequencies into the histogram table then produces the following
graph (after eliminating the gaps between the bars).
2.3 Mathematical and Graphical Problems
1) Think of an example involving five possible quantitative outcomes of a discrete random variable and attach a
probability to each one of these outcomes. Display the outcomes, probability distribution, and cumulative
probability distribution in a table. Sketch both the probability distribution and the cumulative probability
distribution.
Answer: Answers will vary by student. The generated table should be similar to Table 2.1 in the text, and figures
should resemble Figures 2.1 and 2.2 in the text.
2) The height of male students at your college/university is normally distributed with a mean of 70 inches and a
standard deviation of 3.5 inches. If you had a list of telephone numbers for male students for the purpose of
conducting a survey, what would be the probability of randomly calling one of these students whose height is
(a) taller than 6'0"?
(b) between 5'3" and 6'5"?
(c) shorter than 5'7", the mean height of female students?
(d) shorter than 5'0"?
(e) taller than Shaq O’Neal, the center of the Miami Heat, who is 7'1" tall?
Compare this to the probability of a woman being pregnant for 10 months (300 days), where days of pregnancy
is normally distributed with a mean of 266 days and a standard deviation of 16 days.
Answer: (a) Pr(Z > 0.5714) = 0.2839;
(b) Pr( –2 < Z < 2) = 0.9545 or approximately 0.95;
(c) Pr(Z < -0.8571) = 0.1957;
(d) Pr(Z < -2.8571) = 0.0021;
(e) Pr(Z > 4.2857) = 0.000009 (the text does not show values above 2.99 standard deviations; Pr(Z > 2.99) =
0.0014). For the pregnancy comparison, Pr(Z > 2.1250) = 0.0168.
3) Calculate the following probabilities using the standard normal distribution. Sketch the probability distribution
in each case, shading in the area of the calculated probability.
(a) Pr(Z < 0.0)
(b) Pr(Z ≤ 1.0)
(c) Pr(Z > 1.96)
(d) Pr(Z < –2.0)
(e) Pr(Z > 1.645)
(f) Pr(Z > –1.645)
(g) Pr(–1.96 < Z < 1.96)
(h) Pr(Z < –2.576 or Z > 2.576)
(i) Pr(Z > z) = 0.10; find z.
(j) Pr(Z < –z or Z > z) = 0.05; find z.
Answer: (a) 0.5000;
(b) 0.8413;
(c) 0.0250;
(d) 0.0228;
(e) 0.0500;
(f) 0.9500;
(g) 0.9500;
(h) 0.0100;
(i) 1.2816;
(j) 1.96.
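These table lookups can also be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper names `upper_tail`, `between`, and `two_sided_critical` are illustrative assumptions and not part of the original solutions.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def upper_tail(z):
    """Pr(Z > z)."""
    return 1.0 - Z.cdf(z)


def between(lo, hi):
    """Pr(lo < Z < hi)."""
    return Z.cdf(hi) - Z.cdf(lo)


def two_sided_critical(alpha):
    """The z for which Pr(Z < -z or Z > z) = alpha."""
    return Z.inv_cdf(1.0 - alpha / 2.0)


print(f"(b) Pr(Z <= 1.0)          = {Z.cdf(1.0):.4f}")
print(f"(c) Pr(Z > 1.96)          = {upper_tail(1.96):.4f}")
print(f"(g) Pr(-1.96 < Z < 1.96)  = {between(-1.96, 1.96):.4f}")
print(f"(h) Pr(|Z| > 2.576)       = {1.0 - between(-2.576, 2.576):.4f}")
print(f"(i) z with Pr(Z > z)=0.10 = {Z.inv_cdf(0.90):.4f}")
print(f"(j) z with two-sided 0.05 = {two_sided_critical(0.05):.2f}")
```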
4) Using the fact that the standardized variable Z is a linear transformation of the normally distributed random
variable Y, derive the expected value and variance of Z.
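Answer: $Z = \frac{Y - \mu_Y}{\sigma_Y} = -\frac{\mu_Y}{\sigma_Y} + \frac{1}{\sigma_Y} Y = a + bY$, with $a = -\frac{\mu_Y}{\sigma_Y}$ and $b = \frac{1}{\sigma_Y}$.
Given (2.29) and (2.30) in the text, $E(Z) = -\frac{\mu_Y}{\sigma_Y} + \frac{1}{\sigma_Y} \mu_Y = 0$, and $\sigma_Z^2 = \frac{1}{\sigma_Y^2} \sigma_Y^2 = 1$.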
5) Show in a scatterplot what the relationship between two variables X and Y would look like if there was
(a) a strong negative correlation.
(b) a strong positive correlation.
(c) no correlation.
Answer: (a)
(b)
(c)
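6) What would the correlation coefficient be if all observations for the two variables were on a curve described by
Y = X² ?
Answer: The correlation coefficient would be zero in this case if the X values are distributed symmetrically around
zero, since the correlation coefficient only measures linear association and the relationship here is non-linear.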
7) Find the following probabilities:
(a) Y is distributed $\chi^2_4$. Find Pr(Y > 9.49).
(b) Y is distributed $t_\infty$. Find Pr(Y > –0.5).
(c) Y is distributed $F_{4,\infty}$. Find Pr(Y < 3.32).
(d) Y is distributed N(500, 10000). Find Pr(Y > 696 or Y < 304).
Answer: (a) 0.05.
(b) 0.6915.
(c) 0.99.
(d) 0.05.
8) In considering the purchase of a certain stock, you attach the following probabilities to possible changes in the
stock price over the next year.
Stock Price Change During
Next Twelve Months (%)       Probability
+15                          0.2
+5                           0.3
0                            0.4
–5                           0.05
–15                          0.05
What is the expected value, the variance, and the standard deviation? Which is the most likely outcome? Sketch
the cumulative distribution function.
Answer: E(Y) = 3.5; $\sigma_Y^2$ = 52.75; $\sigma_Y$ = 7.26; most likely: 0.
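The moments of a discrete distribution follow directly from the probability table. The sketch below recomputes them for the table in this question; the function names `mean` and `variance` are illustrative assumptions.

```python
from math import sqrt

# Outcomes (percentage price change) and probabilities from the table in question 8.
outcomes = [15, 5, 0, -5, -15]
probs = [0.2, 0.3, 0.4, 0.05, 0.05]


def mean(outcomes, probs):
    """E(Y) = sum of y_i * p_i."""
    return sum(y * p for y, p in zip(outcomes, probs))


def variance(outcomes, probs):
    """var(Y) = sum of (y_i - E(Y))^2 * p_i."""
    mu = mean(outcomes, probs)
    return sum((y - mu) ** 2 * p for y, p in zip(outcomes, probs))


mu = mean(outcomes, probs)            # 3.5
var = variance(outcomes, probs)       # 52.75
sd = sqrt(var)                        # about 7.26
most_likely = outcomes[probs.index(max(probs))]  # outcome with the largest probability

print(f"E(Y) = {mu:.2f}, var(Y) = {var:.2f}, sd(Y) = {sd:.2f}, most likely = {most_likely}")
```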
9) You consider visiting Montreal during the break between terms in January. You go to the relevant Web site of
the official tourist office to figure out the type of clothes you should take on the trip. The site lists that the
average high during January is –7° C, with a standard deviation of 4° C. Unfortunately you are more familiar
with Fahrenheit than with Celsius, but find that the two are related by the following linear function:
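$C = \frac{5}{9}(F - 32)$.
Find the mean and standard deviation for the January temperature in Montreal in Fahrenheit.
Answer: Solving for F gives F = 32 + (9/5)C. Using equations (2.29) and (2.30) from the textbook, the mean is
32 + 1.8 × (–7) = 19.4° F and the standard deviation is 1.8 × 4 = 7.2° F.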
10) Two random variables are independently distributed if their joint distribution is the product of their marginal
distributions. It is intuitively easier to understand that two random variables are independently distributed if
all conditional distributions of Y given X are equal. Derive one of the two conditions from the other.
Answer: If all conditional distributions of Y given X are equal, then
Pr(Y = y | X = 1) = Pr(Y = y | X = 2) = ... = Pr(Y = y | X = l).
But if all conditional distributions are equal, then they must also equal the marginal distribution, i.e.,
Pr(Y = y | X = x) = Pr(Y = y).
Given the definition of the conditional distribution of Y given X = x, you then get
Pr(Y = y | X = x) = Pr(Y = y, X = x) / Pr(X = x) = Pr(Y = y),
which gives you the condition
Pr(Y = y, X = x) = Pr(Y = y) Pr(X = x).
11) There are frequently situations where you have information on the conditional distribution of Y given X, but
are interested in the conditional distribution of X given Y. Recalling Pr(Y = y | X = x) = Pr(X = x, Y = y) / Pr(X = x), derive a
relationship between Pr(X = x | Y = y) and Pr(Y = y | X = x). This is called Bayes’ theorem.
Answer: Given Pr(Y = y | X = x) = Pr(X = x, Y = y) / Pr(X = x),
Pr(Y = y | X = x) × Pr(X = x) = Pr(X = x, Y = y);
similarly Pr(X = x | Y = y) = Pr(X = x, Y = y) / Pr(Y = y) and
Pr(X = x | Y = y) × Pr(Y = y) = Pr(X = x, Y = y). Equating the two and solving for Pr(X = x | Y = y) then
results in
Pr(X = x | Y = y) = Pr(Y = y | X = x) × Pr(X = x) / Pr(Y = y).
12) You are at a college of roughly 1,000 students and obtain data from the entire freshman class (250 students) on
height and weight during orientation. You consider this to be a population that you want to describe, rather
than a sample from which you want to infer general relationships in a larger population. Weight ( Y) is
measured in pounds and height (X) is measured in inches. You calculate the following sums:
$\sum_{i=1}^{n} y_i^2 = 94{,}228.8$, $\sum_{i=1}^{n} x_i^2 = 1{,}248.9$, $\sum_{i=1}^{n} x_i y_i = 7{,}625.9$
(small letters refer to deviations from means as in $z_i = Z_i - \bar{Z}$).
(a) Given your general knowledge about human height and weight of a given age, what can you say about the
shape of the two distributions?
(b) What is the correlation coefficient between height and weight here?
Answer: (a) Both distributions are likely to be approximately normal.
(b) 0.703.
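Because the sums are already expressed in deviations from means, the correlation coefficient is the cross-product sum divided by the square root of the product of the two sums of squares; any common 1/n factor cancels. A minimal Python sketch, with the function name `correlation_from_sums` as an illustrative assumption:

```python
from math import sqrt


def correlation_from_sums(sum_xy, sum_xx, sum_yy):
    """Correlation from sums of deviation cross-products and deviation squares."""
    return sum_xy / sqrt(sum_xx * sum_yy)


# Sums given in question 12 (x: height deviations, y: weight deviations).
r = correlation_from_sums(sum_xy=7625.9, sum_xx=1248.9, sum_yy=94228.8)
print(f"corr(X, Y) = {r:.3f}")
```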
13) Use the definition for the conditional distribution of Y given X = x and the marginal distribution of X to derive
the formula for Pr(X = x, Y = y). This is called the multiplication rule. Use it to derive the probability for
drawing two aces randomly from a deck of cards (no joker), where you do not replace the card after the first
draw. Next, generalizing the multiplication rule and assuming independence, find the probability of having
four girls in a family with four children.
Answer: $\frac{4}{52} \times \frac{3}{51} = 0.0045$; $\left(\frac{1}{2}\right)^4 = \frac{1}{16}$ or 0.0625.
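The multiplication rule can be checked with exact fractions. This is a minimal Python sketch using the standard-library `fractions` module; the helper name `draw_all_without_replacement` is an illustrative assumption, and the four-girls case assumes independent births with probability 1/2 each, as in the question.

```python
from fractions import Fraction


def draw_all_without_replacement(successes, total, draws):
    """Probability that every one of `draws` cards is a success, drawing without replacement.

    Applies the multiplication rule: each factor is the conditional probability
    of another success given the successes already drawn.
    """
    prob = Fraction(1)
    for k in range(draws):
        prob *= Fraction(successes - k, total - k)
    return prob


two_aces = draw_all_without_replacement(successes=4, total=52, draws=2)  # 4/52 * 3/51
four_girls = Fraction(1, 2) ** 4  # independence: the probabilities simply multiply

print(f"two aces:   {two_aces} = {float(two_aces):.4f}")
print(f"four girls: {four_girls} = {float(four_girls):.4f}")
```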
14) The systolic blood pressure of females in their 20s is normally distributed with a mean of 120 with a standard
deviation of 9. What is the probability of finding a female with a blood pressure of less than 100? More than
135? Between 105 and 123? You visit the women’s soccer team on campus, and find that the average blood
pressure of the 25 members is 114. Is it likely that this group of women came from the same population?
Answer: Pr(Y < 100) = 0.0131; Pr(Y > 135) = 0.0478; Pr(105 < Y < 123) = 0.5828; Pr($\bar{Y}$ < 114) = Pr(Z < –3.33) = 0.0004.
(The smallest z-value listed in the table in the textbook is –2.99, which generates a probability value of
0.0014.) It is unlikely that this group of women came from the same population.
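The last probability concerns the sample mean of 25 observations, whose standard deviation is 9/√25 = 1.8 rather than 9. The sketch below separates the two distributions; it relies on `statistics.NormalDist` from the Python standard library, and the variable names are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

# Distribution of an individual's blood pressure, as given in question 14.
population = NormalDist(mu=120, sigma=9)

p_below_100 = population.cdf(100)
p_above_135 = 1 - population.cdf(135)
p_105_to_123 = population.cdf(123) - population.cdf(105)

# Distribution of the sample mean of n = 25 independent draws: same mean,
# standard deviation sigma / sqrt(n).
n = 25
sample_mean = NormalDist(mu=120, sigma=9 / sqrt(n))
p_mean_below_114 = sample_mean.cdf(114)

print(f"Pr(Y < 100)       = {p_below_100:.4f}")
print(f"Pr(Y > 135)       = {p_above_135:.4f}")
print(f"Pr(105 < Y < 123) = {p_105_to_123:.4f}")
print(f"Pr(Ybar < 114)    = {p_mean_below_114:.4f}")
```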
15) Show that the correlation coefficient between Y and X is unaffected if you use a linear transformation in both
variables. That is, show that corr(X,Y) = corr(X*, Y*), where X* = a + bX and Y* = c + dY, and where a, b, c, and d
are arbitrary non–zero constants.
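Answer: $\mathrm{corr}(X^*, Y^*) = \frac{\mathrm{cov}(X^*, Y^*)}{\sqrt{\mathrm{var}(X^*)\,\mathrm{var}(Y^*)}} = \frac{bd\,\mathrm{cov}(X, Y)}{\sqrt{b^2\,\mathrm{var}(X)\,d^2\,\mathrm{var}(Y)}} = \mathrm{corr}(X, Y)$.
The last equality holds when b and d have the same sign, since $\sqrt{b^2 d^2} = |bd|$; if their signs differ, the
correlation changes sign but not magnitude.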
16) The textbook formula for the variance of the discrete random variable Y is given as
$\sigma_Y^2 = \sum_{i=1}^{k} (y_i - \mu_Y)^2 p_i$.
Another commonly used formulation is
$\sigma_Y^2 = \sum_{i=1}^{k} y_i^2 p_i - \mu_Y^2$.
Prove that the two formulas are the same.
Answer: $\sigma_Y^2 = \sum_{i=1}^{k} (y_i - \mu_Y)^2 p_i = \sum_{i=1}^{k} (y_i^2 + \mu_Y^2 - 2\mu_Y y_i) p_i = \sum_{i=1}^{k} (y_i^2 p_i + \mu_Y^2 p_i - 2\mu_Y y_i p_i)$.
Moving the summation sign through results in
$\sigma_Y^2 = \sum_{i=1}^{k} y_i^2 p_i + \mu_Y^2 \sum_{i=1}^{k} p_i - 2\mu_Y \sum_{i=1}^{k} y_i p_i$.
But $\sum_{i=1}^{k} p_i = 1$ and $\mu_Y = \sum_{i=1}^{k} y_i p_i$, giving you the second
expression after simplification.
17) The Economic Report of the President gives the following age distribution of the United States population for the
year 2000:
United States Population By Age Group, 2000

Outcome (age category)    Under 5    5-15    16-19    20-24    25-44    45-64    65 and over
Percentage                0.06       0.16    0.06     0.07     0.30     0.22     0.13
Imagine that every person was assigned a unique number between 1 and 275,372,000 (the total population in
2000). If you generated a random number, what would be the probability that you had drawn someone older
than 65 or under 16? Treating the percentages as probabilities, write down the cumulative probability
distribution. What is the probability of drawing someone who is 24 years or younger?
Answer: Pr(Y < 16 or Y > 65) = 0.35;

Outcome (age category)                 Under 5    5-15    16-19    20-24    25-44    45-64    65 and over
Cumulative probability distribution    0.06       0.22    0.28     0.35     0.65     0.87     1.00

Pr(Y ≤ 24) = 0.35.
18) The accompanying table gives the outcomes and probability distribution of the number of times a student
checks her e-mail daily:
Probability of Checking E-Mail

Outcome (number of email checks)    0       1       2       3       4       5       6
Probability distribution            0.05    0.15    0.30    0.25    0.15    0.08    0.02
Sketch the probability distribution. Next, calculate the c.d.f. for the above table. What is the probability of her
checking her e-mail between 1 and 3 times a day? Of checking it more than 3 times a day?
Answer:
Outcome (number of email checks)       0       1       2       3       4       5       6
Cumulative probability distribution    0.05    0.20    0.50    0.75    0.90    0.98    1.00

Pr(1 ≤ Y ≤ 3) = 0.70; Pr(Y > 3) = 0.25.
19) The accompanying table lists the outcomes and the probability distribution for a student renting
videos during the week while on campus.
Video Rentals per Week during Semester

Outcome (number of weekly video rentals)    0       1       2       3       4       5       6
Probability distribution                    0.05    0.55    0.25    0.05    0.07    0.02    0.01
Sketch the probability distribution. Next, calculate the cumulative probability distribution for the above table.
What is the probability of the student renting between 2 and 4 a week? Of less than 3 a week?
Answer: The cumulative probability distribution is given below. The probability of renting between two and four
videos a week is 0.37. The probability of renting less than three a week is 0.85.
Outcome (number of weekly video rentals)    0       1       2       3       4       5       6
Cumulative probability distribution         0.05    0.60    0.85    0.90    0.97    0.99    1.00
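The cumulative distribution is the running sum of the probability distribution, and interval probabilities are sums over the outcomes in the interval. A minimal Python sketch for the video rental table follows; the names `pmf`, `cdf`, and `prob_between` are illustrative assumptions.

```python
from itertools import accumulate

# Outcomes and probabilities from the video rental table in question 19.
rentals = [0, 1, 2, 3, 4, 5, 6]
pmf = [0.05, 0.55, 0.25, 0.05, 0.07, 0.02, 0.01]

# Cumulative distribution: running sum of the probabilities.
cdf = list(accumulate(pmf))


def prob_between(lo, hi):
    """Pr(lo <= Y <= hi), summing the probabilities of the outcomes in the interval."""
    return sum(p for y, p in zip(rentals, pmf) if lo <= y <= hi)


for y, c in zip(rentals, cdf):
    print(f"Pr(Y <= {y}) = {c:.2f}")
print(f"Pr(2 <= Y <= 4) = {prob_between(2, 4):.2f}")
print(f"Pr(Y < 3)       = {prob_between(0, 2):.2f}")
```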
20) The textbook mentioned that the mean of Y, E(Y) is called the first moment of Y, and that the expected value of
the square of Y, E(Y²), is called the second moment of Y, and so on. These are also referred to as moments about
the origin. A related concept is moments about the mean, which are defined as $E[(Y - \mu_Y)^r]$. What do you call
the second moment about the mean? What do you think the third moment, referred to as skewness,
measures? Do you believe that it would be positive or negative for an earnings distribution? What measure of
the third moment around the mean do you get for a normal distribution?
Answer: The second moment about the mean is the variance. Skewness measures the departure from symmetry.
For the typical earnings distribution, it will be positive. For the normal distribution, it will be zero.
21) Explain why the two probabilities are identical for the standard normal distribution: Pr(–1.96 ≤ X ≤ 1.96) and
Pr(–1.96 < X < 1.96).
Answer: For a continuous distribution, the probability of a point is zero.
22) SAT scores in Mathematics are normally distributed with a mean of 500 and a standard deviation of 100. The
formula for the normal distribution is
$f(Y) = \frac{1}{\sigma_Y \sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{Y - \mu_Y}{\sigma_Y}\right)^2}$.
Use the scatter plot option in a standard
spreadsheet program, such as Excel, to plot the Mathematics SAT distribution using this formula. Start by
entering 300 as the first SAT score in the first column (the lowest score you can get in the mathematics section
as long as you fill in your name correctly), and then increment the scores by 10 until you reach 800. In the
second column, use the formula for the normal distribution and calculate f(Y). Then use the scatter plot option,
where you eventually remove markers and substitute these with the solid line option.
Answer:
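The two spreadsheet columns described in the question can be sketched in Python as follows. This only generates the values behind the plot (scores from 300 to 800 in steps of 10, and the normal density with mean 500 and standard deviation 100); the function name `normal_pdf` is an illustrative assumption, and plotting is left to the spreadsheet or any charting tool.

```python
from math import exp, pi, sqrt


def normal_pdf(y, mu, sigma):
    """Normal density f(y) = exp(-0.5 * ((y - mu) / sigma)^2) / (sigma * sqrt(2 * pi))."""
    z = (y - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))


# First column: SAT scores 300, 310, ..., 800. Second column: f(Y) at each score.
scores = list(range(300, 801, 10))
density = [normal_pdf(s, mu=500, sigma=100) for s in scores]

for s, d in zip(scores, density):
    print(f"{s}\t{d:.6f}")
```

The resulting curve is bell-shaped and symmetric around 500, where the density reaches its maximum.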
23) Use a standard spreadsheet program, such as Excel, to find the following probabilities from various
distributions analyzed in the current chapter:
a. If Y is distributed N(1,4), find Pr(Y ≤ 3)
b. If Y is distributed N(3,9), find Pr(Y > 0)
c. If Y is distributed N(50,25), find Pr(40 ≤ Y ≤ 52)
d. If Y is distributed N(5,2), find Pr(6 ≤ Y ≤ 8)
Answer: The answers here are given together with the relevant Excel commands.
a. =NORMDIST(3,1,2,TRUE) = 0.8413
b. =1-NORMDIST(0,3,3,TRUE) = 0.8413
c. =NORMDIST(52,50,5,TRUE)-NORMDIST(40,50,5,TRUE) = 0.6327
d. =NORMDIST(8,5,SQRT(2),TRUE)-NORMDIST(6,5,SQRT(2),TRUE) = 0.2228
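Note that N(μ, σ²) is written with the variance, whereas NORMDIST expects the standard deviation as its third argument, which is why the commands use 2, 3, 5, and SQRT(2). The same convention can be sketched in Python with `statistics.NormalDist`; the wrapper `normal_cdf`, which takes the variance and converts it, is an illustrative assumption.

```python
from math import sqrt
from statistics import NormalDist


def normal_cdf(x, mean, variance):
    """Pr(Y <= x) for Y ~ N(mean, variance); the standard deviation is sqrt(variance)."""
    return NormalDist(mean, sqrt(variance)).cdf(x)


a = normal_cdf(3, 1, 4)                               # Pr(Y <= 3),        Y ~ N(1, 4)
b = 1 - normal_cdf(0, 3, 9)                           # Pr(Y > 0),         Y ~ N(3, 9)
c = normal_cdf(52, 50, 25) - normal_cdf(40, 50, 25)   # Pr(40 <= Y <= 52), Y ~ N(50, 25)
d = normal_cdf(8, 5, 2) - normal_cdf(6, 5, 2)         # Pr(6 <= Y <= 8),   Y ~ N(5, 2)

for label, value in zip("abcd", (a, b, c, d)):
    print(f"{label}. {value:.4f}")
```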
24) Looking at a large CPS data set with over 60,000 observations for the United States and the year 2004, you find
that the average number of years of education is approximately 13.6. However, a surprisingly large number of
individuals (approximately 800) have quite a low value for this variable, namely 6 years or less. You decide to
drop these observations, since none of your relatives or friends have that few years of education. In addition,
you are concerned that if these individuals cannot report the years of education correctly, then the observations
on other variables, such as average hourly earnings, can also not be trusted. As a matter of fact you have found
several of these to be below minimum wages in your state. Discuss if dropping the observations is reasonable.
Answer: While it is always a good idea to check the data carefully before conducting a quantitative analysis, you
should never drop data before carefully thinking about the problem at hand. While it is not plausible to
find many individuals in the U.S. who were raised here with that few years of education, there will be
immigrants in the survey. Average years of education can be quite low in other countries. For example,
Brazil’s average years of schooling is less than 6 years. The point of the exercise is to think hard whether
or not observations are outliers generated by faulty data entry or if there is a reason for observing values
which may appear strange at first.
25) Use a standard spreadsheet program, such as Excel, to find the following probabilities from various
distributions analyzed in the current chapter:
a. If Y is distributed $\chi^2_4$, find Pr(Y ≤ 7.78)
b. If Y is distributed $\chi^2_{10}$, find Pr(Y > 18.31)
c. If Y is distributed $F_{10,\infty}$, find Pr(Y > 1.83)
d. If Y is distributed $t_{15}$, find Pr(Y > 1.75)
e. If Y is distributed $t_{90}$, find Pr(–1.99 ≤ Y ≤ 1.99)
f. If Y is distributed N(0,1), find Pr(–1.99 ≤ Y ≤ 1.99)
g. If Y is distributed $F_{7,4}$, find Pr(Y > 4.12)
h. If Y is distributed $F_{7,120}$, find Pr(Y > 2.79)
Answer: The answers here are given together with the relevant Excel commands.
a. =1-CHIDIST(7.78,4) = 0.90
b. =CHIDIST(18.31,10) = 0.05
c. =FDIST(1.83,10,1000000) = 0.05
d. =TDIST(1.75,15,1) = 0.05
e. =1-TDIST(1.99,90,2) = 0.95
f. =NORMDIST(1.99,0,1,1)-NORMDIST(-1.99,0,1,1) = 0.953
g. =FDIST(4.12,7,4) = 0.10
h. =FDIST(2.79,7,120) = 0.01
Chapter 3 Review of Statistics
3.1 Multiple Choice
1) An estimator is
A) an estimate.
B) a formula that gives an efficient guess of the true population value.
C) a random variable.
D) a nonrandom number.
Answer: C
2) An estimate is
A) efficient if it has the smallest variance possible.
B) a nonrandom number.
C) unbiased if its expected value equals the population value.
D) another word for estimator.
Answer: B
3) An estimator $\hat{\mu}_Y$ of the population value $\mu_Y$ is unbiased if
A) $\hat{\mu}_Y = \mu_Y$.
B) $\bar{Y}$ has the smallest variance of all estimators.
C) $\bar{Y} \xrightarrow{p} \mu_Y$.
D) $E(\hat{\mu}_Y) = \mu_Y$.
Answer: D
4) An estimator $\hat{\mu}_Y$ of the population value $\mu_Y$ is consistent if
A) $\hat{\mu}_Y \xrightarrow{p} \mu_Y$.
B) its mean square error is the smallest possible.
C) Y is normally distributed.
D) $\bar{Y} \xrightarrow{p} 0$.
Answer: A
5) An estimator $\hat{\mu}_Y$ of the population value $\mu_Y$ is more efficient when compared to another estimator $\tilde{\mu}_Y$, if
A) $E(\hat{\mu}_Y) > E(\tilde{\mu}_Y)$.
B) it has a smaller variance.
C) its c.d.f. is flatter than that of the other estimator.
D) both estimators are unbiased, and $\mathrm{var}(\hat{\mu}_Y) < \mathrm{var}(\tilde{\mu}_Y)$.
Answer: D
6) With i.i.d. sampling each of the following is true except
A) $E(\bar{Y}) = \mu_Y$.
B) $\mathrm{var}(\bar{Y}) = \sigma_Y^2 / n$.
C) $E(\bar{Y}) < E(Y)$.
D) $\bar{Y}$ is a random variable.
Answer: C
7) The standard error of $\bar{Y}$, $SE(\bar{Y}) = \hat{\sigma}_{\bar{Y}}$, is given by the following formula:
A) $\frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})^2$.
B) $\frac{S_Y^2}{n}$.
C) $S_Y$.
D) $\frac{S_Y}{\sqrt{n}}$.
Answer: D
8) The critical value of a two-sided t-test computed from a large sample
A) is 1.64 if the significance level of the test is 5%.
B) cannot be calculated unless you know the degrees of freedom.
C) is 1.96 if the significance level of the test is 5%.
D) is the same as the p-value.
Answer: C
9) A type I error is
A) always the same as (1-type II) error.
B) the error you make when rejecting the null hypothesis when it is true.
C) the error you make when rejecting the alternative hypothesis when it is true.
D) always 5%.
Answer: B
10) A type II error
A) is typically smaller than the type I error.
B) is the error you make when choosing type II or type I.
C) is the error you make when not rejecting the null hypothesis when it is false.
D) cannot be calculated when the alternative hypothesis contains an "=".
Answer: C
11) The size of the test
A) is the probability of committing a type I error.
B) is the same as the sample size.
C) is always equal to (1-the power of test).
D) can be greater than 1 in extreme examples.
Answer: A
12) The power of the test is
A) dependent on whether you calculate a t or a t² statistic.
B) one minus the probability of committing a type I error.
C) a subjective view taken by the econometrician dependent on the situation.
D) one minus the probability of committing a type II error.
Answer: D
13) When you are testing a hypothesis against a two-sided alternative, then the alternative is written as
A) $E(Y) > \mu_{Y,0}$.
B) $E(Y) = \mu_{Y,0}$.
C) $\bar{Y} \neq \mu_{Y,0}$.
D) $E(Y) \neq \mu_{Y,0}$.
Answer: D
14) A scatterplot
A) shows how Y and X are related when their relationship is scattered all over the place.
B) relates the covariance of X and Y to the correlation coefficient.
C) is a plot of n observations on Xi and Yi, where each observation is represented by the point (Xi, Yi).
D) shows n observations of Y over time.
Answer: C
15) The following types of statistical inference are used throughout econometrics, with the exception of
A) confidence intervals.
B) hypothesis testing.
C) calibration.
D) estimation.
Answer: C
16) Among all unbiased estimators that are weighted averages of $Y_1, \ldots, Y_n$, $\bar{Y}$ is
A) the only consistent estimator of $\mu_Y$.
B) the most efficient estimator of $\mu_Y$.
C) a number which, by definition, cannot have a variance.
D) the most unbiased estimator of $\mu_Y$.
Answer: B
17) To derive the least squares estimator of $\mu_Y$, you find the estimator m which minimizes
A) $\sum_{i=1}^{n} (Y_i - m)^2$.
B) $\sum_{i=1}^{n} (Y_i - m)$.
C) $\sum_{i=1}^{n} m Y_i^2$.
D) $\sum_{i=1}^{n} (Y_i - m)$.
Answer: A
18) If the null hypothesis states $H_0: E(Y) = \mu_{Y,0}$, then a two-sided alternative hypothesis is
A) $H_1: E(Y) \neq \mu_{Y,0}$.
B) $H_1: E(Y) \approx \mu_{Y,0}$.
C) $H_1: \mu_Y < \mu_{Y,0}$.
D) $H_1: E(Y) > \mu_{Y,0}$.
Answer: A
19) The p-value is defined as follows:
A) p = 0.05.
B) $\Pr_{H_0}[\,|\bar{Y} - \mu_{Y,0}| > |\bar{Y}^{act} - \mu_{Y,0}|\,]$.
C) Pr(z > 1.96).
D) $\Pr_{H_0}[\,|\bar{Y} - \mu_{Y,0}| < |\bar{Y}^{act} - \mu_{Y,0}|\,]$.
Answer: B
20) A large p-value implies
A) rejection of the null hypothesis.
B) a large t-statistic.
C) a large $\bar{Y}^{act}$.
D) that the observed value $\bar{Y}^{act}$ is consistent with the null hypothesis.
Answer: D
21) The formula for the sample variance is
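A) $S_Y^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})$.
B) $S_Y^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2$.
C) $S_Y^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \mu_Y)^2$.
D) $S_Y^2 = \frac{1}{n-1} \sum_{i=1}^{n-1} (Y_i - \bar{Y})^2$.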
Answer: B
22) Degrees of freedom
A) in the context of the sample variance formula means that estimating the mean uses up some of the
information in the data.
B) is something that certain undergraduate majors at your university/college other than economics seem to
have an amount of.
C) are (n-2) when replacing the population mean by the sample mean.
D) ensure that $S_Y^2 = \sigma_Y^2$.
Answer: A
23) The t-statistic is defined as follows:
A) $t = \frac{\bar{Y} - \mu_{Y,0}}{\sigma_Y^2 / n}$.
B) $t = \frac{\bar{Y} - \mu_{Y,0}}{SE(\bar{Y})}$.
C) $t = \frac{(\bar{Y} - \mu_{Y,0})^2}{SE(\bar{Y})}$.
D) 1.96.
Answer: B
24) The power of the test
A) is the probability that the test actually incorrectly rejects the null hypothesis when the null is true.
B) depends on whether you use $\bar{Y}$ or $\bar{Y}^2$ for the t-statistic.
C) is one minus the size of the test.
D) is the probability that the test correctly rejects the null when the alternative is true.
Answer: D
25) The sample covariance can be calculated in any of the following ways, with the exception of:
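A) $\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$.
B) $\frac{1}{n-1} \sum_{i=1}^{n} X_i Y_i - \frac{n}{n-1} \bar{X}\bar{Y}$.
C) $\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$.
D) $r_{XY} S_X S_Y$, where $r_{XY}$ is the correlation coefficient.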
Answer: C
26) When the sample size n is large, the 90% confidence interval for $\mu_Y$ is
A) $\bar{Y} \pm 1.96\,SE(\bar{Y})$.
B) $\bar{Y} \pm 1.64\,SE(\bar{Y})$.
C) $\bar{Y} \pm 1.64\,\sigma_Y$.
D) $\bar{Y} \pm 1.96$.
Answer: B
27) The standard error for the difference in means of two random variables M and W, when the two population
variances are different, is
A) $\frac{S_M^2 + S_W^2}{n_M + n_W}$.
B) $\frac{S_M}{n_M} + \frac{S_W}{n_W}$.
C) $\frac{1}{2}\left(\frac{S_M^2}{n_M} + \frac{S_W^2}{n_W}\right)$.
D) $\sqrt{\frac{S_M^2}{n_M} + \frac{S_W^2}{n_W}}$.
Answer: D
28) The t-statistic has the following distribution:
A) standard normal distribution for n < 15
B) Student t distribution with n–1 degrees of freedom regardless of the distribution of the Y.
C) Student t distribution with n–1 degrees of freedom if the Y is normally distributed.
D) a standard normal distribution if the sample standard deviation goes to zero.
Answer: C
29) The following statement about the sample correlation coefficient is true.
A) $-1 \le r_{XY} \le 1$.
B) $r_{XY}^2 \xrightarrow{p} \mathrm{corr}(X_i, Y_i)$.
C) $r_{XY} < 1$.
D) $r_{XY} = \frac{S_{XY}^2}{S_X^2 S_Y^2}$.
Answer: A
30) The correlation coefficient
A) lies between zero and one.
B) is a measure of linear association.
C) is close to one if X causes Y.
D) takes on a high value if you have a strong nonlinear relationship.
Answer: B
31) When testing for differences of means, the t-statistic $t = \frac{\bar{Y}_m - \bar{Y}_w}{SE(\bar{Y}_m - \bar{Y}_w)}$, where
$SE(\bar{Y}_m - \bar{Y}_w) = \sqrt{\frac{s_m^2}{n_m} + \frac{s_w^2}{n_w}}$, has
A) a student t distribution if the population distribution of Y is not normal
B) a student t distribution if the population distribution of Y is normal
C) a normal distribution even in small samples
D) cannot be computed unless nw=nm
Answer: B
32) When testing for differences of means, you can base statistical inference on the
A) Student t distribution in general
B) normal distribution regardless of sample size
C) Student t distribution if the underlying population distribution of Y is normal, the two groups have the
same variances, and you use the pooled standard error formula
D) Chi-squared distribution with (nw + nm - 2) degrees of freedom
Answer: C
33) Assume that you have 125 observations on the height ( H) and weight (W) of your peers in college. Let
sHW = 68, sH = 3.5, sW = 29. The sample correlation coefficient is
A) 1.22
B) 0.50
C) 0.67
D) Cannot be computed since males and females have not been separated out.
Answer: C
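The arithmetic behind this answer can be written out as a short worked equation, using the standard definition of the sample correlation coefficient as the sample covariance divided by the product of the two sample standard deviations:

```latex
r_{HW} = \frac{s_{HW}}{s_H \, s_W} = \frac{68}{3.5 \times 29} = \frac{68}{101.5} \approx 0.67
```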
34) You have collected data on the average weekly amount of studying time ( T) and grades (G) from the peers at
your college. Changing the measurement from minutes into hours has the following effect on the correlation
coefficient:
A) decreases the rTG by dividing the original correlation coefficient by 60
B) results in a higher rTG
C) cannot be computed since some students study less than an hour per week
D) does not change the rTG
Answer: D
35) A low correlation coefficient implies that
A) the line always has a flat slope
B) in the scatterplot, the points fall quite far away from the line
C) the two variables are unrelated
D) you should use a tighter scale of the vertical and horizontal axis to bring the observations closer to the line
Answer: B
3.2 Essays and Longer Questions
1) Think of at least nine examples, three of each, that display a positive, negative, or no correlation between two
economic variables. In each of the positive and negative examples, indicate whether or not you expect the
correlation to be strong or weak.
Answer: Answers will vary by student. Students frequently bring up the following correlations. Positive
correlations: earnings and education (hopefully strong), consumption and personal disposable income
(strong), per capita income and investment-output ratio or saving rate (strong); negative correlation:
Okun’s Law (strong), income velocity and interest rates (strong), the Phillips curve (strong); no
correlation: productivity growth and initial level of per capita income for all countries of the world
(beta-convergence regressions), consumption and the (real) interest rate, employment and real wages.
2) Adult males are taller, on average, than adult females. Visiting two recent American Youth Soccer Organization
(AYSO) under 12 year old (U12) soccer matches on a Saturday, you do not observe an obvious difference in the
height of boys and girls of that age. You suggest to your little sister that she collect data on height and gender
of children in 4th to 6th grade as part of her science project. The accompanying table shows her findings.
Height of Young Boys and Girls, Grades 4-6, in inches

         Mean (\bar{Y})   Std. dev. (s)   n
Boys     57.8             3.9             55
Girls    58.4             4.2             57
(a) Let your null hypothesis be that there is no difference in the height of females and males at this age level.
Specify the alternative hypothesis.
(b) Find the difference in height and the standard error of the difference.
(c) Generate a 95% confidence interval for the difference in height.
(d) Calculate the t-statistic for comparing the two means. Is the difference statistically significant at the 1%
level? Which critical value did you use? Why would this number be smaller if you had assumed a one-sided
alternative hypothesis? What is the intuition behind this?
Answer: (a) \( H_0: \mu_{Boys} - \mu_{Girls} = 0 \) vs. \( H_1: \mu_{Boys} - \mu_{Girls} \neq 0 \).
(b) \( \bar{Y}_{Boys} - \bar{Y}_{Girls} = -0.6 \), \( SE(\bar{Y}_{Boys} - \bar{Y}_{Girls}) = \sqrt{\frac{3.9^2}{55} + \frac{4.2^2}{57}} = 0.77 \).
(c) \( -0.6 \pm 1.96 \times 0.77 = (-2.11, 0.91) \).
(d) \( t = -0.78 \), so \( |t| < 2.58 \), which is the critical value at the 1% level. Hence you cannot reject the null
hypothesis. The critical value for the one-sided hypothesis would have been 2.33. Assuming a
one-sided hypothesis implies that you have some information about the problem at hand, and, as a
result, can be more easily convinced than if you had no prior expectation.
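The calculations in (b) through (d) can be checked with a short script. This is only a sketch: the function name `diff_of_means` and its signature are illustrative, not from the textbook. It uses the unpooled standard error and the large-sample critical value 1.96. Because it carries the unrounded standard error, the interval endpoints can differ from the hand calculation in the second decimal.

```python
from math import sqrt

def diff_of_means(mean_a, s_a, n_a, mean_b, s_b, n_b, z=1.96):
    """Return (difference, standard error, t-statistic, confidence interval)."""
    diff = mean_a - mean_b
    se = sqrt(s_a**2 / n_a + s_b**2 / n_b)   # unpooled standard error
    ci = (diff - z * se, diff + z * se)      # large-sample confidence interval
    return diff, se, diff / se, ci

# Boys vs. girls, using the figures from the table in the question
diff, se, t, ci = diff_of_means(57.8, 3.9, 55, 58.4, 4.2, 57)
print(round(diff, 2), round(se, 2), round(t, 2), [round(x, 2) for x in ci])
```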
3) Math SAT scores (Y) are normally distributed with a mean of 500 and a standard deviation of 100. An evening
school advertises that it can improve students’ scores by roughly a third of a standard deviation, or 30 points, if
they attend a course which runs over several weeks. (A similar claim is made for attending a verbal SAT
course.) The statistician for a consumer protection agency suspects that the courses are not effective. She views
the situation as follows: \( H_0: \mu_Y = 500 \) vs. \( H_1: \mu_Y = 530 \).
(a) Sketch the two distributions under the null hypothesis and the alternative hypothesis.
(b) The consumer protection agency wants to evaluate this claim by sending 50 students to attend classes. One
of the students becomes sick during the course and drops out. What is the distribution of the average score of
the remaining 49 students under the null, and under the alternative hypothesis?
(c) Assume that after graduating from the course, the 49 participants take the SAT test and score an average of
520. Is this convincing evidence that the school has fallen short of its claim? What is the p-value for such a score
under the null hypothesis?
(d) What would be the critical value under the null hypothesis if the size of your test were 5%?
(e) Given this critical value, what is the power of the test? What options does the statistician have for increasing
the power in this situation?
Answer: (a)
(b) \( \bar{Y} \) of the 49 participants is normally distributed, with a mean of 500 and a standard deviation of
14.286 under the null hypothesis. Under the alternative hypothesis, it is normally distributed with a
mean of 530 and a standard deviation of 14.286.
(c) It is possible that the consumer protection agency had chosen a group of 49 students whose average
score would have been 490 without attending the course. The crucial question is how likely it is that 49
students, chosen randomly from a population with a mean of 500 and a standard deviation of 100, will
score an average of 520. The p-value for this score is 0.081, meaning that if the agency rejected the null
hypothesis based on this evidence, it would make a mistake, on average, roughly 1 out of 12 times.
Hence the average score of 520 would allow rejection of the null hypothesis that the school has had no
effect on the SAT score of students at the 10% level.
(d) The critical value would be 523.
(e) \( \Pr(\bar{Y} < 523 \mid H_1 \text{ is true}) = 0.312 \). Hence the power of the test is 0.688. She could increase the power by
increasing the size of the test, i.e., by accepting a higher probability of a Type I error. Alternatively, she
could try to convince the agency to hire more test subjects, i.e., she could increase the sample size.
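The numbers in (b) through (e) can be reproduced with the standard library's `statistics.NormalDist`. This is a minimal sketch with illustrative variable names. It keeps the unrounded critical value (about 523.5), so the power comes out near 0.68 rather than the 0.688 obtained from the rounded value 523.

```python
from math import sqrt
from statistics import NormalDist

mu0, mu1, sigma, n = 500, 530, 100, 49
se = sigma / sqrt(n)                 # standard deviation of the sample mean
null = NormalDist(mu0, se)           # distribution of the average under H0
alt = NormalDist(mu1, se)            # distribution of the average under H1

p_value = 1 - null.cdf(520)          # one-sided p-value for an average of 520
critical = null.inv_cdf(0.95)        # reject H0 above this value (size 5%)
power = 1 - alt.cdf(critical)        # Pr(reject H0 | H1 is true)
print(round(se, 3), round(p_value, 3), round(critical, 1), round(power, 3))
```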
4) Your packaging company fills various types of flour into bags. Recently there have been complaints from one
chain of stores: a customer returned one opened 5 pound bag which weighed significantly less than the label
indicated. You view the weight of the bag as a random variable which is normally distributed with a mean of 5
pounds, and, after studying the machine specifications, a standard deviation of 0.05 pounds.
(a) You take a sample of 20 bags and weigh them. Sketch below what the average pattern of individual weights
might look like. Let the horizontal axis indicate the sampled bag number (1, 2, …, 20). On the vertical axis,
mark the expected value of the weight under the null hypothesis, and two (≈ 1.96) standard deviations above
and below the expected value. Draw a line through the graph for \( E(Y) + 2\sigma_Y \), \( E(Y) \), and \( E(Y) - 2\sigma_Y \). How many
of the bags in a sample of 20 will you expect to weigh either less than 4.9 pounds or more than 5.1 pounds?
(b) You sample 25 bags of flour and calculate the average weight. What is the distribution of the average
weight of these 25 bags? Repeating the same exercise 20 times, sketch what the distribution of the average
weights would look like in a graph similar to the one you drew in (a), where you have adjusted the standard
error of \( \bar{Y} \) accordingly.
(c) For each of the twenty observations in (b) a 95% confidence interval is constructed. Draw these confidence
intervals, using the same graph as in (b). How many of these 20 confidence intervals would you expect to
contain 5 pounds under the null hypothesis?
Answer: (a) On average, there should be one bag in every sample of 20 which weighs less than 4.9 pounds or
more than 5.1 pounds.
(b) The average weight of 25 bags will be normally distributed, with a mean of 5 pounds and a standard
deviation of 0.01 pounds. (Same graph as in (a), but with the following lower and upper bounds.)
(c) You would expect 19 of the 20 confidence intervals to contain 5 pounds.
5) Assume that two presidential candidates, call them Bush and Gore, receive 50% of the votes in the population.
You can model this situation as a Bernoulli trial, where Y is a random variable with success probability
Pr(Y = 1) = p, and where Y = 1 if a person votes for Bush and Y = 0 otherwise. Furthermore, let \( \hat{p} \) be the fraction of
successes (1s) in a sample, which is distributed \( N\left(p, \frac{p(1-p)}{n}\right) \) in reasonably large samples, say for \( n \geq 40 \).
(a) Given your knowledge about the population, find the probability that in a random sample of 40, Bush
would receive a share of 40% or less.
(b) How would this situation change with a random sample of 100?
(c) Given your answers in (a) and (b), would you be comfortable to predict what the voting intentions for the
entire population are if you did not know p but had polled 10,000 individuals at random and calculated \( \hat{p} \)?
Explain.
(d) This result seems to hold whether you poll 10,000 people at random in the Netherlands or the United States,
where the former has a population of less than 20 million people, while the United States is 15 times as
populous. Why does the population size not come into play?
Answer: (a) \( \Pr(\hat{p} < 0.40) = \Pr\left(Z < \frac{0.40 - 0.50}{\sqrt{0.25/40}}\right) = \Pr(Z < -1.26) \approx 0.104 \). In roughly every 10th sample of this size,
Bush would receive a vote of less than 40%, although in truth, his share is 50%.
(b) \( \Pr(\hat{p} < 0.40) = \Pr\left(Z < \frac{0.40 - 0.50}{\sqrt{0.25/100}}\right) = \Pr(Z < -2.00) \approx 0.023 \). With this sample size, you would expect
this to happen only in roughly every 50th sample.
(c) The answers in (a) and (b) suggest that for even moderate increases in the sample size, the estimator
does not vary too much from the population mean. Polling 10,000 individuals, the probability of finding
a \( \hat{p} \) of 0.48 or less, for example, would be 0.00003. Unless the election was extremely close, which the 2000
election was, polls are quite accurate even for sample sizes of 2,500.
(d) The distribution of sample means shrinks very quickly depending on the sample size, not the
population size. Although at first this does not seem intuitive, the standard error of an estimator is a
value which indicates by how much the estimator varies around the population value. For large sample
sizes, the sample mean typically is very close to the population mean.
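The three probabilities quoted in (a) through (c) follow from the normal approximation to the sampling distribution of the sample share. The sketch below uses a hypothetical helper, `prob_share_below`, and the unrounded z-values, so the first probability comes out slightly below the 0.104 obtained with z rounded to -1.26.

```python
from math import sqrt
from statistics import NormalDist

def prob_share_below(threshold, p, n):
    """Normal approximation to Pr(p_hat < threshold) in a sample of size n."""
    return NormalDist(p, sqrt(p * (1 - p) / n)).cdf(threshold)

# (sample size, threshold) pairs taken from parts (a), (b), and (c)
for n, threshold in [(40, 0.40), (100, 0.40), (10_000, 0.48)]:
    print(n, threshold, prob_share_below(threshold, 0.5, n))
```

Note that the population size never enters the function: only p and n matter, which is the point made in (d).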
6) You have collected weekly earnings and age data from a sub-sample of 1,744 individuals using the Current
Population Survey in a given year.
(a) Given the overall mean of $434.49 and a standard deviation of $294.67, construct a 99% confidence interval
for average earnings in the entire population. State the meaning of this interval in words, rather than just in
numbers. If you constructed a 90% confidence interval instead, would it be smaller or larger? What is the
intuition?
(b) When dividing your sample into people 45 years and older, and younger than 45, the information shown in
the table is found.
Age Category   Average Earnings (\bar{Y})   Standard Deviation (s_Y)   N
Age ≥ 45       $488.87                      $328.64                    507
Age < 45       $412.20                      $276.63                    1237
Test whether or not the difference in average earnings is statistically significant. Given your knowledge of
age-earning profiles, does this result make sense?
Answer: (a) The confidence interval for mean weekly earnings is \( 434.49 \pm 2.58 \times \frac{294.67}{\sqrt{1744}} = 434.49 \pm 18.20 = (416.29, 452.69) \).
Based on the sample at hand, the best guess for the population mean is $434.49. However,
because of random sampling error, this guess is likely to be wrong. Instead, the interval estimate for the
average earnings lies between $416.29 and $452.69. Committing to such an interval repeatedly implies
that the resulting statement is incorrect 1 out of 100 times. For a 90% confidence interval, the only change
in the calculation of the confidence interval is to replace 2.58 by 1.64. Hence the confidence interval is
smaller. A smaller interval implies, given the same average earnings and the standard deviation, that the
statement will be false more often. The larger the confidence interval, the more likely it is to contain the
population value.
(b) Assuming unequal population variances, \( t = \frac{488.87 - 412.20}{\sqrt{\frac{328.64^2}{507} + \frac{276.63^2}{1237}}} = 4.62 \), which is statistically
significant at conventional levels whether you use a two-sided or one-sided alternative. Hence the null
hypothesis of equal average earnings in the two groups is rejected. Age-earning profiles typically take
on an inverted U-shape. Maximum earnings occur in the 40s, depending on some other factors such as
years of education, which are not considered here. Hence it is not clear if the alternative hypothesis
should be one-sided or two-sided. In such a situation, it is best to assume a two-sided alternative
hypothesis.
7) A manufacturer claims that a certain brand of VCR player has an average life expectancy of 5 years and 6
months with a standard deviation of 1 year and 6 months. Assume that the life expectancy is normally
distributed.
(a) Selecting one VCR player from this brand at random, calculate the probability of its life expectancy
exceeding 7 years.
(b) The Critical Consumer magazine decides to test fifty VCRs of this brand. The average life in this sample is 6
years and the sample standard deviation is 2 years. Calculate a 99% confidence interval for the average life.
(c) How many more VCRs would the magazine have to test in order to halve the width of the confidence
interval?
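Answer: (a) Pr(Y > 7) = Pr(Z > 1) = 0.1587.
(b) \( 6 \pm 2.58 \times \frac{2}{\sqrt{50}} = 6 \pm 0.73 = (5.27, 6.73) \).
(c) \( \frac{1}{2} \times \left(2.58 \times \frac{2}{\sqrt{50}}\right) = 2.58 \times \frac{1}{2} \times \frac{2}{\sqrt{50}} = 2.58 \times \frac{2}{\sqrt{4 \times 50}} \), or n = 200. The magazine would therefore have to
test 150 more VCRs.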
8) U.S. News and World Report ranks colleges and universities annually. You randomly sample 100 of the national
universities and liberal arts colleges from the year 2000 issue. The average cost, which includes tuition, fees,
and room and board, is $23,571.49 with a standard deviation of $7,015.52.
(a) Based on this sample, construct a 95% confidence interval of the average cost of attending a
university/college in the United States.
(b) Cost varies by quite a bit. One of the reasons may be that some universities/colleges have a better reputation
than others. U.S. News and World Report tries to measure this factor by asking university presidents and chief
academic officers about the reputation of institutions. The ranking is from 1 (“marginal”) to 5 (“distinguished”).
You decide to split the sample according to whether the academic institution has a reputation of greater than
3.5 or not. For comparison, in 2000, Caltech had a reputation ranking of 4.7, Smith College had 4.5, and Auburn
University had 3.1. This gives you the statistics shown in the accompanying table.
Reputation Category   Average Cost (\bar{Y})   Standard deviation of Cost (s_Y)   N
Ranking > 3.5         $29,311.31               $5,649.21                          29
Ranking ≤ 3.5         $21,227.06               $6,133.38                          71
Test the hypothesis that the average cost for all universities/colleges is the same independent of the reputation.
What alternative hypothesis did you use?
(c) What other factors should you consider before making a decision based on the data in (b)?
Answer: (a) \( 23,571.49 \pm 1.96 \times \frac{7,015.52}{\sqrt{100}} = 23,571.49 \pm 1,375.04 = (22,196.45, 24,946.53) \).
(b) Assuming unequal population variances, \( t = \frac{29,311.31 - 21,227.06}{\sqrt{\frac{5,649.21^2}{29} + \frac{6,133.38^2}{71}}} = 6.33 \), which is statistically
significant whether or not you use a one-sided or two-sided hypothesis test. Your prior expectation is
that academic institutions with a higher reputation will charge more for attending, and hence a
one-sided alternative would have been appropriate here.
(c) There may be other variables which potentially have an effect on the cost of attending the academic
institution. Some of these factors might be whether or not the college/university is private or public, its
size, whether or not it has a religious affiliation, etc. It is only after controlling for these factors that the
“pure” relationship between reputation and cost can be identified.
9) The development office and the registrar have provided you with anonymous matches of starting salaries and
GPAs for 108 graduating economics majors. Your sample contains a variety of jobs, from church pastor to
stockbroker.
(a) The average starting salary for the 108 students was $38,644.86 with a standard deviation of $7,541.40.
Construct a 95% confidence interval for the starting salary of all economics majors at your university/college.
(b) A similar sample for psychology majors indicates a significantly lower starting salary. Given that these
students had the same number of years of education, does this indicate discrimination in the job market against
psychology majors?
(c) You wonder if it pays (no pun intended) to get good grades by calculating the average salary for economics
majors who graduated with a cumulative GPA of B+ or better, and those who had a B or worse. The data is as
shown in the accompanying table.
Cumulative GPA   Average Earnings (\bar{Y})   Standard deviation (s_Y)   n
B+ or better     $39,915.25                   $8,330.21                  59
B or worse       $37,083.33                   $6,174.86                  49
Conduct a t-test for the hypothesis that the two starting salaries are the same in the population. Given that this
data was collected in 1999, do you think that your results will hold for other years, such as 2002?
Answer: (a) \( 38,644.86 \pm 1.96 \times \frac{7,541.40}{\sqrt{108}} = 38,644.86 \pm 1,422.32 = (37,222.54, 40,067.18) \).
(b) It suggests that the market values certain qualifications more highly than others. Comparing means
and identifying that one is significantly lower than others does not indicate discrimination.
(c) Assuming unequal population variances, \( t = \frac{39,915.25 - 37,083.33}{\sqrt{\frac{8,330.21^2}{59} + \frac{6,174.86^2}{49}}} = 2.03 \). The critical value for a
one-sided test is 1.64, for a two-sided test 1.96, both at the 5% level. Hence you can reject the null
hypothesis that the two starting salaries are equal. Presumably you would have chosen as an alternative
that better students receive better starting salaries, so that this becomes your new working hypothesis.
1999 was a boom year. If better students receive better starting offers during a boom year, when the
labor market for graduates is tight, then it is very likely that they receive a better offer during a recession
year, assuming that they receive an offer at all.
10) During the last few days before a presidential election, there is a frenzy of voting intention surveys. On a given
day, quite often there are conflicting results from three major polls.
(a) Think of each of these polls as reporting the fraction of successes (1s) of a Bernoulli random variable Y,
where the probability of success is Pr(Y = 1) = p. Let \( \hat{p} \) be the fraction of successes in the sample and assume that
this estimator is normally distributed with a mean of p and a variance of \( \frac{p(1-p)}{n} \). Why are the results for all
polls different, even though they are taken on the same day?
(b) Given the estimator of the variance of \( \hat{p} \), \( \frac{\hat{p}(1-\hat{p})}{n} \), construct a 95% confidence interval for p. For which value
of \( \hat{p} \) is the standard deviation the largest? What value does it take at this maximum?
(c) When the results from the polls are reported, you are told, typically in the small print, that the “margin of
error” is plus or minus two percentage points. Using the approximation of 1.96 ≈ 2, and assuming,
“conservatively,” the maximum standard deviation derived in (b), what sample size is required to add and
subtract (“margin of error”) two percentage points from the point estimate?
(d) What sample size would you need to halve the margin of error?
Answer: (a) Since all polls are only samples, there is random sampling error. As a result, \( \hat{p} \) will differ from sample
to sample, and most likely also from p.
(b) \( \hat{p} \pm 1.96 \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \). A bit of thought or calculus will show that the standard deviation will be largest
for \( \hat{p} = 0.5 \), in which case it becomes \( \frac{0.5}{\sqrt{n}} \).
(c) n = 2,500.
(d) n = 10,000.
11) At the Stock and Watson (http://www.pearsonhighered.com/stock_watson) website go to Student Resources
and select the option “Datasets for Replicating Empirical Results.” Then select the “CPS Data Used in Chapter 8”
(ch8_cps.xls) and open it in Excel. This is a rather large data set to work with, so just copy the first 500
observations into a new Worksheet (these are rows 1 to 501).
In the newly created Worksheet, mark A1 to A501, then select the Data tab and click on “sort.” A dialog box
will open. First select “Add level” from one of the options on the left. Then select “sort by” and choose
“Northeast” and “Largest to Smallest.” Repeat the same for the “South” as a second option. Finally press “ok.”
This should give you 209 observations for average hourly earnings for the Northeast region, followed by 205
observations for the South.
a. For each of the 209 average hourly earnings observations for the Northeast region and separately for
the South region, calculate the mean and sample standard deviation.
b. Use the appropriate test to determine whether or not average hourly earnings in the Northeast region
are the same as in the South region.
c. Find the 99%, 95%, and 90% confidence intervals (corresponding to the 1%, 5%, and 10% significance
levels) for the difference between the two population means. Is your conclusion consistent with the test
in part (b)?
d. In all three cases of using the confidence interval in (c), the power of the test is quite low (5%). What
can you do to increase the power of the test without reducing the size of the test?
Answer: a. \( \bar{Y}_{Northeast} \) = $21.12; \( \bar{Y}_{South} \) = $18.80; \( s_{Northeast} \) = $11.86; \( s_{South} \) = $11.18
b. \( t = \frac{21.12 - 18.80}{\sqrt{\frac{11.86^2}{209} + \frac{11.18^2}{205}}} = 2.05 \). You cannot reject the null hypothesis of equal average earnings in the two
regions at the 1% level, but you are able to reject it at the 10% and 5% significance levels.
c. For the 10% significance level, the confidence interval is ($0.46, $4.18). For the 5% significance
level, the interval becomes larger and is ($0.10, $4.54). In either one of the cases you can reject
the null hypothesis, since $0 is not contained in the confidence interval. It is only for the 1%
significance level that the null hypothesis cannot be rejected. In that case, the confidence
interval is (-$0.60, $5.24).
d. You would have to increase the sample size, since that would shrink the standard error (assuming
that the sample mean and variance will not change).
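The three intervals in (c) can be generated in one loop. The sketch below is illustrative only. It takes the South mean to be $18.80, the value that appears in the t-statistic, and assigns n = 209 to the Northeast and n = 205 to the South as stated in the question. Critical values come from the standard normal distribution.

```python
from math import sqrt
from statistics import NormalDist

mean_ne, s_ne, n_ne = 21.12, 11.86, 209   # Northeast
mean_s, s_s, n_s = 18.80, 11.18, 205      # South (mean as used in the t-statistic)

diff = mean_ne - mean_s
se = sqrt(s_ne**2 / n_ne + s_s**2 / n_s)  # unpooled standard error
t = diff / se

intervals = {}
for level in (0.10, 0.05, 0.01):
    z = NormalDist().inv_cdf(1 - level / 2)        # two-sided critical value
    intervals[level] = (diff - z * se, diff + z * se)
    print(level, round(z, 2), [round(x, 2) for x in intervals[level]])
```

Only the interval for the 1% level includes zero, which matches the test result in (b).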
3.3 Mathematical and Graphical Problems
1) Your textbook defined the covariance between X and Y as follows:
\[ \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) \]
Prove that this is identical to the following alternative specification:
\[ \frac{1}{n-1} \sum_{i=1}^{n} X_i Y_i - \frac{n}{n-1} \bar{X}\bar{Y} \]
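Answer:
\[ \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = \frac{1}{n-1} \sum_{i=1}^{n} (X_i Y_i - \bar{X} Y_i - \bar{Y} X_i + \bar{Y}\bar{X}) \]
\[ = \frac{1}{n-1} \left( \sum_{i=1}^{n} X_i Y_i - \bar{X} \sum_{i=1}^{n} Y_i - \bar{Y} \sum_{i=1}^{n} X_i + n\bar{Y}\bar{X} \right) = \frac{1}{n-1} \left( \sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y} - n\bar{Y}\bar{X} + n\bar{Y}\bar{X} \right) \]
\[ = \frac{1}{n-1} \sum_{i=1}^{n} X_i Y_i - \frac{n}{n-1} \bar{X}\bar{Y}. \]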
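A numerical spot check of the identity can complement the proof. The sketch below implements both expressions on a small made-up data set. The function names and the data are purely illustrative, and agreement on one sample is of course not a proof.

```python
def cov_definition(x, y):
    """Sample covariance from the definition (deviations from the means)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

def cov_shortcut(x, y):
    """Sample covariance from the alternative specification."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum(xi * yi for xi, yi in zip(x, y)) / (n - 1) - n / (n - 1) * x_bar * y_bar

x = [1.0, 2.0, 4.0, 7.0]   # made-up data
y = [3.0, 1.0, 6.0, 8.0]
print(cov_definition(x, y), cov_shortcut(x, y))
```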
2) For each of the accompanying scatterplots for several pairs of variables, indicate whether you expect a positive
or negative correlation coefficient between the two variables, and the likely magnitude of it (you can use a
small range).
[Scatterplots (a) through (d) are not reproduced.]
Answer: (a) Positive correlation. The actual correlation coefficient is 0.46.
(b) No relationship. The actual correlation coefficient is 0.00007.
(c) Negative relationship. The actual correlation coefficient is –0.70.
(d) Nonlinear (inverted U) relationship. The actual correlation coefficient is 0.23.
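3) Your textbook defines the correlation coefficient as follows:
\[ r = \frac{\frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})(X_i - \bar{X})}{\sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2} \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2}} \]
Another textbook gives an alternative formula:
\[ r = \frac{n \sum_{i=1}^{n} Y_i X_i - \left(\sum_{i=1}^{n} Y_i\right)\left(\sum_{i=1}^{n} X_i\right)}{\sqrt{n \sum_{i=1}^{n} Y_i^2 - \left(\sum_{i=1}^{n} Y_i\right)^2} \sqrt{n \sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}} \]
Prove that the two are the same.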
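Answer:
\[ r = \frac{\frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})(X_i - \bar{X})}{\sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2} \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2}} = \frac{\sum_{i=1}^{n} (Y_i X_i - \bar{Y} X_i - \bar{X} Y_i + \bar{Y}\bar{X})}{\sqrt{\sum_{i=1}^{n} (Y_i^2 - 2\bar{Y} Y_i + \bar{Y}^2)} \sqrt{\sum_{i=1}^{n} (X_i^2 - 2\bar{X} X_i + \bar{X}^2)}} \]
\[ = \frac{\sum_{i=1}^{n} Y_i X_i - n\bar{Y}\bar{X}}{\sqrt{\sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2} \sqrt{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}} = \frac{n \sum_{i=1}^{n} Y_i X_i - n\bar{Y} \, n\bar{X}}{\sqrt{n \sum_{i=1}^{n} Y_i^2 - n^2\bar{Y}^2} \sqrt{n \sum_{i=1}^{n} X_i^2 - n^2\bar{X}^2}} \]
\[ = \frac{n \sum_{i=1}^{n} Y_i X_i - \left(\sum_{i=1}^{n} Y_i\right)\left(\sum_{i=1}^{n} X_i\right)}{\sqrt{n \sum_{i=1}^{n} Y_i^2 - \left(\sum_{i=1}^{n} Y_i\right)^2} \sqrt{n \sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}}. \]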
4) IQs of individuals are normally distributed with a mean of 100 and a standard deviation of 16. If you sampled
students at your college and assumed, as the null hypothesis, that they had the same IQ as the population, then
in a random sample of size
(a) n = 25, find \( \Pr(\bar{Y} < 105) \).
(b) n = 100, find \( \Pr(\bar{Y} > 97) \).
(c) n = 144, find \( \Pr(101 < \bar{Y} < 103) \).
Answer: (a) 0.94
(b) 0.97
(c) 0.21
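These three probabilities follow from the fact that the sample mean of n draws is distributed N(100, 16²/n) under the null hypothesis. A minimal sketch using the standard library (the helper `mean_dist` is an illustrative name):

```python
from math import sqrt
from statistics import NormalDist

def mean_dist(n, mu=100, sigma=16):
    """Sampling distribution of the mean of n draws from N(mu, sigma^2)."""
    return NormalDist(mu, sigma / sqrt(n))

a = mean_dist(25).cdf(105)                            # Pr(mean < 105), n = 25
b = 1 - mean_dist(100).cdf(97)                        # Pr(mean > 97), n = 100
c = mean_dist(144).cdf(103) - mean_dist(144).cdf(101) # Pr(101 < mean < 103), n = 144
print(round(a, 2), round(b, 2), round(c, 2))
```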
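5) Consider the following alternative estimator for the population mean:
\[ \tilde{Y} = \frac{1}{n} \left( \frac{1}{4} Y_1 + \frac{7}{4} Y_2 + \frac{1}{4} Y_3 + \frac{7}{4} Y_4 + \dots + \frac{1}{4} Y_{n-1} + \frac{7}{4} Y_n \right) \]
Prove that \( \tilde{Y} \) is unbiased and consistent, but not efficient when compared to \( \bar{Y} \).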
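Answer:
\[ E(\tilde{Y}) = \frac{1}{n} \left( \frac{1}{4} E(Y_1) + \frac{7}{4} E(Y_2) + \frac{1}{4} E(Y_3) + \frac{7}{4} E(Y_4) + \dots + \frac{1}{4} E(Y_{n-1}) + \frac{7}{4} E(Y_n) \right) \]
\[ = \frac{1}{n} \left( \frac{1}{4} \cdot \frac{n}{2} \mu_Y + \frac{7}{4} \cdot \frac{n}{2} \mu_Y \right) = \frac{n}{n} \mu_Y = \mu_Y. \]
Hence \( \tilde{Y} \) is unbiased.
\[ var(\tilde{Y}) = E(\tilde{Y} - \mu_Y)^2 = E\left[ \frac{1}{n} \left( \frac{1}{4} Y_1 + \frac{7}{4} Y_2 + \dots + \frac{1}{4} Y_{n-1} + \frac{7}{4} Y_n \right) - \mu_Y \right]^2 \]
\[ = \frac{1}{n^2} E\left[ \frac{1}{4}(Y_1 - \mu_Y) + \frac{7}{4}(Y_2 - \mu_Y) + \dots + \frac{1}{4}(Y_{n-1} - \mu_Y) + \frac{7}{4}(Y_n - \mu_Y) \right]^2 \]
\[ = \frac{1}{n^2} \left[ \frac{1}{16} E(Y_1 - \mu_Y)^2 + \frac{49}{16} E(Y_2 - \mu_Y)^2 + \dots + \frac{1}{16} E(Y_{n-1} - \mu_Y)^2 + \frac{49}{16} E(Y_n - \mu_Y)^2 \right] \]
\[ = \frac{1}{n^2} \left[ \frac{1}{16} \sigma_Y^2 + \frac{49}{16} \sigma_Y^2 + \dots + \frac{1}{16} \sigma_Y^2 + \frac{49}{16} \sigma_Y^2 \right] = \frac{\sigma_Y^2}{n^2} \left[ \frac{n}{2} \left( \frac{1}{16} + \frac{49}{16} \right) \right] = 1.5625 \frac{\sigma_Y^2}{n}. \]
Since \( var(\tilde{Y}) \to 0 \) as \( n \to \infty \), \( \tilde{Y} \) is consistent. \( \tilde{Y} \) has a larger variance than \( \bar{Y} \) and is therefore not as
efficient.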
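The two constants in this answer (a mean factor of 1 and a variance factor of 1.5625/n) can be checked exactly with rational arithmetic. The sketch below assumes an even n and i.i.d. draws, as the derivation does, and the function names are illustrative.

```python
from fractions import Fraction

def alternating_weights(n):
    """Weights 1/4, 7/4, 1/4, 7/4, ... for an even sample size n."""
    return [Fraction(1, 4) if i % 2 == 0 else Fraction(7, 4) for i in range(n)]

def mean_and_variance_factors(n):
    w = alternating_weights(n)
    mean_factor = sum(w) / n                     # E(Y~) = mean_factor * mu_Y
    var_factor = sum(wi**2 for wi in w) / n**2   # var(Y~) = var_factor * sigma_Y^2
    return mean_factor, var_factor

for n in (10, 100, 1000):
    m, v = mean_and_variance_factors(n)
    print(n, m, float(v), float(v * n))
```

The last column stays at 1.5625 for every n, while the variance factor itself shrinks toward zero, which is the consistency argument in numerical form.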
6) Imagine that you had sampled 1,000,000 females and 1,000,000 males to test whether or not females have a
higher IQ than males. IQs are normally distributed with a mean of 100 and a standard deviation of 16. You are
excited to find that females have an average IQ of 101 in your sample, while males have an IQ of 99. Does this
difference seem important? Do you really need to carry out a t-test for differences in means to determine
whether or not this difference is statistically significant? What does this result tell you about testing hypotheses
when sample sizes are very large?
Answer: The difference seems very small, both in terms of absolute values and, more importantly, in terms of
standard deviations. With a sample size as large as n=1,000,000, the standard error becomes extremely
small. This implies that the distribution of means, or differences in means, has almost turned into a
spike. In essence, you are (very close to) observing the population. It is therefore unnecessary to test
whether or not the difference is statistically significant. After all, if in the population, the male IQ were
99.99 and the female IQ were 100.01, they would be different. In general, when sample sizes become very
large, it is very easy to reject null hypotheses about population means, which involve sample means as
an estimator, even if hypothesized differences are very small. This is the result of the distribution of
sample means collapsing fairly rapidly as sample sizes increase.
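To see how small the standard error becomes, the t-statistic for this example can be computed directly. The sketch assumes, as the question states, a standard deviation of 16 in both groups and 1,000,000 observations per group.

```python
from math import sqrt

sigma, n = 16, 1_000_000
se_diff = sqrt(sigma**2 / n + sigma**2 / n)   # standard error of the difference in means
t = (101 - 99) / se_diff                      # t-statistic for a 2-point difference
print(round(se_diff, 4), round(t, 1))
```

A difference of one eighth of a standard deviation produces a t-statistic far beyond any conventional critical value, so statistical significance says little here about whether the difference is important.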
7) Let Y be a Bernoulli random variable with success probability Pr(Y = 1) = p, and let \( Y_1, \dots, Y_n \) be i.i.d. draws
from this distribution. Let \( \hat{p} \) be the fraction of successes (1s) in this sample. In large samples, the distribution of
\( \hat{p} \) will be approximately normal, i.e., \( \hat{p} \) is approximately distributed \( N\left(p, \frac{p(1-p)}{n}\right) \). Now let X be the number of
successes and n the sample size. In a sample of 10 voters (n = 10), if there are six who vote for candidate A, then
X = 6. Relate X, the number of successes, to \( \hat{p} \), the success proportion, or fraction of successes. Next, using your
knowledge of linear transformations, derive the distribution of X.
Answer: \( X = n \times \hat{p} \). Hence if \( \hat{p} \) is distributed \( N\left(p, \frac{p(1-p)}{n}\right) \), then, given that X is a linear transformation of \( \hat{p} \), X is
distributed \( N(np, np(1-p)) \).
8) When you perform hypothesis tests, you are faced with four possible outcomes described in the accompanying
table.
                              Truth (Population)
Decision based on sample      H0 is true     H1 is true
Reject H0                     I              ✓
Do not reject H0              ✓              II

“✓” indicates a correct decision, and I and II indicate that an error has been made. In probability terms, state
the mistakes that have been made in situation I and II, and relate these to the Size of the test and the Power of
the test (or transformations of these).
Answer: I: \( \Pr(\text{reject } H_0 \mid H_0 \text{ is correct}) \) = Size of the test.
II: \( \Pr(\text{do not reject } H_0 \mid H_1 \text{ is correct}) \) = (1 - Power of the test).
9) Assume that under the null hypothesis, Y has an expected value of 500 and a standard deviation of 20. Under
the alternative hypothesis, the expected value is 550. Sketch the probability density function for the null and the
alternative hypothesis in the same figure. Pick a critical value such that the p-value is approximately 5%. Mark
the areas, which show the size and the power of the test. What happens to the power of the test if the
alternative hypothesis moves closer to the null hypothesis, i.e., \( \mu_Y \) = 540, 530, 520, etc.?
Answer: For a given size of the test, the power of the test is lower.
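The loss of power can be made concrete. The sketch below fixes a one-sided critical value for a test of size 5% under the null hypothesis N(500, 20²) and then evaluates the power at each alternative mean. Treating the standard deviation of 20 as unchanged under the alternative is an assumption of this sketch.

```python
from statistics import NormalDist

null = NormalDist(500, 20)
critical = null.inv_cdf(0.95)    # one-sided test of size 5%

powers = {}
for mu1 in (550, 540, 530, 520):
    # power = Pr(Y > critical) when the alternative mean mu1 is the true mean
    powers[mu1] = 1 - NormalDist(mu1, 20).cdf(critical)
    print(mu1, round(powers[mu1], 3))
```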
10) The net weight of a bag of flour is guaranteed to be 5 pounds with a standard deviation of 0.05 pounds. You are
concerned that the actual weight is less. To test for this, you sample 25 bags. Carefully state the null and
alternative hypothesis in this situation. Determine a critical value such that the size of the test does not exceed
5%. Finding the average weight of the 25 bags to be 4.7 pounds, can you reject the null hypothesis? What is the
power of the test here? Why is it so low?
Answer: Let Y be the net weight of the bag of flour. Then \( H_0: E(Y) = 5 \) and \( H_1: E(Y) < 5 \). Under the null
hypothesis, \( \bar{Y} \) is distributed normally, with a mean of 5 pounds and a standard deviation of 0.01 pounds.
The critical value is approximately 4.98 pounds. Since 4.7 pounds falls in the rejection region, the null
hypothesis is rejected. The power of the test is low here, since there is no simple alternative. In the
extreme case, where the alternative hypothesis would place the net weight marginally below five
pounds, the power of the test would approximately equal its size, or 5% in this case.
11) Some policy advisors have argued that education should be subsidized in developing countries to reduce
fertility rates. To investigate whether or not education and fertility are correlated, you collect data on
population growth rates (Y) and education (X) for 86 countries. Given the sums below, compute the sample
correlation:
\( \sum_{i=1}^{n} Y_i = 1.594 \); \( \sum_{i=1}^{n} X_i = 449.6 \); \( \sum_{i=1}^{n} Y_i X_i = 6.4697 \); \( \sum_{i=1}^{n} Y_i^2 = 0.03982 \); \( \sum_{i=1}^{n} X_i^2 = 3,022.76 \)
Answer: r = –0.716.
12) (Advanced) Unbiasedness and small variance are desirable properties of estimators. However, you can imagine
situations where a trade-off exists between the two: one estimator may have a small bias but a much smaller
variance than another, unbiased estimator. The concept of the “mean square error” estimator combines the two
concepts. Let \( \hat{\mu} \) be an estimator of \( \mu \). Then the mean square error (MSE) is defined as follows:
\( MSE(\hat{\mu}) = E(\hat{\mu} - \mu)^2 \). Prove that \( MSE(\hat{\mu}) = bias^2 + var(\hat{\mu}) \). (Hint: subtract and add in \( E(\hat{\mu}) \) in \( E(\hat{\mu} - \mu)^2 \).)
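Answer: \( MSE(\hat{\mu}) = E(\hat{\mu} - E(\hat{\mu}) + E(\hat{\mu}) - \mu)^2 = E[(\hat{\mu} - E(\hat{\mu})) + (E(\hat{\mu}) - \mu)]^2 \)
\( = E[(\hat{\mu} - E(\hat{\mu}))^2 + (E(\hat{\mu}) - \mu)^2 + 2(\hat{\mu} - E(\hat{\mu}))(E(\hat{\mu}) - \mu)] \)
Next, moving through the expectation operator results in
\( E[\hat{\mu} - E(\hat{\mu})]^2 + E[E(\hat{\mu}) - \mu]^2 + 2E[(\hat{\mu} - E(\hat{\mu}))(E(\hat{\mu}) - \mu)] \).
The first term is the variance, and the second term is the squared bias, since
\( E[E(\hat{\mu}) - \mu]^2 = [E(\hat{\mu}) - \mu]^2 \). This proves \( MSE(\hat{\mu}) = bias^2 + var(\hat{\mu}) \) if the last term equals zero. But
\( E[(\hat{\mu} - E(\hat{\mu}))(E(\hat{\mu}) - \mu)] = E[\hat{\mu} E(\hat{\mu}) - (E(\hat{\mu}))^2 - \mu\hat{\mu} + \mu E(\hat{\mu})] \)
\( = E(\hat{\mu}) E(\hat{\mu}) - (E(\hat{\mu}))^2 - \mu E(\hat{\mu}) + \mu E(\hat{\mu}) = 0 \).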
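The decomposition can be illustrated with exact arithmetic on a small example. Everything in the sketch below is hypothetical: the discrete sampling distribution of the estimator and the true parameter value are made up solely to show that the MSE equals the variance plus the squared bias.

```python
from fractions import Fraction as F

# Hypothetical sampling distribution of an estimator: value -> probability
dist = {F(1): F(1, 4), F(2): F(1, 2), F(4): F(1, 4)}
mu = F(2)   # hypothetical true parameter value

def expect(g):
    """Expectation of g(estimator) under the distribution above."""
    return sum(g(value) * prob for value, prob in dist.items())

mean = expect(lambda v: v)
bias = mean - mu
var = expect(lambda v: (v - mean) ** 2)
mse = expect(lambda v: (v - mu) ** 2)
print(mse, var, bias**2, var + bias**2)
```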
13) Your textbook states that when you test for differences in means and you assume that the two population
variances are equal, then an estimator of the population variance is the following “pooled” estimator:
\[ s_{pooled}^2 = \frac{1}{n_m + n_w - 2} \left[ \sum_{i=1}^{n_m} (Y_i - \bar{Y}_m)^2 + \sum_{i=1}^{n_w} (Y_i - \bar{Y}_w)^2 \right] \]
Explain why this pooled estimator can be looked at as the weighted average of the two variances.
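Answer:
\[ s_{pooled}^2 = \frac{1}{n_m + n_w - 2} \left[ \sum_{i=1}^{n_m} (Y_i - \bar{Y}_m)^2 + \sum_{i=1}^{n_w} (Y_i - \bar{Y}_w)^2 \right] = \frac{1}{n_m + n_w - 2} \left[ (n_m - 1) s_m^2 + (n_w - 1) s_w^2 \right] \]
\[ = \frac{n_m - 1}{n_m + n_w - 2} s_m^2 + \frac{n_w - 1}{n_m + n_w - 2} s_w^2. \]
The two weights are nonnegative and sum to one, so the pooled estimator is a weighted average of \( s_m^2 \) and \( s_w^2 \).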
14) Your textbook suggests using the first observation from a sample of n as an estimator of the population mean.
It is shown that this estimator is unbiased but has a variance of \( \sigma_Y^2 \), which makes it less efficient than the
sample mean. Explain why this estimator is not consistent. You develop another estimator, which is the simple
average of the first and last observation in your sample. Show that this estimator is also unbiased and show
that it is more efficient than the estimator which only uses the first observation. Is this estimator consistent?
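Answer: The estimator is not consistent because its variance does not vanish as n goes to infinity, i.e.,
$\mathrm{var}(Y_1) \to 0$ as $n \to \infty$ does not hold.
$\tilde{Y} = \frac{1}{2}(Y_1 + Y_n)$. $E(\tilde{Y}) = \frac{1}{2}(E(Y_1) + E(Y_n)) = \frac{1}{2}(\mu_Y + \mu_Y) = \mu_Y$. Hence $\tilde{Y}$ is unbiased.
$$\mathrm{var}(\tilde{Y}) = E(\tilde{Y} - \mu_Y)^2 = E\left[\tfrac{1}{2}(Y_1 + Y_n) - \mu_Y\right]^2 = E\left[\tfrac{1}{2}(Y_1 - \mu_Y) + \tfrac{1}{2}(Y_n - \mu_Y)\right]^2$$
$$= \tfrac{1}{4}\left[E(Y_1 - \mu_Y)^2 + E(Y_n - \mu_Y)^2\right] = \tfrac{1}{4}\left[\sigma_Y^2 + \sigma_Y^2\right] = \frac{\sigma_Y^2}{2}.$$
(The cross term vanishes because $Y_1$ and $Y_n$ are independent.)
Since $\mathrm{var}(\tilde{Y}) \to 0$ as $n \to \infty$ does not hold, $\tilde{Y}$ is not consistent.
$\mathrm{var}(\tilde{Y}) < \mathrm{var}(Y_1)$, and $\tilde{Y}$ is therefore more efficient than the estimator which only uses the first observation.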
15) Let p be the success probability of a Bernoulli random variable Y, i.e., p = Pr(Y = 1). It can be shown that $\hat{p}$, the
fraction of successes in a sample, is asymptotically distributed $N\left(p, \frac{p(1-p)}{n}\right)$. Using the estimator of the variance
of $\hat{p}$, $\frac{\hat{p}(1-\hat{p})}{n}$, construct a 95% confidence interval for p. Show that the margin for sampling error simplifies to
$1/\sqrt{n}$ if you used 2 instead of 1.96 assuming, conservatively, that the standard error is at its maximum.
Construct a table indicating the sample size needed to generate a margin of sampling error of 1%, 2%, 5% and
10%. What do you notice about the increase in sample size needed to halve the margin of error? (The margin of
sampling error is $1.96 \times SE(\hat{p})$.)
Answer: The 95% confidence interval for p is $\hat{p} \pm 1.96 \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$.
$\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ is at a maximum for $\hat{p} = 0.5$, in which case the confidence interval reduces to
$\hat{p} \pm 1.96 \times \sqrt{\frac{0.25}{n}} \approx \hat{p} \pm \frac{1}{\sqrt{n}}$, and the margin of sampling error is $\frac{1}{\sqrt{n}}$.

$1/\sqrt{n}$      n
0.01        10,000
0.02         2,500
0.05           400
0.10           100

To halve the margin of error, the sample size has to increase fourfold.
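The table follows from inverting the conservative margin $1/\sqrt{n}$. A minimal sketch of that inversion (the function name is introduced here for illustration only):

```python
def required_sample_size(margin):
    """Sample size n solving 1/sqrt(n) = margin, i.e. n = (1/margin)^2.

    Uses the conservative rule of thumb from the answer above
    (2 in place of 1.96, p_hat = 0.5). round() removes floating-point noise.
    """
    return round((1.0 / margin) ** 2)


margins = [0.01, 0.02, 0.05, 0.10]
sizes = [required_sample_size(m) for m in margins]
for m, n in zip(margins, sizes):
    print(f"margin {m:.2f} -> n = {n}")
```

Because n is proportional to one over the squared margin, halving the margin multiplies n by four.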
16) Let Y be a Bernoulli random variable with success probability Pr(Y = 1) = p, and let Y1 ,..., Yn be i.i.d. draws
from this distribution. Let $\hat{p}$ be the fraction of successes (1s) in this sample. Given the following statement
Pr(-1.96 < z < 1.96) = 0.95
and assuming that $\hat{p}$ is approximately distributed $N\left(p, \frac{p(1-p)}{n}\right)$, derive the 95% confidence interval for p by
solving the above inequalities.
Answer: $\Pr\left(-1.96 < \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} < 1.96\right) = 0.95$. Multiplying through by the standard deviation results in
$\Pr\left(-1.96 \times \sqrt{\frac{p(1-p)}{n}} < \hat{p} - p < 1.96 \times \sqrt{\frac{p(1-p)}{n}}\right) = 0.95$. Subtraction of $\hat{p}$ then yields, after multiplying both sides by
(-1), $\Pr\left(\hat{p} - 1.96 \times \sqrt{\frac{p(1-p)}{n}} < p < \hat{p} + 1.96 \times \sqrt{\frac{p(1-p)}{n}}\right) = 0.95$. The 95% confidence interval for p then is
$\hat{p} \pm 1.96 \times \sqrt{\frac{p(1-p)}{n}}$.
17) Your textbook mentions that dividing the sample variance by n – 1 instead of n is called a degrees of freedom
correction. The meaning of the term stems from the fact that one degree of freedom is used up when the mean
is estimated. Hence degrees of freedom can be viewed as the number of independent observations remaining
after estimating the sample mean.
Consider an example where initially you have 20 independent observations on the height of students. After
calculating the average height, your instructor claims that you can figure out the height of the 20th student if
she provides you with the height of the other 19 students and the sample mean. Hence you have lost one
degree of freedom, or there are only 19 independent bits of information. Explain how you can find the height of
the 20th student.
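Answer: Since $\bar{Y} = \frac{1}{20}\sum_{i=1}^{20} Y_i$, $20 \times \bar{Y} = \sum_{i=1}^{20} Y_i = Y_{20} + \sum_{i=1}^{19} Y_i$, so that
$Y_{20} = 20 \times \bar{Y} - \sum_{i=1}^{19} Y_i$. Hence knowledge of the sample mean and the
height of the other 19 students is sufficient for finding the height of the 20th student.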
18) The accompanying table lists the height (STUDHGHT) in inches and weight (WEIGHT) in pounds of five
college students. Calculate the correlation coefficient.
STUDHGHT   WEIGHT
74         165
73         165
72         145
68         155
66         140
Answer: r = 0.72.
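The reported value can be reproduced with the usual sample correlation formula. A short sketch using the five observations from the table (variable names are chosen here for illustration):

```python
from math import sqrt

# Height (inches) and weight (pounds) of the five students in the table.
height = [74, 73, 72, 68, 66]
weight = [165, 165, 145, 155, 140]

n = len(height)
mean_h = sum(height) / n
mean_w = sum(weight) / n

# Sums of cross-products and squares of deviations from the means.
s_hw = sum((h - mean_h) * (w - mean_w) for h, w in zip(height, weight))
s_hh = sum((h - mean_h) ** 2 for h in height)
s_ww = sum((w - mean_w) ** 2 for w in weight)

# The (n - 1) factors cancel, so r depends only on these three sums.
r = s_hw / sqrt(s_hh * s_ww)
print(round(r, 2))  # prints 0.72
```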
19) (Requires calculus.) Let Y be a Bernoulli random variable with success probability Pr(Y = 1) = p. It can be
shown that the variance of the success probability p is $\frac{p(1-p)}{n}$. Use calculus to show that this variance is
maximized for p = 0.5.
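Answer: $\frac{\partial}{\partial p}\left[\frac{p(1-p)}{n}\right] = \frac{1-p}{n} - \frac{p}{n} = 0$. Hence 1 - 2p = 0 or p = $\frac{1}{2}$.
(The second derivative, $-\frac{2}{n}$, is negative, so this is a maximum.)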
20) Consider two estimators: one which is biased and has a smaller variance, the other which is unbiased and has a
larger variance. Sketch the sampling distributions and the location of the population parameter for this
situation. Discuss conditions under which you may prefer to use the first estimator over the second one.
Answer: The bias indicates “how far away,” on average, the estimator is from the population value. Although this
average is zero for an unbiased estimator, there may be quite some variation around the population
mean. In a single draw, there is therefore a high probability of being some distance away from the
population mean. On the other hand, if the variance is very small and the estimator is biased by a small
amount, then the probability of being closer to the population value may be higher. (The biased
estimator may have a smaller mean square error than the unbiased estimator.)
21) At the Stock and Watson (http://www.pearsonhighered.com/stock_watson ) website go to Student Resources
and select the option “Datasets for Replicating Empirical Results.” Then open the chapter 8 CPS data set
(ch8_cps.xls) in a spreadsheet program such as Excel. For the exercise, use the first 500 observations only.
Using data for average hourly earnings only (ahe) and years of education (yrseduc), produce a scatterplot with
earnings on the vertical axis and education level on the horizontal axis. What kind of relationship does the
scatterplot suggest? Confirm your impression by adding a linear trendline. Find the correlation coefficient
between the two and interpret it.
Answer:
Without the trendline added, there does not seem to be much of a linear relationship between average
hourly earnings and years of education. Perhaps a linear relationship is not plausible since it would
imply that the returns to education would become smaller as further years of education are added.
However, and regardless of the linearity issues, there is a positive relationship in the data between the
two variables, which becomes visible when the trend line is added. The correlation coefficient is positive
and has a value of 46.9%, which is reasonably high (the correlation between height and weight for
college students is approximately 50% by comparison).
22) IQ scores are normally distributed with an average of 100 and a standard deviation of 16. Some research
suggests that left-handed individuals have a higher IQ score than right-handed individuals. To test this
hypothesis, a researcher randomly selects 132 individuals and finds that their average IQ is 103.2 with a sample
standard deviation of 14.6. Using the results from the sample, can you reject the null hypothesis that
left-handed people have an IQ of 100 vs. the alternative that they have a higher IQ? What critical value should
you choose if the size of the test is 5%?
Answer: The hypothesis is $H_0: \mu = 100$ versus the alternative $H_1: \mu > 100$. The test statistic is
$t = \frac{103.2 - 100}{14.6/\sqrt{132}} = 2.52$.
Since the critical value for the one-sided alternative is 1.645 at the 5% significance level, the researcher
should reject the null hypothesis that left-handed individuals have an IQ of 100.
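The arithmetic behind the test statistic can be sketched as follows. The numbers are those given in the question; the variable names and the hard-coded critical value 1.645 (the one-sided 5% normal critical value quoted in the answer) are the only additions.

```python
from math import sqrt

sample_mean = 103.2   # average IQ in the sample
null_mean = 100.0     # value under the null hypothesis
sample_sd = 14.6      # sample standard deviation
n = 132               # sample size

standard_error = sample_sd / sqrt(n)
t = (sample_mean - null_mean) / standard_error

critical_value = 1.645  # one-sided test, 5% size, normal approximation
reject = t > critical_value
print(round(t, 2), reject)
```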
23) At the Stock and Watson (http://www.pearsonhighered.com/stock_watson ) website go to Student Resources
and select the option “Datasets for Replicating Empirical Results.” Then select the “Test Score data set used in
Chapters 4-9” (caschool.xls) and open the Excel data set. Next produce a scatterplot of the average reading
score (horizontal axis) and the average mathematics score (vertical axis). What does the scatterplot suggest?
Calculate the correlation coefficient between the two series and give an interpretation.
Answer:
The scatterplot suggests that, on average, schools which perform highly on the reading score will also
perform highly on the mathematics score. The sample correlation between the two series is 92.3%,
suggesting a high positive correlation between the two variables.
24) In 2007, a study of close to 250,000 18-19 year-old Norwegian males found that first-borns have an IQ that is
2.3 points higher than those who are second-born. To see if you can find similar evidence at your university,
you collect data from 250 students, of which 140 are first-borns. After subjecting each of these individuals to an
IQ test, you find that the first-borns score 108.3 with a standard deviation of 13.2, while the second-borns
achieve 107.1 with a standard deviation of 11.6. You hypothesize that first-borns and second-borns in a
university population have identical IQs against the one-sided alternative hypothesis that first-borns have
higher IQs. Using a size of the test of 5%, what is your conclusion?
Answer: Given that your null hypothesis states $H_0: \mu_{first} = \mu_{second}$, your test statistic is
$t = \frac{108.3 - 107.1}{\sqrt{\frac{13.2^2}{140} + \frac{11.6^2}{110}}} = 0.76$. Since the critical value for the one-sided alternative test is 1.64, you cannot reject the null
hypothesis.
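A minimal sketch of the difference-in-means calculation, using the figures from the question (the variable names are illustrative, and the second group size of 110 is 250 minus the 140 first-borns):

```python
from math import sqrt

mean_first, sd_first, n_first = 108.3, 13.2, 140
mean_second, sd_second, n_second = 107.1, 11.6, 250 - 140

# Standard error of the difference in means, allowing unequal variances.
se_diff = sqrt(sd_first ** 2 / n_first + sd_second ** 2 / n_second)
t = (mean_first - mean_second) / se_diff

critical_value = 1.64  # one-sided 5% critical value used in the answer
reject = t > critical_value
print(round(t, 2), reject)
```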
Chapter 4 Linear Regression with One Regressor
4.1 Multiple Choice
1) When the estimated slope coefficient in the simple regression model, $\hat{\beta}_1$, is zero, then
A) R2 = $\bar{Y}$.
B) 0 < R2 < 1.
C) R2 = 0.
D) R2 > (SSR/TSS).
Answer: C
2) The regression R2 is defined as follows:
A) $\frac{ESS}{TSS}$
B) $\frac{RSS}{TSS}$
C) $\frac{\sum_{i=1}^{n}(Y_i - \bar{Y})(X_i - \bar{X})}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2 \sum_{i=1}^{n}(X_i - \bar{X})^2}$
D) $\frac{SSR}{n-2}$
Answer: A
3) The standard error of the regression (SER) is defined as follows
A) $\sqrt{\frac{1}{n-2}\sum_{i=1}^{n}\hat{u}_i^2}$
B) SSR
C) 1-R2
D) $\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\hat{u}_i^2}$
Answer: A
4) (Requires Appendix material) Which of the following statements is correct?
A) TSS = ESS + SSR
B) ESS = SSR + TSS
C) ESS > TSS
D) R2 = 1 - (ESS/TSS)
Answer: A
5) Binary variables
A) are generally used to control for outliers in your sample.
B) can take on more than two values.
C) exclude certain individuals from your sample.
D) can take on only two values.
Answer: D
6) The following are all least squares assumptions with the exception of:
A) The conditional distribution of ui given Xi has a mean of zero.
B) The explanatory variable in the regression model is normally distributed.
C) (Xi, Yi), i = 1,..., n are independently and identically distributed.
D) Large outliers are unlikely.
Answer: B
7) The reason why estimators have a sampling distribution is that
A) economics is not a precise science.
B) individuals respond differently to incentives.
C) in real life you typically get to sample many times.
D) the values of the explanatory variable and the error term differ across samples.
Answer: D
8) In the simple linear regression model, the regression slope
A) indicates by how many percent Y increases, given a one percent increase in X.
B) when multiplied with the explanatory variable will give you the predicted Y.
C) indicates by how many units Y increases, given a one unit increase in X.
D) represents the elasticity of Y on X.
Answer: C
9) The OLS estimator is derived by
A) connecting the Yi corresponding to the lowest Xi observation with the Yi corresponding to the highest Xi
observation.
B) making sure that the standard error of the regression equals the standard error of the slope estimator.
C) minimizing the sum of absolute residuals.
D) minimizing the sum of squared residuals.
Answer: D
10) Interpreting the intercept in a sample regression function is
A) not reasonable because you never observe values of the explanatory variables around the origin.
B) reasonable because under certain conditions the estimator is BLUE.
C) reasonable if your sample contains values of Xi around the origin.
D) not reasonable because economists are interested in the effect of a change in X on the change in Y.
Answer: C
11) The variance of Yi is given by
A) $\beta_0^2 + \beta_1^2\,\mathrm{var}(X_i) + \mathrm{var}(u_i)$.
B) the variance of ui.
C) $\beta_1^2\,\mathrm{var}(X_i) + \mathrm{var}(u_i)$.
D) the variance of the residuals.
Answer: C
12) (Requires Appendix) The sample average of the OLS residuals is
A) some positive number since OLS uses squares.
B) zero.
C) unobservable since the population regression function is unknown.
D) dependent on whether the explanatory variable is mostly positive or negative.
Answer: B
13) The OLS residuals, $\hat{u}_i$, are defined as follows:
A) $\hat{Y}_i - \hat{\beta}_0 - \hat{\beta}_1 X_i$
B) $Y_i - \beta_0 - \beta_1 X_i$
C) $Y_i - \hat{Y}_i$
D) $(Y_i - \bar{Y})^2$
Answer: C
14) The slope estimator, $\hat{\beta}_1$, has a smaller standard error, other things equal, if
A) there is more variation in the explanatory variable, X.
B) there is a large variance of the error term, u.
C) the sample size is smaller.
D) the intercept, $\beta_0$, is small.
Answer: A
15) The regression R2 is a measure of
A) whether or not X causes Y.
B) the goodness of fit of your regression line.
C) whether or not ESS > TSS.
D) the square of the determinant of R.
Answer: B
16) (Requires Appendix) The sample regression line estimated by OLS
A) will always have a slope smaller than the intercept.
B) is exactly the same as the population regression line.
C) cannot have a slope of zero.
D) will always run through the point ($\bar{X}$, $\bar{Y}$).
Answer: D
17) The OLS residuals
A) can be calculated using the errors from the regression function.
B) can be calculated by subtracting the fitted values from the actual values.
C) are unknown since we do not know the population regression function.
D) should not be used in practice since they indicate that your regression does not run through all your
observations.
Answer: B
18) The normal approximation to the sampling distribution of $\hat{\beta}_1$ is powerful because
A) many explanatory variables in real life are normally distributed.
B) it allows econometricians to develop methods for statistical inference.
C) many other distributions are not symmetric.
D) it implies that OLS is the BLUE estimator for $\beta_1$.
Answer: B
19) If the three least squares assumptions hold, then the large sample normal distribution of $\hat{\beta}_1$ is
1 var[Xi - X)ui]
).
A) N(0,
n
[var(Xi)]2
1 var(ui)]2
).
B) N( 1 ,
n [var(Xi)]2
2
u
C) N( 1 ,
.
n
i=1
(Xi - X)2
1 var(ui)]
).
D) N( 1 ,
n [var(Xi)]2
Answer: B
20) In the simple linear regression model $Y_i = \beta_0 + \beta_1 X_i + u_i$,
A) the intercept is typically small and unimportant.
B) $\beta_0 + \beta_1 X_i$ represents the population regression function.
C) the absolute value of the slope is typically between 0 and 1.
D) $\beta_0 + \beta_1 X_i$ represents the sample regression function.
Answer: B
21) To obtain the slope estimator using the least squares principle, you divide the
A) sample variance of X by the sample variance of Y.
B) sample covariance of X and Y by the sample variance of Y.
C) sample covariance of X and Y by the sample variance of X.
D) sample variance of X by the sample covariance of X and Y.
Answer: C
22) To decide whether or not the slope coefficient is large or small,
A) you should analyze the economic importance of a given increase in X.
B) the slope coefficient must be larger than one.
C) the slope coefficient must be statistically significant.
D) you should change the scale of the X variable if the coefficient appears to be too small.
Answer: A
23) $E(u_i \mid X_i) = 0$ says that
A) dividing the error by the explanatory variable results in a zero (on average).
B) the sample regression function residuals are unrelated to the explanatory variable.
C) the sample mean of the Xs is much larger than the sample mean of the errors.
D) the conditional distribution of the error given the explanatory variable has a zero mean.
Answer: D
24) In the linear regression model, $Y_i = \beta_0 + \beta_1 X_i + u_i$, $\beta_0 + \beta_1 X_i$ is referred to as
A) the population regression function.
B) the sample regression function.
C) exogenous variation.
D) the right-hand variable or regressor.
Answer: A
25) Multiplying the dependent variable by 100 and the explanatory variable by 100,000 leaves the
A) OLS estimate of the slope the same.
B) OLS estimate of the intercept the same.
C) regression R2 the same.
D) variance of the OLS estimators the same.
Answer: C
26) Assume that you have collected a sample of observations from over 100 households and their consumption and
income patterns. Using these observations, you estimate the following regression $C_i = \beta_0 + \beta_1 Y_i + u_i$, where C is
consumption and Y is disposable income. The estimate of $\beta_1$ will tell you
A) $\frac{\Delta Income}{\Delta Consumption}$
B) The amount you need to consume to survive
C) $\frac{Income}{Consumption}$
D) $\frac{\Delta Consumption}{\Delta Income}$
Answer: D
27) In which of the following relationships does the intercept have a real-world interpretation?
A) the relationship between the change in the unemployment rate and the growth rate of real GDP
(“Okun’s Law”)
B) the demand for coffee and its price
C) test scores and class-size
D) weight and height of individuals
Answer: A
28) The OLS residuals, $\hat{u}_i$, are sample counterparts of the population
A) regression function slope
B) errors
C) regression function’s predicted values
D) regression function intercept
Answer: B
29) Changing the units of measurement, e.g. measuring test scores in 100s, will do all of the following EXCEPT for
changing the
A) residuals
B) numerical value of the slope estimate
C) interpretation of the effect that a change in X has on the change in Y
D) numerical value of the intercept
Answer: C
30) To decide whether the slope coefficient indicates a “large” effect of X on Y, you look at the
A) size of the slope coefficient
B) regression
C) economic importance implied by the slope coefficient
D) value of the intercept
Answer: A
4.2 Essays and Longer Questions
1) Sir Francis Galton, a cousin of Charles Darwin, examined the relationship between the height of children and
their parents towards the end of the 19th century. It is from this study that the name “regression” originated.
You decide to update his findings by collecting data from 110 college students, and estimate the following
relationship:
Studenth = 19.6 + 0.73 × Midparh, R2 = 0.45, SER = 2.0
where Studenth is the height of students in inches, and Midparh is the average of the parental heights.
(Following Galton’s methodology, both variables were adjusted so that the average female height was equal to
the average male height.)
(a) Interpret the estimated coefficients.
(b) What is the meaning of the regression R2 ?
(c) What is the prediction for the height of a child whose parents have an average height of 70.06 inches?
(d) What is the interpretation of the SER here?
(e) Given the positive intercept and the fact that the slope lies between zero and one, what can you say about
the height of students who have quite tall parents? Those who have quite short parents?
(f) Galton was concerned about the height of the English aristocracy and referred to the above result as
“regression towards mediocrity.” Can you figure out what his concern was? Why do you think that we refer to
this result today as “Galton’s Fallacy”?
Answer: (a) For every one inch increase in the average height of their parents, the student’s height increases by
0.73 of an inch. There is no reasonable interpretation for the intercept.
(b) The model explains 45 percent of the variation in the height of students.
(c) 19.6 + 0.73 × 70.06 = 70.74.
(d) The SER is a measure of the spread of the observations around the regression line. The magnitude of
the typical deviation from the regression line or the typical regression error here is two inches.
(e) Tall parents will have, on average, tall students, but they will not be as tall as their parents. Short
parents will have short students, although on average, they will be somewhat taller than their parents.
(f) This is an example of mean reversion. Since the aristocracy was, on average, taller, he was concerned
that their children would be shorter and resemble more the rest of the population. If this conclusion were
true, then eventually everyone would be of the same height. However, we have not observed a decrease
in the variance in height over time.
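Parts (c) and (e) can be illustrated by evaluating the estimated regression line at a few parental heights. In the sketch below the coefficients come from the question, while the 64-inch and 76-inch parental heights are hypothetical values chosen only to show the pattern.

```python
INTERCEPT = 19.6  # estimated intercept from the question
SLOPE = 0.73      # estimated slope from the question


def predict_student_height(midparent_height):
    """Predicted student height (inches) from the fitted regression line."""
    return INTERCEPT + SLOPE * midparent_height


# 70.06 is the value used in part (c); 64 and 76 are hypothetical short and
# tall parental averages used to illustrate part (e).
for parents in (64.0, 70.06, 76.0):
    child = predict_student_height(parents)
    print(f"parents {parents:.2f} -> predicted student {child:.2f}")
```

Short parents are predicted to have children taller than themselves and tall parents children shorter than themselves, which is the mean reversion described in (e) and (f).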
2) (Requires Appendix material) At a recent county fair, you observed that at one stand people’s weight was
forecasted, and were surprised by the accuracy (within a range). Thinking about how the person could have
predicted your weight fairly accurately (despite the fact that she did not know about your “heavy bones”), you
think about how this could have been accomplished. You remember that medical charts for children contain
5%, 25%, 50%, 75% and 95% lines for a weight/height relationship and decide to conduct an experiment with
110 of your peers. You collect the data and calculate the following sums:
$\sum_{i=1}^{n} Y_i = 17{,}375$, $\sum_{i=1}^{n} X_i = 7{,}665.5$, $\sum_{i=1}^{n} y_i^2 = 94{,}228.8$, $\sum_{i=1}^{n} x_i^2 = 1{,}248.9$, $\sum_{i=1}^{n} x_i y_i = 7{,}625.9$
where the height is measured in inches and weight in pounds. (Small letters refer to deviations from means as
in $z_i = Z_i - \bar{Z}$.)
(a) Calculate the slope and intercept of the regression and interpret these.
(b) Find the regression R2 and explain its meaning. What other factors can you think of that might have an
influence on the weight of an individual?
Answer: (a) $\hat{\beta}_1 = \frac{7{,}625.9}{1{,}248.9} = 6.11$, $\hat{\beta}_0 = 157.95 - 6.11 \times 69.69 = -267.86$. For every additional inch in height, students
weigh roughly 6 pounds more, on average.
(b) $R^2 = \frac{ESS}{TSS} = \frac{\hat{\beta}_1^2 \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2} = \frac{46{,}624.1}{94{,}228.8} = 0.495$. Roughly half of the weight variation in the 110 students
is explained by the single explanatory variable, height. Answers will vary by student for the other
factors, but calorie intake and amount of exercise typically appear as part of the list.
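The calculation from the given sums can be sketched in a few lines. The sums are those stated in the question; the variable names are introduced here for illustration.

```python
n = 110
sum_Y, sum_X = 17375.0, 7665.5   # sums of weight (Y) and height (X)
sum_yy = 94228.8                 # sum of squared deviations of Y (TSS)
sum_xx = 1248.9                  # sum of squared deviations of X
sum_xy = 7625.9                  # sum of cross-products of deviations

mean_Y = sum_Y / n
mean_X = sum_X / n

slope = sum_xy / sum_xx               # OLS slope
intercept = mean_Y - slope * mean_X   # OLS intercept
ess = slope ** 2 * sum_xx             # explained sum of squares
r_squared = ess / sum_yy

print(round(slope, 2), round(intercept, 2), round(r_squared, 3))
```

Because the code carries the unrounded slope through, the intercept and R2 it prints differ slightly from the hand calculation above, which rounds the slope to 6.11 first.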
3) You have obtained a sub-sample of 1744 individuals from the Current Population Survey (CPS) and are
interested in the relationship between weekly earnings and age. The regression, using
heteroskedasticity-robust standard errors, yielded the following result:
Earn = 239.16 + 5.20 × Age, R2 = 0.05, SER = 287.21,
where Earn and Age are measured in dollars and years respectively.
(a) Interpret the results.
(b) Is the effect of age on earnings large?
(c) Why should age matter in the determination of earnings? Do the results suggest that there is a guarantee for
earnings to rise for everyone as they become older? Do you think that the relationship between age and
earnings is linear?
(d) The average age in this sample is 37.5 years. What is average annual income in the sample?
(e) Interpret the measures of fit.
Answer: (a) On average, a person who is one year older earns $5.20 more per week. There is no meaning attached
to the intercept. The regression explains 5 percent of the variation in earnings.
(b) Assuming that people worked 52 weeks a year, the effect of being one year older translates into an
additional $270.40 a year. This does not seem particularly large in 2002 dollars, but may have been
earlier.
(c) In general, age-earnings profiles take on an inverted U-shape. Hence it is not linear and the linear
approximation may not be good at all. Age may be a proxy for “experience,” which in itself can
approximate “on the job training.” Hence the positive effect between age and earnings. The results do
not suggest that there is a guarantee for earnings to rise for everyone as they become older since the
regression R2 does not equal 1. Instead the result holds “on average.”
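(d) The OLS intercept satisfies
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} \quad\Rightarrow\quad \bar{Y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{X}.$$
Substituting the estimates for the slope and the intercept then
results in average weekly earnings of $434.16 or annual average earnings of $22,576.32.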
(e) The regression R2 indicates that five percent of the variation in earnings is explained by the model.
The typical error is $287.21.
4) The baseball team nearest to your home town is, once again, not doing well. Given that your knowledge of
what it takes to win in baseball is vastly superior to that of management, you want to find out what it takes to
win in Major League Baseball (MLB). You therefore collect the winning percentage of all 30 baseball teams in
MLB for 1999 and regress the winning percentage on what you consider the primary determinant for wins,
which is quality pitching (team earned run average). You find the following information on team performance:
Summary of the Distribution of Winning Percentage and Team Earned Run Average for MLB in 1999

                               Standard   Percentile
                     Average   deviation  10%    25%    40%    50% (median)   60%    75%    90%
Team ERA             4.71      0.53       3.84   4.35   4.72   4.78           4.91   5.06   5.25
Winning Percentage   0.50      0.08       0.40   0.43   0.46   0.48           0.49   0.59   0.60
(a) What is your expected sign for the regression slope? Will it make sense to interpret the intercept? If not,
should you omit it from your regression and force the regression line through the origin?
(b) OLS estimation of the relationship between the winning percentage and the team ERA yields the following:
Winpct = 0.94 – 0.10 × teamera, R2 = 0.49, SER = 0.06,
where winpct is measured as wins divided by games played, so for example a team that won half of its games
would have Winpct = 0.50. Interpret your regression results.
(c) It is typically sufficient to win 90 games to be in the playoffs and/or to win a division. Winning over 100
games a season is exceptional: the Atlanta Braves had the most wins in 1999 with 103. Teams play a total of 162
games a year. Given this information, do you consider the slope coefficient to be large or small?
(d) What would be the effect on the slope, the intercept, and the regression R2 if you measured Winpct in
percentage points, i.e., as (Wins/Games) × 100?
(e) Are you impressed with the size of the regression R2 ? Given that 51% of the variation in
the winning percentage is unexplained, what might some of the other factors be?
Answer: (a) You expect a negative relationship, since a higher team ERA implies a lower quality of the input. No
team comes close to a zero team ERA, and therefore it does not make sense to interpret the intercept.
Forcing the regression through the origin is a false implication from this insight. Instead the intercept
fixes the level of the regression.
(b) For every one point increase in Team ERA, the winning percentage decreases by 10 percentage
points, or 0.10. Roughly half of the variation in winning percentage is explained by the quality of team
pitching.
(c) The coefficient is large, since increasing the winning percentage by 0.10 is the equivalent of winning
16 more games per year. Since it is typically sufficient to win 56 percent of the games to qualify for the
playoffs, this difference of 0.10 in winning percentage can easily turn a losing team into a winning
team.
(d) Clearly the regression R2 will not be affected by a change in scale, since a descriptive measure of the
quality of the regression would depend on whim otherwise. The slope of the regression will compensate
in such a way that the interpretation of the result is unaffected, i.e., it will become –10 in the above
example. The intercept will also change to reflect the fact that if X were 0, then the dependent variable
would now be measured in percentage, i.e., it will become 94.0 in the above example.
(e) It is impressive that a single variable can explain roughly half of the variation in winning percentage.
Answers to the second question will vary by student, but will typically include the quality of hitting,
fielding, and management. Salaries could be included, but should be reflected in the inputs.
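The scaling argument in (d) can be checked on any data set. The sketch below uses five made-up (team ERA, winning percentage) pairs, not the 1999 MLB data, and a small hand-written OLS helper introduced here for illustration.

```python
def ols(x, y):
    """Return (intercept, slope, R^2) for a one-regressor OLS fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


# Hypothetical data (illustrative only).
era = [3.8, 4.3, 4.7, 4.9, 5.3]
winpct = [0.60, 0.52, 0.49, 0.45, 0.42]

b0, b1, r2 = ols(era, winpct)
# Same regression with the dependent variable in percentage points.
b0_pct, b1_pct, r2_pct = ols(era, [100 * w for w in winpct])

print(b0, b1, r2)
print(b0_pct, b1_pct, r2_pct)
```

Multiplying the dependent variable by 100 multiplies both the intercept and the slope by 100 and leaves the R2 unchanged.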
5) You have learned in one of your economics courses that one of the determinants of per capita income (the
“Wealth of Nations”) is the population growth rate. Furthermore you also found out that the Penn World
Tables contain income and population data for 104 countries of the world. To test this theory, you regress the
GDP per worker (relative to the United States) in 1990 (RelPersInc) on the difference between the average
population growth rate of that country (n) and the U.S. average population growth rate (nus) for the years 1980
to 1990. This results in the following regression output:
RelPersInc = 0.518 – 18.831 × (n – nus), R2 = 0.522, SER = 0.197
(a) Interpret the results carefully. Is this relationship economically important?
(b) What would happen to the slope, intercept, and regression R2 if you ran another regression where the
above explanatory variable was replaced by n only, i.e., the average population growth rate of the country?
(The population growth rate of the United States from 1980 to 1990 was 0.009.) Should this have any effect on
the t-statistic of the slope?
(c) 31 of the 104 countries have a dependent variable of less than 0.10. Does it therefore make sense to interpret
the intercept?
Answer: (a) A relative increase in the population growth rate of one percentage point, from 0.01 to 0.02, say, lowers
relative per-capita income by almost 20 percentage points (0.188). This is a quantitatively important and
large effect. Nations which have the same population growth rate as the United States have, on average,
roughly half as much per capita income.
(b) The interpretation of the partial derivative is unaffected, in that the slope still indicates the effect of a
one percentage point increase in the population growth rate. The regression R2 will remain the same
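since only a constant was removed from the explanatory variable. The intercept will change as a result of
the change in X. The slope estimate and its standard error are unchanged, so the t-statistic of the slope is unaffected.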
(c) To interpret the intercept, you must observe values of X close to zero, not Y.
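The change in the intercept described in (b) can be worked out from the reported estimates by substituting the U.S. growth rate of 0.009 given in the question. This is a sketch based on the rounded coefficients, so the resulting intercept is approximate.

```latex
\begin{align*}
\widehat{RelPersInc} &= 0.518 - 18.831\,(n - n_{us}) \\
                     &= 0.518 + 18.831 \times 0.009 - 18.831\,n \\
                     &\approx 0.687 - 18.831\,n
\end{align*}
```

The slope is the same in both versions; only the intercept absorbs the constant that was shifted.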
6) The neoclassical growth model predicts that for identical savings rates and population growth rates, countries
should converge to the same per capita income level. This is referred to as the convergence hypothesis. One way to
test for the presence of convergence is to compare the growth rates over time to the initial starting level.
(a) If you regressed the average growth rate over a time period (1960-1990) on the initial level of per capita
income, what would the sign of the slope have to be to indicate this type of convergence? Explain. Would this
result confirm or reject the prediction of the neoclassical growth model?
(b) The results of the regression for 104 countries were as follows:
g6090 = 0.019 – 0.0006 × RelProd60, R2 = 0.00007, SER = 0.016,
where g6090 is the average annual growth rate of GDP per worker for the 1960-1990 sample period, and
RelProd60 is GDP per worker relative to the United States in 1960.
Interpret the results. Is there any evidence of unconditional convergence between the countries of the world? Is
this result surprising? What other concept could you think about to test for convergence between countries?
(c) You decide to restrict yourself to the 24 OECD countries in the sample. This changes your regression output
as follows:
g6090 = 0.048 – 0.0404 × RelProd60, R2 = 0.82, SER = 0.0046
How does this result affect your conclusions from above?
Answer: (a) You would require a negative sign. Countries that are far ahead of others at the beginning of the
period would have to grow relatively slower for the others to catch up. This represents unconditional
convergence, whereas the neoclassical growth model predicts conditional convergence, i.e., there will
only be convergence if countries have identical savings, population growth rates, and production
technology.
(b) An increase of 10 percentage points in RelProd60 results in a decrease of 0.00006 in the growth rate
from 1960 to 1990, i.e., countries that were further ahead in 1960 do grow by less. There are some
countries in the sample that have a value of RelProd60 close to zero (China, Uganda, Togo, Guinea) and
you would expect these countries to grow roughly by 2 percent per year over the sample period. The
regression R2 indicates that the regression has virtually no explanatory power. The result is not
surprising given that there are not many theories that predict unconditional convergence between the
countries of the world.
(c) Judging by the size of the slope coefficient, there is strong evidence of unconditional convergence for
the OECD countries. The regression R2 is quite high, given that there is only a single explanatory
variable in the regression. However, since we do not know the sampling distribution of the estimator in
this case, we cannot conduct inference.
7) In 2001, the Arizona Diamondbacks defeated the New York Yankees in the Baseball World Series in 7 games.
Some players, such as Bautista and Finley for the Diamondbacks, had a substantially higher batting average
during the World Series than during the regular season. Others, such as Brosius and Jeter for the Yankees, did
substantially worse. You set out to investigate whether or not the regular season batting average is a good
indicator for the World Series batting average. The results for 11 players who had the most at bats for the two
teams are:
AZWsavg = –0.347 + 2.290 × AZSeasavg, R2 = 0.11, SER = 0.145,
NYWsavg = 0.134 + 0.136 × NYSeasavg, R2 = 0.001, SER = 0.092,
where Wsavg and Seasavg indicate the batting average during the World Series and the regular season
respectively.
(a) Focusing on the coefficients first, what is your interpretation?
(b) What can you say about the explanatory power of your equation? What do you conclude from this?
Answer: (a) The two regressions are quite different. For the Diamondbacks, players who had a 10 point higher
batting average during the regular season had roughly a 23 point higher batting average during the
World Series. Hence top performers did relatively better. For the Yankees the relationship is much flatter: a 10
point higher regular season average goes along with only about a 1.4 point higher World Series average.
(b) Both regressions have little explanatory power as seen from the regression R2 . Hence performance
during the season is a poor forecast of World Series performance.
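8) For the simple regression model of Chapter 4, you have been given the following data:
$\sum_{i=1}^{420} Y_i = 274,745.75$; $\sum_{i=1}^{420} X_i = 8,248.979$; $\sum_{i=1}^{420} X_i Y_i = 5,392,705$;
$\sum_{i=1}^{420} X_i^2 = 163,513.03$; $\sum_{i=1}^{420} Y_i^2 = 179,878,841.13$
(a) Calculate the regression slope and the intercept.
(b) Calculate the regression R2.
Answer: (a) $\hat{\beta}_1 = \frac{5,392,705 - 420 \times 19.64 \times 654.16}{163,513.03 - 420 \times 19.64^2} = -2.28$;
$\hat{\beta}_0 = 654.2 - (-2.28) \times 19.6 = 698.9$.
(This is the data set for Chapter 4. The sample means are shown rounded; the results require the unrounded
means $\bar{X} = 19.64043$ and $\bar{Y} = 654.156548$.)
(b) $R^2 = \frac{-2.28 \times (5,392,704.6 - 420 \times 19.6 \times 654.2)}{179,878,841.1 - 420 \times 654.2^2} = 0.051$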
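The calculation in question 8 can be retraced in a few lines of code. The following is a minimal sketch in Python that uses only the five sums given in the question; the variable names are illustrative and not part of the textbook.

```python
# Sums reported in the question (n = 420 observations).
n = 420
sum_y = 274_745.75
sum_x = 8_248.979
sum_xy = 5_392_705.0
sum_x2 = 163_513.03
sum_y2 = 179_878_841.13

# Sample means, kept at full precision.
x_bar = sum_x / n
y_bar = sum_y / n

# Sums of cross-products and squares in deviations from means.
s_xy = sum_xy - n * x_bar * y_bar
s_xx = sum_x2 - n * x_bar ** 2
s_yy = sum_y2 - n * y_bar ** 2   # this is the TSS

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
r_squared = slope * s_xy / s_yy  # ESS / TSS

print(f"slope = {slope:.4f}, intercept = {intercept:.3f}, R2 = {r_squared:.4f}")
```

Because the numerators are small differences of large numbers, rounding the means to two decimals before multiplying by 420 changes the slope noticeably, which is why the sketch keeps full precision.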
9) Your textbook presented you with the following regression output:
TestScore = 698.9 – 2.28 × STR
n = 420, R2 = 0.051, SER = 18.6
(a) How would the slope coefficient change, if you decided one day to measure testscores in 100s, i.e., a
testscore of 650 became 6.5? Would this have an effect on your interpretation?
(b) Do you think the regression R2 will change? Why or why not?
(c) Although Chapter 4 in your textbook did not deal with hypothesis testing, it presented you with the large
sample distribution for the slope and the intercept estimator. Given the change in the units of measurement in
(a), do you think that the variance of the slope estimator will change numerically? Why or why not?
Answer: (a) The new regression line would be NewTestScore = 6.989 - 0.0228 × STR. Hence the decimal point
would simply move two digits to the left. The interpretation remains the same, since an increase in the
student-teacher ratio by 2, say, lowers the new testscore by 0.0456 points on the new testscore scale,
which is 4.56 in the original testscores.
(b) The regression R2 should not change, since, if it did, an objective measure of fit would depend on
whim (the units of measurement). The SER will change (from 18.6 to 0.186). This is to be expected, since
the TSS obviously changes, and with the regression R2 unchanged, the SSR (and hence SER) have to
adjust accordingly.
(c) Since statistical inference will depend on the ratio of the estimator and its standard error, the
standard error must change in proportion to the estimator. If this was not true, then statistical inference
again would depend on the whim of the investigator.
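The claims in question 9 can be checked numerically. The sketch below uses a small hypothetical data set (the eight observations are made up for illustration and are not the California data) and a hand-written `ols` helper; it compares the regression before and after dividing the dependent variable by 100.

```python
import math

def ols(x, y):
    """Return (slope, intercept, R2, SER) for a regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    tss = sum((yi - y_bar) ** 2 for yi in y)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, intercept, 1 - ssr / tss, math.sqrt(ssr / (n - 2))

# Hypothetical data, chosen only to illustrate the change of units.
student_teacher_ratio = [15.0, 17.5, 18.0, 19.5, 20.0, 21.0, 22.5, 24.0]
test_score = [690.0, 655.0, 672.0, 640.0, 661.0, 648.0, 652.0, 630.0]

original = ols(student_teacher_ratio, test_score)
rescaled = ols(student_teacher_ratio, [score / 100 for score in test_score])

for name, a, b in zip(("slope", "intercept", "R2", "SER"), original, rescaled):
    print(f"{name:9s} original = {a:10.4f}   rescaled = {b:10.4f}")
```

In this sketch the slope, the intercept, and the SER are all divided by 100, while the regression R2 is unchanged.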
10) The news-magazine The Economist regularly publishes data on the so called Big Mac index and exchange rates
between countries. The data for 30 countries from the April 29, 2000 issue is listed below:
Country         Currency   Price of    Actual Exchange Rate
                           Big Mac     per U.S. dollar
Indonesia       Rupiah     14,500      7,945
Italy           Lira       4,500       2,088
South Korea     Won        3,000       1,108
Chile           Peso       1,260       514
Spain           Peseta     375         179
Hungary         Forint     339         279
Japan           Yen        294         106
Taiwan          Dollar     70          30.6
Thailand        Baht       55          38.0
Czech Rep.      Crown      54.37       39.1
Russia          Ruble      39.50       28.5
Denmark         Crown      24.75       8.04
Sweden          Crown      24.0        8.84
Mexico          Peso       20.9        9.41
France          Franc      18.5        .07
Israel          Shekel     14.5        4.05
China           Yuan       9.90        8.28
South Africa    Rand       9.0         6.72
Switzerland     Franc      5.90        1.70
Poland          Zloty      5.50        4.30
Germany         Mark       4.99        2.11
Malaysia        Dollar     4.52        3.80
New Zealand     Dollar     3.40        2.01
Singapore       Dollar     3.20        1.70
Brazil          Real       2.95        1.79
Canada          Dollar     2.85        1.47
Australia       Dollar     2.59        1.68
Argentina       Peso       2.50        1.00
Britain         Pound      1.90        0.63
United States   Dollar     2.51
The concept of purchasing power parity or PPP (“the idea that similar foreign and domestic goods … should
have the same price in terms of the same currency,” Abel, A. and B. Bernanke, Macroeconomics, 4th edition,
Boston: Addison Wesley, 476) suggests that the ratio of the Big Mac priced in the local currency to the U.S.
dollar price should equal the exchange rate between the two countries.
(a) Enter the data into your regression analysis program (EViews, Stata, Excel, SAS, etc.). Calculate the
predicted exchange rate per U.S. dollar by dividing the price of a Big Mac in local currency by the U.S. price of
a Big Mac ($2.51).
(b) Run a regression of the actual exchange rate on the predicted exchange rate. If purchasing power parity
held, what would you expect the slope and the intercept of the regression to be? Is the value of the slope and
the intercept “far” from the values you would expect to hold under PPP?
(c) Plot the actual exchange rate against the predicted exchange rate. Include the 45 degree line in your graph.
Which observations might cause the intercept and the slope to differ from zero and one?
Answer: (a)
Country         Predicted Exchange Rate
                per U.S. dollar
Indonesia       5777
Italy           1793
South Korea     1195
Chile           502
Spain           149
Hungary         135
Japan           117
Taiwan          27.9
Thailand        21.9
Czech Rep.      21.7
Russia          15.7
Denmark         9.86
Sweden          9.56
Mexico          8.33
France          7.37
Israel          5.78
China           3.94
South Africa    3.59
Switzerland     2.35
Poland          2.19
Germany         1.99
Malaysia        1.80
New Zealand     1.35
Singapore       1.27
Brazil          1.18
Canada          1.14
Australia       1.03
Argentina       1.00
Britain         0.76
(b) The estimated regression is as follows:
ActualExRate = -27.05 + 1.35 × PredExRate
R2 = 0.994, n = 29, SER = 122.15
For PPP to hold exactly, you would expect an intercept of zero and a slope of unity. Since we do not
know the standard error of the slope and the intercept, and since Chapter 4 has not dealt with
hypothesis testing, it is hard to judge how “far” -27.05 and 1.35 are away from zero and one respectively.
(c) The regression is represented by the solid line, while the dashed one is the 45 degree line. Most of the
observations are bunched towards the origin, making it hard to judge from this graph which
observations cause the regression line to differ from the 45 degree line. However, the Indonesian Rupiah
is certainly a possible candidate.
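Parts (a) and (b) of question 10 can be replicated without a statistics package. The following is a rough sketch in plain Python that builds the predicted exchange rate and runs the regression on the 29 non-U.S. countries. One value is an assumption: the actual exchange rate for France is printed as ".07" in the table, and the sketch assumes 7.07; the estimates barely depend on this choice because the large-currency observations dominate.

```python
US_PRICE = 2.51  # U.S. price of a Big Mac in dollars, from the question

# (country, Big Mac price in local currency, actual exchange rate per U.S. dollar)
big_mac = [
    ("Indonesia", 14500, 7945), ("Italy", 4500, 2088), ("South Korea", 3000, 1108),
    ("Chile", 1260, 514), ("Spain", 375, 179), ("Hungary", 339, 279),
    ("Japan", 294, 106), ("Taiwan", 70, 30.6), ("Thailand", 55, 38.0),
    ("Czech Rep.", 54.37, 39.1), ("Russia", 39.50, 28.5), ("Denmark", 24.75, 8.04),
    ("Sweden", 24.0, 8.84), ("Mexico", 20.9, 9.41),
    ("France", 18.5, 7.07),  # ASSUMPTION: the source prints ".07"
    ("Israel", 14.5, 4.05), ("China", 9.90, 8.28), ("South Africa", 9.0, 6.72),
    ("Switzerland", 5.90, 1.70), ("Poland", 5.50, 4.30), ("Germany", 4.99, 2.11),
    ("Malaysia", 4.52, 3.80), ("New Zealand", 3.40, 2.01), ("Singapore", 3.20, 1.70),
    ("Brazil", 2.95, 1.79), ("Canada", 2.85, 1.47), ("Australia", 2.59, 1.68),
    ("Argentina", 2.50, 1.00), ("Britain", 1.90, 0.63),
]

# (a) Predicted exchange rate: local Big Mac price divided by the U.S. price.
predicted = [price / US_PRICE for _, price, _ in big_mac]
actual = [rate for _, _, rate in big_mac]

# (b) OLS of the actual on the predicted exchange rate.
n = len(big_mac)
x_bar = sum(predicted) / n
y_bar = sum(actual) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(predicted, actual))
s_xx = sum((x - x_bar) ** 2 for x in predicted)
s_yy = sum((y - y_bar) ** 2 for y in actual)

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
r_squared = s_xy ** 2 / (s_xx * s_yy)

print(f"n = {n}, intercept = {intercept:.2f}, slope = {slope:.2f}, R2 = {r_squared:.3f}")
```

The results should come out close to the estimates reported in the answer; small differences can arise because the answer table shows rounded predicted rates.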
11) At the Stock and Watson (http://www.pearsonhighered.com/stock_watson) website go to Student
Resources and select the option “Datasets for Replicating Empirical Results.” Then select the
“California Test Score Data Used in Chapters 4-9” (caschool.xls) and open it in a spreadsheet program
such as Excel.
In this exercise you will estimate various statistics of the Linear Regression Model with One Regressor
through construction of various sums and ratios within a spreadsheet program.
Throughout this exercise, let Y correspond to Test Scores (testscore) and X to the Student Teacher Ratio
(str). To generate answers to all exercises here, you will have to create seven columns and the sums of
five of these. They are
(i) $Y_i$, (ii) $X_i$, (iii) $(Y_i - \bar{Y})$, (iv) $(X_i - \bar{X})$, (v) $(Y_i - \bar{Y}) \times (X_i - \bar{X})$, (vi) $(X_i - \bar{X})^2$, (vii) $(Y_i - \bar{Y})^2$
Although neither the sum of (iii) nor the sum of (iv) will be required for further calculations, you may want to
generate these as a check (both have to sum to zero).
a. Use equation (4.7) and the sums of columns (v) and (vi) to generate the slope of the regression.
b. Use equation (4.8) to generate the intercept.
c. Display the regression line (4.9) and interpret the coefficients.
d. Use equation (4.16) and the sum of column (vii) to calculate the regression R2.
e. Use equation (4.19) to calculate the SER.
f. Use the “Regression” function in Excel to verify the results.
Answer: Column (i), mean: 654.156548
Column (ii), mean: 19.64043
Column (iii), sum: 1.27329E-11
Column (iv), sum: 1.13E-12
Column (v), sum: -3418.76
Column (vi), sum: 1499.58
Column (vii), sum: 152109.6
a. $\hat{\beta}_1 = \frac{-3418.76}{1499.58} = -2.27981$
b. $\hat{\beta}_0 = 654.156548 - (-2.27981) \times 19.64043 = 698.933$
c. $\hat{Y}_i = 698.9 - 2.28 \times X_i$. A decrease in the student-teacher ratio of one results in an increase in test
scores of 2.28. It is best not to interpret the intercept; it simply determines the height of the
regression line.
d. To calculate the regression R2, you need the TSS given from the sum in column (vii) and either the
ESS or SSR. In principle, you could use equation (4.10) to generate the residuals, square these and sum
them up to get SSR. However, the textbook suggests a shortcut at the bottom of p. 142:
$\sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 - \hat{\beta}_1^2 \sum_{i=1}^{n} (X_i - \bar{X})^2$
(the cross-product vanishes due to the orthogonality conditions (4.32) and (4.36)). The various terms on the RHS
of the equation have been calculated, and equation (4.35) implies that
$\hat{\beta}_1^2 \sum_{i=1}^{n} (X_i - \bar{X})^2 = ESS = 7794.11$. Hence the regression $R^2 = \frac{7794.11}{152109.6} = 0.051$.
e. The answer in (d) can be used to calculate the SSR, which is 152109.6 - 7794.11 = 144315.5. Hence the SER
must be $\sqrt{144315.5 / 418} = 18.6$.
f.
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.226
R Square             0.051
Adjusted R Square    0.049
Standard Error       18.581
Observations         420

ANOVA
              df     SS
Regression    1      7794.11
Residual      418    144315.5
Total         419    152109.6

              Coefficients
Intercept     698.93
str           -2.28
12) You have obtained a sample of 14,925 individuals from the Current Population Survey (CPS) and are interested
in the relationship between average hourly earnings and years of education. The regression yields the
following result:
$\widehat{ahe} = -4.58 + 1.71 \times educ$, R2 = 0.182, SER = 9.30
where ahe and educ are measured in dollars and years respectively.
a. Interpret the coefficients and the regression R2.
b. Is the effect of education on earnings large?
c. Why should education matter in the determination of earnings? Do the results suggest that there is
a guarantee for average hourly earnings to rise for everyone as they receive an additional year of
education? Do you think that the relationship between education and average hourly earnings is
linear?
d. The average years of education in this sample is 13.5 years. What is the mean of average hourly
earnings in the sample?
e. Interpret the measure SER. What is its unit of measurement?
Answer: a. A person with one more year of education is predicted to earn $1.71 more per hour, on average. There
is no meaning attached to the intercept; it just determines the height of the regression. The model explains
18.2 percent of the variation in average hourly earnings.
b. The difference between a high school graduate and a college graduate is four years of education.
Hence a college graduate will earn almost $7 more per hour, on average ($6.84 to be precise). If you
assume that there are 2,000 working hours per year, then the average salary difference would be close to
$14,000 (actually $13,680). Depending on how much you have spent for an additional year of education
and how much income you have forgone, this does not seem particularly large.
c. In general, you would expect to find a positive relationship between years of education and average
hourly earnings. Education is considered investment in human capital. If this were not the case, then it
would be a puzzle as to why there are students in the econometrics course — surely they are not there to
just “find themselves” (which would be quite expensive in most cases). However, if you consider
education as an investment and you wanted to see a return on it, then the relationship will most likely
not be linear. For example, a constant percent return would imply an exponential relationship whereby
the additional year of education would bring a larger increase in average hourly earnings at higher
levels of education. The results do not suggest that there is a guarantee for earnings to rise for everyone
as they become more educated since the regression R2 does not equal 1. Instead the result holds “on
average.”
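d. Since $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$, it follows that $\bar{Y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{X}$. Substituting the estimates for the
slope and the intercept then gives -4.58 + 1.71 × 13.5 = 18.505, i.e., a mean of average hourly earnings of
roughly $18.50.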
e. The typical prediction error is $9.30. Since the measure is related to the deviation of the actual and
fitted values, the unit of measurement must be the same as that of the dependent variable, which is in
dollars here.
4.3 Mathematical and Graphical Problems
1) Prove that the regression R2 is identical to the square of the correlation coefficient between two variables Y and
X. Regression functions are written in a form that suggests causation running from X to Y. Given your proof,
does a high regression R2 present supportive evidence of a causal relationship? Can you think of some
regression examples where the direction of causality is not clear? Where it is without a doubt?
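Answer: The regression $R^2 = \frac{ESS}{TSS}$, where ESS is given by $\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$. But $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$ and
$\bar{Y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{X}$. Hence $(\hat{Y}_i - \bar{Y})^2 = \hat{\beta}_1^2 (X_i - \bar{X})^2$ and therefore $ESS = \hat{\beta}_1^2 \sum_{i=1}^{n} (X_i - \bar{X})^2$.
Using small letters to indicate deviations from mean, i.e., $z_i = Z_i - \bar{Z}$, we get that the regression
$R^2 = \frac{\hat{\beta}_1^2 \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2}$. The square of the correlation coefficient is
$r^2 = \frac{(\sum_{i=1}^{n} y_i x_i)^2}{\sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} y_i^2} = \frac{(\sum_{i=1}^{n} y_i x_i)^2 \sum_{i=1}^{n} x_i^2}{(\sum_{i=1}^{n} x_i^2)^2 \sum_{i=1}^{n} y_i^2} = \frac{\hat{\beta}_1^2 \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2}$.
Hence the two are the same. Correlation does not imply causation. Income is a regressor in the consumption function,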
yet consumption enters on the right-hand side of the GDP identity. Regressing the weight of individuals
on the height is a situation where causality is without doubt, since the author of this test bank should be
seven feet tall otherwise. The authors of the textbook use weather data to forecast orange juice prices
later in the text.
2) You have analyzed the relationship between the weight and height of individuals. Although you are quite
confident about the accuracy of your measurements, you feel that some of the observations are extreme, say,
two standard deviations above and below the mean. You therefore decide to disregard these individuals.
What consequence will this have on the standard deviation of the OLS estimator of the slope?
Answer: Other things being equal, the standard error of the slope coefficient will decrease the larger the variation
in X. Hence you prefer more variation rather than less. This can be seen from formula (4.20) in the text.
Intuitively it is easier for OLS to detect a response to a unit change in X if the data varies more.
Discarding the extreme observations reduces the variation in X and therefore tends to increase the standard
deviation of the slope estimator.
3) In order to calculate the regression R2 you need the TSS and either the SSR or the ESS. The TSS is fairly
straightforward to calculate, being just the variation of Y. However, if you had to calculate the SSR or ESS by
hand (or in a spreadsheet), you would need all fitted values from the regression function and their deviations
from the sample mean, or the residuals. Can you think of a quicker way to calculate the ESS simply using
terms you have already used to calculate the slope coefficient?
Answer: The ESS is given by $\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$. But $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$ and $\bar{Y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{X}$.
Hence $(\hat{Y}_i - \bar{Y})^2 = \hat{\beta}_1^2 (X_i - \bar{X})^2$, and therefore $ESS = \hat{\beta}_1^2 \sum_{i=1}^{n} (X_i - \bar{X})^2$. The right-hand side
contains the estimated slope squared and the denominator of the slope, i.e., all values that have already been
calculated.
4) (Requires Appendix material) In deriving the OLS estimator, you minimize the sum of squared residuals with
respect to the two parameters $\hat{\beta}_0$ and $\hat{\beta}_1$. The resulting two equations imply two restrictions that OLS places on
the data, namely that $\sum_{i=1}^{n} \hat{u}_i = 0$ and $\sum_{i=1}^{n} \hat{u}_i X_i = 0$. Show that you get the same formula for the regression slope
and the intercept if you impose these two conditions on the sample regression function.
Answer: The sample regression function is $Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{u}_i$. Summing both sides results in
$\sum_{i=1}^{n} Y_i = n\hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} \hat{u}_i$.
Imposing the first restriction, namely that the sum of the residuals is zero, dividing both sides of the equation
by n, and solving for $\hat{\beta}_0$ gives the OLS formula for the intercept.
For the second restriction, multiply both sides of the sample regression function by $X_i$ and then sum
both sides to get
$\sum_{i=1}^{n} Y_i X_i = \hat{\beta}_0 \sum_{i=1}^{n} X_i + \hat{\beta}_1 \sum_{i=1}^{n} X_i^2 + \sum_{i=1}^{n} \hat{u}_i X_i$.
After imposing the restriction $\sum_{i=1}^{n} \hat{u}_i X_i = 0$ and substituting the formula for the intercept, you get
$\sum_{i=1}^{n} Y_i X_i = (\bar{Y} - \hat{\beta}_1 \bar{X}) n \bar{X} + \hat{\beta}_1 \sum_{i=1}^{n} X_i^2$ or $\sum_{i=1}^{n} Y_i X_i - n\bar{Y}\bar{X} = \hat{\beta}_1 \left( \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 \right)$,
which, after isolating $\hat{\beta}_1$ and dividing by the variation in X, results in the OLS estimator for the slope.
5) (Requires Appendix material) Show that the two alternative formulae for the slope given in your textbook are
identical.
$\frac{\frac{1}{n} \sum_{i=1}^{n} X_i Y_i - \bar{X}\bar{Y}}{\frac{1}{n} \sum_{i=1}^{n} X_i^2 - \bar{X}^2} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$
Answer: Let’s start with the numerator of the right-hand side expression, which can be written as follows:
$\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = \sum_{i=1}^{n} (X_i Y_i - \bar{X} Y_i - \bar{Y} X_i + \bar{X}\bar{Y}) = \sum_{i=1}^{n} X_i Y_i - \bar{X} \sum_{i=1}^{n} Y_i - \bar{Y} \sum_{i=1}^{n} X_i + n\bar{X}\bar{Y}$
$= \sum_{i=1}^{n} Y_i X_i - n\bar{X}\bar{Y} - n\bar{X}\bar{Y} + n\bar{X}\bar{Y} = \sum_{i=1}^{n} Y_i X_i - n\bar{X}\bar{Y}$. (Note that $\sum_{i=1}^{n} X_i = n\bar{X}$.)
Multiplying out the terms in the denominator and moving the summation sign into the expression in
parentheses similarly yields $\sum_{i=1}^{n} X_i^2 - n\bar{X}^2$. Dividing both of these expressions by n then results in the
left-hand side fraction.
6) (Requires Calculus) Consider the following model:
$Y_i = \beta_0 + u_i$.
Derive the OLS estimator for $\beta_0$.
Answer: To derive the OLS estimator, minimize the sum of squared prediction mistakes $\sum_{i=1}^{n} (Y_i - b_0)^2$. Taking
the derivative with respect to $b_0$ results in
$\frac{\partial}{\partial b_0} \sum_{i=1}^{n} (Y_i - b_0)^2 = \sum_{i=1}^{n} \frac{\partial}{\partial b_0} (Y_i - b_0)^2 = \sum_{i=1}^{n} 2(Y_i - b_0)(-1) = (-2) \sum_{i=1}^{n} (Y_i - b_0) = (-2) \left( \sum_{i=1}^{n} Y_i - n b_0 \right)$.
Setting the derivative to zero then results in the OLS estimator:
$(-2) \left( \sum_{i=1}^{n} Y_i - n\hat{\beta}_0 \right) = 0 \Rightarrow \hat{\beta}_0 = \bar{Y}$.
7) (Requires Calculus) Consider the following model:
$Y_i = \beta_1 X_i + u_i$.
Derive the OLS estimator for $\beta_1$.
Answer: To derive the OLS estimator, minimize the sum of squared prediction mistakes $\sum_{i=1}^{n} (Y_i - b_1 X_i)^2$. Taking
the derivative with respect to $b_1$ results in
$\frac{\partial}{\partial b_1} \sum_{i=1}^{n} (Y_i - b_1 X_i)^2 = \sum_{i=1}^{n} \frac{\partial}{\partial b_1} (Y_i - b_1 X_i)^2 = \sum_{i=1}^{n} 2(Y_i - b_1 X_i)(-X_i)$
$= (-2) \sum_{i=1}^{n} (Y_i - b_1 X_i)(X_i) = (-2) \sum_{i=1}^{n} (Y_i X_i - b_1 X_i^2)$.
Setting the derivative to zero then results in the OLS estimator:
$(-2) \left( \sum_{i=1}^{n} Y_i X_i - \hat{\beta}_1 \sum_{i=1}^{n} X_i^2 \right) = 0 \Rightarrow \hat{\beta}_1 = \frac{\sum_{i=1}^{n} Y_i X_i}{\sum_{i=1}^{n} X_i^2}$.
8) Show first that the regression R2 is the square of the sample correlation coefficient. Next, show that the slope of
a simple regression of Y on X is only identical to the inverse of the regression slope of X on Y if the regression
R2 equals one.
Answer: The regression $R^2 = \frac{ESS}{TSS}$, where ESS is given by $\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$. But $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$ and
$\bar{Y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{X}$. Hence $(\hat{Y}_i - \bar{Y})^2 = \hat{\beta}_1^2 (X_i - \bar{X})^2$, and therefore $ESS = \hat{\beta}_1^2 \sum_{i=1}^{n} (X_i - \bar{X})^2$.
Using small letters to indicate deviations from mean, i.e., $z_i = Z_i - \bar{Z}$, we get that the regression
$R^2 = \frac{\hat{\beta}_1^2 \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2}$. The square of the correlation coefficient is
$r^2 = \frac{(\sum_{i=1}^{n} y_i x_i)^2}{\sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} y_i^2} = \frac{(\sum_{i=1}^{n} y_i x_i)^2 \sum_{i=1}^{n} x_i^2}{(\sum_{i=1}^{n} x_i^2)^2 \sum_{i=1}^{n} y_i^2} = \frac{\hat{\beta}_1^2 \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2}$.
Hence the two are the same.
Now $1 = r^2 = \frac{\hat{\beta}_1^2 \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2} \Rightarrow \hat{\beta}_1^2 = \frac{\sum_{i=1}^{n} y_i^2}{\sum_{i=1}^{n} x_i^2}$. But $\hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}$,
and therefore, dividing the first expression by the second, $\hat{\beta}_1 = \frac{\sum_{i=1}^{n} y_i^2}{\sum_{i=1}^{n} x_i y_i}$,
which is the inverse of the regression slope of X on Y.
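A quick numerical check of question 8 may help. The sketch below uses a small hypothetical data set (the six observations are made up) and computes both regressions from sums in deviation form; it illustrates that the regression R2 equals the squared correlation in either direction, and that the product of the two slopes equals the regression R2, so the slopes are inverses of each other only when R2 is one.

```python
import math

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.5, 6.9]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Sums in deviations from means.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

slope_y_on_x = s_xy / s_xx            # regression of Y on X
slope_x_on_y = s_xy / s_yy            # regression of X on Y
r = s_xy / math.sqrt(s_xx * s_yy)     # sample correlation coefficient

r2_y_on_x = slope_y_on_x ** 2 * s_xx / s_yy   # ESS / TSS, Y on X
r2_x_on_y = slope_x_on_y ** 2 * s_yy / s_xx   # ESS / TSS, X on Y

print(f"r^2                  = {r ** 2:.6f}")
print(f"R2 (Y on X)          = {r2_y_on_x:.6f}")
print(f"R2 (X on Y)          = {r2_x_on_y:.6f}")
print(f"product of slopes    = {slope_y_on_x * slope_x_on_y:.6f}")
print(f"slope of Y on X      = {slope_y_on_x:.6f}")
print(f"1 / (slope X on Y)   = {1 / slope_x_on_y:.6f}")
```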
9) Consider the sample regression function
$Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{u}_i$.
First, take averages on both sides of the equation. Second, subtract the resulting equation from the above
equation to write the sample regression function in deviations from means. (For simplicity, you may want to
use small letters to indicate deviations from the mean, i.e., $z_i = Z_i - \bar{Z}$.) Finally, illustrate in a two-dimensional
diagram with SSR on the vertical axis and the regression slope on the horizontal axis how you could find the
least squares estimator for the slope by varying its values through trial and error.
Answer: Taking averages results in the following equation: $\bar{Y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{X}$ (the residuals average to zero).
Subtracting this equation from the above one, we get $y_i = \hat{\beta}_1 x_i + \hat{u}_i$.
$SSR = \sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (y_i - \hat{\beta}_1 x_i)^2$ is a quadratic which takes on different values for different choices of $\hat{\beta}_1$
(the y and x are given in this case, i.e., different from the usual calculus problems, they cannot vary
here). You could choose a starting value of the slope and calculate SSR. Next you could choose a
different value for the slope and calculate the new SSR. There are two choices for the new slope value for
you to make: first, in which direction you want to move, and second, how large a distance you want to
choose the new slope value from the old one. (In essence, this is what sophisticated search algorithms
do.) You continue with this procedure until you find the smallest SSR. The slope coefficient which has
generated this SSR is the OLS estimator.
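The trial-and-error procedure described in question 9 can be sketched in code. The example below is one simple way to do it, not the method of any particular software package: it starts from an arbitrary slope, tries a step in each direction, keeps any move that lowers SSR, and halves the step size when neither direction helps. The data and the function names are hypothetical.

```python
def ssr(b, x_dev, y_dev):
    """Sum of squared residuals for slope b, with data in deviations from means."""
    return sum((yi - b * xi) ** 2 for xi, yi in zip(x_dev, y_dev))

def search_slope(x_dev, y_dev, start=0.0, step=1.0, tol=1e-10):
    """Find the slope that minimizes SSR by trial and error."""
    b = start
    best = ssr(b, x_dev, y_dev)
    while step > tol:
        moved = False
        for candidate in (b + step, b - step):   # which direction to move
            value = ssr(candidate, x_dev, y_dev)
            if value < best:
                b, best, moved = candidate, value, True
                break
        if not moved:
            step /= 2                            # how far to move next time
    return b

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.5, 6.9]
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
x_dev = [xi - x_bar for xi in x]
y_dev = [yi - y_bar for yi in y]

searched = search_slope(x_dev, y_dev)
closed_form = sum(a * b for a, b in zip(x_dev, y_dev)) / sum(a * a for a in x_dev)
print(f"search: {searched:.6f}   formula: {closed_form:.6f}")
```

Because SSR is a quadratic in the slope, any such search ends up at the same value as the OLS formula, up to the chosen tolerance.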
10) Given the amount of money and effort that you have spent on your education, you wonder if it was (is) all
worth it. You therefore collect data from the Current Population Survey (CPS) and estimate a linear
relationship between earnings and the years of education of individuals. What would be the effect on your
regression slope and intercept if you measured earnings in thousands of dollars rather than in dollars? Would
the regression R2 be affected? Should statistical inference be dependent on the scale of variables? Discuss.
Answer: It should be clear that interpretation of estimated relationships and statistical inference should not
depend on the units of measurement. Otherwise whim could dictate conclusions. Hence the regression
R2 and statistical inference cannot be affected. It is easy but tedious to show this mathematically. Next,
measuring earnings in thousands of dollars divides the dependent variable by 1,000. The intercept, which
indicates the value of Y when X is zero, and the slope, which indicates the change in Y for an additional
year of education, are both expressed in units of Y. Hence both are divided by 1,000, i.e., in each case the
decimal point will move 3 digits to the left.
11) (Requires Appendix material) Consider the sample regression function
$Y_i^{*} = \hat{\beta}_0^{*} + \hat{\beta}_1^{*} X_i^{*} + \hat{u}_i$,
where * indicates that the variable has been standardized. What are the units of measurement for the
dependent and explanatory variable? Why would you want to transform both variables in this way? Show that
the OLS estimator for the intercept equals zero. Next prove that the OLS estimator for the slope in this case is
identical to the formula for the least squares estimator where the variables have not been standardized, times
the ratio of the sample standard deviation of X and Y, i.e., $\hat{\beta}_1^{*} = \hat{\beta}_1 \times \frac{S_X}{S_Y}$.
Answer: The units of measurement are in standard deviations. Standardizing the variables allows conversion into
common units and allows comparison of the size of coefficients. The mean of standardized variables is
zero, and hence the OLS intercept must also be zero. The slope coefficient is given by the formula
$\hat{\beta}_1^{*} = \frac{\sum_{i=1}^{n} x_i^{*} y_i^{*}}{\sum_{i=1}^{n} x_i^{*2}}$, where small letters indicate deviations from mean, i.e., $z = Z - \bar{Z}$.
Note that means of standardized variables are zero, and hence we get $\hat{\beta}_1^{*} = \frac{\sum_{i=1}^{n} X_i^{*} Y_i^{*}}{\sum_{i=1}^{n} X_i^{*2}}$. Writing this
expression in terms of originally observed variables results in
$\hat{\beta}_1^{*} = \frac{\frac{1}{S_X} \frac{1}{S_Y} \sum_{i=1}^{n} x_i y_i}{\frac{1}{S_X^2} \sum_{i=1}^{n} x_i^2}$, which is the same
as the sought after expression after simplification.
12) The OLS slope estimator is not defined if there is no variation in the data for the explanatory variable. You are
interested in estimating a regression relating earnings to years of schooling. Imagine that you had collected
data on earnings for different individuals, but that all these individuals had completed a college education (16
years of education). Sketch what the data would look like and explain intuitively why the OLS coefficient does
not exist in this situation.
Answer: There is no variation in X in this case, and it is therefore unreasonable to ask by how much Y would
change if X changed by one unit. Regression analysis cannot figure out the answer to this question,
because a change in X never happens in the sample.
13) Indicate in a scatterplot what the data for your dependent variable and your explanatory variable would look
like in a regression with an R2 equal to zero. How would this change if the regression R2 was equal to one?
Answer: For the zero regression R2 , the data would look something like this:
In the case of the regression R2 being one, all observations would lie on a straight line.
14) Imagine that you had discovered a relationship that would generate a scatterplot very similar to the
relationship $Y_i = X_i^2$, and that you would try to fit a linear regression through your data points. What do you
expect the slope coefficient to be? What do you think the value of your regression R2 is in this situation? What
are the implications from your answers in terms of fitting a linear regression through a non-linear relationship?
Answer: If the X values are spread symmetrically around zero, you would expect the regression line to be flat
(slope = 0) and the regression R2 to be zero in this situation.
The implication is that although there may be a relationship between two variables, you may not detect
it if you use the wrong functional form.
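The answer to question 14 depends on how the X values are located. As a small illustration, the sketch below fits a line through $Y_i = X_i^2$ for two hypothetical sets of X values: one symmetric around zero and one that is entirely non-negative. The helper name `slope_and_r2` is made up for this example.

```python
def slope_and_r2(x, y):
    """OLS slope and regression R2 for a regression of y on x (with intercept)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    slope = s_xy / s_xx
    return slope, slope * s_xy / s_yy

# Case 1: X symmetric around zero (hypothetical values).
x_sym = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
slope_sym, r2_sym = slope_and_r2(x_sym, [xi ** 2 for xi in x_sym])

# Case 2: X non-negative only (hypothetical values).
x_pos = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
slope_pos, r2_pos = slope_and_r2(x_pos, [xi ** 2 for xi in x_pos])

print(f"symmetric X:    slope = {slope_sym:.3f}, R2 = {r2_sym:.3f}")
print(f"non-negative X: slope = {slope_pos:.3f}, R2 = {r2_pos:.3f}")
```

With symmetric X the linear fit finds nothing even though Y is an exact function of X; with only non-negative X the line picks up part of the curvature, so the zero-slope result is specific to the symmetric case.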
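15) (Requires Appendix material) A necessary and sufficient condition to derive the OLS estimator is that the
following two conditions hold: $\sum_{i=1}^{n} \hat{u}_i = 0$ and $\sum_{i=1}^{n} \hat{u}_i X_i = 0$. Show that these conditions imply that
$\sum_{i=1}^{n} \hat{u}_i \hat{Y}_i = 0$.
Answer: $\sum_{i=1}^{n} \hat{u}_i \hat{Y}_i = \sum_{i=1}^{n} \hat{u}_i (\hat{\beta}_0 + \hat{\beta}_1 X_i) = \hat{\beta}_0 \sum_{i=1}^{n} \hat{u}_i + \hat{\beta}_1 \sum_{i=1}^{n} \hat{u}_i X_i = 0$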
16) The help function for a commonly used spreadsheet program gives the following definition for the regression
slope it estimates:
$\frac{n \sum_{i=1}^{n} X_i Y_i - (\sum_{i=1}^{n} X_i)(\sum_{i=1}^{n} Y_i)}{n \sum_{i=1}^{n} X_i^2 - (\sum_{i=1}^{n} X_i)^2}$
Prove that this formula is the same as the one given in the textbook.
Answer: $\frac{n \sum_{i=1}^{n} X_i Y_i - (\sum_{i=1}^{n} X_i)(\sum_{i=1}^{n} Y_i)}{n \sum_{i=1}^{n} X_i^2 - (\sum_{i=1}^{n} X_i)^2} = \frac{n \sum_{i=1}^{n} X_i Y_i - n\bar{X} n\bar{Y}}{n \sum_{i=1}^{n} X_i^2 - (n\bar{X})^2} = \frac{\sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y}}{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}$.
Dividing both numerator and denominator by n then gives you the desired result.
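The equivalence in question 16 is easy to confirm numerically. The sketch below evaluates the spreadsheet formula and the textbook formula in deviations from means on a small hypothetical data set; the variable names are illustrative.

```python
import math

# Hypothetical data, for illustration only.
x = [15.0, 17.5, 18.0, 19.5, 20.0, 21.0, 22.5, 24.0]
y = [690.0, 655.0, 672.0, 640.0, 661.0, 648.0, 652.0, 630.0]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

# Spreadsheet help-function formula, based on raw sums.
slope_spreadsheet = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Textbook formula, based on deviations from means.
x_bar, y_bar = sum_x / n, sum_y / n
slope_textbook = (
    sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    / sum((xi - x_bar) ** 2 for xi in x)
)

print(f"spreadsheet: {slope_spreadsheet:.6f}   textbook: {slope_textbook:.6f}")
```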
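17) In order to calculate the slope, the intercept, and the regression R2 for a simple sample regression function, list
the five sums of data that you need.
Answer: Depending on whether or not the data is in deviations from means ($z_i = Z_i - \bar{Z}$ or $Z_i$, say), you need
the following sums:
$\sum_{i=1}^{n} Y_i$, $\sum_{i=1}^{n} X_i$, $\sum_{i=1}^{n} x_i y_i$, $\sum_{i=1}^{n} y_i^2$, $\sum_{i=1}^{n} x_i^2$ (data in deviation form) or
$\sum_{i=1}^{n} Y_i$, $\sum_{i=1}^{n} X_i$, $\sum_{i=1}^{n} X_i Y_i$, $\sum_{i=1}^{n} Y_i^2$, $\sum_{i=1}^{n} X_i^2$.
Using these five sums, you can calculate the slope $\hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}$, the intercept $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$, and the
regression $R^2 = \frac{\hat{\beta}_1 \sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} y_i^2} = \frac{\hat{\beta}_1^2 \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2}$. Alternatively,
if the data is not given in deviation form, the formulae are as follows:
$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} Y_i X_i - n\bar{X}\bar{Y}}{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}$, and for the
regression $R^2 = \frac{\hat{\beta}_1 (\sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y})}{\sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2} = \frac{\hat{\beta}_1^2 (\sum_{i=1}^{n} X_i^2 - n\bar{X}^2)}{\sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2}$.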
18) A peer of yours, who is a major in another social science, says he is not interested in the regression slope and/or
intercept. Instead he only cares about correlations. For example, in the testscore/student-teacher ratio
regression, he claims to get all the information he needs from the negative correlation coefficient
corr(X,Y)=-0.226. What response might you have for your peer?
Answer: First of all, the regression slope is related to the regression R2, and hence its square root, the correlation
coefficient, since
$R^2 = \frac{\hat{\beta}_1 (\sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y})}{\sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2} = \frac{\hat{\beta}_1^2 (\sum_{i=1}^{n} X_i^2 - n\bar{X}^2)}{\sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2}$.
However, while the correlation coefficient tells you something about the direction and strength of the
relationship between two variables, it does not inform you about the effect of a one unit increase in the
explanatory variable. Hence it cannot answer the question whether or not the relationship is important
(although even with the knowledge of the slope coefficient, this requires further information). Your
peer would not be able to answer the questions which policy makers and researchers are typically
interested in, such as, what would be the effect on test scores of a reduction in the student-teacher ratio
by one?
19) Assume that there is a change in the units of measurement on both Y and X. The new variables are Y* = aY and
X* = bX. What effect will this change have on the regression slope?
Answer: We now have the following sample regression function $\hat{Y}^{*} = \hat{\beta}_0^{*} + \hat{\beta}_1^{*} X^{*}$. The formula for the slope will
be
$\hat{\beta}_1^{*} = \frac{\sum_{i=1}^{n} x_i^{*} y_i^{*}}{\sum_{i=1}^{n} x_i^{*2}} = \frac{\sum_{i=1}^{n} (b x_i)(a y_i)}{\sum_{i=1}^{n} (b x_i)^2} = \frac{ab \sum_{i=1}^{n} x_i y_i}{b^2 \sum_{i=1}^{n} x_i^2} = \frac{a}{b} \hat{\beta}_1$.
20) Assume that there is a change in the units of measurement on X. The new variable is X* = bX. Prove that this
change in the units of measurement on the explanatory variable has no effect on the intercept in the resulting
regression.
Answer: Consider the sample regression function $\hat{Y} = \hat{\beta}_0^{*} + \hat{\beta}_1^{*} X^{*}$. The formula for the intercept will be
$\hat{\beta}_0^{*} = \bar{Y} - \hat{\beta}_1^{*} b\bar{X}$. But
$\hat{\beta}_1^{*} = \frac{\sum_{i=1}^{n} x_i^{*} y_i}{\sum_{i=1}^{n} x_i^{*2}} = \frac{\sum_{i=1}^{n} (b x_i) y_i}{\sum_{i=1}^{n} (b x_i)^2} = \frac{b \sum_{i=1}^{n} x_i y_i}{b^2 \sum_{i=1}^{n} x_i^2} = \frac{1}{b} \hat{\beta}_1$.
Hence $\hat{\beta}_0^{*} = \bar{Y} - \frac{1}{b} \hat{\beta}_1 b\bar{X} = \bar{Y} - \hat{\beta}_1 \bar{X} = \hat{\beta}_0$.
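The results of questions 19 and 20 can be verified together. The sketch below rescales a small hypothetical data set with arbitrarily chosen factors a and b and compares the resulting coefficients with the originals; the `ols` helper and all values are assumptions made for this illustration.

```python
import math

def ols(x, y):
    """Return (slope, intercept) for a regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = (
        sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        / sum((xi - x_bar) ** 2 for xi in x)
    )
    return slope, y_bar - slope * x_bar

# Hypothetical data and scale factors, for illustration only.
x = [8.0, 10.0, 12.0, 12.0, 14.0, 16.0, 16.0, 18.0]
y = [9.5, 11.0, 14.0, 16.5, 17.0, 22.0, 25.5, 30.0]
a, b = 0.001, 12.0   # Y* = aY, X* = bX

slope, intercept = ols(x, y)

# Question 19: both variables rescaled.
slope_both, _ = ols([b * xi for xi in x], [a * yi for yi in y])

# Question 20: only X rescaled.
slope_x_only, intercept_x_only = ols([b * xi for xi in x], y)

print(f"slope with Y* and X*: {slope_both:.8f}   (a/b) * slope: {a / b * slope:.8f}")
print(f"intercept with X* only: {intercept_x_only:.6f}   original: {intercept:.6f}")
```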
21) At the Stock and Watson (http://www.pearsonhighered.com/stock_watson ) website, go to Student Resources
and select the option “Datasets for Replicating Empirical Results.” Then select the “California Test Score Data
Used in Chapters 4-9” and read the data either into Excel or STATA (or another statistical program). First run a
regression where the dependent variable is test scores and the independent variable is the student-teacher
ratio. Record the regression R2 . Then run a regression where the dependent variable is the student-teacher
ratio and the independent variable is test scores. Record the regression R2 from this regression. How do they
compare?
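Answer: The regression R2 is 0.051 in both regressions, confirming the idea that the regression R2 is simply the square of the correlation coefficient between the two variables. This can also be shown formally as follows:
The regression $R^2 = \frac{ESS}{TSS}$, where ESS is given by $\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$. But $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$ and $\bar{Y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{X}$.
Hence $(\hat{Y}_i - \bar{Y})^2 = \hat{\beta}_1^2 (X_i - \bar{X})^2$ and therefore $ESS = \hat{\beta}_1^2 \sum_{i=1}^{n} (X_i - \bar{X})^2$. Using small letters to indicate deviations from the mean, i.e., $z_i = Z_i - \bar{Z}$, we get that the regression
$$R^2 = \frac{\hat{\beta}_1^2 \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2}.$$
The square of the correlation coefficient is
$$r^2 = \frac{\left(\sum_{i=1}^{n} y_i x_i\right)^2}{\sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} y_i^2} = \frac{\left(\sum_{i=1}^{n} y_i x_i\right)^2 \sum_{i=1}^{n} x_i^2}{\left(\sum_{i=1}^{n} x_i^2\right)^2 \sum_{i=1}^{n} y_i^2} = \frac{\hat{\beta}_1^2 \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2}.$$
Hence the two are the same.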
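The equality of the two R2 values can also be illustrated without the California data set. The sketch below uses a handful of made-up observations (an assumption for illustration only) and hypothetical helper names; it computes R2 as ESS/TSS in both directions and compares it with the squared correlation.

```python
import math

def r_squared(x, y):
    """R^2 = ESS / TSS from the OLS regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    slope = sxy / sxx
    ess = slope ** 2 * sxx          # explained sum of squares
    return ess / syy                # syy is the total sum of squares

def correlation(x, y):
    """Sample correlation coefficient between x and y."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data (not the California test score data).
x = [15.0, 17.5, 19.0, 20.5, 22.0, 24.0]
y = [680.0, 661.0, 665.0, 650.0, 655.0, 640.0]

r2_y_on_x = r_squared(x, y)
r2_x_on_y = r_squared(y, x)
r = correlation(x, y)
print(r2_y_on_x, r2_x_on_y, r ** 2)
```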
22) At the Stock and Watson (http://www.pearsonhighered.com/stock_watson ) website, go to Student Resources
and select the option “Datasets for Replicating Empirical Results.” Then select the “California Test Score Data
Used in Chapters 4-9” and read the data either into Excel or STATA (or another statistical program).
Run a regression of the average reading score (read_scr) on the average math score (math_scr). What values for
the slope and the intercept would you expect? Interpret the coefficients in the resulting regression output and
the regression R2 .
Answer: On average, it would seem plausible, a priori, that schools which score high on the math score would also
do well in the reading score. Perhaps an underlying variable, such as genes, parental interest, or the
quality of teachers, is driving results in both. The relationship is close to the 45 degree line, where the
intercept would be zero and the slope would be one. Interpreted literally, 85 percent of the variation in
the reading score is explained by our model.
23) In a simple regression with an intercept and a single explanatory variable, the variation in Y ($TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$) can be decomposed into the explained sum of squares ($ESS = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$) and the sum of squared residuals ($SSR = \sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$) (see, for example, equation (4.35) in the textbook).
Consider any regression line, positively or negatively sloped in {X,Y} space. Draw a horizontal line where, hypothetically, you consider the sample mean of Y ($\bar{Y}$) to be. Next add a single actual observation of Y.
In this graph, indicate where you find the following distances: the
(i) residual
(ii) actual minus the mean of Y
(iii) fitted value minus the mean of Y
Answer:
Chapter 5 Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals
5.1 Multiple Choice
1) Heteroskedasticity means that
A) homogeneity cannot be assumed automatically for the model.
B) the variance of the error term is not constant.
C) the observed units have different preferences.
D) agents are not all rational.
Answer: B
2) With heteroskedastic errors, the weighted least squares estimator is BLUE. You should use OLS with
heteroskedasticity-robust standard errors because
A) this method is simpler.
B) the exact form of the conditional variance is rarely known.
C) the Gauss-Markov theorem holds.
D) your spreadsheet program does not have a command for weighted least squares.
Answer: B
3) When estimating a demand function for a good where quantity demanded is a linear function of the price, you
should
A) not include an intercept because the price of the good is never zero.
B) use a one-sided alternative hypothesis to check the influence of price on quantity.
C) use a two-sided alternative hypothesis to check the influence of price on quantity.
D) reject the idea that price determines demand unless the coefficient is at least 1.96.
Answer: B
4) The t-statistic is calculated by dividing
A) the OLS estimator by its standard error.
B) the slope by the standard deviation of the explanatory variable.
C) the estimator minus its hypothesized value by the standard error of the estimator.
D) the slope by 1.96.
Answer: C
5) The confidence interval for the sample regression function slope
A) can be used to conduct a test about a hypothesized population regression function slope.
B) can be used to compare the value of the slope relative to that of the intercept.
C) adds and subtracts 1.96 from the slope.
D) allows you to make statements about the economic importance of your estimate.
Answer: A
6) If the absolute value of your calculated t-statistic exceeds the critical value from the standard normal
distribution, you can
A) reject the null hypothesis.
B) safely assume that your regression results are significant.
C) reject the assumption that the error terms are homoskedastic.
D) conclude that most of the actual values are very close to the regression line.
Answer: A
7) Under the least squares assumptions (zero conditional mean for the error term, Xi and Yi being i.i.d., and Xi
and ui having finite fourth moments), the OLS estimator for the slope and intercept
A) has an exact normal distribution for n > 15.
B) is BLUE.
C) has a normal distribution even in small samples.
D) is unbiased.
Answer: D
8) In general, the t-statistic has the following form:
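A) $\frac{\text{estimate} - \text{hypothesized value}}{\text{standard error of estimate}}$
B) $\frac{\text{estimator}}{\text{standard error of estimator}}$
C) $\frac{\text{estimator} - \text{hypothesized value}}{\text{standard error of estimator}}$
D) $\frac{\text{estimator} - \text{hypothesized value}}{\text{standard error of estimator} / \sqrt{n}}$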
Answer: C
9) Consider the following regression line: TestScore = 698.9 – 2.28 × STR. You are told that the t-statistic on the
slope coefficient is 4.38. What is the standard error of the slope coefficient?
A) 0.52
B) 1.96
C) -1.96
D) 4.38
Answer: A
10) Imagine that you were told that the t-statistic for the slope coefficient of the regression line TestScore = 698.9 –
2.28 × STR was 4.38. What are the units of measurement for the t-statistic?
A) points of the test score
B) number of students per teacher
C) TestScore / STR
D) standard deviations
Answer: D
11) The construction of the t-statistic for a one- and a two-sided hypothesis
A) depends on the critical value from the appropriate distribution.
B) is the same.
C) is different since the critical value must be 1.645 for the one-sided hypothesis, but 1.96 for the two-sided
hypothesis (using a 5% probability for the Type I error).
D) uses ±1.96 for the two-sided test, but only +1.96 for the one-sided test.
Answer: B
12) The p-value for a one-sided left-tail test is given by
A) $\Pr(Z - t^{act}) = \Phi(t^{act})$.
B) $\Pr(Z < t^{act}) = \Phi(t^{act})$.
C) $\Pr(Z < t^{act}) < 1.645$.
D) cannot be calculated, since probabilities must always be positive.
Answer: B
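The p-value definitions behind Questions 11 and 12 can be sketched with the standard normal CDF from Python's standard library. The function names and the value `t_act = -4.38` are illustrative assumptions; the same t-statistic is used for every type of test, and only the tail probability changes.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution, mean 0 and sd 1

def p_left(t_act):
    """One-sided, left-tail p-value: Pr(Z < t_act) = Phi(t_act)."""
    return Z.cdf(t_act)

def p_right(t_act):
    """One-sided, right-tail p-value: Pr(Z > t_act) = 1 - Phi(t_act)."""
    return 1.0 - Z.cdf(t_act)

def p_two_sided(t_act):
    """Two-sided p-value: Pr(|Z| > |t_act|) = 2 * Phi(-|t_act|)."""
    return 2.0 * Z.cdf(-abs(t_act))

t_act = -4.38  # illustrative t-statistic (assumption)
print(p_left(t_act), p_right(t_act), p_two_sided(t_act))

# The 5% one-sided critical value is about 1.645:
print(p_left(-1.645))
```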
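13) The 95% confidence interval for $\beta_1$ is the interval
A) $(\beta_1 - 1.96\,SE(\beta_1),\ \beta_1 + 1.96\,SE(\beta_1))$.
B) $(\hat{\beta}_1 - 1.645\,SE(\hat{\beta}_1),\ \hat{\beta}_1 + 1.645\,SE(\hat{\beta}_1))$.
C) $(\hat{\beta}_1 - 1.96\,SE(\hat{\beta}_1),\ \hat{\beta}_1 + 1.96\,SE(\hat{\beta}_1))$.
D) $(\hat{\beta}_1 - 1.96,\ \hat{\beta}_1 + 1.96)$.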
Answer: C
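14) The 95% confidence interval for $\beta_0$ is the interval
A) $(\beta_0 - 1.96\,SE(\beta_0),\ \beta_0 + 1.96\,SE(\beta_0))$.
B) $(\hat{\beta}_0 - 1.645\,SE(\hat{\beta}_0),\ \hat{\beta}_0 + 1.645\,SE(\hat{\beta}_0))$.
C) $(\hat{\beta}_0 - 1.96\,SE(\hat{\beta}_0),\ \hat{\beta}_0 + 1.96\,SE(\hat{\beta}_0))$.
D) $(\hat{\beta}_0 - 1.96,\ \hat{\beta}_0 + 1.96)$.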
Answer: C
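15) The 95% confidence interval for the predicted effect of a general change in X is
A) $(\beta_1 \Delta x - 1.96\,SE(\beta_1) \times \Delta x,\ \beta_1 \Delta x + 1.96\,SE(\beta_1) \times \Delta x)$.
B) $(\hat{\beta}_1 \Delta x - 1.645\,SE(\hat{\beta}_1) \times \Delta x,\ \hat{\beta}_1 \Delta x + 1.645\,SE(\hat{\beta}_1) \times \Delta x)$.
C) $(\hat{\beta}_1 \Delta x - 1.96\,SE(\hat{\beta}_1) \times \Delta x,\ \hat{\beta}_1 \Delta x + 1.96\,SE(\hat{\beta}_1) \times \Delta x)$.
D) $(\hat{\beta}_1 \Delta x - 1.96,\ \hat{\beta}_1 \Delta x + 1.96)$.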
Answer: C
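16) The homoskedasticity-only estimator of the variance of $\hat{\beta}_1$ is
A) $\frac{s_{\hat{u}}^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$.
B) $\frac{s_{\hat{u}}}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$.
C) $\frac{s_{\hat{u}}^2}{\sum_{i=1}^{n} (X_i^2 - \bar{X})}$.
D) $\frac{1}{n} \times \frac{\frac{1}{n-2} \sum_{i=1}^{n} (X_i - \bar{X})^2 \hat{u}_i^2}{\left[\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2\right]^2}$.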
Answer: A
17) One of the following steps is not required as a step to test for the null hypothesis:
A) compute the standard error of $\hat{\beta}_1$.
B) test for the errors to be normally distributed.
C) compute the t-statistic.
D) compute the p-value.
Answer: B
18) Finding a small value of the p-value (e.g. less than 5%)
A) indicates evidence in favor of the null hypothesis.
B) implies that the t-statistic is less than 1.96.
C) indicates evidence against the null hypothesis.
D) will only happen roughly one in twenty samples.
Answer: C
19) The only difference between a one- and two-sided hypothesis test is
A) the null hypothesis.
B) dependent on the sample size n.
C) the sign of the slope coefficient.
D) how you interpret the t-statistic.
Answer: D
20) A binary variable is often called a
A) dummy variable.
B) dependent variable.
C) residual.
D) power of a test.
Answer: A
21) The error term is homoskedastic if
A) $var(u_i \mid X_i = x)$ is constant for i = 1,…, n.
B) $var(u_i \mid X_i = x)$ depends on x.
C) Xi is normally distributed.
D) there are no outliers.
Answer: A
22) In the presence of heteroskedasticity, and assuming that the usual least squares assumptions hold, the OLS
estimator is
A) efficient.
B) BLUE.
C) unbiased and consistent.
D) unbiased but not consistent.
Answer: C
23) The proof that OLS is BLUE requires all of the following assumptions with the exception of:
A) the errors are homoskedastic.
B) the errors are normally distributed.
C) $E(u_i \mid X_i) = 0$.
D) large outliers are unlikely.
Answer: B
24) If the errors are heteroskedastic, then
A) OLS is BLUE.
B) WLS is BLUE if the conditional variance of the errors is known up to a constant factor of proportionality.
C) LAD is BLUE if the conditional variance of the errors is known up to a constant factor of proportionality.
D) OLS is efficient.
Answer: B
25) The homoskedastic normal regression assumptions are all of the following with the exception of:
A) the errors are homoskedastic.
B) the errors are normally distributed.
C) there are no outliers.
D) there are at least 10 observations.
Answer: D
26) Using the textbook example of 420 California school districts and the regression of testscores on the
student-teacher ratio, you find that the standard error on the slope coefficient is 0.51 when using the
heteroskedasticity robust formula, while it is 0.48 when employing the homoskedasticity only formula. When
calculating the t-statistic, the recommended procedure is to
A) use the homoskedasticity only formula because the t-statistic becomes larger
B) first test for homoskedasticity of the errors and then make a decision
C) use the heteroskedasticity robust formula
D) make a decision depending on how much different the estimate of the slope is under the two procedures
Answer: C
27) Consider the estimated equation from your textbook
TestScore=698.9 - 2.28 STR, R2 = 0.051, SER = 18.6
(10.4) (0.52)
The t-statistic for the slope is approximately
A) 4.38
B) 67.20
C) 0.52
D) 1.76
Answer: A
28) You have collected data for the 50 U.S. states and estimated the following relationship between the change in
the unemployment rate from the previous year ($\Delta ur$) and the growth rate of the respective state real GDP ($g_y$).
The results are as follows
$\widehat{\Delta ur} = 2.81 - 0.23 \times g_y$, R2 = 0.36, SER = 0.78
(0.12) (0.04)
Assuming that the estimator has a normal distribution, the 95% confidence interval for the slope is
approximately the interval
A) [2.57, 3.05]
B) [-0.31,0.15]
C) [-0.31, -0.15]
D) [-0.33, -0.13]
Answer: C
29) Using 143 observations, assume that you had estimated a simple regression function and that your estimate for
the slope was 0.04, with a standard error of 0.01. You want to test whether or not the estimate is statistically
significant. Which of the following possible decisions is the only correct one:
A) you decide that the coefficient is small and hence most likely is zero in the population
B) the slope is statistically significant since it is four standard errors away from zero
C) the response of Y given a change in X must be economically important since it is statistically significant
D) since the slope is very small, so must be the regression R 2 .
Answer: B
30) You extract approximately 5,000 observations from the Current Population Survey (CPS) and estimate the
following regression function:
ahe = 3.32 + 0.45 × Age, R2 = 0.02, SER = 8.66
(1.00) (0.04)
where ahe is average hourly earnings, and Age is the individual’s age. Given the specification, your 95%
confidence interval for the effect of changing age by 5 years is approximately
A) [$1.96, $2.54]
B) [$2.32, $4.32]
C) [$1.35, $5.30]
D) cannot be determined given the information provided
Answer: A
5.2 Essays and Longer Questions
1) (Continuation from Chapter 4) Sir Francis Galton, a cousin of Charles Darwin, examined the relationship
between the height of children and their parents towards the end of the 19th century. It is from this study that
the name “regression” originated. You decide to update his findings by collecting data from 110 college
students, and estimate the following relationship:
Studenth = 19.6 + 0.73 × Midparh, R2 = 0.45, SER = 2.0
(7.2) (0.10)
where Studenth is the height of students in inches, and Midparh is the average of the parental heights. Values in
parentheses are heteroskedasticity robust standard errors. (Following Galton’s methodology, both variables
were adjusted so that the average female height was equal to the average male height.)
(a) Test for the statistical significance of the slope coefficient.
(b) If children, on average, were expected to be of the same height as their parents, then this would imply two
hypotheses, one for the slope and one for the intercept.
(i) What should the null hypothesis be for the intercept? Calculate the relevant t-statistic and carry out
the hypothesis test at the 1% level.
(ii) What should the null hypothesis be for the slope? Calculate the relevant t-statistic and carry out
the hypothesis test at the 5% level.
(c) Can you reject the null hypothesis that the regression R2 is zero?
(d) Construct a 95% confidence interval for a one inch increase in the average of parental height.
Answer: (a) $H_0: \beta_1 = 0$, t = 7.30; for $H_1: \beta_1 > 0$, the critical value for a one-sided alternative at the 5% level is 1.645. Hence we
reject the null hypothesis.
(b) $H_0: \beta_0 = 0$, t = 2.72; for $H_1: \beta_0 \neq 0$, the critical value for a two-sided alternative at the 1% level is 2.58. Hence we
reject the null hypothesis in (i). For the slope we have $H_0: \beta_1 = 1$, t = -2.70; for $H_1: \beta_1 \neq 1$, the critical
value for a two-sided alternative at the 5% level is 1.96. Hence we reject the null hypothesis in (ii).
(c) For the simple linear regression model, $H_0: \beta_1 = 0$ implies that R2 = 0. Hence it is the same test as in
(a).
(d) (0.73 – 1.96 × 0.10, 0.73 + 1.96 × 0.10) = (0.53, 0.93).
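The arithmetic in this answer can be reproduced in a few lines. This is a sketch only: the coefficient and standard error values come from the estimated equation in the question, while the variable names and the helper `t_stat` are introduced here for illustration.

```python
def t_stat(estimate, hypothesized, se):
    """t = (estimator - hypothesized value) / standard error of estimator."""
    return (estimate - hypothesized) / se

b0, se_b0 = 19.6, 7.2     # intercept and its robust standard error
b1, se_b1 = 0.73, 0.10    # slope and its robust standard error

t_a = t_stat(b1, 0.0, se_b1)      # (a) H0: slope = 0
t_b_i = t_stat(b0, 0.0, se_b0)    # (b)(i) H0: intercept = 0
t_b_ii = t_stat(b1, 1.0, se_b1)   # (b)(ii) H0: slope = 1

reject_a = t_a > 1.645            # one-sided test, 5% level
reject_b_i = abs(t_b_i) > 2.58    # two-sided test, 1% level
reject_b_ii = abs(t_b_ii) > 1.96  # two-sided test, 5% level

# (d) 95% confidence interval for the effect of a one inch increase
ci_d = (b1 - 1.96 * se_b1, b1 + 1.96 * se_b1)

print(round(t_a, 2), round(t_b_i, 2), round(t_b_ii, 2))
print(reject_a, reject_b_i, reject_b_ii)
print(tuple(round(v, 2) for v in ci_d))
```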
2) (Requires Appendix) (Continuation from Chapter 4) At a recent county fair, you observed that at one stand
people’s weight was forecasted, and were surprised by the accuracy (within a range). Thinking about how the
person could have predicted your weight fairly accurately (despite the fact that she did not know about your
“heavy bones”), you think about how this could have been accomplished. You remember that medical charts
for children contain 5%, 25%, 50%, 75% and 95% lines for a weight/height relationship and decide to conduct
an experiment with 110 of your peers. You collect the data and calculate the following sums:
$\sum_{i=1}^{n} Y_i$ = 17,375, $\sum_{i=1}^{n} X_i$ = 7,665.5, $\sum_{i=1}^{n} y_i^2$ = 94,228.8, $\sum_{i=1}^{n} x_i^2$ = 1,248.9, $\sum_{i=1}^{n} x_i y_i$ = 7,625.9
where the height is measured in inches and weight in pounds. (Small letters refer to deviations from means as
in zi = Zi – Z.)
(a) Calculate the homoskedasticity-only standard errors and, using the resulting t-statistic, perform a test on
the null hypothesis that there is no relationship between height and weight in the population of college
students.
(b) What is the alternative hypothesis in the above test, and what level of significance did you choose?
(c) Statistics and econometrics textbooks often ask you to calculate critical values based on some level of
significance, say 1%, 5%, or 10%. What sort of criteria do you think should play a role in determining which
level of significance to choose?
(d) What do you think the relationship is between testing for the significance of the slope and whether or not
the regression R2 is zero?
Answer: (a) The formula for the homoskedasticity-only standard errors requires knowledge of the residual
variance. But $s_{\hat{u}}^2 = \frac{1}{n-2} SSR$, and SSR = TSS - ESS. Given the result in (2b), SSR = 47,604.7, and hence $s_{\hat{u}}^2$ =
440.78. The SER is 21.00. Dividing by the square root of the variation in X then results in the
homoskedasticity-only standard error of the slope, which is 0.594. The t-statistic is 10.29, which rejects
the null hypothesis of no relationship.
(b) The alternative hypothesis should be one-sided, since there is strong prior knowledge that taller
people weigh more, on average. Given the size of the t-statistic, the null hypothesis can be rejected at
any reasonable level of significance.
(c) Clearly the levels should not be picked arbitrarily, but should depend on the cost involved with the
size and the power of the test. Consider a person who was accused of murder. In that case, the null
hypothesis is that he is innocent. The size of the test would be the probability of letting an innocent
person go to the electric chair, while (1-power of the test) gives the probability of letting a murderer go
free. There are obviously vastly different costs attached to each error, and these will determine the levels
chosen.
(d) If the slope in a regression function is zero, then there is no relationship between the two variables
involved. Hence testing for the significance of the regression slope is the same as testing whether or not
the regression R2 is zero.
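The steps in part (a) can be traced from the sums given in the question. The sketch below uses hypothetical variable names and carries the slope at full precision, so its intermediate values differ slightly from the answer above, which appears to round the slope to 6.11 before computing the ESS.

```python
import math

n = 110
sum_y2 = 94228.8   # sum of squared deviations of weight (TSS)
sum_x2 = 1248.9    # sum of squared deviations of height
sum_xy = 7625.9    # sum of cross products of deviations

slope = sum_xy / sum_x2             # OLS slope
ess = slope ** 2 * sum_x2           # explained sum of squares
ssr = sum_y2 - ess                  # SSR = TSS - ESS
s2_u = ssr / (n - 2)                # residual variance
ser = math.sqrt(s2_u)               # standard error of the regression
se_slope = ser / math.sqrt(sum_x2)  # homoskedasticity-only SE of the slope
t_slope = slope / se_slope          # t-statistic for H0: slope = 0

print(round(slope, 2), round(ser, 2), round(se_slope, 3), round(t_slope, 2))
```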
3) You have obtained measurements of height in inches of 29 female and 81 male students ( Studenth) at your
university. A regression of the height on a constant and a binary variable ( BFemme), which takes a value of one
for females and is zero otherwise, yields the following result:
Studenth = 71.0 – 4.84×BFemme , R2 = 0.40, SER = 2.0
(0.3) (0.57)
(a) What is the interpretation of the intercept? What is the interpretation of the slope? How tall are females, on
average?
(b) Test the hypothesis that females, on average, are shorter than males, at the 1% level.
(c) Is it likely that the error term is homoskedastic here?
Answer: (a) The intercept gives you the average height of males, which is 71 inches in this sample. The slope tells
you by how much shorter females are, on average (almost 5 inches). The average height of females is
therefore approximately 66 inches.
(b) The t-statistic for the difference in means is -8.49. For a one-sided test, the critical value is –2.33.
Hence the difference is statistically significant.
(c) It is safer to assume that the variances for males and females are different. In the underlying sample
the standard deviation for females was smaller.
4) (Continuation from Chapter 4, number 3) You have obtained a subsample of 1744 individuals from the
Current Population Survey (CPS) and are interested in the relationship between weekly earnings and age. The
regression, using heteroskedasticity-robust standard errors, yielded the following result:
Earn = 239.16 + 5.20×Age, R2 = 0.05, SER = 287.21,
(20.24) (0.57)
where Earn and Age are measured in dollars and years respectively.
(a) Is the relationship between Age and Earn statistically significant?
(b) The variance of the error term and the variance of the dependent variable are related. Given the distribution
of earnings, do you think it is plausible that the distribution of errors is normal?
(c) Construct a 95% confidence interval for both the slope and the intercept.
Answer: (a) The t-statistic on the slope is 9.12, which is above the critical value from the standard normal
distribution for any reasonable level of significance.
(b) Since the earnings distribution is highly skewed, it is not reasonable to assume that the error
distribution is normal.
(c) The confidence interval for the slope is (4.08,6.32). The confidence interval for the intercept is
(199.49,278.83).
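The intervals in part (c) follow from the large-sample normal approximation. As a sketch, the critical value can be taken from the standard normal distribution in Python's standard library instead of being hard-coded as 1.96; `conf_int` is a hypothetical helper name.

```python
from statistics import NormalDist

def conf_int(estimate, se, level=0.95):
    """Large-sample confidence interval: estimate +/- z * SE."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95%
    return estimate - z * se, estimate + z * se

slope_ci = conf_int(5.20, 0.57)
intercept_ci = conf_int(239.16, 20.24)

# Part (a): t-statistic and two-sided p-value for H0: slope = 0
t_slope = 5.20 / 0.57
p_value = 2.0 * NormalDist().cdf(-abs(t_slope))

print(tuple(round(v, 2) for v in slope_ci))
print(tuple(round(v, 2) for v in intercept_ci))
print(round(t_slope, 2), p_value)
```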
5) (Continuation from Chapter 4, number 5) You have learned in one of your economics courses that one of the
determinants of per capita income (the “Wealth of Nations”) is the population growth rate. Furthermore you
also found out that the Penn World Tables contain income and population data for 104 countries of the world.
To test this theory, you regress the GDP per worker (relative to the United States) in 1990 (RelPersInc) on the
difference between the average population growth rate of that country (n) and the U.S. average population
growth rate (nus) for the years 1980 to 1990. This results in the following regression output:
RelPersInc = 0.518 – 18.831×(n – nus) , R2 =0.522, SER = 0.197
(0.056) (3.177)
(a) Is there any reason to believe that the variance of the error terms is homoskedastic?
(b) Is the relationship statistically significant?
Answer: (a) There are vast differences in the size of these countries, both in terms of the population and GDP.
Furthermore, the countries are at different stages of economic and institutional development. Other
factors vary as well. It would therefore be odd to assume that the errors would be homoskedastic.
(b) The t-statistic is 5.93 in absolute value, making the relationship statistically significant, i.e., we can reject the null
hypothesis that the slope is zero.
6) You recall from one of your earlier lectures in macroeconomics that the per capita income depends on the
savings rate of the country: those who save more end up with a higher standard of living. To test this theory,
you collect data from the Penn World Tables on GDP per worker relative to the United States (RelProd) in 1990
and the average investment share of GDP from 1980-1990 (SK), remembering that investment equals saving.
The regression results in the following output:
RelProd = –0.08 + 2.44×SK , R2 =0.46, SER = 0.21
(0.04) (0.38)
(a) Interpret the regression results carefully.
(b) Calculate the t-statistics to determine whether the two coefficients are significantly different from zero.
Justify the use of a one-sided or two-sided test.
(c) You accidentally forget to use the heteroskedasticity-robust standard errors option in your regression
package and estimate the equation using homoskedasticity-only standard errors. This changes the results as
follows:
RelProd = -0.08 + 2.44×SK , R2 =0.46, SER = 0.21
(0.04) (0.26)
You are delighted to find that the coefficients have not changed at all and that your results have become even
more significant. Why haven’t the coefficients changed? Are the results really more significant? Explain.
(d) Upon reflection you think about the advantages of OLS with and without homoskedasticity-only standard
errors. What are these advantages? Is it likely that the error terms would be heteroskedastic in this situation?
Answer: (a) An increase in the saving rate of 0.1, or from 0.15 to 0.25, results in an increase in relative GDP per
worker of 0.244, or from 0.5 to roughly 0.75. (Taiwan had a value of 0.5 for RelProd in 1990, while
Sweden was at 0.77.) There is no interpretation for the intercept. The regression explains 46 percent of
the variation in GDP per worker relative to the United States.
(b) The t-statistics are -2.00 and 6.42 for the intercept and slope respectively. You should use a two-sided
test for the intercept, since there are no prior expectations on whether it should be positive or negative.
Hence the intercept is statistically significant at the 5 percent level, but not at the 1 percent level. Since
we expect a positive sign on the slope, we should conduct a one-sided test. The critical values suggest
significance at any reasonable probability level of the size of the test.
(c) Whether you use homoskedasticity-only or heteroskedasticity-robust standard errors does not affect
the estimator, only the formula for the standard errors. If the assumption of homoskedasticity was valid,
then the results would be more significant. However, given the lengthy discussion on homoskedasticity
versus heteroskedasticity in the textbook, it is safer to conduct inference under the assumption of
heteroskedasticity.
(d) In the presence of homoskedasticity in addition to the least squares assumptions in the text, OLS is
BLUE (Gauss-Markov theorem). If the errors are heteroskedastic, then the GLS estimator (weighted least
squares) is BLUE if the form of heteroskedasticity is known, which rarely occurs in practice. Since
economic theory does not suggest, in general, that errors are homoskedastic, it is safer to assume that
they are not. This avoids invalid statistical inference.
7) Carefully discuss the advantages of using heteroskedasticity-robust standard errors over standard errors
calculated under the assumption of homoskedasticity. Give at least five examples where it is very plausible to
assume that the errors display heteroskedasticity.
Answer: There are virtually no examples where economic theory suggests that the errors are homoskedastic.
Hence the maintained hypothesis should be that they are heteroskedastic. Using homoskedasticity-only
standard errors when in truth heteroskedasticity-robust standard errors should be used, results in false
inference. What makes this worse is that homoskedasticity-only standard errors are typically smaller
than heteroskedasticity-robust standard errors, resulting in t-statistics that are too large, and hence
rejection of the null hypothesis too often. There is an alternative GLS estimator, weighted least squares,
which is BLUE, but requires knowledge of how the error variance depends on X, e.g. $X$ or $X^2$. Answers
will vary by student regarding the examples, but earnings functions, cross-country beta-convergence
regressions, consumption functions, sports regressions involving teams from markets with varying
population size, weight-height relationships for children, etc., are all good candidates.
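The gap between the two kinds of standard errors can be illustrated with a small simulation. The sketch below is an assumption-laden example, not a result from the textbook: the data-generating process (error standard deviation equal to the distance of X from 5), the seed, the sample size, and the function name are all made up. It implements the homoskedasticity-only formula and the heteroskedasticity-robust formula for the slope in the form they take in Question 16 of the multiple choice section.

```python
import math
import random

def ols_with_ses(x, y):
    """OLS slope with homoskedasticity-only and robust standard errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Homoskedasticity-only: s_u^2 / sum (X_i - X_bar)^2
    s2_u = sum(u * u for u in resid) / (n - 2)
    se_homo = math.sqrt(s2_u / sxx)
    # Robust: (1/n) * [1/(n-2) sum (X_i - X_bar)^2 u_i^2] / [(1/n) sum (X_i - X_bar)^2]^2
    num = sum(d * d * u * u for d, u in zip(dx, resid)) / (n - 2)
    den = (sxx / n) ** 2
    se_robust = math.sqrt(num / den / n)
    return slope, se_homo, se_robust

random.seed(0)
n = 2000
x = [random.uniform(0.0, 10.0) for _ in range(n)]
# Assumed design: the error spread grows with the distance of X from 5.
y = [1.0 + 2.0 * xi + random.gauss(0.0, abs(xi - 5.0)) for xi in x]

slope, se_homo, se_robust = ols_with_ses(x, y)
print(slope, se_homo, se_robust)
```

In this design the homoskedasticity-only standard error understates the sampling uncertainty of the slope, which is the situation the answer above warns about.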
8) (Requires Appendix material from Chapters 4 and 5) Shortly before you are making a group presentation on
the testscore/student-teacher ratio results, you realize that one of your peers forgot to type all the relevant
information on one of your slides. Here is what you see:
TestScore = 698.9 – STR
(9.47) (0.48)
R2 = 0.051, SER = 18.6
In addition, your group member explains that he ran the regression in a standard spreadsheet program, and
that, as a result, the standard errors in parenthesis are homoskedasticity-only standard errors.
(a) Find the value for the slope coefficient.
(b) Calculate the t-statistic for the slope and the intercept. Test the hypothesis that the intercept and the slope
are different from zero.
(c) Should you be concerned that your group member only gave you the result for the homoskedasticity-only
standard error formula, instead of using the heteroskedasticity-robust standard errors?
Answer: (a) The relationship between the slope coefficient and the regression R2 is
$$R^2 = \frac{ESS}{TSS} = \frac{\hat{\beta}_1^2 \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2}, \quad \text{so} \quad \hat{\beta}_1^2 = R^2 \times \frac{\sum_{i=1}^{n} y_i^2}{\sum_{i=1}^{n} x_i^2}.$$
Given the information above, you need to find the TSS ($= \sum_{i=1}^{n} y_i^2$) and $\sum_{i=1}^{n} x_i^2$. The TSS is relatively
easy to find: the SER is 18.6, and hence the SSR is 144,315.5. (Recall that $SER = s_{\hat{u}} = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i^2} = \sqrt{\frac{SSR}{n-2}}$.)
This allows you to calculate the TSS, which is 152,109.6. (Recall that $R^2 = 1 - \frac{SSR}{TSS}$, so $TSS = \frac{SSR}{1 - R^2}$.)
To find $\sum_{i=1}^{n} x_i^2$, note that the homoskedasticity-only standard error for the slope is $s_{\hat{\beta}_1} = \frac{s_{\hat{u}}}{\sqrt{\sum_{i=1}^{n} x_i^2}}$.
Hence, $\sum_{i=1}^{n} x_i^2 = \left(\frac{SER}{s_{\hat{\beta}_1}}\right)^2 = 38.72^2 = 1,499.6$.
Inserting these results into the above formula, you get
$\hat{\beta}_1^2 = 0.051 \times \frac{152,109.6}{1,499.6} = 5.20$, so $\hat{\beta}_1 = -2.28$ (luckily for you, your group member entered the negative
sign in front of the slope).
(b) The t-statistics are 73.82 and 4.75 respectively. Hence you can reject the two null hypotheses at any
reasonable level of significance.
(c) There is no theory that suggests the homoskedasticity in the error terms in this case. Given the serious
consequences for using homoskedasticity only standard errors in the presence of heteroskedasticity, you
should definitely use the heteroskedasticity robust standard errors for inference.
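The recovery of the slope in part (a) can be sketched in code. The inputs below are the rounded figures shown on the slide (with n = 420 school districts, as in the textbook example), and the variable names are hypothetical. Because the inputs are rounded, the intermediate sums differ somewhat from those quoted in the answer, which presumably worked from unrounded values, but the slope comes out at about -2.28 either way.

```python
import math

n = 420          # number of school districts
r2 = 0.051       # regression R^2
ser = 18.6       # standard error of the regression
se_slope = 0.48  # homoskedasticity-only standard error of the slope

ssr = ser ** 2 * (n - 2)            # from SER = sqrt(SSR / (n - 2))
tss = ssr / (1.0 - r2)              # from R^2 = 1 - SSR / TSS
sum_x2 = (ser / se_slope) ** 2      # from SE(slope) = SER / sqrt(sum x_i^2)
slope_sq = r2 * tss / sum_x2        # from R^2 = slope^2 * sum x_i^2 / TSS
slope = -math.sqrt(slope_sq)        # negative sign taken from the slide

print(round(ssr, 1), round(tss, 1), round(sum_x2, 1))
print(round(slope, 3))
```

Combining the steps gives slope^2 = (n - 2) × SE(slope)^2 × R2 / (1 - R2), which is the familiar link between the squared t-statistic and R2 in a regression with a single regressor.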
9) (Continuation of the Purchasing Power Parity question from Chapter 4) The news-magazine The Economist
regularly publishes data on the so called Big Mac index and exchange rates between countries. The data for 30
countries from the April 29, 2000 issue is listed below:
Country         Currency   Price of Big Mac   Actual Exchange Rate per U.S. dollar
Indonesia       Rupiah     14,500             7,945
Italy           Lira       4,500              2,088
South Korea     Won        3,000              1,108
Chile           Peso       1,260              514
Spain           Peseta     375                179
Hungary         Forint     339                279
Japan           Yen        294                106
Taiwan          Dollar     70                 30.6
Thailand        Baht       55                 38.0
Czech Rep.      Crown      54.37              39.1
Russia          Ruble      39.50              28.5
Denmark         Crown      24.75              8.04
Sweden          Crown      24.0               8.84
Mexico          Peso       20.9               9.41
France          Franc      18.5               7.07
Israel          Shekel     14.5               4.05
China           Yuan       9.90               8.28
South Africa    Rand       9.0                6.72
Switzerland     Franc      5.90               1.70
Poland          Zloty      5.50               4.30
Germany         Mark       4.99               2.11
Malaysia        Dollar     4.52               3.80
New Zealand     Dollar     3.40               2.01
Singapore       Dollar     3.20               1.70
Brazil          Real       2.95               1.79
Canada          Dollar     2.85               1.47
Australia       Dollar     2.59               1.68
Argentina       Peso       2.50               1.00
Britain         Pound      1.90               0.63
United States   Dollar     2.51
The concept of purchasing power parity or PPP (“the idea that similar foreign and domestic goods … should
have the same price in terms of the same currency,” Abel, A. and B. Bernanke, Macroeconomics, 4th edition,
Boston: Addison Wesley, 476) suggests that the ratio of the Big Mac priced in the local currency to the U.S.
dollar price should equal the exchange rate between the two countries.
After entering the data into your spread sheet program, you calculate the predicted exchange rate per U.S.
dollar by dividing the price of a Big Mac in local currency by the U.S. price of a Big Mac ($2.51). To test for PPP,
you regress the actual exchange rate on the predicted exchange rate.
The estimated regression is as follows:
ActualExRate = –27.05 + 1.35 × PredExRate
(23.74) (0.02)
R2 = 0.994, n = 29, SER = 122.15
(a) Your spreadsheet program does not allow you to calculate heteroskedasticity robust standard errors.
Instead, the numbers in parenthesis are homoskedasticity only standard errors. State the two null hypotheses
under which PPP holds. Should you use a one-tailed or two-tailed alternative hypothesis?
(b) Calculate the two t-statistics.
(c) Using a 5% significance level, what is your decision regarding the null hypothesis given the two t-statistics?
What critical values did you use? Are you concerned with the fact that you are testing the two hypotheses
sequentially when they are supposed to hold simultaneously?
(d) What assumptions had to be made for you to use Student’s t-distribution?
Answer: (a) Under PPP, $H_0: \beta_0 = 0$ and $H_0: \beta_1 = 1$. Economic theory does not tell you whether the intercept
should be greater or less than zero if PPP does not hold. The same goes for the slope, i.e., you do not
know whether or not it is less than or greater than unity. As a result, you should use a two-tailed
alternative hypothesis.
(b) The t-statistic for the intercept is $t = \frac{-27.05 - 0}{23.74} = -1.14$. For the slope, it is $t = \frac{1.35 - 1}{0.02} = 17.5$.
(c) Using the Student t-distribution and 27 degrees of freedom, the critical value for a two-sided
alternative is 2.05. Hence you can reject the null hypothesis for the slope but not for the intercept. Under
PPP, both hypotheses are supposed to hold simultaneously, and if either or both are rejected, then PPP is
not supported by the data. As is discussed later in the textbook, testing hypotheses sequentially is not the
same as testing them simultaneously, since p-values change. (At an intuitive level, and heroically assuming
independence here, Pr(A and B) = Pr(A) × Pr(B); and hence the rejection probability needs to be adjusted.)
(d) In addition to the standard three least squares assumptions, you had to assume that the regression
errors are homoskedastic, and that the regression errors are normally distributed. That is you had to
assume that the homoskedastic normal regression assumptions hold.
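The regression in this question can be re-run directly from the Big Mac table. The sketch below is illustrative: the variable names are hypothetical, and the homoskedasticity-only standard error formulas are the standard ones for a regression with a single regressor. Because it works from unrounded coefficients, the slope t-statistic it produces will not exactly match the 17.5 obtained from the rounded figures.

```python
import math

US_PRICE = 2.51  # U.S. dollar price of a Big Mac

# (price of Big Mac in local currency, actual exchange rate per U.S. dollar)
big_mac = [
    (14500, 7945), (4500, 2088), (3000, 1108), (1260, 514), (375, 179),
    (339, 279), (294, 106), (70, 30.6), (55, 38.0), (54.37, 39.1),
    (39.50, 28.5), (24.75, 8.04), (24.0, 8.84), (20.9, 9.41), (18.5, 7.07),
    (14.5, 4.05), (9.90, 8.28), (9.0, 6.72), (5.90, 1.70), (5.50, 4.30),
    (4.99, 2.11), (4.52, 3.80), (3.40, 2.01), (3.20, 1.70), (2.95, 1.79),
    (2.85, 1.47), (2.59, 1.68), (2.50, 1.00), (1.90, 0.63),
]

x = [price / US_PRICE for price, _ in big_mac]  # predicted exchange rate
y = [rate for _, rate in big_mac]               # actual exchange rate
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
intercept = y_bar - slope * x_bar

# Homoskedasticity-only standard errors
resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
ser = math.sqrt(sum(u * u for u in resid) / (n - 2))
se_slope = ser / math.sqrt(sxx)
se_intercept = ser * math.sqrt(sum(xi * xi for xi in x) / (n * sxx))

# PPP null hypotheses: intercept = 0 and slope = 1
t_intercept = (intercept - 0.0) / se_intercept
t_slope = (slope - 1.0) / se_slope

print(n, intercept, slope)
print(se_intercept, se_slope)
print(t_intercept, t_slope)
```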
10) (Continuation from Chapter 4, number 6) The neoclassical growth model predicts that for identical savings
rates and population growth rates, countries should converge to the same per capita income level. This is referred to
as the convergence hypothesis. One way to test for the presence of convergence is to compare the growth rates
over time to the initial starting level.
(a) The results of the regression for 104 countries were as follows:
g6090 = 0.019 – 0.0006 × RelProd 60 , R2 = 0.00007, SER = 0.016
(0.004) (0.0073)
where g6090 is the average annual growth rate of GDP per worker for the 1960-1990 sample period, and
RelProd60 is GDP per worker relative to the United States in 1960. Numbers in parentheses are
heteroskedasticity-robust standard errors.
Using the OLS estimator with homoskedasticity-only standard errors, the results changed as follows:
g6090 = 0.019 – 0.0006×RelProd 60 , R2 = 0.00007, SER = 0.016
(0.002) (0.0068)
Why didn’t the estimated coefficients change? Given that the standard error of the slope is now smaller, can
you reject the null hypothesis of no beta convergence? Are the results in the second equation more reliable than
the results in the first equation? Explain.
(b) You decide to restrict yourself to the 24 OECD countries in the sample. This changes your regression output
as follows (numbers in parenthesis are heteroskedasticity robust standard errors):
g6090 = 0.048 – 0.0404 RelProd 60 , R2 = 0.82 , SER = 0.0046
(0.004) (0.0063)
Test for evidence of convergence now. If your conclusion is different than in (a), speculate why this is the case.
(c) The authors of your textbook have informed you that unless you have more than 100 observations, it may
not be plausible to assume that the distribution of your OLS estimators is normal. What are the implications
here for testing the significance of your theory?
Answer: (a) Using homoskedasticity-only standard errors has no effect on the OLS estimator. The t-statistic
remains small and is certainly below the critical value. The results are less reliable since there is no
reason to believe that the error variance is homoskedastic.
(b) The t-statistic for the slope is 6.41. At face value, there is strong evidence for convergence.
Neoclassical growth theory does not predict unconditional convergence. Instead it only predicts
convergence if the savings rates and population growth rates are identical. It stands to reason that these
are much more similar between OECD countries than between the countries of the world.
(c) Since there are fewer than 30 observations, the distribution of the t-statistic is unknown. You should
therefore not conduct statistical inference.
11) You have collected 14,925 observations from the Current Population Survey. There are 6,285 females in the
sample, and 8,640 males. The females report a mean of average hourly earnings of $16.50 with a standard
deviation of $9.06. The males have an average of $20.09 and a standard deviation of $10.85. The overall mean
average hourly earnings is $18.58.
a. Using the t-statistic for testing differences between two means (section 3.4 of your textbook), decide
whether or not there is sufficient evidence to reject the null hypothesis that females and males have
identical average hourly earnings.
b. You decide to run two regressions: first, you simply regress average hourly earnings on an intercept
only. Next, you repeat this regression, but only for the 6,285 females in the sample. What will the
regression coefficients be in each of the two regressions?
c. Finally you run a regression over the entire sample of average hourly earnings on an intercept and a
binary variable DFemme, where this variable takes on a value of 1 if the individual is a female, and is 0
otherwise. What will be the value of the intercept? What will be the value of the coefficient of the
binary variable?
d. What is the standard error on the slope coefficient? What is the t-statistic?
e. Had you used the homoskedasticity-only standard error in (d) and calculated the t-statistic, how
would you have had to change the test-statistic in (a) to get the identical result?
Answer: a. $H_0: \mu_F = \mu_M$; $H_1: \mu_F \neq \mu_M$
$t = \frac{20.09 - 16.50}{\sqrt{\frac{10.85^2}{8640} + \frac{9.06^2}{6285}}} = 21.98$. As a result, you can comfortably reject the null
hypothesis at any reasonable confidence level.
b. $\widehat{ahe} = \hat{\beta}_0 = 18.58$; $\widehat{ahe} = \hat{\beta}_0 = 16.50$
Hence for each of the regressions, the intercept takes on the value of the overall mean for average hourly
earnings, and the mean average hourly earnings for females.
c. $\widehat{ahe} = \hat{\beta}_0 + \hat{\beta}_1 \times DFemme = 20.09 - 3.59 \times DFemme$
The intercept is the mean of average hourly earnings for males, and the slope is the difference between
the mean of average hourly earnings of females and males.
d. The standard error on the slope coefficient is 0.16, which is identical to the denominator of the
t-statistic in (a) above. Hence the t-statistic is (-21.98).
e. You would have had to use the “pooled” standard error formula (3.23) in your textbook.
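The arithmetic in parts (a) and (d) can be checked with a short sketch. It uses only the summary statistics given in the question; the function name `diff_in_means_t` is illustrative and not from the textbook.

```python
from math import sqrt

def diff_in_means_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """t-statistic for the difference of two means with unequal variances."""
    se = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return (mean_a - mean_b) / se, se

# Males: mean 20.09, sd 10.85, n 8640; females: mean 16.50, sd 9.06, n 6285.
t, se = diff_in_means_t(20.09, 10.85, 8640, 16.50, 9.06, 6285)

# The overall mean is the sample-size-weighted average of the two group means.
overall_mean = (8640 * 20.09 + 6285 * 16.50) / (8640 + 6285)

print(f"SE = {se:.2f}, t = {t:.2f}, overall mean = {overall_mean:.2f}")
```

The same standard error appears on the slope of the regression on DFemme when heteroskedasticity-robust standard errors are used, which is why the two t-statistics agree up to sign.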
5.3 Mathematical and Graphical Problems
1) In order to formulate whether or not the alternative hypothesis is one-sided or two-sided, you need some
guidance from economic theory. Choose at least three examples from economics or other fields where you have
a clear idea what the null hypothesis and the alternative hypothesis for the slope coefficient should be. Write a
brief justification for your answer.
Answer: Answers will vary by student. The problem is to find examples where there is only a single explanatory
variable. A student may argue that the price coefficient in a demand function is downward sloping, but
unless you control for other variables, this may not be so. The demand for L.A. Laker tickets and their
price comes to mind. CAPM is a nice example. Perhaps the marginal propensity to consume in a
consumption function is another. Testing for speculative efficiency in exchange rate markets may also
work.
2) For the following estimated slope coefficients and their heteroskedasticity-robust standard errors, find the
t-statistics for the null hypothesis $H_0: \beta_1 = 0$. Assuming that your sample has more than 100 observations,
indicate whether or not you are able to reject the null hypothesis at the 10%, 5%, and 1% level of a one-sided
and two-sided hypothesis.
(a) $\hat{\beta}_1 = 4.2$, $SE(\hat{\beta}_1) = 2.4$
(b) $\hat{\beta}_1 = 0.5$, $SE(\hat{\beta}_1) = 0.37$
(c) $\hat{\beta}_1 = 0.003$, $SE(\hat{\beta}_1) = 0.002$
(d) $\hat{\beta}_1 = 360$, $SE(\hat{\beta}_1) = 300$
Answer: a) t = 1.75; reject null at 10% level of two-sided test, and at 5% of one-sided test.
b) t = 1.35; cannot reject null at 10% of two-sided test, reject null at 10% of one-sided test.
c) t = 1.50; cannot reject null at 10% of two-sided test, reject null at 10% of one-sided test.
d) t = 1.20; cannot reject null at 10% of both two-sided and one-sided test.
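These decisions can be reproduced with a minimal sketch. It assumes large-sample normal critical values (taken from Python's standard `statistics.NormalDist`) and assumes the one-sided alternative is $\beta_1 > 0$; the helper names `t_stat` and `rejects` are illustrative.

```python
from statistics import NormalDist

def t_stat(estimate, std_error, null_value=0.0):
    """t-statistic for H0: beta1 = null_value."""
    return (estimate - null_value) / std_error

def rejects(t, level, two_sided):
    """True if H0 is rejected at the given level using normal critical values."""
    z = NormalDist()
    if two_sided:
        return abs(t) > z.inv_cdf(1 - level / 2)
    # Assumed one-sided alternative: beta1 > 0.
    return t > z.inv_cdf(1 - level)

cases = {"a": (4.2, 2.4), "b": (0.5, 0.37), "c": (0.003, 0.002), "d": (360, 300)}
for label, (estimate, se) in cases.items():
    t = t_stat(estimate, se)
    for level in (0.10, 0.05, 0.01):
        print(label, f"t={t:.2f}", f"level={level:.2f}",
              "two-sided:", rejects(t, level, True),
              "one-sided:", rejects(t, level, False))
```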
3) Explain carefully the relationship between a confidence interval, a one-sided hypothesis test, and a two-sided
hypothesis test. What is the unit of measurement of the t-statistic?
Answer: In the case of a two-sided hypothesis test, the relationship between the t-statistic and the confidence
interval is straightforward. The t-statistic calculates the distance between the estimate and the
hypothesized value in standard deviations. If the distance is larger than 1.96 (size of the test: 5%), then
the distance is large enough to reject the null hypothesis. The confidence interval adds and subtracts 1.96
standard deviations in this case, and asks whether or not the hypothesized value is contained within the
confidence interval. Hence the two concepts resemble the two sides of a coin. They are simply different
ways to look at the same problem. In the case of the one-sided test, the relationship is more complex.
Since you are looking at a one-sided alternative, it does not really make sense to construct a confidence
interval. However, the confidence interval results in the same conclusion as the t-test if the critical value
from the standard normal distribution is appropriately adjusted, e.g. to 10% rather than 5%. The unit of
measurement of the t-statistic is standard deviations.
4) The effect of decreasing the student-teacher ratio by one is estimated to result in an improvement of the
districtwide score by 2.28 with a standard error of 0.52. Construct a 90% and 99% confidence interval for the
size of the slope coefficient and the corresponding predicted effect of changing the student-teacher ratio by
one. What is the intuition on why the 99% confidence interval is wider than the 90% confidence interval?
Answer: The 90% confidence interval for the slope is calculated as follows:
(2.28 - 1.645 × 0.52, 2.28 + 1.645 × 0.52) = (1.42, 3.14).
The corresponding predicted effect of a unit change in the student-teacher ratio is the same, since the
change in X is 1.
The 99% confidence interval for the slope coefficient and the unit change in the student-teacher ratio is:
(2.28 - 2.58 × 0.52, 2.28 + 2.58 × 0.52) = (0.94, 3.62).
The 99% confidence interval corresponds to a smaller size of the test. This means that you want to be
“more certain” that the population parameter is contained in the interval, and that requires a larger
interval.
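The two intervals can be recomputed with a short sketch. It assumes normal critical values obtained from the standard library rather than the rounded table values 1.645 and 2.58; `confidence_interval` is an illustrative helper name.

```python
from statistics import NormalDist

def confidence_interval(estimate, std_error, level):
    """Two-sided large-sample confidence interval for a coefficient."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return estimate - z * std_error, estimate + z * std_error

ci_90 = confidence_interval(2.28, 0.52, 0.90)
ci_99 = confidence_interval(2.28, 0.52, 0.99)
print([round(v, 2) for v in ci_90], [round(v, 2) for v in ci_99])
```

Raising the confidence level raises the critical value, so the interval widens while the point estimate and standard error stay the same.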
5) Below you are asked to decide on whether or not to use a one-sided alternative or a two-sided alternative
hypothesis for the slope coefficient. Briefly justify your decision.
(a) $\hat{q}_i^d = \hat{\beta}_0 + \hat{\beta}_1 p_i$, where $q^d$ is the quantity demanded for a good, and p is its price.
(b) $\hat{p}_i^{actual} = \hat{\beta}_0 + \hat{\beta}_1 p_i^{assess}$, where $p_i^{actual}$ is the actual house price, and $p_i^{assess}$ is the assessed house price.
You want to test whether or not the assessment is correct, on average.
(c) $\hat{C}_i = \hat{\beta}_0 + \hat{\beta}_1 Y_i^d$, where C is household consumption, and $Y^d$ is personal disposable income.
Answer: (a) You would use a one-sided alternative hypothesis since economic theory suggests that the quantity
demanded and prices are negatively related.
(b) The alternative hypothesis is $H_1: \beta_1 \neq 1$ since assessments could be too large or too small, on
average. You should also test for $H_1: \beta_0 \neq 0$.
(c) You should use a one-sided alternative hypothesis, since economic theory strongly suggests that the
marginal propensity to consume is positive.
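6) (Requires Appendix material) Your textbook shows that OLS is a linear estimator $\hat{\beta}_1 = \sum_{i=1}^{n} \hat{a}_i Y_i$, where
$\hat{a}_i = \frac{X_i - \bar{X}}{\sum_{j=1}^{n} (X_j - \bar{X})^2}$. For OLS to be conditionally unbiased, the following two conditions must hold:
$\sum_{i=1}^{n} \hat{a}_i = 0$ and $\sum_{i=1}^{n} \hat{a}_i X_i = 1$. Show that this is the case.
Answer:
$$\sum_{i=1}^{n} \hat{a}_i = \sum_{i=1}^{n} \frac{X_i - \bar{X}}{\sum_{j=1}^{n} (X_j - \bar{X})^2} = \frac{1}{\sum_{j=1}^{n} (X_j - \bar{X})^2} \sum_{i=1}^{n} (X_i - \bar{X}) = 0$$
since deviations from the mean sum to zero.
$$\sum_{i=1}^{n} \hat{a}_i X_i = \sum_{i=1}^{n} \frac{X_i - \bar{X}}{\sum_{j=1}^{n} (X_j - \bar{X})^2} X_i = \frac{1}{\sum_{j=1}^{n} (X_j - \bar{X})^2} \sum_{i=1}^{n} (X_i - \bar{X}) X_i = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sum_{j=1}^{n} (X_j - \bar{X})^2} = 1$$
(Note that $\sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} (X_i - \bar{X}) \times X_i - \bar{X} \sum_{i=1}^{n} (X_i - \bar{X})$, where the last
term is zero again because of the definition of a mean.)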
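The two conditions can also be checked numerically. The sketch below uses made-up values for X (an assumption for illustration only) and an illustrative helper `ols_weights` that computes the weights $(X_i - \bar{X}) / \sum_j (X_j - \bar{X})^2$.

```python
def ols_weights(xs):
    """OLS slope weights: (X_i - X_bar) / sum_j (X_j - X_bar)^2."""
    x_bar = sum(xs) / len(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [(x - x_bar) / sxx for x in xs]

xs = [2.0, 3.0, 5.0, 7.0, 11.0]  # hypothetical regressor values
weights = ols_weights(xs)

sum_weights = sum(weights)                                   # should be 0
sum_weighted_x = sum(w * x for w, x in zip(weights, xs))     # should be 1
print(sum_weights, sum_weighted_x)
```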
7) (Requires Appendix material and Calculus) Equation (5.36) in your textbook derives the conditional variance
for any old conditionally unbiased estimator $\tilde{\beta}_1$ to be $var(\tilde{\beta}_1 | X_1, ..., X_n) = \sigma_u^2 \sum_{i=1}^{n} a_i^2$, where the conditions for
conditional unbiasedness are $\sum_{i=1}^{n} a_i = 0$ and $\sum_{i=1}^{n} a_i X_i = 1$. As an alternative to the BLUE proof presented in
your textbook, you recall from one of your calculus courses that you could minimize the variance subject to the
two constraints, thereby making the variance as small as possible while the constraints are holding. Show that
in doing so you get the OLS weights $\hat{a}_i$. (You may assume that $X_1, ..., X_n$ are nonrandom (fixed over repeated
samples).)
Answer: The Lagrangian is
$$L = \sigma_u^2 \sum_{i=1}^{n} a_i^2 - \lambda_1 \sum_{i=1}^{n} a_i - \lambda_2 \left( \sum_{i=1}^{n} a_i X_i - 1 \right),$$
where the $\lambda_i$ are two Lagrangian multipliers. Minimizing the Lagrangian w.r.t. the n weights $a_i$ and the two
Lagrangian multipliers results in (n+2) linear equations in (n+2) unknowns. Solving these for the weights, you get
$a_i = \frac{X_i - \bar{X}}{\sum_{j=1}^{n} (X_j - \bar{X})^2} = \hat{a}_i$.
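The step from the first-order conditions to the weights is only asserted in the answer. One way to fill it in, as a sketch using the same Lagrangian with multipliers $\lambda_1$ and $\lambda_2$, is:

```latex
\begin{align*}
\frac{\partial L}{\partial a_i} &= 2\sigma_u^2 a_i - \lambda_1 - \lambda_2 X_i = 0
  \quad\Rightarrow\quad a_i = \frac{\lambda_1 + \lambda_2 X_i}{2\sigma_u^2}, \\
\sum_{i=1}^{n} a_i = 0 &\quad\Rightarrow\quad n\lambda_1 + \lambda_2 n\bar{X} = 0
  \quad\Rightarrow\quad \lambda_1 = -\lambda_2 \bar{X}
  \quad\Rightarrow\quad a_i = \frac{\lambda_2 (X_i - \bar{X})}{2\sigma_u^2}, \\
\sum_{i=1}^{n} a_i X_i = 1 &\quad\Rightarrow\quad
  \frac{\lambda_2}{2\sigma_u^2} \sum_{i=1}^{n} (X_i - \bar{X}) X_i = 1
  \quad\Rightarrow\quad
  \frac{\lambda_2}{2\sigma_u^2} = \frac{1}{\sum_{j=1}^{n} (X_j - \bar{X})^2}, \\
a_i &= \frac{X_i - \bar{X}}{\sum_{j=1}^{n} (X_j - \bar{X})^2} = \hat{a}_i .
\end{align*}
```

The third line uses $\sum_i (X_i - \bar{X}) X_i = \sum_i (X_i - \bar{X})^2$, which holds because deviations from the mean sum to zero.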
8) Your textbook states that under certain restrictive conditions, the t-statistic has a Student t-distribution with
n-2 degrees of freedom. The loss of two degrees of freedom is the result of OLS forcing two restrictions onto
the data. What are these two conditions, and when did you impose them onto the data set in your derivation of
the OLS estimator?
Answer: The two conditions are $\sum_{i=1}^{n} \hat{u}_i = 0$ and $\sum_{i=1}^{n} \hat{u}_i X_i = 0$. These were the result of minimizing the sum of the
squared prediction errors, i.e., taking the derivatives of the sum of squared prediction mistakes with respect to
$b_0$ and $b_1$ and setting them to zero.
9) Assume that your population regression function is
$Y_i = \beta_1 X_i + u_i$
i.e., a regression through the origin (no intercept). Under the homoskedastic normal regression assumptions,
the t-statistic will have a Student t distribution with n-1 degrees of freedom, not n–2 degrees of freedom, as
was the case in Chapter 5 of your textbook. Explain. Do you think that the residuals will still sum to zero for
this case?
Answer: In deriving the OLS estimator $\hat{\beta}_1$, you minimize the prediction mistake w.r.t. $b_1$ only, not $b_0$ and $b_1$. As a
result, you are only placing one restriction on the data ($\sum_{i=1}^{n} \hat{u}_i X_i = 0$), not two. Hence there are n-1
independent observations. $\sum_{i=1}^{n} \hat{u}_i = 0$ will no longer hold.
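A small numerical illustration, using made-up data (an assumption, not from the textbook) and the regression-through-the-origin slope $\sum X_i Y_i / \sum X_i^2$, shows that the residuals are orthogonal to X but need not sum to zero:

```python
def slope_through_origin(xs, ys):
    """OLS slope when the regression has no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

xs = [1.0, 2.0, 3.0, 4.0]  # hypothetical data
ys = [3.0, 4.0, 6.0, 7.0]

b1 = slope_through_origin(xs, ys)
residuals = [y - b1 * x for x, y in zip(xs, ys)]

sum_resid = sum(residuals)                                  # not forced to zero
sum_resid_x = sum(r * x for r, x in zip(residuals, xs))     # forced to zero by OLS
print(b1, sum_resid, sum_resid_x)
```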
10) In many of the cases discussed in your textbook, you test for the significance of the slope at the 5% level. What
is the size of the test? What is the power of the test? Why is the probability of committing a Type II error so
large here?
Answer: The size of the test is the same as the probability of committing a Type I error. It is therefore 5%. If the
alternative hypothesis is vague, as is the case for $H_1: \beta_1 \neq 0$ or $H_1: \beta_1 < 0$ (or $H_1: \beta_1 > 0$), then the
distribution of the alternative hypothesis is located virtually on top of the distribution of the null
hypothesis (it is just marginally moved to the left or the right). As a result, the probability of the Type II
error must be 1 minus the probability of the Type I error. Hence the power of the test is only 5%, which is low.
11) Assume that the homoskedastic normal regression assumptions hold. Using the Student t-distribution, find the
critical value for the following situations:
(a) n=28, 5% significance level, one-sided test.
(b) n=40, 1% significance level, two-sided test.
(c) n=10, 10% significance level, one-sided test.
(d) n=∞, 5% significance level, two-sided test.
Answer: (a) 1.71
(b) between 2.75 (30 degrees of freedom) and 2.66 (60 degrees of freedom)
(c) 1.40
(d) 1.96
12) Consider the following two models involving binary variables as explanatory variables:
$Wage = \beta_0 + \beta_1 DFemme$ and $Wage = \gamma_1 DFemme + \gamma_2 Male$
where Wage is the hourly wage rate, DFemme is a binary variable that is equal to 1 if the person is a female, and
0 if the person is a male. Male = 1 – DFemme. Even though you have not learned about regression functions with
two explanatory variables (or regressions without an intercept), assume that you had estimated both models,
i.e., you obtained the estimates for the regression coefficients.
What is the predicted wage for a male in the two models? What is the predicted wage for a female in the two
models? What is the relationship between the $\beta$s and the $\gamma$s? Why would you prefer one model over the other?
Answer: For DFemme = 1, the models read $Wage = \beta_0 + \beta_1$ and $Wage = \gamma_1$; for DFemme = 0, the models read
$Wage = \beta_0$ and $Wage = \gamma_2$. Hence both $\beta_0$ and $\gamma_2$ give you the average wage of males. Clearly
$\beta_0 = \gamma_2$. Since the wage for females is $\gamma_1 = \beta_0 + \beta_1$, and the wage for males is $\beta_0$, then $\beta_1$ must be the
difference in the wage between males and females. Hence the first formulation allows you to test directly
whether or not the difference in means (here wages) is statistically significant.
13) Consider the sample regression function $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$. The table below lists estimates for the slope ($\hat{\beta}_1$) and
the variance of the slope estimator ($\hat{\sigma}^2_{\hat{\beta}_1}$). In each case calculate the p-value for the null hypothesis of $\beta_1 = 0$
and a two-tailed alternative hypothesis. Indicate in which case you would reject the null hypothesis at the 5%
significance level.
$\hat{\beta}_1$:                      –1.76    0.0025      2.85     –0.00014
$\hat{\sigma}^2_{\hat{\beta}_1}$:     0.37     0.000003    117.5    0.0000013
Answer: The t-statistics are -2.89, 1.44, 0.26, and -0.123 respectively, with p-values of 0.004, 0.15, 0.79, and 0.90.
Hence you only reject the null hypothesis for the first case.
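The calculation divides each slope estimate by the square root of its estimated variance and converts the result to a two-sided p-value. A minimal sketch, assuming the large-sample normal approximation and using an illustrative helper `two_sided_p`:

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(estimate, variance):
    """Return (t-statistic, two-sided normal p-value) for H0: beta1 = 0."""
    t = estimate / sqrt(variance)
    return t, 2 * (1 - NormalDist().cdf(abs(t)))

cases = [(-1.76, 0.37), (0.0025, 0.000003), (2.85, 117.5), (-0.00014, 0.0000013)]
results = [two_sided_p(b, v) for b, v in cases]
for (b, v), (t, p) in zip(cases, results):
    print(f"beta1_hat={b}, t={t:.3f}, p={p:.3f}, reject at 5%: {p < 0.05}")
```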
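14) Your textbook discussed the regression model when X is a binary variable
$Y_i = \beta_0 + \beta_1 D_i + u_i$, i = 1,..., n
Let Y represent wages, and let D be one for females, and 0 for males. Using the OLS formula for the slope
coefficient, prove that $\hat{\beta}_1$ is the difference between the average wage for males and the average wage for
females.
Answer: Using the OLS formula for the slope, we have
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} X_i Y_i - n \bar{X} \bar{Y}}{\sum_{i=1}^{n} X_i^2 - n \bar{X}^2} = \frac{\sum_{i=1}^{n_f} wage_i - n_f \overline{wage}}{n_f - \frac{n_f^2}{n}},$$
where $n_f$ is the number of females in the sample and $\overline{wage}$
is the average wage. Dividing both the numerator and the denominator by $n_f$, we get
$$\hat{\beta}_1 = \frac{\frac{1}{n_f} \sum_{i=1}^{n_f} wage_i - \overline{wage}}{1 - \frac{n_f}{n}} = \frac{\overline{wage}_f - \overline{wage}}{\frac{n - n_f}{n}} = \frac{n}{n - n_f} (\overline{wage}_f - \overline{wage}),$$
where $\overline{wage}_f$ is the average wage of
females. But note that $\overline{wage} = \frac{n_f}{n} \overline{wage}_f + \frac{n_m}{n} \overline{wage}_m$, where the m subscript indicates males. Substitution
of this expression for average wages into the previous expression results in
$$\hat{\beta}_1 = \frac{n}{n - n_f} (\overline{wage}_f - \overline{wage}) = \frac{n}{n - n_f} \left( \overline{wage}_f - \frac{n_f}{n} \overline{wage}_f - \frac{n_m}{n} \overline{wage}_m \right) = \overline{wage}_f - \frac{n_m}{n - n_f} \overline{wage}_m .$$
Since $n - n_f = n_m$, we have the desired result, $\hat{\beta}_1 = \overline{wage}_f - \overline{wage}_m$.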
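The algebra can be confirmed on a toy sample. The wages below are made up for illustration (an assumption), and `ols` is an illustrative helper implementing the single-regressor OLS formulas; the slope should equal the female mean minus the male mean, and the intercept should equal the male mean.

```python
def ols(xs, ys):
    """Return (intercept, slope) of the OLS regression of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

wages = [12.0, 15.0, 18.0, 20.0, 22.0, 25.0, 30.0]  # hypothetical wages
female = [1, 1, 1, 0, 0, 0, 0]                      # D = 1 for females

intercept, slope = ols(female, wages)
mean_f = sum(w for w, d in zip(wages, female) if d == 1) / female.count(1)
mean_m = sum(w for w, d in zip(wages, female) if d == 0) / female.count(0)
print(intercept, slope, mean_f, mean_m)
```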
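15) Your textbook discussed the regression model when X is a binary variable
$Y_i = \beta_0 + \beta_1 D_i + u_i$, i = 1,..., n
Let Y represent wages, and let D be one for females, and 0 for males. Using the OLS formula for the intercept
coefficient, prove that $\hat{\beta}_0$ is the average wage for males.
Answer: $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$. It is easy but tedious to show that the formula for the slope reduces to the difference
between the average wage for females and the average wage for males:
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} X_i Y_i - n \bar{X} \bar{Y}}{\sum_{i=1}^{n} X_i^2 - n \bar{X}^2} = \frac{\sum_{i=1}^{n_f} wage_i - n_f \overline{wage}}{n_f - \frac{n_f^2}{n}} = \overline{wage}_f - \overline{wage}_m .$$
But $\bar{Y} = \overline{wage}$ and $\bar{X} = \frac{n_f}{n}$, and hence
$$\hat{\beta}_0 = \overline{wage} - (\overline{wage}_f - \overline{wage}_m) \frac{n_f}{n} .$$
Substituting the expression $\overline{wage} = \frac{n_f}{n} \overline{wage}_f + \frac{n_m}{n} \overline{wage}_m$ then results in
$$\hat{\beta}_0 = \frac{n_f}{n} \overline{wage}_m + \frac{n_m}{n} \overline{wage}_m ,$$
which equals the male average wage.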
n
n
16) Let $u_i$ be distributed $N(0, \sigma_u^2)$, i.e., the errors are distributed normally with a constant variance
(homoskedasticity). This results in $\hat{\beta}_1$ being distributed $N(\beta_1, \sigma^2_{\hat{\beta}_1})$, where
$\sigma^2_{\hat{\beta}_1} = \frac{\sigma_u^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$. Statistical
inference would be straightforward if $\sigma_u^2$ was known. One way to deal with this problem is to replace $\sigma_u^2$
with an estimator $s^2_{\hat{u}}$. Clearly since this introduces more uncertainty, you cannot expect $\hat{\beta}_1$ to be still normally
distributed. Indeed, the t-statistic now follows Student’s t distribution. Look at the table for the Student
t-distribution and focus on the 5% two-sided significance level. List the critical values for 10 degrees of
freedom, 30 degrees of freedom, 60 degrees of freedom, and finally ∞ degrees of freedom. Describe how the
notion of uncertainty about $\sigma_u^2$ can be incorporated about the tails of the t-distribution as the degrees of
freedom increase.
Answer: More uncertainty implies that the tails of the distribution should be stretched further to the left and right
when compared to the normal distribution. Hence the critical values for the 5% significance level should
be greater than 1.96 in absolute levels. However, as the number of observations (degrees of freedom)
increases, $s^2_{\hat{u}}$ will converge towards $\sigma_u^2$, so that the shape of the t-distribution should resemble the
normal distribution more and more. Finally, when there are infinite degrees of freedom, the sample
formula $s^2_{\hat{u}}$ becomes the population variance, and the t-distribution should converge to the normal
distribution.
17) In a Monte Carlo study, econometricians generate multiple sample regression functions from a known
population regression function. For example, the population regression function could be $Y_i = \beta_0 + \beta_1 X_i$ = 100 –
0.5 Xi. The Xs could be generated randomly or, for simplicity, be nonrandom (“fixed over repeated samples”).
If we had ten of these Xs, say, and generated twenty Ys, we would obviously always have all observations on a
straight line, and the least squares formulae would always return values of 100 and –0.5 numerically. However,
if we added an error term, where the errors would be drawn randomly from a normal distribution, say, then
the OLS formulae would give us estimates that differed from the population regression function values.
Assume you did just that and recorded the values for the slope and the intercept. Then you did the same
experiment again (each one of these is called a “replication”). And so forth. After 1,000 replications, you plot
the 1,000 intercepts and slopes, and list their summary statistics.
Sample: 1 1000
                 BETA0_HAT     BETA1_HAT
Mean             100.014       –0.500
Median           100.021       –0.500
Maximum          106.348       –0.468
Minimum          93.862        –0.538
Std. Dev.        1.994         0.011
Skewness         0.013         –0.042
Kurtosis         3.026         2.986
Jarque-Bera      0.055         0.305
Probability      0.973         0.858
Sum              100014.353    –499.857
Sum Sq. Dev.     3972.403      0.118
Observations     1000.000      1000.000
Here are the corresponding graphs:
Using the means listed next to the graphs, you see that the averages are not exactly 100 and –0.5. However,
they are “close.” Test for the difference of these averages from the population values to be statistically
significant.
Answer: You can use a simple t-statistic to calculate whether or not (-0.499857) and 100.0144 are statistically
different from (-0.5) and 100. In the denominator of that statistic you would simply put the standard
deviations (0.0109 and 1.9941) divided by the square root of 1,000. As you can see,
$t = \frac{-0.499857 - (-0.50)}{0.0109 / \sqrt{1000}} = 0.41$ and $t = \frac{100.0144 - 100}{1.9941 / \sqrt{1000}} = 0.23$. Neither one of the estimators is more than 1.96
standard deviations from the truth, and hence you cannot reject the null hypothesis that the estimators are
unbiased.
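A Monte Carlo experiment of this kind can be sketched in a few lines. The design below is an assumption made for illustration (ten fixed Xs at 10, 20, ..., 100, normal errors with standard deviation 1, a fixed seed); the question does not state the original design, so the simulated summary statistics will not match the table. The helper `t_stat` applies the same formula as the answer to the reported means and standard deviations.

```python
import random
from math import sqrt
from statistics import mean, stdev

def ols(xs, ys):
    """Return (intercept, slope) of the OLS regression of ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - slope * x_bar, slope

def monte_carlo(replications=1000, beta0=100.0, beta1=-0.5, sigma=1.0, seed=0):
    """Simulate repeated samples with Xs fixed over replications (assumed design)."""
    rng = random.Random(seed)
    xs = [10.0 * k for k in range(1, 11)]
    intercepts, slopes = [], []
    for _ in range(replications):
        ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
        b0, b1 = ols(xs, ys)
        intercepts.append(b0)
        slopes.append(b1)
    return intercepts, slopes

def t_stat(sample_mean, target, sample_sd, n):
    """t-statistic for H0: the mean of the estimates equals the population value."""
    return (sample_mean - target) / (sample_sd / sqrt(n))

intercepts, slopes = monte_carlo()
print("simulated:", t_stat(mean(intercepts), 100.0, stdev(intercepts), 1000),
      t_stat(mean(slopes), -0.5, stdev(slopes), 1000))

# t-statistics implied by the summary statistics reported in the question.
t_slope = t_stat(-0.499857, -0.5, 0.0109, 1000)
t_intercept = t_stat(100.0144, 100.0, 1.9941, 1000)
print("reported:", round(t_slope, 2), round(t_intercept, 2))
```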
18) In the regression through the origin model $Y_i = \beta_1 X_i + u_i$, the OLS estimator is $\hat{\beta}_1 = \frac{\sum_{i=1}^{n} X_i Y_i}{\sum_{i=1}^{n} X_i^2}$. Prove that the
estimator is a linear function of $Y_1, ..., Y_n$ and prove that it is conditionally unbiased.
Answer: Let $w_i = \frac{X_i}{\sum_{j=1}^{n} X_j^2}$, then $\hat{\beta}_1 = \sum_{i=1}^{n} w_i Y_i$. Hence the OLS estimator is a linear function of $Y_1, ..., Y_n$. Next, since
$Y_i = \beta_1 X_i + u_i$, we get
$$\hat{\beta}_1 = \sum_{i=1}^{n} w_i (\beta_1 X_i + u_i) = \beta_1 \sum_{i=1}^{n} w_i X_i + \sum_{i=1}^{n} w_i u_i .$$
$\sum_{i=1}^{n} w_i X_i = \frac{\sum_{i=1}^{n} X_i^2}{\sum_{i=1}^{n} X_i^2} = 1$ implies $\hat{\beta}_1 = \beta_1 + \sum_{i=1}^{n} w_i u_i$. Taking expectations on both sides,
we find
$$E(\hat{\beta}_1) = \beta_1 + E\left[ \sum_{i=1}^{n} w_i u_i \right] = \beta_1 + E\left[ \frac{\sum_{i=1}^{n} X_i u_i}{\sum_{i=1}^{n} X_i^2} \right] = \beta_1 + E\left[ \frac{\sum_{i=1}^{n} X_i E(u_i | X_1, ..., X_n)}{\sum_{i=1}^{n} X_i^2} \right] = \beta_1 .$$
The last equality follows by using the law of iterated expectations. By least squares assumptions, $u_i$ is
distributed independently of X for all observations other than i, so $E(u_i | X_1, ..., X_n) = E(u_i | X_i) = 0$. Hence
$E(\hat{\beta}_1 | X_1, ..., X_n) = \beta_1$.
19) The neoclassical growth model predicts that for identical savings rates and population growth rates, countries
should converge to the same per capita income level. This is referred to as the convergence hypothesis. One way to
test for the presence of convergence is to compare the growth rates over time to the initial starting level, i.e., to
run the regression $g6090 = \beta_0 + \beta_1 \times RelProd_{60}$, where g6090 is the average annual growth rate of GDP per
worker for the 1960-1990 sample period, and $RelProd_{60}$ is GDP per worker relative to the United States in
1960. Under the null hypothesis of no convergence, $\beta_1 = 0$; $H_1: \beta_1 < 0$, implying (“beta”) convergence. Using a
standard regression package, you get the following output:
Dependent Variable: G6090
Method: Least Squares
Date: 07/11/06 Time: 05:46
Sample: 1 104
Included observations: 104
White Heteroskedasticity-Consistent Standard Errors & Covariance

Variable              Coefficient    Std. Error    t-Statistic    Prob.
C                     0.018989       0.002392      7.939864       0.0000
YL60                  –0.000566      0.005056      -0.111948      0.9111

R-squared             0.000068       Mean dependent var       0.018846
Adjusted R-squared    -0.009735      S.D. dependent var       0.015915
S.E. of regression    0.015992       Akaike info criterion    -5.414418
Sum squared resid     0.026086       Schwarz criterion        -5.363565
Log likelihood        283.5498       F-statistic              0.006986
Durbin-Watson stat    1.367534       Prob(F-statistic)        0.933550
You are delighted to see that this program has already calculated p-values for you. However, a peer of yours
points out that the correct p-value should be 0.4562. Who is right?
Answer: Statistical packages typically do not know what the alternative hypothesis is. As a result, the packages
calculate t-statistics and p-values for $H_1: \beta_1 \neq 0$. You can tell your fellow student that she is right and
you will still have to calculate p-values (and t-statistics) by hand for cases other than $H_1: \beta_1 \neq 0$.
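The relationship between the reported two-sided p-value and the one-sided p-value for $H_1: \beta_1 < 0$ can be sketched as follows. The sketch assumes the large-sample normal approximation, so its figures differ slightly in the later decimals from a package that uses the Student t-distribution; `p_values` is an illustrative helper name.

```python
from statistics import NormalDist

def p_values(t):
    """Return (two-sided, left-tailed, right-tailed) normal p-values for a t-statistic."""
    z = NormalDist()
    two_sided = 2 * (1 - z.cdf(abs(t)))
    left = z.cdf(t)        # relevant for H1: beta1 < 0
    right = 1 - z.cdf(t)   # relevant for H1: beta1 > 0
    return two_sided, left, right

two_sided, left, right = p_values(-0.111948)
print(round(two_sided, 4), round(left, 4), round(right, 4))
```

When the estimate has the sign predicted by the one-sided alternative, the one-sided p-value is half the two-sided one, which is why the package's 0.9111 corresponds to roughly 0.456.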
20) Changing the units of measurement obviously will have an effect on the slope of your regression function. For
example, let Y* = aY and X* = bX. Then it is easy but tedious to show that
$\hat{\beta}_1^* = \frac{\sum_{i=1}^{n} x_i^* y_i^*}{\sum_{i=1}^{n} x_i^{*2}} = \frac{a}{b} \hat{\beta}_1$. Given this
result, how do you think the standard errors and the regression $R^2$ will change?
Answer: Statistical inference should not depend on whim, and hence changes in the units of measurement cannot
have an effect on the regression $R^2$. Also, the t-statistics should not change, and hence $SE(\hat{\beta}_1^*)$ must
change accordingly ($SE(\hat{\beta}_1^*) = \frac{a}{b} \times SE(\hat{\beta}_1)$).
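The scaling result can be checked numerically. The sketch below uses made-up data and positive scale factors a = 100 and b = 0.5 (both assumptions), and for simplicity computes the homoskedasticity-only standard error; `fit` is an illustrative helper.

```python
from math import sqrt, isclose

def fit(xs, ys):
    """Return (slope, homoskedasticity-only SE of the slope, R^2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    ssr = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    tss = sum((y - y_bar) ** 2 for y in ys)
    se = sqrt(ssr / (n - 2) / sxx)
    return slope, se, 1 - ssr / tss

xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = 100.0, 0.5                # hypothetical unit changes: Y* = aY, X* = bX

slope, se, r2 = fit(xs, ys)
slope_star, se_star, r2_star = fit([b * x for x in xs], [a * y for y in ys])
print(slope_star / slope, se_star / se, r2, r2_star)
```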
21) Using the California School data set from your textbook, you run the following regression:
TestScr = 698.9 - 2.28 STR
n = 420, SER = 18.6
where TestScr is the average test score in the district and STR is the student-teacher ratio. The sample
standard deviation of test scores is 19.05, and the sample standard deviation of the student-teacher ratio is
1.89.
a. Find the regression $R^2$ and the correlation coefficient between test scores and the student-teacher ratio.
b. Find the homoskedasticity-only standard error of the slope.
Answer: a. $R^2 = 1 - \frac{SSR}{TSS} = 1 - \frac{144611.3}{152490.6} = 0.051$
The correlation coefficient is the (negative) square root of this, or (-0.23).
b. Using formula (5.29), you get $\tilde{\sigma}_{\hat{\beta}_1} = \frac{18.6}{38.8} = 0.48$
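The numbers in this answer can be traced with a short sketch. It takes SER = 18.6 (the value the answer's SSR implies, since SSR = SER² × (n − 2)), uses the SSR and TSS figures exactly as printed in the answer, and computes the homoskedasticity-only standard error as SER divided by $\sqrt{\sum (X_i - \bar{X})^2} = s_X \sqrt{n - 1}$. Variable names are illustrative.

```python
from math import sqrt

n = 420
ser = 18.6       # standard error of the regression used in the answer
sd_x = 1.89      # sample standard deviation of the student-teacher ratio
ssr = 144611.3   # sum of squared residuals as printed in the answer
tss = 152490.6   # total sum of squares as printed in the answer

ssr_from_ser = ser**2 * (n - 2)          # reproduces the printed SSR
r2 = 1 - ssr / tss
corr = -sqrt(r2)                         # negative because the slope is negative
root_sxx = sd_x * sqrt(n - 1)            # sqrt of sum of squared deviations of X
se_slope = ser / root_sxx

print(round(ssr_from_ser, 1), round(r2, 4), round(corr, 2),
      round(root_sxx, 1), round(se_slope, 2))
```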
22) Using the California School data set from your textbook, you run the following regression:
TestScr = 698.9 - 2.28 STR
n = 420, R2 = 0.051, SER = 18.6
where TestScr is the average test score in the district and STR is the student-teacher ratio. Using
heteroskedasticity-robust standard errors, you find the standard error of the slope to be 0.52,
while choosing the homoskedasticity-only option, the standard error is 0.48.
a.
Calculate the t-statistic for both standard errors.
b.
Which of the two t-statistics should you base your inference on?
Answer: a. The respective t-statistics are 4.39 (heteroskedasticity-robust standard error) and 4.75
(homoskedasticity-only standard error).
b. Given the similarity of the two statistics and the fact that both are greater than 4, it will not make
much of a difference which one you will use. However, it is “cleaner” to use the
heteroskedasticity-robust formula, since, in general, it will result in the correct inference
procedure.
23) Using data from the Current Population Survey, you estimate the following relationship between average
hourly earnings (ahe) and the number of years of education (educ):
ahe = -4.58 + 1.71 educ
The heteroskedasticity-robust standard error on the slope is (0.03). Calculate the 95% confidence interval for
the slope. Repeat the exercise using the 90% and then the 99% confidence interval. Can you reject the null
hypothesis that the slope coefficient is zero in the population?
Answer: The 95% confidence interval for the slope is (1.65,1.77). For the 90% confidence level, you get (1.66,1.76)
while the interval is (1.63,1.79) for the 99% level. Since none of the confidence intervals contains zero,
you can comfortably reject the null hypothesis in all three cases.
Chapter 6 Linear Regression with Multiple Regressors
6.1 Multiple Choice
1) In the multiple regression model, the adjusted $R^2$, $\bar{R}^2$,
A) cannot be negative.
B) will never be greater than the regression R2 .
C) equals the square of the correlation coefficient r.
D) cannot decrease when an additional explanatory variable is added.
Answer: B
2) Under imperfect multicollinearity
A) the OLS estimator cannot be computed.
B) two or more of the regressors are highly correlated.
C) the OLS estimator is biased even in samples of n > 100.
D) the error terms are highly, but not perfectly, correlated.
Answer: B
3) When there are omitted variables in the regression, which are determinants of the dependent variable, then
A) you cannot measure the effect of the omitted variable, but the estimator of your included variable(s) is
(are) unaffected.
B) this has no effect on the estimator of your included variable because the other variable is not included.
C) this will always bias the OLS estimator of the included variable.
D) the OLS estimator is biased if the omitted variable is correlated with the included variable.
Answer: D
4) Imagine you regressed earnings of individuals on a constant, a binary variable (“ Male”) which takes on the
value 1 for males and is 0 otherwise, and another binary variable (“Female”) which takes on the value 1 for
females and is 0 otherwise. Because females typically earn less than males, you would expect
A) the coefficient for Male to have a positive sign, and for Female a negative sign.
B) both coefficients to be the same distance from the constant, one above and the other below.
C) none of the OLS estimators to exist because there is perfect multicollinearity.
D) this to yield a difference in means statistic.
Answer: C
5) When you have an omitted variable problem, the assumption that $E(u_i | X_i) = 0$ is violated. This implies that
A) the sum of the residuals is no longer zero.
B) there is another estimator called weighted least squares, which is BLUE.
C) the sum of the residuals times any of the explanatory variables is no longer zero.
D) the OLS estimator is no longer consistent.
Answer: D
6) If you had a two regressor regression model, then omitting one variable which is relevant
A) will have no effect on the coefficient of the included variable if the correlation between the excluded and
the included variable is negative.
B) will always bias the coefficient of the included variable upwards.
C) can result in a negative value for the coefficient of the included variable, even though the coefficient will
have a significant positive effect on Y if the omitted variable were included.
D) makes the sum of the product between the included variable and the residuals different from 0.
Answer: C
7) (Requires Calculus) In the multiple regression model you estimate the effect on Yi of a unit change in one of
the Xi while holding all other regressors constant. This
A) makes little sense, because in the real world all other variables change.
B) corresponds to the economic principle of mutatis mutandis.
C) leaves the formula for the coefficient in the single explanatory variable case unaffected.
D) corresponds to taking a partial derivative in mathematics.
Answer: D
8) You have to worry about perfect multicollinearity in the multiple regression model because
A) many economic variables are perfectly correlated.
B) the OLS estimator is no longer BLUE.
C) the OLS estimator cannot be computed in this situation.
D) in real life, economic variables change together all the time.
Answer: C
9) In a two regressor regression model, if you exclude one of the relevant variables then
A) it is no longer reasonable to assume that the errors are homoskedastic.
B) OLS is no longer unbiased, but still consistent.
C) you are no longer controlling for the influence of the other variable.
D) the OLS estimator no longer exists.
Answer: C
10) The intercept in the multiple regression model
A) should be excluded if one explanatory variable has negative values.
B) determines the height of the regression line.
C) should be excluded because the population regression function does not go through the origin.
D) is statistically significant if it is larger than 1.96.
Answer: B
11) In the multiple regression model, the least squares estimator is derived by
A) minimizing the sum of squared prediction mistakes.
B) setting the sum of squared errors equal to zero.
C) minimizing the absolute difference of the residuals.
D) forcing the smallest distance between the actual and fitted values.
Answer: A
12) The sample regression line estimated by OLS
A) has an intercept that is equal to zero.
B) is the same as the population regression line.
C) cannot have negative and positive slopes.
D) is the line that minimizes the sum of squared prediction mistakes.
Answer: D
13) The OLS residuals in the multiple regression model
A) cannot be calculated because there is more than one explanatory variable.
B) can be calculated by subtracting the fitted values from the actual values.
C) are zero because the predicted values are another name for forecasted values.
D) are typically the same as the population regression function errors.
Answer: B
14) Under the least squares assumptions for the multiple regression problem (zero conditional mean for the error
term, all Xi and Yi being i.i.d., all Xi and ui having finite fourth moments, no perfect multicollinearity), the OLS
estimators for the slopes and intercept
A) have an exact normal distribution for n > 25.
B) are BLUE.
C) have a normal distribution in small samples as long as the errors are homoskedastic.
D) are unbiased and consistent.
Answer: D
15) The main advantage of using multiple regression analysis over differences in means testing is that the
regression technique
A) allows you to calculate p-values for the significance of your results.
B) provides you with a measure of your goodness of fit.
C) gives you quantitative estimates of a unit change in X.
D) assumes that the error terms are generated from a normal distribution.
Answer: C
16) In a multiple regression framework, the slope coefficient on the regressor X2i
A) takes into account the scale of the error term.
B) is measured in the units of Yi divided by units of X2i.
C) is usually positive.
D) is larger than the coefficient on X1i.
Answer: B
17) One of the least squares assumptions in the multiple regression model is that you have random variables which
are “i.i.d.” This stands for
A) initially indeterminate differences.
B) irregularly integrated dichotomies.
C) identically initiated deltas (as in changes).
D) independently and identically distributed.
Answer: D
18) Omitted variable bias
A) will always be present as long as the regression R2 < 1.
B) is always there but is negligible in almost all economic examples.
C) exists if the omitted variable is correlated with the included regressor but is not a determinant of the
dependent variable.
D) exists if the omitted variable is correlated with the included regressor and is a determinant of the
dependent variable.
Answer: D
19) The following OLS assumption is most likely violated by omitted variables bias:
A) E(ui | Xi) = 0
B) (Xi, Yi), i = 1,..., n, are i.i.d. draws from their joint distribution
C) there are no outliers for Xi, ui
D) there is heteroskedasticity
Answer: A
20) The population multiple regression model when there are two regressors, X1i and X2i can be written as
follows, with the exception of:
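A) $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$, $i = 1, \dots, n$
B) $Y_i = \beta_0 X_{0i} + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$, $X_{0i} = 1$, $i = 1, \dots, n$
C) $Y_i = \sum_{j=0}^{2} \beta_j X_{ji} + u_i$, $i = 1, \dots, n$
D) $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + u_i$, $i = 1, \dots, n$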
Answer: D
21) In the multiple regression model $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + u_i$, $i = 1, \dots, n$, the OLS estimators are
obtained by minimizing the sum of
A) squared mistakes in $\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - \dots - b_k X_{ki})^2$
B) squared mistakes in $\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - \dots - b_k X_{ki} - u_i)^2$
C) absolute mistakes in $\sum_{i=1}^{n} |Y_i - b_0 - b_1 X_{1i} - \dots - b_k X_{ki}|$
D) squared mistakes in $\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2$
Answer: A
22) In the multiple regression model, the SER is given by
A) $\frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i$
B) $\frac{1}{n-k-2} \sum_{i=1}^{n} u_i$
C) $\frac{1}{n-k-2} \sum_{i=1}^{n} \hat{u}_i$
D) $\sqrt{\frac{1}{n-k-1} \sum_{i=1}^{n} \hat{u}_i^2}$
Answer: D
23) In multiple regression, the R2 increases whenever a regressor is
A) added unless the coefficient on the added regressor is exactly zero.
B) added.
C) added unless there is heteroskedasticity.
D) greater than 1.96 in absolute value.
Answer: A
24) The adjusted $R^2$, or $\bar{R}^2$, is given by
A) $1 - \frac{n-2}{n-k-1} \frac{SSR}{TSS}$
B) $1 - \frac{n-2}{n-k-1} \frac{ESS}{TSS}$
C) $1 - \frac{n-1}{n-k-1} \frac{SSR}{TSS}$
D) $\frac{ESS}{TSS}$
Answer: C
25) Consider the following multiple regression models (a) to (d) below. DFemme = 1 if the individual is a female,
and is zero otherwise; DMale is a binary variable which takes on the value one if the individual is male, and is
zero otherwise; DMarried is a binary variable which is unity for married individuals and is zero otherwise, and
DSingle is (1-DMarried). Regressing weekly earnings (Earn) on a set of explanatory variables, you will
experience perfect multicollinearity in the following cases unless:
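A) $\mathit{Earn}_i = \beta_0 + \beta_1 \mathit{DFemme} + \beta_2 \mathit{DMale} + \beta_3 X_{3i}$
B) $\mathit{Earn}_i = \beta_0 + \beta_1 \mathit{DMarried} + \beta_2 \mathit{DSingle} + \beta_3 X_{3i}$
C) $\mathit{Earn}_i = \beta_0 + \beta_1 \mathit{DFemme} + \beta_3 X_{3i}$
D) $\mathit{Earn}_i = \beta_1 \mathit{DFemme} + \beta_2 \mathit{DMale} + \beta_3 \mathit{DMarried} + \beta_4 \mathit{DSingle} + \beta_5 X_{3i}$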
Answer: C
26) Consider the multiple regression model with two regressors X1 and X2, where both variables are determinants
of the dependent variable. When omitting X2 from the regression, there will be omitted variable bias for $\hat{\beta}_1$
A) if X1 and X2 are correlated
B) always
C) if X2 is measured in percentages
D) if X2 is a dummy variable
Answer: A
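The role of the correlation between X1 and X2 can be illustrated with a small simulation. This is only a sketch: the true coefficients (1, 2, 3), the sample size, and the helper names ols_slope and short_regression_slope are assumptions chosen for illustration and do not come from the question.

```python
import random


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with an intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def short_regression_slope(link, n=20000, seed=0):
    """Regress Y on X1 only when the true model is Y = 1 + 2*X1 + 3*X2 + u.

    `link` sets how strongly X2 depends on X1 (X2 = link*X1 + noise).
    All numbers are made up for illustration.
    """
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [link * a + rng.gauss(0, 1) for a in x1]
    y = [1 + 2 * a + 3 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
    return ols_slope(x1, y)


# X2 unrelated to X1: the slope on X1 stays near the true value of 2.
slope_uncorrelated = short_regression_slope(link=0.0)
# X2 correlated with X1: the slope drifts toward 2 + 3 * 0.5 = 3.5.
slope_correlated = short_regression_slope(link=0.5)
print(slope_uncorrelated, slope_correlated)
```

In this setup the omitted regressor biases the estimate only in the second case, where X1 and X2 move together.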
27) The dummy variable trap is an example of
A) imperfect multicollinearity
B) something that is of theoretical interest only
C) perfect multicollinearity
D) something that does not happen to university or college students
Answer: C
28) Imperfect multicollinearity
A) is not relevant to the field of economics and business administration
B) only occurs in the study of finance
C) means that the least squares estimator of the slope is biased
D) means that two or more of the regressors are highly correlated
Answer: D
29) Consider the multiple regression model with two regressors X1 and X2, where both variables are determinants
of the dependent variable. You first regress Y on X1 only and find no relationship. However, when regressing Y
on X1 and X2, the slope coefficient $\hat{\beta}_1$ changes by a large amount. This suggests that your first regression
suffers from
A) heteroskedasticity
B) perfect multicollinearity
C) omitted variable bias
D) dummy variable trap
Answer: C
30) Imperfect multicollinearity
A) implies that it will be difficult to estimate precisely one or more of the partial effects using the data at
hand
B) violates one of the four Least Squares assumptions in the multiple regression model
C) means that you cannot estimate the effect of at least one of the Xs on Y
D) suggests that a standard spreadsheet program does not have enough power to estimate the multiple
regression model
Answer: A
6.2 Essays and Longer Questions
1) Females, on average, are shorter and weigh less than males. One of your friends, who is a pre-med student,
tells you that in addition, females will weigh less for a given height. To test this hypothesis, you collect height
and weight of 29 female and 81 male students at your university. A regression of the weight on a constant,
height, and a binary variable, which takes a value of one for females and is zero otherwise, yields the following
result:
Studentw = -229.21 – 6.36 × Female + 5.58 × Height , R2 =0.50, SER = 20.99
where Studentw is weight measured in pounds and Height is measured in inches.
(a) Interpret the results. Does it make sense to have a negative intercept?
(b) You decide that in order to give an interpretation to the intercept you should rescale the height variable.
One possibility is to subtract 5 ft. or 60 inches from your Height, because the minimum height in your data set is
62 inches. The resulting new intercept is now 105.58. Can you interpret this number now? Do you think that the
regression R2 has changed? What about the standard error of the regression?
(c) You have learned that correlation does not imply causation. Although this is true mathematically, does this
always apply?
Answer: (a) For every additional inch in height, weight increases by roughly 5.6 pounds. Female students weigh
approximately 6.4 pounds less than male students, controlling for height. The regression explains 50
percent of the weight variation among students. It does not make sense to interpret the intercept, since
there are no observations close to the origin, or, put differently, there are no individuals who are zero
inches tall.
(b) There are now observations close to the origin and you can therefore interpret the intercept. A
male student who is 5 ft. tall will weigh roughly 105.6 pounds, on average. The two slopes will be unaffected,
as will be the regression R2 . Since the explanatory power of the regression is unaffected by rescaling, and
the dependent variable and the total sums of squares have remained unchanged, the sums of squared
residuals, and hence the SER, must also remain the same.
(c) Although true in general, there are cases where Y cannot cause X, as is the case here. Gaining weight
is not a good way for becoming taller, or put differently, weighing 250 pounds will not make students
over 7 ft. tall.
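The rescaling argument in (b) can be checked numerically. The sketch below uses the rounded coefficients printed in the question; the function names are illustrative only. With rounded inputs the new intercept comes out near 105.59 rather than the reported 105.58, a difference that is presumably due to rounding in the published coefficients.

```python
# Coefficients copied from the regression in the question (rounded as printed).
INTERCEPT = -229.21
B_FEMALE = -6.36
B_HEIGHT = 5.58


def predicted_weight(height_in, female):
    """Predicted weight in pounds from the original specification."""
    return INTERCEPT + B_FEMALE * female + B_HEIGHT * height_in


# Measuring height as (Height - 60) moves 60 * B_HEIGHT into the intercept.
rescaled_intercept = INTERCEPT + B_HEIGHT * 60


def predicted_weight_rescaled(height_in, female):
    """Predicted weight using the rescaled regressor (Height - 60)."""
    return rescaled_intercept + B_FEMALE * female + B_HEIGHT * (height_in - 60)


print(round(rescaled_intercept, 2))
# Fitted values are identical under both specifications, so residuals,
# R2, and the SER cannot change.
print(predicted_weight(70, 1), predicted_weight_rescaled(70, 1))
```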
2) The cost of attending your college has once again gone up. Although you have been told that education is
investment in human capital, which carries a return of roughly 10% a year, you (and your parents) are not
pleased. One of the administrators at your university/college does not make the situation better by telling you
that you pay more because the reputation of your institution is better than that of others. To investigate this
hypothesis, you collect data randomly for 100 national universities and liberal arts colleges from the 2000-2001
U.S. News and World Report annual rankings. Next you perform the following regression
Cost = 7,311.17 + 3,985.20 × Reputation – 0.20 × Size
+ 8,406.79 × Dpriv – 416.38 × Dlibart – 2,376.51 × Dreligion
R2 =0.72, SER = 3,773.35
where Cost is Tuition, Fees, Room and Board in dollars, Reputation is the index used in U.S. News and World
Report (based on a survey of university presidents and chief academic officers), which ranges from 1 (“marginal”)
to 5 (“distinguished”), Size is the number of undergraduate students, and Dpriv, Dlibart, and Dreligion are
binary variables indicating whether the institution is private, a liberal arts college, and has a religious
affiliation.
(a) Interpret the results. Do the coefficients have the expected sign?
(b) What is the forecasted cost for a liberal arts college, which has no religious affiliation, a size of 1,500
students and a reputation level of 4.5? (All liberal arts colleges are private.)
(c) To save money, you are willing to switch from a private university to a public university, which has a
ranking of 0.5 less and 10,000 more students. What is the effect on your cost? Is it substantial?
(d) Eliminating the Size and Dlibart variables from your regression, the estimation regression becomes
Cost = 5,450.35 + 3,538.84 × Reputation + 10,935.70 × Dpriv – 2,783.31 × Dreligion;
R2 =0.72, SER = 3,792.68
Why do you think that the effect of attending a private institution has increased now?
(e) What can you say about causation in the above relationship? Is it possible that Cost affects Reputation rather
than the other way around?
Answer: (a) An increase in reputation by one category, increases the cost by roughly $3,985. The larger the size of
the college/university, the lower the cost. An increase of 10,000 students results in a $2,000 lower cost.
Private schools charge roughly $8,407 more than public schools. A school with a religious affiliation is
approximately $2,377 cheaper, presumably due to subsidies, and a liberal arts college also charges
roughly $416 less. There are no observations close to the origin, so there is no direct interpretation of the
intercept. Other than perhaps the coefficient on liberal arts colleges, all coefficients have the expected
sign.
(b) $32,935.
(c) The cost falls by roughly $12,400 per year. Since this implies approximately $50,000 over the four years of education, it is a
substantial amount of money for the average household.
(d) Private institutions are smaller, on average, and some of these are liberal arts colleges. Both of these
variables had negative coefficients.
(e) It is very possible that the university president and chief academic officer are influenced by the cost
variable in answering the U.S. News and World Report survey. If this were the case, then the above
equation suffers from simultaneous causality bias, a topic that will be covered in a later chapter.
However, this poses a serious threat to the internal validity of the study.
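The figures in (b) and (c) follow directly from the estimated coefficients. The sketch below reproduces them; the dictionary COEF and the function predicted_cost are names introduced here for illustration.

```python
# Coefficients copied from the first regression in the question.
COEF = {
    "const": 7311.17,
    "Reputation": 3985.20,
    "Size": -0.20,
    "Dpriv": 8406.79,
    "Dlibart": -416.38,
    "Dreligion": -2376.51,
}


def predicted_cost(reputation, size, dpriv, dlibart, dreligion):
    """Forecasted cost in dollars for an institution with the given traits."""
    return (
        COEF["const"]
        + COEF["Reputation"] * reputation
        + COEF["Size"] * size
        + COEF["Dpriv"] * dpriv
        + COEF["Dlibart"] * dlibart
        + COEF["Dreligion"] * dreligion
    )


# (b) Private liberal arts college, no religious affiliation,
# 1,500 students, reputation 4.5.
part_b = predicted_cost(4.5, 1500, 1, 1, 0)

# (c) Switch from private to public, reputation 0.5 lower, 10,000 more students.
part_c_change = (
    COEF["Reputation"] * (-0.5) + COEF["Size"] * 10000 + COEF["Dpriv"] * (-1)
)

print(round(part_b, 2), round(part_c_change, 2))
```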
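3) In the multiple regression model with two explanatory variables
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$$
the OLS estimators for the three parameters are as follows (small letters refer to deviations from means, as in $z_i = Z_i - \bar{Z}$):
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}_1 - \hat{\beta}_2 \bar{X}_2$$
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} y_i x_{1i} \sum_{i=1}^{n} x_{2i}^2 - \sum_{i=1}^{n} y_i x_{2i} \sum_{i=1}^{n} x_{1i} x_{2i}}{\sum_{i=1}^{n} x_{1i}^2 \sum_{i=1}^{n} x_{2i}^2 - \left( \sum_{i=1}^{n} x_{1i} x_{2i} \right)^2}$$
$$\hat{\beta}_2 = \frac{\sum_{i=1}^{n} y_i x_{2i} \sum_{i=1}^{n} x_{1i}^2 - \sum_{i=1}^{n} y_i x_{1i} \sum_{i=1}^{n} x_{1i} x_{2i}}{\sum_{i=1}^{n} x_{1i}^2 \sum_{i=1}^{n} x_{2i}^2 - \left( \sum_{i=1}^{n} x_{1i} x_{2i} \right)^2}$$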
You have collected data for 104 countries of the world from the Penn World Tables and want to estimate the
effect of the population growth rate (X1i) and the saving rate (X2i) (average investment share of GDP from
1980 to 1990) on GDP per worker (relative to the U.S.) in 1990. The various sums needed to calculate the OLS
estimates are given below:
$$\sum_{i=1}^{n} Y_i = 33.33; \quad \sum_{i=1}^{n} X_{1i} = 2.025; \quad \sum_{i=1}^{n} X_{2i} = 17.313$$
$$\sum_{i=1}^{n} y_i^2 = 8.3103; \quad \sum_{i=1}^{n} x_{1i}^2 = 0.0122; \quad \sum_{i=1}^{n} x_{2i}^2 = 0.6422$$
$$\sum_{i=1}^{n} y_i x_{1i} = -0.2304; \quad \sum_{i=1}^{n} y_i x_{2i} = 1.5676; \quad \sum_{i=1}^{n} x_{1i} x_{2i} = -0.0520$$
(a) What are your expected signs for the regression coefficients? Calculate the coefficients and see if their signs
correspond to your intuition.
(b) Find the regression $R^2$, and interpret it. What other factors can you think of that might have an influence on
productivity?
Answer: (a) You expect $\hat{\beta}_1 < 0$ and $\hat{\beta}_2 > 0$ with no prior expectation on the intercept. Substituting the above
numbers into the equations for the regression coefficients results in $\hat{\beta}_1 = -12.95$, $\hat{\beta}_2 = 1.39$, and $\hat{\beta}_0 = 0.34$.
(b) $R^2 = \frac{\hat{\beta}_1 \sum_{i=1}^{n} y_i x_{1i} + \hat{\beta}_2 \sum_{i=1}^{n} y_i x_{2i}}{\sum_{i=1}^{n} y_i^2} = 0.62$. 62 percent of the variation in relative productivity is
explained by the regression. There is a vast literature on the subject and students’ answers will obviously
vary. Some may focus on additional economic variables such as the initial level of productivity and the
inflation rate during the sample period. Others may emphasize institutional variables such as whether or
not the country was democratic over the sample period, or had political stability, etc.
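The substitution in (a) and (b) can be reproduced with a few lines of Python. The sketch below plugs the sums given in the question into the two-regressor OLS formulas; the variable names (Syy, S11, and so on) are shorthand introduced here, not notation from the question.

```python
# Sample size and raw sums from the question.
n = 104
sum_Y, sum_X1, sum_X2 = 33.33, 2.025, 17.313

# Sums of squares and cross products in deviations from means.
Syy, S11, S22 = 8.3103, 0.0122, 0.6422
Sy1, Sy2, S12 = -0.2304, 1.5676, -0.0520

# Common denominator of both slope formulas.
den = S11 * S22 - S12 ** 2

b1 = (Sy1 * S22 - Sy2 * S12) / den   # slope on population growth
b2 = (Sy2 * S11 - Sy1 * S12) / den   # slope on the saving rate
b0 = sum_Y / n - b1 * sum_X1 / n - b2 * sum_X2 / n

# Explained sum of squares divided by total sum of squares.
r2 = (b1 * Sy1 + b2 * Sy2) / Syy

print(round(b1, 2), round(b2, 2), round(b0, 2), round(r2, 2))
```

The rounded results agree with the values reported in the answer.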
4) A subsample from the Current Population Survey is taken, on weekly earnings of individuals, their age, and
their gender. You have read in the news that women make 70 cents to the $1 that men earn. To test this
hypothesis, you first regress earnings on a constant and a binary variable, which takes on a value of 1 for
females and is 0 otherwise. The results were:
Earn = 570.70 – 170.72 × Female, R2 =0.084, SER = 282.12.
(a) There are 850 females in your sample and 894 males. What are the mean earnings of males and females in
this sample? What is the percentage of average female income to male income?
(b) You decide to control for age (in years) in your regression results because older people, up to a point, earn
more on average than younger people. This regression output is as follows:
Earn = 323.70 – 169.78 × Female + 5.15 × Age, R2 =0.135, SER = 274.45.
Interpret these results carefully. How much, on average, does a 40-year-old female make per week in your
sample? What about a 20-year-old male? Does this represent stronger evidence of discrimination against
females?
Answer: (a) Males earn $570.70, females $399.98. Percentage of average female income to male income is 70.1% in
the sample.
(b) As individuals become one year older, they earn $5.15 more, on average. Females earn significantly
less money on average and for a given age. 13.5 percent of the earnings variation is explained by the
regression. A 40-year-old female earns $359.92, while a 20-year-old male makes $426.70. There is
somewhat more evidence here, since age has been added as a regressor. However, many attributes,
which could potentially explain this difference, are still omitted.
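The numbers in this answer can be verified from the two regressions. In the sketch below, predicted_earnings is an illustrative helper name; all coefficients are copied from the question.

```python
# (a) With only a constant and a Female dummy, the intercept is the male mean
# and the intercept plus the Female coefficient is the female mean.
mean_male = 570.70
mean_female = 570.70 - 170.72
ratio = mean_female / mean_male


def predicted_earnings(age, female):
    """Predicted weekly earnings from the regression that controls for age."""
    return 323.70 - 169.78 * female + 5.15 * age


# (b) Predicted weekly earnings for the two individuals in the question.
female_40 = predicted_earnings(40, 1)
male_20 = predicted_earnings(20, 0)

print(round(mean_female, 2), round(100 * ratio, 1))
print(round(female_40, 2), round(male_20, 2))
```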
5) You have collected data from Major League Baseball (MLB) to find the determinants of winning. You have a
general idea that both good pitching and strong hitting are needed to do well. However, you do not know how
much each of these contributes separately. To investigate this problem, you collect data for all MLB teams during
the 1999 season. Your strategy is to first regress the winning percentage on pitching quality (“Team ERA”), second
to regress the same variable on some measure of hitting (“OPS – On-base Plus Slugging percentage”), and
third to regress the winning percentage on both.
Summary of the Distribution of Winning Percentage, On Base plus Slugging Percentage,
and Team Earned Run Average for MLB in 1999

                     Average  Standard    Percentile
                              deviation   10%     25%     40%     50%       60%     75%     90%
                                                                  (median)
Team ERA             4.71     0.53        3.84    4.35    4.72    4.78      4.91    5.06    5.25
OPS                  0.778    0.034       0.720   0.754   0.769   0.780     0.790   0.798   0.820
Winning Percentage   0.50     0.08        0.40    0.43    0.46    0.48      0.49    0.59    0.60

The results are as follows:
Winpct = 0.94 – 0.100 × teamera , R2 = 0.49, SER = 0.06.
Winpct = -0.68 + 1.513 × ops , R2 =0.45, SER = 0.06.
Winpct = -0.19 – 0.099 × teamera + 1.490 × ops , R2 =0.92, SER = 0.02.
(a) Interpret the multiple regression. What is the effect of a one point increase in team ERA? Given that the
Atlanta Braves had the most wins that year, winning 103 games out of 162, do you find this effect important?
Next analyze the importance and statistical significance for the OPS coefficient. (The Minnesota Twins had the
minimum OPS of 0.712, while the Texas Rangers had the maximum with 0.840.) Since the intercept is negative,
and since winning percentages must lie between zero and one, should you rerun the regression through the
origin?
(b) What are some of the omitted variables in your analysis? Are they likely to affect the coefficient on Team
ERA and OPS given the size of the R2 and their potential correlation with the included variables?
Answer: (a) A single point increase in team ERA lowers the winning percentage by approximately 10 percentage points. A
0.1 increase in OPS results roughly in an increase of 15 percentage points. Given that there are no observations
close to the origin, you should not interpret the intercept. The multiple regression explains 92 percent of
the variation in winning percentage. The Atlanta Braves only won 63.6 percent of their games. Given
that this represents the best record during that season, a 10 percentage point drop is important.
Although the intercept cannot be interpreted, it anchors the regression at a certain level and should
therefore not be omitted.
(b) The quality of the management and coaching comes to mind, although both may be reflected in the
performance statistics, as are salaries. There are other aspects of baseball performance that are missing,
such as the fielding percentage of the team.
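The magnitudes discussed in (a) can be put side by side with a short calculation. This is a sketch using the multiple regression coefficients, the sample averages for Team ERA (4.71) and OPS (0.778) from the summary table, and the OPS range quoted in the question; the function and variable names are illustrative.

```python
def predicted_winpct(team_era, ops):
    """Predicted winning percentage from the multiple regression."""
    return -0.19 - 0.099 * team_era + 1.490 * ops


# Prediction for a team with average pitching and average hitting.
at_means = predicted_winpct(4.71, 0.778)

# Effect of a one point higher team ERA, holding OPS constant.
era_effect = predicted_winpct(5.71, 0.778) - at_means

# Effect of moving from the lowest OPS (Twins) to the highest (Rangers).
ops_range_effect = 1.490 * (0.840 - 0.712)

# The best record of the season, for comparison.
braves_winpct = 103 / 162

print(round(at_means, 3), round(era_effect, 3))
print(round(ops_range_effect, 3), round(braves_winpct, 3))
```

A team at the sample averages is predicted to win about half of its games, so a 0.099 drop is large relative to the gap between an average team and the best record of the season.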
6) In the process of collecting weight and height data from 29 female and 81 male students at your university, you
also asked the students for the number of siblings they have. Although it was not quite clear to you initially
what you would use that variable for, you construct a new theory that suggests that children who have more
siblings come from poorer families and will have to share the food on the table. Although a friend tells you that
this theory does not pass the “straight-face” test, you decide to hypothesize that peers with many siblings will
weigh less, on average, for a given height. In addition, you believe that the muscle/fat tissue composition of
male bodies suggests that females will weigh less, on average, for a given height. To test these theories, you
perform the following regression:
Studentw = -229.92 – 6.52 × Female + 0.51 × Sibs+ 5.58 × Height,
R2 =0.50, SER = 21.08
where Studentw is in pounds, Height is in inches, Female takes a value of 1 for females and is 0 otherwise, Sibs is
the number of siblings.
Interpret the regression results.
Answer: For every additional inch in height, students weigh, on average, roughly 5.6 pounds more. For a given
height and number of siblings, female students weigh approximately 6.5 pounds less. For every
additional sibling, the weight of students increases by half a pound. Since there are no observations close
to the origin, you cannot interpret the intercept. The regression explains half of the variation in student
weight.
7) You have collected data for 104 countries to address the difficult questions of the determinants for differences
in the standard of living among the countries of the world. You recall from your macroeconomics lectures that
the neoclassical growth model suggests that output per worker (per capita income) levels are determined by,
among others, the saving rate and population growth rate. To test the predictions of this growth model, you
run the following regression:
RelPersInc = 0.339 – 12.894 × n + 1.397 × SK , R2 =0.621, SER = 0.177
where RelPersInc is GDP per worker relative to the United States, n is the average population growth rate,
1980-1990, and SK is the average investment share of GDP from 1960 to 1990 (remember investment equals
saving).
(a) Interpret the results. Do the signs correspond to what you expected them to be? Explain.
(b) You remember that human capital in addition to physical capital also plays a role in determining the
standard of living of a country. You therefore collect additional data on the average educational attainment in
years for 1985, and add this variable (Educ) to the above regression. This results in the modified regression
output:
RelPersInc = 0.046 – 5.869 × n + 0.738 × SK + 0.055 × Educ, R2 =0.775, SER = 0.1377
How has the inclusion of Educ affected your previous results?
(c) Upon checking the regression output, you realize that there are only 86 observations, since data for Educ is
not available for all 104 countries in your sample. Do you have to modify some of your statements in (b)?
(d) Brazil has the following values in your sample: RelPersInc = 0.30, n = 0.021, SK = 0.169, Educ = 3.5. Does your
equation overpredict or underpredict the relative GDP per worker? What would happen to this result if Brazil
managed to double the average educational attainment?
Answer: (a) The Solow growth model predicts higher productivity with higher saving rates and lower population
growth. The signs therefore correspond to prior expectations. A 10 percentage point increase in the saving
rate results in a roughly 14 percentage point increase in per capita income relative to the United States. Lowering
the population growth rate by 1 percentage point results in a 13 percentage point higher per capita income relative to the
United States. It is best not to interpret the intercept. The regression explains approximately 62 percent of
the variation in per capita income among the 104 countries of the world.
(b) The coefficient on the population growth rate is roughly half of what it was originally, and the
coefficient on the saving rate has also fallen by roughly half (from 1.397 to 0.738). The regression R2 has increased
significantly, from 0.621 to 0.775.
(c) When comparing results, you should ensure that the sample is identical, since comparisons are not
valid otherwise.
(d) The predicted value for Brazil is 0.240. Hence the regression underpredicts Brazil’s per capita income.
Increasing Educ to 7.0 would result in a predicted per capita income of 0.43, which is a substantial
increase from both its current actual position and the previously predicted value.
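The prediction for Brazil in (d) is a direct substitution into the regression that includes Educ. A minimal sketch, with predict_rel_income as an illustrative function name:

```python
def predict_rel_income(n, sk, educ):
    """Predicted GDP per worker relative to the U.S. (regression with Educ)."""
    return 0.046 - 5.869 * n + 0.738 * sk + 0.055 * educ


actual_brazil = 0.30
predicted_brazil = predict_rel_income(0.021, 0.169, 3.5)

# Doubling average educational attainment from 3.5 to 7.0 years.
predicted_doubled = predict_rel_income(0.021, 0.169, 7.0)

# A positive gap means the regression underpredicts Brazil.
gap = actual_brazil - predicted_brazil

print(round(predicted_brazil, 3), round(predicted_doubled, 2), round(gap, 3))
```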
8) Attendance at sports events depends on various factors. Teams typically do not change ticket prices from game
to game to attract more spectators to less attractive games. However, there are other marketing tools used, such
as fireworks, free hats, etc., for this purpose. You work as a consultant for a sports team, the Los Angeles
Dodgers, to help them forecast attendance, so that they can potentially devise strategies for price
discrimination. After collecting data over two years for every one of the 162 home games of the 2000 and 2001
seasons, you run the following regression:
Attend = 15,005 + 201 × Temperat + 465 × DodgNetWin + 82 × OppNetWin
+ 9647 × DFSaSu + 1328 × Drain + 1609 × D150m + 271 × DDiv – 978 × D2001;
R2 =0.416, SER = 6983
where Attend is announced stadium attendance, Temperat is the average temperature on game day, DodgNetWin
is the net wins of the Dodgers before the game (wins - losses), OppNetWin is the opposing team’s net wins at
the end of the previous season, and DFSaSu, Drain, D150m, DDiv, and D2001 are binary variables, taking a
value of 1 if the game was played on a weekend, it rained during that day, the opposing team was within a 150
mile radius, the opposing team plays in the same division as the Dodgers, and the game was played during
2001, respectively.
(a) Interpret the regression results. Do the coefficients have the expected signs?
(b) Excluding the last four binary variables results in the following regression result:
Attend = 14,838 + 202 × Temperat + 435 × DodgNetWin + 90 × OppNetWin
+ 10,472 × DFSaSu, R2 =0.410, SER = 6925
According to this regression, what is your forecast of the change in attendance if the temperature increases by
30 degrees? Is it likely that people attend more games if the temperature increases? Is it possible that Temperat
picks up the effect of an omitted variable?
(c) Assuming that ticket sales depend on prices, what would your policy advice be for the Dodgers to increase
attendance?
(d) Dodger stadium is large and is not often sold out. The Boston Red Sox play in a much smaller stadium,
Fenway Park, which often reaches capacity. If you did the same analysis for the Red Sox, what problems would
you foresee in your analysis?
Answer: (a) A 10 degree increase in temperature raises attendance by roughly 2,000. A 10 game net increase in wins
results in approximately 4,600 more spectators. If the opponents’ net win is 10 games higher when
compared to another team, then roughly 800 more people attend. Weekend games attract almost 10,000
more people on average. Rain during the day of the game brings out close to 1,300 more fans. A team
from closer by, such as the Angels or the Diamondbacks, attracts a bit more than 1,600 more people, and a
team from the same division results in close to 270 more fans in the stadium. On average, there were
approximately 1,000 fewer spectators per game in 2001 than in 2000, holding all other factors constant.
With the exception of the rain variable, the signs correspond to prior expectation. The regression
explains 41.6 percent of the variation in Dodger attendance.
(b) For an increase in 30 degrees, there will be roughly 6,000 more people in attendance. Although
people prefer 75 degrees over 45 degrees, it is unlikely that they prefer 105 degrees over 75 degrees.
Temperature rises during the baseball season in Los Angeles. There are typically fewer people in
attendance during the earlier parts of the season than during the latter parts. Binary variables for the
month of the year would pick up such an effect.
(c) The only variable that management has limited control over is the performance of the team. The
policy advice would therefore be to assure a superior team performance, which, in turn, increases
attendance. (Stating the obvious is not going to keep the consultant on the payroll much longer.)
(d) If there was a serious capacity constraint, then estimating the equation in the above way would not
yield sensible results. Imagine that Fenway Park was basically sold out and the Red Sox would now
improve their net wins. Since you would not observe an increase in the dependent variable, the
coefficient for net wins would necessarily have to be zero.
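The capacity problem in (d) can be illustrated with simulated data. This is a sketch under loud assumptions: the latent demand equation, the base demand of 34,000, the noise level, and the capacity values are all hypothetical; only the 465 per net win effect is borrowed from the Dodgers regression.

```python
import random


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with an intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(0)
TRUE_EFFECT = 465  # attendance gain per net win, from the Dodgers regression

net_wins = [rng.uniform(-20, 20) for _ in range(500)]
# Hypothetical latent ticket demand; base level and noise are assumptions.
demand = [34000 + TRUE_EFFECT * w + rng.gauss(0, 3000) for w in net_wins]


def slope_with_capacity(capacity):
    """OLS slope when observed attendance is demand capped at capacity."""
    attendance = [min(d, capacity) for d in demand]
    return ols_slope(net_wins, attendance)


slope_no_constraint = ols_slope(net_wins, demand)
slope_often_sold_out = slope_with_capacity(34000)   # capacity binds often
slope_always_sold_out = slope_with_capacity(10000)  # every game sells out

print(slope_no_constraint, slope_often_sold_out, slope_always_sold_out)
```

When the cap binds for every game, observed attendance is constant and the estimated effect of net wins is zero, even though underlying demand still responds to team performance.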
9) The administration of your university/college is thinking about implementing a policy of coed floors only in
dormitories. Currently there are only single gender floors. One reason behind such a policy might be to
generate an atmosphere of better “understanding” between the sexes. The Dean of Students (DoS) has decided
to investigate if such a behavior results in more “togetherness” by attempting to find the determinants of the
gender composition at the dinner table in your main dining hall, and in that of a neighboring university, which
only allows for coed floors in their dorms. The survey includes 176 students, 63 from your university/college,
and 113 from a neighboring institution.
(a) The Dean’s first problem is how to define gender composition. To begin with, the survey excludes single
persons’ tables, since the study is to focus on group behavior. The Dean also eliminates sports teams from the
analysis, since a large number of single-gender students will sit at the same table. Finally, the Dean decides to
only analyze tables with three or more students, since she worries about “couples” distorting the results. The
Dean finally settles for the following specification of the dependent variable:
GenderComp = |50% - % of Male Students at Table|
where “|Z|” stands for the absolute value of Z. The variable can take on values from zero to fifty. Briefly analyze
some of the possible values. What are the implications for gender composition as more female students join a
given number of males at the table? Why would you choose the absolute value here? Discuss some other
possible specifications for the dependent variable.
(b) After considering various explanatory variables, the Dean settles for an initial list of eight, and estimates the
following relationship:
GenderComp = 30.90 – 3.78 × Size – 8.81 × DCoed + 2.28 × DFemme + 2.06 × DRoommate
- 0.17 × DAthlete + 1.49 × DCons – 0.81 × SAT + 1.74 × SibOther, R2 =0.24, SER = 15.50
where Size is the number of persons at the table minus 3, DCoed is a binary variable, which takes on the value
of 1 if you live on a coed floor, DFemme is a binary variable, which is 1 for females and zero otherwise,
DRoommate is a binary variable which equals 1 if the person at the table has a roommate and is zero otherwise,
DAthlete is a binary variable which is 1 if the person at the table is a member of an athletic varsity team, DCons
is a variable which measures the political tendency of the person at the table on a seven-point scale, ranging
from 1 being “liberal” to 7 being “conservative,” SAT is the SAT score of the person at the table measured on a
seven-point scale, ranging from 1 for the category “900-1000” to 7 for the category “1510 and above,” and
increasing by one for 100 point increases, and SibOther is the number of siblings from the opposite gender in
the family the person at the table grew up with.
Interpret the above equation carefully, justifying the inclusion of the explanatory variables along the way. Does
it make sense to interpret the constant in the above regression?
(c) Had the Dean used the number of people sitting at the table instead of Number-3, what effect would that
have had on the above specification?
(d) If you believe that going down the hallway and knocking on doors is one of the major determinants of who
goes to eat with whom, then why would it not be a good idea to survey students at lunch tables?
Answer: (a) 3 females, 0 males: 50; 0 females, 3 males: 50; 2 females, 2 males: 0; 1 female, 3 males: 25; 4 females, 3
males: 7.143. For a given number of males, say 3, the gender composition will first decrease as the
number of females increases from 0 to 3. After that, the gender composition will increase again. You
need to choose the absolute value because having many individuals from one gender relative to the
other is equally bad for a balanced gender composition. Another possibility would be to use the squared
difference.
(b) The larger the size at the table, the more balanced the gender composition. Consider a table of 6,
where you find two more females than males (4 females, 2 males, gender composition = 16.7) versus a
table of 14, where you again have two more females than males (8 females, 6 males, gender composition = 7.1). Obviously, if
males and females increased in the same proportion, then gender composition would not change. This
has not happened here. Students from a coed floor are more likely to sit at a more balanced table in terms
of gender composition. This is likely to happen if students knock on neighbors’ doors to see who is
willing to join them for lunch. Females are less likely to sit at gender balanced tables, and there is no prior
on the coefficient of this variable. Having a roommate increases the likelihood of gender imbalance.
Roommates are from the same gender, and joining the roommate for a meal results in a more
imbalanced gender composition. Being a member of a varsity team decreases the gender imbalance.
Recall that sports teams sitting together are excluded from the sample. Although there is no strong prior
here, the result suggests that varsity team members have more friends, on average, from the other sex
than does the general student body. Having a more conservative view, holding other factors constant,
results in sitting at meals with more people from the same sex. More intelligent students, or at least those
with a higher SAT score, sit more frequently with students from the other sex. Having had more siblings
from the other gender at home results in a more imbalanced gender composition: the female student
who had four brothers when she grew up has had enough of this sort of experience (although, given the
specification of the dependent variable, it is also possible that she continues to sit with four males). There
are no observations close to the origin, so it is best not to interpret the intercept. 24 percent of
the variation in gender composition is explained by the regression.
(c) The only change would be in the intercept.
(d) Many students attend lectures before lunch, and may ask some of the students attending the same
lecture to join them for lunch.
10) The Solow growth model suggests that countries with identical saving rates and population growth rates
should converge to the same per capita income level. This result has been extended to include investment in
human capital (education) as well as investment in physical capital. This hypothesis is referred to as the
“conditional convergence hypothesis,” since the convergence is dependent on countries obtaining the same
values in the driving variables. To test the hypothesis, you collect data from the Penn World Tables on the
average annual growth rate of GDP per worker (g6090) for the 1960-1990 sample period, and regress it on the
(i) initial starting level of GDP per worker relative to the United States in 1960 (RelProd60), (ii) average
population growth rate of the country (n), (iii) average investment share of GDP from 1960 to 1990 (SK; remember that investment equals savings), and (iv) educational attainment in years for 1985 (Educ). The results for
close to 100 countries are as follows:
g6090 = 0.004 – 0.172 × n + 0.133 × SK + 0.002 × Educ – 0.044 × RelProd60,
R² = 0.537, SER = 0.011
(a) Interpret the results. Do the coefficients have the expected signs? Why does a negative coefficient on the
initial level of per capita income indicate conditional convergence (“beta-convergence”)?
(b) Equations of the above type have been labeled “determinants of growth” equations in the literature. You
recall from your intermediate macroeconomics course that growth in the Solow growth model is determined by
technological progress. Yet the above equation does not contain technological progress. Is that inconsistent?
Answer: (a) All slope coefficients have the expected sign given the economic theory behind the equation. The
negative coefficient implies that countries which were further behind grew relatively faster, or, put
differently, countries which had a higher relative per capita income in 1960 grew relatively slower.
(b) The equation only determines growth relative to a given starting point, namely per capita income in
1960. Compare this to runners placed on a track where the starting blocks are at various points of the
first 100 m. Let the race last for perhaps 10 seconds and let the runners stop at that point on the track. In
essence, you measure where the runners ended up given their starting point, or you can also measure
how far they ran given their starting point. In many ways, the above equation is therefore meant to
predict the per capita income level in 1990 rather than the growth.
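To make the convergence result in part (a) concrete, consider two hypothetical countries with identical n, SK and Educ, one starting at half the U.S. level of GDP per worker in 1960 and the other at the U.S. level. The calculation below assumes that RelProd60 is measured as a ratio (so the two values are 0.5 and 1.0) and that g6090 is a decimal fraction; the question does not state either unit explicitly.

```latex
\Delta \widehat{g6090} = -0.044 \times (0.5 - 1.0) = 0.022
```

Under these assumptions the initially poorer country is predicted to grow about 2.2 percentage points per year faster, which is the sense in which a negative coefficient signals conditional convergence.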
11) You have collected a sub-sample from the Current Population Survey for the western region of the United
States. Running a regression of average hourly earnings (ahe) on an intercept only, you get the following result:
$\widehat{ahe} = \hat\beta_0 = 18.58$
a. Interpret the result.
b. You decide to include a single explanatory variable without an intercept. The binary variable DFemme takes on a value of “1” for females but is “0” otherwise. The regression result changes as follows:
$\widehat{ahe} = \hat\beta_1 \times DFemme = 16.50 \times DFemme$
What is the interpretation now?
c. You generate a new binary variable DMale by subtracting DFemme from 1, and run the new regression:
$\widehat{ahe} = \hat\beta_2 \times DMale = 20.09 \times DMale$
What is the interpretation of the coefficient now?
d. After thinking about the above results, you recognize that you could have generated the last two results
either by running a regression on both binary variables, or on an intercept and one of the binary
variables. What would the results have been?
Answer: a. The mean average hourly earnings for the sample is $18.58.
b. The mean average hourly earnings for females is $16.50 in this sample.
c. The mean average hourly earnings for males is $20.09 in this sample.
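d. $\widehat{ahe} = \hat\beta_1 \times DFemme + \hat\beta_2 \times DMale = 16.50 \times DFemme + 20.09 \times DMale$
or
$\widehat{ahe} = \hat\beta_0 + \hat\beta_1 \times DFemme = 20.09 - 3.59 \times DFemme$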
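The link between these regressions and group means can be checked numerically. The sketch below uses a small set of made-up earnings figures (not CPS data) and two hypothetical helper functions, `slope_no_intercept` and `ols_with_intercept`, written only for this illustration.

```python
from statistics import mean


def slope_no_intercept(x, y):
    """OLS slope when y is regressed on x without an intercept."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def ols_with_intercept(x, y):
    """OLS intercept and slope for a regression of y on a constant and x."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope


# Hypothetical hourly earnings; 1 = female, 0 = male.
dfemme = [1, 1, 1, 0, 0, 0, 0]
ahe = [15.0, 16.5, 18.0, 18.0, 20.0, 21.0, 21.0]
dmale = [1 - d for d in dfemme]

female_mean = mean(y for y, d in zip(ahe, dfemme) if d == 1)
male_mean = mean(y for y, d in zip(ahe, dfemme) if d == 0)

b_femme = slope_no_intercept(dfemme, ahe)   # regression on DFemme only
b_male = slope_no_intercept(dmale, ahe)     # regression on DMale only
intercept, gap = ols_with_intercept(dfemme, ahe)

print(b_femme, female_mean)
print(b_male, male_mean)
print(intercept, gap, female_mean - male_mean)
```

Each no-intercept coefficient reproduces a group mean, while the regression with an intercept reports the male mean as the constant and the female-male difference as the slope. Because DFemme and DMale are never both 1, a regression on both binary variables without an intercept returns the same two coefficients as the separate regressions.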
6.3 Mathematical and Graphical Problems
1) Your econometrics textbook stated that there will be omitted variable bias in the OLS estimator unless the
included regressor, X, is uncorrelated with the omitted variable or the omitted variable is not a determinant of
the dependent variable, Y. Give an intuitive explanation for these two conditions.
Answer: The regression coefficient is the partial derivative of Y with respect to the corresponding X. The meaning
of the partial derivative is the effect of a change in X on Y, holding all the other variables constant. This is
identical to a controlled laboratory experiment where only one variable is changed at a time, while all
the other variables are held constant. In real life, of course, you cannot change one variable and keep all
others, including the omitted variables, constant.
Now consider the case of X changing. If it is correlated with the omitted variable and if that variable is a
determinant of Y, then Y will change further as a result of X changing. This will cause the “controlled
experiment” measure to over or understate the effect that X has on Y, depending on the relationship
between X and the omitted variable. If X is not correlated with the omitted variable, then changing X
will not have this further indirect effect on Y, so that the pure relationship between X and Y can be
measured because it is “as if” the omitted variable were held constant. This has important practical
implications if data is hard to obtain for an omitted variable while it can be argued that the variable of
interest is not much correlated with the omitted variable.
As for the second condition, Y will change whenever a relevant omitted variable changes, and hence the pure effect of X on Y cannot
be observed. In the laboratory, Y would change for reasons unrelated to the change in X. However, if the
omitted variable is not a determinant of Y, then a change in it will have no effect on the pure relationship
between X and Y.
Consider the accompanying graph of the determinants of Y, where X is the included variable and Z the
omitted variable.
Then the effect of X on Y can be measured properly as long as the arrow from Z to Y does not exist, or as
long as changes in X do not cause changes in Z, which in return influence Y.
2) You have obtained data on test scores and student-teacher ratios in region A and region B of your state. Region
B, on average, has lower student-teacher ratios than region A. You decide to run the following regression
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$$
where X1 is the class size in region A, X2 is the difference in class size between region A and B, and X3 is the
class size in region B. Your regression package shows a message indicating that it cannot estimate the above
equation. What is the problem here and how can it be fixed?
Answer: There is perfect multicollinearity present since one of the three explanatory variables can always be
expressed linearly in terms of the other two. Hence there are not really three pieces of independent
information contained in the three explanatory variables. Dropping one of the three will solve the
problem.
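A regression package fails here because the cross-product matrix X'X is singular and cannot be inverted. The following sketch illustrates this with made-up class sizes, constructing X2 as the exact difference X1 - X3; the helper names `determinant` and `cross_product_matrix` are hypothetical, and exact rational arithmetic is used so that the zero determinant is not blurred by rounding.

```python
from fractions import Fraction


def determinant(matrix):
    """Determinant by Gaussian elimination with exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot available: the matrix is singular
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return det


def cross_product_matrix(columns):
    """X'X for a list of regressor columns."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns]
            for ci in columns]


# Hypothetical data: X2 is, by construction, X1 minus X3.
x1 = [20, 22, 25, 19, 24]
x3 = [18, 17, 21, 16, 20]
x2 = [a - b for a, b in zip(x1, x3)]
const = [1] * len(x1)

det_all = determinant(cross_product_matrix([const, x1, x2, x3]))
det_dropped = determinant(cross_product_matrix([const, x1, x3]))

print(det_all)      # zero: the normal equations have no unique solution
print(det_dropped)  # nonzero once one of the three regressors is dropped
```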
3) In the case of perfect multicollinearity, OLS is unable to calculate the coefficients for the explanatory variables,
because it is impossible to change one variable while holding all other variables constant. To see why this is the
case, consider the coefficient for the first explanatory variable in the case of a multiple regression model with
two explanatory variables:
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} y_i x_{1i} \sum_{i=1}^{n} x_{2i}^2 - \sum_{i=1}^{n} y_i x_{2i} \sum_{i=1}^{n} x_{1i} x_{2i}}{\sum_{i=1}^{n} x_{1i}^2 \sum_{i=1}^{n} x_{2i}^2 - \left(\sum_{i=1}^{n} x_{1i} x_{2i}\right)^2}$$
(small letters refer to deviations from means as in $z_i = Z_i - \bar Z$).
Divide each of the four terms by $\sum_{i=1}^{n} x_{1i}^2 \sum_{i=1}^{n} x_{2i}^2$ to derive an expression in terms of regression coefficients
from the simple (one explanatory variable) regression model. In case of perfect multicollinearity, what would
be $R^2$ from the regression of $X_{1i}$ on $X_{2i}$? As a result, what would be the value of the denominator in the above
expression for $\hat\beta_1$?
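Answer: With all sums running from i = 1 to n,
$$\hat\beta_1 = \frac{\dfrac{\sum y_i x_{1i}}{\sum x_{1i}^2} - \dfrac{\sum y_i x_{2i}}{\sum x_{2i}^2}\,\dfrac{\sum x_{1i} x_{2i}}{\sum x_{1i}^2}}{1 - \dfrac{\sum x_{1i} x_{2i}}{\sum x_{2i}^2}\,\dfrac{\sum x_{1i} x_{2i}}{\sum x_{1i}^2}} = \frac{\hat\beta_{yx_1} - \hat\beta_{yx_2}\hat\beta_{x_2 x_1}}{1 - \hat\beta_{x_1 x_2}\hat\beta_{x_2 x_1}},$$
where $\hat\beta_{ab}$ denotes the slope from the simple regression of $a$ on $b$. For the simple regression case
$$R^2 = \hat\beta_{yx}\hat\beta_{xy} = \frac{\sum y_i x_i}{\sum x_i^2}\cdot\frac{\sum y_i x_i}{\sum y_i^2},$$
so that the slope of a simple regression of Y on X is the inverse of the slope of a regression
of X on Y if the regression $R^2 = 1$. But in the case of perfect multicollinearity, the regression $R^2 = 1$ so
that in the expression, we get
$$\hat\beta_1 = \frac{\hat\beta_{yx_1} - \hat\beta_{yx_2}\hat\beta_{x_2 x_1}}{1 - \hat\beta_{x_2 x_1}\dfrac{1}{\hat\beta_{x_2 x_1}}} = \frac{\hat\beta_{yx_1} - \hat\beta_{yx_2}\hat\beta_{x_2 x_1}}{0},$$
which is not defined. The denominator would be zero in
the case of perfect multicollinearity.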
4) You try to establish that there is a positive relationship between the use of a fertilizer and the growth of a
certain plant. Set up the design of an experiment to establish the relationship, paying particular attention to
relevant control variables. Discuss in this context the effect of omitted variable bias.
Answer: The answer should follow the randomized controlled experiment described in section 1.2 of the
textbook: there should be several plots where the plant is placed, each receiving identical treatment. In
this context, the same amount of water and sunshine should be available to each plant, and the soil
should have the identical quality. Then some of the plots, determined randomly, should receive varying
amounts of the fertilizer. The average yield can then be regressed on the amount of fertilizer received.
The experiment could also allow for different amounts of sunshine and water, as long as this were
recorded meticulously. In this case, failing to record the amount of sunshine received and therefore not
including this variable in the regression would result in omitted variable bias. For obvious reasons, the
effect of the fertilizer on yield would be estimated incorrectly, since plants which receive more fertilizer
but are always in the shade would produce a lower yield.
5) In the multiple regression model with two regressors, the formula for the slope of the first explanatory variable
is
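$$\hat\beta_1 = \frac{\sum_{i=1}^{n} y_i x_{1i} \sum_{i=1}^{n} x_{2i}^2 - \sum_{i=1}^{n} y_i x_{2i} \sum_{i=1}^{n} x_{1i} x_{2i}}{\sum_{i=1}^{n} x_{1i}^2 \sum_{i=1}^{n} x_{2i}^2 - \left(\sum_{i=1}^{n} x_{1i} x_{2i}\right)^2}$$
(small letters refer to deviations from means as in $z_i = Z_i - \bar Z$).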
An alternative way to derive the OLS estimator is given through the following three step procedure.
Step 1: regress Y on a constant and X2 , and calculate the residual (Res1).
Step 2: regress X1 on a constant and X2 , and calculate the residual (Res2).
Step 3: regress Res1 on a constant and Res2.
Prove that the slope of the regression in Step 3 is identical to the above formula.
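Answer: All sums run from i = 1 to n, and the regressions are written in deviations from means, so the constant drops out.
Step 1: $y_i = \hat\beta_{yx_2} x_{2i} + \hat v_i$; $\hat\beta_{yx_2} = \dfrac{\sum y_i x_{2i}}{\sum x_{2i}^2}$, and $\hat v_i = y_i - \dfrac{\sum y_i x_{2i}}{\sum x_{2i}^2}\, x_{2i}$.
Step 2: $x_{1i} = \hat\beta_{x_1 x_2} x_{2i} + \hat w_i$; $\hat\beta_{x_1 x_2} = \dfrac{\sum x_{1i} x_{2i}}{\sum x_{2i}^2}$, and $\hat w_i = x_{1i} - \dfrac{\sum x_{1i} x_{2i}}{\sum x_{2i}^2}\, x_{2i}$.
Step 3: $\hat v_i = \hat\gamma \hat w_i + \text{residual}$, with slope
$$\hat\gamma = \frac{\sum \hat v_i \hat w_i}{\sum \hat w_i^2} = \frac{\sum \left[\left(y_i - \dfrac{\sum y_i x_{2i}}{\sum x_{2i}^2}\, x_{2i}\right)\left(x_{1i} - \dfrac{\sum x_{1i} x_{2i}}{\sum x_{2i}^2}\, x_{2i}\right)\right]}{\sum \left(x_{1i} - \dfrac{\sum x_{1i} x_{2i}}{\sum x_{2i}^2}\, x_{2i}\right)^2}.$$
Multiplying out the terms in the numerator and denominator and expanding by $\sum x_{2i}^2$ before
moving through the summation sign, results in
$$\hat\gamma = \frac{\sum y_i x_{1i} \sum x_{2i}^2 - 2\sum y_i x_{2i} \sum x_{1i} x_{2i} + \sum y_i x_{2i} \sum x_{1i} x_{2i}}{\sum x_{1i}^2 \sum x_{2i}^2 - 2\left(\sum x_{1i} x_{2i}\right)^2 + \left(\sum x_{1i} x_{2i}\right)^2} = \frac{\sum y_i x_{1i} \sum x_{2i}^2 - \sum y_i x_{2i} \sum x_{1i} x_{2i}}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i} x_{2i}\right)^2} = \hat\beta_1.$$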
6) In the multiple regression problem with k explanatory variables, it would be quite tedious to derive the
formulas for the slope coefficients without knowledge of linear algebra. The formulas certainly do not resemble
the formula for the slope coefficient in the simple linear regression model with a single explanatory variable.
However, it can be shown that the following three step procedure results in the same formula for the slope
coefficient of the first explanatory variable, X1 :
Step 1: regress Y on a constant and all other explanatory variables other than X1 , and calculate the residual
(Res1).
Step 2: regress X1 on a constant and all other explanatory variables, and calculate the residual (Res2).
Step 3: regress Res1 on a constant and Res2.
Can you give an intuitive explanation to this procedure?
Answer: Step 1 eliminates the linear influence of all variables other than X1 from Y. Think of pouring a liquid
through a filter: the remaining liquid now contains the “purified” Y, or that part of Y that could not be
explained by the other X’s. The same happens in Step 2, where X1 is now purified from any correlation
with the other X’s. Step 3 establishes the purified relationship between Y and X1 .
(This procedure is of interest to students if they want to plot the two-dimensional relationship between
Y and X1 .)
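The equivalence can be verified numerically for the two-regressor case. The sketch below uses made-up data and hypothetical helpers (`demean`, `dot`, `residuals`); it computes the slope on X1 once from the two-regressor "sums" formula and once from the three step procedure.

```python
from statistics import mean


def demean(values):
    """Deviations from the sample mean."""
    m = mean(values)
    return [v - m for v in values]


def dot(a, b):
    """Sum of element-wise products."""
    return sum(ai * bi for ai, bi in zip(a, b))


def residuals(dep, reg):
    """Residuals from regressing dep on a constant and reg."""
    d, r = demean(dep), demean(reg)
    slope = dot(d, r) / dot(r, r)
    return [di - slope * ri for di, ri in zip(d, r)]


# Hypothetical sample.
y = [3.0, 5.0, 4.0, 8.0, 9.0, 12.0]
x1 = [1.0, 2.0, 2.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]

# Slope on X1 from the two-regressor formula in deviations from means.
yd, x1d, x2d = demean(y), demean(x1), demean(x2)
beta1_formula = ((dot(yd, x1d) * dot(x2d, x2d) - dot(yd, x2d) * dot(x1d, x2d))
                 / (dot(x1d, x1d) * dot(x2d, x2d) - dot(x1d, x2d) ** 2))

# Three step procedure: purge X2 from Y and from X1, then regress the residuals.
res1 = residuals(y, x2)
res2 = residuals(x1, x2)
beta1_three_step = dot(res1, res2) / dot(res2, res2)

print(beta1_formula, beta1_three_step)
```

Both residual series have mean zero, so including a constant in Step 3 would not change the slope.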
7) Give at least three examples from macroeconomics and three from microeconomics that involve specified
equations in a multiple regression analysis framework. Indicate in each case what the expected signs of the
coefficients would be and if theory gives you an indication about the likely size of the coefficients.
Answer: Answers will vary by student. In my experience, students most frequently will bring up demand
functions (quantity demanded, price, and other variables such as income, price of substitutes, etc.),
supply functions (quantity supplied, price, costs), production functions (output produced, capital, labor,
and other inputs), consumption functions (consumption, income, and the real interest rate or wealth),
money demand functions (real money supply, income, and interest rate), and the Phillips curve
(inflation, unemployment rate, and inflationary expectations).
8) One of your peers wants to analyze whether or not participating in varsity sports lowers or increases the GPA
of students. She decides to collect data from 110 male and female students on their GPA and the number of
hours they spend participating in varsity sports. The coefficient in the simple regression function turns out to
be significantly negative, using the t-statistic and carrying out the appropriate hypothesis test. Upon reflection,
she is concerned that she did not ask the students in her sample whether or not they were female or male. You
point out to her that you are more concerned about the effect of omitted variables in her regression, such as the
incoming SAT score of the students, and whether or not they are in a major from a high/low grading
department. Elaborate on your argument.
Answer: The presence of omitted variables will result in an inconsistent estimator for the coefficient on the included variable
(number of hours spent in varsity sports) if both of the following two conditions hold: the
omitted variable is relevant in affecting the GPA, and the omitted variable is correlated with the
included variable. Incoming SAT scores are clearly relevant in predicting GPAs, at least in the earlier
years. Departmental differences in the general level of grading will even more
obviously have an effect on the GPA. If, in addition, either variable is correlated with the hours spent in
varsity sports (for example, because athletes have systematically different incoming SAT scores or cluster in
particular majors), then the relationship suffers from omitted variable bias.
9) (Requires Calculus) For the case of the multiple regression problem with two explanatory variables, show that
minimizing the sum of squared residuals results in three conditions:
$$\sum_{i=1}^{n} \hat u_i = 0; \qquad \sum_{i=1}^{n} \hat u_i X_{1i} = 0; \qquad \sum_{i=1}^{n} \hat u_i X_{2i} = 0$$
Answer: To minimize the sum of squared prediction mistakes
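$$\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i})^2$$
you need to take the following three derivatives with respect to $b_0$, $b_1$ and $b_2$. This results in
$$\frac{\partial}{\partial b_0} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i})^2 = -2 \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i})$$
$$\frac{\partial}{\partial b_1} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i})^2 = -2 \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i}) X_{1i}$$
$$\frac{\partial}{\partial b_2} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i})^2 = -2 \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i}) X_{2i}$$
The OLS estimators are those for which the derivatives are zero. Hence we get
$$-2 \sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}) = 0 \;\Rightarrow\; \sum_{i=1}^{n} \hat u_i = 0$$
$$-2 \sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}) X_{1i} = 0 \;\Rightarrow\; \sum_{i=1}^{n} \hat u_i X_{1i} = 0$$
$$-2 \sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}) X_{2i} = 0 \;\Rightarrow\; \sum_{i=1}^{n} \hat u_i X_{2i} = 0$$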
10) The probability limit of the OLS estimator in the case of omitted variables is given in your text by the following
formula:
$$\hat\beta_1 \xrightarrow{\;p\;} \beta_1 + \rho_{Xu}\,\frac{\sigma_u}{\sigma_X}$$
Give an intuitive explanation for two conditions under which the bias will be small.
Answer: The bias will be small if there is little correlation between the included variable and the error term. The
error term contains the omitted variable. If the omitted variable is correlated with the included variable,
then the error term is correlated with the included variable. Now consider the case where the correlation
between the included and omitted variable is low, resulting in a low correlation between the error term
and the included variable. In that case, changes in the omitted variable, which move Y, are rarely accompanied by changes in the
included variable, so there is little scope for making it appear as if the included variable had
changed Y.
The second condition is the size of the ratio of the two standard deviations. The formula suggests that if
the included variable varies substantially more than the error term, which contains the omitted variable,
then the inconsistency will be small. In that case, the relationship between the included variable and the
dependent variable does not get disturbed much by variations in the omitted variable.
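The two conditions can be read directly off the bias term of the probability limit. The values below are purely hypothetical and chosen only to show how each factor scales the inconsistency.

```latex
\begin{aligned}
\text{bias} &= \rho_{Xu}\,\frac{\sigma_u}{\sigma_X} \\
\rho_{Xu} = 0.5,\ \ \sigma_u/\sigma_X = 2 &: \quad 0.5 \times 2 = 1.0 \\
\rho_{Xu} = 0.05,\ \ \sigma_u/\sigma_X = 2 &: \quad 0.05 \times 2 = 0.1 \\
\rho_{Xu} = 0.5,\ \ \sigma_u/\sigma_X = 0.2 &: \quad 0.5 \times 0.2 = 0.1
\end{aligned}
```

Reducing either the correlation or the ratio of standard deviations by a factor of ten reduces the bias by the same factor.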
11) It is not hard, but tedious, to derive the OLS formulae for the slope coefficient in the multiple regression case
with two explanatory variables. The formula for the first regression slope is
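$$\hat\beta_1 = \frac{\sum_{i=1}^{n} y_i x_{1i} \sum_{i=1}^{n} x_{2i}^2 - \sum_{i=1}^{n} y_i x_{2i} \sum_{i=1}^{n} x_{1i} x_{2i}}{\sum_{i=1}^{n} x_{1i}^2 \sum_{i=1}^{n} x_{2i}^2 - \left(\sum_{i=1}^{n} x_{1i} x_{2i}\right)^2}$$
(small letters refer to deviations from means as in $z_i = Z_i - \bar Z$).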
Show that this formula reduces to the slope coefficient for the linear regression model with one regressor if the
sample correlation between the two explanatory variables is zero. Given this result, what can you say about the
effect of omitting the second explanatory variable from the regression?
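Answer: Divide each of the four terms by $\sum x_{1i}^2 \sum x_{2i}^2$ (all sums run from i = 1 to n) to get
$$\hat\beta_1 = \frac{\dfrac{\sum y_i x_{1i}}{\sum x_{1i}^2} - \dfrac{\sum y_i x_{2i}}{\sum x_{2i}^2}\,\dfrac{\sum x_{1i} x_{2i}}{\sum x_{1i}^2}}{1 - \dfrac{\sum x_{1i} x_{2i}}{\sum x_{2i}^2}\,\dfrac{\sum x_{1i} x_{2i}}{\sum x_{1i}^2}} = \frac{\hat\beta_{yx_1} - \hat\beta_{yx_2}\hat\beta_{x_2 x_1}}{1 - \hat\beta_{x_1 x_2}\hat\beta_{x_2 x_1}},$$
where $\hat\beta_{ab}$ denotes the slope from the simple regression of $a$ on $b$. Now if the sample correlation between the two
explanatory variables is zero, then $\sum x_{1i} x_{2i} = 0$, so that $\hat\beta_{x_1 x_2} = \hat\beta_{x_2 x_1} = 0$, and
$$\hat\beta_1 = \frac{\hat\beta_{yx_1}}{1} = \hat\beta_{yx_1}.$$
Omitting the second explanatory variable from the regression will
have no effect on the coefficient which indicates the effect of a change in the included variable on the
dependent variable. However, you also do not observe the effect that a change in the omitted variable
has on the dependent variable.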
12) (Requires Statistics background beyond Chapters 2 and 3) One way to establish whether or not there is
independence between two or more variables is to perform a χ² test of independence between two variables.
Explain why multiple regression analysis is a preferable tool to seek a relationship between variables.
Answer: The χ² test can only establish whether or not a relationship between variables exists, but it cannot tell
the researcher anything about the effect of a unit change in X on Y. If the researcher is interested in the
quantitative information, then she must use a multiple regression framework. The textbook example on
student performance can be used here for an explanation.
13) In the multiple regression with two explanatory variables, show that the TSS can still be decomposed into the
ESS and the SSR.
Answer: The proof proceeds along the same line as in the case of a single explanatory variable. The sample
regression function is given by
$$Y_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 X_{2i} + \hat u_i.$$
The average is therefore
$$\bar Y = \hat\beta_0 + \hat\beta_1 \bar X_1 + \hat\beta_2 \bar X_2$$
since the first order condition has $\sum_{i=1}^{n} \hat u_i = 0$. Subtracting the second equation from the first and letting
small letters indicate deviations from mean, results in
$$y_i = \hat\beta_1 x_{1i} + \hat\beta_2 x_{2i} + \hat u_i \quad \text{or} \quad y_i = \hat y_i + \hat u_i.$$
Squaring both sides and summing gives you
$$\sum_{i=1}^{n} y_i^2 = \sum_{i=1}^{n} \hat y_i^2 + \sum_{i=1}^{n} \hat u_i^2 + 2 \sum_{i=1}^{n} \hat y_i \hat u_i.$$
The last term is zero since it involves terms of the type
$$\sum_{i=1}^{n} \hat u_i x_i = \sum_{i=1}^{n} \hat u_i X_i - \bar X \sum_{i=1}^{n} \hat u_i,$$
all of which are zero given the first order conditions. We therefore arrive at
$$\sum_{i=1}^{n} y_i^2 = \sum_{i=1}^{n} \hat y_i^2 + \sum_{i=1}^{n} \hat u_i^2$$
or TSS = ESS + SSR. This proof generalizes easily for k explanatory
variables.
14) The OLS formulas for the slope coefficients in the multiple regression model become increasingly
complicated, using the “sums” expressions, as you add more regressors. For example, in the regression with a
single explanatory variable, the formula is
$$\frac{\sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^{n} (X_i - \bar X)^2}$$
whereas the formula for the slope of the first explanatory variable is
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} y_i x_{1i} \sum_{i=1}^{n} x_{2i}^2 - \sum_{i=1}^{n} y_i x_{2i} \sum_{i=1}^{n} x_{1i} x_{2i}}{\sum_{i=1}^{n} x_{1i}^2 \sum_{i=1}^{n} x_{2i}^2 - \left(\sum_{i=1}^{n} x_{1i} x_{2i}\right)^2}$$
(small letters refer to deviations from means as in $z_i = Z_i - \bar Z$)
in the case of two explanatory variables. Give an intuitive explanation as to why this is the case.
Answer: The additional terms take into account that there is a relationship between the regressors. As a matter of
fact, the more complicated formula reduces to the simpler formula if the correlation between the
included variables is zero. In a controlled laboratory experiment, only one variable is changed at a time,
holding all others constant. This is impossible to do with economic data, so the additional terms are
added to control for the change in the other variables.
15) (Requires Calculus) For the case of the multiple regression problem with two explanatory variables, derive the
OLS estimator for the intercept and the two slopes.
Answer: To minimize the sum of squared prediction mistakes
$$\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i})^2$$
you need to take the following three derivatives with respect to $b_0$, $b_1$ and $b_2$. This results in (all sums run from i = 1 to n)
$$\frac{\partial}{\partial b_0} \sum (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i})^2 = -2 \sum (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i})$$
$$\frac{\partial}{\partial b_1} \sum (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i})^2 = -2 \sum (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i}) X_{1i}$$
$$\frac{\partial}{\partial b_2} \sum (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i})^2 = -2 \sum (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i}) X_{2i}$$
The OLS estimators are those for which the derivatives are zero. Hence we get
$$-2 \sum (Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}) = 0; \qquad \hat\beta_0 = \bar Y - \hat\beta_1 \bar X_1 - \hat\beta_2 \bar X_2$$
$$-2 \sum (Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}) X_{1i} = 0; \qquad \sum Y_i X_{1i} = \hat\beta_0 n \bar X_1 + \hat\beta_1 \sum X_{1i}^2 + \hat\beta_2 \sum X_{2i} X_{1i}$$
$$-2 \sum (Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}) X_{2i} = 0; \qquad \sum Y_i X_{2i} = \hat\beta_0 n \bar X_2 + \hat\beta_2 \sum X_{2i}^2 + \hat\beta_1 \sum X_{2i} X_{1i}$$
After substituting the result for $\hat\beta_0$ into the last two equations, these have only two unknowns remaining,
namely $\hat\beta_1$ and $\hat\beta_2$. Letting small letters indicate deviations from mean, you get
$$\sum y_i x_{1i} = \hat\beta_1 \sum x_{1i}^2 + \hat\beta_2 \sum x_{2i} x_{1i}$$
$$\sum y_i x_{2i} = \hat\beta_1 \sum x_{1i} x_{2i} + \hat\beta_2 \sum x_{2i}^2$$
There are various methods to solve for $\hat\beta_1$ and $\hat\beta_2$. Here we isolate $\hat\beta_2$ in the second equation and
substitute into the first equation.
$$\sum y_i x_{1i} = \hat\beta_1 \sum x_{1i}^2 + \frac{1}{\sum x_{2i}^2}\left(\sum y_i x_{2i} - \hat\beta_1 \sum x_{2i} x_{1i}\right) \sum x_{2i} x_{1i}$$
$$\hat\beta_1 = \frac{\sum y_i x_{1i} \sum x_{2i}^2 - \sum y_i x_{2i} \sum x_{1i} x_{2i}}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i} x_{2i}\right)^2}.$$
Similarly you can derive
$$\hat\beta_2 = \frac{\sum y_i x_{2i} \sum x_{1i}^2 - \sum y_i x_{1i} \sum x_{1i} x_{2i}}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i} x_{2i}\right)^2}.$$
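As a numerical check on the derivation, the sketch below plugs a small made-up sample into the closed-form expressions for the intercept and the two slopes, then confirms that the three first order conditions hold and that TSS = ESS + SSR. The data and the helper `dot` are hypothetical and serve only as an illustration.

```python
from statistics import mean


def dot(a, b):
    """Sum of element-wise products."""
    return sum(ai * bi for ai, bi in zip(a, b))


# Hypothetical sample.
y = [3.0, 5.0, 4.0, 8.0, 9.0, 12.0]
x1 = [1.0, 2.0, 2.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]

y_bar, x1_bar, x2_bar = mean(y), mean(x1), mean(x2)
yd = [v - y_bar for v in y]
x1d = [v - x1_bar for v in x1]
x2d = [v - x2_bar for v in x2]

# Sums of squares and cross products in deviations from means.
s11, s22, s12 = dot(x1d, x1d), dot(x2d, x2d), dot(x1d, x2d)
s1y, s2y = dot(x1d, yd), dot(x2d, yd)

# Closed-form OLS estimators for the two-regressor model.
den = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s2y * s12) / den
b2 = (s2y * s11 - s1y * s12) / den
b0 = y_bar - b1 * x1_bar - b2 * x2_bar

fitted = [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]
u_hat = [yi - fi for yi, fi in zip(y, fitted)]

# First order conditions: residuals sum to zero and are orthogonal to X1 and X2.
foc = (sum(u_hat), dot(u_hat, x1), dot(u_hat, x2))

tss = dot(yd, yd)
ess = sum((fi - y_bar) ** 2 for fi in fitted)
ssr = dot(u_hat, u_hat)

print(b0, b1, b2)
print(foc)
print(tss, ess + ssr)
```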
16) (Requires Calculus) For the simple linear regression model of Chapter 4, $Y_i = \beta_0 + \beta_1 X_i + u_i$, the OLS estimator
for the intercept was $\hat\beta_0 = \bar Y - \hat\beta_1 \bar X$, and
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} X_i Y_i - n \bar X \bar Y}{\sum_{i=1}^{n} X_i^2 - n \bar X^2}.$$
Intuitively, the OLS estimators for the regression
model $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$ might be
$\hat\beta_0 = \bar Y - \hat\beta_1 \bar X_1 - \hat\beta_2 \bar X_2$,
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} X_{1i} Y_i - n \bar X_1 \bar Y}{\sum_{i=1}^{n} X_{1i}^2 - n \bar X_1^2} \quad \text{and} \quad \hat\beta_2 = \frac{\sum_{i=1}^{n} X_{2i} Y_i - n \bar X_2 \bar Y}{\sum_{i=1}^{n} X_{2i}^2 - n \bar X_2^2}.$$
By minimizing the prediction mistakes of the regression model with two explanatory
variables, show that this cannot be the case.
Answer: To minimize the sum of squared prediction mistakes
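$$\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i})^2$$
you need to take the following three derivatives with respect to $b_0$, $b_1$ and $b_2$. This results in (all sums run from i = 1 to n)
$$\frac{\partial}{\partial b_0} \sum (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i})^2 = -2 \sum (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i})$$
$$\frac{\partial}{\partial b_1} \sum (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i})^2 = -2 \sum (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i}) X_{1i}$$
$$\frac{\partial}{\partial b_2} \sum (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i})^2 = -2 \sum (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i}) X_{2i}$$
The OLS estimators are those for which the derivatives are zero. Hence we get
$$-2 \sum (Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}) = 0; \qquad \hat\beta_0 = \bar Y - \hat\beta_1 \bar X_1 - \hat\beta_2 \bar X_2$$
$$-2 \sum (Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}) X_{1i} = 0; \qquad \sum Y_i X_{1i} = \hat\beta_0 n \bar X_1 + \hat\beta_1 \sum X_{1i}^2 + \hat\beta_2 \sum X_{2i} X_{1i}$$
$$-2 \sum (Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}) X_{2i} = 0; \qquad \sum Y_i X_{2i} = \hat\beta_0 n \bar X_2 + \hat\beta_2 \sum X_{2i}^2 + \hat\beta_1 \sum X_{2i} X_{1i}$$
It is clear that the first of these three expressions results in
$\hat\beta_0 = \bar Y - \hat\beta_1 \bar X_1 - \hat\beta_2 \bar X_2$. However, the second (third) expression involves terms in $X_{2i}$ ($X_{1i}$), hence the
formula cannot be simplified to
$$\hat\beta_1 = \frac{\sum X_{1i} Y_i - n \bar X_1 \bar Y}{\sum X_{1i}^2 - n \bar X_1^2} \qquad \left(\hat\beta_2 = \frac{\sum X_{2i} Y_i - n \bar X_2 \bar Y}{\sum X_{2i}^2 - n \bar X_2^2}\right)$$
unless special conditions hold (such as the two regressors having zero sample covariance, $\sum (X_{1i} - \bar X_1)(X_{2i} - \bar X_2) = 0$).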
17) Your textbook extends the simple regression analysis of Chapters 4 and 5 by adding an additional explanatory
variable, the percent of English learners in school districts (PctEL). The results are as follows:
TestScore = 698.9 – 2.28 × STR
and
TestScore = 698.0 – 1.10 × STR – 0.65 × PctEL
Explain why you think the coefficient on the student-teacher ratio has changed so dramatically (been more
than halved).
Answer: This is a good example of omitted variable bias. The previously excluded variable, the percent of English
learners, not only seems to matter and to be economically important in the determination of test scores,
but is also correlated with the student-teacher ratio (recall that the correlation coefficient between the
student-teacher ratio and the percent of English learners is positive, at almost 0.20). As a
result, there will be omitted variable bias if you regress the test scores on the student-teacher ratios only.
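The size of the change can be tied to the standard omitted variable algebra: in a given sample, the slope from the short regression equals the slope from the long regression plus the coefficient on the omitted variable times $\hat\delta$, the slope from a regression of PctEL on STR (and a constant). The value of $\hat\delta$ below is backed out from the rounded coefficients reported in the question, so it is only approximate and is not a figure taken from the textbook.

```latex
\hat\beta_{STR}^{\,short} = \hat\beta_{STR}^{\,long} + \hat\beta_{PctEL}^{\,long}\,\hat\delta
\quad\Longrightarrow\quad
\hat\delta = \frac{-2.28 - (-1.10)}{-0.65} \approx 1.82
```

Read this way, districts with one more student per teacher have, on average, roughly 1.8 percentage points more English learners, and the short regression attributes the associated lower test scores to class size.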
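18) (Requires some Calculus) Consider the sample regression function
$\hat Y_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 X_{2i}$. Take the total derivative. Next show that the partial derivative
$\dfrac{\partial \hat Y_i}{\partial X_{1i}}$ is obtained by
holding $X_{2i}$ constant, or controlling for $X_{2i}$.
Answer: $\Delta$ is a linear operator. Hence
$$\Delta \hat Y_i = \Delta(\hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 X_{2i}) = \Delta\hat\beta_0 + \hat\beta_1 \Delta X_{1i} + \hat\beta_2 \Delta X_{2i} = \hat\beta_1 \Delta X_{1i} + \hat\beta_2 \Delta X_{2i}.$$
Dividing through by $\Delta X_{1i}$ then results in
$$\frac{\Delta \hat Y_i}{\Delta X_{1i}} = \hat\beta_1 + \hat\beta_2 \frac{\Delta X_{2i}}{\Delta X_{1i}},$$
which only equals $\hat\beta_1$ if $\dfrac{\Delta X_{2i}}{\Delta X_{1i}} = 0$, i.e., if $X_{2i}$ remains constant
following a change in $X_{1i}$.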
19) (Requires Appendix material) Consider the following regression model with two
explanatory variables: $\hat Y_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 X_{2i}$. It is easy but tedious to show that SE($\hat\beta_1$) follows from the
following formula:
$$\sigma^2_{\hat\beta_1} = \frac{1}{n}\left(\frac{1}{1 - \rho^2_{X_1, X_2}}\right)\frac{\sigma_u^2}{\sigma_{X_1}^2}.$$
Sketch how SE($\hat\beta_1$) increases with the correlation between $X_{1i}$
and $X_{2i}$.
Answer: The answer should look something like this:
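The shape of the relationship can also be tabulated from the variance formula in the question. The sketch below assumes hypothetical values n = 100 and unit standard deviations for the error term and for X1; only the pattern across correlations matters, not the particular numbers, and the function name `se_beta1` is likewise an illustration.

```python
import math


def se_beta1(rho, n, sigma_u, sigma_x1):
    """Standard error implied by the large-sample variance formula above."""
    variance = (1.0 / n) * (1.0 / (1.0 - rho ** 2)) * sigma_u ** 2 / sigma_x1 ** 2
    return math.sqrt(variance)


# Hypothetical sample size and standard deviations.
n, sigma_u, sigma_x1 = 100, 1.0, 1.0
rhos = [0.0, 0.3, 0.6, 0.9, 0.99]
ses = [se_beta1(r, n, sigma_u, sigma_x1) for r in rhos]

for r, se in zip(rhos, ses):
    print(f"rho = {r:4.2f}   SE = {se:.4f}")
```

The standard error is smallest when the regressors are uncorrelated, rises slowly for moderate correlations, and grows without bound as the correlation approaches one in absolute value.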
20) For this question, use the California Testscore Data Set and your regression package (a spreadsheet program if
necessary). First perform a multiple regression of testscores on a constant, the student-teacher ratio, and the
percent of English learners. Record the coefficients. Next, do the following three step procedure instead: first,
regress the testscore on a constant and the percent of English learners. Calculate the residuals and store them
under the name resYX2. Second, regress the student-teacher ratio on a constant and the percent of English
learners. Calculate the residuals from this regression and store these under the name resX1X2. Finally regress
resYX2 on resX1X2 (and a constant, if you wish). Explain intuitively why the simple regression coefficient in the
last regression is identical to the regression coefficient on the student-teacher ratio in the multiple regression.
Answer: This three step procedure actually explains how OLS controls for the influence of other variables. In the
first step, OLS removes the linear influence of the percent of English learners from the dependent
variable. The residuals from that regression represent the “left-over” of the testscores that the percent of
English learners could not explain (“purified testscores;” think of a filter removing some of the
elements). The same explanation holds for the second regression: the student-teacher ratio is purified (if
the percent of English learners actually have an influence on student-teacher ratios). In the final step,
you regress the two “purified” variables on each other.
21) Assume that you have collected cross-sectional data for average hourly earnings (ahe), the number of years of
education (educ) and gender of the individuals (you have coded individuals as “1” if they are female and “0” if
they are male; the name of the resulting variable is DFemme).
Having faced recent tuition hikes at your university, you are interested in the return to education, that is, how
much more you will earn for an additional year of being at your institution. To investigate this question,
you run the following regression:
ahe = -4.58 + 1.71×educ
N = 14,925, R2 = 0.18, SER = 9.30
a.
Interpret the regression output.
b.
Being female, you wonder how these results are affected if you enter a binary variable (DFemme),
which takes on the value of “1” if the individual is a female, and is “0” for males. The result is as
follows:
ahe = -3.44 - 4.09×DFemme + 1.76×educ
N = 14,925, R2 = 0.22, SER = 9.08
Does it make sense that the standard error of the regression decreased while the regression R2
increased?
c.
Do you think that the regression you estimated first suffered from omitted variable bias?
Answer: a. Each additional year of education is associated with, on average, $1.71 higher hourly earnings. It is best not to interpret
the intercept, since there are no (or extremely few) observations at the origin.
b. The regression R2 cannot decrease if you add an explanatory variable. If the additional variable does
not contribute anything to the fit, then this measure will remain the same. However, in practice, this
does not happen. The standard error of the regression is computed from the SSR, which will almost always
decrease with the addition of an explanatory variable (with N = 14,925 the degrees of freedom adjustment is
negligible). As a result, the observed pattern in the two statistics is to be expected.
c. There are two conditions for omitted variable bias to be present. First, DFemme must be a determinant
of ahe; and second, it must be correlated with educ. Given that you have not learned how to test for
statistical significance in the multiple regression model, the first question is hard to determine at this
point. However, you might argue that the coefficient seems large and that you have read elsewhere that
there is evidence of females earning less using this type of equation. With regard to the second question,
you could argue that the coefficient on educ has changed somewhat, although the increase does not seem
to be large ($0.05). For there to be a correlation between education and the binary female variable, you
would have to argue that males and females receive different amounts of education, on average. Either way, the omitted variable
bias in the first equation does not appear to be large.
22) You have collected data on individuals and their attributes. Consequently you have generated several binary
variables, which take on a value of “1” if the individual has that characteristic and are “0” otherwise. One
example is the binary variable DMarr which is “1” for married individuals and “0” for non-married individuals.
If you run the following regression:
$ahe_i = \beta_0 + \beta_1 \times educ_i + \beta_2 \times DMarr_i + u_i$
a.
What is the interpretation for $\beta_2$?
b.
You are interested in directly observing the effect that being non-married (“single”) has on
earnings, controlling for years of education. Instead of recoding all observations such that they are
“1” for a not married individual and “0” for a married person, how can you generate such a
variable (DSingle) through a simple command in your regression program?
Answer: a. The coefficient will tell you by how much, on average, a married person’s average hourly earnings
differ from those of a non-married person, holding years of education constant.
b. gen DSingle = 1 - DMarr (Stata); genr DSingle = 1 - DMarr (EViews)
23) Consider the following earnings function:
$ahe_i = \beta_0 + \beta_1 \times DFemme_i + \beta_2 \times educ_i + \dots + u_i$
versus the alternative specification
$ahe_i = \gamma_0 \times DMale_i + \gamma_1 \times DFemme_i + \gamma_2 \times educ_i + \dots + u_i$
where ahe is average hourly earnings, DFemme is a binary variable which takes on the value of “1” if
the individual is a female and is “0” otherwise, educ measures the years of education, and DMale is a
binary variable which takes on the value of “1” if the individual is a male and is “0” otherwise. There
may be additional explanatory variables in the equation.
a.
How do the $\beta$s and $\gamma$s compare? Putting it differently, having estimated the coefficients in the first
equation, can you derive the coefficients in the second equation without re-estimating the
regression?
b.
Will the goodness of fit measures, such as the regression R2, differ between the two equations?
c.
What is the reason why economists typically prefer the first specification over the second?
Answer: a. $\gamma_0 = \beta_0$; $\gamma_1 = \beta_0 + \beta_1$; $\gamma_2 = \beta_2$
b. The regression R2 will be identical, as will be the standard error of the regression.
c. The first equation lets you read the difference between the earnings of the two sub-groups directly from
the coefficient on DFemme. Economists are often interested in testing for such differences, rather than in
finding the average level of earnings.
24) You would like to find the effect of gender and marital status on earnings. As a result, you consider running the
following regression:
$ahe_i = \beta_0 + \beta_1 \times DFemme_i + \beta_2 \times DMarr_i + \beta_3 \times DSingle_i + \beta_4 \times educ_i + \dots + u_i$
where ahe is average hourly earnings, DFemme is a binary variable which takes on the value of “1” if
the individual is a female and is “0” otherwise, DMarr is a binary variable which takes on the value of
“1” if the individual is married and is “0” otherwise, DSingle takes on the value of “1” if the individual
is not married and is “0” otherwise. The regression program which you are using either returns a
message that the equation cannot be estimated or drops one of the coefficients. Why do you think that
is?
Answer: There is perfect multicollinearity here (“dummy variable trap”): DMarr + DSingle equals 1 for every
individual, which is identical to the constant regressor. You need to drop either DMarr or DSingle.
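The dummy variable trap can be seen in a small numerical sketch. The eight observations below are hypothetical; the point is only that the matrix X'X built from a constant, DMarr, and DSingle = 1 - DMarr has a zero determinant, so the OLS normal equations have no unique solution, while dropping DSingle removes the problem.

```python
def gram(columns):
    """Return X'X for a list of regressor columns."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns]
            for ci in columns]

def det2(m):
    """Determinant of a 2x2 matrix."""
    (a, b), (c, d) = m
    return a * d - b * c

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

d_marr = [1, 0, 1, 1, 0, 0, 1, 0]        # hypothetical sample of 8 individuals
d_single = [1 - m for m in d_marr]       # mirrors: gen DSingle = 1 - DMarr
const = [1] * len(d_marr)

det_trap = det3(gram([const, d_marr, d_single]))  # constant + both dummies
det_ok = det2(gram([const, d_marr]))              # constant + one dummy

print(det_trap, det_ok)
```

Because the entries are integers, the zero determinant is exact here rather than a rounding artifact.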
Chapter 7 Hypothesis Tests and Confidence Intervals in Multiple Regression
7.1 Multiple Choice
1) The confidence interval for a single coefficient in a multiple regression
A) makes little sense because the population parameter is unknown.
B) should not be computed because there are other coefficients present in the model.
C) contains information from a large number of hypothesis tests.
D) should only be calculated if the regression R2 is identical to the adjusted R2 .
Answer: C
2) The following linear hypothesis can be tested using the F-test with the exception of
A) $\beta_2 = 1$ and $\beta_3 = \beta_4 / \beta_5$.
B) $\beta_2 = 0$.
C) $\beta_1 + \beta_2 = 1$ and $\beta_3 = -2\beta_4$.
D) $\beta_0 = \beta_1$ and $\beta_1 = 0$.
Answer: A
3) The formula for the standard error of the regression coefficient, when moving from one explanatory variable to
two explanatory variables,
A) stays the same.
B) changes, unless the second explanatory variable is a binary variable.
C) changes.
D) changes, unless you test for a null hypothesis that the additional regression coefficient is zero.
Answer: C
4) All of the following are examples of joint hypotheses on multiple regression coefficients, with the exception of
A) $H_0: \beta_1 + \beta_2 = 1$
B) $H_0: \beta_3 / \beta_2 = 1$ and $\beta_4 = 0$
C) $H_0: \beta_2 = 0$ and $\beta_3 = 0$
D) $H_0: \beta_1 = -\beta_2$ and $\beta_1 + \beta_2 = 1$
Answer: A
5) When testing a joint hypothesis, you should
A) use t-statistics for each hypothesis and reject the null hypothesis if all of the restrictions fail.
B) use the F-statistic and reject all the hypotheses if the statistic exceeds the critical value.
C) use t-statistics for each hypothesis and reject the null hypothesis once the statistic exceeds the critical
value for a single hypothesis.
D) use the F-statistic and reject at least one of the hypotheses if the statistic exceeds the critical value.
Answer: D
6) The overall regression F-statistic tests the null hypothesis that
A) all slope coefficients are zero.
B) all slope coefficients and the intercept are zero.
C) the intercept in the regression and at least one, but not all, of the slope coefficients is zero.
D) the slope coefficient of the variable of interest is zero, but that the other slope coefficients are not.
Answer: A
7) For a single restriction (q = 1), the F-statistic
A) is the square root of the t-statistic.
B) has a critical value of 1.96.
C) will be negative.
D) is the square of the t-statistic.
Answer: D
8) The homoskedasticity-only F-statistic is given by the following formula
A) $F = \dfrac{(SSR_{restricted} - SSR_{unrestricted})/q}{SSR_{unrestricted}/(n - k_{unrestricted} - 1)}$
B) $F = \dfrac{(SSR_{restricted} - SSR_{unrestricted})/q}{SSR_{restricted}/(n - k_{unrestricted} - 1)}$
C) $F = \dfrac{(SSR_{unrestricted} - SSR_{restricted})/q}{SSR_{unrestricted}/(n - k_{unrestricted} - 1)}$
D) $F = \dfrac{(SSR_{restricted} - SSR_{unrestricted})/(q - 1)}{SSR_{unrestricted}/(n - k_{unrestricted})}$
Answer: A
9) All of the following are correct formulae for the homoskedasticity-only F-statistic, with the exception of
A) $F = \dfrac{(SSR_{restricted} - SSR_{unrestricted})/q}{SSR_{unrestricted}/(n - k_{unrestricted} - 1)}$
B) $F = \dfrac{(SSR_{unrestricted} - SSR_{restricted})/q}{SSR_{restricted}/(n - k_{restricted} - 1)}$
C) $F = \dfrac{SSR_{restricted} - SSR_{unrestricted}}{SSR_{unrestricted}} \times \dfrac{n - k_{unrestricted} - 1}{q}$
D) $F = \left(\dfrac{SSR_{restricted}}{SSR_{unrestricted}} - 1\right) \times \dfrac{n - k_{unrestricted} - 1}{q}$
Answer: B
10) In the multiple regression model, the t-statistic for testing that the slope is significantly different from zero is
calculated
A) by dividing the estimate by its standard error.
B) from the square root of the F-statistic.
C) by multiplying the p-value by 1.96.
D) using the adjusted R2 and the confidence interval.
Answer: A
11) To test joint linear hypotheses in the multiple regression model, you need to
A) compare the sums of squared residuals from the restricted and unrestricted model.
B) use the heteroskedasticity-robust F-statistic.
C) use several t-statistics and perform tests using the standard normal distribution.
D) compare the adjusted R2 for the model which imposes the restrictions, and the unrestricted model.
Answer: B
12) The homoskedasticity-only F-statistic is given by the following formula
A) $F = \dfrac{(R^2_{unrestricted} - R^2_{restricted})/q}{(1 - R^2_{unrestricted})/(n - k_{unrestricted} - 1)}$
B) $F = \dfrac{(1 - R^2_{unrestricted})/q}{R^2_{unrestricted}/(n - k_{unrestricted} - 1)}$
C) $F = \dfrac{(R^2_{unrestricted} - R^2_{restricted})/q}{(1 - R^2_{unrestricted})/(n - k_{restricted} - 1)}$
D) $F = \dfrac{(R^2_{unrestricted} - R^2_{unrestricted})/q}{(1 - R^2_{unrestricted})/(n - k_{restricted} - 1)}$
Answer: A
13) Let R2 unrestricted and R2 restricted be 0.4366 and 0.4149 respectively. The difference between the unrestricted
and the restricted model is that you have imposed two restrictions. There are 420 observations. The F-statistic
in this case is
A) 4.61
B) 8.01
C) 10.34
D) 7.71
Answer: B
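The arithmetic behind this answer can be sketched as follows. The question does not state the number of regressors in the unrestricted model; the value k = 3 used below is an assumption, chosen because it reproduces the listed answer.

```python
def f_from_r2(r2_unrestricted, r2_restricted, q, n, k_unrestricted):
    """Homoskedasticity-only F-statistic written in terms of the two R-squared values."""
    numerator = (r2_unrestricted - r2_restricted) / q
    denominator = (1 - r2_unrestricted) / (n - k_unrestricted - 1)
    return numerator / denominator

# k_unrestricted = 3 is an assumption; it is not given in the question
f_stat = f_from_r2(0.4366, 0.4149, q=2, n=420, k_unrestricted=3)
print(round(f_stat, 2))
```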
14) If you wanted to test, using a 5% significance level, whether or not a specific slope coefficient is equal to one,
then you should
A) subtract 1 from the estimated coefficient, divide the difference by the standard error, and check if the
resulting ratio is larger than 1.96.
B) add and subtract 1.96 from the slope and check if that interval includes 1.
C) see if the slope coefficient is between 0.95 and 1.05.
D) check if the adjusted R2 is close to 1.
Answer: A
15) If the absolute value of your calculated t-statistic exceeds the critical value from the standard normal
distribution you can
A) safely assume that your regression results are significant.
B) reject the null hypothesis.
C) reject the assumption that the error terms are homoskedastic.
D) conclude that most of the actual values are very close to the regression line.
Answer: B
16) If you reject a joint null hypothesis using the F-test in a multiple hypothesis setting, then
A) a series of t-tests may or may not give you the same conclusion.
B) the regression is always significant.
C) all of the hypotheses are always simultaneously rejected.
D) the F-statistic must be negative.
Answer: A
17) When your multiple regression function includes a single omitted variable regressor, then
A) use a two-sided alternative hypothesis to check the influence of all included variables.
B) the estimator for your included regressors will be biased if at least one of the included variables is
correlated with the omitted variable.
C) the estimator for your included regressors will always be biased.
D) lower the critical value to 1.645 from 1.96 in a two-sided alternative hypothesis to test the significance of
the coefficients of the included variables.
Answer: B
18) A 95% confidence set for two or more coefficients is a set that contains
A) the sample values of these coefficients in 95% of randomly drawn samples.
B) integer values only.
C) the same values as the 95% confidence intervals constructed for the coefficients.
D) the population values of these coefficients in 95% of randomly drawn samples.
Answer: D
19) When there are two coefficients, the resulting confidence sets are
A) rectangles.
B) ellipses.
C) squares.
D) trapezoids.
Answer: B
20) When testing the null hypothesis that two regression slopes are zero simultaneously, then you cannot reject the
null hypothesis at the 5% level, if the ellipse contains the point
A) (-1.96, 1.96).
B) (0, 1.96) .
C) (0,0).
D) $(1.96^2, 1.96^2)$.
Answer: C
21) The OLS estimators of the coefficients in multiple regression will have omitted variable bias
A) only if an omitted determinant of Yi is a continuous variable.
B) if an omitted variable is correlated with at least one of the regressors, even though it is not a determinant
of the dependent variable.
C) only if the omitted variable is not normally distributed.
D) if an omitted determinant of Yi is correlated with at least one of the regressors.
Answer: D
22) At a mathematical level, if the two conditions for omitted variable bias are satisfied, then
A) $E(u_i \mid X_{1i}, X_{2i}, \dots, X_{ki}) \neq 0$.
B) there is perfect multicollinearity.
C) large outliers are likely: X1i, X2i,..., Xki and Yi have infinite fourth moments.
D) (X1i, X2i,..., Xki,Yi), i = 1,..., n are not i.i.d. draws from their joint distribution.
Answer: A
23) All of the following are true, with the exception of one condition:
A) a high $R^2$ or $\bar{R}^2$ does not mean that the regressors are a true cause of the dependent variable.
B) a high $R^2$ or $\bar{R}^2$ does not mean that there is no omitted variable bias.
C) a high $R^2$ or $\bar{R}^2$ always means that an added variable is statistically significant.
D) a high $R^2$ or $\bar{R}^2$ does not necessarily mean that you have the most appropriate set of regressors.
Answer: C
24) The general answer to the question of choosing the scale of the variables is
A) dependent on your whim.
B) to make the regression results easy to read and to interpret.
C) to ensure that the regression coefficients always lie between -1 and 1.
D) irrelevant because regardless of the scale of the variable, the regression coefficient is unaffected.
Answer: B
25) If the estimates of the coefficients of interest change substantially across specifications,
A) then this can be expected from sample variation.
B) then you should change the scale of the variables to make the changes appear to be smaller.
C) then this often provides evidence that the original specification had omitted variable bias.
D) then choose the specification for which your coefficient of interest is most significant.
Answer: C
26) You have estimated the relationship between testscores and the student-teacher ratio under the assumption of
homoskedasticity of the error terms. The regression output is as follows: TestScore = 698.9 - 2.28×STR, and the
standard error on the slope is 0.48. The homoskedasticity-only “overall” regression F-statistic for the
hypothesis that the Regression R2 is zero is approximately
A) 0.96
B) 1.96
C) 22.56
D) 4.75
Answer: C
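With a single regressor, the overall F-statistic tests one restriction, so it equals the square of the t-statistic on the slope. A short check of the arithmetic, using only the numbers given in the question:

```python
slope = -2.28      # estimated coefficient on STR
std_error = 0.48   # homoskedasticity-only standard error of the slope

t_stat = slope / std_error   # about -4.75
f_stat = t_stat ** 2         # with q = 1, F is the square of t

print(round(t_stat, 2), round(f_stat, 2))
```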
27) Consider a regression with two variables, in which X1i is the variable of interest and X2i is the control variable.
Conditional mean independence requires
A) E(ui|X1i, X2i) = E(ui|X2i)
B) E(ui|X1i, X2i) = E(ui|X1i)
C) E(ui|X1i) = E(ui|X2i)
D) E(ui) = E(ui|X2i)
Answer: A
28) The homoskedasticity-only F-statistic and the heteroskedasticity-robust F-statistic typically are
A) the same
B) different
C) related by a linear function
D) a multiple of each other (the heteroskedasticity-robust F-statistic is 1.96 times the homoskedasticity-only
F-statistic)
Answer: B
29) Consider the following regression output where the dependent variable is testscores and the two explanatory
variables are the student-teacher ratio and the percent of English learners:
TestScore = 698.9 - 1.10×STR - 0.650×PctEL. You are told that the t-statistic on the student-teacher ratio
coefficient is 2.56. The standard error therefore is approximately
A) 0.25
B) 1.96
C) 0.650
D) 0.43
Answer: D
30) The critical value of $F_{4,\infty}$ at the 5% significance level is
A) 3.84
B) 2.37
C) 1.94
D) Cannot be calculated because in practice you will not have infinite number of observations
Answer: B
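The large-sample critical values quoted in this chapter can be reproduced without a table, since $F_{q,\infty}$ is a chi-squared variable with q degrees of freedom divided by q. The sketch below is one possible way to do this with the standard library only; it relies on the closed-form chi-squared CDF that exists for even q, so it covers q = 2 and q = 4 but not odd q. The function names are made up for the example.

```python
import math

def chi2_cdf_even(x, df):
    """CDF of a chi-squared variable; closed form valid for even df only."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even number of degrees of freedom")
    half = x / 2.0
    terms = sum(half ** j / math.factorial(j) for j in range(df // 2))
    return 1.0 - math.exp(-half) * terms

def f_inf_critical(q, alpha):
    """Critical value of F(q, infinity) = chi2(q)/q, found by bisection."""
    lo, hi = 0.0, 200.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf_even(mid, q) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0 / q

crit_4_5pct = f_inf_critical(4, 0.05)   # question 30
crit_2_5pct = f_inf_critical(2, 0.05)   # used for tests of two restrictions
crit_2_1pct = f_inf_critical(2, 0.01)

print(round(crit_4_5pct, 2), round(crit_2_5pct, 2), round(crit_2_1pct, 2))
```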
7.2 Essays and Longer Questions
1) The F-statistic with q = 2 restrictions when testing for the restrictions $\beta_1 = 0$ and $\beta_2 = 0$ is given by the
following formula:
$$F = \frac{1}{2}\left(\frac{t_1^2 + t_2^2 - 2\hat{\rho}_{t_1,t_2}\, t_1 t_2}{1 - \hat{\rho}_{t_1,t_2}^{\,2}}\right)$$
Discuss how this formula can be understood intuitively.
Answer: For the case when there is no correlation between the two explanatory variables, the formula reduces to
a simple average of the squared t-statistics, i.e., $F = \frac{1}{2}(t_1^2 + t_2^2)$. The $F_{2,\infty}$ distribution is the
distribution of a random variable with a chi-squared distribution with 2 degrees of freedom, divided by
2. Equivalently, the $F_{2,\infty}$ distribution is the distribution of the average of 2 squared standard normal
random variables. Because the t-statistics are uncorrelated by assumption, they are independent
standard normal random variables under the null hypothesis. If either $\beta_1$ or $\beta_2$ is nonzero (or both),
then either $t_1^2$ or $t_2^2$ or both will be large. This leads to a large F-statistic, and hence a rejection of the
null hypothesis.
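A small numerical sketch may help with the intuition. The t-statistics and correlations below are made-up illustration values, not results from any regression in this test bank.

```python
def f_two_restrictions(t1, t2, rho):
    """F-statistic for two restrictions from the two t-statistics and the
    estimated correlation rho between them (requires |rho| < 1)."""
    return 0.5 * (t1 ** 2 + t2 ** 2 - 2.0 * rho * t1 * t2) / (1.0 - rho ** 2)

f_uncorrelated = f_two_restrictions(2.0, 1.0, 0.0)  # average of 4 and 1
f_correlated = f_two_restrictions(2.0, 1.0, 0.5)    # correlation changes the result

print(f_uncorrelated, f_correlated)
```

With zero correlation the statistic is just the average of the squared t-statistics; a nonzero correlation adjusts for the fact that the two t-statistics partly carry the same information.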
2) The cost of attending your college has once again gone up. Although you have been told that education is
investment in human capital, which carries a return of roughly 10% a year, you (and your parents) are not
pleased. One of the administrators at your university/college does not make the situation better by telling you
that you pay more because the reputation of your institution is better than that of others. To investigate this
hypothesis, you collect data randomly for 100 national universities and liberal arts colleges from the 2000 -2001
U.S. News and World Report annual rankings. Next you perform the following regression
$\widehat{Cost}$ = 7,311.17 + 3,985.20 × Reputation – 0.20 × Size + 8,406.79 × Dpriv – 416.38 × Dlibart – 2,376.51 × Dreligion
(2,058.63) (664.58) (0.13) (2,154.85) (1,121.92) (1,007.86)
R2 = 0.72, SER = 3,773.35
where Cost is Tuition, Fees, Room and Board in dollars, Reputation is the index used in U.S. News and World
Report (based on a survey of university presidents and chief academic officers), which ranges from 1 (“marginal
”) to 5 (“distinguished”), Size is the number of undergraduate students, and Dpriv, Dlibart, and Dreligion are
binary variables indicating whether the institution is private, a liberal arts college, and has a religious
affiliation. The numbers in parentheses are heteroskedasticity-robust standard errors.
(a) Indicate whether or not the coefficients are significantly different from zero.
(b) What is the p-value for the null hypothesis that the coefficient on Size is equal to zero? Based on this, should
you eliminate the variable from the regression? Why or why not?
(c) You want to test simultaneously the hypotheses that the coefficients on Size and Dlibart are both zero. Your
regression package returns the F-statistic of 1.23. Can you reject the null hypothesis?
(d) Eliminating the Size and Dlibart variables from your regression, the estimated regression becomes
$\widehat{Cost}$ = 5,450.35 + 3,538.84 × Reputation + 10,935.70 × Dpriv – 2,783.31 × Dreligion;
(1,772.35) (590.49) (875.51) (1,180.57)
R2 = 0.72, SER = 3,792.68
Why do you think that the effect of attending a private institution has increased now?
(e) You give a final attempt to bring the effect of Size back into the equation by forcing the assumption of
homoskedasticity onto your estimation. The results are as follows:
$\widehat{Cost}$ = 7,311.17 + 3,985.20 × Reputation – 0.20 × Size + 8,406.79 × Dpriv – 416.38 × Dlibart – 2,376.51 × Dreligion
(1,985.17) (593.65) (0.07) (1,423.59) (1,096.49) (989.23)
R2 = 0.72, SER = 3,682.02
Calculate the t-statistic on the Size coefficient and perform the hypothesis test that its coefficient is zero. Is this
test reliable? Explain.
Answer: (a) The coefficient on liberal arts colleges is not significantly different from zero. All other coefficients
are statistically significant at conventional levels, with the exception of the size coefficient, which carries
a t-statistic of 1.54, and hence is not statistically significant at the 5% level (using a one -sided alternative
hypothesis).
(b) Using a one-sided alternative hypothesis, the p-value is 6.2 percent. Variables should not be
eliminated simply on grounds of a statistical test. The sign of the coefficient is as expected, and its
magnitude makes it important. It is best to leave the variable in the regression and let the reader decide
whether or not this is convincing evidence that the size of the university matters.
(c) The critical value for $F_{2,\infty}$ is 3.00 (5% level) and 4.61 (1% level). Hence you cannot reject the null
hypothesis in this case.
(d) Private institutions are smaller, on average, and some of these are liberal arts colleges. Both of these
variables had negative coefficients.
(e) Although the coefficient would be statistically significant in this case, the test is unreliable and should
not be used for statistical inference. There is no theoretical suggestion here that the errors might be
homoskedastic. Since the standard errors are quite different here, you should use the more reliable ones,
i.e., the heteroskedasticity-robust.
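The t-statistics and the p-value for the Size coefficient in parts (a), (b), and (e) can be reproduced with the large-sample normal approximation. This is a sketch using only the coefficient and standard errors printed in the question; the variable names are chosen for the example.

```python
from statistics import NormalDist

coef_size = -0.20
se_robust = 0.13   # heteroskedasticity-robust standard error
se_homosk = 0.07   # homoskedasticity-only standard error from part (e)

t_robust = coef_size / se_robust   # about -1.54
t_homosk = coef_size / se_homosk   # about -2.86

# One-sided p-value (alternative: coefficient < 0), standard normal approximation
p_one_sided = NormalDist().cdf(t_robust)
p_two_sided = 2 * p_one_sided

print(round(t_robust, 2), round(p_one_sided, 3), round(t_homosk, 2))
```

The homoskedasticity-only t-statistic exceeds 1.96 in absolute value, which is why the coefficient appears significant in part (e) even though the robust test does not reject at the 5% level.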
3) In the multiple regression model with two explanatory variables
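$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$$
the OLS estimators for the three parameters are as follows (small letters refer to deviations from means as in
$z_i = Z_i - \bar{Z}$):
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}_1 - \hat{\beta}_2 \bar{X}_2$$
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} y_i x_{1i} \sum_{i=1}^{n} x_{2i}^2 - \sum_{i=1}^{n} y_i x_{2i} \sum_{i=1}^{n} x_{1i} x_{2i}}{\sum_{i=1}^{n} x_{1i}^2 \sum_{i=1}^{n} x_{2i}^2 - \left(\sum_{i=1}^{n} x_{1i} x_{2i}\right)^2}$$
$$\hat{\beta}_2 = \frac{\sum_{i=1}^{n} y_i x_{2i} \sum_{i=1}^{n} x_{1i}^2 - \sum_{i=1}^{n} y_i x_{1i} \sum_{i=1}^{n} x_{1i} x_{2i}}{\sum_{i=1}^{n} x_{1i}^2 \sum_{i=1}^{n} x_{2i}^2 - \left(\sum_{i=1}^{n} x_{1i} x_{2i}\right)^2}$$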
You have collected data for 104 countries of the world from the Penn World Tables and want to estimate the
effect of the population growth rate (X1i) and the saving rate (X2i) (average investment share of GDP from
1980 to 1990) on GDP per worker (relative to the U.S.) in 1990. The various sums needed to calculate the OLS
estimates are given below:
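$\sum_{i=1}^{n} Y_i = 33.33$; $\sum_{i=1}^{n} X_{1i} = 2.025$; $\sum_{i=1}^{n} X_{2i} = 17.313$
$\sum_{i=1}^{n} y_i^2 = 8.3103$; $\sum_{i=1}^{n} x_{1i}^2 = 0.0122$; $\sum_{i=1}^{n} x_{2i}^2 = 0.6422$
$\sum_{i=1}^{n} y_i x_{1i} = -0.2304$; $\sum_{i=1}^{n} y_i x_{2i} = 1.5676$; $\sum_{i=1}^{n} x_{1i} x_{2i} = -0.0520$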
The heteroskedasticity-robust standard errors of the two slope coefficients are 1.99 (for population growth)
and 0.23 (for the saving rate). Calculate the 95% confidence interval for both coefficients. How many standard
deviations are the coefficients away from zero?
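Answer: Substituting the sums into the formulas gives slope estimates of about -12.95 for population growth and
1.39 for the saving rate. The 95% confidence interval for the population growth coefficient is (-16.85, -9.05),
and the 95% confidence interval for the saving rate coefficient is (0.94, 1.84). The population growth coefficient
has a t-statistic of -6.51, and the saving rate coefficient one of 6.04. These t-statistics give the number of
standard errors by which each estimate lies away from zero.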
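The calculation can be traced step by step. The sketch below plugs the sums from the question into the two-regressor OLS formulas and then forms the confidence intervals with the 1.96 critical value; the variable names are chosen for the example.

```python
# Sums given in the question (small letters: deviations from means)
n = 104
sum_y, sum_x1, sum_x2 = 33.33, 2.025, 17.313
s_x1x1, s_x2x2 = 0.0122, 0.6422
s_yx1, s_yx2 = -0.2304, 1.5676
s_x1x2 = -0.0520

# Two-regressor OLS formulas
den = s_x1x1 * s_x2x2 - s_x1x2 ** 2
b1 = (s_yx1 * s_x2x2 - s_yx2 * s_x1x2) / den   # population growth
b2 = (s_yx2 * s_x1x1 - s_yx1 * s_x1x2) / den   # saving rate
b0 = sum_y / n - b1 * sum_x1 / n - b2 * sum_x2 / n

def ci95(estimate, std_error):
    """Large-sample 95% confidence interval."""
    return (estimate - 1.96 * std_error, estimate + 1.96 * std_error)

se_b1, se_b2 = 1.99, 0.23   # robust standard errors given in the question
ci_b1, ci_b2 = ci95(b1, se_b1), ci95(b2, se_b2)
t1, t2 = b1 / se_b1, b2 / se_b2

print(round(b1, 2), round(b2, 2), ci_b1, ci_b2, t1, t2)
```

The t-statistic for the saving rate comes out slightly above 6.04 when the unrounded slope is used; the answer key appears to divide the rounded estimate 1.39 by 0.23.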
4) A subsample from the Current Population Survey is taken, on weekly earnings of individuals, their age, and
their gender. You have read in the news that women make 70 cents to the $1 that men earn. To test this
hypothesis, you first regress earnings on a constant and a binary variable, which takes on a value of 1 for
females and is 0 otherwise. The results were:
Earn = 570.70 - 170.72 × Female, R2 =0.084, SER = 282.12.
(9.44) (13.52)
(a) Perform a difference in means test and indicate whether or not the difference in the mean salaries is
significantly different. Justify your choice of a one-sided or two-sided alternative test. Are these results
evidence enough to argue that there is discrimination against females? Why or why not? Is it likely that the
errors are normally distributed in this case? If not, does that present a problem to your test?
(b) Test for the significance of the age and gender coefficients. Why do you think that age plays a role in
earnings determination?
Answer: (a) The t-statistic is -12.63, while the critical value is –1.64. The difference is therefore statistically
significant. A one-sided alternative was chosen since the claim is that females make less than males. This
represents little evidence of discrimination, since attributes of males and females have not been included.
Given that earnings distributions are not normally distributed, the errors will also not be distributed
normally, and assuming that they are, results in problematic inference.
(b) The t-statistics are 9.36 for the age coefficient, and -13.00 for the gender coefficient. Both of these
values are greater than the (absolute) critical value from the standard normal distribution (1.64). Hence
you can reject the null hypothesis that these coefficients are zero. Age proxies “on the job training.” A
better proxy that has been used frequently in the past is the Mincer experience variable
(Age-Education-6). Obviously this is a better proxy for some subsample of individuals than for others.
5) You have collected data from Major League Baseball (MLB) to find the determinants of winning. You have a
general idea that both good pitching and strong hitting are needed to do well. However, you do not know how
much each of these contributes separately. To investigate this problem, you collect data for all MLB teams during
the 1999 season. Your strategy is to first regress the winning percentage on pitching quality (“Team ERA”), second
to regress the same variable on some measure of hitting (“OPS – On-base Plus Slugging percentage”), and
third to regress the winning percentage on both.
Summary of the Distribution of Winning Percentage, On Base plus Slugging Percentage,
and Team Earned Run Average for MLB in 1999

                                          Percentile
                    Average  Std. dev.    10%    25%    40%    50% (median)  60%    75%    90%
Team ERA            4.71     0.53         3.84   4.35   4.72   4.78          4.91   5.06   5.25
OPS                 0.778    0.034        0.720  0.754  0.769  0.780         0.790  0.798  0.820
Winning Percentage  0.50     0.08         0.40   0.43   0.46   0.48          0.49   0.59   0.60

The results are as follows:
Winpct = 0.94 – 0.100 × teamera , R2 = 0.49, SER = 0.06.
(0.08) (0.017)
Winpct = –0.68 + 1.513 × ops, R2 =0.45, SER = 0.06.
(0.17) (0.221)
Winpct = –0.19 – 0.099 × teamera + 1.490 × ops , R2 =0.92, SER = 0.02.
(0.08) (0.008) (0.126)
(a) Use the t-statistics to test for the statistical significance of the coefficients.
(b) There are 30 teams in MLB. Does the small sample size worry you here when testing for significance?
Answer: (a) The t-statistics for team ERA and OPS are -12.38 and 11.83. Both of these are highly significant.
(b) The t-statistic is only normally distributed in large samples. As a result, inference is problematic here.
6) In the process of collecting weight and height data from 29 female and 81 male students at your university, you
also asked the students for the number of siblings they have. Although it was not quite clear to you initially
what you would use that variable for, you construct a new theory that suggests that children who have more
siblings come from poorer families and will have to share the food on the table. Although a friend tells you that
this theory does not pass the “straight-face” test, you decide to hypothesize that peers with many siblings will
weigh less, on average, for a given height. In addition, you believe that the muscle/fat tissue composition of
male bodies suggests that females will weigh less, on average, for a given height. To test these theories, you
perform the following regression:
Studentw = –229.92 – 6.52 × Female + 0.51 × Sibs + 5.58 × Height,
(44.01) (5.52) (2.25) (0.62)
R2 =0.50, SER = 21.08
where Studentw is in pounds, Height is in inches, Female takes a value of 1 for females and is 0 otherwise, Sibs is
the number of siblings (heteroskedasticity-robust standard errors in parentheses).
(a) Carrying out hypotheses tests using the relevant t-statistics to test your two claims separately, is there
strong evidence in favor of your hypotheses? Is it appropriate to use two separate tests in this situation?
(b) You also perform an F-test on the joint hypothesis that the two coefficients for females and siblings are zero.
The calculated F-statistic is 0.84. Find the critical value from the F-table. Can you reject the null hypothesis? Is
it possible that one of the two parameters is zero in the population, but not the other?
(c) You are now a bit worried that the entire regression does not make sense and therefore also test for the
height coefficient to be zero. The resulting F-statistic is 57.25. Does that prove that there is a relationship
between weight and height?
Answer: (a) The t-statistics for gender and number of siblings are -1.18 and 0.23 respectively. Neither coefficient
is statistically significant at conventional levels. If you wanted to test the two hypotheses simultaneously,
then you should use an F-test.
(b) The critical value is 3.00 at the 5% level, and 4.61 at the 1% level. Hence you cannot reject the null
hypothesis. The hypothesis is that both coefficients are zero, and this cannot be rejected. Had you rejected
the null hypothesis, then the alternative hypothesis states that one or both of the restrictions do not hold.
(c) Although you cannot prove anything in this context with certainty, there is a very high probability
that there is a relationship between height and weight in the population, given the sample result. The
critical value from the F-table is 3.78 at the 1% level.
7) You have collected data for 104 countries to address the difficult questions of the determinants for differences
in the standard of living among the countries of the world. You recall from your macroeconomics lectures that
the neoclassical growth model suggests that output per worker (per capita income) levels are determined by,
among others, the saving rate and population growth rate. To test the predictions of this growth model, you
run the following regression:
RelPersInc = 0.339 – 12.894 × n + 1.397 × SK , R2 =0.621, SER = 0.177
(0.068) (3.177) (0.229)
where RelPersInc is GDP per worker relative to the United States, n is the average population growth rate,
1980-1990, and SK is the average investment share of GDP from 1960 to 1990 (remember investment equals
saving). Numbers in parentheses are for heteroskedasticity-robust standard errors.
(a) Calculate the t-statistics and test whether or not each of the population parameters are significantly
different from zero.
(b) The overall F-statistic for the regression is 79.11. What is the critical value at the 5% and 1% level? What is
your decision on the null hypothesis?
(c) You remember that human capital in addition to physical capital also plays a role in determining the
standard of living of a country. You therefore collect additional data on the average educational attainment in
years for 1985, and add this variable (Educ) to the above regression. This results in the modified regression
output:
RelPersInc = 0.046 – 5.869 × n + 0.738 × SK + 0.055 × Educ, R2 =0.775, SER = 0.1377
(0.079) (2.238) (0.294) (0.010)
How has the inclusion of Educ affected your previous results?
(d) Upon checking the regression output, you realize that there are only 86 observations, since data for Educ is
not available for all 104 countries in your sample. Do you have to modify some of your statements in (c)?
Answer: (a) The t-statistics for population growth and the saving rate are –4.06 and 6.10, making both coefficients
significantly different from zero at conventional levels of significance.
(b) The critical values from the $F_{2,\infty}$ distribution are 3.00 at the 5% level and 4.61 at the 1% level, allowing you
to reject the null hypothesis that all slope coefficients are zero.
(c) The coefficients on the population growth rate and on the saving rate are each roughly half of what they
were originally (in absolute value). The regression R2 has increased substantially, from 0.621 to 0.775.
(d) When comparing results, you should ensure that the sample is identical, since comparisons are not
valid otherwise. In addition, there are now fewer than 100 observations, making inference based on the
standard normal distribution problematic.
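The t-statistics in part (a) can be reproduced with a few lines of Python. This is a minimal sketch: the dictionary `estimates` and the helper `t_statistic` are names chosen here for illustration, and 1.96 is the usual two-sided 5% critical value from the standard normal distribution.

```python
# Coefficients and heteroskedasticity-robust standard errors from the
# first regression above (slope coefficients only).
estimates = {"n": (-12.894, 3.177), "SK": (1.397, 0.229)}


def t_statistic(coefficient, standard_error, hypothesized=0.0):
    """t-statistic for H0: coefficient = hypothesized."""
    return (coefficient - hypothesized) / standard_error


t_stats = {name: t_statistic(b, se) for name, (b, se) in estimates.items()}

for name, t in t_stats.items():
    # Two-sided test at the 5% level, large-sample normal critical value.
    print(f"{name}: t = {t:.2f}, significant at 5%: {abs(t) > 1.96}")
```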
8) Attendance at sports events depends on various factors. Teams typically do not change ticket prices from game
to game to attract more spectators to less attractive games. However, there are other marketing tools used, such
as fireworks, free hats, etc., for this purpose. You work as a consultant for a sports team, the Los Angeles
Dodgers, to help them forecast attendance, so that they can potentially devise strategies for price
discrimination. After collecting data over two years for every one of the 162 home games of the 2000 and 2001
season, you run the following regression:
Attend = 15,005 + 201 × Temperat + 465 × DodgNetWin + 82 × OppNetWin
        (8,770)  (121)            (169)              (26)
       + 9647 × DFSaSu + 1328 × Drain + 1609 × D150m + 271 × DDiv – 978 × D2001;
        (1505)          (3355)         (1819)         (1,184)      (1,143)
R2 = 0.416, SER = 6983
where Attend is announced stadium attendance, Temperat is the average temperature on game day, DodgNetWin
is the net wins of the Dodgers before the game (wins – losses), OppNetWin is the opposing team’s net wins at
the end of the previous season, and DFSaSu, Drain, D150m, DDiv, and D2001 are binary variables, taking a
value of 1 if the game was played on a weekend, it rained during that day, the opposing team was within a 150
mile radius, the opposing team plays in the same division as the Dodgers, and the game was played during
2001, respectively. Numbers in parentheses are heteroskedasticity-robust standard errors.
(a) Are the slope coefficients statistically significant?
(b) To test whether the effect of the last four binary variables is significant, you have your regression program
calculate the relevant F-statistic, which is 0.295. What is the critical value? What is your decision about
excluding these variables?
Answer: (a) The coefficients on Temperat, DodgNetWin, OppNetWin, and DFSaSu are all statistically significant at
the 5% level, using a one-sided test. The constant is insignificant using a two-sided test. All the other
coefficients are not statistically significant at the 5% level.
(b) The critical value at the 5% level is 2.37. Hence you cannot reject the null hypothesis that all four
coefficients are simultaneously zero.
9) The administration of your university/college is thinking about implementing a policy of coed floors only in
dormitories. Currently there are only single gender floors. One reason behind such a policy might be to
generate an atmosphere of better “understanding” between the sexes. The Dean of Students (DoS) has decided
to investigate if such a behavior results in more “togetherness” by attempting to find the determinants of the
gender composition at the dinner table in your main dining hall, and in that of a neighboring university, which
only allows for coed floors in their dorms. The survey includes 176 students, 63 from your university/college,
and 113 from a neighboring institution.
The Dean’s first problem is how to define gender composition. To begin with, the survey excludes single
persons’ tables, since the study is to focus on group behavior. The Dean also eliminates sports teams from the
analysis, since a large number of single-gender students will sit at the same table. Finally, the Dean decides to
only analyze tables with three or more students, since she worries about “couples” distorting the results. The
Dean finally settles for the following specification of the dependent variable:
GenderComp = |50% – % of Male Students at Table|
where “|Z|” stands for the absolute value of Z. The variable can take on values from zero to fifty.
After considering various explanatory variables, the Dean settles for an initial list of eight, and estimates the
following relationship, using heteroskedasticity-robust standard errors (this Dean obviously has taken an
econometrics course earlier in her career and/or has an able research assistant):
GenderComp = 30.90 – 3.78 × Size – 8.81 × DCoed + 2.28 × DFemme + 2.06 × DRoommate
            (7.73)  (0.63)        (2.66)         (2.42)          (2.39)
           – 0.17 × DAthlete + 1.49 × DCons – 0.81 × SAT + 1.74 × SibOther, R2 = 0.24, SER = 15.50
            (3.23)            (1.10)         (1.20)       (1.43)
where Size is the number of persons at the table minus 3; DCoed is a binary variable, which takes on the value
of 1 if you live on a coed floor; DFemme is a binary variable, which is 1 for females and zero otherwise;
DRoommate is a binary variable which equals 1 if the person at the table has a roommate and is zero otherwise;
DAthlete is a binary variable which is 1 if the person at the table is a member of an athletic varsity team; DCons
is a variable which measures the political tendency of the person at the table on a seven-point scale, ranging
from 1 being “liberal” to 7 being “conservative”; SAT is the SAT score of the person at the table measured on a
seven-point scale, ranging from 1 for the category “900-1000” to 7 for the category “1510 and above”; and
increasing by one for 100 point increases; and SibOther is the number of siblings from the opposite gender in
the family the person at the table grew up with.
(a) Indicate which of the coefficients are statistically significant.
(b) Based on the above results, the Dean decides to specify a more parsimonious form by eliminating the least
significant variables. Using the F-statistic for the null hypothesis that there is no relationship between the
gender composition at the table and DFemme, DRoommate, DAthlete, and SAT, the regression package returns a
value of 1.10. What are the degrees of freedom for the statistic? Look up the 1% and 5% critical values from the
F-table and make a decision about the exclusion of these variables based on the critical values.
(c) The Dean decides to estimate the following specification next:
GenderComp = 29.07 – 3.80 × Size – 9.75 × DCoed + 1.50 × DCons + 1.97 × SibOther,
            (3.75)  (0.62)        (1.04)         (1.04)         (1.44)
R2 = 0.22, SER = 15.44
Calculate the t-statistics for the coefficients and discuss whether or not the Dean should attempt to simplify the
specification further. Based on the results, what might some of the comments be that she will write up for the
other senior administrators of your college? What are some of the potential flaws in her analysis? What other
variables do you think she should have considered as explanatory factors?
Answer: (a) Only the constant, Size, and DCoed are statistically significant at the 5% level.
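(b) There are q = 4 restrictions, so in large samples the statistic is distributed $F_{4,\infty}$. The critical value is
2.37 at the 5% level and 3.32 at the 1% level. Hence you cannot reject the null hypothesis
that all four coefficients are zero.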
(c) The t-statistics for the five coefficients are as follows: 7.75, -6.13, -9.38, 1.44 and 1.37. The Dean
should leave the specification as is and allow readers to decide if they want to place much weight on the
insignificant coefficients. The variable of interest is DCoed and she will most likely focus on that,
concluding that having coed floors in dormitories will increase the gender balance at dining hall tables.
She will most likely go further in her report and suggest that communication between the sexes will
improve as a result of coed floors.
One of the major flaws in the analysis is that students from one college do not have coed floors in
dormitories while students from the other college do not have single gender floors. Ideally you would
like to survey students from the same college where some of the students lived on single gender floors
while others did not. Answers on omitted variables will obviously vary. Ideally some survey question
should be included which would indicate the student’s attitude towards the other sex.
10) The Solow growth model suggests that countries with identical saving rates and population growth rates
should converge to the same per capita income level. This result has been extended to include investment in
human capital (education) as well as investment in physical capital. This hypothesis is referred to as the
“conditional convergence hypothesis,” since the convergence is dependent on countries obtaining the same
values in the driving variables. To test the hypothesis, you collect data from the Penn World Tables on the
average annual growth rate of GDP per worker (g6090) for the 1960-1990 sample period, and regress it on the
(i) initial starting level of GDP per worker relative to the United States in 1960 (RelProd60), (ii) average
population growth rate of the country (n), (iii) average investment share of GDP from 1960 to 1990 (SK; remember
that investment equals savings), and (iv) educational attainment in years for 1985 (Educ). The results for
close to 100 countries are as follows (numbers in parentheses are heteroskedasticity-robust standard errors):
g6090 = 0.004 – 0.172 × n + 0.133 × SK + 0.002 × Educ – 0.044 × RelProd60, R2 = 0.537, SER = 0.011
       (0.007)  (0.209)     (0.015)      (0.001)        (0.008)
(a) Is the coefficient on the initial level of GDP per worker (RelProd60) significantly different from zero at the 5% level? At the 1% level?
(b) Test for the significance of the other slope coefficients. Should you use a one-sided alternative hypothesis or
a two-sided test? Will the decision for one or the other influence the decision about the significance of the
parameters? Should you always eliminate variables which carry insignificant coefficients?
Answer: (a) The coefficient has a t-statistic of 5.50 and is therefore statistically significant at both the 5% and the
1% level.
(b) The t-statistics are –0.82, 8.87, and 2.00. Hence the coefficient on population growth is not statistically
significant. You should use a one-sided alternative hypothesis test since economic theory gives you
information about the expected sign on these variables. In the above case, the decision will not be
influenced by the choice of a one-sided or two-sided test, since the (absolute value of the) critical value
is 1.64 or 1.96 at the 5% significance level. If there is a strong prior on the sign of the coefficient, then the
variable should not be eliminated based on the significance test. Instead it should be left in the equation,
but the high p-value should be flagged to the reader, and the reader should decide herself how
convincing the evidence is in favor of the theory.
11) Using the 420 observations of the California School data set from your textbook, you estimate the following
relationship:
TestScore = 681.44 - 0.61LchPct
n=420, R2 =0.75, SER = 9.45
where TestScore is the test score and LchPct is the percent of students eligible for subsidized lunch
(average = 44.7, max = 100, min = 0).
a. Interpret the regression result.
b. In your interpretation of the slope coefficient in (a) above, does it matter if you start your explanation
with “for every x percent increase” rather than “for every x percentage point increase”?
c. The “overall” regression F-statistic is 1149.57. What are the degrees of freedom for this statistic?
d. Find the critical value of the F-statistic at the 1% significance level. Test the null hypothesis that the
regression R2 = 0.
e. The above equation was estimated using heteroskedasticity-robust standard errors. What is the
standard error for the slope coefficient?
Answer: a. For every 10 percentage point increase in students eligible for subsidized lunch, average test scores go
down by 6.1 points. If a school has no students eligible for subsidized lunch, then the average test score is
approximately 681 points. 75% of the variation in test scores is explained by our model.
b. Since your RHS variable is measured already in percent, it makes sense to increase that variable by 10
percentage points (say), rather than by 10 percent. If LchPct increases from 20 to 30, then this
represents an increase of 10 percentage points, or an increase of 50 percent.
c. With a single regressor, the overall F-statistic tests one restriction, so there is 1 degree of freedom in the
numerator and 418 (∞ in large samples) degrees of freedom in the denominator.
d. $F_{1,\infty}$ = 6.63. Hence you can comfortably reject the null hypothesis of no linear relationship between test
scores and the percent of students eligible for subsidized lunch.
e. With a single explanatory variable, the absolute value of the t-statistic is the square root of the F-statistic.
Here it is 33.91. From this result, and given the size of the coefficient, the standard error is 0.61/33.91 ≈ 0.018.
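The arithmetic in part e can be sketched in Python as follows. The variable names are chosen here for illustration; the only inputs are the reported F-statistic and the slope estimate, and the sketch relies on the fact that with one restriction the F-statistic equals the squared t-statistic.

```python
import math

f_statistic = 1149.57   # overall regression F-statistic reported in part c
slope = -0.61           # estimated coefficient on LchPct

# With a single restriction (q = 1), F = t^2, so |t| = sqrt(F).
t_magnitude = math.sqrt(f_statistic)

# Since t = slope / SE, the standard error is |slope| / |t|.
standard_error = abs(slope) / t_magnitude

print(f"|t| = {t_magnitude:.2f}, SE = {standard_error:.3f}")
```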
12) Consider the following regression using the California School data set from your textbook.
TestScore = 681.44 - 0.61LchPct
n=420, R2 =0.75, SER = 9.45
where TestScore is the test score and LchPct is the percent of students eligible for subsidized lunch
(average = 44.7, max = 100, min = 0).
a. What is the effect of a 20 percentage point increase in the share of students eligible for subsidized lunch?
b. Your textbook started with the following regression in Chapter 4:
TestScr = 698.9 - 2.28STR
n=420, R2 =0.051, SER = 18.58
where STR is the student teacher ratio.
Your textbook tells you that in the multiple regression framework considered, the percentage of
students eligible for subsidized lunch is a control variable, while the student teacher ratio is the
variable of interest. Given that the regression R2 is so much higher for the first equation than for the
second equation, shouldn’t the role of the two variables be reversed? That is, shouldn’t the student
teacher ratio be the control variable while the percent of students eligible for subsidized lunch be the
variable of interest?
Answer: a. The effect would be a decrease in the test score of 12.2 points (20 × 0.61).
b. The choice of variable of interest versus control variable has nothing to do with which variable has a
higher explanatory power in the two models. Instead it depends on the question you are analyzing. In
Chapter 4, the question was raised whether or not the test scores of students could be improved by
hiring more teachers. Hence the variable of interest became class size or its proxy, the student teacher
ratio. However, there are other variables which may have an effect on test scores, and not controlling for
those will result in omitted variable bias on the coefficient of the variable of interest. Of course, the role
of a control variable and the variable of interest can be switched if a different policy question is
addressed. For example, a politician might be interested in figuring out by how much student
performance would improve if she could raise income levels in certain school districts, or across the board.
7.3 Mathematical and Graphical Problems
1) Explain carefully why testing joint hypotheses simultaneously, using the F-statistic, does not necessarily yield
the same conclusion as testing them sequentially (“one at a time” method), using a series of t-statistics.
Answer: Testing a joint hypothesis sequentially does not result in the desired significance level. Even if this were
not a problem, then the shape of the confidence set of the textbook suggests another reason for this
strategy to be problematic. Drawing a confidence interval for both parameters and extending the lines up
and to the right, results in a rectangle, indicating the area where the joint hypothesis would be rejected
using the t-statistic. Obviously the confidence set does not coincide with the rectangle, and there are
therefore various outcomes possible under which both strategies would come to the same conclusion or
different conclusions. Since the proper testing strategy involves using the F-statistic, the t-statistic could
result in improper inference under circumstances where the two areas do not coincide.
2) Set up the null hypothesis and alternative hypothesis carefully for the following cases:
(a) k = 4, test for all coefficients other than the intercept to be zero
(b) k = 3, test for the slope coefficient of X1 to be unity, and the coefficients on the other explanatory variables to
be zero
(c) k = 10, test for the slope coefficient of X1 to be zero, and for the slope coefficients of X2 and X3 to be the
same but of opposite sign.
(d) k = 4, test for the slope coefficients to add up to unity
Answer: (a) $H_0: \beta_1 = 0, \beta_2 = 0, \beta_3 = 0, \beta_4 = 0$
(b) $H_0: \beta_1 = 1, \beta_2 = 0, \beta_3 = 0$
(c) $H_0: \beta_1 = 0, \beta_2 + \beta_3 = 0$
(d) $H_0: \beta_1 + \beta_2 + \beta_3 + \beta_4 = 1$
3) Consider a situation where economic theory suggests that you impose certain restrictions on your estimated
multiple regression function. These may involve the equality of parameters, such as the returns to education
and on the job training in earnings functions, or the sum of coefficients, such as constant returns to scale in a
production function. To test the validity of your restrictions, you have your statistical package calculate the
corresponding F-statistic. Find the critical value from the F-distribution at the 5% and 1% level, and comment
whether or not you will reject the null hypothesis in each of the following cases.
(a) number of observations: 152; number of restrictions: 3; F-statistic: 3.21
(b) number of observations: 1,732; number of restrictions:7; F-statistic: 4.92
(c) number of observations: 63; number of restrictions: 1; F-statistic: 2.47
(d) number of observations: 4,000; number of restrictions: 5; F-statistic: 1.82
(e) Explain why you can use the $F_{q,\infty}$ distribution to compute the critical values in (a)-(d).
Answer: (a) $F_{3,\infty}$ = 2.60 (5% level), $F_{3,\infty}$ = 3.78 (1% level). Reject the null hypothesis at the 5% level, but not at the
1% level.
(b) $F_{7,\infty}$ = 2.01 (5% level), $F_{7,\infty}$ = 2.64 (1% level). Reject the null hypothesis at the 5% level and at the 1%
level.
(c) $F_{1,\infty}$ = 3.84 (5% level), $F_{1,\infty}$ = 6.63 (1% level). Cannot reject the null hypothesis at the 5% level or at
the 1% level.
(d) $F_{5,\infty}$ = 2.21 (5% level), $F_{5,\infty}$ = 3.02 (1% level). Cannot reject the null hypothesis at the 5% level or at the
1% level.
(e) The F-statistic is distributed $F_{q,\infty}$ in large samples. Although strictly speaking this only holds for the
limiting case of n = ∞, for practical purposes the approximation is close for n > 100. This is therefore
problematic for (c) above, where n = 63.
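The decisions in (a)-(d) follow mechanically from comparing each F-statistic with its critical values. A minimal Python sketch of that comparison is given below; the critical values are copied from the answer above rather than computed, and the names `cases` and `decisions` are illustrative.

```python
# (case label, number of restrictions q, F-statistic,
#  5% critical value, 1% critical value).
# Critical values are the large-sample F(q, infinity) values quoted above.
cases = [
    ("a", 3, 3.21, 2.60, 3.78),
    ("b", 7, 4.92, 2.01, 2.64),
    ("c", 1, 2.47, 3.84, 6.63),
    ("d", 5, 1.82, 2.21, 3.02),
]

# For each case: (reject at 5%?, reject at 1%?).
decisions = {
    label: (f_stat > crit_5, f_stat > crit_1)
    for label, q, f_stat, crit_5, crit_1 in cases
}

for label, (reject_5, reject_1) in decisions.items():
    print(f"({label}) reject at 5%: {reject_5}, reject at 1%: {reject_1}")
```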
4) Females, on average, are shorter and weigh less than males. One of your friends, who is a pre-med student,
tells you that in addition, females will weigh less for a given height. To test this hypothesis, you collect height
and weight of 29 female and 81 male students at your university. A regression of the weight on a constant,
height, and a binary variable, which takes a value of one for females and is zero otherwise, yields the following
result:
Studentw = –229.21 – 6.36 × Female + 5.58 × Height, R2 = 0.50, SER = 20.99
           (43.39)  (5.74)          (0.62)
where Studentw is weight measured in pounds and Height is measured in inches (heteroskedasticity-robust
standard errors in parentheses).
Calculate t-statistics and carry out the hypothesis test that females weigh the same as males, on average, for a
given height, using a 10% significance level. What is the alternative hypothesis? What is the p-value? What
critical value did you use?
Answer: The t-statistics for the intercept, the gender binary variable, and the height variable are –5.28, –1.11, and
9.00, respectively. For a one-sided alternative hypothesis, $H_1: \beta_{Female} < 0$, the critical value from the
standard normal table is –1.28. Hence you cannot reject the null hypothesis at the 10% level. The
p-value is 13.4%.
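The one-sided test can be sketched with Python's standard library, which provides the standard normal distribution through `statistics.NormalDist`. The variable names are illustrative, and the sketch assumes the large-sample normal approximation used in the answer.

```python
from statistics import NormalDist

coefficient, standard_error = -6.36, 5.74   # Female binary variable

t_stat = coefficient / standard_error

standard_normal = NormalDist()              # mean 0, standard deviation 1

# One-sided alternative (coefficient < 0): the p-value is the left-tail area.
p_value = standard_normal.cdf(t_stat)

# 10% critical value for a left-tailed test.
critical_value = standard_normal.inv_cdf(0.10)

reject_null = t_stat < critical_value

print(f"t = {t_stat:.2f}, critical value = {critical_value:.2f}, "
      f"p-value = {p_value:.3f}, reject: {reject_null}")
```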
5) You are presented with the following output from a regression package, which reproduces the regression
results of testscores on the student-teacher ratio from your textbook
Dependent Variable: TESTSCR
Method: Least Squares
Date: 07/30/06 Time: 17:44
Sample: 1 420
Included observations: 420
Variable      Coefficient   Std. Error   t-Statistic   Prob.
C                  698.93         9.47         73.82    0.00
STR                 -2.28         0.48         -4.75    0.00

R-squared               0.05    Mean dependent var        654.16
Adjusted R-squared      0.05    S.D. dependent var         19.05
S.E. of regression     18.58    Akaike info criterion       8.69
Sum squared resid  144315.48    Schwarz criterion           8.71
Log likelihood      -1822.25    F-statistic                22.58
Durbin-Watson stat      0.13    Prob(F-statistic)           0.00

The standard errors are homoskedasticity-only standard errors.
a) What is the relationship between the t-statistic on the student-teacher ratio coefficient and the F-statistic?
b) Next, two explanatory variables, the percent of English learners (EL_PCT) and expenditures per student
(EXPN_STU) are added. The output is listed below. What is the relationship between the three t-statistics for
the slopes and the homoskedasticity-only F-statistic now?
Dependent Variable: TESTSCR
Method: Least Squares
Date: 07/30/06 Time: 17:55
Sample: 1 420
Included observations: 420
Variable      Coefficient   Std. Error   t-Statistic   Prob.
C                  649.58        15.21         42.72    0.00
STR                 -0.29         0.48         -0.60    0.55
EL_PCT              -0.66         0.04        -16.78    0.00
EXPN_STU             0.00         0.00          2.74    0.01

R-squared               0.44    Mean dependent var        654.16
Adjusted R-squared      0.43    S.D. dependent var         19.05
S.E. of regression     14.35    Akaike info criterion       8.18
Sum squared resid   85699.71    Schwarz criterion           8.21
Log likelihood      -1712.81    F-statistic               107.45
Durbin-Watson stat      0.74    Prob(F-statistic)           0.00
Answer: (a) The F-statistic tests the null hypothesis that all slope coefficients are zero. In the case of a single
explanatory variable, this is the same as testing for the significance of the explanatory variable
coefficient. With a single restriction (q = 1), the F-statistic is the square of the t-statistic.
(b) There is no simple relationship between the F-statistic and the three t-statistics now. The F-statistic
tests the null hypothesis $H_0: \beta_{STR} = \beta_{EL\_PCT} = \beta_{EXPN\_STU} = 0$ simultaneously. The t-statistics
test the significance of each slope coefficient separately.
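The relationship in (a) can be checked against the first regression output with a short Python sketch. The variable names are illustrative; the small gap between the two numbers reflects the rounding of the reported t-statistic to two decimals.

```python
t_stat_str = -4.75     # reported t-statistic on STR in the first output
f_reported = 22.58     # reported overall F-statistic in the first output

# With one regressor (q = 1), the homoskedasticity-only F equals t squared.
f_from_t = t_stat_str ** 2

print(f"t^2 = {f_from_t:.2f}, reported F = {f_reported:.2f}")
```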
6) Consider the following multiple regression model
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$
You want to consider certain hypotheses involving more than one parameter, and you know that the regression
error is homoskedastic. You decide to test the joint hypotheses using the homoskedasticity-only F-statistics.
For each of the cases below specify a restricted model and indicate how you would compute the F-statistic to
test for the validity of the restrictions.
(a) $\beta_1 = -\beta_2$; $\beta_3 = 0$
(b) $\beta_1 + \beta_2 + \beta_3 = 1$
(c) $\beta_1 = \beta_2$; $\beta_3 = 0$
Answer: (a) The restricted model is $Y_i = \beta_0 + \beta_2 (X_{2i} - X_{1i}) + u_i$ and the rule-of-thumb F-statistic would be
$F = \frac{(SSR_{restricted} - SSR_{unrestricted})/2}{SSR_{unrestricted}/(n - 3 - 1)}$.
(b) The restricted model is $(Y_i - X_{3i}) = \beta_0 + \beta_1 (X_{1i} - X_{3i}) + \beta_2 (X_{2i} - X_{3i}) + u_i$ and the rule-of-thumb F-statistic would be
$F = \frac{(SSR_{restricted} - SSR_{unrestricted})/1}{SSR_{unrestricted}/(n - 3 - 1)}$.
(c) The restricted model is $Y_i = \beta_0 + \beta_1 (X_{1i} + X_{2i}) + u_i$ and the homoskedasticity-only F-statistic would be
$F = \frac{(SSR_{restricted} - SSR_{unrestricted})/2}{SSR_{unrestricted}/(n - 3 - 1)}$.
7) Give an intuitive explanation for
$F = \frac{(SSR_{restricted} - SSR_{unrestricted})/q}{SSR_{unrestricted}/(n - k_{unrestricted} - 1)}$.
Name conditions under which the F-statistic is large and hence rejects the null hypothesis.
Answer: First rewrite
$F = \frac{(SSR_{restricted} - SSR_{unrestricted})/q}{SSR_{unrestricted}/(n - k_{unrestricted} - 1)} = \frac{SSR_{restricted} - SSR_{unrestricted}}{SSR_{unrestricted}} \times \frac{n - k_{unrestricted} - 1}{q}$
The numerator of the first fraction is the difference between the sums of squared residuals of
the restricted and the unrestricted model. Anytime you place restrictions on the model, the SSR will
increase (or, strictly speaking, at least not decrease). Hence if the explanatory power of your
regression decreases (the SSR increases) by much as a result of the restrictions you have placed on the model, then
the numerator will be large. However, the SSR depends on units of measurement. To make the first
expression independent of the units of measurement, the difference is divided by the unrestricted
sum of squared residuals. The first fraction now represents the relative increase in the SSR that results
from the imposition of the restrictions. The second fraction has the degrees of freedom of the
denominator in its numerator, and the degrees of freedom of the numerator in its denominator. The
degrees of freedom of the numerator is the difference of the degrees of freedom of the restricted and the
unrestricted regression respectively, i.e., $(n - k_{restricted} - 1) - (n - k_{unrestricted} - 1) = k_{unrestricted} - k_{restricted} = q$.
As the degrees of freedom (number of observations) increase, we are closer to observing
the population rather than the sample. Since the null hypothesis is a statement about the population,
even small differences in parameters should become statistically significant eventually.
8) Prove that
$F = \frac{(SSR_{restricted} - SSR_{unrestricted})/q}{SSR_{unrestricted}/(n - k_{unrestricted} - 1)} = \frac{(R^2_{unrestricted} - R^2_{restricted})/q}{(1 - R^2_{unrestricted})/(n - k_{unrestricted} - 1)}$
Answer: Note that SSR = TSS - ESS. Hence we get
$F = \frac{(TSS - ESS_{restricted} - (TSS - ESS_{unrestricted}))/q}{(TSS - ESS_{unrestricted})/(n - k_{unrestricted} - 1)}$.
Next, dividing numerator and denominator by TSS gives us
$F = \frac{\left(\frac{ESS_{unrestricted}}{TSS} - \frac{ESS_{restricted}}{TSS}\right)/q}{\left(\frac{TSS - ESS_{unrestricted}}{TSS}\right)/(n - k_{unrestricted} - 1)}$.
Since $R^2 = \frac{ESS}{TSS}$, this gives us the expression we were looking for.
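The equivalence of the two forms can also be checked numerically. The Python sketch below uses made-up values for TSS, the two SSRs, q, n, and k (all hypothetical, chosen only for illustration) and confirms that the SSR-based and the R2-based formulas return the same number.

```python
import math


def f_from_ssr(ssr_restricted, ssr_unrestricted, q, n, k_unrestricted):
    """Homoskedasticity-only F-statistic, SSR form."""
    return ((ssr_restricted - ssr_unrestricted) / q) / (
        ssr_unrestricted / (n - k_unrestricted - 1)
    )


def f_from_r2(r2_restricted, r2_unrestricted, q, n, k_unrestricted):
    """Homoskedasticity-only F-statistic, R-squared form."""
    return ((r2_unrestricted - r2_restricted) / q) / (
        (1 - r2_unrestricted) / (n - k_unrestricted - 1)
    )


# Hypothetical numbers, not taken from any regression in the text.
tss, ssr_r, ssr_u = 1000.0, 400.0, 250.0
q, n, k_u = 2, 50, 4

# R-squared = ESS / TSS = 1 - SSR / TSS for each regression.
r2_r = 1 - ssr_r / tss
r2_u = 1 - ssr_u / tss

f_ssr = f_from_ssr(ssr_r, ssr_u, q, n, k_u)
f_r2 = f_from_r2(r2_r, r2_u, q, n, k_u)

print(f"SSR form: {f_ssr:.4f}, R2 form: {f_r2:.4f}")
```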
9) To calculate the homoskedasticity-only overall regression F-statistic, you need to compare the SSR restricted
with the SSRunrestricted. Consider the following output from a regression package, which reproduces the
regression results of testscores on the student-teacher ratio, the percent of English learners, and the
expenditures per student from your textbook:
Dependent Variable: TESTSCR
Method: Least Squares
Date: 07/30/06 Time: 17:55
Sample: 1 420
Included observations: 420
Variable      Coefficient   Std. Error   t-Statistic   Prob.
C                  649.58        15.21         42.72    0.00
STR                 -0.29         0.48         -0.60    0.55
EL_PCT              -0.66         0.04        -16.78    0.00
EXPN_STU             0.00         0.00          2.74    0.01

R-squared               0.44    Mean dependent var        654.16
Adjusted R-squared      0.43    S.D. dependent var         19.05
S.E. of regression     14.35    Akaike info criterion       8.18
Sum squared resid   85699.71    Schwarz criterion           8.21
Log likelihood      -1712.81    F-statistic               107.45
Durbin-Watson stat      0.74    Prob(F-statistic)           0.00
Sum of squared resid corresponds to SSRunrestricted. How are you going to find SSRrestricted?
Answer: You could simply run a regression of Testscr on a constant. However, in the case where the unrestricted regression is
$Testscore_i = \hat\beta_0 + \hat\beta_{STR} \times STR_i + \hat\beta_{EL\_PCT} \times EL\_PCT_i + \hat\beta_{EXPN\_STU} \times EXPN\_STU_i + \hat u_i$,
the restricted residuals are given by $Y_i = \hat\beta_0 + \hat u_i$, and for the restricted sum of squared residuals you get simply the variation in test scores
$SSR_{restricted} = \sum_{i=1}^{n} (Testscore_i - \overline{Testscore})^2$.
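Using the regression output above, the restricted SSR and the overall F-statistic can be recovered in a few lines of Python. This sketch assumes that the reported "S.D. dependent var" is the sample standard deviation with an n - 1 divisor, so that the total variation is (n - 1) times its square; the variable names are illustrative. The result differs slightly from the reported 107.45 because the standard deviation is rounded to two decimals.

```python
n, k = 420, 3                  # observations and number of regressors
sd_dependent = 19.05           # "S.D. dependent var" from the output
ssr_unrestricted = 85699.71    # "Sum squared resid" from the output

# Restricted model has only a constant, so SSR_restricted equals the
# total sum of squares (assumes the S.D. uses an n - 1 divisor).
ssr_restricted = (n - 1) * sd_dependent ** 2

# Homoskedasticity-only overall F-statistic with q = k restrictions.
f_overall = ((ssr_restricted - ssr_unrestricted) / k) / (
    ssr_unrestricted / (n - k - 1)
)

print(f"SSR_restricted = {ssr_restricted:.2f}, F = {f_overall:.2f}")
```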
10) Adding the Percent of English Learners (PctEL) to the Student Teacher Ratio (STR) in your textbook reduced
the coefficient for STR (in absolute value) from 2.28 to 1.10 with a standard error of 0.43. Construct a 90% and 99% confidence
interval to test the hypothesis that the coefficient of STR is 2.28.
Answer: The 90% confidence interval is (1.10 ± 1.64 × 0.43) = (0.39, 1.81). The 99% confidence interval is (1.10 ± 2.58 × 0.43) = (-0.01,
2.21). Since 2.28 lies outside both intervals, you can reject the null hypothesis at both the 10% and the 1% significance level.
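The two intervals can be reproduced with Python's standard library. This is a sketch under the large-sample normal approximation; `confidence_interval` is a helper name introduced here for illustration.

```python
from statistics import NormalDist

estimate, standard_error = 1.10, 0.43
hypothesized = 2.28


def confidence_interval(level):
    """Two-sided large-sample confidence interval at the given level."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # e.g. about 1.645 for 90%
    return (estimate - z * standard_error, estimate + z * standard_error)


ci_90 = confidence_interval(0.90)
ci_99 = confidence_interval(0.99)

for label, (low, high) in (("90%", ci_90), ("99%", ci_99)):
    inside = low <= hypothesized <= high
    print(f"{label} CI: ({low:.2f}, {high:.2f}), contains 2.28: {inside}")
```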
11) The homoskedasticity-only F-statistic is given by the formula
$F = \frac{(SSR_{restricted} - SSR_{unrestricted})/q}{SSR_{unrestricted}/(n - k_{unrestricted} - 1)}$
where SSRrestricted is the sum of squared residuals from the restricted regression, SSRunrestricted is the sum of
squared residuals from the unrestricted regression, q is the number of restrictions under the null hypothesis,
and kunrestricted is the number of regressors in the unrestricted regression. Prove that this formula is the same
as the following formula based on the regression R2 (= ESS/TSS) of the restricted and unrestricted regression:
$F = \frac{(ESS_{unrestricted}/TSS - ESS_{restricted}/TSS)/q}{(1 - ESS_{unrestricted}/TSS)/(n - k_{unrestricted} - 1)}$
Answer: Note that SSR = TSS - ESS. Hence we get
$F = \frac{(TSS - ESS_{restricted} - (TSS - ESS_{unrestricted}))/q}{(TSS - ESS_{unrestricted})/(n - k_{unrestricted} - 1)}$,
which gives the above expression once the TSS in
the numerator are cancelled and numerator and denominator are divided by TSS.
12) Trying to remember the formula for the homoskedasticity-only F-statistic, you forgot whether you subtract the
restricted SSR from the unrestricted SSR or the other way around. Your professor has provided you with a
table containing critical values for the F distribution. How can this be of help?
Answer: All the values in the F table are positive. Hence the correct formula must produce a positive value in both the
numerator and the denominator (or negative expressions in both). But
$F = \frac{(SSR_? - SSR_?)/q}{SSR_{unrestricted}/(n - k_{unrestricted} - 1)}$
and hence the denominator is positive. Hence for the numerator
to be also positive, it must contain $SSR_{restricted} - SSR_{unrestricted}$, since imposing restrictions cannot lower the sum of squared residuals.
13) Consider the following regression output for an unrestricted and a restricted model.
Unrestricted model:
Dependent Variable: TESTSCR
Method: Least Squares
Date: 07/31/06 Time: 17:35
Sample: 1 420
Included observations: 420
Variable      Coefficient   Std. Error   t-Statistic   Prob.
C                  658.47         7.68         85.73    0.00
STR                 -0.76         0.23         -3.27    0.00
EL_PCT              -0.19         0.03         -5.62    0.00
LOG(AVGINC)         11.69         1.74          6.71    0.00
MEAL_PCT            -0.37         0.04         -9.53    0.00
CALW_PCT            -0.07         0.06         -1.21    0.23

R-squared               0.80    Mean dependent var        654.16
Adjusted R-squared      0.79    S.D. dependent var         19.05
S.E. of regression      8.64    Akaike info criterion       7.16
Sum squared resid   30888.64    Schwarz criterion           7.22
Log likelihood      -1498.51    F-statistic               324.94
Durbin-Watson stat      1.51    Prob(F-statistic)           0.00
Restricted model:
Dependent Variable: TESTSCR
Method: Least Squares
Date: 07/31/06 Time: 17:37
Sample: 1 420
Included observations: 420
Variable      Coefficient   Std. Error   t-Statistic   Prob.
C                  593.48         6.96         85.32    0.00
STR                 -0.39         0.27         -1.42    0.16
EL_PCT              -0.43         0.03        -14.34    0.00
LOG(AVGINC)         28.36         1.40         20.32    0.00

R-squared               0.71    Mean dependent var        654.16
Adjusted R-squared      0.71    S.D. dependent var         19.05
S.E. of regression     10.26    Akaike info criterion       7.50
Sum squared resid   43792.42    Schwarz criterion           7.54
Log likelihood      -1571.82    F-statistic               342.98
Durbin-Watson stat      1.30    Prob(F-statistic)           0.00
Calculate the homoskedasticity only F-statistic and determine whether the null hypothesis can be rejected at
the 5% significance level.
Answer: There are two restrictions, namely $H_0: \beta_{meal\_pct} = 0, \beta_{calw\_pct} = 0$. The F-statistic is
$F = \left(\frac{43792.42}{30888.64} - 1\right) \times \frac{420 - 5 - 1}{2} = 86.47$.
The 5% critical value from the $F_{2,\infty}$ distribution is 3.00. Hence we
easily reject the two restrictions at the 5% level of significance.
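The calculation can be written as a small Python function. This is a minimal sketch: the function name `homoskedasticity_only_f` and its argument names are chosen here for illustration, and the 5% critical value of 3.00 is taken from the answer above.

```python
def homoskedasticity_only_f(ssr_restricted, ssr_unrestricted, q, n, k_unrestricted):
    """Homoskedasticity-only F-statistic from restricted and unrestricted SSR."""
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / (n - k_unrestricted - 1)
    return numerator / denominator


# Sums of squared residuals from the two regression outputs above.
f_stat = homoskedasticity_only_f(
    ssr_restricted=43792.42,
    ssr_unrestricted=30888.64,
    q=2,                 # MEAL_PCT and CALW_PCT are excluded
    n=420,
    k_unrestricted=5,    # five regressors in the unrestricted model
)

critical_value_5pct = 3.00   # large-sample F(2, infinity) critical value
reject_null = f_stat > critical_value_5pct

print(f"F = {f_stat:.2f}, reject at 5%: {reject_null}")
```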
14) Consider the regression output from the following unrestricted model:
Unrestricted model:
Dependent Variable: TESTSCR
Method: Least Squares
Date: 07/31/06 Time: 17:35
Sample: 1 420
Included observations: 420
Variable      Coefficient   Std. Error   t-Statistic   Prob.
C                  658.47         7.68         85.73    0.00
STR                 -0.76         0.23         -3.27    0.00
EL_PCT              -0.19         0.03         -5.62    0.00
LOG(AVGINC)         11.69         1.74          6.71    0.00
MEAL_PCT            -0.37         0.04         -9.53    0.00
CALW_PCT            -0.07         0.06         -1.21    0.23

R-squared               0.80    Mean dependent var        654.16
Adjusted R-squared      0.79    S.D. dependent var         19.05
S.E. of regression      8.64    Akaike info criterion       7.16
Sum squared resid   30888.64    Schwarz criterion           7.22
Log likelihood      -1498.51    F-statistic               324.94
Durbin-Watson stat      1.51    Prob(F-statistic)           0.00
To test the null hypothesis that the coefficient on the percent eligible for subsidized lunch and the
coefficient on the percent on public income assistance are both zero, you have your statistical
package plot the confidence set. Interpret the graph below and explain what it tells you about the null
hypothesis.
Answer: The dot in the center of the ellipse is the point estimate for the two coefficients (-0.37,-0.07). Since the
(0,0) point is not inside the ellipse, you reject the null hypothesis.
15) Consider the regression model \(Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i\). Use “Approach #2” from Section 7.3 to
transform the regression so that you can use a t-statistic to test:
\(\beta_1 = \frac{\beta_2}{\beta_3}\)
Answer: This is not a linear restriction. Hence you cannot use the F-test to test for its validity.
16) Consider the following Cobb-Douglas production function \(Y_i = A K_i^{\beta_1} L_i^{\beta_2} e^{u_i}\) (where Y is output, A is the
level of technology, K is the capital stock, and L is the labor force), which has been linearized here (by using
logarithms) to look as follows:
\(y_i = \beta_0^* + \beta_1 k_i + \beta_2 l_i + u_i\)
Assuming that the errors are heteroskedastic, you want to test for constant returns to scale. Using a t-statistic
and “Approach #2,” how would you proceed?
Answer: Under constant returns to scale, \(\beta_1 + \beta_2 = 1\). Hence you need to transform the unrestricted model above
by subtracting \(l_i\) from both sides, and by adding and subtracting \(\beta_1 l_i\). This results in
\((y_i - l_i) = \beta_0^* + \beta_1 (k_i - l_i) + (\beta_1 + \beta_2 - 1) l_i + u_i\).
The left-hand side variable is now the (log of the) output-labor ratio, and the
first explanatory variable on the right-hand side is the (log of the) capital-labor ratio. If the null
hypothesis of constant returns to scale holds, then the coefficient on \(l_i\) should be zero. This can be directly
tested using a t-statistic based on a heteroskedasticity-robust standard error.
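The transformation can be illustrated numerically. The sketch below uses simulated data (the parameter values 0.3 and 0.6, the sample size, and the helper `ols` are all assumptions made for illustration) and shows that the coefficient on l in the transformed regression equals the sum of the two unrestricted slopes minus one. It does not compute the robust standard error needed for the actual t-test.

```python
import random


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


rng = random.Random(0)  # simulated data, purely illustrative
n = 200
k_log = [rng.uniform(0, 5) for _ in range(n)]  # log capital
l_log = [rng.uniform(0, 5) for _ in range(n)]  # log labor
y_log = [1.0 + 0.3 * ki + 0.6 * li + rng.gauss(0, 0.1)
         for ki, li in zip(k_log, l_log)]

# Unrestricted model: y = b0 + b1*k + b2*l + u
b0, b1, b2 = ols([[1.0, ki, li] for ki, li in zip(k_log, l_log)], y_log)

# Transformed model: (y - l) = g0 + g1*(k - l) + g2*l + u
g0, g1, g2 = ols([[1.0, ki - li, li] for ki, li in zip(k_log, l_log)],
                 [yi - li for yi, li in zip(y_log, l_log)])

print(f"b1 + b2 - 1 = {b1 + b2 - 1:.6f}, coefficient on l = {g2:.6f}")
```

Because the transformed regressors are an exact linear recombination of the original ones, the two printed numbers agree up to floating-point error; the t-statistic on that single coefficient is what tests constant returns to scale.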
17) Consider the following two models to explain testscores.
Model 1:
Dependent Variable: TESTSCR
Method: Least Squares
Date: 07/31/06 Time: 17:52
Sample: 1 420
Included observations: 420
Variable         Coefficient   Std. Error   t-Statistic   Prob.
C                     658.55         7.68         85.70    0.00
STR                    -0.73         0.23         -3.18    0.00
EL_PCT                 -0.18         0.03         -5.52    0.00
LOG(AVGINC)            11.57         1.74          6.65    0.00
MEAL_PCT               -0.40         0.02        -13.09    0.00

R-squared               0.80   Mean dependent var        654.16
Adjusted R-squared      0.79   S.D. dependent var         19.05
S.E. of regression      8.64   Akaike info criterion       7.16
Sum squared resid   30998.01   Schwarz criterion           7.21
Log likelihood      -1499.25   F-statistic               405.36
Durbin-Watson stat      1.52   Prob(F-statistic)           0.00
Model 2:
Dependent Variable: TESTSCR
Method: Least Squares
Date: 07/31/06 Time: 17:56
Sample: 1 420
Included observations: 420
Variable         Coefficient   Std. Error   t-Statistic   Prob.
C                     620.92         7.27         85.41    0.00
STR                    -0.66         0.25         -2.58    0.01
EL_PCT                 -0.39         0.03        -14.05    0.00
LOG(AVGINC)            21.87         1.52         14.41    0.00
CALW_PCT               -0.41         0.05         -8.22    0.00

R-squared               0.75   Mean dependent var        654.16
Adjusted R-squared      0.75   S.D. dependent var         19.05
S.E. of regression      9.53   Akaike info criterion       7.36
Sum squared resid   37659.29   Schwarz criterion           7.41
Log likelihood      -1540.13   F-statistic               315.31
Durbin-Watson stat      1.41   Prob(F-statistic)           0.00
Explain why you cannot use the F-test in this situation to discriminate between Model 1 and Model 2.
Answer: Neither model is contained (“nested”) in the other, in the sense that you cannot place restrictions on
Model 1 to obtain Model 2 (and vice versa). Hence there is no unrestricted and restricted model in this case.
18) Your textbook has emphasized that testing two hypotheses sequentially is not the same as testing them
simultaneously. Consider the confidence set below, where you are testing the hypothesis \(H_0: \beta_5 = 0, \beta_6 = 0\).
Your statistical package has also generated a dotted area, which corresponds to drawing two confidence
intervals for the respective coefficients. For each case where the ellipse does not coincide in area with the
corresponding rectangle, indicate what your decision would be if you relied on the two confidence intervals vs.
the ellipse generated by the F-statistic.
Answer: The following possible outcomes can be seen in the figure above: (i) both F-statistic and the two
confidence intervals generate the same result; (ii) you do not reject the null hypothesis using the
F-statistic, but you do so by using the confidence intervals (these are the points in the area at the “tip” of
the ellipse); (iii) you reject the null hypothesis using the confidence intervals but not the F-statistic.
19) You have estimated the following regression to explain hourly wages, using a sample of 250 individuals:
AHE_i = -2.44 - 1.57 × DFemme + 0.27 × DMarried + 0.59 × Educ + 0.04 × Exper - 0.60 × DNonwhite
        (1.29) (0.33)          (0.36)            (0.09)        (0.01)         (0.49)
        + 0.13 × NCentral - 0.11 × South
         (0.59)            (0.58)
R2 = 0.36, SER = 2.74, n = 250
Numbers in parentheses are heteroskedasticity-robust standard errors. Add “*” (5%) and “**” (1%) to indicate
statistical significance of the coefficients.
Answer:
AHE_i = -2.44 - 1.57** × DFemme + 0.27 × DMarried + 0.59** × Educ + 0.04** × Exper - 0.60 × DNonwhite
        (1.29) (0.33)            (0.36)            (0.09)          (0.01)           (0.49)
        + 0.13 × NCentral - 0.11 × South
         (0.59)            (0.58)
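The significance markers follow from dividing each coefficient by its standard error and comparing the absolute t-statistic with the large-sample normal critical values 1.96 (5%) and 2.58 (1%). A minimal sketch, with the helper name `stars` chosen here for illustration:

```python
def stars(coef, se, crit_5=1.96, crit_1=2.58):
    """Return '**', '*', or '' from a two-sided test using normal critical values."""
    t = abs(coef / se)
    if t > crit_1:
        return "**"
    if t > crit_5:
        return "*"
    return ""


# (coefficient, robust standard error) pairs as reported in the question.
estimates = {
    "Intercept": (-2.44, 1.29),
    "DFemme": (-1.57, 0.33),
    "DMarried": (0.27, 0.36),
    "Educ": (0.59, 0.09),
    "Exper": (0.04, 0.01),
    "DNonwhite": (-0.60, 0.49),
    "NCentral": (0.13, 0.59),
    "South": (-0.11, 0.58),
}

for name, (coef, se) in estimates.items():
    print(f"{name:10s} t = {coef / se:6.2f} {stars(coef, se)}")
```

With the reported (rounded) values, Educ and Exper clear the 1% threshold, and so does DFemme, whose t-statistic is about -4.8; the remaining coefficients, including the intercept, are not significant at the 5% level.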
20) You have estimated the following regression to explain hourly wages, using a sample of 250 individuals:
AHE = -2.44 - 1.57 × DFemme + 0.27 × DMarried + 0.59 × Educ + 0.04 × Exper - 0.60 × DNonwhite
      (1.29) (0.33)          (0.36)            (0.09)        (0.01)         (0.49)
      + 0.13 × NCentral - 0.11 × South
       (0.59)            (0.58)
R2 = 0.36, SER = 2.74, n = 250
Test the null hypothesis that the coefficients on DMarried, DNonwhite, and the two regional variables, NCentral
and South, are zero. The F-statistic for the null hypothesis \(\beta_{married} = \beta_{nonwhite} = \beta_{ncentral} = \beta_{south} = 0\)
is 0.61. Do you reject the null hypothesis?
Answer: The critical value from the \(F_{4,\infty}\) distribution is 3.32 at the 1% significance level. Since 0.61 < 3.32, you cannot reject the null hypothesis.
21) Using the California School data set from your textbook, you decide to run a regression of the average reading
score (ScrRead) on the average mathematics score (ScrMaths). The result is as follows, where the numbers in
parentheses are homoskedasticity-only standard errors:
ScrRead = 8.47 + 0.9895 × ScrMaths
         (13.20) (0.0202)
N = 420, R2 = 0.85, SER = 7.8
You believe that the average mathematics score is an unbiased predictor of the average reading score. Consider
the above regression to be the unrestricted regression, from which you would calculate \(SSR_{unrestricted}\). How would you
find \(SSR_{restricted}\)? How many restrictions would you have to impose?
Answer: Since the restricted regression would read ScrRead = 0 + 1 × ScrMaths, you would need to calculate
\(SSR_{restricted} = \sum_{i=1}^{n} (ScrRead_i - ScrMaths_i)^2\). Using the F-test to simultaneously test for a zero intercept coefficient and a
unit slope coefficient, you would have to impose two restrictions (q = 2).
22) Looking at formula (7.13) in your textbook for the homoskedasticity-only F-statistic,
\(F = \frac{(SSR_{restricted} - SSR_{unrestricted})/q}{SSR_{unrestricted}/(n - k_{unrestricted} - 1)}\)
give three conditions under which, ceteris paribus, you would find a large value, and hence would be
likely to reject the null hypothesis.
Answer: The F-statistic will be larger for (i) large percentage changes in the SSR between the restricted and the
unrestricted regression; (ii) smaller number of restrictions (q); (iii) larger sample size (large number of
degrees of freedom).
23) Analyzing a regression using data from a sub-sample of the Current Population Survey with about 4,000
observations, you realize that the regression \(R^2\) and the adjusted \(R^2\), \(\bar{R}^2\), are almost identical. Why is that the
case? In your textbook, you were told that the regression \(R^2\) will almost always increase when you add an
explanatory variable, but that the adjusted measure does not have to increase with such an addition. Can this
still be true?
Answer: The difference between the two measures is the adjustment by the degrees of freedom:
\(\bar{R}^2 = 1 - \frac{n-1}{n-k-1} \frac{SSR}{TSS}\). Once the number of observations becomes very large relative to the
number of explanatory variables, (n-1) is roughly the same as (n-k-1), so the adjustment factor is close to
one. As a result, the adjusted measure will also almost always increase with the addition of another explanatory variable.
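A quick numerical sketch makes the point about the degrees-of-freedom adjustment concrete. The values R2 = 0.36 and k = 7 are hypothetical inputs chosen for illustration; only the sample size of about 4,000 comes from the question.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared: 1 - (n - 1)/(n - k - 1) * (1 - R^2)."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)


r2 = 0.36  # hypothetical regression R-squared
k = 7      # hypothetical number of regressors (excluding the constant)

for n in (30, 4000):
    factor = (n - 1) / (n - k - 1)
    print(f"n = {n:4d}: adjustment factor = {factor:.4f}, "
          f"adjusted R2 = {adjusted_r2(r2, n, k):.4f}")
```

With 30 observations the adjustment factor is well above one and the adjusted measure falls noticeably below the unadjusted one; with 4,000 observations the factor is so close to one that the two measures are nearly indistinguishable.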
Chapter 8 Nonlinear Regression Functions
8.1 Multiple Choice
1) In nonlinear models, the expected change in the dependent variable for a change in one of the explanatory
variables is given by
A) \(\Delta Y = f(X_1 + X_1, X_2, ..., X_k)\).
B) \(\Delta Y = f(X_1 + \Delta X_1, X_2 + \Delta X_2, ..., X_k + \Delta X_k) - f(X_1, X_2, ..., X_k)\).
C) \(\Delta Y = f(X_1 + \Delta X_1, X_2, ..., X_k) - f(X_1, X_2, ..., X_k)\).
D) \(\Delta Y = f(X_1 + X_1, X_2, ..., X_k) - f(X_1, X_2, ..., X_k)\).
Answer: C
2) The interpretation of the slope coefficient in the model \(Y_i = \beta_0 + \beta_1 \ln(X_i) + u_i\) is as follows:
A) a 1% change in X is associated with a \(\beta_1\)% change in Y.
B) a 1% change in X is associated with a change in Y of \(0.01 \beta_1\).
C) a change in X by one unit is associated with a \(\beta_1 \times 100\)% change in Y.
D) a change in X by one unit is associated with a \(\beta_1\) change in Y.
Answer: B
3) The interpretation of the slope coefficient in the model \(\ln(Y_i) = \beta_0 + \beta_1 X_i + u_i\) is as follows:
A) a 1% change in X is associated with a \(\beta_1\)% change in Y.
B) a change in X by one unit is associated with a \(100 \beta_1\)% change in Y.
C) a 1% change in X is associated with a change in Y of \(0.01 \beta_1\).
D) a change in X by one unit is associated with a \(\beta_1\) change in Y.
Answer: B
4) The interpretation of the slope coefficient in the model \(\ln(Y_i) = \beta_0 + \beta_1 \ln(X_i) + u_i\) is as follows:
A) a 1% change in X is associated with a \(\beta_1\)% change in Y.
B) a change in X by one unit is associated with a \(\beta_1\) change in Y.
C) a change in X by one unit is associated with a \(100 \beta_1\)% change in Y.
D) a 1% change in X is associated with a change in Y of \(0.01 \beta_1\).
Answer: A
5) In the case of regression with interactions, the coefficient of a binary variable should be interpreted as follows:
A) there are really problems in interpreting these, since the ln(0) is not defined.
B) for the case of interacted regressors, the binary variable coefficient represents the various intercepts for
the case when the binary variable equals one.
C) first set all explanatory variables to one, with the exception of the binary variables. Then allow for each of
the binary variables to take on the value of one sequentially. The resulting predicted value indicates the
effect of the binary variable.
D) first compute the expected values of Y for each possible case described by the set of binary variables. Next
compare these expected values. Each coefficient can then be expressed either as an expected value or as
the difference between two or more expected values.
Answer: D
6) The following interactions between binary and continuous variables are possible, with the exception of
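A) \(Y_i = \beta_0 + \beta_1 X_i + \beta_2 D_i + \beta_3 (X_i \times D_i) + u_i\).
B) \(Y_i = \beta_0 + \beta_1 X_i + \beta_2 (X_i \times D_i) + u_i\).
C) \(Y_i = (\beta_0 + D_i) + \beta_1 X_i + u_i\).
D) \(Y_i = \beta_0 + \beta_1 X_i + \beta_2 D_i + u_i\).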
Answer: C
7) An example of the interaction term between two independent, continuous variables is
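A) \(Y_i = \beta_0 + \beta_1 X_i + \beta_2 D_i + \beta_3 (X_i \times D_i) + u_i\).
B) \(Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i\).
C) \(Y_i = \beta_0 + \beta_1 D_{1i} + \beta_2 D_{2i} + \beta_3 (D_{1i} \times D_{2i}) + u_i\).
D) \(Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 (X_{1i} \times X_{2i}) + u_i\).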
Answer: D
8) Including an interaction term between two independent variables, X1 and X2 , allows for the following except:
A) the interaction term lets the effect on Y of a change in X1 depend on the value of X2 .
B) the interaction term coefficient is the effect of a unit increase in X1 and X2 above and beyond the sum of
the individual effects of a unit increase in the two variables alone.
C) the interaction term coefficient is the effect of a unit increase in (X1 × X2 ).
D) the interaction term lets the effect on Y of a change in X2 depend on the value of X1 .
Answer: C
9) A nonlinear function
A) makes little sense, because variables in the real world are related linearly.
B) can be adequately described by a straight line between the dependent variable and one of the explanatory
variables.
C) is a concept that only applies to the case of a single or two explanatory variables since you cannot draw a
line in four dimensions.
D) is a function with a slope that is not constant.
Answer: D
10) An example of a quadratic regression model is
A) \(Y_i = \beta_0 + \beta_1 X + \beta_2 Y^2 + u_i\).
B) \(Y_i = \beta_0 + \beta_1 \ln(X) + u_i\).
C) \(Y_i = \beta_0 + \beta_1 X + \beta_2 X^2 + u_i\).
D) \(Y_i^2 = \beta_0 + \beta_1 X + u_i\).
Answer: C
11) (Requires Calculus) In the equation \(TestScore = 607.3 + 3.85 \, Income - 0.0423 \, Income^2\), the following income level
results in the maximum test score
A) 607.3.
B) 91.02.
C) 45.50.
D) cannot be determined without a plot of the data.
Answer: C
12) To decide whether \(Y_i = \beta_0 + \beta_1 X + u_i\) or \(\ln(Y_i) = \beta_0 + \beta_1 X + u_i\) fits the data better, you cannot consult the
regression \(R^2\) because
A) ln(Y) may be negative for 0<Y<1.
B) the TSS are not measured in the same units between the two models.
C) the slope no longer indicates the effect of a unit change of X on Y in the log-linear model.
D) the regression R2 can be greater than one in the second model.
Answer: B
13) You have estimated the following equation:
\(TestScore = 607.3 + 3.85 \, Income - 0.0423 \, Income^2\),
where TestScore is the average of the reading and math scores on the Stanford 9 standardized test administered
to 5th grade students in 420 California school districts in 1998 and 1999. Income is the average annual per capita
income in the school district, measured in thousands of 1998 dollars. The equation
A) suggests a positive relationship between test scores and income for most of the sample.
B) is positive until a value of Income of 610.81.
C) does not make much sense since the square of income is entered.
D) suggests a positive relationship between test scores and income for all of the sample.
Answer: A
14) A polynomial regression model is specified as:
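A) \(Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \cdots + \beta_r X_i^r + u_i\).
B) \(Y_i = \beta_0 + \beta_1 X_i + \beta_1^2 X_i + \cdots + \beta_1^r X_i + u_i\).
C) \(Y_i = \beta_0 + \beta_1 X_i + \beta_2 Y_i^2 + \cdots + \beta_r Y_i^r + u_i\).
D) \(Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 (X_{1i} \times X_{2i}) + u_i\).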
Answer: A
15) For the polynomial regression model,
A) you need new estimation techniques since the OLS assumptions do not apply any longer.
B) the techniques for estimation and inference developed for multiple regression can be applied.
C) you can still use OLS estimation techniques, but the t-statistics do not have an asymptotic normal
distribution.
D) the critical values from the normal distribution have to be changed to \(1.96^2\), \(1.96^3\), etc.
Answer: B
16) To test whether or not the population regression function is linear rather than a polynomial of order r,
A) check whether the regression R2 for the polynomial regression is higher than that of the linear regression.
B) compare the TSS from both regressions.
C) look at the pattern of the coefficients: if they change from positive to negative to positive, etc., then the
polynomial regression should be used.
D) use the test of (r-1) restrictions using the F-statistic.
Answer: D
17) The best way to interpret polynomial regressions is to
A) take a derivative of Y with respect to the relevant X.
B) plot the estimated regression function and to calculate the estimated effect on Y associated with a change
in X for one or more values of X.
C) look at the t-statistics for the relevant coefficients.
D) analyze the standard error of estimated effect.
Answer: B
18) The exponential function
A) is the inverse of the natural logarithm function.
B) does not play an important role in modeling nonlinear regression functions in econometrics.
C) can be written as \(\exp(e^x)\).
D) is \(e^x\), where e is 3.1415….
Answer: A
19) The following are properties of the logarithm function with the exception of
A) ln(1/ x) = -ln(x).
B) ln(a + x) = ln(a) + ln(x).
C) ln(ax) = ln(a) + ln(x).
D) \(\ln(x^a) = a \ln(x)\).
Answer: B
20) The binary variable interaction regression
A) can only be applied when there are two binary variables, but not three or more.
B) is the same as testing for differences in means.
C) cannot be used with logarithmic regression functions because ln(0) is not defined.
D) allows the effect of changing one of the binary independent variables to depend on the value of the other
binary variable.
Answer: D
21) In the regression model \(Y_i = \beta_0 + \beta_1 X_i + \beta_2 D_i + \beta_3 (X_i \times D_i) + u_i\), where X is a continuous variable and D is a
binary variable, \(\beta_3\)
A) indicates the slope of the regression when D=1.
B) has a standard error that is not normally distributed even in large samples since D is not a normally
distributed variable.
C) indicates the difference in the slopes of the two regressions.
D) has no meaning since (Xi × Di) = 0 when Di = 0.
Answer: C
22) In the regression model \(Y_i = \beta_0 + \beta_1 X_i + \beta_2 D_i + \beta_3 (X_i \times D_i) + u_i\), where X is a continuous variable and D is a
binary variable, \(\beta_2\)
A) is the difference in means in Y between the two categories.
B) indicates the difference in the intercepts of the two regressions.
C) is usually positive.
D) indicates the difference in the slopes of the two regressions.
Answer: B
23) In the regression model \(Y_i = \beta_0 + \beta_1 X_i + \beta_2 D_i + \beta_3 (X_i \times D_i) + u_i\), where X is a continuous variable and D is a
binary variable, to test that the two regressions are identical, you must use the
A) t-statistic separately for \(\beta_2 = 0\), \(\beta_3 = 0\).
B) F-statistic for the joint hypothesis that \(\beta_0 = 0\), \(\beta_1 = 0\).
C) t-statistic separately for \(\beta_3 = 0\).
D) F-statistic for the joint hypothesis that \(\beta_2 = 0\), \(\beta_3 = 0\).
Answer: D
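24) In the model \(Y_i = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 (X_1 \times X_2) + u_i\), the expected effect \(\frac{\Delta Y}{\Delta X_1}\) is
A) \(\beta_1 + \beta_3 X_2\).
B) \(\beta_1\).
C) \(\beta_1 + \beta_3\).
D) \(\beta_1 + \beta_3 X_1\).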
Answer: A
25) In the log-log model, the slope coefficient indicates
A) the effect that a unit change in X has on Y.
B) the elasticity of Y with respect to X.
C) \(\Delta Y / \Delta X\).
D) \(\frac{\Delta Y}{\Delta X} \times \frac{Y}{X}\).
Answer: B
26) In the model \(\ln(Y_i) = \beta_0 + \beta_1 X_i + u_i\), the elasticity of E(Y|X) with respect to X is
A) \(\beta_1 X\)
B) \(\beta_1\)
C) \(\frac{\beta_1 X}{\beta_0 + \beta_1 X}\)
D) cannot be calculated because the function is non-linear
Answer: A
27) Assume that you had estimated the following quadratic regression model
\(TestScore = 607.3 + 3.85 \, Income - 0.0423 \, Income^2\). If income increased from 10 to 11 ($10,000 to $11,000), then the
predicted effect on test scores would be
A) 3.85
B) 3.85-0.0423
C) Cannot be calculated because the function is non-linear
D) 2.96
Answer: D
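The predicted effect in a quadratic specification is the difference between the fitted values at the two income levels. The sketch below evaluates that difference and, as a side calculation, the income level at which the fitted parabola peaks; the function name `predicted_score` is an illustrative choice.

```python
def predicted_score(income):
    """Fitted quadratic: 607.3 + 3.85*Income - 0.0423*Income^2 (income in $1,000s)."""
    return 607.3 + 3.85 * income - 0.0423 * income ** 2


# Effect of moving from an income of 10 to 11 (thousand dollars).
effect = predicted_score(11) - predicted_score(10)

# Setting the derivative 3.85 - 2*0.0423*Income to zero gives the peak.
peak_income = 3.85 / (2 * 0.0423)

print(f"Predicted change in test score: {effect:.2f}")
print(f"Income at which the fitted curve peaks: {peak_income:.1f}")
```

The change works out to 3.85 - 0.0423 × (121 - 100) = 2.96, and the curve peaks at an income of roughly 45.5, which is why the fitted relationship is positive for most of the sample.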
28) Consider the polynomial regression model of degree r, \(Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + ... + \beta_r X_i^r + u_i\). The
null hypothesis that the regression is linear, against the alternative that it is a polynomial of degree r, corresponds to
A) \(H_0: \beta_r = 0\) vs. \(\beta_r \neq 0\)
B) \(H_0: \beta_r = 0\) vs. \(\beta_1 \neq 0\)
C) \(H_0: \beta_3 = 0, ..., \beta_r = 0\) vs. \(H_1\): all \(\beta_j \neq 0\), j = 3, ..., r
D) \(H_0: \beta_2 = 0, \beta_3 = 0, ..., \beta_r = 0\) vs. \(H_1\): at least one \(\beta_j \neq 0\), j = 2, ..., r
Answer: D
29) Consider the following least squares specification between test scores and income:
TestScore = 557.8 + 36.42 ln(Income). According to this equation, a 1% increase in income is associated with an
increase in test scores of
A) 0.36 points
B) 36.42 points
C) 557.8 points
D) cannot be determined from the information given here
Answer: A
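In a linear-log specification, the rule of thumb is that a 1% change in X changes Y by 0.01 times the slope. The sketch below compares that approximation with the exact change implied by the logarithm; the variable names are illustrative.

```python
import math

slope = 36.42  # coefficient on ln(Income) in the estimated equation

# Rule of thumb: a 1% increase in Income changes TestScore by 0.01 * slope.
approximate_change = 0.01 * slope

# Exact change: slope * [ln(1.01 * Income) - ln(Income)] = slope * ln(1.01).
exact_change = slope * math.log(1.01)

print(f"approximate: {approximate_change:.4f}, exact: {exact_change:.4f}")
```

Both values round to 0.36 points, and neither depends on the starting income level, because the logarithm turns proportional changes in income into constant additive changes in the predicted score.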
30) Consider the population regression of log earnings [\(Y_i\), where \(Y_i = \ln(Earnings_i)\)] against two binary variables:
whether a worker is married (\(D_{1i}\), where \(D_{1i} = 1\) if the ith person is married) and the worker’s gender (\(D_{2i}\),
where \(D_{2i} = 1\) if the ith person is female), and the product of the two binary variables:
\(Y_i = \beta_0 + \beta_1 D_{1i} + \beta_2 D_{2i} + \beta_3 (D_{1i} \times D_{2i}) + u_i\). The interaction term
A) allows the population effect on log earnings of being married to depend on gender
B) does not make sense since it could be zero for married males
C) indicates the effect of being married on log earnings
D) cannot be estimated without the presence of a continuous variable
Answer: A
8.2 Essays and Longer Questions
1) Females, it is said, make 70 cents to the dollar in the United States. To investigate this phenomenon, you collect
data on weekly earnings from 1,744 individuals, 850 females and 894 males. Next, you calculate their average
weekly earnings and find that the females in your sample earned $346.98, while the males made $517.70.
(a) Calculate the female earnings in percent of the male earnings. How would you test whether or not this
difference is statistically significant? Give two approaches.
(b) A peer suggests that this is consistent with the idea that there is discrimination against females in the labor
market. What is your response?
(c) You recall from your textbook that additional years of experience are supposed to result in higher earnings.
You reason that this is because experience is related to “on the job training.” One frequently used measure for
(potential) experience is “Age-Education-6.” Explain the underlying rationale. Assuming, heroically, that
education is constant across the 1,744 individuals, you consider regressing earnings on age and a binary
variable for gender. You estimate two specifications initially:
Earn = 323.70 + 5.15 × Age – 169.78 × Female, R2 = 0.13, SER = 274.75
       (21.18) (0.55)       (13.06)
Ln(Earn) = 5.44 + 0.015 × Age – 0.421 × Female, R2 = 0.17, SER = 0.75
          (0.08) (0.002)       (0.036)
where Earn are weekly earnings in dollars, Age is measured in years, and Female is a binary variable, which
takes on the value of one if the individual is a female and is zero otherwise. Interpret each regression carefully.
For a given age, how much less do females earn on average? Should you choose the second specification on
grounds of the higher regression R2 ?
(d) Your peer points out to you that age-earning profiles typically take on an inverted U-shape. To test this
idea, you add the square of age to your log-linear regression.
Ln(Earn) = 3.04 + 0.147 × Age – 0.421 × Female – 0.0016 × Age^2,
          (0.18) (0.009)       (0.033)          (0.0001)
R2 = 0.28, SER = 0.68
Interpret the results again. Are there strong reasons to assume that this specification is superior to the previous
one? Why is the increase of the Age coefficient so large relative to its value in (c)?
(e) What other factors may play a role in earnings determination?
Answer: (a) Female earnings are at 67 percent of male earnings. One way to test for statistical significance is
the difference-in-means test described in section 3.4 of the text; the t-statistic for the comparison of two
means is given in equation (3.20). The alternative is to run a regression of earnings on a constant and a
binary variable, which takes on the value of one for females and is zero otherwise. Using a t-test on the
slope of the binary variable amounts to the same test as the difference in means (section 4.7 in the text).
(b) Differences in attributes of the individuals, such as education, ability, and tenure with an employer,
have not been taken into account. Hence, in itself, this is weak evidence, at best, for discrimination.
(c) The potential experience variable is a reasonable proxy for “on the job training” if the individual
started to work after completing her or his education, and stayed employed thereafter. Hence this is a
better proxy for some than for others.
The linear specification suggests that for every additional year of age the individual receives $5.15 of additional
weekly earnings on average. Females make $169.78 less than males at a given age. There is no data close
to the origin, so the intercept should not be interpreted. The regression explains 13 percent of the
variation in earnings.
The log-linear specification says that earnings increase by 1.5 percent for every additional year in an
individual’s life. Females earn approximately 42.1 percent less than males at a given age. Again, the
intercept should not be interpreted. The regression explains 17 percent of the variation in the log of
earnings. You should not prefer this specification over the linear one on grounds of the higher regression
R2 since these cannot be compared as a result of the difference in the units of measurement of the
dependent variable.
(d) The coefficient on the added variable is statistically significant and has resulted in a substantial
increase in the regression \(R^2\). The increase in the Age coefficient is due to the fact that earnings increase
more initially than later in life or, mathematically speaking, it compensates for the negative coefficient on
\(Age^2\), which lowers earnings as individuals become older.
(e) Students’ answers will differ, but education, ability, regional differences, race, and professional
choice are often mentioned.
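Several of the numbers in this answer can be reproduced directly from the reported estimates. The sketch below computes the female-to-male earnings ratio from part (a) and two quantities the answer does not state: the exact percentage gap implied by the Female coefficient in the log specification (using the standard conversion exp(b) - 1), and the age at which the quadratic age-earnings profile in (d) peaks. Variable names are illustrative.

```python
import math

# (a) Ratio of average weekly earnings, females to males.
earnings_ratio = 346.98 / 517.70

# (c) The coefficient -0.421 is a log-point difference; the exact
# percentage difference it implies is exp(-0.421) - 1.
female_coef = -0.421
exact_gap = math.exp(female_coef) - 1

# (d) Peak of the profile 0.147*Age - 0.0016*Age^2: derivative equals zero.
peak_age = 0.147 / (2 * 0.0016)

print(f"Female/male earnings ratio: {earnings_ratio:.2f}")
print(f"Exact gap implied by the log specification: {exact_gap:.1%}")
print(f"Age at which predicted log earnings peak: {peak_age:.1f}")
```

The "42.1 percent" reading treats the log-point difference as a percentage, which is only an approximation for a coefficient this large; the exact conversion gives a gap of about 34 percent. The profile peaks at roughly age 46, consistent with the inverted U-shape mentioned in (d).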
2) An extension of the Solow growth model that includes human capital in addition to physical capital, suggests
that investment in human capital (education) will increase the wealth of a nation (per capita income). To test
this hypothesis, you collect data for 104 countries and perform the following regression:
RelPersInc = 0.046 – 5.869 × gpop + 0.738 × SK + 0.055 × Educ, R2 = 0.775, SER = 0.1377
            (0.079) (2.238)        (0.294)      (0.010)
where RelPersInc is GDP per worker relative to the United States, gpop is the average population growth rate,
1980 to 1990, SK is the average investment share of GDP from 1960 to 1990, and Educ is the average educational
attainment in years for 1985. Numbers in parentheses are heteroskedasticity-robust standard errors.
(a) Interpret the results and indicate whether or not the coefficients are significantly different from zero. Do the
coefficients have the expected sign?
(b) To test for equality of the coefficients between the OECD and other countries, you introduce a binary
variable (DOECD), which takes on the value of one for the OECD countries and is zero otherwise. To conduct
the test for equality of the coefficients, you estimate the following regression:
RelPersInc = -0.068 – 0.063 × gpop + 0.719 × SK + 0.044 × Educ
             (0.072) (2.271)        (0.365)      (0.012)
             + 0.381 × DOECD – 8.038 × (DOECD × gpop) – 0.430 × (DOECD × SK)
              (0.184)         (5.366)                  (0.768)
             + 0.003 × (DOECD × Educ), R2 = 0.845, SER = 0.116
              (0.018)
Write down the two regression functions, one for the OECD countries, the other for the non-OECD countries.
The F- statistic that all coefficients involving DOECD are zero, is 6.76. Find the corresponding critical value
from the F table and decide whether or not the coefficients are equal across the two sets of countries.
(c) Given your answer in the previous question, you want to investigate further. You first force the same slopes
across all countries, but allow the intercept to differ. That is, you reestimate the above regression but set
DOECD×gpop = DOECD×SK = DOECD×Educ = 0. The t-statistic for DOECD is 4.39. Is the coefficient, which
was 0.241, statistically significant?
(d) Your final regression allows the slopes to differ in addition to the intercept. The F-statistic for
DOECD×gpop = DOECD×SK = DOECD×Educ = 0 is 1.05. What is your decision? Each one of the t-statistics
is also smaller than the critical value from the standard normal table. Which test should you use?
(e) Looking at the tests in the two previous questions, what is your conclusion?
Answer: (a) A one percentage point decrease in the population growth rate increases GDP per worker relative to
the United States by roughly 0.06. An increase in the investment share of 0.1 results in an increase of
GDP per worker relative to the United States by approximately 0.07. For every additional year of
average educational attainment, the increase is 0.055. The intercept should not be interpreted. The
regression explains 77.5 percent of the variation in relative productivity. All coefficients are significantly
different from zero at conventional levels. All coefficients carry the expected sign.
(b) The regression for the non-OECD countries is
RelPersInc = -0.068 – 0.063 × gpop + 0.719 × SK + 0.044 × Educ.
For the OECD countries we get
RelPersInc = 0.313 – 8.101 × gpop + 0.289 × SK + 0.047 × Educ.
There are four restrictions, so the critical value from the \(F_{4,\infty}\) distribution is 3.32 at the 1% level. Since
6.76 > 3.32, you can reject the null hypothesis that the coefficients are equal.
(c) The t-statistic of 4.39 exceeds the 1% critical value of 2.58, so the coefficient is statistically significant,
that is, you can reject the null hypothesis that the coefficient on DOECD is zero.
(d) Given the critical value of 3.78 at the 1% level, you cannot reject the null hypothesis that the
additional coefficients are all zero. The F-test is the proper procedure to use when testing for
simultaneous restrictions.
(e) There is evidence that the slopes can be set equal. However, there seems to be a level difference
between the two groups of countries.
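The OECD regression function in part (b) is obtained by adding each interaction coefficient to the corresponding baseline coefficient. A minimal sketch of that bookkeeping, with dictionary names chosen for illustration:

```python
# Baseline (non-OECD) coefficients from the interacted regression.
non_oecd = {"const": -0.068, "gpop": -0.063, "SK": 0.719, "Educ": 0.044}

# Coefficients on DOECD and on its interactions with each regressor.
oecd_shift = {"const": 0.381, "gpop": -8.038, "SK": -0.430, "Educ": 0.003}

# Setting DOECD = 1 adds each shift to the matching baseline coefficient.
oecd = {name: round(non_oecd[name] + oecd_shift[name], 3) for name in non_oecd}

for name in non_oecd:
    print(f"{name:5s} non-OECD: {non_oecd[name]:7.3f}   OECD: {oecd[name]:7.3f}")
```

The sums reproduce the OECD equation in the answer: an intercept of 0.313 and slopes of -8.101, 0.289, and 0.047.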
3) You have been asked by your younger sister to help her with a science fair project. During the previous years
she already studied why objects float and there also was the inevitable volcano project. Having learned
regression techniques recently, you suggest that she investigate the weight-height relationship of 4th to 6th
graders. Her presentation topic will be to explain how people at carnivals predict weight. You collect data for
roughly 100 boys and girls between the ages of nine and twelve and estimate for her the following relationship:
Weight = 45.59 + 4.32 × Height4, R2 = 0.55, SER = 15.69
         (3.81) (0.46)
where Weight is in pounds, and Height4 is inches above 4 feet.
(a) Interpret the results.
(b) You remember from the medical literature that females in the adult population are, on average, shorter than
males and weigh less. You also seem to have heard that females, controlling for height, are supposed to weigh
less than males. To see if this relationship holds for children, you add a binary variable ( DFY) that takes on the
value one for girls and is zero otherwise. You estimate the following regression function:
Weight = 36.27 + 17.33 × DFY + 5.32 × Height4 – 1.83 × (DFY × Height4),
         (5.99)  (7.36)       (0.80)           (0.90)
R2 = 0.58, SER = 15.41
Are the signs on the new coefficients as expected? Are the new coefficients individually statistically significant?
Write down and sketch the regression function for boys and girls separately.
(c) The medical literature provides you with the following information for median height and weight of nine to twelve-year-olds:
Median Height and Weight for Children, Age 9-12
               Boys Weight   Boys Height   Girls Weight   Girls Height
9-year-old              60            52             60             49
10-year-old             70            54             70             52
11-year-old             77            56             80             57
12-year-old             87          58.5             92             60
Insert two height/weight measures each for boys and girls and see how accurate your predictions are.
(d) The F-statistic for testing that the intercept and slope for boys and girls are identical is 2.92. Find the critical
values at the 5% and 1% level, and make a decision. Allowing for a different intercept with an identical slope
results in a t-statistic for DFY of (–0.35). Having identical intercepts but different slopes gives a t-statistic on
(DFY × Height4) of (–0.35) also. Does this affect your previous conclusion?
(e) Assume that you also wanted to test if the relationship changes by age. Briefly outline how you would
specify the regression including the gender binary variable and an age binary variable ( Older) that takes on a
value of one for eleven to twelve year olds and is zero otherwise. Indicate in a table of two rows and two
columns how the estimated relationship would vary between younger girls, older girls, younger boys, and
older boys.
Answer: (a) For every inch above 4 feet, children of that age group weigh roughly 4 pounds more. A student who is 4 feet
tall weighs approximately 45.6 pounds. The regression explains 55 percent of the weight variation in
children of that age group.
(b) Shorter girls weigh more than boys, and taller boys weigh more than girls on average. Given your
prior expectations, this is somewhat unexpected. The coefficients involving the binary variable are
statistically significant at the 5% level. The regression for boys is
Weight = 36.27 + 5.32 × Height4.
For girls it is
Weight = 53.60 + 3.49 × Height4.
(c) The “XX” points mark a female, and the “XY” a male. The regression line predicts a 9-year-old boy
to weigh 57.6 pounds, an 11-year-old boy to weigh 78.8 pounds, a 10-year-old girl to weigh 67.6 pounds, and a
12-year-old girl to weigh 95.5 pounds. Hence the predicted weights are quite close to the medians.
(d) The critical value is 3.00 at the 5% level, and 4.61 at the 1% level. Since 2.92 is below both, you cannot reject
equality of the intercepts and slopes. The previous conclusion is unaffected since the F-test was for both hypotheses
to hold simultaneously. The t-statistics indicate that imposing one equality and testing whether the remaining
coefficient (either the slope or the intercept) differs between boys and girls does not yield a statistically
significant difference either.
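(e) $Weight = \beta_0 + \beta_1 DFY + \beta_2 Height4 + \beta_3 (DFY \times Height4) + \beta_4 Older + \beta_5 (Older \times Height4) + u$

           Boys                                                                  Girls
Younger    $\hat\beta_0 + \hat\beta_2 Height4$                                   $(\hat\beta_0 + \hat\beta_1) + (\hat\beta_2 + \hat\beta_3) Height4$
Older      $(\hat\beta_0 + \hat\beta_4) + (\hat\beta_2 + \hat\beta_5) Height4$   $(\hat\beta_0 + \hat\beta_1 + \hat\beta_4) + (\hat\beta_2 + \hat\beta_3 + \hat\beta_5) Height4$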
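The critical values quoted in part (d) can be checked without a table. The following is a minimal sketch, assuming the large-sample F distribution with 2 and infinitely many degrees of freedom (a chi-squared variable with 2 degrees of freedom divided by 2), whose upper-tail probability at x is exp(-x). The function name is illustrative and not part of the textbook.

```python
import math

def f2_inf_critical_value(alpha: float) -> float:
    """Critical value of F(2, infinity) at significance level alpha.

    F(2, inf) is a chi-squared(2) variable divided by 2, whose upper-tail
    probability at x equals exp(-x), so the critical value is -ln(alpha).
    """
    return -math.log(alpha)

f_stat = 2.92  # joint test of equal intercept and slope, from part (d)
for alpha in (0.05, 0.01):
    crit = f2_inf_critical_value(alpha)
    decision = "reject" if f_stat > crit else "cannot reject"
    print(f"alpha={alpha}: critical value {crit:.2f}, {decision} H0")
```

The closed form only works for two restrictions; for other numbers of restrictions a statistical table or library would be needed.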
4) You have learned that earnings functions are one of the most investigated relationships in economics. These
typically relate the logarithm of earnings to a series of explanatory variables such as education, work
experience, gender, race, etc.
(a) Why do you think that researchers have preferred a log-linear specification over a linear specification? In
addition to the interpretation of the slope coefficients, also think about the distribution of the error term.
(b) To establish age-earnings profiles, you regress ln(Earn) on Age, where Earn is weekly earnings in dollars,
and Age is in years. Plotting the residuals of the regression against age for 1,744 individuals looks as shown in
the figure:
Do you sense a problem?
(c) You decide, given your knowledge of age-earning profiles, to allow the regression line to differ for the
below and above 40 years age category. Accordingly you create a binary variable, Dage, that takes the value
one for age 39 and below, and is zero otherwise. Estimating the earnings equation results in the following
output (using heteroskedasticity-robust standard errors):
LnEarn = 6.92 – 3.13 × Dage – 0.019 × Age + 0.085 × (Dage × Age), R2 = 0.20, SER = 0.721.
        (38.33) (0.22)        (0.004)       (0.005)
Sketch both regression lines: one for the age category 39 years and under, and one for 40 and above. Does it
make sense to have a negative sign on the Age coefficient? Predict the ln(earnings) for a 30-year-old and a
50-year-old. What is the percentage difference between these two?
(d) The F-statistic for the hypothesis that both slopes and intercepts are the same is 124.43. Can you reject the
null hypothesis?
(e) What other functional forms should you consider?
Answer: (a) The error variance and the variance of the dependent variable are related. Given that the dependent
variable (earnings) is not normally distributed, it is difficult to postulate that the error term is
normally distributed. Using logarithms results in a distribution that is closer to normal. In addition,
there seems to be a better fit for the log-linear specification, and the coefficients can be interpreted as
percentage changes.
(b) There seems to be a pattern in the residuals when sorted by age. This suggests a misspecified
functional form.
(c) According to the specification, earnings increase with age until the individual is 39 years old. It is
only from age 40 onwards that the regression predicts a negative relationship between earnings and age.
According to the estimates, a 30-year-old would have ln(earnings) of 5.77, while the predicted value for
a 50-year-old would be 5.97. The difference between the two is approximately 20 percent.
(d) The critical value from the F-table is 4.61 at the 1% level. Hence the null hypothesis is rejected.
(e) Instead of the inverted V-shape for the above regression, an inverted U-shape would most likely
produce a better fit. This can be generated through the use of a polynomial regression model of degree 2.
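The predictions in part (c) follow directly from plugging the two ages into the estimated equation. A minimal sketch, using only the coefficients reported in the question (the function name is illustrative):

```python
import math

def predicted_ln_earn(age: float) -> float:
    """Fitted ln(earnings) from the regression in part (c).

    Dage equals one for age 39 and below, and zero otherwise.
    """
    dage = 1 if age <= 39 else 0
    return 6.92 - 3.13 * dage - 0.019 * age + 0.085 * dage * age

young = predicted_ln_earn(30)
old = predicted_ln_earn(50)
log_gap = old - young              # difference in logs
exact_gap = math.exp(log_gap) - 1  # exact proportional difference

print(f"ln(earnings) at 30: {young:.2f}")
print(f"ln(earnings) at 50: {old:.2f}")
print(f"log difference: {log_gap:.2f}, exact change: {exact_gap:.1%}")
```

The log difference of 0.20 is the "approximately 20 percent" in the answer; converting back from logs gives a somewhat larger figure of about 22 percent.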
5) Sports economics typically looks at winning percentages of sports teams as one of various outputs, and
estimates production functions by analyzing the relationship between the winning percentage and inputs. In
Major League Baseball (MLB), the determinants of winning are quality pitching and batting. You have data for
all 30 MLB teams for the 1999 season. Pitching quality is approximated by “Team Earned Run Average” (ERA),
and hitting quality by “On Base Plus Slugging Percentage” (OPS).
Summary of the Distribution of Winning Percentage, On Base Plus Slugging Percentage,
and Team Earned Run Average for MLB in 1999

                     Average  Standard    Percentile
                              deviation   10%     25%     40%     50%       60%     75%     90%
                                                                  (median)
Team ERA             4.71     0.53        3.84    4.35    4.72    4.78      4.91    5.06    5.25
OPS                  0.778    0.034       0.720   0.754   0.769   0.780     0.790   0.798   0.820
Winning Percentage   0.50     0.08        0.40    0.43    0.46    0.48      0.49    0.59    0.60
Your regression output is:
Winpct = –0.19 – 0.099 × teamera + 1.490 × ops, R2 = 0.92, SER = 0.02.
         (0.08)  (0.008)           (0.126)
(a) Interpret the regression. Are the results statistically significant and important?
(b) There are two leagues in MLB, the American League (AL) and the National League (NL). One major
difference is that the pitcher in the AL does not have to bat. Instead there is a “designated hitter” in the hitting
line-up. You are concerned that, as a result, there is a different effect of pitching and hitting in the AL from the
NL. To test this hypothesis, you allow the AL regression to have a different intercept and different slopes from
the NL regression. You therefore create a binary variable for the American League (DAL) and estimate the
following specification:
Winpct = –0.29 + 0.10 × DAL – 0.100 × teamera + 0.008 × (DAL × teamera)
         (0.12)  (0.24)       (0.008)           (0.018)
         + 1.622 × ops – 0.187 × (DAL × ops), R2 = 0.92, SER = 0.02.
           (0.163)       (0.160)
What is the regression for winning percentage in the AL and NL? Next, calculate the t-statistics and say
something about the statistical significance of the AL variables. Since you have allowed all slopes and the
intercept to vary between the two leagues, what would the results imply if all coefficients involving DAL were
statistically significant?
(c) You remember that sequentially testing the significance of slope coefficients is not the same as testing for
their significance simultaneously. Hence you ask your regression package to calculate the F-statistic that all
three coefficients involving the binary variable for the AL are zero. Your regression package gives a value of
0.35. Looking at the critical value from your F-table, can you reject the null hypothesis at the 1% level? Should
you worry about the small sample size?
Answer: (a) Lowering the team ERA by one results in a winning percentage increase of roughly ten percentage
points. Increasing the OPS by 0.1 generates a higher winning percentage of approximately 15 percentage
points. The regression explains 92 percent of the variation in winning percentages. Both slope coefficients are
statistically significant, and given the small differences in winning percentage, they are also important.
(b) NL: Winpct = – 0.29 – 0.100 × teamera + 1.622 × ops.
AL: Winpct = – 0.19 – 0.092 × teamera + 1.435 × ops.
The t-statistics for all variables involving DAL are, in order of appearance in the above regression, 0.42,
0.44, and –1.17. None of the coefficients is statistically significant individually. If these were statistically
significant, then this would indicate that the coefficients vary between the two leagues. Hence it would
suggest that the introduction of the designated hitter might have changed the relationship.
(c) The critical value of the F-statistic is 3.78 at the 1% level, and hence you cannot reject the null
hypothesis that all three coefficients are zero. However, with only 30 observations the F-statistic is not
really distributed as $F_{3,\infty}$, and, as a result, inference is problematic here.
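The league-specific regressions and the t-statistics in part (b) are simple sums and ratios of the reported numbers. A minimal sketch, using only the coefficients and standard errors from the question (the dictionary names are illustrative):

```python
# Coefficients and standard errors from the regression with DAL interactions.
nl = {"intercept": -0.29, "teamera": -0.100, "ops": 1.622}
al_shift = {"intercept": 0.10, "teamera": 0.008, "ops": -0.187}
al_shift_se = {"intercept": 0.24, "teamera": 0.018, "ops": 0.160}

# The AL line adds the DAL coefficients to the NL (base-group) coefficients.
al = {name: nl[name] + al_shift[name] for name in nl}

# t-statistic for each DAL term under H0: coefficient = 0.
t_stats = {name: al_shift[name] / al_shift_se[name] for name in al_shift}

for name in nl:
    print(f"{name}: NL {nl[name]:.3f}, AL {al[name]:.3f}, "
          f"t(DAL term) {t_stats[name]:.2f}")
```

All three t-statistics are well below 1.96 in absolute value, which matches the conclusion that none of the DAL terms is individually significant.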
6) There has been much debate about the impact of minimum wages on employment and unemployment. While
most of the focus has been on the employment-to-population ratio of teenagers, you decide to check if
aggregate state unemployment rates have been affected. Your idea is to see if state unemployment rates for the
48 contiguous U.S. states in 1985 can predict the unemployment rate for the same states in 1995, and if this
prediction can be improved upon by entering a binary variable for “high impact” minimum wage states. One
labor economist labeled states as high impact if a large fraction of teenagers was affected by the 1990 and 1991
federal minimum wage increases. Your first regression results in the following output:
Ur_i^95 = 3.19 + 0.27 × Ur_i^85, R2 = 0.21, SER = 1.031
         (0.56)  (0.07)
(a) Sketch the regression line and add a 45° line to the graph. Interpret the regression results. What would the
interpretation be if the fitted line coincided with the 45° line?
(b) Adding the binary variable DhiImpact by allowing the slope and intercept to differ, results in the following
fitted line:
Ur_i^95 = 4.02 + 0.16 × Ur_i^85 – 3.25 × DhiImpact + 0.38 × (DhiImpact × Ur_i^85),
         (0.66)  (0.09)           (0.89)             (0.11)
R2 = 0.31, SER = 0.987
The F-statistic for the null hypothesis that both parameters involving the high impact minimum wage variable
are zero, is 42.16. Can you reject the null hypothesis that both coefficients are zero? Sketch the two regression
lines together with the 45° line and interpret the results again.
(c) To check the robustness of these results, you repeat the exercise using a new binary variable for the
so-called mining state (Dmining), i.e., the eleven states that have at least three percent of their total state
earnings derived from oil, gas extraction, and coal mining, in the 1980s. This results in the following output:
Ur_i^95 = 4.04 + 0.15 × Ur_i^85 – 2.92 × Dmining + 0.37 × (Dmining × Ur_i^85),
         (0.65)  (0.09)           (0.90)           (0.10)
R2 = 0.31, SER = 0.997
How confident are you that the previously found effect is due to minimum wages?
Answer: (a) A one percentage point increase in the 1985 unemployment rate is associated with an increase in the
1995 unemployment rate of 0.27 percentage points. Put differently, if one state had a one percentage point
higher unemployment rate in 1985 than another state, then this difference would shrink, on average, to 0.27
percentage points in 1995. 21 percent of the variation in 1995 state unemployment rates is explained by the
regression. If the fitted line coincided with the 45° line, then the unemployment rates in 1995 would remain
unchanged when compared to 1985. The estimated regression implies, unrealistically, mean reversion in the
unemployment rates.
(b) The critical value for the F-statistic is 4.61 at the 1% level and hence the null hypothesis that both
coefficients are zero in the population is rejected. (The sample size is small, however, so the distribution
of the test statistic is not really known.) The intercept for the high-impact states is smaller and the slope
is steeper. This suggests that for high-impact states there is less of a mean reversion effect present: if
high-impact states had high 1985 unemployment rates, then they are expected to have higher
unemployment rates in 1995 when compared to a low-impact state. High and low unemployment rates
are thereby more persistent for high-impact states.
(c) The results here are similar to those in (b) in that the regression for the mining states is steeper than
the one for the other states. Perhaps omitted variables play a role here, such as relative (oil) price shocks
that affect some states more than others. Oil prices fell considerably over the time period and it is
possible that the high-impact binary variable coefficient picks up the effect of omitted variables.
Including more explanatory variables would be desirable.
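The two fitted lines discussed in part (b) can be written out by adding the binary-variable terms to the base intercept and slope. A minimal sketch using only the reported coefficients; the function name and the 1985 rates used for illustration are assumptions, not values from the data set:

```python
def predicted_ur95(ur85: float, high_impact: bool) -> float:
    """Fitted 1995 unemployment rate from the regression in part (b)."""
    d = 1 if high_impact else 0
    return 4.02 + 0.16 * ur85 - 3.25 * d + 0.38 * d * ur85

# Implied (intercept, slope) for each group of states.
low_line = (4.02, 0.16)
high_line = (4.02 - 3.25, 0.16 + 0.38)
print("other states:", low_line)
print("high-impact states:", high_line)

# Illustrative 1985 unemployment rates (hypothetical values).
for ur85 in (4.0, 8.0, 12.0):
    low = predicted_ur95(ur85, high_impact=False)
    high = predicted_ur95(ur85, high_impact=True)
    print(f"Ur85={ur85:.0f}: other {low:.2f}, high-impact {high:.2f}")
```

The high-impact line has intercept 0.77 and slope 0.54, and it crosses the line for the other states at a 1985 rate of 3.25/0.38, roughly 8.6 percent. Above that rate the high-impact states are predicted to have the higher 1995 unemployment rate.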
7) Labor economists have extensively researched the determinants of earnings. Investment in human capital,
measured in years of education, and on the job training are some of the most important explanatory variables
in this research. You decide to apply earnings functions to the field of sports economics by finding the
determinants for baseball pitcher salaries. You collect data on 455 pitchers for the 1998 baseball season and
estimate the following equation using OLS and heteroskedasticity-robust standard errors:
Ln(Earni) = 12.45 + 0.052 × Years + 0.00089 × Innings + 0.0032 × Saves
            (0.08)  (0.026)         (0.00020)           (0.0018)
            – 0.0085 × ERA, R2 = 0.45, SER = 0.874
              (0.0168)
where Earn is annual salary in dollars, Years is number of years in the major leagues, Innings is number of
innings pitched during the career before the 1998 season, Saves is number of saves during the career before the
1998 season, and ERA is the earned run average before the 1998 season.
(a) What happens to earnings when the pitcher stays in the league for one additional year? Compare the
salaries of two relievers, one with 10 more saves than the other. What effect does pitching 100 more innings
have on the salary of the pitcher? What effect does reducing his ERA by 1.5 have? Do the signs correspond to
your expectations? Explain.
(b) Are the individual coefficients statistically significant? Indicate the level of significance you used and the
type of alternative hypothesis you considered.
(c) Although you are quite impressed with the fit of the regression, someone suggests that you should include
the square of years and innings as additional explanatory variables. Your results change as follows:
Ln(Earni) = 12.15 + 0.160 × Years + 0.00268 × Innings + 0.0063 × Saves
            (0.05)  (0.039)         (0.00030)           (0.0010)
            – 0.0584 × ERA – 0.0165 × Years^2 – 0.00000045 × Innings^2,
              (0.0165)       (0.0026)           (0.00000012)
R2 = 0.69, SER = 0.666
What is her reasoning? Are the coefficients of the quadratic terms statistically significant? Are they meaningful?
(d) Calculate the effect of moving from two to three years, as opposed to from 12 to 13 years.
(e) You also decide to test the specification for stability across leagues (National League and American League)
by including a dummy variable for the National League and allowing the intercept and all slopes to differ. The
resulting F-statistic for restricting all coefficients that involve the National League dummy variable to zero, is
0.40. Compare this to the relevant critical value from the table and decide whether or not these additional
variables should be included.
Answer: (a) For staying an additional year in the league, the pitcher receives a 5.2 percent increase in earnings. On
average, the reliever with 10 more saves ends up with 3.2 percent higher earnings. Pitching 100
additional innings results in 8.9 percent higher earnings, and lowering the ERA by 1.5 increases earnings
by 1.3 percent. ERA, innings pitched, and number of saves are all quality of input indicators and should
therefore have the signs as in the regression above. Years in the major leagues stands as a proxy for on
the job training and should therefore carry a positive sign.
(b) Given that there is prior expectation on the sign of the coefficients, you should conduct a one-sided
hypothesis test. All variables with the exception of ERA carry statistically significant coefficients at the
5% level.
(c) Allowing for the quadratic terms to enter results in an inverted U-shape for the relationship between
the log of earnings, and both years in the league and innings pitched. Both coefficients are highly
significant and have resulted also in a significant ERA coefficient.
(d) Having played for two years and staying for one more year in the league results in an earnings
increase of 7.8 percent, while staying for an additional year after 12 years in the majors results in a
predicted decrease of 25.3 percent.
(e) The critical value of $F_{7,\infty}$ is 2.01 at the 5% level. Since 0.40 is well below it, you cannot reject
the null hypothesis of equality of coefficients across leagues.
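The calculation in part (d) differences the fitted values at the two levels of Years, so both the linear and the squared term contribute. A minimal sketch using the coefficients from the quadratic specification (the function name is illustrative):

```python
def ln_earn_change(years_from: float, years_to: float) -> float:
    """Change in fitted ln(earnings) when Years moves between two values,
    holding Innings, Saves, and ERA constant (quadratic specification)."""
    b_years, b_years_sq = 0.160, -0.0165
    return (b_years * (years_to - years_from)
            + b_years_sq * (years_to**2 - years_from**2))

early = ln_earn_change(2, 3)
late = ln_earn_change(12, 13)
print(f"2 -> 3 years: {early:+.4f} log points")
print(f"12 -> 13 years: {late:+.4f} log points")

# Years at which the fitted profile peaks: derivative b1 + 2*b2*Years = 0.
peak_years = 0.160 / (2 * 0.0165)
print(f"profile peaks near {peak_years:.1f} years")
```

The two changes, 0.0775 and -0.2525 log points, are the 7.8 percent increase and 25.3 percent decrease reported in the answer.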
8) After analyzing the age-earnings profile for 1,744 workers as shown in the figure, it becomes clear to you that
the relationship cannot be approximately linear.
You estimate the following polynomial regression model, controlling for the effect of gender by using a binary
variable that takes on the value of one for females and is zero otherwise:
Earn = –795.90 + 82.93 × Age – 1.69 × Age^2 + 0.015 × Age^3 – 0.0005 × Age^4
       (283.11)  (29.29)       (1.06)         (0.016)         (0.0009)
       – 163.19 × Female, R2 = 0.225, SER = 259.78
         (12.45)
(a) Test for the significance of the Age^4 coefficient. Describe the general strategy to determine the appropriate
degree of the polynomial.
(b) You run two further regressions. Present an argument as to which one you should use for further analysis.
Earn = –683.21 + 65.83 × Age – 1.05 × Age^2 + 0.005 × Age^3
       (120.13)  (9.27)        (0.22)         (0.002)
       – 163.23 × Female, R2 = 0.225, SER = 259.73
         (12.45)
Earn = –344.88 + 41.48 × Age – 0.45 × Age^2
       (51.58)   (2.64)        (0.03)
       – 163.81 × Female, R2 = 0.222, SER = 260.22
         (12.47)
(c) Sketch the graph of fitted earnings of males against age of your preferred regression. Does this make sense?
Are you concerned about the negative coefficient on the regression intercept? What is the implication for
female earners in this sample?
(d) Explain how you would calculate the effect of changing age by one year on earnings, holding constant the
gender variable. Finally, briefly describe how you would calculate the standard errors of the estimated effect.
Answer: (a) The coefficient has a t-statistic of 0.56 and hence is not statistically significant at conventional levels.
The strategy is described in section 6.2 of the textbook. Considering first a polynomial of degree r, the
coefficient associated with the largest value of r is tested for significance. From there, a sequential
hypothesis testing procedure should be followed.
(b) The coefficient of Age^3 has a t-statistic of 2.5 and is statistically significant at the 1% level using a
one-sided hypothesis test. The polynomial of degree three therefore seems to be the appropriate regression.
(c) There is little difference between the two fits for values between the age of 25 and 60. The inverted
U-shape is well known to exist for age-earnings profiles, and hence the plot makes sense. There is no
interpretation for the intercept, since there is no data close to the origin. Females earn significantly less at
every age level.
(d) Since this is a nonlinear relationship, the effect will depend on the age level. This is described in
section 6.1 of the textbook. In essence, the predicted earnings value for one age level has to be computed
first. Next, the same has to be done for the age level plus one. Finally the two values are differenced to
find the change in earnings associated with the age level.
For the polynomial of degree 3, the first task is to consider the estimated change in earnings associated
with a change in age by one year, say from 30 to 31. This is given by
$\Delta\hat{Y} = \hat\beta_1 \times (31 - 30) + \hat\beta_2 (31^2 - 30^2) + \hat\beta_3 (31^3 - 30^3)$
or
$\Delta\hat{Y} = \hat\beta_1 + 61 \hat\beta_2 + 2791 \hat\beta_3$.
The standard error of the estimated effect is then given from
$SE(\Delta\hat{Y}) = |\Delta\hat{Y}| / \sqrt{F}$, where $F = [(\hat\beta_1 + 61 \hat\beta_2 + 2791 \hat\beta_3) / SE(\hat\beta_1 + 61 \hat\beta_2 + 2791 \hat\beta_3)]^2$.
A 95% confidence interval for the change in the expected value of earnings is
$(\hat\beta_1 + 61 \hat\beta_2 + 2791 \hat\beta_3) \pm 1.96 \times SE(\hat\beta_1 + 61 \hat\beta_2 + 2791 \hat\beta_3)$.
Obviously these expressions get quite complicated once you go beyond a quadratic.
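The weights 61 and 2791 and the resulting point estimate can be reproduced numerically. A minimal sketch, using the point estimates of the cubic regression from part (b); the helper name is illustrative. The standard error is not computed here because it requires the covariances between the coefficient estimates (or the F-statistic), which are not reported in the question.

```python
def delta_weights(age_from: int, age_to: int, degree: int = 3) -> list:
    """Weights on the slope coefficients in the predicted change:
    age_to**k - age_from**k for k = 1, ..., degree."""
    return [age_to**k - age_from**k for k in range(1, degree + 1)]

# Point estimates from the cubic regression in part (b); the intercept and
# the Female coefficient cancel when the two predictions are differenced.
b_hat = [65.83, -1.05, 0.005]

weights = delta_weights(30, 31)
delta_y = sum(w * b for w, b in zip(weights, b_hat))

print("weights:", weights)
print(f"predicted change in earnings, age 30 to 31: {delta_y:.3f}")
```

With these estimates the predicted change is 65.83 - 64.05 + 13.955, or about 15.7 dollars per week.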
9) Earnings functions attempt to find the determinants of earnings, using both continuous and binary variables.
One of the central questions analyzed in this relationship is the returns to education.
(a) Collecting data from 253 individuals, you estimate the following relationship
ln(Earni) = 0.54 + 0.083 × Educ, R2 = 0.20, SER = 0.445
(0.14) (0.011)
where Earn is average hourly earnings and Educ is years of education.
What is the effect of an additional year of schooling? If you had a strong belief that years of high school
education were different from college education, how would you modify the equation? What if your theory
suggested that there was a “diploma effect”?
(b) You read in the literature that there should also be returns to on-the-job training. To approximate
on-the-job training, researchers often use the so-called Mincer or potential experience variable, which is
defined as Exper = Age – Educ – 6. Explain the reasoning behind this approximation. Is it likely to resemble
years of employment for various sub-groups of the labor force?
(c) You incorporate the experience variable into your original regression
ln(Earni) = –0.01 + 0.101 × Educ + 0.033 × Exper – 0.0005 × Exper^2,
            (0.16)  (0.012)        (0.006)         (0.0001)
R2 = 0.34, SER = 0.405
What is the effect of an additional year of experience for a person who is 40 years old and had 12 years of
education? What about for a person who is 60 years old with the same education background?
(d) Test for the significance of each of the coefficients of the added variables. Why has the coefficient on
education changed so little? Sketch the age-(log)earnings profile for workers with 8 years of education and 16
years of education.
(e) You want to find the effect of introducing two variables, gender and marital status. Accordingly you specify
a binary variable that takes on the value of one for females and is zero otherwise (Female), and another binary
variable that is one if the worker is married but is zero otherwise (Married). Adding these variables to the
regressors results in:
ln(Earni) = 0.21 + 0.093 × Educ + 0.032 × Exper – 0.0005 × Exper^2
            (0.16) (0.012)        (0.006)         (0.0001)
            – 0.289 × Female + 0.062 × Married,
              (0.049)          (0.056)
R2 = 0.43, SER = 0.378
Are the coefficients of the two added binary variables individually statistically significant? Are they
economically important? In percentage terms, how much less do females earn per hour, controlling for
education and experience? How much more do married people make? What is the percentage difference in
earnings between a single male and a married female? What is the marriage differential between males and
females?
(f) In your final specification, you allow for the binary variables to interact. The results are as follows:
ln(Earni) = 0.14 + 0.093 × Educ + 0.032 × Exper – 0.0005 × Exper^2
            (0.16) (0.011)        (0.006)         (0.001)
            – 0.158 × Female + 0.173 × Married – 0.218 × (Female × Married),
              (0.075)          (0.080)           (0.097)
R2 = 0.44, SER = 0.375
Repeat the exercise in (e) of calculating the various percentage differences between gender and marital status.
Answer: (a) One additional year of education carries an 8.3 percent increase, or a return, on earnings. You would
need additional data to see if this coefficient was different for high school versus college education.
Including both variables in the regression would then allow you to test for equality of the coefficients. A
“diploma effect” could be studied by creating a binary variable for a high school diploma, a junior
college diploma, a B.A. or B.Sc. diploma, and so forth.
(b) The idea is that everybody works except in the first six years of life and during the time spent in
school/university for education. This approximation will work better for people with a strong
attachment to the labor force. It will not work well for females and those who are frequently
unemployed or out of the workforce.
(c) For the first person, the Exper variable increases from 22 to 23, and results in a 1.1 percent earnings
increase. For the 60-year-old, Exper increases from 42 to 43, and there is an expected decrease of about 1 percent.
(d) Both coefficients are highly significant using conventional levels of significance. The fact that the
coefficient on the education variable hardly changed suggests that education and experience are not
highly correlated.
(e) The coefficient for the female binary variable is statistically significant even at the 1% level. The
coefficient for the married binary variable only has a t-statistic of 1.11 and is not statistically significant
at the 10% level. Both coefficients indicate economic importance, since females make approximately 29
percent less than males and married people earn roughly 6 percent more. A married female earns
roughly 23 percent less than a single male. Married females earn 29 percent less than married males, the
same percentage that single females earn less than single males.
(f) The default is the single male. Single females earn 15.8 percent less. Married males earn 17.3 percent
more. Married females earn 20.3 percent less than single males. Comparing married females with married
males now results in a percentage differential of 37.6 percent in favor of the males.
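The differentials in part (f) come from switching the two binary variables and their interaction on and off. A minimal sketch using only the coefficients of the final specification (the function and dictionary names are illustrative):

```python
def group_log_differential(female: int, married: int) -> float:
    """Fitted ln(earnings) gap relative to a single male, from the
    specification in part (f) with the Female x Married interaction."""
    return -0.158 * female + 0.173 * married - 0.218 * female * married

groups = {
    "single female": (1, 0),
    "married male": (0, 1),
    "married female": (1, 1),
}
for label, (female, married) in groups.items():
    gap = group_log_differential(female, married)
    print(f"{label} vs. single male: {100 * gap:+.1f} percent (log approximation)")

# Gender gap among married workers: married female minus married male.
married_gap = group_log_differential(1, 1) - group_log_differential(0, 1)
print(f"married female vs. married male: {100 * married_gap:+.1f} percent")
```

Education and experience cancel in every comparison because they are held constant, so only the three binary-variable coefficients matter.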
10) Among the most frequently estimated equations in the macroeconomic growth literature are so-called
convergence regressions. In essence the average per capita income growth rate is regressed on the
beginning-of-period per capita income level to see if countries that were further behind initially grew faster.
Some macroeconomic models make this prediction, once other variables are controlled for. To investigate this
matter, you collect data from 104 countries for the sample period 1960-1990 and estimate the following
relationship (numbers in parentheses are for heteroskedasticity-robust standard errors):
g6090 = 0.020 – 0.360 × gpop + 0.004 × Educ – 0.053 × RelProd60, R2 = 0.332, SER = 0.013
        (0.009) (0.241)        (0.001)        (0.009)
where g6090 is the growth rate of GDP per worker for the 1960-1990 sample period, RelProd60 is the initial
starting level of GDP per worker relative to the United States in 1960, gpop is the average population growth
rate of the country, and Educ is educational attainment in years for 1985.
(a) What is the effect of an increase of 5 years in educational attainment? What would happen if a country
could implement policies to cut population growth by one percent? Are all coefficients significant at the 5%
level? If one of the coefficients is not significant, should you automatically eliminate its variable from the list of
explanatory variables?
(b) The coefficient on the initial condition has to be significantly negative to suggest conditional convergence.
Furthermore, the larger this coefficient, in absolute terms, the faster the convergence will take place. It has been
suggested to you to interact education with the initial condition to test for additional effects of education on
growth. To test for this possibility, you estimate the following regression:
g6090 = 0.015 – 0.323 × gpop + 0.005 × Educ – 0.051 × RelProd60
        (0.009) (0.238)        (0.001)        (0.013)
        – 0.0028 × (Educ × RelProd60), R2 = 0.346, SER = 0.013
          (0.0015)
Write down the effect of an additional year of education on growth. West Germany has a value for RelProd60 of
0.57, while Brazil’s value is 0.23. What is the predicted growth rate effect of adding one year of education in
both countries? Does this predicted growth rate make sense?
(c) What is the implication for the speed of convergence? Is the interaction effect statistically significant?
(d) Convergence regressions are basically of the type
$\Delta \ln Y_t = \beta_0 - \beta_1 \ln Y_0$
where $\Delta$ might be the change over a longer time period, 30 years, say, and the average growth rate is used on
the left-hand side. You note that the equation can be rewritten as
$\ln Y_t = \beta_0 + (1 - \beta_1) \ln Y_0$
Over a century ago, Sir Francis Galton first coined the term “regression” by analyzing the relationship between
the height of children and the height of their parents. Estimating a function of the type above, he found a
positive intercept and a slope between zero and one. He therefore concluded that heights would revert to the
mean. Since ultimately this would imply the height of the population being the same, his result has become
known as “Galton’s Fallacy.” Your estimate of $\beta_1$ above is approximately 0.05. Do you see a parallel to
Galton’s Fallacy?
Answer: (a) Increasing educational attainment by 5 years results in an increase of productivity growth of 2
percentage points. Decreasing the population growth rate by one percentage point increases productivity
growth by roughly 0.4 percentage points. All coefficients are statistically significant at the 5% level with the
exception of population growth. You should not eliminate a variable simply because it is not statistically
significant. It is better to report the statistics and let the reader decide.
(b) $\Delta g6090 / \Delta Educ = 0.005 - 0.0028 \times RelProd60$. For West Germany, the effect is 0.3 percent, while
for Brazil it is 0.4 percent. These are small gains, but they accumulate over time.
(c) $\Delta g6090 / \Delta RelProd60 = -0.051 - 0.0028 \times Educ$, which therefore depends on educational
attainment. Countries with higher educational attainment will converge faster. The interaction coefficient has a
t-statistic of 1.87 in absolute value and is therefore statistically significant at the 5% level using a
one-sided hypothesis test.
(d) The above regressions generate a mean reversion outcome. Interpreted literally, the implication is
that all countries end up with the same productivity or per capita income, just as all persons would be of
the same height. It can be shown that Galton’s Fallacy is the result of errors-in-variables which biases
the slope coefficient downward. This topic is covered in Chapter 7. The solution is to use instrumental
variable techniques, also discussed in Chapter 10. The literature in this area has done so, and the
convergence result persists.
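The country-specific education effects in part (b) and the t-statistic in part (c) can be verified from the reported estimates. A minimal sketch (the function name is illustrative):

```python
def education_effect(relprod60: float) -> float:
    """Effect of one more year of education on growth, from the regression
    with the Educ x RelProd60 interaction in part (b)."""
    return 0.005 - 0.0028 * relprod60

for country, relprod60 in (("West Germany", 0.57), ("Brazil", 0.23)):
    effect = education_effect(relprod60)
    print(f"{country}: {100 * effect:.1f} percentage points")

# t-statistic on the interaction coefficient.
t_interaction = -0.0028 / 0.0015
print(f"t-statistic on Educ x RelProd60: {t_interaction:.2f}")
```

The one-sided 5% critical value is 1.645, so a t-statistic of about -1.87 rejects the null hypothesis against a negative alternative but would not reject in a two-sided test at the same level.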
11) Pages 283-284 in your textbook contain an analysis of the “Return to Education and the Gender Gap.” Column
(4) in Table 8.1 displays regression results using the 2009 Current Population Survey. The equation below
shows the regression result for the same specification, but using the 2005 Current Population Survey. Interpret
the major results.
ln(earnings) = 1.215 + 0.0899 × educ – 0.521 × DFemme + 0.0180 × (DFemme × educ)
               (0.018) (0.0011)        (0.022)          (0.0016)
               + 0.0232 × exper – 0.000368 × exper^2 – 0.058 × Midwest – 0.0098 × South – 0.030 × West
                 (0.0008)         (0.000018)           (0.006)           (0.0078)         (0.0030)
Answer: The return to education for males is approximately 9%. For females, the return is slightly higher,
approximately 11%, and the coefficient on the interaction term has a t-statistic of 11.25. Since the binary variable for females is
interacted with the number of years of education, the gender gap depends on the number of years of
education. For the typical high school graduate (12 years of education), the gender gap is approximately
27%, while for the typical college graduate (16 years of education) the gender gap narrows to 19%. The
potential experience variable enters in an inverted U-shape, which is to be expected given the shape of
age-earnings profiles and the fact that potential experience depends on the age of the individual. There
is a declining marginal value for each year of potential experience until it eventually becomes negative.
Northeast is the omitted region, and all other regions have lower (log) earnings, ranging from 0.8% in the
South to 5.8% in the Midwest. All coefficients are statistically significant.
8.3 Mathematical and Graphical Problems
1) Give at least three examples from economics where you expect some nonlinearity in the relationship between
variables. Interpret the slope in each case.
Answer: Answers will vary by student. Typical answers involve the Cobb-Douglas production function, the
Phillips curve, earnings functions, and (given the textbook discussion) student performance and income.
2) Suggest a transformation in the variables that will linearize the deterministic part of the population regression
functions below. Write the resulting regression function in a form that can be estimated by using OLS.
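(a) $Y_i = \beta_0 X_{1i}^{\beta_1} X_{2i}^{\beta_2}$
(b) $Y_i = \frac{X_i}{\beta_0 + \beta_1 X_i}$
(c) $Y_i = \frac{e^{\beta_0 + \beta_1 X_i}}{1 + e^{\beta_0 + \beta_1 X_i}}$
(d) $Y_i = \beta_0 X_{1i}^{\beta_1} e^{\beta_2 X_{2i}}$
Answer: (a) $\ln(Y_i) = \ln(\beta_0) + \beta_1 \ln(X_{1i}) + \beta_2 \ln(X_{2i})$
(b) $\frac{1}{Y_i} = \beta_0 \frac{1}{X_i} + \beta_1$
(c) $\ln\left(\frac{Y_i}{1 - Y_i}\right) = \beta_0 + \beta_1 X_i$
(d) $\ln(Y_i) = \ln(\beta_0) + \beta_1 \ln(X_{1i}) + \beta_2 X_{2i}$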
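The transformations in (b) and (c) can be checked numerically: after transforming Y, the result should coincide with a function that is linear in the parameters. A minimal sketch; the parameter values b0 and b1 and the X values are arbitrary illustrative choices, not estimates from any data.

```python
import math

b0, b1 = 0.5, 0.8  # arbitrary illustrative parameter values

def model_b(x: float) -> float:
    """Y = X / (b0 + b1 * X), the function in part (b)."""
    return x / (b0 + b1 * x)

def model_c(x: float) -> float:
    """Y = exp(b0 + b1 * X) / (1 + exp(b0 + b1 * X)), the function in part (c)."""
    z = math.exp(b0 + b1 * x)
    return z / (1 + z)

for x in (0.5, 1.0, 2.0, 4.0):
    # (b): 1/Y should equal b0 * (1/X) + b1, which is linear in 1/X.
    lhs_b = 1 / model_b(x)
    rhs_b = b0 * (1 / x) + b1
    # (c): the log-odds ln(Y / (1 - Y)) should equal b0 + b1 * X.
    y_c = model_c(x)
    lhs_c = math.log(y_c / (1 - y_c))
    rhs_c = b0 + b1 * x
    print(f"x={x}: (b) {lhs_b:.4f} vs {rhs_b:.4f}, (c) {lhs_c:.4f} vs {rhs_c:.4f}")
```

In each row the two numbers agree, so regressing 1/Y on 1/X in case (b), or the log-odds on X in case (c), is a regression that is linear in the coefficients.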
3) Indicate whether or not you can linearize the regression functions below so that OLS estimation methods can
be applied:
(a) $Y_i = e^{\beta_0 + \beta_1 X_i + u_i}$
(b) $Y_i = \beta_0 X_{1i}^{\beta_1} X_{2i}^{\beta_2} + u_i$
Answer: (a) The function can be linearized by taking logs on both sides.
(b) The function cannot be linearized due to the additive error term.
4) Choose at least three different nonlinear functional forms of a single independent variable and sketch the
relationship between the dependent and independent variable.
Answer: Answers will vary by student. Most commonly used forms are the quadratic regression, the inverse (in
X) regression, and the log-log model.
5) In the case of perfect multicollinearity, OLS is unable to estimate the slope coefficients of the variables involved.
Assume that you have included both X1 and X2 as explanatory variables, and that $X_2 = X_1^2$, so that there is an
exact relationship between two explanatory variables. Does this pose a problem for estimation?
Answer: There is no problem for estimation, since the second explanatory variable is not linearly related to the
first. This is an example of a polynomial regression model of degree 2, which is frequently estimated in
econometrics.
6) The figure shows a plot and a fitted linear regression line of the age-earnings profile of 1,744 individuals,
taken from the Current Population Survey.
(a) Describe the problems in predicting earnings using the fitted line. What would the pattern of the residuals
look like for the age category under 40?
(b) What alternative functional form might fit the data better?
(c) What other variables might you want to consider in specifying the determinants of earnings?
Answer: (a) There would be many overpredictions for this age category under 40, and hence more negative
residuals.
(b) It would be better to fit a quadratic here, i.e., a polynomial regression model, which would produce
an inverted U-shape.
(c) Answers will vary by students, but education, gender, race, tenure with an employer, professional
choice, and ability are typically present in answers.
7) (Requires Calculus) Show that for the log-log model the slope coefficient is the elasticity.
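Answer: Consider the deterministic part $Y = AX^{\beta_1}$. Then $\ln(Y) = \beta_0 + \beta_1 \ln(X)$, where $\beta_0 = \ln(A)$. Now
$\beta_1 = \dfrac{\partial \ln(Y)}{\partial \ln(X)} = \dfrac{\frac{1}{Y}\,\partial Y}{\frac{1}{X}\,\partial X} = \dfrac{\partial Y}{\partial X} \cdot \dfrac{X}{Y}$. Alternatively you can derive the same result by taking the derivative
$\dfrac{\partial Y}{\partial X}$ from $Y = AX^{\beta_1}$.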
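A quick numerical check can make the result concrete. The sketch below assumes illustrative values A = 2 and a slope of 0.7 (neither is from the text) and compares a finite-difference estimate of (dY/dX)(X/Y) with the slope coefficient at several values of X.

```python
# Hypothetical parameter values, chosen only for illustration.
A = 2.0
BETA1 = 0.7

def y(x: float) -> float:
    """Deterministic part of the log-log model: Y = A * X**beta1."""
    return A * x ** BETA1

def elasticity(x: float, h: float = 1e-6) -> float:
    """Central-difference estimate of (dY/dX) * (X / Y)."""
    dy_dx = (y(x + h) - y(x - h)) / (2 * h)
    return dy_dx * x / y(x)

for x in (0.5, 1.0, 10.0, 250.0):
    print(f"X = {x:7.1f}  elasticity = {elasticity(x):.6f}")
```

The estimate stays at the slope coefficient regardless of X, which is what a constant elasticity means.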
8) Assume that you had data for a cross-section of 100 households with data on consumption and personal
disposable income. If you fit a linear regression function regressing consumption on disposable income, what
prior expectations do you have about the slope and the intercept? The slope of this regression function is called
the “marginal propensity to consume.” If, instead, you fit a log-log model, then what is the interpretation of
the slope? Do you have any prior expectation about its size?
Answer: For the log-log specification, the slope is the elasticity. Since there are many theories that predict a
constant average propensity to consume, the elasticity should equal one.
9) The textbook shows that $\ln(x + \Delta x) - \ln(x) \approx \dfrac{\Delta x}{x}$. Show that this is equivalent to the following approximation:
$\ln(1 + y) \approx y$ if y is small. You use this idea to estimate a demand for money function, which is of the form
$m = \beta_0 \times GDP^{\beta_1} \times (1 + R)^{\beta_2} \times e^{u}$, where m is the quantity of (real) money, GDP is the value of (real) Gross Domestic
Product, and R is the nominal interest rate. You collect the quarterly data from the Federal Reserve Bank of St.
Louis data bank (“FRED”), which lists the money supply and GDP in billions of dollars, prices as an index, and
nominal interest rates in percentage points per year.
You generate the variables in your regression program as follows: m = (money supply)/(price index); GDP =
(Gross Domestic Product)/(price index), and R = nominal interest rate in percentage points per annum. Next you
perform the log-transformations on the real money supply, real GDP, and on (1+R). Can you foresee a problem
in using this transformation?
Answer: $\ln(x + \Delta x) - \ln(x) = \ln\left(\dfrac{x + \Delta x}{x}\right) = \ln\left(1 + \dfrac{\Delta x}{x}\right) = \ln(1 + y)$, where $y = \dfrac{\Delta x}{x}$. Let y = 0.05, then $\ln(1 + y) = 0.049 \approx 0.05$.
Note that this approximation does not hold well for larger fractions, such as 0.60. The interest
rate is listed in percentage points. Entering R as 5, rather than 0.05, makes $\beta_2$ not equal a semi-elasticity.
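The quality of the approximation can be tabulated directly. A minimal Python sketch follows; the values 0.05 and 0.60 are the fractions mentioned in the answer, and 5.0 stands for a 5 percent interest rate entered as R = 5 rather than R = 0.05.

```python
import math

def approximation_error(y: float) -> float:
    """Return y - ln(1 + y), the error from treating ln(1 + y) as y."""
    return y - math.log1p(y)

# 0.05 and 0.60 come from the answer; 5.0 mimics R entered in percentage points.
for y in (0.05, 0.60, 5.0):
    print(f"y = {y:4.2f}  ln(1+y) = {math.log1p(y):.4f}  error = {approximation_error(y):.4f}")
```

The error is negligible at 0.05 but large at 5.0, so ln(1 + R) is no longer close to R when R is entered in percentage points.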
10) You have estimated an earnings function, where you regressed the log of earnings on a set of continuous
explanatory variables (in levels) and two binary variables, one for gender and the other for marital status. One
of the explanatory variables is education.
(a) Interpret the education coefficient.
(b) Next, specify the binary variables and an equation, where the default is a single male, without allowing for
interaction between marital status and gender. Indicate the coefficients that measure the effect of a single male,
single female, married male, and married female.
(c) Finally allow for an interaction between the gender and marital status binary variables. Repeat the exercise
of writing down the various effects based on the female/male and single/married status. Why is the latter
approach more general than the former?
Answer: (a) The coefficient on education gives you the return to education, i.e., if education increased by one
year, then by how many percent do earnings increase?
(b) Let DGender equal one if the individual is a female, and be zero otherwise. DMarried takes on a value
of one if the individual is married and is zero otherwise. The regression is
$\widehat{\ln(Earn)} = \hat\beta_0 + \hat\beta_1 DGender + \hat\beta_2 DMarried + \ldots$
Single male: $\hat\beta_0$; single female: $\hat\beta_0 + \hat\beta_1$; married male: $\hat\beta_0 + \hat\beta_2$; married female: $\hat\beta_0 + \hat\beta_1 + \hat\beta_2$.
(c) $\widehat{\ln(Earn)} = \hat\beta_0 + \hat\beta_1 DGender + \hat\beta_2 DMarried + \hat\beta_3 (DGender \times DMarried) + \ldots$
Single male: $\hat\beta_0$; single female: $\hat\beta_0 + \hat\beta_1$; married male: $\hat\beta_0 + \hat\beta_2$; married female: $\hat\beta_0 + \hat\beta_1 + \hat\beta_2 + \hat\beta_3$. This
approach is more general because it allows the effect of being married and female to be different from
being married and male. In (b), both females and males were faced with identical effects from being
married, $\hat\beta_2$. In (c), this effect differs due to the additional coefficient $\hat\beta_3$.
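The difference between the two specifications can be seen by computing the four group effects. The sketch below uses hypothetical coefficient values (not estimates from any regression in the text) and shows that the marriage effect is the same for both genders without the interaction term, and differs by the interaction coefficient once the term is included.

```python
# Hypothetical coefficients: intercept, DGender, DMarried, DGender x DMarried.
b0, b1, b2, b3 = 2.50, -0.20, 0.10, -0.05

def group_effect(female: int, married: int, interaction: bool) -> float:
    """Group-specific intercept implied by the binary variables."""
    effect = b0 + b1 * female + b2 * married
    if interaction:
        effect += b3 * female * married
    return effect

def marriage_effect(female: int, interaction: bool) -> float:
    """Married minus single, holding gender fixed."""
    return group_effect(female, 1, interaction) - group_effect(female, 0, interaction)

for interaction in (False, True):
    label = "with interaction" if interaction else "no interaction"
    print(f"{label}: marriage effect, male = {marriage_effect(0, interaction):.3f}, "
          f"female = {marriage_effect(1, interaction):.3f}")
```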
11) You have been told that the money demand function in the United States has been unstable since the late 1970s.
To investigate this problem, you collect data on the real money supply (m=M/P; where M is M1 and P is the
GDP deflator), (real) gross domestic product (GDP) and the nominal interest rate (R). Next you consider
estimating the demand for money using the following alternative functional forms:
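(i) $m = \beta_0 + \beta_1 \times GDP + \beta_2 \times R + u$
(ii) $m = \beta_0 \times GDP^{\beta_1} \times R^{\beta_2} \times e^{u}$
(iii) $m = \beta_0 \times GDP^{\beta_1} \times (1 + R)^{\beta_2} \times e^{u}$
Give an interpretation for $\beta_1$ and $\beta_2$ in each case. How would you calculate the income elasticity in case (i)?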
Answer: In (i), both coefficients show the effect of a unit increase of the respective variables on the demand for
money. In (ii), the two coefficients are elasticities. In (iii), $\hat\beta_1$ is an elasticity, whereas $\hat\beta_2$ is often referred
to as a “semi-elasticity.” The specification becomes $\ln(m) = \ln(\hat\beta_0) + \hat\beta_1 \ln(GDP) + \hat\beta_2 R + u$, since $\ln(1 + R) \approx R$
for small R. Hence $\hat\beta_2 = \dfrac{\Delta m / m}{\Delta R}$, that is, it indicates by how many percent the (real) money demand
will increase for a percentage point change in the interest rate.
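For the linear form (i), the income elasticity is not constant: it equals the GDP coefficient multiplied by GDP/m, and a common choice is to evaluate it at the sample means. A minimal Python sketch, with a coefficient and sample means that are purely hypothetical:

```python
def income_elasticity_linear(beta1: float, gdp: float, m: float) -> float:
    """Elasticity of m with respect to GDP in the linear model: (dm/dGDP) * (GDP / m)."""
    return beta1 * gdp / m

# Hypothetical values: slope estimate and sample means of GDP and real money.
beta1_hat = 0.10
mean_gdp = 8000.0
mean_m = 1000.0

print(f"income elasticity at the means = {income_elasticity_linear(beta1_hat, mean_gdp, mean_m):.2f}")
```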
12) You have collected data for a cross-section of countries in two time periods, 1960 and 1997, say. Your task is to
find the determinants for the Wealth of a Nation (per capita income) and you believe that there are three major
determinants: investment in physical capital in both time periods (X1,T and X1,0 ), investment in human capital
or education (X2,T and X2,0 ), and per capita income in the initial period (Y0 ). You run the following
regression:
$\ln(Y_T) = \beta_0 + \beta_1 X_{1,T} + \beta_2 X_{1,0} + \beta_3 X_{2,T} + \beta_4 X_{2,0} + \beta_5 \ln(Y_0) + u_T$
One of your peers suggests that instead, you should run the growth rate in per capita income over the two
periods on the change in physical and human capital. For those results to be a parsimonious presentation of
your initial regression, what three restrictions would have to hold? How would you test for these? The same
person also points out to you that the intercept vanishes in equations where the data is differenced. Is that true?
Answer: The regression using growth rates is as follows:
$[\ln(Y_T) - \ln(Y_0)] = \beta_0 + \beta_1 (X_{1,T} - X_{1,0}) + \beta_3 (X_{2,T} - X_{2,0}) + (\beta_5 - 1) \ln(Y_0) + u_T$
For this to be a parsimonious presentation of the initial regression, the following two restrictions must
hold: $\beta_1 = -\beta_2$, and $\beta_3 = -\beta_4$. The use of an F-test is required here to test the restrictions simultaneously.
The intercept is still present in the equation, and the assertion therefore cannot be true.
13) Earnings functions attempt to predict the log of earnings from a set of explanatory variables, both binary and
continuous. You have allowed for an interaction between two continuous variables: education and tenure with
the current employer. Your estimated equation is of the following type:
$\widehat{\ln(Earn)} = \hat\beta_0 + \hat\beta_1 \times Femme + \hat\beta_2 \times Educ + \hat\beta_3 \times Tenure + \hat\beta_4 \times (Educ \times Tenure) + \cdots$
where Femme is a binary variable taking on the value of one for females and is zero otherwise, Educ is the
number of years of education, and tenure is continuous years of work with the current employer. What is the
effect of an additional year of education on earnings (“returns to education”) for men? For women? If you
allowed for the returns to education to differ for males and females, how would you respecify the above
equation? What is the effect of an additional year of tenure with a current employer on earnings?
Answer: For both males and females, the effect of an additional year of education is
$\dfrac{\partial \ln(Earn)}{\partial Educ} = \hat\beta_2 + \hat\beta_4 \times Tenure$, and hence depends on continuous years of work with the current
employer. To allow the effect to be different for males and females, an interaction variable between
Femme and Educ would have to be introduced. The return to tenure with a current employer is
$\dfrac{\partial \ln(Earn)}{\partial Tenure} = \hat\beta_3 + \hat\beta_4 \times Educ$.
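The two marginal effects can be evaluated at different values of the other variable. The sketch below assumes hypothetical coefficient values (they are not estimates from the text) to show how the return to education changes with tenure and vice versa.

```python
# Hypothetical coefficient values, for illustration only.
B2_EDUC = 0.08       # coefficient on Educ
B3_TENURE = 0.02     # coefficient on Tenure
B4_INTERACT = 0.001  # coefficient on Educ x Tenure

def return_to_education(tenure: float) -> float:
    """Effect of one more year of education on ln(Earn), given tenure."""
    return B2_EDUC + B4_INTERACT * tenure

def return_to_tenure(educ: float) -> float:
    """Effect of one more year of tenure on ln(Earn), given education."""
    return B3_TENURE + B4_INTERACT * educ

for tenure in (0, 10, 20):
    print(f"Tenure = {tenure:2d}: return to education = {return_to_education(tenure):.3f}")
for educ in (12, 16):
    print(f"Educ = {educ:2d}: return to tenure = {return_to_tenure(educ):.3f}")
```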
14) Many countries that experience hyperinflation do not have market-determined interest rates. As a result, some
authors have substituted future inflation rates into money demand equations of the following type as a proxy:
$m = \beta_0 \times (1 + \Delta \ln P)^{\beta_1} \times e^{u}$
(m is real money, and P is the consumer price index).
Income is typically omitted since movements in it are dwarfed by money growth and the inflation rate.
Authors have then interpreted $\beta_1$ as the “semi-elasticity” of the inflation rate. Do you see any problems with
this interpretation?
Answer: Linearizing the above equation results in $\ln(m) = \ln(\beta_0) + \beta_1 \ln(1 + \Delta \ln P) + u$. Now this simplifies to
$\ln(m) = \ln(\beta_0) + \beta_1 \Delta \ln P + u$ if $\Delta \ln P$, the inflation rate, is small. In that case, $\beta_1$ represents the effect of a
percentage point increase in the inflation rate on the demand for money. However, if the inflation rate is not
small, as is the case in hyperinflations, then the approximation does not hold any longer.
15) To investigate whether or not there is discrimination against a sub-group of individuals, you regress the log of
earnings on determining variables, such as education, work experience, etc., and a binary variable which takes
on the value of one for individuals in that sub-group and is zero otherwise. You consider two possible
specifications. First you run two separate regressions, one for the observations that include the sub-group and
one for the others. Second, you run a single regression, but allow for a binary variable to appear in the
regression. Your professor suggests that the second equation is better for the task at hand, as long as you allow
for a shift in both the intercept and the slopes. Explain her reasoning.
Answer: By running the regression over the entire sample, you can test for equality of coefficients, or
alternatively, for the significance of the binary variable coefficients. Also, the combined sample has more
observations, and hence smaller standard errors.
16) Being a competitive female swimmer, you wonder if women will ever be able to beat the time of the male gold
medal winner. To investigate this question, you collect data for the Olympic Games since 1910. At first you
consider including various distances, a binary variable for Mark Spitz, and another binary variable for the
arrival and presence of East German female swimmers, but in the end decide on a simple linear regression.
Your dependent variable is the ratio of the fastest women’s time to the fastest men’s time in the 100 m
backstroke, and the explanatory variable is the year of the Olympics. The regression result is as follows,
TFoverM = 4.42 – 0.0017 × Olympics,
where TFoverM is the relative time of the gold medal winner, and Olympics is the year of the Olympic Games.
What is your prediction when females will catch up to men in this discipline? Does this sound plausible? What
other functional form might you want to consider?
Answer: According to the above regression, women will catch up in the year 2011.76 or 2012. (This happens to be
an Olympics year.) This is not plausible for swimming, and a better functional form would be
$TFoverM = \beta_0 + \beta_1 \dfrac{1}{Olympics}$.
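The catch-up year follows from setting the predicted ratio equal to one and solving for the year. A minimal Python sketch using the two reported coefficients; the year 2100 is an arbitrary illustration of what the linear form implies further out.

```python
INTERCEPT = 4.42   # reported intercept
SLOPE = -0.0017    # reported coefficient on the Olympic year

def predicted_ratio(year: float) -> float:
    """Fitted ratio of the fastest women's time to the fastest men's time."""
    return INTERCEPT + SLOPE * year

def catch_up_year() -> float:
    """Year at which the fitted ratio equals one."""
    return (1.0 - INTERCEPT) / SLOPE

print(f"catch-up year: {catch_up_year():.2f}")
print(f"fitted ratio in 2100: {predicted_ratio(2100):.2f}")
```

Because the fitted line keeps falling, it implies a ratio below one after the catch-up year, which is one reason the linear form is implausible here.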
17) Sketch for the log-log model what the relationship between Y and X looks like for various parameter values of
the slope, i.e., $\beta_1 > 1$; $0 < \beta_1 < 1$; $\beta_1 = -1$.
Answer:
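The three shapes can be tabulated numerically before sketching them. The Python sketch below assumes A = 1 in Y = A × X^slope and uses the illustrative slope values 1.5, 0.5, and -1; both choices are assumptions made only for this example.

```python
def loglog_curve(x: float, beta1: float, a: float = 1.0) -> float:
    """Level relationship implied by the log-log model: Y = a * X**beta1."""
    return a * x ** beta1

xs = [0.5, 1.0, 2.0, 4.0]
for beta1 in (1.5, 0.5, -1.0):
    ys = [round(loglog_curve(x, beta1), 3) for x in xs]
    print(f"beta1 = {beta1:4.1f}: Y at X = {xs} is {ys}")
```

For a slope above one, Y rises at an increasing rate; for a slope between zero and one, Y rises at a decreasing rate; for a slope of -1, Y = A/X falls toward zero as X grows.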
18) Show that for the following regression model
$Y_t = e^{\beta_0 + \beta_1 \times t + u}$
where t is a time trend, which takes on the values 1, 2, …, T, $\beta_1$ represents the instantaneous (“continuous
compounding”) growth rate. Show how this rate is related to the proportionate rate of growth, which is
calculated from the relationship
$Y_t = Y_0 \times (1 + g)^t$
when time is measured in discrete intervals.
Answer: $\ln(Y_t) = \beta_0 + \beta_1 \times t + u$ and hence $\beta_1 = \dfrac{\partial \ln(Y_t)}{\partial t} = \dfrac{1}{Y_t} \dfrac{\partial Y_t}{\partial t}$. From $Y_t = Y_0 \times (1 + g)^t$, we get
$\ln(Y_t) = \ln(Y_0) + \ln(1 + g) \times t = \beta_0 + \beta_1 t$, where $\beta_0 = \ln(Y_0)$ and $\beta_1 = \ln(1 + g) \approx g$ for small g. Hence if g is small, then
regressing the log of a variable on time generates a slope coefficient which is approximately the
proportionate rate of growth.
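The relationship between the regression slope and g can be illustrated with an error-free series. The sketch below builds a series that grows at a constant rate, regresses its log on time, and compares the slope with g; the starting value, growth rates, and number of periods are hypothetical.

```python
import math

def log_slope(y0: float, g: float, periods: int) -> float:
    """OLS slope of ln(Y_t) on t for Y_t = y0 * (1 + g)**t, t = 1..periods."""
    ts = list(range(1, periods + 1))
    logs = [math.log(y0 * (1 + g) ** t) for t in ts]
    t_bar = sum(ts) / len(ts)
    l_bar = sum(logs) / len(logs)
    num = sum((t - t_bar) * (l - l_bar) for t, l in zip(ts, logs))
    den = sum((t - t_bar) ** 2 for t in ts)
    return num / den

for g in (0.03, 0.50):
    slope = log_slope(100.0, g, 40)
    print(f"g = {g:.2f}: slope = {slope:.4f}, ln(1+g) = {math.log(1 + g):.4f}")
```

The slope equals ln(1 + g) in both cases, but it is only close to g itself when g is small.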
19) Your task is to estimate the ice cream sales for a certain chain in New England. The company makes available
to you quarterly ice cream sales (Y) and informs you that the price per gallon has approximately remained
constant over the sample period. You gather information on average daily temperatures (X) during these
quarters and regress Y on X, adding seasonal binary variables for spring, summer, and fall. These variables are
constructed as follows: DSpring takes on a value of 1 during the spring and is zero otherwise, DSummer takes
on a value of 1 during the summer, etc. Specify three regression functions where the following conditions hold:
the relationship between Y and X is (i) forced to be the same for each quarter; (ii) allowed to have different
intercepts each season; (iii) allowed to have varying slopes and intercepts each season. Sketch the difference
between (i) and (ii). How would you test which model fits the data the best?
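Answer: (i) $Y_i = \beta_0 + \beta_1 X_i + u_i$;
(ii) $Y_i = \beta_0 + \beta_1 X_i + \beta_2 DSpring + \beta_3 DSummer + \beta_4 DFall + u_i$;
(iii) $Y_i = \beta_0 + \beta_1 X_i + \beta_2 DSpring + \beta_3 DSummer + \beta_4 DFall + \beta_5 (DSpring \times X_i) + \beta_6 (DSummer \times X_i) + \beta_7 (DFall \times X_i) + u_i$;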
(iii) is the most general of the models, the others are nested. Hence you can use the F-test to see if certain
restrictions hold. For example, (i) is a parsimonious representation of (iii) if all coefficients involving the
seasonal binary variables are simultaneously equal to zero.
20) In estimating the original relationship between money wage growth and the unemployment rate, Phillips used
United Kingdom data from 1861 to 1913 to fit a curve of the following functional form
$\left(\dfrac{\dot{W}}{W} + \beta_0\right) = \beta_1 \times ur^{\beta_2} \times e^{u}$,
where $\dfrac{\dot{W}}{W}$ is the percentage change in money wages and ur is the unemployment rate. Sketch the function.
What role does $\beta_0$ play? Can you find a linear transformation that allows you to estimate the above function
using OLS? If, after taking logarithms on both sides of the equation, you tried to estimate $\beta_1$ and $\beta_2$ using OLS
by choosing different values for $\beta_0$ by “trial and error procedure” (Phillips’s words), what sort of problem
might you run into with the left-hand side variable for some of the observations?
Answer: Given the shape of the Phillips curve, $\beta_2$ will be negative and $\beta_1$ will be positive. Hence for large values
of ur, $\beta_1 \times ur^{\beta_2}$ will be approximately zero, and $-\beta_0$ is the lower asymptote of $\dfrac{\dot{W}}{W}$. Taking logarithms on
both sides results in $\ln\left(\dfrac{\dot{W}}{W} + \beta_0\right) = \ln(\beta_1) + \beta_2 \ln(ur) + u$, which cannot be estimated by OLS due to the
form of the dependent variable. Choosing different values for $\beta_0$ can result in situations where
$\left(\dfrac{\dot{W}}{W} + \beta_0\right)$ is negative and hence its logarithm is not defined.
21) Using a spreadsheet program such as Excel, plot the following logistic regression function with a single X,
$\hat{Y}_i = \dfrac{1}{1 + e^{-(\hat\beta_0 + \hat\beta_1 X_i)}}$, where $\hat\beta_0 = -4.13$ and $\hat\beta_1 = 5.37$. Enter values of X in the first column starting from 0 and then
incrementing these by 0.1 until you reach 2.0. Then enter the logistic function formula in the next column.
Finally produce a scatter plot, connecting the predicted values with a line.
Answer:
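The same two columns can be generated outside a spreadsheet. The following is a minimal Python sketch that tabulates the function with the coefficients given in the question; the tab-separated layout is an arbitrary choice made so the output can be pasted into a plotting tool.

```python
import math

B0 = -4.13  # intercept given in the question
B1 = 5.37   # slope given in the question

def logistic(x: float) -> float:
    """Predicted value 1 / (1 + exp(-(b0 + b1 * x)))."""
    return 1.0 / (1.0 + math.exp(-(B0 + B1 * x)))

# X runs from 0 to 2.0 in steps of 0.1 (21 values).
xs = [i / 10 for i in range(21)]
for x in xs:
    print(f"{x:.1f}\t{logistic(x):.4f}")
```

The predicted values rise in an S-shape from close to zero at X = 0 to close to one at X = 2.0, crossing 0.5 where X equals 4.13/5.37.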
22) Table 8.1 on page 284 of your textbook displays the following estimated earnings function in column (4):
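$\widehat{\ln(earnings)} = \underset{(0.023)}{1.503} + \underset{(0.0012)}{0.1032} \times educ - \underset{(0.024)}{0.451} \times DFemme + \underset{(0.0017)}{0.0143} \times (DFemme \times educ)$
$\quad + \underset{(0.0012)}{0.0232} \times exper - \underset{(0.000023)}{0.000368} \times exper^2 - \underset{(0.006)}{0.058} \times Midwest - \underset{(0.006)}{0.0098} \times South - \underset{(0.007)}{0.030} \times West$
n = 52,790, $R^2 = 0.267$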
Given that the potential experience variable (exper) is defined as (Age-Education-6) find the age at
which individuals with a high school degree (12 years of education) and with a college degree (16
years of education) have maximum earnings, holding all other factors constant.
Answer: The answer can be found either by using calculus or graphical/spreadsheet techniques. Maximum
earnings occurs at potential experience of 31.5. Hence with 12 years of education, the maximum earnings
happen at age 49.5, while for a person with 16 years of education these occur at 53.5 years. (Since taking
logarithms results in a monotonic transformation of the original data, the same results hold for the log
of earnings as for earnings).
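The calculus route sets the derivative of the quadratic in exper to zero, which gives exper* = -(coefficient on exper) / (2 × coefficient on exper squared). A minimal Python sketch using the two reported coefficients and the definition exper = Age - Education - 6:

```python
B_EXPER = 0.0232      # reported coefficient on exper
B_EXPER2 = -0.000368  # reported coefficient on exper squared

def peak_experience() -> float:
    """Potential experience at which predicted log earnings are maximized."""
    return -B_EXPER / (2 * B_EXPER2)

def peak_age(educ: float) -> float:
    """Age implied by exper = Age - Education - 6 at the earnings peak."""
    return peak_experience() + educ + 6

print(f"peak experience: {peak_experience():.1f}")
for educ in (12, 16):
    print(f"education = {educ}: peak age = {peak_age(educ):.1f}")
```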
23) Consider a typical beta convergence regression function from macroeconomics, where the growth of a country’s
per capita income is regressed on the initial level of per capita income and various other economic and
socio-economic variables. Assume that two of these variables are the average number of years of education in
the specific country and a binary variable which indicates whether or not the country experienced a significant
number of years of civil war/unrest. Explain why it would make sense to have these two variables enter
separately and also why you should use an interaction term. What signs would you expect on the three
coefficients?
Answer: Simple extensions of the standard neoclassical growth model suggest that the number of years of
education has a positive effect on conditional growth in the wealth of a nation (per capita income). A
civil war would have a negative effect on the investment/output ratio (savings rate) and you would
therefore expect a negative sign on the coefficient. However, it is important to interact the variables
because no matter how much education the average person has, there will be virtually no investment in
a country during a civil war. Hence you would expect a negative sign, which would indicate the effect
that a civil war has on the education effect.
24) Consider the following regression of testscores on an intercept, a binary variable that equals 1 if the
student-teacher ratio is 20 or more (HiSTR) and another binary variable that equals 1 if the percentage of
English learners is 10% or more (HiEL).
TestScore = 664.1 - 1.9×HiSTR - 18.2×HiEL - 3.5×(HiSTR×HiEL)
Using the two by two table below, fill in the expected testscores of a student with various combinations of the
high/low student teacher ratio and the high/low percent of English learners.
            STR < 20    STR ≥ 20
EL < 10%
EL ≥ 10%
Answer:
            STR < 20    STR ≥ 20
EL < 10%    664.1       662.2
EL ≥ 10%    645.9       640.5
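Each cell of the two by two table is the fitted value for one combination of the two binary variables. A minimal Python sketch using the reported coefficients:

```python
def predicted_score(hi_str: int, hi_el: int) -> float:
    """Fitted test score for a given combination of HiSTR and HiEL (each 0 or 1)."""
    return 664.1 - 1.9 * hi_str - 18.2 * hi_el - 3.5 * hi_str * hi_el

for hi_el in (0, 1):
    row = [round(predicted_score(hi_str, hi_el), 1) for hi_str in (0, 1)]
    print(f"HiEL = {hi_el}: STR < 20 -> {row[0]}, STR >= 20 -> {row[1]}")
```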
Chapter 9 Assessing Studies Based on Multiple Regression
9.1 Multiple Choice
1) The analysis is externally valid if
A) the statistical inferences about causal effects are valid for the population being studied.
B) the study has passed a double blind refereeing process for a journal.
C) its inferences and conclusions can be generalized from the population and setting studied to other
populations and settings.
D) some committee outside the author’s department has validated the findings.
Answer: C
2) By including another variable in the regression, you will
A) decrease the regression R2 if that variable is important.
B) eliminate the possibility of omitted variable bias from excluding that variable.
C) look at the t-statistic of the coefficient of that variable and include the variable only if the coefficient is
statistically significant at the 1% level.
D) decrease the variance of the estimator of the coefficients of interest.
Answer: B
3) Errors-in-variables bias
A) is present when the probability limit of the OLS estimator is given by $\hat\beta_1 \xrightarrow{p} \beta_1 + \dfrac{\sigma_x^2}{\sigma_x^2 + \sigma_w^2}$.
B) arises when an independent variable is measured imprecisely.
C) arises when the dependent variable is measured imprecisely.
D) always occurs in economics since economic data is never precisely measured.
Answer: B
4) Sample selection bias
A) occurs when a selection process influences the availability of data and that process is related to the
dependent variable.
B) is only important for finite sample results.
C) results in the OLS estimator being biased, although it is still consistent.
D) is more important for nonlinear least squares estimation than for OLS.
Answer: A
5) Simultaneous causality bias
A) is also called sample selection bias.
B) happens in complicated systems of equations called block recursive systems.
C) results in biased estimators if there is heteroskedasticity in the error term.
D) arises in a regression of Y on X when, in addition to the causal link of interest from X to Y, there is a
causal link from Y to X.
Answer: D
6) The reliability of a study using multiple regression analysis depends on all of the following with the exception
of
A) omitted variable bias.
B) errors-in-variables.
C) presence of homoskedasticity in the error term.
D) external validity.
Answer: C
7) A statistical analysis is internally valid if
A) its inferences and conclusions can be generalized from the population and setting studied to other
populations and settings.
B) statistical inference is conducted inside the sample period.
C) the hypothesized parameter value is inside the confidence interval.
D) the statistical inferences about causal effects are valid for the population being studied.
Answer: D
8) The components of internal validity are
A) a large sample, and BLUE property of the estimator.
B) a regression R2 above 0.75 and serially uncorrelated errors.
C) unbiasedness and consistency of the estimator, and desired significance level of hypothesis testing.
D) nonstochastic explanatory variables, and prediction intervals close to the sample mean.
Answer: C
9) A study based on OLS regressions is internally valid if
A) the errors are homoskedastic, and there are no more than two binary variables present among the
regressors.
B) you use a two-sided alternative hypothesis, and standard errors are calculated using the
heteroskedasticity-robust formula.
C) weighted least squares produces similar results, and the t-statistic is normally distributed in large
samples.
D) the OLS estimator is unbiased and consistent, and the standard errors are computed in a way that makes
confidence intervals have the desired confidence level.
Answer: D
10) Panel data estimation can sometimes be used
A) to avoid the problems associated with misspecified functional forms.
B) in case the sum of residuals is not zero.
C) in the case of omitted variable bias when data on the omitted variable is not available.
D) to counter sample selection bias.
Answer: C
11) Misspecification of functional form of the regression function
A) is overcome by adding the squares of all explanatory variables.
B) is more serious in the case of homoskedasticity-only standard error.
C) results in a type of omitted variable bias.
D) requires alternative estimation methods such as maximum likelihood.
Answer: C
12) Errors-in-variables bias
A) is only a problem in small samples.
B) arises from error in the measurement of the independent variable.
C) becomes larger as the variance in the explanatory variable increases relative to the error variance.
D) is particularly severe when the source is an error in the measurement of the dependent variable.
Answer: B
13) A survey of earnings contains an unusually high fraction of individuals who state their weekly earnings in
100s, such as 300, 400, 500, etc. This is an example of
A) errors-in-variables bias.
B) sample selection bias.
C) simultaneous causality bias.
D) companies that typically bargain with workers in 100s of dollars.
Answer: A
14) In the case of a simple regression, where the independent variable is measured with i.i.d. error,
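A) $\hat\beta_1 \xrightarrow{p} \dfrac{\sigma_X^2}{\sigma_X^2 + \sigma_w^2} \beta_1$.
B) $\hat\beta_1 \xrightarrow{p} \dfrac{\sigma_X^2}{\sigma_X^2 + \sigma_w^2}$.
C) $\hat\beta_1 \xrightarrow{p} \dfrac{\sigma_w^2}{\sigma_X^2 + \sigma_w^2} \beta_1$.
D) $\hat\beta_1 \xrightarrow{p} \beta_1 + \dfrac{\sigma_X^2}{\sigma_X^2 + \sigma_w^2}$.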
Answer: A
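The attenuation result for classical measurement error, where the OLS slope converges to the true slope multiplied by the ratio of the regressor variance to the sum of the regressor and measurement-error variances, can be illustrated by simulation. The sketch below is a minimal Python example; the true slope, the variances, the sample size, and the seed are all hypothetical choices.

```python
import random

def ols_slope(xs, ys):
    """Slope from a simple OLS regression of ys on xs (with intercept)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def simulate_slope(beta1, sigma_x, sigma_w, n=20000, seed=0):
    """Regress Y on the mismeasured regressor X + w and return the slope."""
    rng = random.Random(seed)
    x_true = [rng.gauss(0.0, sigma_x) for _ in range(n)]
    y = [beta1 * x + rng.gauss(0.0, 1.0) for x in x_true]
    x_measured = [x + rng.gauss(0.0, sigma_w) for x in x_true]
    return ols_slope(x_measured, y)

# Hypothetical values: true slope 2, equal variances for X and the error w.
BETA1, SIGMA_X, SIGMA_W = 2.0, 1.0, 1.0
plim = SIGMA_X ** 2 / (SIGMA_X ** 2 + SIGMA_W ** 2) * BETA1
estimate = simulate_slope(BETA1, SIGMA_X, SIGMA_W)
print(f"probability limit = {plim:.3f}, simulated slope = {estimate:.3f}")
```

With equal variances the slope is biased toward zero by about one half, and a larger sample does not remove the bias.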
15) In the case of errors-in-variables bias,
A) maximum likelihood estimation must be used.
B) the OLS estimator is consistent if the variance in the unobservable variable is relatively large compared to
variance in the measurement error.
C) the OLS estimator is consistent, but no longer unbiased in small samples.
D) binary variables should not be used as independent variables.
Answer: B
16) Sample selection bias occurs when
A) the choice between two samples is made by the researcher.
B) data are collected from a population by simple random sampling.
C) samples are chosen to be small rather than large.
D) the availability of the data is influenced by a selection process that is related to the value of the dependent
variable.
Answer: D
17) Simultaneous causality
A) means you must run a second regression of X on Y.
B) leads to correlation between the regressor and the error term.
C) means that a third variable affects both Y and X.
D) cannot be established since regression analysis only detects correlation between variables.
Answer: B
18) Correlation of the regression error across observations
A) results in incorrect OLS standard errors.
B) makes the OLS estimator inconsistent, but not unbiased.
C) results in correct OLS standard errors if heteroskedasticity-robust standard errors are used.
D) is not a problem in cross-sections since the data can always be “reshuffled.”
Answer: A
19) Applying the analysis from the California test scores to another U.S. state is an example of looking for
A) simultaneous causality bias.
B) external validity.
C) sample selection bias.
D) internal validity.
Answer: B
20) Comparing the California test scores to test scores in Massachusetts is appropriate for external validity if
A) Massachusetts also allowed beach walking to be an appropriate P.E. activity.
B) the two income distributions were very similar.
C) the student-to-teacher ratio did not differ by more than five on average.
D) the institutional settings in California and Massachusetts, such as organization in classroom instruction
and curriculum, were similar in the two states.
Answer: D
21) The guidelines for whether or not to include an additional variable include all of the following, with the
exception of
A) providing “full disclosure” representative tabulations of the results.
B) testing whether additional questionable variables have nonzero coefficients.
C) determining whether it can be measured in the population of interest.
D) being specific about the coefficient or coefficients of interest.
Answer: C
22) Possible solutions to omitted variable bias, when the omitted variable is not observed, include the following
with the exception of
A) panel data estimation.
B) nonlinear least squares estimation.
C) use of instrumental variables regressions.
D) use of randomized controlled experiments.
Answer: B
23) A possible solution to errors-in-variables bias is to
A) use log-log specifications.
B) choose different functional forms.
C) use the square root of that variable since the error becomes smaller.
D) mitigate the problem through instrumental variables regression.
Answer: D
24) You try to explain the number of IBM shares traded in the stock market per day in 2005. As an independent
variable you choose the closing price of the share. This is an example of
A) simultaneous causality.
B) invalid inference due to a small sample size.
C) sample selection bias since you should analyze more than one stock.
D) a situation where homoskedasticity-only standard errors should be used since you only analyze one
company.
Answer: A
25) In the case of errors-in-variables bias, the precise size and direction of the bias depend on
A) the sample size in general.
B) the correlation between the measured variable and the measurement error.
C) the size of the regression R2 .
D) whether the good in question is price elastic.
Answer: B
26) The question of reliability/unreliability of a multiple regression depends on
A) internal but not external validity
B) the quality of your statistical software package
C) internal and external validity
D) external but not internal validity
Answer: C
27) A statistical analysis is internally valid if
A) all t-statistics are greater than |1.96|
B) the regression R2 > 0.05
C) the population is small, say less than 2,000, and can be observed
D) the statistical inferences about causal effects are valid for the population studied
Answer: D
28) A definition of internal validity is
A) the estimator of the causal effect being unbiased and consistent
B) the estimator of the causal effect being efficient
C) inferences and conclusions being generalized from the population to other populations
D) OLS estimation being available in your statistical package
Answer: A
29) Threats to internal validity lead to
A) perfect multicollinearity
B) the inability to transfer data sets into your statistical package
C) failures of one or more of the least squares assumptions
D) a false generalization to the population of interest
Answer: C
30) The true causal effect might not be the same in the population studied and the population of interest because
A) of differences in characteristics of the population
B) of geographical differences
C) the study is out of date
D) all of the above
Answer: D
9.2 Essays and Longer Questions
1) Until about 10 years ago, most studies in labor economics found a small but significant negative relationship
between minimum wages and employment for teenagers. Two labor economists challenged this perceived
wisdom with a publication in 1992 by comparing employment changes of fast-food restaurants in Texas, before
and after a federal minimum wage increase.
(a) Explain how you would obtain external validity in this field of study.
(b) List the various threats to external validity and suggest how to address them in this case.
Answer: (a) Obtaining external validity involves generalizing the results from the population and setting
under study, in this case Texas. Students familiar with the Card and Krueger literature on minimum
wages will point to the New Jersey/Pennsylvania study, or the high/low impact minimum wage paper
by Card. In general, studies of the effect of minimum wages on employment using data from other states
and/or countries will generate external validity.
(b) The main threats to external validity are the differences between the population and setting studied
versus the population and setting of interest. In particular, there may be geographic and/or time
differences, in that the study may be out of date. Being out of date is not a major concern here, since the
study was done relatively recently. Using data from Texas only could be of concern if you believed that
the Texas fast-food restaurants are different from those elsewhere, say in terms of monopsony power,
the type of teenager they attract, etc. (Students familiar with the literature may point out that no data
was obtained from McDonald’s, but again, that does not pose a particular threat.) Generalizing from
fast-food restaurants to other sectors such as the garment industry, is an entirely different matter, as is
generalizing from teenagers to older workers, especially females. Some authors have established that
increases in minimum wages lead to lower school enrollment rates by whites, who then replace black
fast-food restaurant workers. These types of substitutions are not likely to occur with older workers.
Comparisons with other countries, where cultural differences may be larger than within the United
States, are potentially more problematic.
2) Your textbook used the California Standardized Testing and Reporting (STAR) data set on student test
performance in Chapters 4-7. One justification for putting second to twelfth graders through such an exercise
once a year is to make schools more accountable. The hope is that schools with low scores will improve the
following year and in the future. To test for the presence of such an effect, you collect data from 1,000 L.A.
County schools for grade 4 scores in 1998 and 1999, both for reading (Read) and mathematics (Maths). Both are
on a scale from zero to one hundred. The regression results are as follows (homoskedasticity-only standard
errors in parentheses):
Maths99 = 6.967 + 0.919 Maths98, R2 = 0.825, SER = 7.818
(0.542) (0.013)
Read99 = 4.131 + 0.943 Read98, R2 = 0.887, SER = 6.416
(0.409) (0.011)
(a) Interpret the results and indicate whether or not the coefficients are significantly different from zero. Do the
coefficients have the expected sign and magnitude?
(b) Discuss various threats to internal and external validity, and try to assess whether or not these are likely to
be present in your study.
(c) Changing the estimation method to allow for heteroskedasticity-robust standard errors produces four new
standard errors: (0.539), (0.015), (0.452), and (0.015) in the order of appearance in the two equations above.
Given these numbers, do any of your statements in (b) change? Do you think that the coefficients themselves
changed?
(d) If reading and maths scores were the same in 1999 as in 1998, on average, what coefficients would you
expect for the intercept and the slope? How would you test for the restrictions?
(e) The appropriate F-statistic in (d) is 138.27 for the maths scores, and 104.85 for the reading scores.
Comparing these values to the critical values in the F table, can you reject the null hypothesis in each case?
(f) Your professor tells you that the analysis reminds her of “Galton’s Fallacy.” Sir Francis Galton regressed the
height of children on the average height of their parents. He found a positive intercept and a slope between
zero and one. Being concerned about the height of the English aristocracy, he interpreted the results as
“regression to mediocrity” (hence the name regression). Do you see the parallel?
Answer: (a) High (low) reading and maths scores in 1998 will result in high (low) reading and maths scores in
1999. The slope coefficients suggest a high degree of persistence. However, both regression lines cross
the 45 degree line, thereby implying, implausibly, mean reversion. All coefficients are statistically
significant, and approximately 80 to 90 percent of the variation in the 1999 scores is explained by the
1998 scores.
(b) The biggest threat to internal validity stems from the errors-in-variables problem. Assume that the
test scores in maths in a given year are determined by a given set of factors, such as class size,
socioeconomic variables of the school district, quality of teachers, etc. Let the maths score in the second
year also be determined by the same factors, which are unlikely to change by much between the two
years. Then subtracting the earlier year from the more current year results in a population regression
function with a slope of one and an intercept of zero, and an error term which is correlated with the
previous year’s score. Hence the OLS estimator will be biased downward from one and the intercept will
be biased upward from zero, giving the above result.
There are few threats to internal or external validity present through the other factors, although the L.A.
school district may not be typical when compared to a less urban setting.
(c) The coefficients are unaffected by the choice of standard error calculation. However, hypothesis tests
based on homoskedasticity-only standard errors no longer have the desired significance levels, unless the errors are homoskedastic. There is no
suggestion from the institutional setting of the district that this should be the case here. (Indeed,
homoskedasticity is rejected for the above sample.)
(d) In that case the intercept would be zero, and the slope one. This is a simultaneous hypothesis, and
hence the F-test is appropriate here.
(e) The critical value is 4.61 at the 1% level, thereby comfortably rejecting the null hypothesis in each
case.
(f) The situation is similar here. Instead of regressing the outcome in one period on determining factors,
it is regressed on the outcome in a previous period. In each case the outcome in the previous period is an
imperfect measure of the underlying determinants, that is, it contains measurement error. This results in problems
with internal validity.
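The arithmetic behind answers (a) and (d) can be checked with a short sketch. It uses only the estimates and homoskedasticity-only standard errors printed in the question; the function names are hypothetical. Note that the two separate t-statistics are not a substitute for the joint F-test that the answer calls for.

```python
# Reported OLS estimates from the question: (intercept, slope, se_intercept, se_slope).
# These are the homoskedasticity-only standard errors printed under each equation.
REGRESSIONS = {
    "Maths": (6.967, 0.919, 0.542, 0.013),
    "Read": (4.131, 0.943, 0.409, 0.011),
}


def t_statistic(estimate, hypothesized, std_error):
    """t-statistic for H0: coefficient equals the hypothesized value."""
    return (estimate - hypothesized) / std_error


def crossing_point(intercept, slope):
    """Score at which the fitted line meets the 45-degree line (y = x)."""
    return intercept / (1.0 - slope)


for name, (b0, b1, se0, se1) in REGRESSIONS.items():
    print(
        f"{name}: t(intercept = 0) = {t_statistic(b0, 0.0, se0):.2f}, "
        f"t(slope = 1) = {t_statistic(b1, 1.0, se1):.2f}, "
        f"crosses 45-degree line at {crossing_point(b0, b1):.1f}"
    )
```

Both crossing points fall inside the zero-to-one-hundred score range, which is why the fitted lines predict that schools below the crossing point improve and schools above it decline.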
3) Keynes postulated that the marginal propensity to consume ($MPC = \partial C / \partial Y_{pd}$) is between zero and one. He also
hypothesized that the average propensity to consume ($APC = C / Y_{pd}$) would fall as personal disposable income
$Y_{pd}$ increased.
(a) Specify a linear consumption function. Show that the assumption of a falling APC implies the presence of a
positive intercept.
(b) Using annual per capita data, estimation of the consumption function for the United States results in the
following output for the years 1929-1938:
$\hat{C}_t = 981.35 + 0.735\, Y_{pd,t}$, $R^2 = 0.98$, SER = 50.65
(158.65) (0.038)
Can you reject the null hypothesis that the slope is less than one? Greater than zero? Test the hypothesis that
the intercept is zero. Should you be concerned about the sample size when conducting these tests? What other
threats to internal validity may be present here?
(c) Given the GDP identity for a closed economy,
$Y_t \equiv C_t + I_t + G_t$,
show why economists saw important policy implications in finding an APC that would decrease over time.
(d) Simon Kuznets, who won the Nobel Prize in economics, collected data on consumption expenditures and
national income from 1869 to 1938 and found, using overlapping period averages, that the APC was relatively
constant over this period. To reconcile this finding with the regression results, Milton Friedman, who also won
the Nobel Prize, formulated the “permanent income” hypothesis. In essence, Friedman hypothesized that both
actual consumption and income are measured with error,
$\tilde{C}_t = C_t + v_t$ and $\tilde{Y}_t = Y_t + w_t$,
where $C_t$ and $Y_t$ were called “permanent” consumption and income, respectively, and $v_t$ and $w_t$, the two
measurement errors, were labeled transitory consumption and income. Friedman hypothesized that the
transitory components were purely random error terms, uncorrelated with the permanent parts.
Let permanent consumption and income be related as follows:
$C_t = k \times Y_{pd,t} + u_t$
so that the APC and MPC are the same and constant over time. Furthermore, let both transitory and permanent
income be independent of the error term. Show that by regressing actual consumption on actual income, the
MPC will be downward biased, and the intercept will be greater than zero, even in large samples (to simplify
the analysis, assume that permanent income and all of the errors are i.i.d. and mutually independent).
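Answer: (a) $\hat{C}_i = \hat\beta_0 + \hat\beta_1 Y_{pd,i}$. Dividing both sides by personal disposable income results in
$\frac{\hat{C}_i}{Y_{pd,i}} = APC = \hat\beta_0 \frac{1}{Y_{pd,i}} + \hat\beta_1$. Hence the APC will fall with increases in personal disposable income only if the intercept $\hat\beta_0$ is positive.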
(b) Assuming that all assumptions required for proper inference are satisfied here, the t-statistic for an
MPC of one is –6.97, thereby rejecting the null hypothesis. You can also reject the null hypothesis that the
slope is zero (t-statistic = 19.34) and that the intercept is zero (t-statistic = 6.19). The sample is very small here and certainly less than the number of
observations required to permit the use of the standard normal distribution. There may also be omitted
variables here, such as wealth, the real interest rate, the inflation rate, etc. The functional form may be
misspecified, and there may be errors in variables (permanent income). Perhaps most seriously, there is
simultaneous causality present, given the GDP identity.
(c) Dividing both sides of the identity by GDP results in $1 \equiv \frac{C_t}{Y_t} + \frac{I_t}{Y_t} + \frac{G_t}{Y_t}$. With the APC falling over
time as income increased, either the investment output ratio or the government output ratio would have
to make up for this fall. The likely candidate was the government-expenditure share.
(d) This is the standard errors-in-variables problem discussed in the textbook. Following the derivation
in footnote 2 in the textbook, it is straightforward to show
$\hat\beta_1 \xrightarrow{p} \frac{\sigma_X^2}{\sigma_X^2 + \sigma_w^2}\,\beta_1$, where $X$ is permanent
income, and $w$ is the measurement error in income. Hence the marginal propensity to consume will be
downward biased, or $\hat\beta_1 < k$. For the intercept we get
$\hat\beta_0 = \bar{\tilde{Y}} - \hat\beta_1 \bar{\tilde{X}} = \beta_0 + \beta_1 \bar{\tilde{X}} + \bar{v} - \hat\beta_1 \bar{\tilde{X}}$, and collecting terms results in
$\hat\beta_0 = \beta_0 - (\hat\beta_1 - \beta_1)\bar{\tilde{X}} + \bar{v}$. Therefore
$\hat\beta_0 \xrightarrow{p} \beta_0 + \mu_X \beta_1 \frac{\sigma_w^2}{\sigma_X^2 + \sigma_w^2}$, since
$\hat\beta_1 \xrightarrow{p} \beta_1 - \beta_1 \frac{\sigma_w^2}{\sigma_X^2 + \sigma_w^2}$ and $\bar{\tilde{X}} \xrightarrow{p} \mu_X$. Hence the intercept in the consumption function will be upward biased.
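The large-sample claims in answer (d) can be illustrated with a small simulation. This is a sketch under assumed parameter values (the means, standard deviations, and k = 0.9 are all hypothetical and chosen only for illustration); it regresses measured consumption on measured income and compares the result with the probability limits stated above, taking the true intercept as zero and the true slope as k.

```python
import random
import statistics


def ols(x, y):
    """Return (intercept, slope) of a simple OLS regression of y on x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def simulate(n=50_000, k=0.9, mu_x=1000.0, sd_x=200.0, sd_w=100.0,
             sd_v=50.0, sd_u=20.0, seed=0):
    """Regress measured consumption on measured income under Friedman's setup.

    All parameter values are illustrative assumptions, not estimates.
    """
    rng = random.Random(seed)
    income, consumption = [], []
    for _ in range(n):
        y_perm = rng.gauss(mu_x, sd_x)                       # permanent income
        c_perm = k * y_perm + rng.gauss(0.0, sd_u)           # permanent consumption
        income.append(y_perm + rng.gauss(0.0, sd_w))         # add transitory income
        consumption.append(c_perm + rng.gauss(0.0, sd_v))    # add transitory consumption
    return ols(income, consumption)


k, mu_x, sd_x, sd_w = 0.9, 1000.0, 200.0, 100.0
intercept, slope = simulate(k=k, mu_x=mu_x, sd_x=sd_x, sd_w=sd_w)

# Large-sample values implied by the formulas in the answer (beta_0 = 0, beta_1 = k).
attenuation = sd_x**2 / (sd_x**2 + sd_w**2)
print(f"slope: simulated {slope:.3f}, formula {k * attenuation:.3f}")
print(f"intercept: simulated {intercept:.1f}, formula {mu_x * k * (1 - attenuation):.1f}")
```

With these assumed values the formulas give a slope of 0.72 rather than 0.9 and an intercept of 180 rather than zero, so the estimated APC appears to fall with income even though the permanent APC is constant.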
4) The Phillips curve is a relationship in macroeconomics between the inflation rate (inf) and the unemployment
rate (ur). Estimating the Phillips curve using quarterly data for the United States from 1962:I to 1995:IV, you
find
Inf t = 4.08 + 0.118 urt, R2 = 0.003, SER = 3.148
(1.11) (0.176)
(a) Explain why, at first glance, this is a surprising result.
(b) Do you think that there is omitted variable bias in the regression?
(c) What other threats to internal validity may be present?
(d) If you could find a proper specification for the Phillips curve using United States data, what external
validity criteria would you suggest?
Answer: (a) There is supposed to be a negative relationship between inflation and unemployment.
(b) The omitted variables are inflationary expectations and the natural rate of unemployment.
(c) There is simultaneous causality in that inflation also causes employment and thereby unemployment
in many models. The functional form is most likely incorrect, since the Phillips curve is typically not
shown as a straight line. There may also be omitted variables in the form of supply side shocks.
(d) The most obvious choice would be to estimate the Phillips curve for other countries. It is also possible
to estimate the Phillips curve for a cross-section of countries. Using state data is more problematic since
state unemployment rates vary, but inflation rates are very similar and only exist for certain cities (using
the CPI).
5) You have decided to analyze the year-to-year variation in temperature data. Specifically you want to use this
year’s temperature to predict next year’s temperature for certain cities. As a result, you collect the daily high
temperature (Temp) for 100 randomly selected days in a given year for three United States cities: Boston,
Chicago, and Los Angeles. You then repeat the exercise for the following year. The regression results are as
follows (heteroskedasticity-robust standard errors in parentheses):
$Temp_t^{BOS} = 18.19 + 0.75 \times Temp_{t-1}^{BOS}$; $R^2 = 0.62$, SER = 12.33
(6.46) (0.10)
$Temp_t^{CHI} = 2.47 + 0.95 \times Temp_{t-1}^{CHI}$; $R^2 = 0.93$, SER = 5.85
(3.98) (0.05)
$Temp_t^{LA} = 37.54 + 0.44 \times Temp_{t-1}^{LA}$; $R^2 = 0.18$, SER = 7.17
(15.33) (0.22)
(a) What is the prediction of the above regression for Los Angeles if the temperature in the previous year was
75 degrees? What would be the prediction for Boston?
(b) Assume that the previous year’s temperature gives accurate predictions, on average, for this year’s
temperature. What values would you expect in this case for the intercept and slope? Sketch how each of the
above regressions behaves compared to this line.
(c) After reflecting on the results a bit, you consider the following explanation for the above results. Daily high
temperatures on any given date are measured with error in the following sense: for any given day in any of the
three cities, say January 28, there is a true underlying seasonal temperature ($X$), but each year there are
different temporary weather patterns ($v$, $w$) which result in a temperature $\tilde{X}$ different from $X$. For the two years
in your data set, the situation can be described as follows:
$\tilde{X}_{t1} = X + v_t$ and $\tilde{X}_{t2} = X + w_t$
Subtracting $\tilde{X}_{t1}$ from $\tilde{X}_{t2}$, you get $\tilde{X}_{t2} = \tilde{X}_{t1} + w_t - v_t$. Hence the population parameters for the intercept and
slope are zero and one, as expected. Show that the OLS estimator for the slope is inconsistent, where
$\hat\beta_1 \xrightarrow{p} 1 - \frac{\sigma_v^2}{\sigma_X^2 + \sigma_v^2}$
(d) Use the formula above to explain the differences in the results for the three cities. Is your mathematical
explanation intuitively plausible?
Answer: (a) The prediction for Los Angeles is 70.5 degrees, and for Boston 74.4 degrees.
(b) In that case, the intercept would be zero, and the slope one.
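(c) The derivation follows footnote 2 in the textbook with one modification: $\beta_1 = 1$.
(d) Rewriting $\hat\beta_1 \xrightarrow{p} 1 - \frac{\sigma_v^2}{\sigma_X^2 + \sigma_v^2}$ as $\hat\beta_1 \xrightarrow{p} 1 - \frac{1}{1 + \sigma_X^2 / \sigma_v^2}$ suggests that the slope in the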
temperature regression will be closer to one, the more variation there is in the underlying “true”
temperature. Temperatures in Los Angeles vary the least throughout the year, and you would therefore
expect the largest bias. The slope for Chicago suggests that temperatures there have the most variation.
The standard deviation for the Boston temperature is 19.5 and for Chicago 21.0. However, these are
actual temperature standard deviations. To calculate the variance of X in the above example, you could
collect data over a 100-year period on the same dates and form daily averages. It is the standard
deviation of these temperatures that would most resemble the standard deviation in X.
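The rewritten formula in (d) can be inverted to see what each city's slope would imply about the ratio of seasonal variance to weather variance. The sketch below is a rough illustration only: it treats the reported point estimates as if they were the probability limits and ignores sampling error, which is considerable for Los Angeles (standard error 0.22). The function name is hypothetical.

```python
# Slope estimates reported in the question for each city.
SLOPES = {"Boston": 0.75, "Chicago": 0.95, "Los Angeles": 0.44}


def implied_signal_to_noise(slope):
    """Solve slope = 1 - 1 / (1 + ratio) for ratio = var(X) / var(v)."""
    if not 0.0 < slope < 1.0:
        raise ValueError("the formula only applies for slopes strictly between 0 and 1")
    return slope / (1.0 - slope)


for city, slope in SLOPES.items():
    print(f"{city}: implied var(X)/var(v) = {implied_signal_to_noise(slope):.2f}")
```

The ordering matches the verbal argument: the implied ratio is largest for Chicago and smallest for Los Angeles, where the underlying seasonal temperature varies least.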
6) A study of United States and Canadian labor markets shows that aggregate unemployment rates between the
two countries behaved very similarly from 1920 to 1982, when a two percentage point gap opened between the
two countries, which has persisted over the last 20 years. To study the causes of this phenomenon, you specify
a regression of Canadian unemployment rates on demographic variables, aggregate demand variables, and
labor market characteristics.
(a) Assume that your analysis is internally valid. What would make it externally valid?
(b) If one of the determinants of Canadian unemployment is aggregate United States economic activity (or
perhaps shocks to it), what variable would you suggest as its replacement if you did a similar study for the
United States?
(c) Certain Canadian geographical areas, such as the prairies and British Columbia, seem particularly sensitive
to commodity price shocks (Edmonton’s NHL team is called the Edmonton Oilers). Having collected provincial
data, you establish a relationship between provincial unemployment rates and commodity price changes
(shocks). How would you address external validity now?
Answer: (a) Threats to external validity come from the difference between the population and settings studied
versus the population and settings of interest. Finding, for example, that the variables which characterize
the unemployment insurance system exert an influence on Canadian unemployment, does not
automatically imply that this holds universally. To obtain external validity, the exercise should be
repeated for other geographic units, such as countries or states. If the coefficients are similar, or
differences in coefficients can be explained, then the study is externally valid.
(b) Shocks to world aggregate demand, or to the major trading partners of the United States, would be a
possibility.
(c) The task is to find geographical units that are also sensitive to commodity price changes. Texas,
Louisiana, and Oklahoma would be candidates for obtaining external validity.
7) Several authors have tried to measure the “persistence” in U.S. state unemployment rates by running the
following regression:
$ur_{i,t} = \beta_0 + \beta_1 \times ur_{i,t-k} + z_{i,t}$
where ur is the state unemployment rate, i is the index for the i-th state, t indicates a time period, and typically
k 10.
(a) Explain why finding a slope estimate of one and an intercept of zero is typically interpreted as evidence of
“persistence.”
(b) You collect data on the 48 contiguous U.S. states’ unemployment rates and find the following estimates:
$\widehat{ur}_{i,1995} = 2.25 + 0.60 \times ur_{i,1970}$; $R^2 = 0.40$, SER = 0.90
(0.61) (0.13)
Interpret the regression results.
(c) Analyze the accompanying figure, and interpret the observations for Maryland and for Washington. Do
you find evidence of persistence? How would you test for it?
(d) One of your peers points out that this result makes little sense, since it implies that eventually all states
would have identical unemployment rates. Explain the argument.
(e) Imagine that state unemployment rates were determined by their natural rates and some transitory shock.
The natural rates themselves may be functions of the unemployment insurance benefits of the state,
unionization rates of its labor force, demographics, sectoral composition, etc. The transitory components may
include state-specific shocks to its terms of trade such as raw material movements and demand shocks from
the other states. You specify the i-th state unemployment rate accordingly as follows for the two periods when
you observe it,
$\tilde{X}_{i,t} = X_i + v_{i,t}$ and $\tilde{X}_{i,t-k} = X_i + w_{i,t-k}$,
so that actual unemployment rates are measured with error. You have also assumed that the natural rate is the
same for both periods. Subtracting the second period from the first then results in the following population
regression function:
$\tilde{X}_{i,t} = 0 + 1 \times \tilde{X}_{i,t-k} + (v_{i,t} - w_{i,t-k})$
It is not too hard to show that estimation of the observed unemployment rate in period t on the unemployment
rate in period (t-k) by OLS results in an estimator for the slope coefficient that is biased towards zero. The
formula is
$\hat\beta_1 \xrightarrow{p} 1 - \frac{\sigma_v^2}{\sigma_X^2 + \sigma_v^2}$.
Using this insight, explain over which periods you would expect the slope to be closer to one, and over which
period it should be closer to zero.
(f) Estimating the same regression for a different time period results in
$\widehat{ur}_{i,1995} = 3.19 + 0.27 \times ur_{i,1985}$; $R^2 = 0.21$, SER = 1.03
(0.56) (0.07)
If your above analysis is correct, what are the implications for this time period?
Answer: (a) This result would imply that states with high (low) unemployment rates in the (t-k) period would
have high (low) unemployment rates in period t. Hence high (low) unemployment rates would persist.
(b) A state which had an unemployment rate of 3 percent in 1970 is predicted to have an unemployment
rate of approximately 4 percent in 1995. If the state had a 7 percent unemployment rate in 1970, then the
prediction becomes approximately 6.5 percent. There is no interpretation for the constant. The regression
explains 40 percent of the variation in state unemployment rates in 1995.
(c) Washington had the highest unemployment rate in 1970, namely above 9 percent. There are several
states in 1995 that have higher unemployment rates. Washington seems to have reverted towards the
mean unemployment rate of all states. Maryland had a relatively low unemployment rate in 1970 (about
3.5 percent), but has a relatively higher unemployment rate in 1995. It also has reverted towards the
mean.
(d) The positive intercept and the slope between zero and one imply that high (low) unemployment rate
states will have high (low) unemployment rates in the future, but that they will not be as high (low) as in
the base period. Hence there is mean reversion. The prediction would be that ultimately all states would
end up with identical unemployment rates. However, unemployment rate differences should persist if
there are differences in the natural rates of the state unemployment rates. These may be due to different
sectoral compositions, unemployment insurance benefits, tax rates, etc. Unless states were identical with
regard to these variables, then unemployment rates should differ.
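(e) Noting that $\hat\beta_1 \xrightarrow{p} 1 - \frac{\sigma_v^2}{\sigma_X^2 + \sigma_v^2}$ can be rewritten as $\hat\beta_1 \xrightarrow{p} 1 - \frac{1}{\sigma_X^2 / \sigma_v^2 + 1}$, you would expect $\hat\beta_1$ to lie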
closer to one over time periods when natural rate variations dominate the transitory deviation of state
unemployment rates from their natural rates. Therefore if you attempted to predict the unemployment
rates in the mid 1980s from those in the mid 1970s, then the slope coefficient should be further away
from one. (There are several studies that have found virtually no persistence in state unemployment
rates over this period.)
(f) Following the previous argument, the result suggests that there were more transitory deviations from
the natural rate over this period. The large drop in oil prices, particularly in 1986, comes to mind.
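The peer's argument in (d) can be made concrete by applying the 1970-to-1995 fitted line over and over, feeding each prediction back in as the next starting value. This sketch deliberately adopts the reading that the answer criticizes, namely that the cross-sectional fit describes how every state evolves from one 25-year period to the next; the function names are hypothetical.

```python
def iterate_regression(initial_rate, intercept=2.25, slope=0.60, rounds=10):
    """Apply the fitted line repeatedly, treating each prediction as the next input."""
    path = [initial_rate]
    for _ in range(rounds):
        path.append(intercept + slope * path[-1])
    return path


def fixed_point(intercept=2.25, slope=0.60):
    """Rate that the fitted line maps onto itself: u = intercept + slope * u."""
    return intercept / (1.0 - slope)


for start in (3.0, 9.0):
    path = iterate_regression(start)
    print(f"start {start:.1f} -> after 10 rounds {path[-1]:.3f}")
print(f"fixed point: {fixed_point():.3f}")
```

Under that reading every state drifts to the same rate, 2.25 / (1 - 0.60) = 5.625 percent, whatever its starting point. The errors-in-variables explanation in (e) is what breaks this conclusion: the slope below one reflects transitory shocks, not shrinking differences in natural rates.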
8) Sir Francis Galton (1822-1911), an anthropologist and cousin of Charles Darwin, created the term regression. In
his article “Regression towards Mediocrity in Hereditary Stature,” Galton compared the height of children to
that of their parents, using a sample of 930 adult children and 205 couples. In essence he found that tall (short)
parents will have tall (short) offspring, but that the children will not be quite as tall (short) as their parents, on
average. Hence there is regression towards the mean, or as Galton referred to it, mediocrity. This result is
obviously a fallacy if you attempted to infer behavior over time since, if true, the variance of height in humans
would shrink over generations. This is not the case.
(a) To research this result, you collect data from 110 college students and estimate the following relationship:
studenth = 19.6 + 0.73 × Midparh, R2 = 0.45, SER = 2.0
(7.2) (0.10)
where Studenth is the height of students in inches and Midparh is the average of the parental heights. Values in
parentheses are heteroskedasticity-robust standard errors. Sketching this regression line together with the 45
degree line, explain why the above results suggest “regression to the mean” or “mean reversion.”
(b) Researching the medical literature, you find that height depends, to a large extent, on one gene (“phog”)
and on environmental influences. Let us assume that parents and offspring have the same invariant (over time)
gene and that actual height is therefore measured with error in the following sense,
$\tilde{X}_{i,o} = X_i + v_{i,o}$ and $\tilde{X}_{i,p} = X_i + w_{i,p}$,
where $\tilde{X}$ is measured height, $X$ is the height given through the gene, $v$ and $w$ are environmental influences, and
the subscripts o and p stand for offspring and parents, respectively. Let the environmental influences be
independent from each other and from the gene.
Subtracting the measured height of offspring from the height of parents, what sort of population regression
function do you expect?
(c) How would you test for the two restrictions implicit in the population regression function in (b)? Can you
tell from the results in (a) whether or not the restrictions hold?
(d) Proceeding in a similar way to the proof in your textbook, you can show that
$\hat\beta_1 \xrightarrow{p} 1 - \frac{\sigma_v^2}{\sigma_X^2 + \sigma_v^2}$
for the situation in (b). Discuss under what conditions you will find a slope closer to one for the height
comparison. Under what conditions will you find a slope closer to zero?
(e) Can you think of other examples where Galton’s Fallacy might apply?
Answer: (a) As can be seen in the accompanying graph, the regression line crosses the 45 degree line. Tall (short)
parents will have tall (short) children, but on average, they will not be as tall (short) as their parents.
Hence they will regress to the mean, or mean revert.
(b) $\tilde{X}_{i,o} = 0 + 1 \times \tilde{X}_{i,p} + (v_{i,o} - w_{i,p})$
(c) You would have to test simultaneously whether the intercept is zero and the slope is one. This
requires an F-test. Analyzing the t-statistics above suggests rejection of both hypotheses. However,
testing the hypotheses sequentially is not the same as testing them simultaneously.
(d) The above expression can be rewritten as $\hat\beta_1 \xrightarrow{p} 1 - \frac{1}{\sigma_X^2 / \sigma_v^2 + 1}$. $\hat\beta_1$ will equal unity if there is no
measurement error, or if the variance in the gene is relatively large compared to the measurement error.
(e) Answers will vary by student. There are many examples of Galton’s Fallacy, some of which have been
used in the test bank (state unemployment rates in year t when compared to year t-k; temperatures in a
given city this year compared to the previous year; grade received in the final examination relative to the
midterm grade; mutual fund performance this year versus last year; convergence regressions, sports
performance this year compared to the previous year, etc.).
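Why the result is a fallacy when read as a statement about generations can be seen in a short simulation of the setup in (b). This is a sketch with made-up numbers (mean height, gene and environment standard deviations are all assumptions): parents and offspring share the same gene component and receive independent environmental shocks of equal variance.

```python
import random
import statistics


def simulate_heights(n=20_000, gene_mean=68.0, gene_sd=2.5, env_sd=1.5, seed=1):
    """Draw parent and offspring heights that share a gene component X.

    Every number here is an illustrative assumption, not an estimate.
    """
    rng = random.Random(seed)
    parents, offspring = [], []
    for _ in range(n):
        gene = rng.gauss(gene_mean, gene_sd)              # X_i, common to both generations
        parents.append(gene + rng.gauss(0.0, env_sd))     # X_i + w_{i,p}
        offspring.append(gene + rng.gauss(0.0, env_sd))   # X_i + v_{i,o}
    return parents, offspring


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx


parents, offspring = simulate_heights()
slope = ols_slope(parents, offspring)
print(f"slope of offspring on parents: {slope:.3f}")
print(f"sd of parents: {statistics.stdev(parents):.2f}, "
      f"sd of offspring: {statistics.stdev(offspring):.2f}")
```

The slope comes out well below one, yet the spread of heights is essentially the same in both generations. The regression line shows mean reversion without any shrinking of the variance, which is the point of the fallacy.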
9) Macroeconomists who study the determinants of per capita income (the “wealth of nations”) have been
particularly interested in finding evidence on conditional convergence in the countries of the world. Finding
such a result would imply that all countries would end up with the same per capita income once other
variables such as saving and population growth rates, education, government policies, etc., took on the same
value. Unconditional convergence, on the other hand, does not control for these additional variables.
(a) The results of the regression for 104 countries were as follows,
g6090 = 0.019 – 0.0006 × RelProd60, R2 = 0.00007, SER = 0.016
(0.004) (0.0073),
where g6090 is the average annual growth rate of GDP per worker for the 1960-1990 sample period, and
RelProd60 is GDP per worker relative to the United States in 1960.
For the 24 OECD countries in the sample, the output is
g6090 = 0.048 – 0.0404 × RelProd60, R2 = 0.82, SER = 0.0046
(0.004) (0.0063)
Interpret the results and point out the difference with regard to unconditional convergence.
(b) The “beta-convergence” regressions in (a) are of the following type,
$\frac{\Delta_t \ln Y_{i,t}}{T} = \beta_0 + \beta_1 \ln Y_{i,0} + u_{i,t}$,
where
$\Delta_t \ln Y_{i,t} = \ln Y_{i,t} - \ln Y_{i,0}$, and t and o refer to two time periods, i is the i-th country.
Explain why a significantly negative slope implies convergence (hence the name).
(c) The equation in (b) can be rewritten without any change in information as (ignoring the division by T)
$\ln Y_t = \gamma_0 + \gamma_1 \ln Y_0 + u_t$
In this form, how would you test for unconditional convergence? What would be the implication for
convergence if the slope coefficient were one?
(d) Let’s write the equation in (c) as follows:
$\tilde{Y}_t = \gamma_0 + \gamma_1 \tilde{Y}_0 + u_t$
and assume that the “~” variables contain measurement errors of the following type,
$\tilde{Y}_{i,t} = Y_t^* + v_{i,t}$ and $\tilde{Y}_{i,0} = Y_0^* + w_{i,0}$,
where the “*” variables represent true, or permanent, per capita income components, while v and w are
temporary or transitory components. Subtraction of the initial period from the current period then results in
$\tilde{Y}_{i,t} = (Y_t^* - Y_0^*) + \tilde{Y}_{i,0} + (v_{i,t} - w_{i,0})$
Ignoring, without loss of generality, the constant in the above equation, and making standard assumptions
about the error term, one can show that by regressing current per capita income on a constant and the initial
period per capita income, the slope behaves as follows:
$\hat\gamma_1 \xrightarrow{p} 1 - \frac{\sigma_v^2}{\sigma_{Y^*}^2 + \sigma_v^2}$
Discuss the implications for the convergence results above.
Answer: (a) There is evidence for unconditional convergence among the OECD countries, but not for the
countries of the world as a whole. Only for the OECD countries is the slope coefficient significantly
different from zero.
(b) A significantly negative slope coefficient implies that countries which were further behind initially,
grow faster subsequently. Hence these countries will eventually converge.
(c) Ignoring T above, $\beta_1 = \gamma_1 - 1$, where $\beta_1$ is the slope in (b) and $\gamma_1$ is the slope in (c). Hence for convergence to occur, $\gamma_1$ has to be significantly different
from unity. If it were unity, then there would be no convergence or mean reversion.
(d) If Y is measured with error, perhaps due to a temporary difference resulting from a shock during the
initial year of measurement, then beta will be biased downward, i.e., the regression will indicate
convergence when there is none in truth.
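The step behind answer (c) can be written out explicitly. This short derivation ignores the division by T, as the question does; the symbols $\gamma_0$ and $\gamma_1$ for the coefficients of the levels regression are notation assumed here for illustration.

```latex
\begin{aligned}
\ln Y_{i,t} - \ln Y_{i,0} &= \beta_0 + \beta_1 \ln Y_{i,0} + u_{i,t} \\
\ln Y_{i,t} &= \beta_0 + (1 + \beta_1) \ln Y_{i,0} + u_{i,t} \\
            &= \gamma_0 + \gamma_1 \ln Y_{i,0} + u_{i,t},
\qquad \gamma_0 = \beta_0, \quad \gamma_1 = 1 + \beta_1 .
\end{aligned}
```

Testing $\beta_1 = 0$ in the growth regression is therefore the same as testing $\gamma_1 = 1$ in the levels regression, and a slope estimate pulled below one by measurement error in initial income shows up as a negative $\beta_1$.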
10) One of the most frequently used summary statistics for the performance of a baseball hitter is the so-called
batting average. In essence, it calculates the percentage of hits in the number of opportunities to hit
(appearances “at the plate”). The management of a professional team has hired you to predict next season’s
performance of a certain hitter who is up for a contract renegotiation after a particularly great year. To analyze
the situation, you search the literature and find a study which analyzed players who had at least 50 at bats in
1998 and 1997. There were 379 such players.
(a) The reported regression line in the study is
$Batavg_i^{1998} = 0.138 + 0.467 \times Batavg_i^{1997}$; $R^2 = 0.17$
and the intercept and slope are both statistically significant. What does the regression imply about the
relationship between past performance and present performance? What values would the slope and intercept
have to take on for the future performance to be as good as the past performance, on average?
(b) Being somewhat puzzled about the results, you call your econometrics professor and describe the results to
her. She says that she is not surprised at all, since this is an example of “Galton’s Fallacy.” She explains that Sir
Francis Galton regressed the height of offspring on the mid-height of their parents and found a positive
intercept and a slope between zero and one. He referred to this result as “regression towards mediocrity.” Why
do you think econometricians refer to this result as a fallacy?
(c) Your professor continues by mentioning that this is an example of errors-in-variables bias. What does she
mean by that in general? In this case, why would batting averages be measured with error? Are baseball
statisticians sloppy?
(d) The top three performers in terms of highest batting averages in 1997 were Tony Gwynn (.372), Larry
Walker (.366), and Mike Piazza (.362). Given your answers for the previous questions, what would be your
predictions for the 1998 season?
Answer: (a) The regression implies mean reversion: those players who had a high (low) average in 1997 will have
a high (low) average in 1998, but it will not be as high (low) as before. If the performance was as good or
bad as in the past, then the intercept would have to be zero and the slope one.
(b) If the result were true, then eventually everyone would be of the same height.
(c) Errors-in-variables bias refers to a situation where variables are not measured precisely, but contain
a measurement error. In this situation, the player may have had an extraordinarily good or bad year,
resulting, perhaps, from an injury, adjustments to a new league, a new city, etc. This results in a
measurement error of his underlying ability. It has nothing to do with not measuring the batting average
correctly.
(d) The forecast would be for Tony Gwynn to bat (.312), Larry Walker (.309), and Mike Piazza (.307).
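The forecasts in (d) are fitted values from the reported regression line and can be reproduced as follows. The sketch uses only the coefficients and 1997 averages given in the question; the function name is hypothetical.

```python
# 1997 batting averages quoted in part (d) of the question.
AVERAGES_1997 = {"Tony Gwynn": 0.372, "Larry Walker": 0.366, "Mike Piazza": 0.362}


def predict_1998(avg_1997, intercept=0.138, slope=0.467):
    """Fitted value from the reported regression of 1998 on 1997 averages."""
    return intercept + slope * avg_1997


for player, avg in AVERAGES_1997.items():
    print(f"{player}: {avg:.3f} in 1997 -> predicted {predict_1998(avg):.3f} in 1998")

# Average at which the fitted line predicts no change (it crosses the 45-degree line).
print(f"no-change average: {0.138 / (1 - 0.467):.3f}")
```

The fitted line predicts no change at an average of roughly .259, so all three players, who were far above that level in 1997, are forecast to fall back substantially.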
11) Your textbook compares the results of a regression of test scores on the student-teacher ratio using a sample of
school districts from California and from Massachusetts. Before standardizing the test scores for California,
you get the following regression result:
TestScr = 698.9 - 2.28×STR
n = 420, R2 = 0.051, SER = 18.6
In addition, you are given the following information: the sample mean of the student -teacher ratio is 19.64
with a standard deviation of 1.89, and the standard deviation of the test scores is 19.05.
a. After standardizing the test scores variable and running the regression again, what is the value of the
slope? What is the meaning of this new slope here (interpret the result)?
b. What will be the new intercept? Now that test scores have been standardized, should you interpret the
intercept?
c. Does the regression $R^2$ change between the two regressions? What about the t-statistic for the slope
estimator?
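Answer: a. Standardization of a variable is a simple linear transformation,
$Y_i^* = \frac{Y_i - \bar{Y}}{s_Y} = -\frac{\bar{Y}}{s_Y} + \frac{1}{s_Y} Y_i = a + b Y_i$
(say). Hence the new regression slope will be
$\hat\beta_1^* = \frac{\sum_{i=1}^n y_i^* x_i}{\sum_{i=1}^n x_i^2} = \frac{\sum_{i=1}^n Y_i^* x_i}{\sum_{i=1}^n x_i^2} = \frac{\sum_{i=1}^n (a + b Y_i) x_i}{\sum_{i=1}^n x_i^2} = b \frac{\sum_{i=1}^n Y_i x_i}{\sum_{i=1}^n x_i^2} = b \times \hat\beta_1$,
where lowercase letters denote deviations from sample means, so that $\sum_{i=1}^n x_i = 0$.
The numerical value of the new slope is $-2.28/19.05 \approx -0.12$. The interpretation is as follows: if you
decrease the student-teacher ratio by one, then test scores improve by 0.12 of a standard
deviation of test scores, or $0.1197 \times 19.05 \approx 2.28$ points, which is the original slope (up to rounding).
b. The intercept will be $\hat\beta_0^* = \bar{Y}^* - \hat\beta_1^* \bar{X} = -\hat\beta_1^* \bar{X}$, since the standardized test score has a
sample mean of zero; in this case, $0.1197 \times 19.64 \approx 2.35$. Mathematically speaking, the intercept continues to represent the
(standardized) test score when the student-teacher ratio is zero. This does not make sense
and it is best not to interpret the intercept.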
c. Performing a linear transformation on the regressand (or the regressor for that matter) does not
change the regression $R^2$. It is easy but tedious to show that it is unaffected. Intuitively this makes sense
since otherwise you could affect the goodness of fit by whim (changing the scale of the data). Similarly,
logic dictates that the t-statistic is unaffected.
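The invariance claimed in part c can be checked numerically. The following is a minimal sketch on hypothetical simulated data (not the actual California data set); the helper name `ols_summary` is an assumption made for illustration, and it uses the homoskedasticity-only standard error for brevity rather than the robust one used in the textbook.

```python
import math
import random

def ols_summary(x, y):
    """OLS of y on x: returns (slope, R^2, homoskedasticity-only t-statistic)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    r2 = 1 - ssr / syy
    se = math.sqrt(ssr / (n - 2) / sxx)
    return slope, r2, slope / se

random.seed(0)
# Hypothetical data loosely shaped like the test-score example.
x = [random.gauss(19.64, 1.89) for _ in range(420)]
y = [698.9 - 2.28 * xi + random.gauss(0, 18.6) for xi in x]

# Standardize the regressand: subtract its mean, divide by its standard deviation.
n = len(y)
my = sum(y) / n
sy = math.sqrt(sum((yi - my) ** 2 for yi in y) / (n - 1))
y_std = [(yi - my) / sy for yi in y]

slope, r2, t = ols_summary(x, y)
slope_s, r2_s, t_s = ols_summary(x, y_std)
print(slope, slope_s * sy)  # the slope is rescaled by 1 / s_Y
print(r2, r2_s)             # R^2 is unchanged
print(t, t_s)               # the t-statistic is unchanged
```

Only the slope and intercept change with the scale of the regressand; the fit and the t-statistic do not.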
12) Suppose that you have just read a review of the literature of the effect of beauty on earnings. You were initially
surprised to find a mild effect of beauty even on teaching evaluations at colleges. Intrigued by this effect, you
consider explanations as to why more attractive individuals receive higher salaries. One of the possibilities you
consider is that beauty may be a marker of performance/productivity. As a result, you set out to test whether or
not more attractive individuals receive higher grades (cumulative GPA) at college. You happen to have access
to individuals at two highly selective liberal arts colleges nearby. One of these specializes in Economics and
Government and incoming students have an average SAT of 2,100; the other is known for its engineering
program and has an incoming SAT average of 2,200. Conducting a survey, where you offer students a small
incentive to answer a few questions regarding their academic performance, and taking a picture of these
individuals, you establish that there is no relationship between grades and beauty. Write a short essay using
some of the concepts of internal and external validity to determine if these results are likely to apply to
universities in general.
Answer: Students will consider various points that pose a threat to internal and external validity. Obviously there
is a difference in populations (external validity) between highly selective liberal arts colleges and
universities in general. SAT scores at these colleges are much higher than for the average university. In
addition, the gender composition may be quite different, especially at the engineering school, where males
dominate in terms of student numbers. Even in economics, the ratio of female to male students is
typically 1:2. This is an example of sample selection bias (internal validity). Other potential problems
with this study may include errors-in-variables from students not reporting the correct GPA. However,
this may not be a severe problem since GPA is the dependent variable. There could be a problem if there
are systematic problems in inflating the GPA for lower GPAs. It is also not clear from the setup how
beauty was judged. If judges were chosen who are friends of the individuals, then their judgments may
be biased, which is more severe since beauty is an explanatory variable. The setup also does not indicate
what the control variables are. In the absence of controls, there will be omitted variable bias (internal
validity) since intelligence will clearly be a determining factor of cumulative GPAs.
9.3 Mathematical and Graphical Problems
1) Your textbook gives the following example of simultaneous causality bias of a two equation system:
$Y_i = \beta_0 + \beta_1 X_i + u_i$
$X_i = \gamma_0 + \gamma_1 Y_i + v_i$
In microeconomics, you studied the demand and supply of goods in a single market. Let the demand ($Q_i^D$)
and supply ($Q_i^S$) for the i-th good be determined as follows,
$Q_i^D = \beta_0 - \beta_1 P_i + u_i$,
$Q_i^S = \gamma_0 - \gamma_1 P_i + v_i$,
where P is the price of the good. In addition, you typically assume that the market clears.
Explain how the simultaneous causality bias applies in this situation. The textbook explained a positive
correlation between $X_i$ and $u_i$ for $\gamma_1 > 0$ through an argument that started from “imagine that $u_i$ is negative.”
Repeat this exercise here.
Answer: Although quantities appear on the left-hand side of both equations, this is a system of two equations in
two unknowns, where quantity and price are determined simultaneously by demand and supply.
A negative ui, call it a “demand shock,” decreases the quantity demanded. Since demand equals supply,
this results in a lower quantity traded, and hence a lower price. (At the old price level, there would now
be excess supply, and hence the price would fall.) The negative ui has therefore resulted in a lower price,
and hence the error term in the demand equation is positively correlated with the price in the same
equation.
2) The errors-in-variables model analyzed in the text results in
$\hat\beta_1 \xrightarrow{p} \frac{\sigma_X^2}{\sigma_X^2 + \sigma_w^2} \beta_1$
so that the OLS estimator is inconsistent. Give a condition involving the variances of X and w, under which the
bias towards zero becomes small.
Answer: $\hat\beta_1 \xrightarrow{p} \frac{\sigma_X^2}{\sigma_X^2 + \sigma_w^2} \beta_1 = \frac{1}{1 + \sigma_w^2 / \sigma_X^2} \beta_1$. Hence if the variance of X is large relative to the variance of w, so that variation in
the variable measured with error is dominated by the unobserved component, then the bias disappears.
Also, if there is no measurement error, then $\sigma_w^2 = 0$, and the bias disappears.
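A small simulation illustrates how the attenuation factor depends on the ratio of the two variances. This is only a sketch under assumed values: the true slope, sample size, and standard deviations are hypothetical, and `ols_slope` and `attenuation_factor` are helper names introduced here.

```python
import random

def ols_slope(x, y):
    """Slope of the OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def attenuation_factor(var_x, var_w):
    """Probability limit of the OLS slope divided by the true slope."""
    return var_x / (var_x + var_w)

random.seed(1)
beta1, n, sd_x = 2.0, 50_000, 1.0   # hypothetical true slope and sample size
slopes = {}
for sd_w in (1.0, 0.1):             # large vs. small measurement error
    x = [random.gauss(0, sd_x) for _ in range(n)]
    y = [1.0 + beta1 * xi + random.gauss(0, 1) for xi in x]
    x_obs = [xi + random.gauss(0, sd_w) for xi in x]  # regressor measured with error
    slopes[sd_w] = ols_slope(x_obs, y)
    print(sd_w, slopes[sd_w], beta1 * attenuation_factor(sd_x ** 2, sd_w ** 2))
```

When the measurement-error variance equals the variance of X, the estimated slope is roughly half the true slope; when it is one hundredth of the variance of X, the estimate is close to the true value.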
3) You have been hired as a consultant by a building contractor who has been sued by the owners’
representatives of a large condominium project for shoddy construction work. In order to assess the damages
for the various units, the owners’ association sent out a letter to owners and asked if people were willing to
make their units available for destructive testing. Destructive testing was conducted in some of these units as a
result of the responses. Based on the tests, the owners’ association inferred the damage over the entire condo
complex. Do you think that the inference is valid in this case? Discuss how proper sampling should proceed in
this situation.
Answer: This is clearly a case of sample selection bias which leads to bias in the OLS estimator in general. It
should be clear that inference cannot be conducted properly, since owners who suspect that their unit is
faulty are much more likely to agree to destructive testing of their unit than those who have not
experienced any problems. The proportion of units assumed to be faulty in the population is bound to be
too large when derived through sampling of this type.
The proper sampling method would be to decide on the units to be tested through random sampling. A
random number generator should be used to determine the sampled units. The owners’ association must
guarantee that the randomly selected units are available for destructive testing.
4) Assume that a simple economy could be described by the following system of equations,
$C_t = \beta_0 + \beta_1 Y_t + u_t$
$I_t = \bar{I}$,
where C is consumption, Y is income, and I is investment. (This may be a primitive island society which does
not trade with other islands. There is no government, and the only good consumed and invested (saved) is
sunflower seeds.)
Assume the presence of the GDP identity, Y = C + I. If you estimated the consumption function, what sort of
problem involving internal validity may be present?
Answer: There is simultaneous causality present in the system. Income causes consumption, which in turn
causes income (GDP). A negative consumption “shock,” ut, causes consumption, and hence aggregate
demand, to fall. With lower aggregate demand, not all goods supplied are being sold in the market, and
hence income (Yt) falls. There is therefore a positive correlation between ut and Yt, i.e., the error term
and the regressor are correlated.
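The positive correlation can also be made explicit with a short derivation. It is a sketch under the added assumptions that the marginal propensity to consume lies strictly between zero and one and that the consumption shock has variance $\sigma_u^2$ (a symbol introduced here, not in the question). Substituting the consumption function and the investment equation into the GDP identity gives

$$
Y_t = \frac{\beta_0 + \bar{I} + u_t}{1 - \beta_1}, \qquad
\mathrm{cov}(Y_t, u_t) = \frac{\sigma_u^2}{1 - \beta_1} > 0, \qquad
\mathrm{var}(Y_t) = \frac{\sigma_u^2}{(1 - \beta_1)^2},
$$

$$
\hat\beta_1 \xrightarrow{p} \beta_1 + \frac{\mathrm{cov}(Y_t, u_t)}{\mathrm{var}(Y_t)} = \beta_1 + (1 - \beta_1) = 1 .
$$

With investment constant, consumption equals income minus a constant, so the estimated slope tends to one regardless of the true marginal propensity to consume.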
5) Your professor wants to measure the class’s knowledge of econometrics twice during the semester, once in a
midterm and once in a final. Assume that your performance, and that of your peers, on the day of your
midterm exam only measure knowledge imperfectly and with an error,
$\tilde{X}_i^1 = X_i^1 + w_i^1$,
where $\tilde{X}$ is your exam grade, $X$ is underlying econometrics knowledge, and $w$ is a random error with mean zero
and variance $\sigma_w^2$. $w$ may depend on whether you have a headache that day, whether or not the questions you
had prepared for appeared on the exam, your mood, etc. A similar situation holds for the final, which is exam
two: $\tilde{X}_i^2 = X_i^2 + w_i^2$. What would happen if you ran a regression of grades received by students in the final on
midterm grades?
Answer: This is a typical errors-in-variables problem, which results in a downward biased estimator of the slope.
Subtracting the first equation from the second results in $\tilde{X}_i^2 = (X_i^2 - X_i^1) + \tilde{X}_i^1 + (w_i^2 - w_i^1)$. If
underlying econometrics knowledge at each exam did not change, then the regression should have a
slope of one and a zero intercept. (Alternatively, you can allow for an intercept.) The main point here is
that the performance during the first exam is only an imperfect measure of econometric ability, meaning
that there is measurement error. This results in a correlation between the error term and the regressor,
and the OLS estimator will be inconsistent: $\hat\beta_1 \xrightarrow{p} \frac{\sigma_X^2}{\sigma_X^2 + \sigma_w^2} \beta_1 = \frac{\sigma_X^2}{\sigma_X^2 + \sigma_w^2} < 1$, and so the regression
will display mean reversion: students with high (low) midterm scores will most likely have high (low)
scores in the final, but they will not be quite as high (low) as in the midterm.
6) Consider the one-variable regression model, $Y_i = \beta_0 + \beta_1 X_i + u_i$, where the usual assumptions from Chapter 4
are satisfied. However, suppose that both Y and X are measured with error, $\tilde{Y}_i = Y_i + z_i$ and $\tilde{X}_i = X_i + w_i$. Let
both measurement errors be i.i.d. and independent of both Y and X respectively. If you estimated the
regression model $\tilde{Y}_i = \beta_0 + \beta_1 \tilde{X}_i + v_i$ using OLS, then show that the slope estimator is not consistent.
Answer: The difference from the example used in section 7.2 of the text is that both the regressor and the
dependent variable are measured with error here. Proceeding along the lines in section 7.2, you can write
the population regression equation $Y_i = \beta_0 + \beta_1 X_i + u_i$ in terms of the imprecisely measured variables
$\tilde{Y}_i = \beta_0 + \beta_1 \tilde{X}_i + [\beta_1 (X_i - \tilde{X}_i) + (\tilde{Y}_i - Y_i) + u_i] = \beta_0 + \beta_1 \tilde{X}_i + v_i$
where $v_i = z_i - \beta_1 w_i + u_i$. Hence the dependent variable being measured with error does not cause
additional problems relative to the case discussed in the textbook, but the error term continues to be correlated
with the regressor. As a matter of fact, it is easiest to combine this measurement error with the
population regression error term, i.e., $u_i^* = z_i + u_i$, in which case the derivation shown in Chapter 7
footnote 2 of the textbook holds after making this small adjustment. Note that $\mathrm{cov}(\tilde{X}_i, u_i^*) = \mathrm{cov}(\tilde{X}_i, z_i) + \mathrm{cov}(\tilde{X}_i, u_i) = 0$,
and hence $\mathrm{cov}(\tilde{X}_i, v_i) = -\beta_1 \sigma_w^2$ as before, and $\hat\beta_1 \xrightarrow{p} \frac{\sigma_X^2}{\sigma_X^2 + \sigma_w^2} \beta_1$.
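7) In the simple, one-explanatory variable, errors-in-variables model, the OLS estimator for the slope is
inconsistent. The textbook derived the following result
$\hat\beta_1 \xrightarrow{p} \frac{\sigma_X^2}{\sigma_X^2 + \sigma_w^2} \beta_1$.
Show that the OLS estimator for the intercept behaves as follows in large samples:
$\hat\beta_0 \xrightarrow{p} \beta_0 + \mu_{\tilde{X}} \frac{\sigma_w^2}{\sigma_X^2 + \sigma_w^2} \beta_1$,
where $\bar{\tilde{X}} \xrightarrow{p} \mu_{\tilde{X}}$.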
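Answer: $\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{\tilde{X}} = \beta_0 + \beta_1 \bar{\tilde{X}} + \bar{v} - \hat\beta_1 \bar{\tilde{X}}$, and, collecting terms, this results in $\hat\beta_0 = \beta_0 - (\hat\beta_1 - \beta_1) \bar{\tilde{X}} + \bar{v}$.
Therefore $\hat\beta_0 \xrightarrow{p} \beta_0 + \mu_{\tilde{X}} \beta_1 \frac{\sigma_w^2}{\sigma_X^2 + \sigma_w^2}$, since $\hat\beta_1 \xrightarrow{p} \beta_1 - \beta_1 \frac{\sigma_w^2}{\sigma_X^2 + \sigma_w^2}$.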
8) Assume that you had found correlation of the residuals across observations. This may happen because the
regressor is ordered by size. Your regression model could therefore be specified as follows:
$Y_i = \beta_0 + \beta_1 X_i + u_i$
$u_i = \rho u_{i-1} + v_i$; $|\rho| < 1$.
Furthermore, assume that you had obtained consistent estimates for $\beta_0$, $\beta_1$, $\rho$. If asked to make a prediction for
Y, given a value of X ($= X_j$) and $\hat{u}_{j-1}$, how would you proceed? Would you use the information on the lagged
residual at all? Why or why not?
Answer: Given that the error term for j is related to the error term in j-1, it seems intuitive to use that information
in prediction, i.e., if $Y_{j-1}$ is larger than $\hat\beta_0 + \hat\beta_1 X_{j-1}$, then $Y_j$ will also be larger than $\hat\beta_0 + \hat\beta_1 X_j$, but not by as much
(given $\rho > 0$). Substitution of the second equation into the first equation results in $Y_i = \beta_0 + \beta_1 X_i + \rho u_{i-1} + v_i$.
Hence the predicted value should be calculated as
$\hat{Y}_j = \hat\beta_0 + \hat\beta_1 X_j + \hat\rho \hat{u}_{j-1}$.
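The prediction rule can be written as a small function. This is a minimal sketch: the function name `predict_with_ar1_error` and all numerical values are hypothetical and serve only to show how the lagged residual enters the forecast.

```python
def predict_with_ar1_error(b0_hat, b1_hat, rho_hat, x_j, y_prev, x_prev):
    """Forecast Y_j using the lagged residual
    u_hat_{j-1} = Y_{j-1} - b0_hat - b1_hat * X_{j-1}."""
    u_prev = y_prev - b0_hat - b1_hat * x_prev
    return b0_hat + b1_hat * x_j + rho_hat * u_prev

# Hypothetical estimates and observations, chosen only for illustration.
forecast = predict_with_ar1_error(1.0, 2.0, 0.5, x_j=3.0, y_prev=6.0, x_prev=2.0)
naive = 1.0 + 2.0 * 3.0  # prediction that ignores the lagged residual
print(forecast, naive)
```

Here the previous observation was one unit above its fitted value, so with an autocorrelation estimate of 0.5 the forecast is raised by half a unit relative to the prediction that ignores the lagged residual.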
9) Your textbook only analyzed the case of an error-in-variables bias of the type $\tilde{X}_i = X_i + w_i$. What if the error
were generated in the simple regression model by entering data that always contained the same typographical
error, say $\tilde{X}_i = X_i + a$ or $\tilde{X}_i = bX_i$, where a and b are constants? What effect would this have on your regression
model?
Answer: This would have an effect similar to changing the units of measurement. The measurement error is not
random here, and the bias can be determined exactly.
For the case $\tilde{X}_i = X_i + a$, the slope will be unaffected and the usual properties for the OLS slope estimator
will hold. However, since $\bar{\tilde{X}} = \bar{X} + a$, the estimated intercept becomes $\bar{Y} - \hat\beta_1 \bar{\tilde{X}} = (\bar{Y} - \hat\beta_1 \bar{X}) - \hat\beta_1 a$, so the intercept will be underestimated by the
constant measurement error times the slope.
For the case $\tilde{X}_i = bX_i$, the intercept is unaffected, but the ratio of the estimated slope with measurement
error to the slope without measurement error is $1/b$.
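Both cases can be verified on a toy data set. The sketch below uses made-up numbers; the helper `ols` and the constants `a` and `b` are chosen purely for illustration.

```python
def ols(x, y):
    """Return (intercept, slope) of the OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = 3.0, 2.0

b0, b1 = ols(x, y)
b0_shift, b1_shift = ols([xi + a for xi in x], y)  # regressor entered as X + a
b0_scale, b1_scale = ols([b * xi for xi in x], y)  # regressor entered as b * X

print(b1_shift - b1, b0_shift - (b0 - b1 * a))  # both differences are (numerically) zero
print(b1_scale - b1 / b, b0_scale - b0)         # both differences are (numerically) zero
```

Adding a constant to the regressor leaves the slope unchanged and shifts the intercept by the slope times that constant; multiplying the regressor by b divides the slope by b and leaves the intercept unchanged.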
10) Explain why the OLS estimator for the slope in the simple regression model is still unbiased, even if there is
correlation of the error term across observations.
Answer: The proof for unbiasedness is presented in Appendix 4.3 of the textbook. There
$\hat\beta_1 = \beta_1 + \frac{\frac{1}{n} \sum_{i=1}^n (X_i - \bar{X}) u_i}{\frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2}$, and $E(\hat\beta_1) = \beta_1 + E\left[ \frac{\frac{1}{n} \sum_{i=1}^n (X_i - \bar{X}) u_i}{\frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2} \right]$.
Given the law of iterated expectations, this becomes
$E(\hat\beta_1) = \beta_1 + E\left[ \frac{\frac{1}{n} \sum_{i=1}^n (X_i - \bar{X}) E(u_i | X_1, ..., X_n)}{\frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2} \right]$, and
the second term vanishes due to the least squares assumptions of independence between the error term
and the regressor. The assumption of correlation of the error term across observations has not entered
into the proof. However, it will play a role in the derivation of standard errors.
11) To analyze the situation of simultaneous causality bias, consider the following system of equations:
$Y_i = \beta_0 + \beta_1 X_i + u_i$
$X_i = \gamma_0 + \gamma_1 Y_i + v_i$
Demonstrate the negative correlation between $X_i$ and $u_i$ for $\gamma_1 < 0$, either through mathematics or by
presenting an argument which starts as follows: “Imagine that $u_i$ is negative.”
Answer: The mathematical derivation of the correlation is given in footnote 3 of Chapter 7 in the textbook. Setting
$\gamma_1 < 0$ results in a negative correlation between $X_i$ and $u_i$. A negative shock to the first equation yields a
lower Y. This in turn increases X in the second equation. Hence there is a negative correlation between $X_i$
and $u_i$.
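The sign of the correlation can also be checked by simulation. The sketch below assumes hypothetical parameter values, standard normal shocks that are uncorrelated with each other, and solves the two equations for X; under these assumptions the covariance between X and u equals $\gamma_1 \sigma_u^2 / (1 - \gamma_1 \beta_1)$.

```python
import random

random.seed(2)
beta0, beta1 = 1.0, 1.0     # hypothetical first-equation coefficients
gamma0, gamma1 = 2.0, -0.5  # hypothetical second-equation coefficients, gamma1 < 0
n = 50_000

xs, us = [], []
for _ in range(n):
    u = random.gauss(0, 1)
    v = random.gauss(0, 1)
    # Reduced form for X obtained by substituting the first equation into the second.
    x = (gamma0 + gamma1 * beta0 + gamma1 * u + v) / (1 - gamma1 * beta1)
    xs.append(x)
    us.append(u)

mx, mu = sum(xs) / n, sum(us) / n
cov_xu = sum((x - mx) * (u - mu) for x, u in zip(xs, us)) / (n - 1)
theory = gamma1 * 1.0 / (1 - gamma1 * beta1)  # sigma_u^2 = 1 in this sketch
print(cov_xu, theory)
```

With these values the theoretical covariance is -1/3, and the sample covariance should be close to it.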
12) Think of three different economic examples where cross-sectional data could be collected. Indicate in each of
these cases how you would check if the analysis is externally valid.
Answer: Answers will differ by student. Using U.S. state data to analyze determinants of unemployment or the
effect of minimum wages on employment-population ratios, and using a sample of Canadian provinces,
or other subnational geographical units, may be mentioned. Similarly cross-country comparisons to test
convergence in per capita income could be compared to results within countries. Given the textbook
example, test scores in elementary schools within one state may be validated by using data from another
state.
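13) The textbook derived the following result: $\hat\beta_1 \xrightarrow{p} \frac{\sigma_X^2}{\sigma_X^2 + \sigma_w^2} \beta_1$. Show that this is the same as
$\hat\beta_1 \xrightarrow{p} \beta_1 - \frac{\sigma_w^2}{\sigma_w^2 + \sigma_X^2} \beta_1$.
Answer: $\frac{\sigma_X^2}{\sigma_X^2 + \sigma_w^2} \beta_1 = \frac{\sigma_X^2 + \sigma_w^2 - \sigma_w^2}{\sigma_X^2 + \sigma_w^2} \beta_1 = \left( 1 - \frac{\sigma_w^2}{\sigma_X^2 + \sigma_w^2} \right) \beta_1 = \beta_1 - \frac{\sigma_w^2}{\sigma_X^2 + \sigma_w^2} \beta_1$.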
14) Your textbook has analyzed simultaneous equation systems in the case of two equations,
$Y_i = \beta_0 + \beta_1 X_i + u_i$
$X_i = \gamma_0 + \gamma_1 Y_i + v_i$,
where the first equation might be the labor demand equation (with capital stock and technology being held
constant), and the second the labor supply equation (X being the real wage, and the labor market clears). What
if you had a production function as the third equation
$Z_i = \delta_0 + \delta_1 Y_i + w_i$
where Z is output. If the error terms, u, v, and w, were pairwise uncorrelated, explain why there would be no
simultaneous causality bias when estimating the production function using OLS.
Answer: Although the above system represents three equations in three unknowns, it is “block-recursive,”
meaning that X and Y (the real wage and employment) are completely determined by the first two
equations and independently of the production function (Z). Given the solution for employment (Y), the
third equation solely determines output (Z).
Put differently, if there was a positive shock to the production function, which would result in higher
output, then this would have no effect on employment (Y), and there would therefore be no feedback
into the production function. Hence the error term in the third equation is not correlated with the
regressor.
15) A professor in your microeconomics lectures derived a labor demand curve in the lecture. Given some
reasonable assumptions, she showed that the demand for labor depends negatively on the real wage. You want
to put this hypothesis to the test (“show me”) and collect data on employment and real wages for a certain
industry. You try to estimate the labor demand curve but find no relationship between the two variables. Is
economic theory wrong? Explain.
Answer: This is a case of simultaneous causality. Since there is a supply of labor as well, the real wage depends
on employment, which, in a market-clearing model, is determined by the intersection of supply and
demand. In a Keynesian world with wait unemployment, you would expect a negative relationship
between real wages and employment, given the capital stock and productivity.
16) Your textbook uses the following example of simultaneous causality bias of a two equation system:
$Y_i = \beta_0 + \beta_1 X_i + u_i$
$X_i = \gamma_0 + \gamma_1 Y_i + v_i$
To be more specific, think of the first equation as a demand equation for a certain good, where Y is the quantity
demanded and X is the price. The second equation then represents the supply equation, with a third equation
establishing that demand equals supply. Sketch the market outcome over a few periods and explain why it is
impossible to identify the demand and supply curves in such a situation. Next assume that an additional
variable enters the demand equation: income. In a new graph, draw the initial position of the demand and
supply curves and label them D0 and S0 . Now allow for income to take on four different values and sketch
what happens to the two curves. Is there a pattern that you see which suggests that you might be able to
identify one of the two equations with real-life data?
Answer: See the accompanying graph.
You only observe market outcomes (the intersection of the demand and supply curve). Fitting a
regression line through these points gives you neither the supply curve nor the demand curve,
and hence neither is identified.
The market outcome now generates five observations at the intersection of the two curves. Fitting a line
through the five points will give an estimate of the supply curve. Hence by shifting the demand curve in
this fashion, you can identify the supply curve.
17) Give at least three examples where you could envision errors-in-variables problems. For the case where the
measurement error occurs only for the explanatory variable in the simple regression case, derive
$\hat\beta_1 \xrightarrow{p} \frac{\sigma_X^2}{\sigma_X^2 + \sigma_w^2} \beta_1$.
Answer: Answers will vary by student. Consumption functions are frequently mentioned, where permanent
consumption is proportional to permanent income, both of which differ from actual measures of
consumption and income through transitory components. There are several examples in this chapter of
the test bank where the underlying measure of the regressor is proxied by previous outcomes
(unemployment rates, weather, height, etc.). Students may feel that responses to surveys result in
measurement error, e.g., when people respond to questions regarding their income, their SAT score, and
so forth.
The formula is derived in Chapter 7, footnote 2 of the textbook.
18) Your textbook states that correlation of the error term across observations “will not happen if the data are
obtained by sampling at random from the population.” However, in one famous study of the electric utility
industry, the observations were listed by the size of the output level, from smallest to largest. The pattern of the
residuals was as shown in the figure.
What does this pattern suggest to you?
Answer: The pattern suggests that there is correlation in the error term across observations, and therefore
possibly the presence of an omitted variable, or, most likely here, a misspecification of the functional
form of the regression function.
19) Consider a situation where Y is related to X in the following manner: $Y_i = \beta_0 \times X_i^{\beta_1} \times e^{u_i}$. Draw the
deterministic part of the above function. Next add, in the same graph, a hypothetical Y, X scatterplot of the
actual observations. Assume that you have misspecified the functional form of the regression function and
estimated the relationship between Y and X using a linear regression function. Add this linear regression
function to your graph. Separately, show what the plot of the residuals against the X variable in your
regression would look like.
Answer: See the accompanying graphs.
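The residual pattern can also be reproduced numerically. The sketch below is an illustration only: it assumes hypothetical parameter values with an exponent between zero and one, drops the error term so that the pattern is not blurred by noise, and fits a straight line to the deterministic part of the function.

```python
def ols(x, y):
    """Return (intercept, slope) of the OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

beta0, beta1 = 2.0, 0.5                 # hypothetical values; concave for 0 < beta1 < 1
x = [0.5 + 0.1 * i for i in range(96)]  # grid from 0.5 to 10.0
y = [beta0 * xi ** beta1 for xi in x]   # deterministic part only (u = 0)

b0, b1 = ols(x, y)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(resid[0], max(resid), resid[-1])
```

The residuals from the misspecified linear fit are negative for small X, positive in the middle range, and negative again for large X, which is the systematic pattern a residual plot against X would display.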
20) In macroeconomics, you studied the equilibrium in the goods and money market under the assumption of
prices being fixed in the very short run. The goods market equilibrium was described by the so-called IS
equation
$R_i = \beta_0 - \beta_1 Y_i + u_i$
where R represented the nominal interest rate and Y was real GDP. $\beta_0$ contained variables determined outside
the system, such as government expenditures, taxes, and inflationary expectations.
The money market equilibrium was given by the so-called LM equation
$R_i = \gamma_0 + \gamma_1 Y_i + v_i$
and $\gamma_0$ contained the real money supply and the intercept from the money demand equation.
Show that there is simultaneous causality bias in this situation.
Answer: Consider the case of a positive shock to the LM curve. This will increase the interest rate, which, in
turn, will result in lower output through the IS curve. Hence there is negative correlation between the
error in the LM curve and the regressor, resulting in simultaneous causality bias.
21) Assume the following model of the labor market:
$N^d = \beta_0 + \beta_1 \frac{W}{P} + u$
$N^s = \gamma_0 + \gamma_1 \frac{W}{P} + v$
$N^d = N^s = N$
where N is employment, (W/P) is the real wage in the labor market, and u and v are determinants other
than the real wage which affect labor demand and labor supply (respectively). Let
$E(u) = E(v) = 0$; $\mathrm{var}(u) = \sigma_u^2$; $\mathrm{var}(v) = \sigma_v^2$; $\mathrm{cov}(u,v) = 0$
Assume that you had collected data on employment and the real wage from a random sample of
observations and estimated a regression of employment on the real wage (employment being the
regressand and the real wage being the regressor). It is easy but tedious to show that
$(\hat\beta_1 - \beta_1) \xrightarrow{p} (\gamma_1 - \beta_1) \frac{\sigma_u^2}{\sigma_u^2 + \sigma_v^2} > 0$
since the slope of the labor supply function is positive and the slope of the labor demand function is
negative. Hence, in general, you will not find the correct answer even in large samples.
a. What is this bias referred to?
b. What would the relationship between the variance of the labor supply/demand shift variable have to
be for the bias to disappear?
c. Give an intuitive answer why the bias would disappear in that situation. Draw a graph to illustrate
your argument.
Answer: a. Simultaneous equations bias
b. The variance of v, the shift variable of the labor supply curve, would have to be substantially larger
compared to the variance of the labor demand shift variable.
c. Take the extreme case where the labor demand curve hardly shifts at all, but there are large changes in
the labor supply curve caused by the shift variable v. In that case, the labor supply curve would “trace
out” the labor demand curve. Since in real life you only observe the intersection of the demand and
supply relationship, it becomes clear now why the simultaneous equation bias has been removed.
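The large-sample result and the role of the two variances can be illustrated with a simulation. This is a sketch under hypothetical parameter values (a labor demand slope of -1 and a labor supply slope of +1); the function names are introduced here for illustration only.

```python
import random

def ols_slope(x, y):
    """Slope of the OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def simulated_labor_demand_slope(sd_u, sd_v, n=50_000, seed=3):
    """Regress equilibrium employment on the equilibrium real wage."""
    rng = random.Random(seed)
    beta0, beta1 = 10.0, -1.0   # hypothetical labor demand
    gamma0, gamma1 = 2.0, 1.0   # hypothetical labor supply
    wages, employment = [], []
    for _ in range(n):
        u = rng.gauss(0, sd_u)
        v = rng.gauss(0, sd_v)
        w = (beta0 - gamma0 + u - v) / (gamma1 - beta1)  # market-clearing real wage
        wages.append(w)
        employment.append(beta0 + beta1 * w + u)
    return ols_slope(wages, employment)

def plim_slope(sd_u, sd_v, beta1=-1.0, gamma1=1.0):
    """Large-sample value of the OLS slope implied by the bias formula."""
    return beta1 + (gamma1 - beta1) * sd_u ** 2 / (sd_u ** 2 + sd_v ** 2)

slope_equal = simulated_labor_demand_slope(1.0, 1.0)
slope_supply_dominates = simulated_labor_demand_slope(1.0, 10.0)
print(slope_equal, plim_slope(1.0, 1.0))
print(slope_supply_dominates, plim_slope(1.0, 10.0))
```

With equal shock variances the estimated slope is close to zero, halfway between the demand and supply slopes; when supply shocks dominate, the estimate moves close to the labor demand slope of -1.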
22) To compare the slope coefficient from the California School data set with that of the Massachusetts School data
set, you run the following two regressions:
TestScrCA = 2.35 - 0.123×STRCA
(0.54) (0.027)
n = 420, R2 = 0.051, SER = 0.98
TestScrMA = 1.97 - 0.114×STRMA
(0.57) (0.033)
n = 220, R2 = 0.067, SER = 0.97
Numbers in parenthesis are heteroskedasticity-robust standard errors, and the LHS variable has been
standardized.
Calculate a t-statistic to test whether or not the two coefficients are the same. State the alternative
hypothesis. Which level of significance did you choose?
Answer: $H_0: \beta_{1,CA} = \beta_{1,MA}$; $H_1: \beta_{1,CA} \neq \beta_{1,MA}$; $t = \frac{0.123 - 0.114}{\sqrt{0.027^2 + 0.033^2}} = 0.21$ (in absolute value). Hence you cannot reject the null
hypothesis at any reasonable level of significance. The underlying assumption here is that the two
samples are independent, which seems reasonable.
23) You have read the analysis in chapter 9 and want to explore the relationship between poverty and test scores.
You decide to start your analysis by running a regression of test scores on the percent of students who are
eligible to receive a free/reduced price lunch both in California and in Massachusetts. The results are as
follows:
TestScrCA = 681.44 - 0.610×PctLchCA
(0.99) (0.018)
n = 420, R2 = 0.75, SER = 9.45
TestScrMA = 731.89 - 0.788×PctLchMA
(0.95) (0.045)
n = 220, R2 = 0.61, SER = 9.41
Numbers in parenthesis are heteroskedasticity-robust standard errors.
a. Calculate a t-statistic to test whether or not the two slope coefficients are the same.
b. Your textbook compares the slope coefficients for the student-teacher ratio instead of the percent
eligible for a free lunch. The authors remark: “Because the two standardized tests are different, the
coefficients themselves cannot be compared directly: One point on the Massachusetts test is not the
same as one point on the California test.” What solution do they suggest?
Answer: a. $H_0: \beta_{1,CA} = \beta_{1,MA}$; $H_1: \beta_{1,CA} \neq \beta_{1,MA}$; $t = \frac{0.788 - 0.610}{\sqrt{0.018^2 + 0.045^2}} = 3.67$. Hence you reject the null
hypothesis.
b. The authors suggest standardizing the test score variable in both states by subtracting the mean and
by dividing by the standard deviation.
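The t-statistics for the difference between two coefficients from independent samples, as used in the two preceding questions, can be reproduced with a few lines. The function name `t_stat_difference` is introduced here for illustration; the inputs are the coefficients and standard errors reported in the regressions above.

```python
import math

def t_stat_difference(b_a, se_a, b_b, se_b):
    """t-statistic for H0: b_a = b_b when the two estimates come from independent samples."""
    return (b_a - b_b) / math.sqrt(se_a ** 2 + se_b ** 2)

# Student-teacher ratio slopes (standardized test scores), California vs. Massachusetts.
t_str = t_stat_difference(-0.123, 0.027, -0.114, 0.033)
# Free/reduced price lunch slopes, California vs. Massachusetts.
t_lunch = t_stat_difference(-0.610, 0.018, -0.788, 0.045)
print(abs(t_str), abs(t_lunch))
```

The first statistic is about 0.21 in absolute value and the second about 3.67, so equality of the slopes is not rejected for the student-teacher ratio but is rejected for the lunch variable.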
Chapter 10 Regression with Panel Data
10.1 Multiple Choice
1) The notation for panel data is (Xit, Yit), i = 1, ..., n and t = 1, ..., T because
A) we take into account that the entities included in the panel change over time and are replaced by others.
B) the X’s represent the observed effects and the Y the omitted fixed effects.
C) there are n entities and T time periods.
D) n has to be larger than T for the OLS estimator to exist.
Answer: C
2) The difference between an unbalanced and a balanced panel is that
A) you cannot have both fixed time effects and fixed entity effects regressions.
B) an unbalanced panel contains missing observations for at least one time period or one entity.
C) the impact of different regressors are roughly the same for balanced but not for unbalanced panels.
D) in the former you may not include drivers who have been drinking in the fatality rate/beer tax study.
Answer: B
3) Consider the special panel case where T = 2. If some of the omitted variables, which you hope to capture in the
changes analysis, in fact change over time, then the estimator on the included change regressor
A) will be unbiased only when allowing for heteroskedastic-robust standard errors.
B) may still be unbiased.
C) will only be unbiased in large samples.
D) will always be unbiased.
Answer: B
4) The Fixed Effects regression model
A) has n different intercepts.
B) the slope coefficients are allowed to differ across entities, but the intercept is “fixed” (remains
unchanged).
C) has “fixed” (repaired) the effect of heteroskedasticity.
D) in a log-log model may include logs of the binary variables, which control for the fixed effects.
Answer: A
5) In the Fixed Effects regression model, you should exclude one of the binary variables for the entities when an
intercept is present in the equation
A) because one of the entities is always excluded.
B) because there are already too many coefficients to estimate.
C) to allow for some changes between entities to take place.
D) to avoid perfect multicollinearity.
Answer: D
6) In the Fixed Effects regression model, using (n – 1) binary variables for the entities, the coefficient of the binary
variable indicates
A) the level of the fixed effect of the ith entity.
B) will be either 0 or 1.
C) the difference in fixed effects between the ith and the first entity.
D) the response in the dependent variable to a percentage change in the binary variable.
Answer: C
7) $\mathrm{cov}(u_{it}, u_{is} | X_{it}, X_{is}) = 0$ for $t \neq s$ means that
A) there is no perfect multicollinearity in the errors.
B) division of errors by regressors in different time periods is always zero.
C) there is no correlation over time in the residuals.
D) conditional on the regressors, the errors are uncorrelated over time.
Answer: D
8) With Panel Data, regression software typically uses an “entity-demeaned” algorithm because
A) the OLS formula for the slope in the linear regression model contains deviations from means already.
B) there are typically too many time periods for the regression package to handle.
C) the number of estimates to calculate can become extremely large when there are a large number of
entities.
D) deviations from means sum up to zero.
Answer: C
9) The “before and after” specification, binary variable specification, and “entity-demeaned” specification
produce identical OLS estimates
A) as long as there are observations for more than two time periods.
B) if you use the heteroskedasticity-robust option in your regression program.
C) for the case of more than 100 observations.
D) as long as T = 2 and the intercept is excluded from the “before and after” specification.
Answer: D
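The equivalence for T = 2 can be seen on a toy panel. The sketch below uses made-up data for four entities and compares only two of the three specifications, the “before and after” regression without an intercept and the entity-demeaned regression; the binary variable specification is omitted to keep the example short.

```python
# Hypothetical balanced panel with n = 4 entities and T = 2 periods.
# Each tuple holds the (period 1, period 2) values for one entity.
x = [(1.0, 2.0), (2.0, 2.5), (0.5, 3.0), (4.0, 3.0)]
y = [(3.0, 4.5), (5.0, 5.2), (1.0, 4.0), (9.0, 8.0)]

# "Before and after": regress the change in Y on the change in X, no intercept.
dx = [x2 - x1 for x1, x2 in x]
dy = [y2 - y1 for y1, y2 in y]
slope_diff = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)

# Entity-demeaned: subtract each entity's own time average, then pool and regress.
xd, yd = [], []
for (x1, x2), (y1, y2) in zip(x, y):
    mx, my = (x1 + x2) / 2, (y1 + y2) / 2
    xd += [x1 - mx, x2 - mx]
    yd += [y1 - my, y2 - my]
slope_demeaned = sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)

print(slope_diff, slope_demeaned)
```

With two periods, each demeaned observation is plus or minus half of the entity's change, so the two slope formulas coincide.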
10) In the Fixed Time Effects regression model, you should exclude one of the binary variables for the time periods
when an intercept is present in the equation
A) because the first time period must always be excluded from your data set.
B) because there are already too many coefficients to estimate.
C) to avoid perfect multicollinearity.
D) to allow for some changes between time periods to take place.
Answer: C
11) If you included both time and entity fixed effects in the regression model which includes a constant, then
A) one of the explanatory variables needs to be excluded to avoid perfect multicollinearity.
B) you can use the “before and after” specification even for T > 2.
C) you must exclude one of the entity binary variables and one of the time binary variables for the OLS
estimator to exist.
D) the OLS estimator no longer exists.
Answer: C
12) Consider estimating the effect of the beer tax on the fatality rate, using time and state fixed effects for the
Northeast Region of the United States (Maine, Vermont, New Hampshire, Massachusetts, Connecticut and
Rhode Island) for the period 1991-2001. If Beer Tax was the only explanatory variable, how many coefficients
would you need to estimate, excluding the constant?
A) 18
B) 17
C) 7
D) 11
Answer: B
13) Consider the regression example from your textbook, which estimates the effect of beer taxes on fatality rates
across the 48 contiguous U.S. states. If beer taxes were set nationally by the federal government rather than by
the states, then
A) it would not make sense to use state fixed effects.
B) you can test state fixed effects using homoskedastic-only standard errors.
C) the OLS estimator will be biased.
D) you should not use time fixed effects since beer taxes are the same at a point in time across states.
Answer: D
14) In the panel regression analysis of beer taxes on traffic deaths, the estimation period is 1982-1988 for the 48
contiguous U.S. states. To test for the significance of time fixed effects, you should calculate the F-statistic and
compare it to the critical value from your $F_{q,\infty}$ distribution, where q equals
A) 6.
B) 7.
C) 48.
D) 53.
Answer: A
15) When you add state fixed effects to a simple regression model for U.S. states over a certain time period, and the
regression R2 increases significantly, then it is safe to assume that
A) the included explanatory variables, other than the state fixed effects, are unimportant.
B) state fixed effects account for a large amount of the variation in the data.
C) the coefficients on the other included explanatory variables will not change.
D) time fixed effects are unimportant.
Answer: B
16) Time Fixed Effects regressions are useful in dealing with omitted variables
A) even if you only have a cross-section of data available.
B) if these omitted variables are constant across entities but vary over time.
C) when there are more than 100 observations.
D) if these omitted variables are constant across entities but not over time.
Answer: B
17) Indicate for which of the following examples you cannot use Entity and Time Fixed Effects: a regression of
A) OECD unemployment rates on unemployment insurance generosity for the period 1980-2006 (annual
data).
B) the (log of) earnings on the number of years of education, using the Current Population Survey of 60,000
households for March 2006.
C) the per capita income level in Canadian Provinces on provincial population growth rates, using decade
averages for 1960, 1970, and 1980.
D) the risk premium of 75 stocks on the market premium for the years 1998-2006.
Answer: B
18) Panel data is also called
A) longitudinal data.
B) cross-sectional data.
C) time series data.
D) experimental data.
Answer: A
19) (Requires Appendix material) When the fifth assumption in the Fixed Effects regression
($\mathrm{cov}(u_{it}, u_{is} \mid X_{it}, X_{is}) = 0$ for $t \neq s$) is violated, then
A) using heteroskedasticity-robust standard errors is not sufficient for correct statistical inference when using
OLS.
B) the OLS estimator does not exist.
C) you can use the simple homoskedasticity-only standard errors calculated in your regression package.
D) you cannot use fixed time effects in your estimation.
Answer: A
20) In the panel regression analysis of beer taxes on traffic deaths, the estimation period is 1982-1988 for the 48
contiguous U.S. states. To test for the significance of entity fixed effects, you should calculate the F-statistic and
compare it to the critical value from your $F_{q,\infty}$ distribution, where q equals
A) 48.
B) 54.
C) 7.
D) 47.
Answer: D
21) The main advantage of using panel data over cross sectional data is that it
A) gives you more observations.
B) allows you to analyze behavior across time but not across entities.
C) allows you to control for some types of omitted variables without actually observing them.
D) allows you to look up critical values in the standard normal distribution.
Answer: C
22) One of the following is a regression example for which Entity and Time Fixed Effects could be used: a study of
the effect of
A) minimum wages on teenage employment using annual data from the 48 contiguous states in 2006.
B) various performance statistics on the (log of) salaries of baseball pitchers in the American League and the
National League in 2005 and 2006.
C) inflation and inflationary expectations on unemployment rates in the United States, using quarterly data
from 1960-2006.
D) drinking alcohol on the GPA of 150 students at your university, controlling for incoming SAT scores.
Answer: B
23) Consider a panel regression of unemployment rates for the G7 countries (United States, Canada, France,
Germany, Italy, United Kingdom, Japan) on a set of explanatory variables for the time period 1980-2000
(annual data). If you included entity and time fixed effects, you would need to specify the following number of
binary variables:
A) 21.
B) 6.
C) 28.
D) 26.
Answer: D
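The counting rule behind this answer can be written down as a small helper. This is a sketch only; the function name `n_binary_variables` and its `intercept` flag are hypothetical and not part of any regression package. It assumes one time binary variable is always dropped, and one entity binary variable is dropped only when an intercept is present.

```python
def n_binary_variables(n_entities, n_periods, intercept=True):
    """Count the entity and time binary variables that can be included
    without creating perfect multicollinearity (hypothetical helper)."""
    entity_dummies = n_entities - 1 if intercept else n_entities
    time_dummies = n_periods - 1   # one period is always the base period
    return entity_dummies + time_dummies

# G7 countries, annual data 1980-2000 (21 years), regression with an intercept.
g7 = n_binary_variables(n_entities=7, n_periods=2000 - 1980 + 1)
print(g7)   # 6 entity + 20 time binary variables

# Six states, 1991-2001 (11 years), no intercept: all six state binaries stay in.
northeast = n_binary_variables(n_entities=6, n_periods=11, intercept=False)
print(northeast)
```

Under the no-intercept convention the six-state example gives 16 binary variables, which together with the single beer tax regressor makes 17 coefficients.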
24) A pattern in the coefficients of the time fixed effects binary variables may reveal the following in a study of the
determinants of state unemployment rates using panel data:
A) macroeconomic effects, which affect all states equally in a given year.
B) attitude differences towards unemployment between states.
C) there is no economic information that can be retrieved from these coefficients.
D) regional effects, which affect all states equally, as long as they are a member of that region.
Answer: A
25) In the panel regression analysis of beer taxes on traffic deaths, the estimation period is 1982-1988 for the 48
contiguous U.S. states. To test for the significance of time fixed effects, you should calculate the F-statistic and
compare it to the critical value from your $F_{q,\infty}$ distribution, which equals (at the 5% level)
A) 2.01.
B) 2.10.
C) 2.80.
D) 2.64.
Answer: B
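The critical value can be reproduced without a table, using the fact that an $F_{q,\infty}$ variable is a chi-squared variable with q degrees of freedom divided by q. The sketch below uses only the standard library and relies on the closed-form chi-squared CDF that holds for even degrees of freedom, so it covers q = 6 but not odd q; the function names are hypothetical.

```python
import math

def chi2_cdf_even_df(x, df):
    """Chi-squared CDF, closed form valid only for an even number of df."""
    if df % 2 != 0:
        raise ValueError("this closed form only covers even df")
    half = x / 2.0
    partial_sum = sum(half ** j / math.factorial(j) for j in range(df // 2))
    return 1.0 - math.exp(-half) * partial_sum

def f_q_inf_critical(q, level=0.05):
    """Critical value of F(q, infinity) = chi2(q) / q, found by bisection."""
    lo, hi = 0.0, 200.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf_even_df(mid, q) < 1.0 - level:
            lo = mid
        else:
            hi = mid
    return hi / q

# Seven years of data give q = 7 - 1 = 6 restrictions on the time effects.
crit = f_q_inf_critical(6)
print(round(crit, 2))
```

The result agrees with the tabulated 5% value of 2.10 to two decimal places.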
26) Assume that for the T = 2 time periods case, you have estimated a simple regression in changes model and
found a statistically significant positive intercept. This implies
A) a negative mean change in the LHS variable in the absence of a change in the RHS variable since you
subtract the earlier period from the later period
B) that the panel estimation approach is flawed since differencing the data eliminates the constant (intercept)
in a regression
C) a positive mean change in the LHS variable in the absence of a change in the RHS variable
D) that the RHS variable changed between the two subperiods
Answer: C
27) HAC standard errors and clustered standard errors are related as follows:
A) they are the same
B) clustered standard errors are one type of HAC standard error
C) they are the same if the data is differenced
D) clustered standard errors are the square root of HAC standard errors
Answer: B
28) In panel data, the regression error
A) is likely to be correlated over time within an entity
B) should be calculated taking into account heteroskedasticity but not autocorrelation
C) only exists for the case of T > 2
D) fits all of the three descriptions above
Answer: A
29) It is advisable to use clustered standard errors in panel regressions because
A) without clustered standard errors, the OLS estimator is biased
B) hypothesis testing can proceed in a standard way even if there are few entities (n is small)
C) they are easier to calculate than homoskedasticity-only standard errors
D) the fixed effects estimator is asymptotically normally distributed when n is large
Answer: D
30) If Xit is correlated with Xis for different values of s and t, then
A) Xit is said to be autocorrelated
B) the OLS estimator cannot be computed
C) statistical inference cannot proceed in a standard way even if clustered standard errors are used
D) this is not of practical importance since these correlations are typically weak in applications
Answer: A
10.2 Essays and Longer Questions
1) A study, published in 1993, used U.S. state panel data to investigate the relationship between minimum wages
and employment of teenagers. The sample period was 1977 to 1989 for all 50 states. The author estimated a
model of the following type:
$\ln(E_{it}) = \beta_0 + \beta_1 \ln(M_{it}/W_{it}) + \gamma_2 D2_i + \dots + \gamma_n D50_i + \delta_2 B2_t + \dots + \delta_T B13_t + u_{it}$,
where E is the employment to population ratio of teenagers, M is the nominal minimum wage, and W is
average hourly earnings in manufacturing. In addition, other explanatory variables, such as the adult
unemployment rate, the teenage population share, and the teenage enrollment rate in school, were included.
(a) Name some of the factors that might be picked up by time and state fixed effects.
(b) The author decided to use eight regional dummy variables instead of the 49 state dummy variables. What is
the implicit assumption made by the author? Could you test for its validity? How?
(c) The results, using time and region fixed effects only, were as follows:
$\widehat{\ln E_{it}} = -\underset{(0.036)}{0.182} \times \ln(M_{it}/W_{it}) + \dots$; $R^2 = 0.727$
Interpret the result briefly.
(d) State minimum wages do not exceed federal minimum wages often. As a result, the author decided to
choose the federal minimum wage in his specification above. How does this change your interpretation? How
is the original equation
$\ln(E_{it}) = \beta_0 + \beta_1 \ln(M_{it}/W_{it}) + \gamma_2 D2_i + \dots + \gamma_n D8_i + \delta_2 B2_t + \dots + \delta_T B13_t + u_{it}$,
affected by this?
Answer: (a) Time effects will pick up the effect of omitted variables that are common to all 50 states at a given
point in time. Federal fiscal and monetary variables, exchange rate and U.S. terms of trade movements,
aggregate business cycle developments, etc., are candidates here. State fixed effects will include variables
that are slowly changing over time within a specific state such as attitudes toward employment or labor
force participation, state specific labor market policies, industrial and labor force composition, etc.
(b) The implicit assumption by the author is that the coefficients on the state fixed effects are identical
within a region but differ between regions. Since these coefficients imply linear restrictions, they can be
tested using the F-test.
(c) Consider a ten percent increase in minimum wages, say from $5 to $5.50, with constant average
hourly earnings. This corresponds to a ten percent increase in relative minimum wages. Because the
specification is log-log, the coefficient is an elasticity: the teenage employment to population ratio
falls by about 1.8 percent (0.182 × 10), or almost 2 percent. The regression explains roughly 73
percent of the variation in the (log) employment to population ratio of teenagers during the period
1977 to 1989 for the 50 U.S. states.
(d) This choice in effect drops the i subscript from the minimum wage, since there is no variation by
state. The original equation then reads
$\ln(E_{it}) = \beta_0 + \beta_1 \ln(M_t/W_{it}) + \gamma_2 D2_i + \dots + \gamma_n D8_i + \delta_2 B2_t + \dots + \delta_T B13_t + u_{it}$.
Furthermore, since the federal minimum wage is constant across the nine regions at a point in time, it is
absorbed by the time effects. The coefficient on the relative minimum wage therefore reflects regional
variations in average hourly earnings in manufacturing. The minimum wage only enters indirectly,
since a given federal minimum wage implies a different level relative to average hourly earnings
in each region.
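The arithmetic in part (c) can be sketched as follows. Only the coefficient and standard error come from the reported regression; the variable names are illustrative, and the "exact" figure simply applies the log-log functional form to a ten percent change rather than using the elasticity approximation.

```python
import math

beta_1 = -0.182       # reported coefficient on ln(M/W)
se_beta_1 = 0.036     # reported standard error

pct_change_wage = 10.0   # the ten percent increase discussed in part (c)

# Elasticity approximation: percent change in E is beta_1 times percent change in M/W.
approx_pct = beta_1 * pct_change_wage

# Exact change implied by the log-log form: E scales by (1.10) ** beta_1.
exact_pct = (math.exp(beta_1 * math.log(1 + pct_change_wage / 100)) - 1) * 100

# t-statistic for the null hypothesis that the elasticity is zero.
t_stat = beta_1 / se_beta_1

print(round(approx_pct, 2), round(exact_pct, 2), round(t_stat, 2))
```

The approximation and the exact calculation differ only slightly for a change of this size, and the t-statistic is well beyond the usual critical values in absolute value.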
2) You want to find the determinants of suicide rates in the United States. To investigate the issue, you collect
state level data for ten years. Your first idea, suggested to you by one of your peers from Southern California, is
that the annual amount of sunshine must be important. Stacking the data and using no fixed effects, you find
no significant relationship between suicide rates and this variable. (This is good news for the people of Seattle.)
However, sorting the suicide rate data from highest to lowest, you notice that those states with the lowest
population density are dominating in the highest suicide rate category. You run another regression, without
fixed effects, and find a highly significant relationship between the two variables. Even adding some economic
variables, such as state per capita income or the state unemployment rate, does not lower the t-statistic for the
population density by much. Adding fixed entity and time effects, however, results in an insignificant
coefficient for population density.
(a) What do you think is the cause for this change in significance? Which fixed effect is primarily responsible?
Does this result imply that population density does not matter?
(b) Speculate as to what happens to the coefficients of the economic variables when the fixed effects are
included. Use this example to make clear what factors entity and time fixed effects pick up.
(c) What other factors might play a role?
Answer: (a) Population density only changes slowly over time, hence state effects will pick up the influence of
this variable. This does not imply that population density is of no relevance. However, there are other omitted
variables in this regression, such as religious and cultural attitudes towards suicide, that are also
captured by the state effects, and these may also be correlated with population density.
(b) Since there is sufficient variation of state unemployment rates and state per capita income both over
time and across states, the coefficients on these variables are likely to remain statistically significant.
However, there may be multicollinearity between the two variables, and the standard errors may
therefore be large.
(c) Answers will vary by student. Cultural and institutional factors, such as attitudes towards suicide
and religion, and social services, are frequently mentioned.
3) Two authors published a study in 1992 of the effect of minimum wages on teenage employment using a U.S.
state panel. The paper used annual observations for the years 1977-1989 and included all 50 states plus the
District of Columbia. The estimated equation is of the following type
$E_{it} = \beta_0 + \beta_1 (M_{it}/W_{it}) + \gamma_2 D2_i + \dots + \gamma_n D51_i + \delta_2 B2_t + \dots + \delta_T B13_t + u_{it}$,
where E is the employment to population ratio of teenagers, M is the nominal minimum wage, and W is
the average wage in the state. In addition, other explanatory variables, such as the prime-age male unemployment
rate, and the teenage population share were included.
(a) Briefly discuss the advantage of using panel data in this situation rather than pure cross sections or time
series.
(b) Estimating the model by OLS but including only time fixed effects results in the following output
$\hat{E}_{it} = \hat{\beta}_0 - \underset{(0.08)}{0.33} \times (M_{it}/W_{it}) + \underset{(0.28)}{0.35} \times SHY_{it} - \underset{(0.13)}{1.53} \times uram_{it}$; $R^2 = 0.20$
where SHY is the proportion of teenagers in the population, and uram is the prime-age male unemployment
rate. Coefficients for the time fixed effects are not reported. Numbers in parentheses are homoskedasticity-only
standard errors.
Comment on the above results. Are the coefficients statistically significant? Since these are level regressions,
how would you calculate elasticities?
(c) Adding state fixed effects changed the above equation as follows:
$\hat{E}_{it} = \hat{\beta}_0 + \underset{(0.10)}{0.07} \times (M_{it}/W_{it}) - \underset{(0.22)}{0.19} \times SHY_{it} - \underset{(0.11)}{0.54} \times uram_{it}$; $R^2 = 0.69$
Compare the two results. Why would the inclusion of state fixed effects change the coefficients in this way?
(d) The significance of each coefficient decreased, yet R2 increased. How is that possible? What does this result
tell you about testing the hypothesis that all of the state fixed effects can be restricted to have the same
coefficient? How would you test for such a hypothesis?
Answer: (a) There are likely to be omitted variables in the above regression. One way to deal with some of these is
to introduce state and time effects. State effects will capture the influence of omitted variables that are
state specific and do not vary over time, while time effects capture those of country wide variables that
are common to all states at a point in time. Furthermore, there are more observations when using panel
data, resulting in more variation.
(b) There is a negative relationship between relative minimum wages and the employment to population
ratio. Increases in the share of teenagers in the population result in a higher employment to population ratio,
and increases in the prime-age male unemployment rate lower the employment to population ratio. The
regression explains 20 percent of the variation in the teenage employment to population ratio. The
relative minimum wage and the prime-age male unemployment rate are significant using a 1%
significance level, while the proportion of teenagers in the population is not. Elasticities vary with levels
here. One possibility is to report elasticities at sample means.
(c) The parameter of interest here is the coefficient on the relative minimum wage. While it was highly
significant in the previous regression, it now has changed signs and is statistically insignificant. The
explanatory power of the equation has increased substantially. The size of the other two coefficients has
also decreased. The results suggest that omitted variables, which are now captured by state fixed effects,
were correlated with the regressors and caused omitted variable bias.
(d) The influence of the state effects is large: they account for the increase in R2, while leaving less
within-state variation to identify the other coefficients. The state effects are bound to be statistically
significant and the hypothesis to restrict these coefficients to zero is bound to fail. Since these are linear
hypotheses that are supposed to hold simultaneously, an F-test is appropriate here.
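The significance statements in parts (b) and (c) can be checked by dividing each reported coefficient by its standard error. The sketch below copies the numbers from the two regressions above; the dictionary layout and function names are illustrative choices, and 2.58 is the two-sided 1% critical value of the standard normal distribution.

```python
# (coefficient, standard error) pairs copied from the two reported regressions.
time_effects_only = {"M/W": (-0.33, 0.08), "SHY": (0.35, 0.28), "uram": (-1.53, 0.13)}
state_and_time = {"M/W": (0.07, 0.10), "SHY": (-0.19, 0.22), "uram": (-0.54, 0.11)}

CRIT_1PCT = 2.58   # two-sided 1% critical value, standard normal

def t_stats(results):
    """t-statistic for each coefficient under the null that it equals zero."""
    return {name: coef / se for name, (coef, se) in results.items()}

def significant(results, crit=CRIT_1PCT):
    """True where the absolute t-statistic exceeds the critical value."""
    return {name: abs(t) > crit for name, t in t_stats(results).items()}

for label, results in [("time effects only", time_effects_only),
                       ("state and time effects", state_and_time)]:
    rounded = {name: round(t, 2) for name, t in t_stats(results).items()}
    print(label, rounded, significant(results))
```

Every t-statistic shrinks in absolute value once state effects are added, and the relative minimum wage coefficient changes sign and loses significance.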
4) You learned in intermediate macroeconomics that certain macroeconomic growth models predict conditional
convergence or a catch up effect in per capita GDP between the countries of the world. That is, countries which
are further behind initially in per-capita GDP will grow faster than the leader. You gather data from the Penn
World Tables to test this theory.
(a) By limiting your sample to 24 OECD countries, you hope to have a more homogeneous set of countries in
your sample, i.e., countries that are not too different with respect to their institutions. To simplify matters, you
decide to only test for unconditional convergence. In that case, the laggards catch up even without taking into
account differences in some of the driving variables. Your scatter plot and regression for the time period
1975-1989 are as follows:
$\widehat{g8975} = \underset{(0.06)}{0.024} - \underset{(0.008)}{0.005} \times PCGDP75\_US$; $R^2 = 0.025$, $SER = 0.006$
where g8975 is the average annual growth rate of per capita GDP from 1975-1989, and PCGDP75_US is per
capita GDP relative to the United States in 1975. Numbers in parentheses are heteroskedasticity-robust
standard errors.
Interpret the results. Is there indication of unconditional convergence? What critical value did you use?
(b) Although you are quite discouraged by the result, you think that it might be due to the specific time period
used. During this period, there were two OPEC oil price shocks with varying degrees of exposure for the
OECD countries. You therefore repeat the exercise for the period 1960-1974, with the following results:
$\widehat{g7460} = \underset{(0.004)}{0.061} - \underset{(0.007)}{0.043} \times PCGDP60\_US$; $R^2 = 0.613$, $SER = 0.008$
where g7460 is the average annual growth rate of per capita GDP from 1960-1974, and PCGDP60_US is per
capita GDP relative to the United States in 1960.
Compare this regression to the previous one.
(c) You decide to run one more regression in differences. The dependent variable is now the change in the
growth rate of per capita GDP from 1960-1974 to 1975-1989 (diffg) and the regressor the difference in the initial
conditions (diffinit). This produces the following graph and regression:
$\widehat{diffg} = \underset{(0.03)}{-0.006} - \underset{(0.021)}{0.096} \times diffinit$; $R^2 = 0.468$, $SER = 0.009$
Interpret these results. Explain what has happened to unobservable omitted variables that are constant over
time. Suggest what some of these variables might be.
(d) Given that there are only two time periods, what other methods could you have employed to generate the
identical results? Why do you think that the slope coefficient in this regression is significant given the results
over the sub-periods?
Answer: (a) Although the slope coefficient is negative, thereby indicating unconditional convergence, the
t-statistic (-0.005/0.008 ≈ -0.63) does not exceed the critical value in absolute value. However, using the
standard normal distribution is not really justified here since there are only 24 observations.
(b) The explanatory power of the regression is much higher and there is a larger t-statistic for the slope
coefficient. If a standard normal distribution could be used here, then the absolute value of the t-statistic
would easily exceed the critical value of 1.64. This suggests unconditional convergence over the sample
period. However, the same comment regarding the sample size as in (a) applies here.
(c) The slope coefficient suggests that countries which are further behind initially with respect to the
United States will grow relatively faster. Almost 50 percent of the variation in the change in growth rates is
explained by the regression. Decreasing the initial per capita income ratio to the United States by 10
percentage points will increase the relative growth performance by roughly 1 percentage point
(-0.096 × -0.10 ≈ 0.01). Omitted variables that remain constant over time are eliminated by focusing on
changes in the variables. Some of
these may be cultural and institutional variables such as the level of educational attainment, saving rates,
population growth rates, independence of the central bank, etc.
(d) Using either a fixed effects regression or entity-demeaned OLS would have resulted in identical
estimates. In general, it is possible for included coefficients to be statistically insignificant as a result of
omitted variable bias. This result depends, among other factors, on the relationship between the omitted
variables and the included variables. Using the differencing method has eliminated at least some of the
omitted variables.
5) A researcher investigating the determinants of crime in the United Kingdom has data for 42 police regions over
22 years. She estimates by OLS the following regression
$\ln(cmrt)_{it} = \alpha_i + \lambda_t + \beta_1 unrtm_{it} + \beta_2 proyth_{it} + \beta_3 \ln(pp)_{it} + u_{it}$; $i = 1, \dots, 42$, $t = 1, \dots, 22$
where cmrt is the crime rate per head of population, unrtm is the unemployment rate of males, proyth is the
proportion of youths, pp is the probability of punishment measured as (number of convictions)/(number of
crimes reported). $\alpha$ and $\lambda$ are area and year fixed effects, where $\alpha_i$ equals one for area i and is zero otherwise
for all i, and $\lambda_t$ is one in year t and zero for all other years for t = 2, …, 22. $\lambda_1$ is not included.
(a) What is the purpose of excluding $\lambda_1$? What are the terms $\alpha$ and $\lambda$ likely to pick up? Discuss the advantages
of using panel data for this type of investigation.
(b) Estimation by OLS using heteroskedasticity- and autocorrelation-consistent standard errors results in the
following output, where the coefficients of the fixed effects are not reported:
$\widehat{\ln(cmrt)}_{it} = \underset{(0.109)}{0.063} \times unrtm_{it} + \underset{(0.179)}{3.739} \times proyth_{it} - \underset{(0.024)}{0.588} \times \ln(pp)_{it}$; $R^2 = 0.904$
Comment on the results. In particular, what is the effect of a ten percent increase in the probability of
punishment?
(c) To test for the relevance of the area fixed effects, you restrict the regression by dropping all entity fixed
effects and adding a single constant. The relevant F-statistic is 135.28. What are the degrees of freedom?
What is the critical value from your F table?
(d) Although the test rejects the hypothesis of eliminating the fixed effects from the regression, you want to
analyze what happens to the coefficients and their standard errors when the equation is re-estimated without
fixed effects. In the resulting regression, $\hat{\beta}_2$ and $\hat{\beta}_3$ do not change by much, although their standard errors
roughly double. However, $\hat{\beta}_1$ is now 1.340 with a standard error of 0.234. Why do you think that is?
Answer: (a) Even though there is no separate constant, the 42 area effects sum to one for every observation,
so setting $\lambda_t$ to one in year t and zero for all other years for t = 1, …, 22 would result in perfect
multicollinearity. $\alpha$ picks up omitted variables that are specific to police regions and do not vary over
time. $\lambda$ picks up effects that are
common to all police regions in a given year. Attitudes toward crime may vary between rural regions
and metropolitan areas. These would be hard to capture through measurable variables. Common
macroeconomic shocks that affect all regions equally will be captured by the time fixed effects. Although
some of these variables could be explicitly introduced, the list of possible variables is long. By
introducing time fixed effects, the effect is captured all in one variable.
(b) A higher male unemployment rate and a higher proportion of youths increase the crime rate, while a
higher probability of punishment decreases the crime rate. The coefficients on the probability of
punishment and the proportion of youths are statistically significant, while the coefficient on the male
unemployment rate is not. The regression explains roughly 90 percent of the variation in crime rates in the sample. A ten
percent increase in the number of convictions over the number of crimes reported decreases the crime
rate by roughly six percent.
(c) The coefficients of the three regressors other than the entity coefficients would have been unaffected,
had there been a constant in the regression and (n-1) police region specific entity variables. In this case,
the entity coefficients on the police regions would have indicated deviations from the constant for the
first police region. Hence there are 41 restrictions imposed by eliminating the entity fixed effects and
adding a constant. Since there are over 100 observations (roughly 900 degrees of freedom), the critical value for
$F_{41,\infty}$ is approximately that of $F_{30,\infty}$, which is 1.70 at the 1% level. Since 135.28 far exceeds
this value, the restrictions are rejected.
(d) This result would make the male unemployment rate coefficient significant. It suggests that male
unemployment rates change slowly over the years in a given police district and that this effect is picked
up by the entity fixed effects. Of course, there are other slowly changing variables, such as attitudes
towards crime, that are captured by these fixed effects.
6) You want to investigate the relationship between cumulative GPA scores at graduation and incoming SAT
scores of students. For this purpose, you have collected data from a balanced panel of 120 undergraduate
colleges and universities in the United States over a ten year period. Discuss some of the entity fixed effects
which you potentially capture by allowing for a binary variable for each of the colleges.
Answer: Students will come up with various possible entity fixed effects. These should include differences
between educational institutions that have
• made it a policy not to fight grade inflation
• a different degree of selectivity (if you admit only students with an SAT score of 2400,
then there will be no relationship)
• gender or religion specific requirements
• an admission process that is need-blind
• large varsity sports programs
and so forth.
7) You want to study the relationship between weight and height of young children (4th grade to 7th grade). You
collect data for more than 400 students and track the progress of these students over the following four years,
where you end up with a balanced panel of 400 students (you discard the observations for the students who
moved away). Discuss some of the entity fixed effects which you potentially capture by allowing for a binary
variable for each of the students. Do you expect significant time fixed effects if you allowed for them?
Answer: Students will come up with various possible entity fixed effects. These will reflect differences between
students potentially depending on
• gender
• ethnicity
• degree of participation in exercises/athletic programs
• growth spurts during these years
• nutrition
• genes
and so forth. It is hard to think of time fixed effects. Potentially there could be an effect if all students
went to a different school in 7th grade (e.g. middle school) and this school had a less/more healthy
lunch diet.
8) You first encountered growth regressions in your intermediate macroeconomics course (“beta-convergence
regressions”), that is, conditionally on some initial condition in per capita income, different authors tried to find
the determinants of growth. Since growth is a long-run phenomenon, various studies collected data for a panel
of numerous countries using 10-year averages, over a time period stretching from 1960 to 2005. For example, a
balanced panel might consist of 50 or so countries for the time periods 1960-1970, 1971-1980, …,
2000-2005. Instead of using two-way fixed effects (entity fixed effects and time fixed effects) authors often only
employed time fixed effects. Why do you think that is? What sort of information would be lost if these authors
employed entity fixed effects as well?
Answer: Time fixed effects will eliminate common growth phenomena experienced by all countries during the
same decade (say). These could include productivity slowdowns due to the oil crisis of the ‘70s, effects
of the Great Moderation of the ‘90s, etc. However, most of these studies were interested in determining
the effect of institutional differences between countries. These effects, such as the degree of democracy,
law and order, openness of the economy, size of government, civil wars, geography, religion, etc., are
typically slowly changing, and by including entity fixed effects, you would lose the effects you are
interested in studying.
10.3 Mathematical and Graphical Problems
1) Your textbook suggests an “entity-demeaned” procedure to avoid having to specify a potentially large number
of binary variables. While it is somewhat tedious to specify a binary variable for each entity, this can still be
handled relatively easily in the case of the 48 contiguous states. Give a few examples where it might be close to
impossible to specify such a large number of entity binary variables. The idea of the
“entity-demeaned” procedure was introduced as a computationally convenient and simplifying procedure.
Since there are also time fixed effects, why is there no discussion of using a “time-demeaned” procedure?
Using the following equation
$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_3 S_t + u_{it}$,
show how $\beta_1$ can be estimated by the OLS regression using “time-demeaned” variables.
Answer: Answers will vary by student with regard to the examples. A panel containing tens of thousands of
individuals would make it impractical to specify entity fixed effects. The same would hold for a large
number of firms. Regression software typically does not estimate panel regressions using
“time-demeaned” variables, since there are not that many observations across time. In the textbook
example, there were seven years of data for the 48 contiguous U.S. states. There may be observations for
10,000 individuals over a few years. Still, in principle, you could use a “time-demeaning” procedure.
Taking averages across entities on both sides of the above equation results in
$\bar{Y}_t = \beta_0 + \beta_1 \bar{X}_t + \beta_3 S_t + \bar{u}_t$,
where $\bar{Y}_t = \frac{1}{n} \sum_{i=1}^{n} Y_{it}$, $\bar{X}_t = \frac{1}{n} \sum_{i=1}^{n} X_{it}$, and $\bar{u}_t = \frac{1}{n} \sum_{i=1}^{n} u_{it}$.
Subtracting the averaged equation from the original one yields
$Y_{it} - \bar{Y}_t = \beta_1 (X_{it} - \bar{X}_t) + (u_{it} - \bar{u}_t)$ or $\tilde{Y}_{it} = \beta_1 \tilde{X}_{it} + \tilde{u}_{it}$,
where $\tilde{Y}_{it} = Y_{it} - \bar{Y}_t$, and $\tilde{X}_{it}$ and $\tilde{u}_{it}$ are defined similarly. The “time-demeaned” regression can then be
estimated by OLS.
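The derivation can be illustrated on simulated data. The following is a minimal sketch under stated assumptions: the panel dimensions, the parameter values, and the common shocks $S_t$ are all made up, and the error term is set to zero so that the time-demeaned slope should reproduce $\beta_1$ up to rounding error. The pooled slope, which ignores $S_t$, is shown for contrast.

```python
import random

random.seed(1)                            # arbitrary seed for reproducibility
n, T = 200, 5                             # hypothetical panel dimensions
beta_0, beta_1, beta_3 = 2.0, 0.7, 1.5    # hypothetical parameters
S = [-2.0, -1.0, 0.0, 1.0, 2.0]           # hypothetical common time shocks S_t

# Build the stacked panel; X is correlated with S_t and u_it is set to zero.
rows = []
for t in range(T):
    for _ in range(n):
        x = S[t] + random.gauss(0, 1)
        y = beta_0 + beta_1 * x + beta_3 * S[t]
        rows.append((t, x, y))

def time_demeaned_slope(rows, n_periods):
    """Subtract period means from X and Y, then run OLS through the origin."""
    x_sum = [0.0] * n_periods
    y_sum = [0.0] * n_periods
    count = [0] * n_periods
    for t, x, y in rows:
        x_sum[t] += x
        y_sum[t] += y
        count[t] += 1
    x_bar = [s / c for s, c in zip(x_sum, count)]
    y_bar = [s / c for s, c in zip(y_sum, count)]
    num = sum((x - x_bar[t]) * (y - y_bar[t]) for t, x, y in rows)
    den = sum((x - x_bar[t]) ** 2 for t, x, _ in rows)
    return num / den

def pooled_slope(rows):
    """Simple OLS slope of Y on X, ignoring the time effects."""
    m = len(rows)
    x_mean = sum(x for _, x, _ in rows) / m
    y_mean = sum(y for _, _, y in rows) / m
    num = sum((x - x_mean) * (y - y_mean) for _, x, y in rows)
    den = sum((x - x_mean) ** 2 for _, x, _ in rows)
    return num / den

slope_demeaned = time_demeaned_slope(rows, T)
slope_pooled = pooled_slope(rows)
print(slope_demeaned, slope_pooled)
```

Because $X_{it}$ is built to move with $S_t$, the pooled slope picks up part of the effect of the omitted time shock, while the time-demeaned slope does not.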
2) Consider the case of time fixed effects only, i.e.,
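$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_3 S_t + u_{it}$.
First replace $\beta_0 + \beta_3 S_t$ with $\lambda_t$. Next show the relationship between the $\lambda_t$ and $\delta_t$ in the following equation
$Y_{it} = \beta_0 + \beta_1 X_{it} + \delta_2 B2_t + \dots + \delta_T BT_t + u_{it}$,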
where each of the binary variables B2, …, BT indicates a different time period. Explain in words why the two
equations are the same. Finally show why there is perfect multicollinearity if you add another binary variable
B1. What is the intuition behind the fact that the OLS estimator does not exist in this case? Would that also be
the case if you dropped the intercept?
Answer: $Y_{it} = \beta_1 X_{it} + \lambda_t + u_{it}$. The relationship is $\lambda_1 = \beta_0$, and $\lambda_t = \beta_0 + \delta_t$ for $t \geq 2$. Consider time period t, then
the population regression line for that period is $\lambda_t + \beta_1 X_{it}$, with $\beta_1$ being the same for all time periods,
but the intercept varying from time period to time period. The variation of the intercept comes from
factors which are common to all entities in a given time period, i.e., the $S_t$. The same role is played by $\delta_2$,
…, $\delta_T$, since $B2_t$, …, $BT_t$ are only different from zero during one period. There is perfect multicollinearity if
one of the regressors can be expressed as a linear combination of the other regressors. Define $B0_t$ as a
variable that equals one for all periods. In that case, the previous regression can be rewritten as
$Y_{it} = \beta_0 B0_t + \beta_1 X_{it} + \delta_2 B2_t + \dots + \delta_T BT_t + u_{it}$.
Adding $B1_t$ with a coefficient $\delta_1$ here results in
$Y_{it} = \beta_0 B0_t + \beta_1 X_{it} + \delta_1 B1_t + \delta_2 B2_t + \dots + \delta_T BT_t + u_{it}$.
But $B0_t = B1_t + B2_t + \dots + BT_t$, and hence there is perfect multicollinearity. Intuitively, whenever any one
of the binary variables equals one in a given period, so does the constant. Hence the coefficient of that
variable cannot pick up a separate effect from the data. Dropping the intercept from the regression
eliminates the problem.
3) Consider the following panel data regression with a single explanatory variable
$Y_{it} = \beta_0 + \beta_1 X_{it} + u_{it}$.
In each of the examples below, you will be adding entity and time fixed effects. Indicate the total number of
coefficients that need to be estimated.
(a) The effect of beer taxes on the fatality rate, annual data, 1982-1988, nine U.S. regions (New England, Pacific,
Mid-Atlantic, East North Central, etc.).
(b) The effect of the minimum wage on teenage employment, annual data, 1963-2000, five Canadian Regions
(Atlantic Provinces, Quebec, Ontario, Prairies, British Columbia).
(c) The effect of savings rates on per capita income, data for three decades (1960-1969, 1970-1979, 1980-1989;
one observation per decade), 104 countries of the world.
(d) The effect of pitching quality in baseball (as measured by the Team ERA) on the winning percentage,
annual data, 1998-1999 season, 1999-2000 season, 30 teams.
Answer: (a) 16 coefficients (6 time fixed effects, 8 entity fixed effects, intercept, slope).
(b) 43 coefficients (37 time fixed effects, 4 entity fixed effects, intercept, slope).
(c) 107 coefficients (2 time fixed effects, 103 entity fixed effects, intercept, slope).
(d) 32 coefficients (1 time fixed effect, 29 entity fixed effects, intercept, slope).
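The counting rule behind these answers is: one intercept, one slope per regressor, n - 1 entity binary variables, and T - 1 time binary variables. A small sketch of that arithmetic follows; the function name and its signature are hypothetical and chosen only for this illustration.

```python
def n_coefficients(n_entities: int, n_periods: int, n_regressors: int = 1) -> int:
    """Intercept + slopes + (n - 1) entity dummies + (T - 1) time dummies."""
    return 1 + n_regressors + (n_entities - 1) + (n_periods - 1)

print(n_coefficients(9, 7))     # (a) 9 regions, 1982-1988      -> 16
print(n_coefficients(5, 38))    # (b) 5 regions, 1963-2000      -> 43
print(n_coefficients(104, 3))   # (c) 104 countries, 3 decades  -> 107
print(n_coefficients(30, 2))    # (d) 30 teams, 2 seasons       -> 32
```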
4) Your textbook modifies the four assumptions for the multiple regression model by adding a new assumption.
This represents an extension of the cross-sectional data case, where errors are uncorrelated across entities. The
new assumption requires the errors to be uncorrelated across time, conditional on the regressors as well
($\mathrm{cov}(u_{it}, u_{is} \mid X_{it}, X_{is}) = 0$ for $t \neq s$).
(a) Discuss why there might be correlation over time in the errors when you use U.S. state panel data. Does this
mean that you should not use OLS as an estimator?
(b) Now consider pairs of adjacent states such as Indiana and Michigan, Texas and Arkansas, New York and
Connecticut, etc. Is it likely that the fifth assumption will hold here, even though the “contemporaneous” errors
are correlated? If not, can you still use OLS for estimation?
Answer: (a) The error term may contain omitted variables. If these change slowly from one period to the next,
then the error term will be correlated over time. In that case, $\mathrm{cov}(u_{it}, u_{is} \mid X_{it}, X_{is}) = 0$ for $t \neq s$ will be
violated. The OLS estimator is still unbiased, but valid statistical inference cannot be conducted, even
when using heteroskedasticity-robust standard errors. However, heteroskedasticity- and
autocorrelation-consistent standard errors can be used in this situation.
(b) The fifth assumption deals with observations that do not occur during the same time period. It does
not address the problems of errors of one entity being affected by errors in another entity during the
same period. While potentially there are more efficient estimators available in such a situation, OLS can
still be used for estimation.
5) In Sports Economics, production functions are often estimated by relating the winning percentage of teams (Y)
to inputs indicating performance in certain aspects of the game. However, this omits the quality of
management. Assume that you could measure the quality of pitching and hitting by a single index L, and that
managerial ability is represented by M, which is assumed to be constant over time. The production function
would then be specified as follows:
$Y_{it} = \beta_0 + \beta_1 L_{it} + \beta_2 M_i + u_{it}$
where i is an index for the baseball team, and t indexes time and all variables are in logs.
(a) Assume that managerial ability is unobservable but is positively related, in a linear way, to L. Explain why
the OLS estimator $\hat{\beta}_1$ is inconsistent in the case of a single cross-section, i.e., if you attempt to estimate the
above regression for a single year. Do you expect this coefficient to over- or under-estimate $\beta_1$?
(b) If you had data for two years, indicate the transformation, which allows you to obtain a consistent estimator
for $\beta_1$.
Answer: (a) Regressing Y on L alone will result in omitted variable bias. A higher pitching and hitting
index goes along with higher managerial ability, which in turn increases the winning percentage. Hence $\hat{\beta}_1$ will
be expected to overestimate the effect of pitching and hitting on the winning percentage. Said differently,
OLS will attribute more to pitching and hitting quality than it deserves.
(b) Since managerial ability is assumed to be constant over time, differencing the data over the two
time periods will eliminate this effect for all teams. This can be shown as follows:
$Y_{i2} = \beta_0 + \beta_1 L_{i2} + \beta_2 M_i + u_{i2}$
$Y_{i1} = \beta_0 + \beta_1 L_{i1} + \beta_2 M_i + u_{i1}$
Subtracting the second equation from the first results in
$Y_{i2} - Y_{i1} = \beta_1 (L_{i2} - L_{i1}) + u_{i2} - u_{i1}$.
Alternatively, the binary variable specification or the “entity-demeaned” specification could have been
used with identical estimation results.
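Both parts of this answer can be illustrated with a short simulation. This is a sketch under stated assumptions: the number of teams, the coefficient values (2 and 3), and the 0.5 link between M and L are invented for the illustration and do not come from any data set.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000                            # hypothetical number of teams (large, to make the pattern clear)
beta1, beta2 = 2.0, 3.0             # assumed true coefficients
M = rng.normal(size=n)              # managerial ability, constant over time
L = 0.5 * M[:, None] + rng.normal(size=(n, 2))     # L is positively related to M in both years
Y = 1.0 + beta1 * L + beta2 * M[:, None] + rng.normal(size=(n, 2))

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x with an intercept."""
    x_c = x - x.mean()
    return (x_c * (y - y.mean())).sum() / (x_c ** 2).sum()

# (a) Single cross-section with M omitted: the slope absorbs part of the effect of M.
slope_cross_section = ols_slope(L[:, 0], Y[:, 0])

# (b) Differencing the two years removes beta2 * M_i, which does not vary over time.
slope_differenced = ols_slope(L[:, 1] - L[:, 0], Y[:, 1] - Y[:, 0])

print(slope_cross_section, slope_differenced)
```

In this setup the cross-sectional slope lands well above the assumed value of 2, while the differenced slope is close to it.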
6) A study attempts to investigate the role of the various determinants of regional Canadian unemployment rates
in order to get a better picture of Canadian aggregate unemployment rate behavior. The annual data
(1967-1991) is for five regions (Atlantic region, Quebec, Ontario, Prairies, and British Columbia), and four
age-gender groups (female and male, adult and young). Focusing on young females, the authors find
significant effects for the following variables: the regional relative minimum wage rate (minimum wages
divided by average hourly earnings), the regional share of youth in the labor force, the regional share of adult
females in the labor force, United States activity shocks (deviations of United States GDP from trend), an
indicator of the degree of monetary tightness in Canada, regional union density, and a regional index of
unemployment insurance generosity. Explain why the authors only used region fixed effects. How would their
specification have to change if they also employed time fixed effects?
Answer: Since the study used Canada-wide effects (United States activity shocks, and monetary tightness), these
are identical for all regions at a point in time. Using time fixed effects in addition to these two variables
would have generated perfect multicollinearity among the regressors, and hence the OLS estimator
would not exist. An alternative specification would include time fixed effects, but eliminate the two
variables which are constant across all regions at a given point in time.
7) (Requires Matrix Algebra) Consider the time and entity fixed effect model with a single explanatory variable
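$Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \dots + \gamma_n Dn_i + \delta_2 B2_t + \dots + \delta_T BT_t + u_{it}$.
For the case of n = 4 and T = 3, write this model in the form $Y = X\beta + U$, where, in general,
$$
Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix},\quad
U = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix},\quad
X = \begin{pmatrix} 1 & X_{11} & \cdots & X_{k1} \\ 1 & X_{12} & \cdots & X_{k2} \\ \vdots & \vdots & & \vdots \\ 1 & X_{1n} & \cdots & X_{kn} \end{pmatrix}
= \begin{pmatrix} X_1' \\ X_2' \\ \vdots \\ X_n' \end{pmatrix},\quad \text{and}\quad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}.
$$
How would the X matrix change if you added two binary variables, D1 and B1? Demonstrate that in this case
the columns of the X matrix are not independent. Finally show that elimination of one of the two variables is
not sufficient to get rid of the multicollinearity problem. In terms of the OLS estimator, $\hat{\beta} = (X'X)^{-1}X'Y$, why
does perfect multicollinearity create a problem?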
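Answer: For the case of n = 4 and T = 3, the general model would look as follows (columns of X: constant, $X_{it}$, D2, D3, D4, B2, B3):
$$
\begin{pmatrix} Y_{11} \\ Y_{12} \\ Y_{13} \\ Y_{21} \\ Y_{22} \\ Y_{23} \\ Y_{31} \\ Y_{32} \\ Y_{33} \\ Y_{41} \\ Y_{42} \\ Y_{43} \end{pmatrix}
=
\begin{pmatrix}
1 & X_{11} & 0 & 0 & 0 & 0 & 0 \\
1 & X_{12} & 0 & 0 & 0 & 1 & 0 \\
1 & X_{13} & 0 & 0 & 0 & 0 & 1 \\
1 & X_{21} & 1 & 0 & 0 & 0 & 0 \\
1 & X_{22} & 1 & 0 & 0 & 1 & 0 \\
1 & X_{23} & 1 & 0 & 0 & 0 & 1 \\
1 & X_{31} & 0 & 1 & 0 & 0 & 0 \\
1 & X_{32} & 0 & 1 & 0 & 1 & 0 \\
1 & X_{33} & 0 & 1 & 0 & 0 & 1 \\
1 & X_{41} & 0 & 0 & 1 & 0 & 0 \\
1 & X_{42} & 0 & 0 & 1 & 1 & 0 \\
1 & X_{43} & 0 & 0 & 1 & 0 & 1
\end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \gamma_2 \\ \gamma_3 \\ \gamma_4 \\ \delta_2 \\ \delta_3 \end{pmatrix}
+
\begin{pmatrix} u_{11} \\ u_{12} \\ u_{13} \\ u_{21} \\ u_{22} \\ u_{23} \\ u_{31} \\ u_{32} \\ u_{33} \\ u_{41} \\ u_{42} \\ u_{43} \end{pmatrix}
$$
Adding the two binary variables D1 and B1 as columns 8 and 9 would change the X matrix in this way:
$$
X = \begin{pmatrix}
1 & X_{11} & 0 & 0 & 0 & 0 & 0 & 1 & 1 \\
1 & X_{12} & 0 & 0 & 0 & 1 & 0 & 1 & 0 \\
1 & X_{13} & 0 & 0 & 0 & 0 & 1 & 1 & 0 \\
1 & X_{21} & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & X_{22} & 1 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & X_{23} & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & X_{31} & 0 & 1 & 0 & 0 & 0 & 0 & 1 \\
1 & X_{32} & 0 & 1 & 0 & 1 & 0 & 0 & 0 \\
1 & X_{33} & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\
1 & X_{41} & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\
1 & X_{42} & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
1 & X_{43} & 0 & 0 & 1 & 0 & 1 & 0 & 0
\end{pmatrix}
$$
Adding columns 6, 7, and 9 results in column 1. Also adding columns 3, 4, 5, and 8 results in column 1.
Hence the columns are not linearly independent and there is perfect multicollinearity among the
columns of the matrix. Eliminating column 9, say, is not sufficient to get rid of this problem, since adding
columns 3, 4, 5, and 8 still equals column 1. In case of perfect multicollinearity, the X matrix will not have
full column rank, and hence $X'X$ will also not have full rank (it is singular). In this case, $X'X$ cannot be
inverted, and hence the OLS estimator does not exist.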
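The rank argument can be verified numerically for n = 4 and T = 3. The sketch below builds the design matrices with NumPy; the regressor values are random placeholders and the variable names are assumptions made for this illustration.

```python
import numpy as np

n, T = 4, 3
rng = np.random.default_rng(2)
entity = np.repeat(np.arange(n), T)     # 0,0,0,1,1,1,... (entity index for each row)
period = np.tile(np.arange(T), n)       # 0,1,2,0,1,2,... (time index for each row)
x = rng.normal(size=n * T)              # placeholder values for X_it

D = (entity[:, None] == np.arange(n)).astype(float)   # columns D1..D4
B = (period[:, None] == np.arange(T)).astype(float)   # columns B1..B3

# Columns: constant, X, D2, D3, D4, B2, B3 (the estimable specification).
X_base = np.column_stack([np.ones(n * T), x, D[:, 1:], B[:, 1:]])
# Append D1 and B1 as columns 8 and 9.
X_full = np.column_stack([X_base, D[:, 0], B[:, 0]])
# Drop column 9 (B1) only; D1 is still there.
X_drop_B1 = X_full[:, :8]

rank_base = np.linalg.matrix_rank(X_base)
rank_full = np.linalg.matrix_rank(X_full)
rank_drop_B1 = np.linalg.matrix_rank(X_drop_B1)
print(rank_base, rank_full, rank_drop_B1)
```

The base matrix has full column rank (7 of 7 columns). The 9-column matrix and the 8-column matrix both still have rank 7, so X'X is singular in either case.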
8) Consider the time and entity fixed effect model with a single explanatory variable
$Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \dots + \gamma_n Dn_i + \delta_2 B2_t + \dots + \delta_T BT_t + u_{it}$.
Assume that you had estimated the above equation by OLS. Typically the coefficients for the entity and time
binary variables are not reported. Can you think of situations where the pattern of these coefficients might be
of interest? What could you do, for example, if you had a strong theoretical justification for believing that a few
macroeconomic variables had an effect on $Y_{it}$?
Answer: The coefficients pick up the effects of omitted variables that are common to all entities at a point in time
(time fixed effects), or that are constant across time for entities (entity fixed effect). If data is available on
slowly changing variables across time, say population density or average educational attainment by U.S.
state, or on macroeconomic variables, then you could perform a regression of the binary variable
coefficients on these variables to determine the degree of correlation. Obviously, the correlation will be
less than perfect, and unless these variables bear coefficients of interest, then there is little to be gained
from these auxiliary regressions.
9) Empirical studies of economic growth are flawed because many of the truly important underlying
determinants, such as culture and institutions, are very hard to measure. Discuss this statement paying
particular attention to simple cross-section data and panel data models. Use equations whenever possible to
underscore your argument.
Answer: Although some cultural and institutional variables, such as corruption, black market activity, central
bank independence, trust, etc., are hard to measure, authors have developed such series for the countries
of the world. Still, either these variables are measured with error or not all cultural and institutional
aspects are bound to be captured. Hence you would expect omitted variable bias to be present in
cross-sectional studies. However, if you could argue that these effects are constant across time or at least
slowly changing, then introducing country fixed effects in panel studies goes some way to alleviate the
omitted variable problem. Similarly by using time fixed effects, common world business cycle effects can
be largely eliminated. For an empirical study of economic growth using U.S. states, time fixed effects
would eliminate common effects of monetary policy and inflation.
The above argument can be made using equations along the theoretical arguments presented in sections
8.3 and 8.4 of the textbook.
10) Give at least three examples from macroeconomics and five from microeconomics that involve specified
equations in a panel data analysis framework. Indicate in each case what the role of the entity and time fixed
effects in terms of omitted variables might be.
Answer: Answers will vary by student. Given the textbook example, you can expect a study of fatality rates and
beer taxes to appear. Other examples mentioned may be minimum wage studies using data from U.S.
states or Canadian provinces, panel data in earnings studies, empirical studies of economic growth
across the countries of the world or regions within a country, determinants of unemployment rates using
data from geographical units (countries, regions, states), degree of democratization of the countries of
the world, etc. Students should point out in the various examples how entity and time fixed effects pick
up variables that are constant across entities at a point in time, or constant over time for specific entities.
For geographical units, these typically involve cultural and institutional factors, and common
macroeconomic effects.
11) Your textbook specifies a simple regression problem for two time periods for the years 1982 and 1988 as
follows:
$FatalityRate_{i,1982} = \beta_0 + \beta_1 BeerTax_{i,1982} + u_{i,1982}$
$FatalityRate_{i,1988} = \beta_0 + \beta_1 BeerTax_{i,1988} + u_{i,1988}$
After subtracting the first equation from the second equation, the authors estimate the model and find
a negative intercept.
a. Show how you would have to modify the two equations to allow for the presence of an intercept in
the differenced model.
b. What would the relative magnitude of the intercepts in the modified model have to be for you to find a negative
intercept?
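Answer: a. Allow the intercept to differ between the two years, for example
$FatalityRate_{i,1982} = \beta_0 + \beta_1 BeerTax_{i,1982} + u_{i,1982}$
$FatalityRate_{i,1988} = \gamma_0 + \beta_1 BeerTax_{i,1988} + u_{i,1988}$
so that the differenced model has the intercept $\gamma_0 - \beta_0$.
b. $\gamma_0 < \beta_0$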
12) Your textbook reports the following result from a two-way fixed effects (entity and time fixed effects)
regression model:
$\widehat{FatalityRate} = \underset{(0.36)}{-0.64}\, BeerTax + StateFixedEffects + TimeFixedEffects$
where the number in parentheses is the heteroskedasticity- and autocorrelation-consistent (HAC) standard
error.
a. Calculate the t-statistic. Can you reject the null hypothesis that the slope coefficient is zero in the
population, using a two-sided test and a 5% significance level?
b. Given that economic theory suggests that the population slope is negative under the alternative
hypothesis, is it possible to use a one-sided test here? In that case, does your conclusion change?
c. Using only heteroskedasticity-robust standard errors, but not HAC standard errors, the value in
parentheses becomes 0.25. Repeat the calculations in (a) and report your decision based on a two-sided
test.
d. Since the coefficient becomes more statistically significant in (c), should this influence your choice of
standard errors? Why or why not?
Answer: a. $t = \frac{-0.64}{0.36} = -1.78$. Since $|t| = 1.78 < 1.96$, you cannot reject the null hypothesis that the coefficient is zero in the
population.
b. The beer tax represents part of the cost (price) of alcohol consumption and an increase in price should
reduce the demand for alcohol. Hence economic theory suggests a negative price coefficient. It therefore
seems reasonable to use a one-sided test. Since the critical value is -1.64 in that case and -1.78 < -1.64, you can reject the
null hypothesis at the 5% significance level.
c. The t-statistic is now -2.56 and you can reject the null hypothesis at the 5% level, and almost at the 1%
level.
d. No. The choice of standard errors should not depend on which one makes the coefficient look more significant.
It is better to use the HAC (clustered) standard errors, since these are valid whether or not there is
heteroskedasticity, autocorrelation, or both. Using heteroskedasticity-robust standard errors only will
result in invalid statistical inference, since they were derived under the assumption of no serial
correlation in the error term.
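The arithmetic in parts (a) through (c) can be reproduced with the Python standard library. This is a sketch only: it uses the large-sample normal approximation, and the dictionary layout and key names are assumptions made for the illustration. The one-sided critical value is written as -1.645, which the answer above rounds to -1.64.

```python
from statistics import NormalDist

coef = -0.64                                  # slope estimate used in the answer
standard_errors = {"HAC": 0.36, "heteroskedasticity-robust": 0.25}
std_normal = NormalDist()                     # standard normal, large-sample approximation

results = {}
for label, se in standard_errors.items():
    t = coef / se
    results[label] = {
        "t": t,
        "reject_two_sided_5pct": abs(t) > 1.96,
        "reject_one_sided_5pct": t < -1.645,  # alternative hypothesis: slope < 0
        "p_two_sided": 2 * std_normal.cdf(-abs(t)),
    }

for label, r in results.items():
    print(label, round(r["t"], 2), r["reject_two_sided_5pct"], r["reject_one_sided_5pct"])
```

With the HAC standard error the two-sided test fails to reject while the one-sided test rejects. With the heteroskedasticity-robust standard error the two-sided test rejects as well.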
Chapter 11 Regression with a Binary Dependent Variable
11.1 Multiple Choice
1) The binary dependent variable model is an example of a
A) regression model, which has as a regressor, among others, a binary variable.
B) model that cannot be estimated by OLS.
C) limited dependent variable model.
D) model where the left-hand variable is measured in base 2.
Answer: C
2) (Requires Appendix material) The following are examples of limited dependent variables, with the exception of
A) binary dependent variable.
B) log-log specification.
C) truncated regression model.
D) discrete choice model.
Answer: B
3) In the binary dependent variable model, a predicted value of 0.6 means that
A) the most likely value the dependent variable will take on is 60 percent.
B) given the values for the explanatory variables, there is a 60 percent probability that the dependent
variable will equal one.
C) the model makes little sense, since the dependent variable can only be 0 or 1.
D) given the values for the explanatory variables, there is a 40 percent probability that the dependent
variable will equal one.
Answer: B
4) $E(Y \mid X_1, \dots, X_k) = \Pr(Y = 1 \mid X_1, \dots, X_k)$ means that
A) for a binary variable model, the predicted value from the population regression is the probability that
Y=1, given X.
B) dividing Y by the X’s is the same as the probability of Y being the inverse of the sum of the X’s.
C) the exponential of Y is the same as the probability of Y happening.
D) you are pretty certain that Y takes on a value of 1 given the X’s.
Answer: A
5) The linear probability model is
A) the application of the multiple regression model with a continuous left-hand side variable and a binary
variable as at least one of the regressors.
B) an example of probit estimation.
C) another word for logit estimation.
D) the application of the linear multiple regression model to a binary dependent variable.
Answer: D
6) In the linear probability model, the interpretation of the slope coefficient is
A) the change in odds associated with a unit change in X, holding other regressors constant.
B) not all that meaningful since the dependent variable is either 0 or 1.
C) the change in probability that Y=1 associated with a unit change in X, holding other regressors constant.
D) the response in the dependent variable to a percentage change in the regressor.
Answer: C
7) The following tools from multiple regression analysis carry over in a meaningful manner to the linear
probability model, with the exception of the
A) F-statistic.
B) significance test using the t-statistic.
C) 95% confidence interval using ± 1.96 times the standard error.
D) regression $R^2$.
Answer: D
8) (Requires material from Section 11.3 – possibly skipped) For the measure of fit in your regression model with a
binary dependent variable, you can meaningfully use the
A) regression $R^2$.
B) size of the regression coefficients.
C) pseudo $R^2$.
D) standard error of the regression.
Answer: C
9) The major flaw of the linear probability model is that
A) the actuals can only be 0 and 1, but the predicted are almost always different from that.
B) the regression $R^2$ cannot be used as a measure of fit.
C) people do not always make clear-cut decisions.
D) the predicted values can lie above 1 and below 0.
Answer: D
10) The probit model
A) is the same as the logit model.
B) always gives the same fit for the predicted values as the linear probability model for values between 0.1
and 0.9.
C) forces the predicted values to lie between 0 and 1.
D) should not be used since it is too complicated.
Answer: C
11) The logit model derives its name from
A) the logarithmic model.
B) the probit model.
C) the logistic function.
D) the tobit model.
Answer: C
12) In the probit model $\Pr(Y = 1 \mid X) = \Phi(\beta_0 + \beta_1 X)$, $\Phi$
A) is not defined for $\Phi(0)$.
B) is the standard normal cumulative distribution function.
C) is set to 1.96.
D) can be computed from the standard normal density function.
Answer: B
13) In the expression $\Pr(Y = 1 \mid X) = \Phi(\beta_0 + \beta_1 X)$,
A) $(\beta_0 + \beta_1 X)$ plays the role of z in the cumulative standard normal distribution function.
B) $\beta_1$ cannot be negative since probabilities have to lie between 0 and 1.
C) $\beta_0$ cannot be negative since probabilities have to lie between 0 and 1.
D) min $(\beta_0 + \beta_1 X) > 0$ since probabilities have to lie between 0 and 1.
Answer: A
14) In the probit model $\Pr(Y = 1 \mid X_1, X_2, \dots, X_k) = \Phi(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k)$,
A) the $\beta$’s do not have a simple interpretation.
B) the slopes tell you the effect of a unit increase in X on the probability of Y.
C) $\beta_0$ cannot be negative since probabilities have to lie between 0 and 1.
D) $\beta_0$ is the probability of observing Y when all X’s are 0.
Answer: A
15) In the expression $\Pr(\textit{deny} = 1 \mid \textit{P/I ratio}, \textit{black}) = \Phi(-2.26 + 2.74 \times \textit{P/I ratio} + 0.71 \times \textit{black})$, the effect of increasing the
P/I ratio from 0.3 to 0.4 for a white person
A) is 0.274 percentage points.
B) is 6.1 percentage points.
C) should not be interpreted without knowledge of the regression $R^2$.
D) is 2.74 percentage points.
Answer: B
16) The maximum likelihood estimation method produces, in general, all of the following desirable properties with
the exception of
A) efficiency.
B) consistency.
C) normally distributed estimators in large samples.
D) unbiasedness in small samples.
Answer: D
17) The logit model can be estimated and yields consistent estimates if you are using
A) OLS estimation.
B) maximum likelihood estimation.
C) differences in means between those individuals with a dependent variable equal to one and those with a
dependent variable equal to zero.
D) the linear probability model.
Answer: B
18) When having a choice of which estimator to use with a binary dependent variable, use
A) probit or logit depending on which method is easiest to use in the software package at hand.
B) probit for extreme values of X and the linear probability model for values in between.
C) OLS (linear probability model) since it is easier to interpret.
D) the estimation method which results in estimates closest to your prior expectations.
Answer: A
19) Nonlinear least squares
A) solves the minimization of the sum of squared predictive mistakes through sophisticated mathematical
routines, essentially by trial and error methods.
B) should always be used when you have nonlinear equations.
C) gives you the same results as maximum likelihood estimation.
D) is another name for sophisticated least squares.
Answer: A
20) (Requires Advanced material) Only one of the following models can be estimated by OLS:
A) $Y = AK^{\alpha}L^{\beta} + u$.
B) $\Pr(Y = 1 \mid X) = \Phi(\beta_0 + \beta_1 X)$.
C) $\Pr(Y = 1 \mid X) = F(\beta_0 + \beta_1 X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}$.
D) $Y = AK^{\alpha}L^{\beta}u$.
Answer: D
21) (Requires Advanced material) Nonlinear least squares estimators in general are not
A) consistent.
B) normally distributed in large samples.
C) efficient.
D) used in econometrics.
Answer: C
22) (Requires Advanced material) Maximum likelihood estimation yields the values of the coefficients that
A) minimize the sum of squared prediction errors.
B) maximize the likelihood function.
C) come from a probability distribution and hence have to be positive.
D) are typically larger than those from OLS estimation.
Answer: B
23) To measure the fit of the probit model, you should:
A) use the regression $R^2$.
B) plot the predicted values and see how closely they match the actuals.
C) use the log of the likelihood function and compare it to the value of the likelihood function.
D) use the fraction correctly predicted or the pseudo $R^2$.
Answer: D
24) When estimating probit and logit models,
A) the t-statistic should still be used for testing a single restriction.
B) you cannot have binary variables as explanatory variables as well.
C) F-statistics should not be used, since the models are nonlinear.
D) it is no longer true that the $\bar{R}^2 < R^2$.
Answer: A
25) The following problems could be analyzed using probit and logit estimation with the exception of whether or
not
A) a college student decides to study abroad for one semester.
B) being a female has an effect on earnings.
C) a college student will attend a certain college after being accepted.
D) applicants will default on a loan.
Answer: B
26) In the probit regression, the coefficient $\beta_1$ indicates
A) the change in the probability of Y = 1 given a unit change in X
B) the change in the probability of Y = 1 given a percent change in X
C) the change in the z-value associated with a unit change in X
D) none of the above
Answer: C
27) Your textbook plots the estimated regression function produced by the probit regression of deny on P/I ratio.
The estimated probit regression function has a stretched “S” shape given that the coefficient on the P/I ratio is
positive. Consider a probit regression function with a negative coefficient. The shape would
A) resemble an inverted “S” shape (for low values of X, the predicted probability of Y would approach 1)
B) not exist since probabilities cannot be negative
C) remain the “S” shape as with a positive slope coefficient
D) would have to be estimated with a logit function
Answer: A
28) Probit coefficients are typically estimated using
A) the OLS method
B) the method of maximum likelihood
C) non-linear least squares (NLLS)
D) by transforming the estimates from the linear probability model
Answer: B
29) F-statistics computed using maximum likelihood estimators
A) cannot be used to test joint hypotheses
B) are not meaningful since the entire regression $R^2$ concept is hard to apply in this situation
C) do not follow the standard F distribution
D) can be used to test joint hypotheses
Answer: D
30) When testing joint hypotheses, you can use
A) the F-statistic
B) the chi-squared statistic
C) either the F-statistic or the chi-squared statistic
D) none of the above
Answer: C
11.2 Essays and Longer Questions
1) Your task is to model students’ choice for taking an additional economics course after the first principles
course. Describe how to formulate a model based on data for a large sample of students. Outline several
estimation methods and their relative advantage over other methods in tackling this problem. How would you
go about interpreting the resulting output? What summary statistics should be included?
Answer: Answers will vary by student. This is an example of a binary dependent variable problem with multiple
regressors. The variable of interest here is the grade received in the first principles course. Students may
talk about grade-inflating departments luring students away by giving higher grades, or by watering
down the value of the signal contained in a grade by compressing the grade distribution towards the
upper end. Other control variables mentioned by students may be the Math SAT score, a binary variable
for business majors, etc.
Students should mention the linear probability model, the probit model, and the logit model, and
discuss their relative advantages and disadvantages. Several points mentioned in the textbook just
before section 11.3 should be brought up, such as the ease with which the linear probability model can
be estimated and interpreted, although its functional form cannot capture the nature of the problem.
However, in the case where there are few extreme values, the model may be used as an adequate
approximation. There is little to choose between logit and probit, and the fit of the two is extremely
close except in the tails.
The answer to the interpretation question should focus on the idea that all three models try to predict a
probability given the attributes of the subject, i.e., $E(Y \mid X) = \Pr(Y = 1 \mid X)$. Students are expected to
mention that the regression $R^2$ is of no use here, given the nature of the dependent variable, and that a
pseudo-$R^2$ and the fraction correctly predicted are available as alternatives.
2) The Report of the Presidential Commission on the Space Shuttle Challenger Accident in 1986 shows a plot of the
calculated joint temperature in Fahrenheit and the number of O-rings that had some thermal distress. You
collect the data for the seven flights for which thermal distress was identified before the fatal flight and
produce the accompanying plot.
(a) Do you see any relationship between the temperature and the number of O-ring failures? If you fitted a
linear regression line through these seven observations, do you think the slope would be positive or negative?
Significantly different from zero? Do you see any problems other than the sample size in your procedure?
(b) You decide to look at all successful launches before Challenger, even those for which there were no
incidents. Furthermore you simplify the problem by specifying a binary variable, which takes on the value one
if there was some O-ring failure and is zero otherwise. You then fit a linear probability model with the
following result,
$\widehat{OFail} = \underset{(0.496)}{2.858} - \underset{(0.007)}{0.037} \times Temperature$; $R^2 = 0.325$, SER = 0.390,
where OFail is the binary variable which is one for launches where O-rings showed some thermal distress, and
Temperature is measured in degrees of Fahrenheit. The numbers in parentheses are heteroskedasticity-robust
standard errors.
Interpret the equation. Why do you think that heteroskedasticity-robust standard errors were used? What is
your prediction for some O-ring thermal distress when the temperature is 31°, the temperature on January 28,
1986? Above which temperature do you predict values of less than zero? Below which temperature do you
predict values of greater than one?
(c) To fix the problem encountered in (b), you re-estimate the relationship using a logit regression:
$\Pr(OFail = 1 \mid Temperature) = F(\underset{(7.329)}{15.297} - \underset{(0.107)}{0.236} \times Temperature)$; pseudo-$R^2$ = 0.297
What is the meaning of the slope coefficient? Calculate the effect of a decrease in temperature from 80° to 70°,
and from 60° to 50°. Why is the change in probability not constant? How does this compare to the linear
probability model?
(d) You want to see how sensitive the results are to using the logit, rather than the probit estimation method.
The probit regression is as follows:
$\Pr(OFail = 1 \mid Temperature) = \Phi(\underset{(3.983)}{8.900} - \underset{(0.058)}{0.137} \times Temperature)$; pseudo-$R^2$ = 0.296
Why is the slope coefficient in the probit so different from the logit coefficient? Calculate the effect of a decrease
in temperature from 80° to 70°, and from 60° to 50°, and compare the resulting changes in probability to your
results in (c). What is the meaning of the pseudo-$R^2$? What other measures of fit might you want to consider?
(e) Calculate the predicted probability for 80° and 40°, using your probit and logit estimates. Based on the
relationship between the probabilities, sketch what the general relationship between the logit and probit
regressions is. Does there seem to be much of a difference for values other than these extreme values?
(f) You decide to run one more regression, where the dependent variable is the
actual number of incidences (NoOFail). You allow for a different functional form by choosing the inverse of the
temperature, and estimate the regression by OLS.
$\widehat{NoOFail} = \underset{(1.516)}{-3.8853} + \underset{(106.541)}{295.545} \times (1/Temperature)$; $R^2 = 0.386$, SER = 0.622
What is your prediction for O-ring failures for the 31° temperature which was forecasted for the launch on
January 28, 1986? Sketch the fitted line of the regression above.
Answer: (a) There does not appear to be a linear relationship underlying the few observations where O-ring
failure occurred. If estimated by OLS, you would expect a slightly negative relationship (the slope turns
out to be –0.025). It certainly would not be statistically significant using the t-statistic (although a
standard normal distribution cannot be used given the small sample size). Using a linear function is also
a problem since, even in the presence of a significant slope, the dependent variable cannot be less than
zero.
(b) There is a negative relationship between the temperature and the occurrence of an O-ring failure. At
high temperatures, say above 75 degrees, there is less than a 10 percent chance of O-ring failure.
As was mentioned in the textbook, the errors of the linear probability model are always heteroskedastic.
It is therefore necessary to use heteroskedasticity-robust standard errors for inference. The linear
probability model predicts O-ring failure with certainty for temperatures below 50 degrees. The
prediction for 31 degrees is therefore above one (1.7). The model predicts negative values for
temperatures above 77 degrees Fahrenheit.
(c) The slope coefficient is negative. Hence increases in temperature result in a lowering of the
probability of O-ring failures. Beyond that, neither the slope nor the intercept is easy to interpret. The
decrease in temperature from 80° to 70° results in an increase in the probability of 20.0 percentage points,
and from 60° to 50° in an increase in the probability of 21.3 percentage points. The change in probability is
not constant since this is a nonlinear model. In the linear probability model the change in probability would
remain constant, being 37.0 percentage points (10 × 0.037) in the above example.
(d) The slope coefficients should not be directly compared, since the functions are different. This does
not imply that the calculated probabilities differ much between the logit and probit model. For
example, in the probit model the decrease in temperature from 80° to 70° results in an increase in the
probability of 22.5 percentage points, and from 60° to 50° in an increase of 22.8 percentage points. The
pseudo-R2 measures the improvement in the value of the log likelihood from including temperature, relative
to the case where no explanatory variable is used. An alternative measure of fit is the fraction correctly
predicted.
(e) There is little difference between the logit and probit predictions, other than in the extremes. For 80°,
the logit and probit predicted values are 2.7 and 2.0 percent respectively, and at 40°, they are 99.7
percent and 99.9 percent. Hence the logit is slightly higher at high temperatures and slightly lower at
low temperatures. However, the difference is very small.
(f) The predicted number of failures from this regression is 5.7.
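The probability changes quoted in (c) and (d) can be checked with a short script. This is only a sketch: it plugs the logit and probit coefficients from the question into the logistic and standard normal cumulative distribution functions. The function names (`logit_prob`, `probit_prob`) are illustrative and not part of the original exercise.

```python
import math


def logistic_cdf(z):
    """Logistic cumulative distribution function F(z)."""
    return 1.0 / (1.0 + math.exp(-z))


def normal_cdf(z):
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logit_prob(temperature):
    """Predicted failure probability from the logit estimates in part (c)."""
    return logistic_cdf(15.297 - 0.236 * temperature)


def probit_prob(temperature):
    """Predicted failure probability from the probit estimates in part (d)."""
    return normal_cdf(8.900 - 0.137 * temperature)


# Change in predicted probability when the temperature falls by 10 degrees.
for high, low in [(80, 70), (60, 50)]:
    change_logit = logit_prob(low) - logit_prob(high)
    change_probit = probit_prob(low) - probit_prob(high)
    print(f"{high} -> {low}: logit {change_logit:.3f}, probit {change_probit:.3f}")
```

The same two functions can be evaluated at 80° and 40° to compare the models in the tails, as asked in (e).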
3) A study tried to find the determinants of the increase in the number of households headed by a female. Using
1940 and 1960 historical census data, a logit model was estimated to predict whether a woman is the head of a
household (living on her own) or whether she is living within another’s household. The limited dependent
variable takes on a value of one if the female lives on her own and is zero if she shares housing. The results for
1960 using 6,051 observations on prime-age whites and 1,294 on nonwhites were as shown in the table:
Regression                     (1) White      (2) Nonwhite
Regression model               Logit          Logit
Constant                       1.459          -2.874
                               (0.685)        (1.423)
Age                            -0.275         0.084
                               (0.037)        (0.068)
Age squared                    0.00463        0.00021
                               (0.00044)      (0.00081)
Education                      -0.171         -0.127
                               (0.026)        (0.038)
Farm status                    -0.687         -0.498
                               (0.173)        (0.346)
South                          0.376          -0.520
                               (0.098)        (0.180)
Expected family earnings       0.0018         0.0011
                               (0.00019)      (0.00024)
Family composition             4.123          2.751
                               (0.294)        (0.345)
Pseudo-R2                      0.266          0.189
Percent correctly predicted    82.0           83.4
where age is measured in years, education is years of schooling of the family head, farm status is a binary variable
taking the value of one if the family head lived on a farm, south is a binary variable for living in a certain region
of the country, expected family earnings was generated from a separate OLS regression to predict earnings from a
set of regressors, and family composition refers to the number of family members under the age of 18 divided by
the total number in the family.
The mean values for the variables were as shown in the table.
Variable                       (1) White mean    (2) Nonwhite mean
age                            46.1              42.9
age squared                    2,263.5           1,965.6
education                      12.6              10.4
farm status                    0.03              0.02
south                          0.3               0.5
expected family earnings       2,336.4           1,507.3
family composition             0.2               0.3
(a) Interpret the results. Do the coefficients have the expected signs? Why do you think age was entered both in
levels and in squares?
(b) Calculate the difference in the predicted probability between whites and nonwhites at the sample mean
values of the explanatory variables. Why do you think the study did not combine the observations and allowed
for a nonwhite binary variable to enter?
(c) What would be the effect on the probability of a nonwhite woman living on her own, if education and family
composition were changed from their current mean to the mean of whites, while all other variables were left
unchanged at the nonwhite mean values?
Answer: (a) Since these are logit estimates, the value of the coefficients cannot be interpreted easily. However,
statements can be made about the direction of the relationship between the dependent variable and the
regressors. The probability of a female living on her own decreases with an increase in
years of education. Living on a farm also lowers the probability. These results hold both for whites
and nonwhites. In addition, for whites the probability of living on her own decreases with age up to a
point (roughly age 30, given the estimated coefficients), but then increases. This is the result of age
entering as a level and the square of age. This relationship with regard to age is not statistically
significant for nonwhites. In the south, white females are more likely to live on their own, but nonwhites
are not. An increase in expected family earnings and in family composition increases the probability of
females living on their own.
(b) For whites, the probability is 0.90, while for nonwhites, it is 0.88. In the above approach, all
coefficients are allowed to vary, whereas in a combined sample, the coefficients on the variables other
than the binary race variable would have to be identical.
(c) The probability would decrease from 0.88 to 0.81.
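The probabilities in (b) and (c) follow from evaluating the logit index at the sample means and applying the logistic function. A minimal sketch, using the coefficients and means from the two tables; the dictionary keys and function names are illustrative choices, not notation from the study:

```python
import math

# Logit coefficients from the regression table (constant stored under "const").
COEF = {
    "white": {"const": 1.459, "age": -0.275, "age_sq": 0.00463, "educ": -0.171,
              "farm": -0.687, "south": 0.376, "earnings": 0.0018, "famcomp": 4.123},
    "nonwhite": {"const": -2.874, "age": 0.084, "age_sq": 0.00021, "educ": -0.127,
                 "farm": -0.498, "south": -0.520, "earnings": 0.0011, "famcomp": 2.751},
}

# Sample means from the second table.
MEANS = {
    "white": {"age": 46.1, "age_sq": 2263.5, "educ": 12.6, "farm": 0.03,
              "south": 0.3, "earnings": 2336.4, "famcomp": 0.2},
    "nonwhite": {"age": 42.9, "age_sq": 1965.6, "educ": 10.4, "farm": 0.02,
                 "south": 0.5, "earnings": 1507.3, "famcomp": 0.3},
}


def predicted_probability(coef, values):
    """Logistic function of the linear index const + sum(coef * value)."""
    index = coef["const"] + sum(coef[name] * value for name, value in values.items())
    return 1.0 / (1.0 + math.exp(-index))


p_white = predicted_probability(COEF["white"], MEANS["white"])
p_nonwhite = predicted_probability(COEF["nonwhite"], MEANS["nonwhite"])

# Part (c): nonwhite coefficients and means, except education and family
# composition, which are set to the white means.
counterfactual = dict(MEANS["nonwhite"], educ=MEANS["white"]["educ"],
                      famcomp=MEANS["white"]["famcomp"])
p_counterfactual = predicted_probability(COEF["nonwhite"], counterfactual)

print(f"white: {p_white:.2f}, nonwhite: {p_nonwhite:.2f}, "
      f"counterfactual nonwhite: {p_counterfactual:.2f}")
```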
4) A study investigated the impact of house price appreciation on household mobility. The underlying idea was
that if a house were viewed as one part of the household’s portfolio, then changes in the value of the house,
relative to other portfolio items, should result in investment decisions altering the current portfolio. Using 5,162
observations, the logit equation was estimated as shown in the table, where the limited dependent variable is
one if the household moved in 1978 and is zero if the household did not move:
Regression model       Logit
constant               -3.323
                       (0.180)
Male                   -0.567
                       (0.421)
Black                  -0.954
                       (0.515)
Married78              0.054
                       (0.412)
marriage change        0.764
                       (0.416)
A7983                  -0.257
                       (0.921)
PNRN                   -4.545
                       (3.354)
Pseudo-R2              0.016
where male, black, married78, and marriage change are binary variables. They indicate, respectively, if the entity
was a male-headed household, a black household, was married, and whether a change in marital status
occurred between 1977 and 1978. A7983 is the appreciation rate for each house from 1979 to 1983 minus the
SMSA-wide rate of appreciation for the same time period, and PNRN is a predicted appreciation rate for the
unit minus the national average rate.
(a) Interpret the results. Comment on the statistical significance of the coefficients. Do the slope coefficients
lend themselves to easy interpretation?
(b) The mean values for the regressors are as shown in the accompanying table.
Variable               Mean
male                   0.82
black                  0.09
married78              0.78
marriage change        0.03
A7983                  0.003
PNRN                   0.007
Taking the coefficients at face value and using the sample means, calculate the probability of a household
moving.
(c) Given this probability, what would be the effect of a decrease in the predicted appreciation rate of 20
percent, that is PNRN = –0.20?
Answer: (a) Since the logit model is nonlinear, the slope coefficients cannot be easily interpreted. However, the
signs of the coefficients indicate the direction of the relationship between the regressors and the binary
dependent variable. Accordingly, being married or having experienced a marriage change increases the
probability of moving. A male-headed household or a black household is less likely to move. If the
predicted appreciation rate relative to the national average increased, then the household is less likely to
move. The same holds for the actual appreciation rate from 1979 to 1983. None of the slope coefficients
are statistically significant with the exception of the black household and marriage change coefficients.
The two t-statistics are –1.85 and 1.84 respectively. These would be statistically significant at the 5%
level of a one-sided hypothesis test.
(b) The probability is 0.021.
(c) The resulting probability would be 0.051, i.e., more than twice the value in the previous result.
5) A study analyzed the probability of Major League Baseball (MLB) players to “survive” for another season, or,
in other words, to play one more season. The researchers had a sample of 4,728 hitters and 3,803 pitchers for
the years 1901-1999. All explanatory variables are standardized. The probit estimation yielded the results as shown
in the table:
Regression                  (1) Hitters    (2) Pitchers
Regression model            probit         probit
constant                    2.010          1.625
                            (0.030)        (0.031)
number of seasons played    -0.058         -0.031
                            (0.004)        (0.005)
performance                 0.794          0.677
                            (0.025)        (0.026)
average performance         0.022          0.100
                            (0.033)        (0.036)
where the limited dependent variable takes on a value of one if the player had one more season (a minimum of
50 at bats or 25 innings pitched), number of seasons played is measured in years, performance is the batting average
for hitters and the earned run average for pitchers, and average performance refers to performance over the
career.
(a) Interpret the two probit equations and calculate survival probabilities for hitters and pitchers at the sample
mean. Why are these so high?
(b) Calculate the change in the survival probability for a player who has a very bad year by performing two
standard deviations below the average (assume also that this player has been in the majors for many years so
that his average performance is hardly affected). How does this change the survival probability when
compared to the answer in (a)?
(c) Since the results seem similar, the researcher could consider combining the two samples. Explain in some
detail how this could be done and how you could test the hypothesis that the coefficients are the same.
Answer: (a) Note that all variables are standardized, so that the mean is zero. This results in a survival probability
of 0.997 for hitters and 0.991 for pitchers. These results are so high because there is a high probability, in
general, for a player to return the following season.
(b) Since the variables are standardized, this implies a change of two for the performance variable. The
result for hitters is a lowering of the survival probability to 0.65, and for pitchers to 0.633.
(c) After combining the sample for hitters and pitchers, you would allow for a different intercept and
slopes by introducing a binary variable for pitchers if hitters are the default. This binary variable would
be introduced by itself and in combination with each of the above variables, thereby allowing all
coefficients to differ. You could then conduct an F-test for the joint hypothesis that all coefficients
involving the binary variables are zero. If the hypothesis cannot be rejected, then there is no difference
between the coefficients for hitters and pitchers.
6) The logit regression (11.10) on page 393 of your textbook reads:
Pr(deny=1|P/Iratio,black) = F(-4.13 + 5.37 P/Iratio + 1.27 black)
(a) Using a spreadsheet program such as Excel, plot the following logistic regression function with a single X,
$$\hat{Y}_i = \frac{1}{1 + e^{-(\hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i})}},$$
where $\hat{\beta}_0 = -4.13$, $\hat{\beta}_1 = 5.37$, $\hat{\beta}_2 = 1.27$. Enter values for X1 in the first column
starting from 0 and then increment these by 0.1 until you reach 2.0. Let X2 be 0 at first. Then enter the logistic
function formula in the next column. Next allow X2 to be 1 and calculate the new values for the logistic
function in the third column. Finally produce the predicted probabilities for both blacks and whites, connecting
the predicted values with a line.
(b) Using the same spreadsheet calculations, list how the probability increases for blacks and for whites
as the P/I ratio increases from 0.5 to 0.6.
(c) What is the difference in the rejection probability between blacks and whites for a P/I ratio of 0.5 and
for 0.9? Why is the difference smaller for the higher value here?
(d) Table 11.2 on page 401 of your textbook lists logit regressions (column 2) with further explanatory
variables. Given that you can only produce simple plots in two dimensions, how would you proceed in
(a) above if there were more than a single explanatory variable?
Answer: a.
b. The deny probability increases by 9.7 percentage points for whites, and by 13.3
percentage points for blacks.
c. At a P/I value of 0.5, the difference is approximately 27 percentage points, while it is approximately 21
percentage points for the higher value. As the ratio increases, the probability of rejection approaches 1 for
everyone, regardless of race.
d. In that case you would have to hold the other explanatory variables constant. A simple solution
would be to set all of these to zero. A more reasonable approach would be to set them to their sample
average if they are continuous variables, and to set them either to 0 or 1 for binary variables.
7) Equation (11.3) in your textbook presents the regression results for the linear probability model.
a.
Using a spreadsheet program such as Excel, plot the fitted values for whites and blacks in the same
graph, for P/I ratios ranging from 0 to 1 (use 0.05 increments).
b.
Explain some of the strengths and shortcomings of the linear probability model using this graph.
Answer: a.
b. The strength is that the regression line is easy to interpret once you realize that the fitted values are
probabilities of being denied a loan: increases in the P/I ratio of 10 percentage points increase the
probability of being denied by roughly 6 percentage points. The role of the binary variable for blacks also
becomes clear: blacks have a roughly 18 percentage point higher probability of being rejected for a loan
when compared to whites, at any given level of a P/I ratio. As for shortcomings, it becomes clear that this
model cannot be used to calculate the probability of rejection for whites with a P/I ratio less than
approximately 20 percent. In that case, the predicted probability would be negative. Similarly, you
would expect the probability increase for a given change in the P/I ratio to change as the P/I ratio
becomes larger; this is not the case for the linear probability model. Furthermore, you will find values
larger than 1 for the P/I ratio in the data set used for Chapter 11. As a result, the predicted probability of
being rejected for a loan would be above 1 for some individuals, which does not make sense.
8) Equation (11.3) in your textbook presents the regression results for the linear probability model, and equation
(11.10) the results for the logit model.
a.
Using a spreadsheet program such as Excel, plot the predicted probabilities for being denied a loan for
both the linear probability model and the logit model if you are black. (Use a range from 0 to 1 for the
P/I Ratio and allow for it to increase by increments of 0.05.)
b.
Given the shortcomings of the linear probability model, do you think that it is a reasonable
approximation to the logit model?
c.
Repeat the exercise using predicted probabilities for whites.
Answer: a.
b. The predicted probabilities are actually quite close for P/I Ratio values between 0 and 0.5. Beyond that,
the linear probability model predicts substantially lower rejection probabilities.
c.
Here the shortcomings of the linear probability model become obvious for P/I Ratio values of
less than approximately 0.2: the predicted probabilities become negative. However, for values
of between 0.2 and 0.7, the predicted probabilities of both models are approximately the same,
so that the linear probability model would work well as an approximation.
11.3 Mathematical and Graphical Problems
1) Sketch the regression line for the linear probability model with a single regressor. Indicate for which values of
the slope and intercept the predictions will be above one and below zero. Can you rule out homoskedasticity in
the error terms with certainty here?
Answer: The errors in the linear probability model are always heteroskedastic.
2) Consider the following logit regression:
Pr(Y = 1 | X) = F(15.3 – 0.24 × X)
Calculate the change in probability for X increasing by 10 for X = 40 and X = 60. Why is there such a large
difference in the change in probabilities?
Answer: Pr(Y=1 | X=40) = 0.997; Pr(Y=1 | X=50) = 0.964; Pr(Y=1 | X=60) = 0.711;
Pr(Y=1 | X=70) = 0.182. The probability therefore falls by 0.033 when X increases from 40 to 50, but by
0.529 when X increases from 60 to 70. The difference is large due to the nonlinear nature of the model and
the values for which the change was calculated.
3) You have a limited dependent variable (Y) and a single explanatory variable (X). You estimate the relationship
using the linear probability model, a probit regression, and a logit regression. The results are as follows:
Ŷ = 2.858 – 0.037 × X
            (0.007)
Pr(Y = 1 | X) = F(15.297 – 0.236 × X)
Pr(Y = 1 | X) = Φ(8.900 – 0.137 × X)
                         (0.058)
(a) Although you cannot compare the coefficients directly, you are told that “it can be shown” that certain
relationships between the coefficients of these models hold approximately. These are for the slope:
$\hat{\beta}_{probit} \approx 0.625 \times \hat{\beta}_{Logit}$, $\hat{\beta}_{linear} \approx 0.25 \times \hat{\beta}_{Logit}$.
Take the logit result above as a base and calculate the slope coefficients for
the linear probability model and the probit regression. Are these values close?
(b) For the intercept, the same conversion holds for the logit-to-probit transformation. However, for the linear
probability model, there is a different conversion:
$\hat{\beta}_{0,linear} \approx 0.25 \times \hat{\beta}_{0,Logit} + 0.5$
Using the logit regression as the base, calculate a few changes in X (temperature in degrees of Fahrenheit) to
see how good the approximations are.
Answer: (a) $\hat{\beta}_{probit} \approx 0.625 \times (-0.236) = -0.148$, which is quite close to the estimated
slope of –0.137, judging by its standard error (0.058). $\hat{\beta}_{linear} \approx 0.25 \times (-0.236) = -0.059$
is close numerically to the estimated slope of –0.037, but not as close when you take into account
the small standard error (0.007).
(b) The approximation gives a probit intercept of 9.561 and a linear probability model intercept of 4.324.
Temperature X    Linear probability model     Probit model
                 Actual    Approximation      Actual    Approximation
30               1.7       2.6                1         1
40               1.4       2.0                1         1
50               1.0       1.4                0.98      0.98
60               0.6       0.8                0.75      0.75
70               0.3       0.4                0.25      0.21
80               -0.1      -0.2               0.02      0.01
In terms of calculated probabilities, the approximation is closer for the probit model than for the linear
probability model.
4) The population logit model of the binary dependent variable Y with a single regressor is
$$\Pr(Y = 1 \mid X_1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1)}}.$$
Logistic functions also play a role in econometrics when the dependent variable is not a binary variable. For
example, the demand for television sets per household may be a function of income, but there is a saturation
or satiation level per household, so that a linear specification may not be appropriate. Given the regression
model
$$Y_i = \frac{\beta_0}{1 + \beta_1 e^{-\beta_2 X_i}} + u_i,$$
sketch the regression line. How would you go about estimating the coefficients?
Answer: The equation cannot be estimated using linear methods or transformations that allow linearization.
However, nonlinear least squares estimation is possible as described in section 11.3 of the textbook.
Some students may point out that $\hat{\beta}_0$ will give an estimate of the satiation level (perhaps 10 TVs per
household), and that the point of inflection is at $X = \frac{1}{\beta_2}\ln \beta_1$.
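The inflection point mentioned in the answer can be derived in a few lines. The sketch below assumes that all three coefficients are positive and uses the shorthand $g(X) = \beta_1 e^{-\beta_2 X}$, which is introduced here only for convenience.

```latex
% Population regression function: m(X) = \beta_0 / (1 + g(X)), with g(X) = \beta_1 e^{-\beta_2 X}
\begin{align*}
\frac{dm}{dX} &= \frac{\beta_0 \beta_2\, g(X)}{\left(1 + g(X)\right)^2}, \\
\frac{d^2 m}{dX^2} &= \frac{\beta_0 \beta_2^2\, g(X)\left(g(X) - 1\right)}{\left(1 + g(X)\right)^3}.
\end{align*}
% The second derivative is zero when g(X) = 1, i.e. \beta_1 e^{-\beta_2 X} = 1:
\[
X^{*} = \frac{1}{\beta_2}\ln \beta_1, \qquad m(X^{*}) = \frac{\beta_0}{2}.
\]
```

At the inflection point the fitted curve is halfway to the satiation level $\beta_0$, which is a useful reference when sketching the regression line.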
5) (Requires Appendix material) Briefly describe the difference between the following models: censored and
truncated regression model, count data, ordered responses, and discrete choice data. Try to be specific in terms
of describing the data involved.
Answer: The answer should follow the discussion in Appendix 11.3. Briefly: censored regression models have a
dependent variable that has been “censored” above or below a certain cutoff, such as in the case where
some individuals actually spend different amounts of money on an item, but others do not spend any
amount. An example is the tobit regression model. The difference from the truncated regression model is that
data are available for both types of individuals, buyers and non-buyers, in the case of the censored model,
but only for buyers in the case of the truncated regression model. Expenditures by individuals are an example
of data suited to these types of models. There are other examples in economics where sample selection bias
occurs, such as in the case of earnings functions (labor economics), industrial organization, and finance.
Count data involves a discrete dependent variable, such as the number of times an activity is performed.
Just as OLS does not perform well in the discrete dependent variable case, the same holds here, and
special methods (Poisson and negative binomial regression models) have been developed to deal with
the special format. Ordered response data resembles the count data situation, in that there is a natural
ordering. The difference is that there are no natural numerical values attached, such as is the case when
activity by individuals happens a discrete number of times during a certain period. The Federal Reserve
may decide to lower the federal funds rate or not, and conditionally on lowering it, it may decide on a
mild cut or a more severe cut. Ordered Probit Models have been developed for such situations. Finally,
discrete choice data also allows for multiple responses, but these are not ordered, such as when the
individual can decide on different modes of transportation. In addition to its use in transportation
economics, multinomial probit and logit regression models have been developed and applied in labor
economics and health economics.
6) (Requires Appendix material and Calculus) The logarithm of the likelihood function (L) for estimating the
population mean and variance for an i.i.d. normal sample is as follows (note that taking the logarithm of the
likelihood function simplifies maximization. It is a monotonic transformation of the likelihood function,
meaning that this transformation does not affect the choice of maximum):
$$L = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(Y_i - \mu_Y)^2$$
Derive the maximum likelihood estimator for the mean and the variance. How do they differ, if at all, from the
OLS estimator? Given that the OLS estimators are unbiased, what can you say about the maximum likelihood
estimators here? Is the estimator for the variance consistent?
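Answer: Taking the derivative with respect to the two parameters $\mu_Y$ and $\sigma^2$ results in
$$\frac{\partial L}{\partial \mu_Y} = -\frac{1}{2\sigma^2}\sum_{i=1}^{n} 2(Y_i - \mu_Y)(-1) = \frac{1}{\sigma^2}\sum_{i=1}^{n}(Y_i - \mu_Y)$$
$$\frac{\partial L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(Y_i - \mu_Y)^2.$$
The maximum likelihood estimator is then the value for $\mu_Y$ and $\sigma^2$ that maximizes the (log) likelihood
function. Setting both equations to zero, and assuming that this results in a maximum rather than a
minimum (second order conditions will not be discussed here), yields
$$\hat{\mu}_{Y,MLE} = \frac{1}{n}\sum_{i=1}^{n} Y_i = \bar{Y} \quad \text{and} \quad \hat{\sigma}^2_{MLE} = \frac{1}{n}\sum_{i=1}^{n}(Y_i - \hat{\mu}_{Y,MLE})^2 = \frac{1}{n}\sum_{i=1}^{n}(Y_i - \bar{Y})^2.$$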
The maximum likelihood estimator of the population mean is therefore the sample mean. Since the OLS
estimator is identical, and it is unbiased, the MLE will also be unbiased. However, the MLE for the
population variance differs from the OLS estimator (it divides by n rather than n – 1), and since the OLS
estimator is unbiased, the MLE must be biased. But the difference between the two estimators vanishes as n
increases, and hence the MLE is consistent.
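A small numerical illustration of the difference between the two variance estimators may help. The sample values below are made up for illustration only; the point is that the MLE divides the sum of squared deviations by n, while the unbiased estimator divides by n – 1, so the two differ by the factor (n – 1)/n.

```python
# Hypothetical sample, chosen only so that the arithmetic is easy to follow.
sample = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
n = len(sample)

# MLE of the mean is the sample average.
mean_mle = sum(sample) / n

# Sum of squared deviations from the sample average.
ssd = sum((y - mean_mle) ** 2 for y in sample)

var_mle = ssd / n             # maximum likelihood estimator (biased)
var_unbiased = ssd / (n - 1)  # usual unbiased estimator

print(f"mean = {mean_mle}, MLE variance = {var_mle}, unbiased variance = {var_unbiased}")
print(f"ratio MLE / unbiased = {var_mle / var_unbiased:.4f} (equals (n - 1) / n)")
```

As n grows, the ratio (n – 1)/n approaches one, which is the sense in which the difference between the estimators vanishes.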
7) Besides maximum likelihood estimation of the logit and probit model, your textbook mentions that the model
can also be estimated by nonlinear least squares. Construct the sum of squared prediction mistakes and suggest
how computer algorithms go about finding the coefficient values that minimize the function. You may want to
use an analogy where you place yourself into a mountain range at night with a flashlight shining at your feet.
Your task is to find the lowest point in the valley. You have two choices to make: the direction you are walking
in and the step length. Describe how you will proceed to find the bottom of the valley. Once you find the
lowest point, is there any guarantee that this is the lowest point of all valleys? What should you do to assure
this?
Answer: In general, $\sum_{i=1}^{n}\left[Y_i - f(b_0 + b_1 X_{1i} + \dots + b_k X_{ki})\right]^2$ is the sum of squared
prediction mistakes, whether the function f() is linear or nonlinear. Nonlinear least squares then uses a
sophisticated algorithm of trial and error to find the minimum of the sum of squared prediction mistakes by
changing the values of the parameters. Some of the routines are called Newton-Raphson, Gauss-Newton, Method
of Steepest Ascent, etc. What they have in common is the general principle that they evaluate the sum of
squared prediction mistakes after changing the parameters in a certain direction and by a certain size. In
the analogy, the student is lowered into a mountain range at night and her task is to find the lowest point
of the valley. The rule may be that she will walk in one direction as long as at the end of the step she is
at a lower point than at the beginning of the step. If not, then she should walk in a different direction.
She is also allowed to choose the step length. There is, of course, no guarantee that another point in
another valley is not lower than the one she found in the valley she is in, nor is she guaranteed to find
the lowest point if she makes very large steps. To assure that this is the lowest point, she should ask to
be dropped off in a different location (“starting point”) and see if she finds the same spot again. Finally,
she should be warned that it is possible to walk along ridges for a long time without much progress visible.
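The mountain analogy can be turned into a small program. The sketch below is not one of the named routines and is not how statistical packages implement nonlinear least squares; it is a plain gradient descent with step halving, applied to a logit function with one regressor and a made-up data set. All names (`ssr`, `descend`, the data, the starting points) are illustrative assumptions.

```python
import math


def logistic(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ssr(b0, b1, xs, ys):
    """Sum of squared prediction mistakes for f = logistic(b0 + b1 * x)."""
    return sum((y - logistic(b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


def gradient(b0, b1, xs, ys):
    """Partial derivatives of the sum of squares with respect to b0 and b1."""
    g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        p = logistic(b0 + b1 * x)
        common = -2.0 * (y - p) * p * (1.0 - p)
        g0 += common
        g1 += common * x
    return g0, g1


def downhill_step(b0, b1, g0, g1, current, xs, ys, step):
    """Halve the step length until the new point is lower than the current one."""
    while step > 1e-12:
        cand0, cand1 = b0 - step * g0, b1 - step * g1
        value = ssr(cand0, cand1, xs, ys)
        if value < current:
            return cand0, cand1, value
        step /= 2.0
    return None  # no lower point found in this direction


def descend(b0, b1, xs, ys, first_step=1.0, max_iter=2000, tol=1e-12):
    """Walk downhill from the starting point (b0, b1) until progress stalls."""
    current = ssr(b0, b1, xs, ys)
    for _ in range(max_iter):
        g0, g1 = gradient(b0, b1, xs, ys)
        found = downhill_step(b0, b1, g0, g1, current, xs, ys, first_step)
        if found is None:
            break
        new0, new1, value = found
        gain = current - value
        b0, b1, current = new0, new1, value
        if gain < tol:
            break
    return b0, b1, current


# Made-up binary data with overlap, so the minimum is at finite coefficients.
xs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 0, 1, 0, 1, 1]

# Several "drop-off locations": compare where each walk ends up.
starts = [(0.0, 0.0), (1.0, -1.0), (-2.0, 3.0)]
results = [(start, descend(start[0], start[1], xs, ys)) for start in starts]
for start, (b0, b1, value) in results:
    print(f"start {start}: b0 = {b0:.3f}, b1 = {b1:.3f}, SSR = {value:.4f}")
```

If the walks from different starting points end at roughly the same coefficients, that is reassuring but still not proof of a global minimum.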
8) Consider the following probit regression
Pr(Y = 1 | X) = Φ(8.9 – 0.14 × X)
Calculate the change in probability for X increasing by 10 for X = 40 and X = 60. Why is there such a large
difference in the change in probabilities?
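Answer: Pr(Y=1 | X=40) = 0.999; Pr(Y=1 | X=50) = 0.971; Pr(Y=1 | X=60) = 0.691; Pr(Y=1 | X=70) = 0.184. The
probability therefore falls by 0.028 when X increases from 40 to 50, but by 0.507 when X increases from 60 to
70. The large differences happen as a result of the non-linearity of the function, and the points at which
they are calculated.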
9) Earnings equations establish a relationship between an individual’s earnings and its determinants such as years
of education, tenure with an employer, IQ of the individual, professional choice, region within the country the
individual is living in, etc. In addition, binary variables are often added to test for “discrimination” against
certain sub-groups of the labor force such as blacks, females, etc. Compare this approach to the study in the
textbook, which also investigates evidence on discrimination. Explain the fundamental differences in both
approaches using equations and mathematical specifications whenever possible.
Answer: In the former case, the binary variable appears as a regressor. That is, the regression may be
ln(Earn_i) = β0 + β1 × Educ_i + β2 × Exper_i + β3 × Binary_i + ... + u_i,
where earnings of an individual are explained by a set of attributes. Binary is a shift variable, which is
one for females (or blacks, religion, union members, etc.). The coefficient on the shift variable then
indicates whether or not the individual is treated differently, controlling for all other influences.
However, the dependent variable is continuous.
In the case of a limited dependent variable, it is the left-hand variable that is binary. Here behavior of a
qualitative type is being explained, i.e.,
Binary_i = β0 + β1 × X1i + β2 × X2i + ... + βk × Xki + u_i,
although some of the regressors may also be binary variables.
10) (Requires Appendix material and Calculus) The log of the likelihood function (L) for the simple regression
model with i.i.d. normal errors is as follows (note that taking the logarithm of the likelihood function simplifies
maximization. It is a monotonic transformation of the likelihood function, meaning that this transformation
does not affect the choice of maximum):
$$L = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(Y_i - \beta_0 - \beta_1 X_i)^2$$
Derive the maximum likelihood estimator for the slope and intercept. What general properties do these
estimators have? Explain intuitively why the OLS estimator is identical to the maximum likelihood estimator
here.
Answer: Maximizing the likelihood function with respect to the regression coefficients is the same as making the
third term as small as possible. However, this term will become the sum of squared residuals once the
function is maximized. Hence maximizing the likelihood function is identical to minimizing the sum of
squared residuals, and the two methods of choosing an estimator are therefore identical for the
regression coefficients.
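Taking the derivative of the log of the likelihood with respect to the three parameters $\beta_0$, $\beta_1$ and $\sigma^2$
results in
$$\frac{\partial L}{\partial \beta_0} = -\frac{1}{2\sigma^2}\sum_{i=1}^{n} 2(Y_i - \beta_0 - \beta_1 X_i)(-1)$$
$$\frac{\partial L}{\partial \beta_1} = -\frac{1}{2\sigma^2}\sum_{i=1}^{n} 2(Y_i - \beta_0 - \beta_1 X_i)(-X_i)$$
$$\frac{\partial L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(Y_i - \beta_0 - \beta_1 X_i)^2$$
Setting the equations to zero and solving for the three parameters then results in the maximum
likelihood estimator (MLE).
$$\sum_{i=1}^{n}(Y_i - \hat{\beta}_{0,MLE} - \hat{\beta}_{1,MLE} X_i) = 0, \text{ or } \hat{\beta}_{0,MLE} = \bar{Y} - \hat{\beta}_{1,MLE}\bar{X}.$$
$$\sum_{i=1}^{n}(Y_i - \hat{\beta}_{0,MLE} - \hat{\beta}_{1,MLE} X_i)(X_i) = 0,$$
or, after multiplying through by $X_i$ and substituting $\hat{\beta}_{0,MLE}$,
$$\hat{\beta}_{1,MLE} = \frac{\sum_{i=1}^{n} Y_i X_i - n\bar{X}\bar{Y}}{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}.$$
$$-\frac{n}{2\hat{\sigma}^2_{MLE}} + \frac{1}{2\hat{\sigma}^4_{MLE}}\sum_{i=1}^{n}(Y_i - \hat{\beta}_{0,MLE} - \hat{\beta}_{1,MLE} X_i)^2 = 0, \text{ or}$$
$$\hat{\sigma}^2_{MLE} = \frac{1}{n}\sum_{i=1}^{n}(Y_i - \hat{\beta}_{0,MLE} - \hat{\beta}_{1,MLE} X_i)^2 = \frac{1}{n}\sum_{i=1}^{n}\hat{u}_i^2.$$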
The estimator for the regression slope and intercept is therefore identical to the OLS estimator. However,
the estimator for the error variance is different and biased. In general, MLEs are consistent. They are also
normally distributed in large samples.
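The closed-form solutions can be checked numerically. The sketch below uses a small made-up data set (an assumption for illustration only), computes the slope and intercept from the formulas above, verifies that the two first-order conditions hold, and compares the MLE of the error variance (dividing by n) with the usual degrees-of-freedom-adjusted estimator (dividing by n – 2).

```python
# Hypothetical data, used only to illustrate the formulas.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Slope and intercept from the first-order conditions (identical to OLS).
slope = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / (
    sum(x * x for x in xs) - n * x_bar ** 2
)
intercept = y_bar - slope * x_bar

# Residuals and the two first-order conditions, which should be (numerically) zero.
residuals = [y - intercept - slope * x for x, y in zip(xs, ys)]
foc_intercept = sum(residuals)
foc_slope = sum(u * x for u, x in zip(residuals, xs))

# Error variance: the MLE divides by n, the adjusted estimator by n - 2.
ssr = sum(u * u for u in residuals)
sigma2_mle = ssr / n
sigma2_adjusted = ssr / (n - 2)

print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
print(f"first-order conditions: {foc_intercept:.2e}, {foc_slope:.2e}")
print(f"sigma^2 MLE = {sigma2_mle:.5f}, adjusted = {sigma2_adjusted:.5f}")
```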
11) The estimated logit regression in your textbook is
Pr(deny=1|P/Iratio,black) = F(-4.13 + 5.37 P/Iratio + 1.27 black)
Using a spreadsheet program, such as Excel, generate a table with predicted probabilities for both whites and
blacks using P/I Ratio values between 0 and 1 and increments of 0.05.
Answer:
P/I Ratio    whites    blacks
0.00         0.02      0.05
0.05         0.02      0.07
0.10         0.03      0.09
0.15         0.03      0.11
0.20         0.04      0.14
0.25         0.06      0.18
0.30         0.07      0.22
0.35         0.10      0.27
0.40         0.12      0.33
0.45         0.15      0.39
0.50         0.19      0.46
0.55         0.24      0.52
0.60         0.29      0.59
0.65         0.35      0.65
0.70         0.41      0.71
0.75         0.47      0.76
0.80         0.54      0.81
0.85         0.61      0.85
0.90         0.67      0.88
0.95         0.73      0.90
1.00         0.78      0.92
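The exercise asks for a spreadsheet, but the same table can be produced with a few lines of code. This is a sketch using the logit coefficients from the question; the function name `deny_prob` is an illustrative choice.

```python
import math


def deny_prob(pi_ratio, black):
    """Predicted denial probability from the estimated logit regression."""
    index = -4.13 + 5.37 * pi_ratio + 1.27 * black
    return 1.0 / (1.0 + math.exp(-index))


# P/I ratios from 0 to 1 in increments of 0.05 (integer steps avoid rounding drift).
rows = []
for k in range(21):
    pi_ratio = k * 0.05
    rows.append((pi_ratio, deny_prob(pi_ratio, 0), deny_prob(pi_ratio, 1)))

print(f"{'P/I Ratio':>9} {'whites':>8} {'blacks':>8}")
for pi_ratio, whites, blacks in rows:
    print(f"{pi_ratio:9.2f} {whites:8.2f} {blacks:8.2f}")
```

Differences between rows of this table give the changes in the rejection probability for any pair of P/I ratio values.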
12) The estimated logit regression in your textbook is
Pr(deny=1|P/Iratio,black) = F(-4.13 + 5.37 P/Iratio + 1.27 black)
Is there a meaningful interpretation to the slope for the P/I Ratio? Calculate the increase of a rejection
probability for both blacks and whites as the P/I Ratio increases from 0.1 to 0.2. Repeat the exercise for
an increase from 0.65 to 0.75. Why is the increase in the probability higher for blacks at the smaller
value of the P/I Ratio but higher for whites at the larger P/I Ratio?
Answer: There is no meaningful interpretation of the regression slope: it certainly does not indicate by how much
the rejection probability increases for a given change in the P/I Ratio. For whites, the changes in the
rejection probability are 0.02 and 0.13. For blacks, the respective values are 0.05 and 0.11. The
differences are due to the non-linearity of the logit function: it is steeper for blacks at low values of the
P/I Ratio and flattens out for higher values.
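The four changes quoted in the answer can be reproduced directly from the estimated equation. The sketch below is in Python; the helper names are illustrative and only the coefficients come from the question.

```python
import math

def deny_probability(pi_ratio, black):
    """Predicted denial probability from the estimated logit regression."""
    z = -4.13 + 5.37 * pi_ratio + 1.27 * black
    return 1.0 / (1.0 + math.exp(-z))

def change(low, high, black):
    """Change in the predicted probability as the P/I ratio rises from low to high."""
    return deny_probability(high, black) - deny_probability(low, black)

changes = {
    ("whites", "0.10 to 0.20"): change(0.10, 0.20, 0),
    ("whites", "0.65 to 0.75"): change(0.65, 0.75, 0),
    ("blacks", "0.10 to 0.20"): change(0.10, 0.20, 1),
    ("blacks", "0.65 to 0.75"): change(0.65, 0.75, 1),
}
for (group, interval), value in changes.items():
    print(f"{group}, P/I ratio {interval}: {value:.2f}")
```

The same 0.1 increase in the P/I ratio produces a different change in probability depending on where the index -4.13 + 5.37 P/Iratio + 1.27 black sits on the logistic curve, which is why the coefficient alone does not measure the effect.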
Chapter 12 Instrumental Variables Regression
12.1 Multiple Choice
1) Estimation of the IV regression model
A) requires exact identification.
B) allows only one endogenous regressor, which is typically correlated with the error term.
C) requires exact identification or overidentification.
D) is only possible if the number of instruments is the same as the number of regressors.
Answer: C
2) Two Stage Least Squares is calculated as follows; in the first stage:
A) Y is regressed on the exogenous variables only. The predicted value of Y is then regressed on the
instrumental variables.
B) the unknown coefficients in the reduced form equation are estimated by OLS, and the predicted values
are calculated. In the second stage, Y is regressed on these predicted values and the other exogenous
variables.
C) the exogenous variables are regressed on the instruments. The predicted value of the exogenous variables
is then used in the second stage, together with the instruments, to predict the dependent variable.
D) the unknown coefficients in the reduced form equation are estimated by weighted least squares, and the
predicted values are calculated. In the second stage, Y is regressed on these predicted values and the
other exogenous variables.
Answer: B
3) The conditions for valid instruments do not include the following:
A) each instrument must be uncorrelated with the error term.
B) each one of the instrumental variables must be normally distributed.
C) at least one of the instruments must enter the population regression of X on the Zs and the Ws.
D) perfect multicollinearity between the predicted endogenous variables and the exogenous variables must
be ruled out.
Answer: B
4) The IV regression assumptions include all of the following with the exception of
A) the error terms must be normally distributed.
B) E(ui | W1i,…, Wri) = 0.
C) Large outliers are unlikely: the X’s, W’s, Z’s, and Y’s all have nonzero, finite fourth moments.
D) (X1i,…, Xki, W1i,…,Wri, Z1i, … Zmi, Yi) are i.i.d. draws from their joint distribution.
Answer: A
5) The rule-of-thumb for checking for weak instruments is as follows: for the case of a single endogenous
regressor,
A) a first stage F must be statistically significant to indicate a strong instrument.
B) a first stage F > 1.96 indicates that the instruments are weak.
C) the t-statistic on each of the instruments must exceed at least 1.64.
D) a first stage F < 10 indicates that the instruments are weak.
Answer: D
6) The J-statistic
A) tells you if the instruments are exogenous.
B) provides you with a test of the hypothesis that the instruments are exogenous for the case of exact
identification.
C) is distributed $\chi^2_{m-k}$, where m-k is the degree of overidentification.
D) is distributed $\chi^2_{m-k}$, where m-k is the number of instruments minus the number of regressors.
Answer: C
7) In the case of the simple regression model $Y_i = \beta_0 + \beta_1 X_i + u_i$, i = 1,…, n, when X and u are correlated, then
A) the OLS estimator is biased in small samples only.
B) OLS and TSLS produce the same estimate.
C) X is exogenous.
D) the OLS estimator is inconsistent.
Answer: D
8) The following will not cause correlation between X and u in the simple regression model:
A) simultaneous causality.
B) omitted variables.
C) irrelevance of the regressor.
D) errors in variables.
Answer: C
9) The distinction between endogenous and exogenous variables is
A) that exogenous variables are determined inside the model and endogenous variables are determined
outside the model.
B) dependent on the sample size: for n > 100, endogenous variables become exogenous.
C) depends on the distribution of the variables: when they are normally distributed, they are exogenous,
otherwise they are endogenous.
D) whether or not the variables are correlated with the error term.
Answer: D
10) The two conditions for a valid instrument are
A) corr(Zi, Xi) = 0 and corr(Zi, ui) ≠ 0.
B) corr(Zi, Xi) = 0 and corr(Zi, ui) = 0.
C) corr(Zi, Xi) ≠ 0 and corr(Zi, ui) = 0.
D) corr(Zi, Xi) ≠ 0 and corr(Zi, ui) ≠ 0.
Answer: C
11) Instrument relevance
A) means that the instrument is one of the determinants of the dependent variable.
B) is the same as instrument exogeneity.
C) means that some of the variance in the regressor is related to variation in the instrument.
D) is not possible since X and u are correlated and Z and u are not correlated.
Answer: C
12) Consider a competitive market where the demand and the supply depend on the current price of the good.
Then fitting a line through the quantity-price outcomes will
A) give you an estimate of the demand curve.
B) estimate neither a demand curve nor a supply curve.
C) enable you to calculate the price elasticity of supply.
D) give you the exogenous part of the demand in the first stage of TSLS.
Answer: B
13) When there is a single instrument and single regressor, the TSLS estimator for the slope can be calculated as
follows:
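A) $\hat{\beta}_1^{TSLS} = \frac{s_{ZY}}{s_{ZX}}$.
B) $\hat{\beta}_1^{TSLS} = \frac{s_{XY}}{s_X^2}$.
C) $\hat{\beta}_1^{TSLS} = \frac{s_{ZX}}{s_{ZY}}$.
D) $\hat{\beta}_1^{TSLS} = \frac{s_{ZY}}{s_Z^2}$.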
Answer: A
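To see the ratio of sample covariances at work, here is a small simulation sketch in Python. The data-generating process (true slope of 2, a regressor that shares the error u, and an instrument Z independent of u) is an assumption made for illustration only; `sample_cov` is a hypothetical helper.

```python
import random

def sample_cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (n - 1)

rng = random.Random(42)
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]  # instrument: relevant and exogenous
u = [rng.gauss(0, 1) for _ in range(n)]  # regression error
# X depends on Z (relevance) and on u (endogeneity).
x = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]

ols_slope = sample_cov(x, y) / sample_cov(x, x)   # s_XY / s_X^2
tsls_slope = sample_cov(z, y) / sample_cov(z, x)  # s_ZY / s_ZX

print(f"OLS slope:  {ols_slope:.3f}")
print(f"TSLS slope: {tsls_slope:.3f}")
```

Under these assumptions the OLS slope converges to 2 + cov(X, u)/var(X) = 2 + 1/3, while the TSLS slope converges to the true value of 2.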
14) The TSLS estimator is
A) consistent and has a normal distribution in large samples.
B) unbiased.
C) efficient in small samples.
D) F-distributed.
Answer: A
15) The reduced form equation for X
A) regresses the endogenous variable X on the smallest possible subset of regressors.
B) relates the endogenous variable X to all the available exogenous variables, both those included in the
regression of interest and the instruments.
C) uses the predicted values of X from the first stage as a regressor in the original equation.
D) uses smaller standard errors, such as homoskedasticity-only standard errors, for inference.
Answer: B
16) When calculating the TSLS standard errors
A) you do not have to worry about heteroskedasticity, since it was eliminated in the first stage
B) you can use the standard errors reported by OLS estimation of the second stage regression.
C) the critical values from the standard normal table should be adjusted for the proper degrees of freedom.
D) you should use heteroskedasticity-robust standard errors.
Answer: D
17) Having more relevant instruments
A) is a problem because instead of being just identified, the regression now becomes overidentified.
B) is like having a larger sample size in that the more information is available for use in the IV regressions.
C) typically results in larger standard errors for the TSLS estimator.
D) is not as important for inference as having the same number of endogenous variables as instruments.
Answer: B
18) Weak instruments are a problem because
A) the TSLS estimator may not be normally distributed, even in large samples.
B) they result in the instruments not being exogenous.
C) the TSLS estimator cannot be computed.
D) you cannot predict the endogenous variables any longer in the first stage.
Answer: A
19) (Requires Appendix material) The relationship between the TSLS slope and the corresponding population
parameter is:
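A) $(\hat{\beta}_1^{TSLS} - \beta_1) = \frac{\frac{1}{n}\sum_{i=1}^{n}(Z_i - \bar{Z})u_i}{\frac{1}{n}\sum_{i=1}^{n}(Z_i - \bar{Z})(X_i - \bar{X})}$.
B) $(\hat{\beta}_1^{TSLS} - \beta_1) = \frac{\frac{1}{n}\sum_{i=1}^{n}(Z_i - \bar{Z})}{\frac{1}{n}\sum_{i=1}^{n}(Z_i - \bar{Z})(X_i - \bar{X})}$.
C) $(\hat{\beta}_1^{TSLS} - \beta_1) = \frac{\frac{1}{n}\sum_{i=1}^{n}(Z_i - \bar{Z})u_i}{\frac{1}{n}\sum_{i=1}^{n}(Z_i - \bar{Z})^2}$.
D) $(\hat{\beta}_1^{TSLS} - \beta_1) = \frac{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})u_i}{\frac{1}{n}\sum_{i=1}^{n}(Z_i - \bar{Z})(X_i - \bar{X})}$.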
Answer: A
20) If the instruments are not exogenous,
A) you cannot perform the first stage of TSLS.
B) then, in order to conduct proper inference, it is essential that you use heteroskedasticity-robust standard
errors.
C) your model becomes overidentified.
D) then TSLS is inconsistent.
Answer: D
21) In the case of exact identification
A) you can use the J-statistic in a test of overidentifying restrictions.
B) you cannot use TSLS for estimation purposes.
C) you must rely on your personal knowledge of the empirical problem at hand to assess whether the
instruments are exogenous.
D) OLS and TSLS yield the same estimate.
Answer: C
22) To calculate the J-statistic you regress the
A) squared values of the TSLS residuals on all exogenous variables and the instruments. The statistic is then
the number of observations times the regression $R^2$.
B) TSLS residuals on all exogenous variables and the instruments. You then multiply the
homoskedasticity-only F-statistic from that regression by the number of instruments.
C) OLS residuals from the reduced form on the instruments. The F-statistic from this regression is the
J-statistic.
D) TSLS residuals on all exogenous variables and the instruments. You then multiply the
heteroskedasticity-robust F-statistic from that regression by the number of instruments.
Answer: B
23) (Requires Chapter 8) When using panel data and in the presence of endogenous regressors
A) the TSLS does not exist.
B) you do not have to worry about the validity of instruments, since there are so many fixed effects.
C) the OLS estimator is consistent.
D) application of the TSLS estimator is straightforward if you use two time periods and difference the data.
Answer: D
24) In practice, the most difficult aspect of IV estimation is
A) finding instruments that are both relevant and exogenous.
B) that you have to use two stages in the estimation process.
C) calculating the J-statistic.
D) finding instruments that are exogenous. Relevant instruments are easy to find.
Answer: A
25) Consider a model with one endogenous regressor and two instruments. Then the J-statistic will be large
A) if the number of observations is very large.
B) if the coefficients are very different when estimating the coefficients using one instrument at a time.
C) if the TSLS estimates are very different from the OLS estimates.
D) when you use homoskedasticity-only standard errors.
Answer: B
26) Let W be the included exogenous variables in a regression function that also has endogenous regressors (X).
The W variables can
A) be control variables
B) have the property E(ui|Wi) = 0
C) make an instrument uncorrelated with u
D) all of the above
Answer: D
27) The logic of control variables in IV regressions
A) parallels the logic of control variables in OLS
B) only applies in the case of homoskedastic errors in the first stage of two stage least squares estimation
C) is different in a substantial way from the logic of control variables in OLS since there are two stages in
estimation
D) implies that the TSLS is efficient
Answer: A
28) For W to be an effective control variable in IV estimation, the following condition must hold
A) E(ui) = 0
B) E(ui|Zi,Wi) = E(ui|Wi)
C) E(ui uj) ≠ 0
D) there must be an intercept in the regression
Answer: B
29) The IV estimator can be used to potentially eliminate bias resulting from
A) multicollinearity.
B) serial correlation.
C) errors in variables.
D) heteroskedasticity.
Answer: C
30) Instrumental Variables regression uses instruments to
A) establish the Mozart Effect.
B) increase the regression $R^2$.
C) eliminate serial correlation.
D) isolate movements in X that are uncorrelated with u.
Answer: D
31) Endogenous variables
A) are correlated with the error term.
B) always appear on the LHS of regression functions.
C) cannot be regressors.
D) are uncorrelated with the error term.
Answer: A
32) Consider the following two equations to describe labor markets in various sectors of the economy
$N^d = \beta_0 + \beta_1 \frac{W}{P} + u$
$N^s = \gamma_0 + \gamma_1 \frac{W}{P} + v$
$N^d = N^s = N$
A) W/P is exogenous, N is endogenous
B) Both N and W/P are endogenous
C) N is exogenous, W/P is endogenous
D) the parameters cannot be estimated because it would require two equations to be estimated at the same
time (simultaneously)
Answer: B
12.2 Essays and Longer Questions
1) Write a short essay about the Overidentifying Restrictions Test. What is meant exactly by “overidentification?”
State the null hypothesis. Describe how to calculate the J-statistic and what its distribution is. Use an example
of two instruments and one endogenous variable to explain under what situation the test will be likely to reject
the null hypothesis. What does this example tell you about the exactly identified case? If your variables pass
the test, is this sufficient for these variables to be good instruments?
Answer: The regression coefficients in the regression model with endogenous regressors can be either
underidentified, exactly identified, or overidentified. If the number of instruments (m) equals the
number of endogenous regressors (k), then the coefficients are exactly identified. If there are more
instruments than number of endogenous regressors, then the regression coefficients are overidentified.
For the instrumental variable estimator to exist, there must be at least as many instruments as
endogenous regressors (m ≥ k). In the case of overidentification, the exogeneity of the instruments can be
tested. Under the null hypothesis, all instruments are exogenous. Under the alternative hypothesis, at
least one of the instruments is endogenous. Technically, the overidentifying restrictions test uses the
TSLS residuals to see if these are correlated with the instruments. The residuals are regressed on the
instruments and the included exogenous regressors. Under the null hypothesis, all coefficients other
than the constant are zero. Since this is a case of joint hypothesis testing, the F-statistic is computed, and
from it the J-statistic, where J = mF. In large samples the distribution of this statistic is $\chi^2_{m-k}$.
Calculating the J-statistic amounts to comparing different IV estimates. In the case of two instruments
and one endogenous regressor, where the degree of overidentification is one, two such estimates exist.
Due to sample variation, these estimates will differ, although they should be similar, or “close” to each
other. If one or both of the instruments is not exogenous, then the estimates will not be similar, or the
difference between the two will be sufficiently large so as not to be the result of pure sampling variation.
In this situation the null hypothesis will be rejected. This procedure can only be executed when the
coefficients are overidentified, since there is no comparison possible for the case of exactly identified
coefficients. Passing the test is not sufficient for the instruments to be valid since, in addition to being
exogenous, they must also be relevant, i.e., they must be correlated with the endogenous regressor.
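The procedure described in this answer can be sketched in code for the case of one endogenous regressor and two instruments. The following Python sketch uses only the standard library; the helper functions (`ols`, `fitted`, `j_statistic`, `simulate`) and the simulated data-generating process are assumptions made for illustration, and the F-statistic is the homoskedasticity-only version.

```python
import random

def ols(rows, y):
    """Least-squares coefficients obtained by solving the normal equations."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

def fitted(rows, coefs):
    return [sum(c * v for c, v in zip(coefs, row)) for row in rows]

def j_statistic(y, x, z1, z2):
    """J = m * F for one endogenous regressor (k = 1) and m = 2 instruments."""
    n, m = len(y), 2
    inst = [[1.0, a, b] for a, b in zip(z1, z2)]
    x_hat = fitted(inst, ols(inst, x))                # first stage
    b0, b1 = ols([[1.0, xh] for xh in x_hat], y)      # second stage
    u_hat = [yi - b0 - b1 * xi for yi, xi in zip(y, x)]  # residuals use actual X
    aux_fit = fitted(inst, ols(inst, u_hat))          # residuals on instruments
    ssr_unrestricted = sum((u - f) ** 2 for u, f in zip(u_hat, aux_fit))
    u_bar = sum(u_hat) / n
    ssr_restricted = sum((u - u_bar) ** 2 for u in u_hat)  # constant only
    f_stat = ((ssr_restricted - ssr_unrestricted) / m) / (
        ssr_unrestricted / (n - m - 1))
    return m * f_stat

def simulate(invalid, n=2000, seed=0):
    rng = random.Random(seed)
    z1 = [rng.gauss(0, 1) for _ in range(n)]
    z2 = [rng.gauss(0, 1) for _ in range(n)]
    e = [rng.gauss(0, 1) for _ in range(n)]
    x = [a + b + 0.5 * c + rng.gauss(0, 1) for a, b, c in zip(z1, z2, e)]
    # If invalid, Z2 enters the error term directly and is not exogenous.
    u = [c + (b if invalid else 0.0) for b, c in zip(z2, e)]
    y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]
    return j_statistic(y, x, z1, z2)

j_valid = simulate(invalid=False)
j_invalid = simulate(invalid=True)
print(f"J with two exogenous instruments: {j_valid:.2f}")
print(f"J with one endogenous instrument: {j_invalid:.2f}")
```

With m - k = 1 degree of overidentification, each value would be compared with the critical values of the chi-squared distribution with one degree of freedom (3.84 at the 5% level). Because the residuals are regressed only on the instruments, the sketch covers the case without included exogenous regressors.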
2) Using some of the examples from your textbook, describe econometric studies which required instrumental
variable techniques. In each case emphasize why the need for instrumental variables arises and how authors
have approached the problem. Make sure to include a discussion of overidentification, the validity of
instruments, and testing procedures in your essay.
Answer: The textbook mentions several studies which used instrumental variable estimation techniques, starting
with Wright’s problem to estimate demand and supply elasticities on animal and vegetable oils and
fats. This is a case of simultaneous causality bias since the price and quantity in the market are
determined by both the supply and demand for the commodity. Wright used the weather, which shifted
the supply curve only and thereby traced out the demand curve. Since there was only a single
instrument, the coefficients are exactly identified, and the validity of the instrument cannot be tested.
Another example mentioned is the effect of class size on test scores. The reason for a correlation between
class size and the error term potentially stems from omitted variable bias here, such as the quality of the
teaching staff and outside opportunities for some of the students. In the hypothetical example of an
earthquake, some schools may receive more students than usual depending on the closeness to the
epicenter, if the school was unaffected structurally. The increase in class size is related to the closeness to
the epicenter, but this distance should be uncorrelated with the ability of the teaching staff and the
outside opportunities. As in the previous study, there is only a single instrument and hence no
possibility to use the overidentification test.
The primary example of instrumental variable estimation in the chapter involves estimation of the
demand elasticity for cigarettes. Due to simultaneity bias for the demand equation, sales taxes are used
as an instrument first in a cross section of states in a single year and later in a panel. Prices and quantities
are determined simultaneously by supply and demand, and as a result, prices will be correlated with the
error term in the demand equation. Sales taxes are fairly highly correlated with prices, explaining almost
half of the variation in these. It is argued that due to differences in choices about public finance due to
political considerations across states, these are exogenous. Only one instrument is used in the cross
section and hence there is no degree of overidentification. Later another instrument is introduced,
cigarette-specific taxes. With two instruments and one endogenous regressor, the J-statistic can be
computed for the overidentifying restrictions test.
Further examples discussed in the textbook include the effect of an increase in the prison population on
crime rates, further discussion of class size and test scores, and aggressive treatment of heart attacks and
the potential for saving lives.
3) Describe the consequences of estimating an equation by OLS in the presence of an endogenous regressor. How
can you overcome these obstacles? Present an alternative estimator and state its properties.
Answer: In the case of an endogenous regressor, there is correlation between the variable and the error term. In
this case, the OLS estimator is inconsistent. To get a consistent estimator in this situation, instrumental
variable techniques, such as TSLS, should be used. If one or more valid instruments can be found,
meaning that the instrument must be relevant and exogenous, then a consistent estimator can be
derived. The relevance of instruments can be tested using the rule of thumb (a first-stage F-statistic of
more than 10 in the TSLS estimator). The exogeneity of the instruments can be tested using the J-statistic.
The test requires that there is at least one more instrument than endogenous regressors, i.e., that the
equation is overidentified. In large samples the sampling distribution of the TSLS estimator is
approximately normal, so that statistical inference can proceed as usual using the t-statistic, confidence
intervals, or joint hypothesis tests involving the F-statistic. However, inference based on these statistics
will be misleading in the case where instruments are not valid.
4) Write an essay about where valid instruments come from. Part of your explorations must deal with checking
the validity of instruments and what the consequences of weak instruments are.
Answer: In order for instruments to be valid, they have to be relevant and exogenous. To find valid instruments,
two approaches are typically used. First, economic theory can serve as a guide. In the case of
simultaneous causality in a market, for example, theory predicts shifts in one curve but not the other as a
result of changes in an instrumental variable. The second approach focuses on shifts in the endogenous
regressor that are caused by an “exogenous source of variation” in the variable resulting from a random
phenomenon. The textbook uses the example of an earthquake which changes student teacher ratios as
students in affected areas have to be redistributed.
To check the validity of instruments, there is the rule of thumb to determine whether or not an
instrument is weak. It states that the F-statistic in the first stage of the TSLS procedure should exceed 10.
Instrument exogeneity can be tested only in the case of overidentification. If there are more instruments
than endogenous regressors, then the J-statistic can be calculated. The null hypothesis of exogeneity will
be rejected, in essence, if the TSLS residuals are correlated with the instruments.
If instruments are weak, then the TSLS estimator is biased and statistical inference does not yield reliable
confidence intervals even in large samples.
5) You have estimated a government reaction function, i.e., a multiple regression equation, where a government
instrument, say the federal funds rate, depends on past government target variables, such as inflation and
unemployment rates. In addition, you added the previous period’s popularity deficit of the government, e.g.
the (approval rating of the president – 50%), as one of the regressors. Your idea is that the Federal Reserve,
although formally independent, will try to expand the economy if the president is unpopular. One of your
peers, a political science student, points out that approval ratings depend on the state of the economy and
thereby indirectly on government instruments. It is therefore endogenous and should be estimated along with
the reaction function. Initially you want to reply by using a phrase that includes the words “money neutrality”
but are worried about a lengthy debate. Instead you state that as an economist, you are not concerned about
government approval ratings, and that government approval ratings are determined outside your (the
economic) model. Does your whim make the regressor exogenous? Why or why not?
Answer: In general, the question of whether or not a variable is endogenous or exogenous depends on its
correlation with the error term, not on the size of the underlying model. The point to make is that just
because a variable is endogenous does not imply that its determinants have to be modeled. If the
purpose of the exercise is to eventually simulate the model for policy purposes, then the feedback
envisioned by the political science student is potentially important. However, if the aim is simply to
forecast the behavior of the government reaction function, then the issue of endogeneity or exogeneity is
only relevant for questions regarding the type of estimator to be used. Of course, if a regressor is
endogenous, then instrumental variable techniques must be used to ensure desirable properties of the
estimator.
6) You have been hired as a consultant to estimate the demand for various brands of coffee in the market. You are
provided with annual price data for two years by U.S. state and the quantities sold. You want to estimate a
demand function for coffee using this data. What problems do you think you will encounter if you estimated
the demand equation by OLS?
Answer: Answers will differ by student. However, the following points should be mentioned: (i) there will be
simultaneous equation bias because quantity and price are determined simultaneously in the market. (ii)
If this is the case, then the OLS estimator will not be consistent. (iii) In that case, IV estimation should be
used to get a consistent estimator of the demand elasticity or response to a price increase. (iv) This brings
up the question of a valid instrument. It is not clear that students will come up with an easy answer, but
their deliberations should be insightful. One possible instrument is the price (change) from a previous
year, which most likely will be highly correlated with this year’s price (change) but not with the error
term in the equation. (v) There should be some discussion on the other factors determining coffee
demand, although some of these can be ignored if there is data for two periods and the data is
differenced (fixed effects).
7) Studies of the effect of minimum wages on teenage employment typically regress the teenage employment to
population ratio on the real minimum wage or the minimum wage relative to average hourly earnings using
OLS. Assume that you have a cross section of U.S. states for two years. Do you think that there are problems
with simultaneous equation bias?
Answer: For OLS not to be consistent, there would have to be omitted variable bias or simultaneous equation
bias. The former can be dealt with by differencing the data, if you assume that most other factors are
being held constant. If the minimum wage does not change between the two periods, i.e. it is constant,
then this will bring further problems with the interpretation, since the variation in the RHS variable only
comes from the denominator. In many ways, the question should come down to the correlation between
minimum wages and the error term in the equation. Students may argue that minimum wages are set by
the legislature or, more recently, by ballot, and are therefore exogenous. A more nuanced discussion may
point out that neither the legislature nor the electorate will raise minimum wages in time periods of low
employment (a recession — although the 2008 and 2009 raises will contradict this statement to some
extent; however, these were decided in 2006/2007 when the economy was booming). There may be
further problems because of the denominator of the minimum wage variable, either the CPI or AHE,
both of which are potentially correlated with teenage employment. The point here is for the student to
think about the problem at hand and to point out various obstacles to getting a good estimate of the
elasticity/response of employment from a minimum wage increase.
12.3 Mathematical and Graphical Problems
1) To analyze the year-to-year variation in temperature data for a given city, you regress the daily high
temperature (Temp) for 100 randomly selected days in two consecutive years (1997 and 1998) for Phoenix. The
results are (heteroskedasticity-robust standard errors in parentheses):
$Temp^{PHX}_{1998} = 15.63 + \underset{(0.10)}{0.80} \times Temp^{PHX}_{1997}$; $R^2$ = 0.65, SER = 9.63
(a) Calculate the predicted temperature for the current year if the temperature in the previous year was 40°F,
78°F, and 100°F. How does this compare with your prior expectation? Sketch the regression line and compare it
to the 45 degree line. What are the implications?
(b) You recall having studied errors-in-variables before. Although the web site you received your data from
seems quite reliable in measuring data accurately, what if the temperature contained measurement error in the
following sense: for any given day, say January 28, there is a true underlying seasonal temperature (X), but
each year there are different temporary weather patterns (v, w) which result in a temperature $\tilde{X}$ different from
X. For the two years in your data set, the situation can be described as follows:
$\tilde{X}_{1997} = X + v_{1997}$ and $\tilde{X}_{1998} = X + w_{1998}$
Subtracting $\tilde{X}_{1997}$ from $\tilde{X}_{1998}$, you get $\tilde{X}_{1998} = \tilde{X}_{1997} + w_{1998} - v_{1997}$. Hence the population parameters for
the intercept and slope are zero and one, as expected. It is not difficult to show that the OLS estimator for the
slope is inconsistent, where
$$\hat{\beta}_1 \xrightarrow{p} 1 - \frac{\sigma_v^2}{\sigma_x^2 + \sigma_v^2}$$
As a result you consider estimating the slope and intercept by TSLS. You think about an instrument and
consider the temperature one month ahead of the observation in the previous year. Discuss instrument validity
for this case.
(c) The TSLS estimation result is as follows:
$Temp^{PHX}_{1998} = -6.24 + \underset{(0.06)}{1.07} \times Temp^{PHX}_{1997}$
Perform a t-test on whether or not the slope is now significantly different from one.
Answer: (a) The three predicted temperatures will be 47.6, 78.0, and 95.6 respectively. The initial expectation
should be that the temperature in 1998 is the same as in 1997 for a given date. The regression line and the
45 degree line are sketched in the accompanying figure. The implication is mean reversion: if the
temperature was low (40 degrees), then it will also be low the following year, but not as low.
Alternatively, if the temperature was high (100 degrees), then it will be high again, but not as high. If this
prediction were extrapolated into the future, then eventually all temperatures should be the same for all days.
This obviously does not make sense.
(b) For an instrument to be valid, two conditions have to hold. First, the instrument has to be relevant,
and second, the instrument has to be exogenous. If the temperature one month ahead can predict the
current temperature, as it certainly does in Phoenix, then the instrument is relevant or correlated with
the current month’s temperature. If in addition, whatever caused the temperature in the current month
to deviate from its long-term value is only a temporary phenomenon, such as a weather system created
by a storm in the Pacific, then next month’s temperature should not be correlated with this event. Hence
the instrument would be exogenous.
(c) The t-statistic is 1.17, and hence you cannot reject the null hypothesis that the slope equals one.
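The arithmetic behind parts (a) and (c) is short enough to verify in code. This Python sketch uses only the coefficients and standard error reported in the question; the function name `predict_1998` is illustrative, and 1.96 is the usual two-sided 5% critical value from the standard normal distribution.

```python
def predict_1998(temp_1997):
    """Predicted 1998 temperature from the OLS regression in part (a)."""
    return 15.63 + 0.80 * temp_1997

predictions = [predict_1998(t) for t in (40, 78, 100)]

# Part (c): test H0: slope = 1 using the TSLS estimate and its standard error.
tsls_slope, tsls_se = 1.07, 0.06
t_stat = (tsls_slope - 1.0) / tsls_se
reject_at_5_percent = abs(t_stat) > 1.96

print([round(p, 1) for p in predictions])
print(round(t_stat, 2), reject_at_5_percent)
```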
2) Consider the following population regression model relating the dependent variable Yi and regressor Xi,
$Y_i = \beta_0 + \beta_1 X_i + u_i$, i = 1,…, n.
$X_i = Y_i + Z_i$
where Z is a valid instrument for X.
(a) Explain why you should not use OLS to estimate $\beta_1$.
(b) To generate a consistent estimator for $\beta_1$, what should you do?
(c) The two equations above make up a system of equations in two unknowns. Specify the two reduced form
equations in terms of the original coefficients. (Hint: substitute the identity into the first equation and solve for
Y. Similarly, substitute Y into the identity and solve for X.)
(d) Do the two reduced form equations satisfy the OLS assumptions? If so, can you find consistent estimators of
the two slopes? What is the ratio of the two estimated slopes? This estimator is called “Indirect Least Squares.”
How does it compare to the TSLS in this example?
Answer: (a) Substitution of the first equation into the identity shows that X is correlated with the error term.
Hence estimation with OLS results in an inconsistent estimator.
(b) The instrumental variable estimator is consistent and in this case is $\hat{\beta}_1^{2SLS} = \frac{s_{ZY}}{s_{ZX}}$. Adventurous
students will derive this estimator along the lines shown in Appendix 10.2.
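(c)
$Y_i = \beta_0 + \beta_1 (Y_i + Z_i) + u_i$
$X_i = (\beta_0 + \beta_1 X_i + u_i) + Z_i$
or
$(1 - \beta_1) Y_i = \beta_0 + \beta_1 Z_i + u_i$
$(1 - \beta_1) X_i = \beta_0 + Z_i + u_i$
Hence
$Y_i = \pi_0 + \pi_2 Z_i + v_{1i}$
$X_i = \pi_3 + \pi_4 Z_i + v_{2i}$
where $\pi_0 = \pi_3 = \frac{\beta_0}{1 - \beta_1}$, $\pi_2 = \frac{\beta_1}{1 - \beta_1}$, $\pi_4 = \frac{1}{1 - \beta_1}$, and $v_{1i} = v_{2i} = \frac{1}{1 - \beta_1} u_i$.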
(d) Since Z is a valid instrument by assumption, it must be uncorrelated with the error term and hence
using OLS results in a consistent estimator. The ratio of the two estimated slopes is
$$\frac{\hat{\pi}_2}{\hat{\pi}_4} = \frac{s_{YZ} / s_{ZZ}}{s_{XZ} / s_{ZZ}} = \frac{s_{YZ}}{s_{XZ}},$$
which is identical to the TSLS estimator.
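The equivalence of Indirect Least Squares and TSLS in this system can be confirmed numerically. The Python sketch below simulates the two-equation system under assumed parameter values (intercept 1, slope 0.5, standard normal Z and u); these values and the helper `sample_cov` are illustrative and not part of the original problem.

```python
import random

def sample_cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (n - 1)

rng = random.Random(7)
beta0, beta1 = 1.0, 0.5  # assumed true parameters
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]

# Reduced forms implied by Y = beta0 + beta1*X + u and X = Y + Z.
y = [(beta0 + beta1 * zi + ui) / (1 - beta1) for zi, ui in zip(z, u)]
x = [yi + zi for yi, zi in zip(y, z)]

pi2_hat = sample_cov(y, z) / sample_cov(z, z)  # OLS slope of Y on Z
pi4_hat = sample_cov(x, z) / sample_cov(z, z)  # OLS slope of X on Z
ils = pi2_hat / pi4_hat                        # Indirect Least Squares
tsls = sample_cov(z, y) / sample_cov(z, x)     # TSLS with a single instrument
ols = sample_cov(x, y) / sample_cov(x, x)      # inconsistent OLS slope

print(f"ILS={ils:.4f} TSLS={tsls:.4f} OLS={ols:.4f}")
```

The ILS and TSLS values agree up to floating-point rounding because the common factor s_ZZ cancels in the ratio, while the OLS slope stays away from the assumed true value of 0.5.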
3) Here are some examples of the instrumental variables regression model. In each case you are given the number
of instruments and the J-statistic. Find the relevant value from the $\chi^2_{m-k}$ distribution, using a 1% and 5%
significance level, and make a decision whether or not to reject the null hypothesis.
(a) $Y_i = \beta_0 + \beta_1 X_{1i} + u_i$, i = 1,..., n; $Z_{1i}$, $Z_{2i}$ are valid instruments, J = 2.58.
(b) $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 W_{1i} + u_i$, i = 1,..., n; $Z_{1i}$, $Z_{2i}$, $Z_{3i}$, $Z_{4i}$ are valid instruments,
J = 9.63.
(c) $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 W_{1i} + \beta_3 W_{2i} + \beta_4 W_{3i} + u_i$, i = 1,..., n; $Z_{1i}$, $Z_{2i}$, $Z_{3i}$, $Z_{4i}$ are valid instruments, J = 11.86.
Answer: (a) The test statistic is distributed $\chi^2_1$ and the critical values are 6.63 and 3.84 at the 1% and 5%
significance level. Hence you cannot reject the null hypothesis that all the instruments are exogenous.
(b) The test statistic is distributed $\chi^2_2$ and the critical values are 9.21 and 5.99 at the 1% and 5%
significance level. Hence you can reject the null hypothesis that all the instruments are exogenous.
(c) The test statistic is distributed $\chi^2_3$ and the critical values are 11.34 and 7.81 at the 1% and 5%
significance level. Hence you can reject the null hypothesis that all the instruments are exogenous.
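The three decisions can also be reproduced with p-values instead of table look-ups. This is a minimal sketch using only the Python standard library; the functions `chi2_sf` and `j_test` are hypothetical helpers written for this illustration, and `chi2_sf` relies on the closed-form upper-tail probability of the chi-squared distribution for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x) for a positive integer df."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^j / j!.
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail plus a finite sum of (x/2)^(j - 1/2) / Gamma(j + 1/2).
    tail = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5)
    total = 0.0
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= half / (j + 0.5)
    return tail + math.exp(-half) * total


def j_test(j_stat, m, k):
    """Return (degrees of freedom, p-value) for the overidentifying restrictions test."""
    df = m - k  # m instruments, k endogenous regressors
    return df, chi2_sf(j_stat, df)


# (label, J-statistic, m, k) taken from parts (a) to (c) of the question.
cases = [("a", 2.58, 2, 1), ("b", 9.63, 4, 2), ("c", 11.86, 4, 1)]

for label, j_stat, m, k in cases:
    df, p = j_test(j_stat, m, k)
    print(f"({label}) df = {df}, p-value = {p:.4f}, "
          f"reject at 5%: {p < 0.05}, reject at 1%: {p < 0.01}")
```

Comparing the p-value with 0.05 and 0.01 is equivalent to comparing the J-statistic with the tabulated 5% and 1% critical values.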
4) To study the determinants of growth between the countries of the world, researchers have used panels of
countries and observations spanning over long periods of time (e.g. 1965-1975, 1975-1985, 1985-1990). Some of
these studies have focused on the effect that inflation has on growth and found that although the effect is small
for a given time period, it accumulates over time and therefore has an important negative effect.
(a) Explain why the OLS estimator may be biased in this case.
(b) Explain how methods using panel data could potentially alleviate the problem.
(c) Some authors have suggested using an index of central bank independence as an instrumental variable. Discuss
whether or not such an index would be a valid instrument.
Answer: (a) The presence of simultaneous causality is highly likely since inflation may respond to growth.
Depending on the list of regressors, omitted variables can also bias the estimator for the effect of the
inflation rate.
(b) Country fixed effects or differencing the data can solve the problem if inflation stays relatively
constant over time from one country to the other. Unfortunately if the effect of inflation on growth is the
focus of the study, then much of the cross-sectional information is lost using this approach.
(c) For this index to be valid, central bank independence has to be relevant and exogenous. If inflation
rates are correlated with the index, then central bank independence is a relevant instrument. Although
there is a high correlation for developed countries, there is little to no correlation when data for all
countries is considered. Whether or not the index is exogenous cannot be tested unless the coefficients of
the equation are overidentified. Otherwise personal judgment is the only guide. An argument that
central bank independence is exogenous would have to rely on it being based on institutional
arrangements which are independent of inflation. Although the independence of central banks in many
countries was initially determined by concerns independent of inflation, there have been many
situations where the institutional arrangements were altered as a result of high inflation.
5) (Requires Matrix Algebra) The population multiple regression model can be written in matrix form as
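$$Y = X\beta + U$$
where
$$
Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \quad
U = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}, \quad
X = \begin{pmatrix}
1 & X_{11} & \cdots & X_{k1} & W_{11} & \cdots & W_{r1} \\
1 & X_{12} & \cdots & X_{k2} & W_{12} & \cdots & W_{r2} \\
\vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
1 & X_{1n} & \cdots & X_{kn} & W_{1n} & \cdots & W_{rn}
\end{pmatrix}, \quad \text{and} \quad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{k+r} \end{pmatrix}
$$
Note that the X matrix contains both k endogenous regressors and (r + 1) included exogenous regressors (the
constant is obviously exogenous), so that $\beta$ has (k + r + 1) elements.
The instrumental variable estimator for the overidentified case is
$$\hat{\beta}^{IV} = [X'Z(Z'Z)^{-1}Z'X]^{-1} X'Z(Z'Z)^{-1}Z'Y,$$
where Z is a matrix, which contains two types of variables: first the r included exogenous regressors plus the
constant, and second, m instrumental variables.
$$
Z = \begin{pmatrix}
1 & Z_{11} & \cdots & Z_{m1} & W_{11} & \cdots & W_{r1} \\
1 & Z_{12} & \cdots & Z_{m2} & W_{12} & \cdots & W_{r2} \\
\vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
1 & Z_{1n} & \cdots & Z_{mn} & W_{1n} & \cdots & W_{rn}
\end{pmatrix}
$$
It is of order n × (m + r + 1).
For this estimator to exist, both $(Z'Z)$ and $[X'Z(Z'Z)^{-1}Z'X]$ must be invertible. State the conditions under
which this will be the case and relate them to the degree of overidentification.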
Answer: In order for a matrix to be invertible, it must have full rank. Since $Z'Z$ is of order (m + r + 1) × (m + r + 1),
it must have rank (m + r + 1) in order to be inverted. The rank of a product such as $Z'Z$ is
at most the rank of $Z'$ or $Z$, whichever is smaller. $Z$ is of order n × (m + r + 1), and
assuming that there is no perfect multicollinearity, will have either rank n or rank (m + r + 1), whichever
is the smaller of the two. Hence if there are fewer observations than the number of instrumental
variables plus exogenous variables, then the rank of $Z$ will be n (< m + r + 1), and the rank of $Z'Z$ is also
n (< m + r + 1). Hence $Z'Z$ does not have full rank, and therefore cannot be inverted. The IV estimator
does not exist as a result. In the past, this was considered a strong possibility with large econometric
models, where many predetermined variables entered.
If there are more observations than instruments, then the rank of $Z'Z$ is (m + r + 1). $X'Z$ will be of order
(k + r + 1) × (m + r + 1), which will have rank (k + r + 1) if m > k, i.e., if there is overidentification.
Furthermore $[X'Z(Z'Z)^{-1}Z'X]$ is of order (k + r + 1) × (k + r + 1) and will have full rank since the rank of
a product of the three matrices involved is at most the minimum of the ranks of the three matrices $X'Z$,
$(Z'Z)^{-1}$, and $Z'X$.
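The rank argument can be illustrated with a small numerical example. This is a sketch under made-up data: the matrices below assume m = 2 instruments, r = 1 included exogenous regressor, and k = 1 endogenous regressor, and the helper functions `rank` and `cross_product` are hypothetical names written for this illustration. Exact fractions are used so that the rank is not affected by rounding.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix (given as a list of rows) by exact Gaussian elimination."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(n_cols):
        if r == n_rows:
            break
        pivot = next((i for i in range(r, n_rows) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, n_rows):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def cross_product(a, b):
    """Return A'B for two matrices given as lists of rows with equal row counts."""
    a_cols, b_cols = list(zip(*a)), list(zip(*b))
    return [[sum(p * q for p, q in zip(ac, bc)) for bc in b_cols] for ac in a_cols]


# Columns of Z: constant, Z1, Z2, W1, so m + r + 1 = 4 (made-up numbers).
z_large = [[1, 2, 0, 1], [1, 0, 1, 3], [1, 1, 4, 2], [1, 3, 2, 0], [1, 5, 1, 1]]
z_small = z_large[:3]  # only n = 3 observations, fewer than m + r + 1 = 4

# Columns of X: constant, X1 (endogenous), W1, so k + r + 1 = 3 (made-up numbers).
x_large = [[1, 2, 1], [1, 1, 3], [1, 5, 2], [1, 3, 0], [1, 4, 1]]

print("rank of Z'Z with n = 3:", rank(cross_product(z_small, z_small)))
print("rank of Z'Z with n = 5:", rank(cross_product(z_large, z_large)))
print("rank of X'Z with n = 5:", rank(cross_product(x_large, z_large)))
```

With too few observations the 4 × 4 matrix Z'Z is rank deficient and cannot be inverted, whereas with n = 5 it has full rank and X'Z attains rank k + r + 1 = 3.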
6) Consider the following model of demand and supply of coffee:
Demand: $Q_i^{Coffee} = \beta_1 P_i^{Coffee} + \beta_2 P_i^{Tea} + u_i$
Supply: $Q_i^{Coffee} = \beta_3 P_i^{Coffee} + \beta_4 P_i^{Tea} + \beta_5 Weather + v_i$
(variables are measured in deviations from means, so that the constant is omitted).
What are the expected signs of the various coefficients in this model? Assume that the price of tea and Weather are
exogenous variables. Are the coefficients in the supply equation identified? Are the coefficients in the demand
equation identified? Are they overidentified? Is this result surprising given that there are more exogenous
regressors in the second equation?
Answer: Changes in Weather will shift the supply equation and thereby trace out the demand equation. Hence the
coefficients of the demand equation are exactly identified since the number of instruments equals the
number of endogenous regressors. However the coefficients of the supply equation are underidentified
since there is no instrumental variable available for estimation. The result is not surprising, since it is not
the number of exogenous regressors in the equation that matters when determining whether or not the
coefficients are identified. Instead what matters is the number of instruments available relative to the
number of endogenous regressors. It is possible that the regression coefficients can be (over)identified
even if there are no exogenous regressors present in the equation.
7) You started your econometrics course by studying the OLS estimator extensively, first for the simple regression
case and then for extensions of it. You have now learned about the instrumental variable estimator. Under what
situation would you prefer one to the other? Be specific in explaining under which situations one estimation
method generates superior results.
Answer: Under the OLS assumptions, the OLS estimator is unbiased and consistent. The sampling distribution of
the estimator is approximately normal in large samples. Hence statistical inference can proceed as usual
using the t-statistic, confidence intervals, or joint hypothesis tests involving the F-statistic.
One major concern throughout the text has been the development of new estimation techniques in the
case where one of the OLS assumptions is violated, specifically that there is correlation between the error
term and at least one of the regressors. This may be the result of omitted variables, errors-in-variables, or
simultaneous causality bias. These make up three of the threats to internal validity. In each of these
cases, OLS becomes biased and an alternative estimator should be used.
Even if the OLS assumptions are violated and the OLS estimator is biased because of omitted variable
bias, simultaneous causality, or errors-in-variables, using TSLS will not improve the situation if the
instruments are not valid. In that case, TSLS will yield inconsistent estimators if the instruments are not
exogenous. It will be biased and statistical inference will not be valid if the instruments are weak.
Furthermore, the estimator will not even be normally distributed in large samples.
If the instruments are valid and the other IV regression assumptions hold, then the TSLS estimator is
consistent and therefore preferable over the OLS estimator. Although its distribution is complicated in
small samples, the sampling distribution of the estimator is approximately normal in large samples.
Hence statistical inference can proceed as usual using the t-statistic, confidence intervals, or joint
hypothesis tests involving the F-statistic.
8) Your textbook gave an example of attempting to estimate the demand for a good in a market, but being unable
to do so because the demand function was not identified. Is this the case for every market? Consider, for
example, the demand for sports events. One of your peers estimated the following demand function after
collecting data over two years for every one of the 162 home games of the 2000 and 2001 season for the Los
Angeles Dodgers.
Attend = 15,005 + 201 × Temperat + 465 × DodgNetWin + 82 × OppNetWin
         (8,770)  (121)            (169)              (26)
       + 9647 × DFSaSu + 1328 × Drain + 1609 × D150m + 271 × DDiv - 978 × D2001;
         (1505)          (3355)         (1819)         (1,184)      (1,143)
R² = 0.416, SER = 6983
Where Attend is announced stadium attendance, Temperat is the average temperature on game day,
DodgNetWin are the net wins of the Dodgers before the game (wins-losses), OppNetWin is the opposing team’s
net wins at the end of the previous season, and DFSaSu, Drain, D150m, DDiv, and D2001 are binary variables,
taking a value of 1 if the game was played on a weekend, it rained during that day, the opposing team was
within a 150 mile radius, plays in the same division as the Dodgers, and the game was played during 2001, respectively. Numbers in
parentheses are heteroskedasticity-robust standard errors.
Even if there is no identification problem, is it likely that all regressors are uncorrelated with the error term? If
not, what are the consequences?
Answer: In the case of sports events, often price and quantity are not simultaneously determined by supply and
demand. For baseball games, the supply of seats is fixed at the capacity level of the stadium. In addition,
prices for games are also fixed in advance and do not vary with the attractiveness of the opponent.
Therefore the supply curve is infinitely elastic up to the point where the game is sold out. This
situation is complicated by ticket scalping and the fact that teams stage special events (fireworks, etc.).
Taking these considerations into account may result in simultaneous causality bias, or a threat to internal
validity because of the identification problem.
However, assuming that there is no identification problem, there may still be omitted variable bias or
errors-in-variables bias. For example, attendance typically increases the tighter the race for a playoff
spot towards the end of the season. Furthermore, it is not the opposing team’s net wins at the end of the
previous season that accounts for the attractiveness of the opponent, but the performance during the
current season. If the opposing team’s current performance is related to its performance in the previous
season, then the OLS estimator is biased.
9) Earnings functions, whereby the log of earnings is regressed on years of education, years of on the job training,
and individual characteristics, have been studied for a variety of reasons. Some studies have focused on the
returns to education, others on discrimination, union/non-union differentials, etc. For all these studies, a major
concern has been the fact that ability should enter as a determinant of earnings, but that it is close to impossible
to measure and therefore represents an omitted variable.
Assume that the coefficient on years of education is the parameter of interest. Given that education is positively
correlated to ability, since, for example, more able students attract scholarships and hence receive more years of
education, the OLS estimator for the returns to education could be upward biased. To overcome this problem,
various authors have used instrumental variable estimation techniques. For each of the potential
instruments listed below, briefly discuss instrument validity.
(a) The individual’s postal zip code.
(b) The individual’s IQ or test score on a work related exam.
(c) Years of education for the individual’s mother or father.
(d) Number of siblings the individual has.
Answer: (a) Instrument validity has two components: instrument relevance ($\mathrm{corr}(Z_i, X_i) \neq 0$) and instrument
exogeneity ($\mathrm{corr}(Z_i, u_i) = 0$). The individual’s postal zip code will certainly be uncorrelated with the
omitted variable, ability, even though some zip codes may attract more able individuals. However, this
is an example of a weak instrument, since it is also uncorrelated with years of education.
(b) There is instrument relevance in this case, since, on average, individuals who do well in intelligence
scores or other work related test scores, will have more years of education. Unfortunately there is bound
to be a high correlation with the omitted variable ability, since this is what these tests are supposed to
measure.
(c) A non-zero correlation between the mother’s or father’s years of education and the individual’s years
of education can be expected. Hence this is a relevant instrument. However, it is not clear that the
parent’s years of education are uncorrelated with the parent’s ability, which in turn, can be a major determinant of
the individual’s ability. If this is the case, then years of education of the mother or father is not a valid
instrument.
(d) There is some evidence that the larger the number of siblings of an individual, the fewer
years of education the individual receives. Hence number of siblings is a relevant instrument. It has been
argued that number of siblings is uncorrelated with an individual’s ability. In that case it also represents
an exogenous instrument. However, there is the possibility that ability depends on the attention an
individual receives from parents, and this attention is shared with other siblings.
10) The two conditions for instrument validity are $\mathrm{corr}(Z_i, X_i) \neq 0$ and $\mathrm{corr}(Z_i, u_i) = 0$. The reason for the
inconsistency of OLS is that $\mathrm{corr}(X_i, u_i) \neq 0$. But if X and Z are correlated, and X and u are also correlated, then
how can Z and u not be correlated? Explain.
Answer: The introduction to Chapter 10 on instrumental variables regression and section 10.1 went into a lengthy
explanation of this problem. The major idea is that the variation in $X_i$ has two parts: one that is
uncorrelated with $u_i$ and a second that is correlated with $u_i$. The trick is to isolate the uncorrelated part of X.
For the instrument to be valid, $\mathrm{corr}(Z_i, u_i) = 0$ and $\mathrm{corr}(Z_i, X_i) \neq 0$ must hold. TSLS then generates
predicted values of X in the first stage by using a linear combination of the instruments. As long as
$\mathrm{corr}(Z_i, X_i) \neq 0$ and $\mathrm{corr}(Z_i, u_i) = 0$, then the part of X which is uncorrelated with the error term is
extracted through the prediction. In the second stage, this captured exogenous variation in X is then used
to estimate the effect of X on Y.
11) Consider a model of the U.S. labor market where the demand for labor depends on the real wage, while the
supply of labor is vertical and does not depend on the real wage. You could argue that the supply of labor by
households (think of hours supplied by two adults and two children) has not changed much over the last 60
years or so in the U.S. while real wages more than doubled over the same time span. At first that seems strange
given the higher participation rate of females over that period, but that increase has been countered by a lower
male participation rate (resulting from earlier retirement), an increase in legal holidays, and an increase in
vacation days.
a. Write down two equations representing the labor supply and labor demand functions, allowing for an
error term in each of the demand and supply equations. In addition, assume that the labor market
clears.
b. How would you estimate the labor supply equation?
c. Assuming that the error terms are mutually independent i.i.d. random variables, both with mean zero,
show that the real wage and the error term of the labor demand equation are correlated.
d. If you find a non-zero correlation, should you estimate the labor demand equation using OLS? If so,
what are the consequences?
e. Estimating the labor demand equation by IV estimation, which instrument suggests itself
immediately?
Answer: a. Students may use different symbols, but will end up with something like the following specification:
$N^d = \beta_0 + \beta_1 \frac{W}{P} + u$
$N^s = \gamma_0 + v$
$N^d = N^s = N$
b. The labor supply equation can be estimated by OLS: $\hat{\gamma}_0 = \bar{N}$.
c. Using the above symbols, it can be shown that $\mathrm{cov}(W/P, u) = -\frac{\sigma_u^2}{\beta_1}$.
d. OLS will not be consistent.
e. Hours worked per household is correlated with the real wage but not correlated with the error term
(here u) in the labor demand equation. Hence it is a valid instrument.
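The covariance stated in part c follows from the market-clearing condition. A short derivation is sketched here, writing labor demand as $N^d = \beta_0 + \beta_1 (W/P) + u$ and labor supply as $N^s = \gamma_0 + v$; the symbols are assumptions, since students may label the coefficients differently.

```latex
\begin{align*}
\beta_0 + \beta_1 \frac{W}{P} + u &= \gamma_0 + v
  && \text{(market clearing, } N^d = N^s\text{)} \\
\frac{W}{P} &= \frac{\gamma_0 - \beta_0 + v - u}{\beta_1} \\
\operatorname{cov}\!\left(\frac{W}{P},\, u\right)
  &= \frac{\operatorname{cov}(v, u) - \operatorname{var}(u)}{\beta_1}
   = -\frac{\sigma_u^2}{\beta_1}
  && \text{(since } \operatorname{cov}(v, u) = 0\text{)}
\end{align*}
```

If labor demand slopes downward ($\beta_1 < 0$), this covariance is positive, so the real wage is correlated with the demand error and OLS on the demand equation is inconsistent.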
Chapter 13 Experiments and Quasi-Experiments
13.1 Multiple Choice
1) The following are reasons for studying randomized controlled experiments in an econometrics course, with the
exception of
A) at a conceptual level, the notion of an ideal randomized controlled experiment provides a benchmark
against which to judge estimates of causal effects in practice.
B) when experiments are actually conducted, their results can be very influential, so it is important to
understand the limitations and threats to validity of actual experiments as well as their strengths.
C) randomized controlled experiments in economics are common.
D) external circumstances sometimes produce what appears to be randomization.
Answer: C
2) Program evaluation
A) is conducted for most departments in your university/college about every seven years.
B) is the field of study that concerns estimating the effect of a program, policy, or some other intervention or
“treatment.”
C) tries to establish whether EViews, SAS or Stata work best for your econometrics course.
D) establishes rating systems for television programs in a controlled experiment framework.
Answer: B
3) In the context of a controlled experiment, consider the simple linear regression formulation $Y_i = \beta_0 + \beta_1 X_i + u_i$.
Let the $Y_i$ be the outcome, $X_i$ the treatment level, and $u_i$ contain all the additional determinants of the outcome.
Then
A) the OLS estimator of the slope will be inconsistent in the case of a randomly assigned $X_i$ since there are
omitted variables present.
B) $X_i$ and $u_i$ will be independently distributed if the $X_i$ are randomly assigned.
C) $\beta_0$ represents the causal effect of X on Y when X is zero.
D) $E(Y \mid X = 0)$ is the expected value for the treatment group.
Answer: B
4) In the context of a controlled experiment, consider the simple linear regression formulation $Y_i = \beta_0 + \beta_1 X_i + u_i$.
Let the $Y_i$ be the outcome, $X_i$ the treatment level when the treatment is binary, and $u_i$ contain all the additional
determinants of the outcome. Then calling $\hat{\beta}_1$ a differences estimator
A) makes sense since it is the difference between the sample average outcome of the treatment group and the
sample average outcome of the control group.
B) and $\hat{\beta}_0$ the level estimator is standard terminology in randomized controlled experiments.
C) does not make sense, since neither Y nor X are in differences.
D) is not quite accurate since it is actually the derivative of Y on X.
Answer: A
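The statement in option A can be verified on a small data set. The following is a minimal sketch with made-up outcomes for three control and three treated subjects; the function names `ols_slope` and `difference_in_means` are hypothetical helpers for this illustration.

```python
def ols_slope(x, y):
    """OLS slope from a regression of y on a constant and x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


def difference_in_means(x, y):
    """Average outcome of the treated (x = 1) minus that of the controls (x = 0)."""
    treated = [yi for xi, yi in zip(x, y) if xi == 1]
    control = [yi for xi, yi in zip(x, y) if xi == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


# Made-up data: binary treatment indicator and outcome.
x = [0, 0, 0, 1, 1, 1]
y = [3.0, 5.0, 4.0, 8.0, 9.0, 7.0]

slope = ols_slope(x, y)
diff = difference_in_means(x, y)
print(f"OLS slope = {slope}, difference in means = {diff}")
```

With a binary regressor the two numbers coincide, which is why the OLS slope is called the differences estimator.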
5) The following does not represent a threat to internal validity of randomized controlled experiments:
A) attrition.
B) failure to follow the treatment protocol.
C) experimental effects.
D) a large sample size.
Answer: D
6) The Hawthorne effect refers to
A) subjects dropping out of the study after being randomly assigned to the treatment or control group.
B) the failure of individuals to follow completely the randomized treatment protocol.
C) the phenomenon that subjects in an experiment can change their behavior merely by being included in
the experiment.
D) assigning individuals, in part, as a result of their characteristics or preferences.
Answer: C
7) The following is not a threat to external validity:
A) the experimental sample is not representative of the population of interest.
B) the treatment being studied is not representative of the treatment that would be implemented more
broadly.
C) experimental participants are volunteers.
D) partial compliance with the treatment protocol.
Answer: D
8) Assume that data are available on other characteristics of the subjects that are relevant to determining the
experimental outcome. Then including these determinants explicitly results in
A) the limited dependent variable model.
B) the differences in means test.
C) the multiple regression model.
D) large scale equilibrium effects.
Answer: C
9) All of the following are reasons for using the differences estimator with additional regressors, with the
exception of
A) efficiency.
B) providing a check for randomization.
C) providing an adjustment for “conditional” randomization.
D) making the difference estimator easier to calculate than in the case of the differences estimator without
the additional regressors.
Answer: D
10) Experimental data are often
A) observational data.
B) binary data, in that the subject either does or does not respond to the treatment.
C) panel data.
D) time series data.
Answer: C
11) With panel data, the causal effect
A) cannot be estimated since correlation does not imply causation.
B) is typically estimated using the probit regression model.
C) can be estimated using the “differences-in-differences” estimator.
D) can be estimated by looking at the difference between the treatment and the control group after the
treatment has taken place.
Answer: C
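The contrast between options C and D can be sketched with four group averages. The numbers below are made up for illustration, and `diff_in_diff` is a hypothetical helper; the sketch assumes outcomes are observed for both groups before and after the treatment.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def diff_in_diff(treat_before, treat_after, control_before, control_after):
    """Change in the treatment group's mean minus the change in the control group's mean."""
    return (mean(treat_after) - mean(treat_before)) - (
        mean(control_after) - mean(control_before)
    )


# Made-up outcomes for illustration only.
treat_before = [10.0, 12.0, 11.0]
treat_after = [15.0, 16.0, 14.0]
control_before = [9.0, 10.0, 11.0]
control_after = [11.0, 12.0, 13.0]

did = diff_in_diff(treat_before, treat_after, control_before, control_after)
after_only = mean(treat_after) - mean(control_after)
print(f"differences-in-differences = {did}, after-only difference = {after_only}")
```

The after-only comparison mixes the treatment effect with any pre-treatment difference between the groups, while the differences-in-differences estimator removes that pre-existing gap.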
12) Causal effects that depend on the value of an observable variable, say Wi,
A) cannot be estimated.
B) can be estimated by interacting the treatment variable with $W_i$.
C) result in the OLS estimator being inefficient.
D) require the use of homoskedasticity-only standard errors.
Answer: B
13) To test for randomization when Xi is binary,
A) you regress $X_i$ on all W’s and compute the F-statistic for testing that all the coefficients on the W’s are
zero. (The W’s measure characteristics of individuals, and these are not affected by the treatment.)
B) is not possible, since binary variables can only be regressors.
C) requires reordering the observations randomly and re-estimating the model. If the coefficients remain the
same, then this is evidence of randomization.
D) requires seeking external validity for your study.
Answer: A
14) The following estimation method should not be used to test for randomization when $X_i$ is binary:
A) linear probability model (OLS) with homoskedasticity-only standard errors.
B) probit.
C) logit.
D) linear probability model (OLS) with heteroskedasticity-robust standard errors.
Answer: A
15) In a quasi-experiment
A) quasi differences are used, i.e., instead of Y you need to use ($Y_{after} - \lambda \times Y_{before}$), where $0 < \lambda < 1$.
B) randomness is introduced by variations in individual circumstances that make it appear as if the
treatment is randomly assigned.
C) the causal effect has to be estimated through quasi maximum likelihood estimation.
D) the t-statistic is no longer normally distributed in large samples.
Answer: B
16) Your textbook gives several examples of quasi-experiments that were conducted. The following is not an
example of a quasi-experiment:
A) labor market effects of immigration.
B) effects on civilian earnings of military service.
C) the effect of cardiac catheterization.
D) the effect of unemployment on the inflation rate.
Answer: D
17) A repeated cross-sectional data set
A) is also referred to as panel data.
B) is a collection of cross-sectional data sets, where each cross-sectional data set corresponds to a different
time period.
C) samples identical entities at least twice.
D) is typically used for estimating the following regression model
$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 W_{1,it} + \cdots + \beta_{1+r} W_{r,it} + u_{it}$
Answer: B
18) For quasi-experiments,
A) there is a particularly important potential threat to internal validity, namely whether the “as if”
randomization in fact can be treated reliably as true randomization.
B) there are the same threats to internal validity as for true randomized controlled experiments, without
modifications.
C) there is little threat to external validity, since the populations are typically already different.
D) OLS estimation should not be used.
Answer: A
19) Experimental effects, such as the Hawthorne effect,
A) generally are not germane in quasi-experiments.
B) typically require instrumental variable estimation in quasi-experiments.
C) can be dealt with using binary variables in quasi-experiments.
D) are the most important threat to internal validity in quasi-experiments.
Answer: A
20) Heterogeneous population
A) implies that heteroskedasticity-robust standard errors must be used.
B) suggests that multiple characteristics must be used to describe the population.
C) effects can be captured through interaction terms.
D) refers to circumstances in which there is unobserved variation in the causal effect within the population.
Answer: D
21) If the causal effect is different for different people, then the population regression equation for a binary
treatment variable $X_i$ can be written as
A) $Y_i = \beta_0 + \beta_1 X_i + u_i$.
B) $Y_i = \beta_0 + \beta_{1i} X_i + u_i$.
C) $Y_i = \beta_{0i} + \beta_{1i} X_i + u_i$.
D) $Y_i = \beta_0 + \beta_1 G_i + \beta_2 D_t + u_i$.
Answer: C
22) In the case of heterogeneous causal effects, the following is not true:
A) in the circumstances in which OLS would normally be consistent (when $E(u_i \mid X_i) = 0$), the OLS estimator
continues to be consistent.
B) OLS estimation using heteroskedasticity-robust standard errors is identical to TSLS.
C) the OLS estimator is properly interpreted as a consistent estimator of the average causal effect in the
population being studied.
D) the TSLS estimator in general is not a consistent estimator of the average causal effect if an individual’s
decision to receive treatment depends on the effectiveness of the treatment for that individual.
Answer: B
23) One of the major lessons learned in the chapter on experiments and quasi-experiments
A) is that there are almost no true experiments in economics and that quasi-experiments are a poor
substitute.
B) you should always use TSLS when estimating causal effects in quasi-experiments.
C) populations are always homogeneous.
D) is that the insights of experimental methods can be applied to quasi-experiments, in which special
circumstances make it seem “as if” randomization has occurred.
Answer: D
24) Quasi-experiments
A) provide a bridge between the econometric analysis of observational data sets and the statistical ideal of a
true randomized controlled experiment.
B) are not the same as experiments, and lessons learned from the use of the latter can therefore not be
applied to them.
C) most often use difference-in-difference estimators, which are quite different from OLS and instrumental
variables methods studied in earlier chapters of the book.
D) use the same methods as studied in earlier chapters of the book, and hence the interpretation of these
methods is the same.
Answer: A
25) The major distinction between the experiments and quasi-experiments chapter and earlier chapters is the
A) frequent use of binary variables.
B) type of data analyzed and the special opportunities and challenges posed when analyzing experiments
and quasi-experiments.
C) superiority of TSLS over OLS.
D) use of heteroskedasticity-robust standard errors.
Answer: B
26) A potential outcome
A) is the outcome for an individual under a potential treatment.
B) cannot be observed because most individuals do not achieve their potential.
C) is the same as a causal effect.
D) is none of the above.
Answer: A
27) A causal effect for a single individual
A) can be deduced from the average treatment effect.
B) cannot be measured.
C) depends on observable variables only.
D) is observable since it is used as part of calculating the mean of individual causal effects.
Answer: B
28) Randomization based on covariates is
A) not of practical importance since individuals are hardly ever assigned in this fashion.
B) dependent on the covariances of the error term (serial correlation).
C) a randomization in which the probability of assignment to the treatment group depends on one or more
observable variables W.
D) eliminates the omitted variable bias when using the difference estimator based on $Y_i = \beta_0 + \beta_1 X_i + u_i$,
where Y is the outcome variable and X is the treatment indicator.
Answer: C
29) Testing for the random receipt of treatment
A) is not possible, in general.
B) entails testing the hypothesis that the coefficients on $W_{1i}, \ldots, W_{ri}$ are non-zero in a regression of $X_i$ on $W_{1i}, \ldots, W_{ri}$.
C) is not meaningful since the LHS variable is binary.
D) entails testing the hypothesis that the coefficients on $W_{1i}, \ldots, W_{ri}$ are zero in a regression of $X_i$ on $W_{1i}, \ldots, W_{ri}$.
Answer: D
30) Failure to follow the treatment protocol means that
A) the OLS estimator cannot be computed.
B) instrumental variables estimation of the treatment effect should be used where the initial random
assignment is the instrument for the treatment actually received.
C) you should use the TSLS estimator and regress the outcome variable Y on the initial random assignment
in the first stage to get predicted values of the outcome variable.
D) the Hawthorne effect plays a crucial role.
Answer: B
31) Small sample sizes in an experiment
A) bias the estimators of the causal effect.
B) may pose a problem because the assumption that errors are normally distributed is dubious for
experimental data.
C) do not raise threats to the validity of confidence intervals as long as heteroskedasticity-robust standard
errors are used.
D) may affect confidence intervals but not hypothesis tests.
Answer: B
32) A repeated cross-sectional data set is
A) a collection of cross-sectional data sets, where each cross-sectional data set corresponds to a different
time period.
B) the same as a balanced panel data set.
C) what Card and Krueger used in their study of the effect of minimum wages on teenage employment.
D) time series.
Answer: A
33) In a sharp regression discontinuity design,
A) crossing the threshold influences receipt of the treatment but is not the sole determinant.
B) the population regression line must be linear above and below the threshold.
C) Xi will in general be correlated with ui.
D) receipt of treatment is entirely determined by whether W exceeds the threshold.
Answer: D
34) Threats to internal validity of quasi-experiments include
A) failure of randomization.
B) failure to follow the treatment protocol.
C) attrition.
D) all of the above with some modifications from true randomized controlled experiments.
Answer: D
13.2 Essays and Longer Questions
1) You want to study whether or not the use of computers in the classroom for elementary students has an effect
on performance. Explain in some detail how you would ideally set up such an experiment and what threats to
internal and external validity there might be.
Answer: Answers will differ by student. Students have the choice to suggest a quasi-experiment or a controlled
experiment. Although it is possible to focus on states where elementary schools have introduced the use
of computers in the classroom, and compare the change in test scores with those in states which have not
done so, it is more likely that students will concentrate on a controlled experiment design here. The
answer should emphasize the initial random selection of pupils, classrooms, or schools, from the
population of a state, or the nation, and the random assignment to a treatment group. Furthermore,
teachers must also be randomly assigned. In essence, the random selection and assignment is to ensure
that \(E(u_i \mid X_i) = 0\) holds. X could be a binary variable, indicating whether computers were introduced in
the classroom, or it could indicate the intensity with which computers were used.
Answers should mention each of the threats to internal and external validity. Failure to randomize might
occur because the treatment group could be assigned according to the performance level of students or
computers already being used in classrooms, or the previous experience students had with computers.
Teachers may be chosen depending on their knowledge of computers and software. Failure to follow
treatment protocol is less of a risk here. Attrition is a problem if parents move to another school district or
private schools as a result of the assignment to the treatment and control groups. Experimental effects are
hard to avoid in this situation since it does not make sense to have a double blind experiment. Small
samples should not be a problem in this set-up. Threats to external validity include nonrepresentative
samples, which are unlikely to occur here unless there are a large number of volunteers. Similarly,
nonrepresentative programs or policy should not pose a problem. There may be general equilibrium effects if
more technically oriented teachers have to be hired or others have to be reeducated. Finally, there may
be treatment vs. eligibility effects if there is a choice to opt in or out of the treatment and control group.
2) Canada and the United States had approximately the same aggregate unemployment rates from the 1920s to
1981. In 1982, a two percentage point gap appears, which has roughly persisted until today, with the Canadian
unemployment rate in the third quarter of 2002 being 7.6 percent while the American rate stood at 5.9 percent
in the same period. Several authors have investigated this phenomenon. One study, published in 1990,
contained the following statement: “It is a cliché that, as compared to analysis in the physical sciences,
economic analysis is hampered by the lack of controlled experiments. In this regard, study of the Canadian
economy can be much facilitated by comparison with the behaviour of the US …” Discuss what the authors
may have had in mind. List some potential threats to internal and external validity when comparing aggregate
unemployment rate behavior between countries.
Answer: It should be clear that the authors were not really talking about a controlled experiment, but instead had
in mind a quasi-experiment or natural experiment. In a randomized controlled experiment to study the
effect of unemployment insurance benefits on unemployment, for example, unemployed workers would
be “treated” with various degrees of unemployment insurance generosity, such as the amount by which
their former wages are replaced by unemployment insurance benefits (“replacement rate”), the duration
of benefits, the scrutiny of the agency monitoring the job search effort, etc. Instead the authors must have
thought that the two economies were similar in many aspects, and that because of an external event,
either in Canada or in the U.S., one was subjected to a treatment, while the other was not, which resulted
in the aggregate unemployment rate difference. It is the difference in location (living in the U.S. vs. in
Canada) that gives the resemblance to a randomly assigned treatment. The above study is of the first
type of quasi-experiments discussed in the textbook whereby the treatment received is viewed as if
randomly determined.
One threat to external validity is generalizing the results from a U.S.-Canada comparison to economies
with different cultures or to less developed economies. Also, consider unemployment insurance generosity as a
treatment variable (Canada liberalized unemployment benefits considerably in the early '70s). In that
case \(E(u_i \mid X_i) = 0\) is unlikely to hold, and additional regressors and instrumental variable techniques
should be used.
3) Earnings functions provide a measure, among other things, of the returns to education. It has been argued
these regressions contain a serious omitted variable bias due to differences in abilities. Furthermore, ability is
hard to measure and bound to be highly correlated with years of schooling. Hence the standard estimate of
about a 10 percent return to every year of schooling is upward biased. Suggest some ways to address this
problem. One famous study looked at earnings of identical twins. Explain how this can be viewed as a
quasi-experiment, and mention some of the threats to internal and external validity that such a study might
encounter.
Answer: Answers will vary by student. The omitted variable bias should play a central part in the discussion.
\(E(u_i \mid X_i, W_{1i}, \dots, W_{ri}) = 0\) will not hold if one of the W’s is years of education and u contains unobserved
ability. If ability causes individuals to have higher earnings and more years of education, perhaps
by making university scholarships easier to obtain, then the returns to education are biased upward. One
way to circumvent this problem is, as some studies have done in the past, to approximate ability by IQ
scores. If IQ scores measure ability with error, then instrumental variable techniques can be employed.
These were discussed in Chapter 10 of the textbook. Another possibility is to model ability as an omitted
variable that remains constant over time. In that case, panel estimation methods with fixed effects,
presented in Chapter 8 of the textbook, can be used. Data can be differenced to eliminate the entity fixed
effects or binary variables can be added to capture them. At any rate, this approach requires data being
available for more than a single point in time. The use of data from identical twins is fascinating since
these have identical genes and, typically, identical family backgrounds. The suggestion is therefore to
assume that they have identical ability as well. If some twins have different years of schooling while
others do not, then this can be treated as a quasi-experiment since the researcher can view this choice as
if it had been randomly assigned. Obviously it cannot count as a randomized controlled experiment,
since the difference in schooling was not determined by the flip of a coin, say. But it may also run into
problems in providing an as if randomization. The text flagged some of the potential problems in section
11.1: “Initially, one might think that an ideal experiment would take two otherwise identical individuals,
treat one of them, and compare the difference in their outcomes while holding constant all other
influences. This is not, however, a practical experimental design, for it is impossible to find two identical
individuals: even identical twins have different life experiences, so they are not identical in every way.”
Finally, if identical twins are “different” from the general population, then there is also a threat to
external validity by generalizing the results for the population of all individuals.
4) Describe the major differences between a randomized controlled experiment and a quasi-experiment.
Answer: Answers will vary by student. Some of the following points should appear.
A randomized controlled experiment relies on the random selection of entities from a population of interest,
and the random assignment of these individuals into either a treatment or control group. To study the
causal effects, a simple regression model with a single regressor can be specified. This regressor can
either be a binary variable or a variable indicating treatment levels. Since \(E(u_i \mid X_i) = 0\) is guaranteed if
the assignment and selection were random, the causal or treatment effect can be measured through
\(E(Y_i \mid X = x) - E(Y_i \mid X = 0)\). The random selection and assignment assures that there is no omitted
variable bias, and therefore the OLS estimator is unbiased. Adding additional regressors can result in
increased efficiency. Alternatively a differences-in-differences estimator with or without additional
regressors is also available if the entities have been observed for two periods, one before and one after
the treatment. In the case of more than two observations per entity, panel methods can be employed.
There are various threats to internal and external validity. These include failure to randomize, failure to
follow treatment protocol, attrition, experiment effects, and small samples (threats to internal validity),
and nonrepresentative sample, nonrepresentative program or policy, general equilibrium effects, and
treatment vs. eligibility effects (threats to external validity).
A quasi-experiment is also called a “natural experiment” since the treatment of some entities resulted
from an external event. The treatment is administered “as if” it was random. The reason for observing
quasi-experiments more often in economics is that they are less expensive and raise less of an ethical
concern. The “as if” randomly assigned treatment is the result of, as the textbook puts it, “vagaries in
legal institutions, location, timing of policy or program implementation, natural randomness such as
birth dates, rainfall, or other factors that are unrelated to the causal effect under study.” There are two
types of quasi-experiments, one whereby treatment is viewed as if randomly determined, the other
whereby the “as if” randomization provides an instrumental variable. Threats to internal and external
validity are the same as for randomized controlled experiments once they are modified. For example,
experimental effects are typically absent since individuals are not aware that they are part of an
experiment. The threat from small samples is replaced by the threat of an invalid instrument in quasi-experiments.
5) Roughly ten percent of elementary schools in California have a system whereby 4th to 6th graders share a
common classroom and a single teacher (multi-age, multi-grade classroom). Suggest an experimental design
that would allow you to assess the effect of learning in this environment. Mention some of the threats to
internal and external validity and how you would attempt to circumvent these.
Answer: Students should be selected randomly within a school and should be randomly assigned to a treatment
group (multi-age, multi-grade classroom) and a control group (traditional grade assignment; 4th, 5th,
and 6th grade only per room). Alternatively, and depending on the size of the experiment, a subset of
schools could be chosen and some pupils would randomly be assigned to traditional grade assignments
while others would be moved into multi-age, multi-grade classrooms. Another alternative would be to
simply choose some schools randomly which would have multi-age, multi-grade classrooms only. The
causal effect could then be estimated in a simple regression model with a binary regressor. Random
selection and random assignment would assure \(E(u_i \mid X_i) = 0\) and thereby eliminate one threat to internal
validity through omitted variable bias.
Another threat to internal validity would be if the worst or best performing schools were chosen instead
of using a random selection, or if parents in the district were allowed to vote whether or not to have the
school selected for the experiment. This would imply a failure to randomize. If students were allowed to
refuse to participate by transferring to a neighboring school, then this would represent failure to follow
treatment protocol. Double blind experiments are obviously not feasible since both instructors and
students know into which setting they are being placed (“experimental effects”). There are few threats to
external validity except for the situation whereby students would be allowed to opt in or out of the
experimental group (“treatment vs. eligibility effect”).
6) Assume for the moment that the student-teacher ratio effect on test scores was large enough that you would
advocate reducing class sizes in elementary schools. In 1996, the State of California reduced class sizes from
K-3 to no more than 20 students across all public elementary schools (Class Size Reduction Act) at a cost of
approximately $2 billion. In a short essay, discuss why the general equilibrium effects might differ from the
results obtained using experiments.
Answer: The General Equilibrium effects are the result of the additional demand for teachers. Each elementary
school needed additional teachers in order to reduce the class size to 20 or less — think of a school that
had perhaps 3 Kindergarten classes of 25 students each. In that case, one additional classroom had to be
created — typically some temporary structure. The question arises where the additional teacher came
from. If your school district was a desirable district to teach in, perhaps because of having a reputation of
well behaved children or classrooms that were well equipped, then teachers from other districts, perhaps
less desirable ones, would apply to the better school district. Presumably the desirable school district
would pick the best teacher(s) available, leaving the less desirable school district with a lower level of
teacher quality. The same phenomenon would repeat itself at the lower level school district, and so forth,
until you would get to the least desirable school district, which would have to hire new teachers from a
cohort that could not find a job elsewhere. Given the size of the State of California, the General
Equilibrium effect could be substantial, perhaps even drawing quality teachers from other states.
13.3 Mathematical and Graphical Problems
1) Your textbook mentions use of a quasi-experiment to study the effects of minimum wages on employment
using data from fast food restaurants. In 1992, there was an increase in the (state) minimum wage in one U.S.
state (New Jersey) but not in a neighboring location (Eastern Pennsylvania). To calculate the
\(\hat{\beta}_1^{\text{diffs-in-diffs}}\) you need the change in the treatment group and the change in the control group.
To do this, the study provides you with the following information:

        FTE Employment before    FTE Employment after
PA      23.33                    21.17
NJ      20.44                    21.03

where FTE is “full time equivalent” and the numbers are average employment per restaurant.
(a) Calculate the change in the treatment group, the change in the control group, and finally
\(\hat{\beta}_1^{\text{diffs-in-diffs}}\). Since minimum wages represent a price floor, did you expect
\(\hat{\beta}_1^{\text{diffs-in-diffs}}\) to be positive or negative?
(b) If you look at \(\hat{\beta}_1^{\text{diffs-in-diffs}}\), is this number primarily due to a change in the treatment group or the control
group? Is this what you expected?
(c) The standard error for \(\hat{\beta}_1^{\text{diffs-in-diffs}}\) is 1.36. Test whether or not the coefficient is statistically significant,
given that there are 410 observations. If you believed that the benefit from small minimum wage increases
outweighed the cost in terms of employment loss, would finding that this coefficient was not statistically
significant discourage you?
Answer: (a) change in treatment group: + 0.59, change in control group: - 2.16,
\(\hat{\beta}_1^{\text{diffs-in-diffs}} = 0.59 - (-2.16) = 2.75\). Standard economic theory suggests a negative, not positive, change.
(b) The overall change of 2.75 is primarily due to the change in Eastern Pennsylvania (- 2.16), i.e., the
control group. Following standard economic theory, if employment fell in Eastern Pennsylvania, then
you would expect employment in New Jersey to fall by even more than in Eastern Pennsylvania. Not
only did employment in New Jersey not fall by more, it actually increased.
(c) The t-statistic is 2.75/1.36 ≈ 2.02, thereby making the coefficient statistically significant at the 5% level (two-sided
test). Even if the coefficient were not statistically significant, it is not negative. Hence finding an
insignificant coefficient should not be discouraging, since it would suggest that there is no negative employment
effect of a small increase in minimum wages.
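The arithmetic in parts (a) and (c) can be checked with a short Python sketch. The variable and function names are illustrative only; the figures are the ones in the table for this question. With the rounded table values the estimate is 2.75 and the t-ratio is about 2.02; with the 2.76 figure quoted for this study in a later question the ratio is 2.03. Either way it exceeds the 5% two-sided critical value of 1.96.

```python
# Average FTE employment per restaurant, taken from the table in the question.
fte = {
    "NJ": {"before": 20.44, "after": 21.03},  # treatment group
    "PA": {"before": 23.33, "after": 21.17},  # control group
}

def change(group):
    """After-minus-before change in average employment for one group."""
    return fte[group]["after"] - fte[group]["before"]

change_treatment = change("NJ")
change_control = change("PA")
did = change_treatment - change_control   # differences-in-differences estimate

se = 1.36                                  # standard error given in part (c)
t_stat = did / se

print(round(change_treatment, 2), round(change_control, 2), round(did, 2))
print(round(t_stat, 2), abs(t_stat) > 1.96)
```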
2) Define the \(\hat{\beta}_1^{\text{diffs-in-diffs}}\) in terms of observable differences in the treatment and control group, before and after
the treatment. Explain why this presentation is the equivalent of calculating the coefficient in a regression
framework.
Answer:
\[\hat{\beta}_1^{\text{diffs-in-diffs}} = (\bar{Y}^{\text{treatment,after}} - \bar{Y}^{\text{treatment,before}}) - (\bar{Y}^{\text{control,after}} - \bar{Y}^{\text{control,before}}) = \Delta\bar{Y}^{\text{treatment}} - \Delta\bar{Y}^{\text{control}}.\]
Consider the following regression
\[\Delta Y_i = \beta_0 + \beta_1 X_i + u_i\]
where \(\Delta Y_i\) is the value of Y for the ith individual after the experiment is completed, minus the value of Y
before it starts, and X is a randomly assigned binary treatment variable, which takes on the value of one
if treatment was received and is zero otherwise. Then for individuals who did not receive treatment,
\(\bar{Y}^{\text{control,after}} - \bar{Y}^{\text{control,before}} = \hat{\beta}_0\). For individuals who received treatment,
\(\bar{Y}^{\text{treatment,after}} - \bar{Y}^{\text{treatment,before}} = \hat{\beta}_0 + \hat{\beta}_1\). Hence
\[\hat{\beta}_1 = (\bar{Y}^{\text{treatment,after}} - \bar{Y}^{\text{treatment,before}}) - (\bar{Y}^{\text{control,after}} - \bar{Y}^{\text{control,before}}).\]
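The equivalence can be verified numerically. The sketch below uses made-up changes in Y for eight hypothetical individuals (not data from any study) and the standard single-regressor OLS formulas; all names are illustrative.

```python
from statistics import mean

# Hypothetical data: delta_y is the after-minus-before outcome, x is the treatment dummy.
delta_y = [1.0, 0.5, 2.0, 1.5, -0.5, 0.0, 0.5, -1.0]
x = [1, 1, 1, 1, 0, 0, 0, 0]

# OLS with a single regressor: slope = sum of cross-deviations / sum of squared deviations.
x_bar, dy_bar = mean(x), mean(delta_y)
slope = (sum((xi - x_bar) * (yi - dy_bar) for xi, yi in zip(x, delta_y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = dy_bar - slope * x_bar

# Group means of the changes.
mean_change_treatment = mean(y for y, xi in zip(delta_y, x) if xi == 1)
mean_change_control = mean(y for y, xi in zip(delta_y, x) if xi == 0)
did = mean_change_treatment - mean_change_control

print(intercept, mean_change_control)   # intercept equals the control-group mean change
print(slope, did)                       # slope equals the diffs-in-diffs estimate
```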
3) Your textbook gives a graphical example of \(\hat{\beta}_1^{\text{diffs-in-diffs}}\), where outcome is plotted on the vertical axis, and
time period appears on the horizontal axis. There are two time periods entered: “t = 1” and “t = 2.” The former
corresponds to the “before” time period, while the latter represents the “after” period. The assumption is that
the policy occurred sometime between the time periods (call this “t = p”). Keeping in mind the graphical
example of \(\hat{\beta}_1^{\text{diffs-in-diffs}}\), carefully read what a reviewer of the Card and Krueger (CK) study of the minimum
wage effect on employment in the New Jersey-Pennsylvania study had to say:
“Two assumptions are implicit throughout the evaluation of the ‘natural experiment:’ (1) [\(\hat{\beta}_1^{\text{diffs-in-diffs}}\)]
would be zero if the treatment had not occurred, so a nonzero [\(\hat{\beta}_1^{\text{diffs-in-diffs}}\)] indicates the effect of the
treatment (that is, nothing else could have caused the difference in the outcomes to change), and (2) … the
intervention occurs after we measure the initial outcomes in the two groups. … Three conditions are
particularly relevant in interpreting CK’s work: (1) [t = 1] must be sufficiently before [t = p] that [the treatment
group] did not adjust to the treatment before [t = 1] – otherwise [\(\bar{Y}^{\text{treatment,before}} - \bar{Y}^{\text{control,before}}\)] will reflect the
effect of the treatment; (2) [t = 2] must be sufficiently after [t = p] to allow the treatment’s effect to be fully felt;
and (3) we must be sure that the same difference [\(\bar{Y}^{\text{treatment,before}} - \bar{Y}^{\text{control,before}}\)] would have been observed at
[t = 2] if the treatment had not been imposed, that is, [the control group must be good enough] that there is no
need to adjust the differences for factors other than the treatment that might have caused them to change.”
Use a figure similar to the textbook to explain what this reviewer meant.
Answer: See accompanying figures.
[Figures not reproduced. Recoverable captions: (1) \(\hat{\beta}_1^{\text{diffs-in-diffs}}\) would be zero if treatment had not occurred; (2) the intervention occurs after we measure the initial outcomes in the two groups. Remaining panels: “Rule out (1) and (2)” and “and (3) in the case of no treatment.”]
4) Consider the simple population regression model where the treatment is the same for the members of the
treatment group, and hence X is a binary variable. Explain why the coefficient on X represents the difference
between two means. How is the test for the statistical significance of the coefficient on X related to the test for
differences in means between two populations, when their variances are different? Write down the null and
alternative hypothesis in each case.
Answer: The answer should proceed along the lines of “Regression When X Is a Binary Variable” (Section 4.7) of
the textbook, where the binary variable now indicates whether or not an individual has received
treatment. In terms of the regression model with a single regressor this is formulated as
\[Y_i = \beta_0 + \beta_1 X_i + u_i,\]
where \(X_i\) is 1 or 0 depending on whether or not the individual received treatment. Then in the case of no
treatment received, \(Y_i = \beta_0 + u_i\) and \(E(Y_i \mid X_i = 0) = \beta_0\). Alternatively, when treatment was received,
\(Y_i = \beta_0 + \beta_1 + u_i\) and \(E(Y_i \mid X_i = 1) = \beta_0 + \beta_1\). Hence \(\beta_1\) is the difference between the two means. To test
whether or not there is a difference, the hypotheses are
\[H_0: \beta_1 = 0 \quad \text{vs.} \quad H_1: \beta_1 \neq 0.\]
The null hypothesis can be tested using the usual t-statistic and allowing for heteroskedasticity-robust
standard errors. This test corresponds to the test encountered in section 3.4 of the textbook, where
\[H_0: \mu_{\text{treatment}} - \mu_{\text{control}} = 0 \quad \text{vs.} \quad H_1: \mu_{\text{treatment}} - \mu_{\text{control}} \neq 0,\]
and the standard error of the differences in means is calculated under the assumption that the two
population variances are unequal.
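The correspondence between the two tests can be illustrated numerically. The sketch below uses hypothetical outcomes and makes two simplifying assumptions that are not from the textbook: the robust standard error is computed without a degrees-of-freedom adjustment (the form often labeled HC0), and the group variances use divisor n rather than n - 1. Under those assumptions the two standard errors coincide exactly; with the usual finite-sample corrections they agree only approximately.

```python
from math import sqrt

# Hypothetical outcomes for illustration only.
y_treated = [3.1, 4.0, 2.7, 5.2, 3.9]
y_control = [2.0, 2.4, 1.1, 2.9, 1.6, 2.2, 1.8]

def mean(v):
    return sum(v) / len(v)

def var_n(v):
    """Variance with divisor n (not n - 1), chosen to match the HC0 algebra exactly."""
    m = mean(v)
    return sum((a - m) ** 2 for a in v) / len(v)

# (1) Difference in means, allowing the two population variances to differ.
diff = mean(y_treated) - mean(y_control)
se_means = sqrt(var_n(y_treated) / len(y_treated) + var_n(y_control) / len(y_control))

# (2) OLS of Y on a binary X with a heteroskedasticity-robust (HC0) standard error.
y = y_treated + y_control
x = [1] * len(y_treated) + [0] * len(y_control)
x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
se_robust = sqrt(sum(((xi - x_bar) * ui) ** 2 for xi, ui in zip(x, resid))) / sxx

print(b1, diff)              # same point estimate
print(se_robust, se_means)   # same standard error, hence the same t-statistic
```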
5) Present alternative estimators for causal effects using experimental data when data is available for a single
period or for two periods. Discuss their advantages and disadvantages.
Answer: There are essentially four estimators discussed in the textbook: two each for a single period randomized
controlled experiment, and two for panel data. For each of these situations, a binary or treatment level
regressor X is used, and additional characteristics can be added, thereby distinguishing the two possible
estimators within the single/panel two periods framework.
The single period estimator of the causal or treatment effect is the OLS estimator in the regression model
with a single regressor
\[Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{u}_i.\]
Random selection and assignment assures that \(E(u_i \mid X_i) = 0\). Thus even with omitted variables present,
\(E(Y_i \mid X_i) = \beta_0 + \beta_1 X_i\), since X is independently distributed from the omitted variables. The OLS estimator
\(\hat{\beta}_1\), also called the differences estimator, is unbiased and consistent.
A different estimator, called differences estimator with additional regressors, is obtained by adding
characteristics for the individual, which are not affected by the treatment. This is done to deal with some
of the threats to validity, but also for efficiency purposes. The multiple regression model in this case is
\[Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_{1i} + \dots + \beta_{1+r} W_{ri} + u_i, \quad i = 1, \dots, n,\]
and \(\hat{\beta}_1\) is the differences estimator with additional regressors. Here \(\hat{\beta}_1\) is consistent even if
\(E(u_i \mid X_i, W_{1i}, \dots, W_{ri}) = 0\) does not hold, as long as there is conditional mean independence.
The inclusion of the characteristics also allows for testing for random receipt of
treatment and random assignment using the usual F-statistic in auxiliary regressions.
The third estimator generalizes the two estimators above to the case of panel data. The idea here is that
data is available for two periods, one before the treatment is administered and one after. The
differences-in-differences estimator is then defined as
\[\hat{\beta}_1^{\text{diffs-in-diffs}} = (\bar{Y}^{\text{treatment,after}} - \bar{Y}^{\text{treatment,before}}) - (\bar{Y}^{\text{control,after}} - \bar{Y}^{\text{control,before}}) = \Delta\bar{Y}^{\text{treatment}} - \Delta\bar{Y}^{\text{control}}.\]
If the treatment is randomly assigned, the estimator is unbiased, consistent, and more efficient than the
differences estimator. In addition, it eliminates pretreatment differences in Y.
Alternatively, it can be viewed in a regression framework
\[\Delta Y_i = \beta_0 + \beta_1 X_i + u_i\]
where \(\Delta Y_i\) is the value of Y for the ith individual after the experiment is completed, minus the value of Y
before it starts. Then for individuals who did not receive treatment,
\(\bar{Y}^{\text{control,after}} - \bar{Y}^{\text{control,before}} = \hat{\beta}_0\). For individuals who received treatment,
\(\bar{Y}^{\text{treatment,after}} - \bar{Y}^{\text{treatment,before}} = \hat{\beta}_0 + \hat{\beta}_1\). Hence
\[\hat{\beta}_1 = (\bar{Y}^{\text{treatment,after}} - \bar{Y}^{\text{treatment,before}}) - (\bar{Y}^{\text{control,after}} - \bar{Y}^{\text{control,before}}).\]
As in the case for a single time period, additional characteristics can be added. In that case
\[\Delta Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_{1i} + \dots + \beta_{1+r} W_{ri} + u_i, \quad i = 1, \dots, n\]
where the interpretation of the W variable effect is different from before, since the dependent variable is
differenced.
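A toy comparison of the differences estimator and the differences-in-differences estimator may help. The panel below is entirely hypothetical: every unit follows a common trend of +1, treated units receive an additional effect of +2, and the treatment group happens to start 3 units higher (by chance, or because randomization failed). It is a sketch of the mechanics, not a statement about either estimator's sampling properties.

```python
from statistics import mean

# Hypothetical panel of (before, after) outcomes; the effect built into the data is 2.
treated = [(12.0, 15.0), (14.0, 17.0), (13.0, 16.0)]   # each rises by 1 (trend) + 2 (effect)
control = [(10.0, 11.0), (9.0, 10.0), (11.0, 12.0)]    # each rises by 1 (trend only)

after_t = mean(a for _, a in treated)
after_c = mean(a for _, a in control)
before_t = mean(b for b, _ in treated)
before_c = mean(b for b, _ in control)

differences = after_t - after_c                              # uses the "after" period only
diffs_in_diffs = (after_t - before_t) - (after_c - before_c)

print(differences)      # 5.0: effect of 2 plus the pretreatment gap of 3
print(diffs_in_diffs)   # 2.0: the pretreatment gap is differenced out
```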
6) To analyze the effect of a minimum wage increase, a famous study used a quasi-experiment for two adjacent
states: New Jersey and (Eastern) Pennsylvania. A \(\hat{\beta}_1^{\text{diffs-in-diffs}}\) was calculated by comparing average
employment changes per restaurant between the treatment group (New Jersey) and the control group
(Pennsylvania). In addition, the authors provide data on the employment changes between “low wage”
restaurants and “high wage” restaurants in New Jersey only. A restaurant was classified as “low wage,” if the
starting wage in the first wave of surveys was at the then prevailing minimum wage of $4.25. A “high wage”
restaurant was a place with a starting wage close to or above the $5.25 minimum wage after the increase.
(a) Explain why employment changes of the “high wage” and “low wage” restaurants might constitute a
quasi-experiment. Which is the treatment group and which the control group?
(b) The following information is provided:

             FTE Employment before    FTE Employment after
Low wage     19.56                    20.88
High wage    22.25                    20.21

where FTE is “full time equivalent” and the numbers are average employment per restaurant.
Calculate the change in the treatment group, the change in the control group, and finally
\(\hat{\beta}_1^{\text{diffs-in-diffs}}\). Since minimum wages represent a price floor, did you expect
\(\hat{\beta}_1^{\text{diffs-in-diffs}}\) to be positive or negative?
(c) The standard error for \(\hat{\beta}_1^{\text{diffs-in-diffs}}\) is 1.48. Test whether or not this is statistically significant, given that
there are 174 observations.
Answer: (a) In the above example, the increase in wages (“treatment”) occurs not because of changes in the
demand or supply of labor, but because of an external event, namely the raising of the minimum wage
in New Jersey. This is therefore a good example of a “natural experiment.” The treatment group is the
“low wage” restaurants, since the wages there are actually changed. The “high wage” restaurants are the
control group.
(b) change in treatment group: + 1.32, change in control group: - 2.04,
\(\hat{\beta}_1^{\text{diffs-in-diffs}} = 3.36\). The prior expectation would be negative.
(c) The t-statistic is 2.27, making the coefficient statistically significant at the 5% level (two-sided test).
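As a check, the arithmetic behind parts (b) and (c) can be written out step by step, using only the table values and the standard error given in the question; 1.96 is the usual large-sample 5% two-sided critical value.

```latex
\begin{aligned}
\Delta\bar{Y}^{\text{low wage}} &= 20.88 - 19.56 = 1.32 \\
\Delta\bar{Y}^{\text{high wage}} &= 20.21 - 22.25 = -2.04 \\
\hat{\beta}_1^{\text{diffs-in-diffs}} &= 1.32 - (-2.04) = 3.36 \\
t &= \frac{3.36}{1.48} \approx 2.27 > 1.96
\end{aligned}
```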
7) Specify the multiple regression model that contains the difference-in-difference estimator (with additional
regressors). Explain the circumstances under which this model is preferable to the simple
difference-in-difference estimator. Explain how the W’s can be used to test for randomization. How does the
interpretation of the W variables change compared to the differences estimator with additional regressors?
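Answer: The regression model for the differences-in-differences estimator with additional regressors is
\[\Delta Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_{1i} + \dots + \beta_{1+r} W_{ri} + u_i, \quad i = 1, \dots, n.\]
This is more general than the differences-in-differences regression
\[\Delta Y_i = \beta_0 + \beta_1 X_i + u_i\]
whose OLS estimator equals
\[\hat{\beta}_1^{\text{diffs-in-diffs}} = (\bar{Y}^{\text{treatment,after}} - \bar{Y}^{\text{treatment,before}}) - (\bar{Y}^{\text{control,after}} - \bar{Y}^{\text{control,before}}) = \Delta\bar{Y}^{\text{treatment}} - \Delta\bar{Y}^{\text{control}},\]
and hence the name.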
Since in some applications the assumption \(E(u_i \mid X_i, W_{1i}, \dots, W_{ri}) = 0\) is not likely to hold, the
differences-in-differences estimator will not be consistent. However, the differences-in-differences
estimator with additional regressors will be consistent under the weaker assumption of conditional mean
independence. Including the additional characteristics (W variables) also can improve efficiency. Furthermore, adding these
variables allows the researcher to perform tests for randomization, since \(X_i\), and also the assignment \(Z_i\),
should be uncorrelated with the W variables. Regressing \(X_i\) on \(W_{1i}, \dots, W_{ri}\), and using an F-test for the
hypothesis that all coefficients on the W’s are zero constitutes a test for the random receipt of
treatment. Performing a similar regression of the assignment \(Z_i\) on the W’s with an accompanying F-test
is a test for random assignment. Obviously if treatment and assignment were randomly determined,
then neither should be dependent on characteristics of the entities.
The dependent variable in the case of the differences estimator is a level, while in the case of the
differences-in-differences estimator it is a change. Hence W affects the change in the latter case, not the
level itself.
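The randomization test described above can be sketched for the simplest case of a single W variable. The data are hypothetical, and the F-statistic below is the homoskedasticity-only version, chosen only because it is easy to compute by hand; since X is binary, the errors in this auxiliary regression are heteroskedastic, so in practice a heteroskedasticity-robust F-statistic would be the natural choice.

```python
# Hypothetical data: x = 1 if treatment was received, w = one pretreatment characteristic.
x = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
w = [3.0, 2.5, 4.1, 3.3, 2.9, 3.8, 3.6, 2.2, 3.0, 3.4]
n, q = len(x), 1                      # q = number of W coefficients being tested

x_bar, w_bar = sum(x) / n, sum(w) / n
slope = (sum((wi - w_bar) * (xi - x_bar) for wi, xi in zip(w, x))
         / sum((wi - w_bar) ** 2 for wi in w))

# Restricted model: X on a constant only. Unrestricted model: X on a constant and W.
ssr_restricted = sum((xi - x_bar) ** 2 for xi in x)
ssr_unrestricted = sum((xi - x_bar - slope * (wi - w_bar)) ** 2 for wi, xi in zip(w, x))

# Homoskedasticity-only F-statistic for H0: the coefficient on W is zero.
f_stat = ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / (n - q - 1))
print(round(f_stat, 2))
```

A large F-statistic would suggest that receipt of treatment depends on W, which is evidence against random receipt of treatment.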
8) Let the vertical axis of a figure indicate the average employment in fast food restaurants. There are two time
periods, t = 1 and t = 2, where time period is measured on the horizontal axis. The following table presents
average employment levels per restaurant for New Jersey (the treatment group) and Eastern Pennsylvania (the
control group).

        FTE Employment before    FTE Employment after
PA      23.33                    21.17
NJ      20.44                    21.03

Enter the four points in the figure and label them \(\bar{Y}^{\text{treatment,before}}\), \(\bar{Y}^{\text{treatment,after}}\), \(\bar{Y}^{\text{control,before}}\), and
\(\bar{Y}^{\text{control,after}}\). Connect the points. Finally calculate and indicate the value for \(\hat{\beta}_1^{\text{diffs-in-diffs}}\).
Answer: \(\hat{\beta}_1^{\text{diffs-in-diffs}} = \Delta\bar{Y}^{\text{treatment}} - \Delta\bar{Y}^{\text{control}} = (21.03 - 20.44) - (21.17 - 23.33) = 2.75\).
See also accompanying figure.
9) (Requires Appendix material) Discuss how the differences-in-differences estimator can be extended to
multiple time periods. In particular, assume that there are n individuals and T time periods. What do the
individual and time effects control for?
Answer: The extension of the differences-in-differences estimator to multiple time periods uses the differences
estimator for a single period, and adds binary variables for entity and time fixed effects. As with the
differences estimator and the differences-in-differences estimator, additional regressors W for
characteristics can be added. Without these characteristics, the population regression model is as follows
\[Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \dots + \gamma_n Dn_i + \delta_2 B2_t + \dots + \delta_T BT_t + v_{it}\]
with i = 1,…,n entities, and t = 1, … ,T time periods.
The entity effects control for unobserved variables that remain constant over time for the same entity,
and the time effects control for unobserved variables that are the same for all individuals at a point in
time. Examples of time fixed effects could be business cycle conditions or macroeconomic conditions in
general. Examples of entity fixed effects might be gender, race, years of previous education, etc. The
model simplifies to the differences-in-differences regression model for two periods (T = 2). If W
variables are added, then these can also be interacted with the time effect binary variables. The major
advantage over the differences-in-differences model is that effects can be traced out over time.
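The claim that the fixed effects model reduces to the differences-in-differences regression when T = 2 can be checked on a small example. The balanced panel below is hypothetical, and instead of entering the binary variables explicitly the sketch removes entity and time effects by two-way demeaning, which yields the same coefficient on X in a balanced panel.

```python
from statistics import mean

# Hypothetical balanced panel: y[i][t] for n = 4 entities and T = 2 periods.
y = [[10.0, 14.0], [12.0, 15.0], [9.0, 10.0], [11.0, 12.0]]
treated = [1, 1, 0, 0]
n, T = len(y), 2
# X_it = 1 only for treated entities in the post-treatment period.
x = [[0.0, float(treated[i])] for i in range(n)]

def two_way_demean(z):
    """Subtract entity and time means and add back the grand mean (balanced panel)."""
    ent = [mean(row) for row in z]
    tim = [mean(z[i][t] for i in range(n)) for t in range(T)]
    grand = mean(ent)
    return [[z[i][t] - ent[i] - tim[t] + grand for t in range(T)] for i in range(n)]

yd, xd = two_way_demean(y), two_way_demean(x)
beta_fe = (sum(xd[i][t] * yd[i][t] for i in range(n) for t in range(T))
           / sum(xd[i][t] ** 2 for i in range(n) for t in range(T)))

# Differences-in-differences computed directly from the individual changes.
changes = [row[1] - row[0] for row in y]
did = (mean(c for c, d in zip(changes, treated) if d == 1)
       - mean(c for c, d in zip(changes, treated) if d == 0))

print(beta_fe, did)   # the two estimates coincide
```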
10) The New Jersey-Pennsylvania study on the effect of minimum wages on employment mentioned in your
textbook used a comparison in means “before” and “after” analysis. The difference-in-difference estimate
turned out to be 2.76 with a standard error of 1.36.
The authors also used a difference-in-differences estimator with additional regressors of the type
\[\Delta Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_{1i} + \dots + \beta_{1+r} W_{ri} + u_i\]
where i = 1, …, 410. X is a binary variable taking on the value one for the 331 observations in New Jersey. Since
the authors looked at Burger King, KFC, Wendy’s, and Roy Rogers fast food restaurants and the restaurant
could be company owned, four W-variables were added.
(a) Given that there are four chains and the possibility of a company ownership, why did the authors not
include five W-variables?
(b) OLS estimation resulted in \(\hat{\beta}_1\) of 2.30 with a standard error of 1.20. Test for statistical significance and
specify the alternative hypothesis.
(c) Why is this estimate different from the number calculated from \(\Delta\bar{Y}^{\text{treatment}} - \Delta\bar{Y}^{\text{control}} = 2.76\)? What is the
advantage of employing this estimator over the simple difference-in-difference estimator?
Answer: (a) Including a fifth W-variable would have resulted in perfect multicollinearity.
(b) The t-statistic is +1.92. If the alternative hypothesis was $H_1: \beta_1 < 0$, then you cannot reject the null
hypothesis. If the alternative hypothesis was $H_1: \beta_1 \neq 0$, then you cannot reject the null hypothesis at the
5% level, although you can at the 10% level. The choice of alternative hypothesis depends on prior
expectations, and standard economic theory would suggest $H_1: \beta_1 < 0$.
(c) The difference is small in terms of the standard error and may be due to sample variation. Although
the difference-in-difference estimator is consistent, the difference-in-difference estimator with
additional regressors can be more efficient. It is different because it stems from using the multiple
regression model
$\Delta Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_{1i} + \dots + \beta_{1+r} W_{ri} + u_i, \quad i = 1, \dots, n$
rather than the regression with a single regressor
$\Delta Y_i = \beta_0 + \beta_1 X_i + u_i, \quad i = 1, \dots, n$
and $E(u_i \mid X_i, W_{1i}, \dots, W_{ri}) = 0$ may not hold. In that case, $\hat{\beta}_1$ is consistent as long as there is conditional
mean independence. The inclusion of the characteristics also allows for testing for random receipt of
treatment and random assignment using the usual F-statistic in auxiliary regressions.
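The significance test in part (b) can be checked with a short calculation. The sketch below assumes the large-sample normal approximation for the t-statistic; the function name `normal_cdf` is introduced here for illustration only.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

beta_hat = 2.30  # OLS estimate from part (b)
se = 1.20        # its standard error

t_stat = beta_hat / se
# Two-sided alternative H1: beta_1 != 0
p_two_sided = 2.0 * (1.0 - normal_cdf(abs(t_stat)))
# One-sided alternative H1: beta_1 < 0 (small values of t are evidence against H0)
p_left_tail = normal_cdf(t_stat)

print(f"t = {t_stat:.2f}")
print(f"two-sided p-value = {p_two_sided:.3f}")
print(f"left-tail p-value = {p_left_tail:.3f}")
```

The two-sided p-value falls between 0.05 and 0.10, which matches the statement that the null is rejected at the 10% level but not at the 5% level.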
Chapter 14 Introduction to Time Series Regression and Forecasting
14.1 Multiple Choice
1) Pseudo out of sample forecasting can be used for the following reasons with the exception of
A) giving the forecaster a sense of how well the model forecasts at the end of the sample.
B) estimating the RMSFE.
C) analyzing whether or not a time series contains a unit root.
D) evaluating the relative forecasting performance of two or more forecasting models.
Answer: C
2) Autoregressive distributed lag models include
A) current and lagged values of the error term.
B) lags of the dependent variable, and lagged values of additional predictor variables.
C) current and lagged values of the residuals.
D) lags and leads of the dependent variable.
Answer: B
3) Time series variables fail to be stationary when
A) the economy experiences severe fluctuations.
B) the population regression has breaks.
C) there is strong seasonal variation in the data.
D) there are no trends.
Answer: B
4) Departures from stationarity
A) jeopardize forecasts and inference based on time series regression.
B) occur often in cross-sectional data.
C) can be made to have less severe consequences by using log-log specifications.
D) cannot be fixed.
Answer: A
5) In order to make reliable forecasts with time series data, all of the following conditions are needed with the
exception of
A) coefficients having been estimated precisely.
B) the regression having high explanatory power.
C) the regression being stable.
D) the presence of omitted variable bias.
Answer: D
6) The first difference of the logarithm of Yt equals
A) the first difference of Y.
B) the difference between the lead and the lag of Y.
C) approximately the growth rate of Y when the growth rate is small.
D) the growth rate of Y exactly.
Answer: C
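The approximation in option C can be illustrated numerically. This is a minimal sketch with hypothetical levels of Y (100 growing to 101, 105 and 150); it shows the gap between the exact growth rate and the log difference widening as growth gets larger.

```python
import math

def growth_rate(y_prev: float, y_curr: float) -> float:
    """Exact proportional growth rate between two periods."""
    return (y_curr - y_prev) / y_prev

def log_difference(y_prev: float, y_curr: float) -> float:
    """First difference of the logarithm, ln(Y_t) - ln(Y_{t-1})."""
    return math.log(y_curr) - math.log(y_prev)

y_prev = 100.0  # hypothetical starting level
for y_curr in (101.0, 105.0, 150.0):
    g = growth_rate(y_prev, y_curr)
    d = log_difference(y_prev, y_curr)
    print(f"growth = {g:.4f}, log difference = {d:.4f}, gap = {g - d:.4f}")
```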
7) The time interval between observations can be all of the following with the exception of data collected
A) daily.
B) by decade.
C) bi-weekly.
D) across firms.
Answer: D
8) One reason for computing the logarithms (ln), or changes in logarithms, of economic time series is that
A) numbers often get very large.
B) economic variables are hardly ever negative.
C) they often exhibit growth that is approximately exponential.
D) natural logarithms are easier to work with than base 10 logarithms.
Answer: C
9) The jth autocorrelation coefficient is defined as
A) $\frac{cov(Y_t, Y_{t-1})}{\sqrt{var(Y_t)\,var(Y_{t-1})}}$.
B) $\frac{cov(Y_t, Y_{t-j-1})}{\sqrt{var(Y_t)\,var(Y_{t-j})}}$.
C) $\frac{cov(Y_t, u_t)}{\sqrt{var(Y_t)\,var(u_t)}}$.
D) $\frac{cov(Y_t, Y_{t-j})}{\sqrt{var(Y_t)\,var(Y_{t-j})}}$.
Answer: D
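A sample counterpart of the jth autocorrelation can be sketched as follows. This version assumes the common estimator that uses the full-sample mean and variance (under stationarity the two variances in the denominator are equal); the textbook's estimator may differ slightly in how the means are computed, and the data series is hypothetical.

```python
def sample_autocorrelation(y, j):
    """Sample j-th autocorrelation: autocovariance at lag j over the variance."""
    n = len(y)
    mean = sum(y) / n
    variance = sum((v - mean) ** 2 for v in y) / n
    autocov = sum((y[t] - mean) * (y[t - j] - mean) for t in range(j, n)) / n
    return autocov / variance

# Hypothetical series that alternates in sign, so lag-1 autocorrelation is negative
series = [1.0, -1.0] * 4
print(sample_autocorrelation(series, 1))
print(sample_autocorrelation(series, 2))
```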
10) Negative autocorrelation in the change of a variable implies that
A) the variable contains only negative values.
B) the series is not stable.
C) an increase in the variable in one period is, on average, associated with a decrease in the next.
D) the data is negatively trended.
Answer: C
11) An autoregression is a regression
A) of a dependent variable on lags of regressors.
B) that allows for the errors to be correlated.
C) model that relates a time series variable to its past values.
D) to predict sales in a certain industry.
Answer: C
12) The root mean squared forecast error (RMSFE) is defined as
A) $E\left|Y_T - \hat{Y}_{T|T-1}\right|$.
B) $\sqrt{E\left[(Y_{T+1} - \hat{Y}_{T+1|T})^2\right]}$.
C) $\sqrt{(Y_T - \hat{Y}_{T|T-1})^2}$.
D) $\sqrt{E(Y_T - \hat{Y}_{T|T-1})}$.
Answer: B
13) One of the sources of error in the RMSFE in the AR(1) model is
A) the error in estimating the coefficients $\beta_0$ and $\beta_1$.
B) due to measuring variables in logarithms.
C) that the value of the explanatory variable is not known with certainty when making a forecast.
D) the model only looks at the previous period’s value of Y when the entire history should be taken into
account.
Answer: A
14) The forecast is
A) made for some date beyond the data set used to estimate the regression.
B) another word for the OLS predicted value.
C) equal to the residual plus the OLS predicted value.
D) close to 1.96 times the standard deviation of Y during the sample.
Answer: A
15) The AR(p) model
A) is defined as $Y_t = \beta_0 + \beta_p Y_{t-p} + u_t$.
B) represents $Y_t$ as a linear function of p of its lagged values.
C) can be represented as follows: $Y_t = \beta_0 + \beta_1 X_t + \beta_p Y_{t-p} + u_t$.
D) can be written as $Y_t = \beta_0 + \beta_1 Y_{t-1} + u_{t-p}$.
Answer: B
16) The ADL(p,q) model is represented by the following equation
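A) $Y_t = \beta_0 + \beta_p Y_{t-p} + \delta_q X_{t-q} + u_t$.
B) $Y_t = \beta_0 + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_p Y_{t-p} + \delta_q u_{t-q}$.
C) $Y_t = \beta_0 + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_p Y_{t-p} + \delta_0 + \delta_1 X_{t-1} + u_{t-q}$.
D) $Y_t = \beta_0 + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_p Y_{t-p} + \delta_1 X_{t-1} + \delta_2 X_{t-2} + \dots + \delta_q X_{t-q} + u_t$.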
Answer: D
17) Stationarity means that the
A) error terms are not correlated.
B) probability distribution of the time series variable does not change over time.
C) time series has a unit root.
D) forecasts remain within 1.96 standard deviation outside the sample period.
Answer: B
18) The Time Series Regression with Multiple Predictors
A) is the same as the ADL(p,q) with additional predictors and their lags present.
B) gives you more than one prediction.
C) cannot be estimated by OLS due to the presence of multiple lags.
D) requires that the k regressors and the dependent variable have nonzero, finite eighth moments.
Answer: A
19) The Granger Causality Test
A) uses the F-statistic to test the hypothesis that certain regressors have no predictive content for the
dependent variable beyond that contained in the other regressors.
B) establishes the direction of causality (as used in common parlance) between X and Y in addition to
correlation.
C) is a rather complicated test for statistical independence.
D) is a special case of the Augmented Dickey-Fuller test.
Answer: A
20) To choose the number of lags in either an autoregression or in a time series regression model with multiple
predictors, you can use any of the following test statistics with the exception of the
A) F-statistic.
B) Akaike Information Criterion.
C) Bayes Information Criterion.
D) Augmented Dickey-Fuller test.
Answer: D
21) The random walk model is an example of a
A) deterministic trend model.
B) binomial model.
C) stochastic trend model.
D) stationary model.
Answer: C
22) Problems caused by stochastic trends include all of the following with the exception of
A) the estimator of an AR(1) is biased towards zero if its true value is one.
B) the model can no longer be estimated by OLS.
C) t-statistics on regression coefficients can have a nonnormal distribution, even in large samples.
D) the presence of spurious regression.
Answer: B
23) The Augmented Dickey Fuller (ADF) t-statistic
A) has a normal distribution in large samples.
B) has the identical distribution whether or not a trend is included.
C) is a two-sided test.
D) is an extension of the Dickey-Fuller test when the underlying model is AR(p) rather than AR(1).
Answer: D
24) If a “break” occurs in the population regression function, then
A) inference and forecasting are compromised when neglecting it.
B) an Augmented Dickey Fuller test, rather than the Dickey Fuller test, should be used to test for
stationarity.
C) this suggests the presence of a deterministic trend in addition to a stochastic trend.
D) forecasting, but not inference, is unaffected, if the break occurs during the first half of the sample period.
Answer: A
25) You should use the QLR test for breaks in the regression coefficients, when
A) the Chow F-test has a p value of between 0.05 and 0.10.
B) the suspected break date is not known.
C) there are breaks in only some, but not all, of the regression coefficients.
D) the suspected break date is known.
Answer: B
26) The Bayes-Schwarz Information Criterion (BIC) is given by the following formula
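A) $BIC(p) = \ln\left[\frac{SSR(p)}{T}\right] + (p+1)\frac{\ln(T)}{T}$
B) $BIC(p) = \ln\left[\frac{SSR(p)}{T}\right] + (p+1)\frac{2}{T}$
C) $BIC(p) = \ln\left[\frac{SSR(p)}{T}\right] - (p+1)\frac{\ln(T)}{T}$
D) $BIC(p) = \ln\left[\frac{SSR(p)}{T}\right] \times (p+1)\frac{\ln(T)}{T}$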
Answer: A
27) The Akaike Information Criterion (AIC) is given by the following formula
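A) $AIC(p) = \ln\left[\frac{SSR(p)}{T}\right] + (p+1)\frac{\ln(T)}{T}$
B) $AIC(p) = \ln\left[\frac{SSR(p)}{T}\right] + (p+1)\frac{2}{T}$
C) $AIC(p) = \ln\left[\frac{SSR(p)}{T}\right] + \frac{p+2}{T}$
D) $AIC(p) = \ln\left[\frac{SSR(p)}{T}\right] \times (p+1)\frac{2}{T}$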
Answer: B
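The two criteria in questions 26 and 27 differ only in the penalty term, which a short sketch makes concrete. The SSR values and sample size below are hypothetical and chosen purely for illustration; the function names `bic` and `aic` are not from any library.

```python
import math

def bic(ssr: float, p: int, T: int) -> float:
    """Bayes information criterion for an AR(p) estimated on T observations."""
    return math.log(ssr / T) + (p + 1) * math.log(T) / T

def aic(ssr: float, p: int, T: int) -> float:
    """Akaike information criterion: same fit term, smaller penalty of 2/T per coefficient."""
    return math.log(ssr / T) + (p + 1) * 2.0 / T

T = 160  # hypothetical number of observations
ssr_by_lag = {0: 26.0, 1: 16.0, 2: 15.9, 3: 15.8}  # hypothetical SSR(p) values

best_bic = min(ssr_by_lag, key=lambda p: bic(ssr_by_lag[p], p, T))
best_aic = min(ssr_by_lag, key=lambda p: aic(ssr_by_lag[p], p, T))
print("lag length chosen by BIC:", best_bic)
print("lag length chosen by AIC:", best_aic)
```

Because ln(T) exceeds 2 once T is larger than about 7, the BIC penalizes each additional lag more heavily than the AIC does.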
28) The BIC is a statistic
A) commonly used to test for serial correlation
B) only used in cross-sectional analysis
C) developed by the Bank of England in its river of blood analysis
D) used to help the researcher choose the number of lags in an autoregression
Answer: D
29) The AIC is a statistic
A) that is used as an alternative to the BIC when the sample size is small (T < 50)
B) often used to test for heteroskedasticity
C) used to help a researcher choose the number of lags in a time series with multiple predictors
D) all of the above
Answer: C
30) The formulae for the AIC and the BIC are different. The
A) AIC is preferred because it is easier to calculate
B) BIC is preferred because it is a consistent estimator of the lag length
C) difference is irrelevant in practice since both information criteria lead to the same conclusion
D) AIC will typically underestimate p with non-zero probability
Answer: B
14.2 Essays and Longer Questions
1) You set out to forecast the unemployment rate in the United States (UrateUS), using quarterly data from 1960,
first quarter, to 1999, fourth quarter.
(a) The following table presents the first four autocorrelations for the United States aggregate unemployment
rate and its change for the time period 1960 (first quarter) to 1999 (fourth quarter). Explain briefly what these
two autocorrelations measure.
First Four Autocorrelations of the U.S. Unemployment Rate and Its Change,
1960:I – 1999:IV
Lag   Unemployment Rate   Change of Unemployment Rate
1     0.97                0.62
2     0.92                0.32
3     0.83                0.12
4     0.75                -0.07
(b) The accompanying table gives changes in the United States aggregate unemployment rate for the period
1999:I-2000:I and levels of the current and lagged unemployment rates for 1999:I. Fill in the blanks for the
missing unemployment rate levels.
Changes in Unemployment Rates in the United States
First Quarter 1999 to First Quarter 2000
Quarter    U.S. Unemployment Rate   First Lag   Change in Unemployment Rate
1999:I     4.3                      4.4         -0.1
1999:II    ____                     ____        0.0
1999:III   ____                     ____        -0.1
1999:IV    ____                     ____        -0.1
2000:I     ____                     ____        -0.1
(c) You decide to estimate an AR(1) in the change in the United States unemployment rate to forecast the
aggregate unemployment rate. The result is as follows:
$\widehat{\Delta UrateUS}_t = \underset{(0.022)}{-0.003} + \underset{(0.106)}{0.621}\,\Delta UrateUS_{t-1}, \quad R^2 = 0.393, \; SER = 0.255$
The AR(1) coefficient for the change in the inflation rate was 0.211 and the regression R2 was 0.04. What does
the difference in the results suggest here?
(d) The textbook used the change in the log of the price level to approximate the inflation rate, and then
predicted the change in the inflation rate. Why aren’t logarithms used here?
(e) If much of the forecast error arises as a result of future error terms dominating the error resulting from
estimating the unknown coefficients, then what is your best guess of the RMSFE here?
(f) The actual unemployment rate during the fourth quarter of 1999 is 4.1 percent, and it decreased from the
third quarter to the fourth quarter by 0.1 percentage points. What is your forecast for the unemployment rate level in the
first quarter of 2000?
(g) You want to see how sensitive your forecast is to changes in the specification. Given that you have
estimated the regression with quarterly data, you consider an AR(4) model. This results in the following output
$\widehat{\Delta UrateUS}_t = \underset{(0.022)}{-0.005} + \underset{(0.125)}{0.663}\,\Delta UrateUS_{t-1} - \underset{(0.139)}{0.082}\,\Delta UrateUS_{t-2} + \underset{(0.117)}{0.106}\,\Delta UrateUS_{t-3} - \underset{(0.091)}{0.176}\,\Delta UrateUS_{t-4}, \quad R^2 = 0.416, \; SER = 0.253$
What is your forecast for the unemployment rate level in 2000:I? Compare the forecast error of the AR(4) model
with the forecast error of the AR(1) model.
(h) There does not seem to be much difference in the forecast of the unemployment rate level, whether you use
the AR(1) or the AR(4). Given the various information criteria and the regression R2 below, which model
should you use for forecasting?
p    BIC      AIC      R2
0    0.604    0.624    0.000
1    0.158    0.1181   0.393
2    0.185    0.125    0.397
3    0.217    0.138    0.400
4    0.218    0.1183   0.416
5    0.249    0.130    0.417
6    0.277    0.138    0.420
Answer: (a) There is a very strong positive autocorrelation for the unemployment rate level. The first through fourth
autocorrelation coefficients are even higher than for the inflation rate. This suggests that a high (low) level
of the unemployment rate will persist for quite a while. Although the autocorrelations decline, they are
still high even at lag 4. This reflects the long-term trends in unemployment rates. If during a given
quarter in the 1960s or the 1990s the unemployment rate was low, then it was also low in the following
quarter. If the unemployment rate was high in a given quarter, as it was in the early 1980s, then it was
also high in the following quarter. Different from the inflation rate results discussed in the text, the
change in the unemployment rate also shows positive autocorrelations. Furthermore, these are quite
large for the first lag. Eventually, after a year, they turn negative. Hence an increase (decrease) in the
unemployment rate is followed typically by an increase (decrease) in the following quarters, before the
process reverses itself.
(b)
Changes in Unemployment Rates in the United States from the
First Quarter 1999 to the First Quarter 2000
Quarter    U.S. Unemployment Rate   First Lag   Change in Unemployment Rate
1999:I     4.3                      4.4         -0.1
1999:II    4.3                      4.3         0.0
1999:III   4.2                      4.3         -0.1
1999:IV    4.1                      4.2         -0.1
2000:I     4.0                      4.1         -0.1
(c) There is a higher persistence in the change of unemployment rate than in the change of the inflation
rate. The higher regression R2 means that almost 40 percent of the variation in the change of the
unemployment rate can be explained by a single regressor, namely its lag. Students may recall Figure
12.1 from the textbook, which shows a much smoother behavior for the levels, and hence the differences,
for the unemployment rate.
(d) The change of the log of the price level was used to convert a level variable (prices) into a change of
its growth rate. Unemployment is already measured as a rate in the above example. Hence differencing
the variable results in a change in the rate.
(e) In this situation, the SER approximates the RMSFE. In the case of the change of the unemployment
rate, it is 0.255 percentage points.
(f) $UrateUS_{1999:IV} = 4.1$ and the predicted change in the unemployment rate from 1999:IV to 2000:I is
$\widehat{\Delta UrateUS}_{2000:I|1999:IV} = -0.003 + 0.621 \times (-0.1) = -0.065$, or $-0.1$ rounded. The forecasted unemployment rate for 2000:I is
$\widehat{UrateUS}_{2000:I|1999:IV} = UrateUS_{1999:IV} + \widehat{\Delta UrateUS}_{2000:I|1999:IV} = 4.1\% - 0.1\% = 4.0\%$. The model therefore forecasts a slight decrease in the
unemployment rate.
(g) $\widehat{\Delta UrateUS}_{2000:I|1999:IV} = -0.005 + 0.663 \times (-0.1) - 0.082 \times (-0.1) + 0.106 \times 0.0 - 0.176 \times (-0.1) = -0.046$.
(Students may suggest a forecast of –0.1 or 0.0. The answer will proceed with 0.0.) The
corresponding forecast for the unemployment rate in 2000:I is then 4.1% + 0.0% = 4.1%. The forecast
error for the AR(4) model is 4.0% - 4.1% = -0.1%, which is slightly larger in absolute value than the 0.0% forecast error
of the AR(1) model (4.0% - 4.0%).
(h) Close call, but both the BIC and the AIC favor the AR(1) over the AR(4). (The F-test statistic for
restricting the AR(4) to an AR(1) is 1.49 with a p-value of 0.21.)
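The forecast arithmetic in parts (f) and (g) can be reproduced directly from the estimated coefficients and the changes in the table from part (b). The helper `ar_forecast` below is a hypothetical name introduced for this sketch; the results are left unrounded so the effect of rounding to one decimal place is visible.

```python
def ar_forecast(intercept, coefs, lagged_changes):
    """One-step-ahead forecast of the change from an estimated autoregression."""
    return intercept + sum(c * x for c, x in zip(coefs, lagged_changes))

# Changes in the unemployment rate, most recent first: 1999:IV, III, II, I
changes = [-0.1, -0.1, 0.0, -0.1]
level_1999q4 = 4.1   # unemployment rate in 1999:IV
actual_2000q1 = 4.0  # actual unemployment rate in 2000:I

ar1_change = ar_forecast(-0.003, [0.621], changes[:1])
ar4_change = ar_forecast(-0.005, [0.663, -0.082, 0.106, -0.176], changes)

for name, change in (("AR(1)", ar1_change), ("AR(4)", ar4_change)):
    forecast = level_1999q4 + change
    error = actual_2000q1 - forecast
    print(f"{name}: predicted change = {change:.3f}, "
          f"forecast = {forecast:.3f}, forecast error = {error:.3f}")
```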
2) You have collected quarterly data on Canadian unemployment (UrateC) and inflation (InfC) from 1962 to 1999
with the aim to forecast Canadian inflation.
(a) To get a better feel for the data, you first inspect the plots for the series.
Inspecting the Canadian inflation rate plot and having calculated the first autocorrelation to be 0.79 for the
sample period, do you suspect that the Canadian inflation rate has a stochastic trend? What more formal
methods do you have available to test for a unit root?
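(b) You run the following regression, where the numbers in parentheses are homoskedasticity-only standard
errors:
$\widehat{\Delta InfC}_t = \underset{(0.28)}{0.49} - \underset{(0.05)}{0.10}\,InfC_{t-1} - \underset{(0.09)}{0.39}\,\Delta InfC_{t-1} - \underset{(0.09)}{0.33}\,\Delta InfC_{t-2} - \underset{(0.09)}{0.21}\,\Delta InfC_{t-3} + \underset{(0.08)}{0.05}\,\Delta InfC_{t-4}$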
Test for the presence of a stochastic trend. Should you have used heteroskedasticity-robust standard errors?
Does the fact that you use quarterly data suggest including four lags in the above regression, or how should
you determine the number of lags?
(c) To forecast the Canadian inflation rate for 2000:I, you estimate an AR(1), AR(4), and an ADL(4,1) model for
the sample period 1962:I to 1999:IV. The results are as follows:
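$\widehat{\Delta InfC}_t = \underset{(0.014)}{0.002} - \underset{(0.10)}{0.31}\,\Delta InfC_{t-1}$
$\widehat{\Delta InfC}_t = \underset{(0.158)}{0.021} - \underset{(0.10)}{0.46}\,\Delta InfC_{t-1} - \underset{(0.11)}{0.39}\,\Delta InfC_{t-2} - \underset{(0.08)}{0.25}\,\Delta InfC_{t-3} + \underset{(0.07)}{0.03}\,\Delta InfC_{t-4}$
$\widehat{\Delta InfC}_t = \underset{(0.57)}{1.279} - \underset{(0.10)}{0.51}\,\Delta InfC_{t-1} - \underset{(0.11)}{0.44}\,\Delta InfC_{t-2} - \underset{(0.09)}{0.30}\,\Delta InfC_{t-3} - \underset{(0.08)}{0.02}\,\Delta InfC_{t-4} - \underset{(0.07)}{0.16}\,UrateC_{t-1}$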
In addition, you have the following information on inflation in Canada during the four quarters of 1999 and the
first quarter of 2000:
Inflation and Unemployment in Canada, First Quarter 1999 to First Quarter 2000
Quarter    Unemployment Rate   Rate of Inflation at an   First Lag     Change in Inflation
           (UrateC_t)          Annual Rate (Inf_t)       (Inf_{t-1})   (ΔInf_t)
1999:I     7.7                 0.8                       0.8           0.0
1999:II    7.9                 4.3                       0.8           3.5
1999:III   7.7                 2.9                       4.3           -1.4
1999:IV    7.0                 1.3                       2.9           -1.5
2000:I     6.8                 2.1                       1.3           0.8
For each of the three models, calculate the predicted inflation rate for the period 2000:I and the forecast error.
(d) Perform a test on whether or not Canadian unemployment rates Granger-cause the Canadian inflation rate.
Answer: (a) A small autocorrelation coefficient together with a time series plot which displays no apparent trend
suggest the absence of a stochastic trend. Here the first autocorrelation coefficient is fairly high and the
figure displays long-run swings similar to the U.S. figure discussed in the textbook. To test for a
stochastic trend using more formal methods requires use of the Dickey-Fuller test, or better, the
augmented Dickey-Fuller test.
(b) The t-statistic on the lagged inflation rate level is (-2.00). The critical value for the ADF statistic is
(-2.57) at the 10% level. Hence you cannot reject the null hypothesis of a unit root. The ADF statistic
requires computation using homoskedasticity-only standard errors. Hence heteroskedasticity-robust
standard errors should not be used. The number of lags included should be determined using the AIC
information criterion, rather than the BIC, since it results in a better performance in finite samples of
the ADF statistic. (As with the U.S. data used in the textbook, this results in a chosen lag length of three.
The ADF statistic in that case is (-1.91), which is still smaller in absolute value than the critical value at the 10% level.)
(c)
$\widehat{\Delta InfC}_{2000:I|1999:IV}$ for the various models is:
0.002 - 0.31 × (-1.5) = 0.467 ≈ 0.5 (AR(1));
0.021 - 0.46 × (-1.5) - 0.39 × (-1.4) - 0.25 × 3.5 + 0.03 × 0.0 = 0.382 ≈ 0.4 (AR(4));
1.279 - 0.51 × (-1.5) - 0.44 × (-1.4) - 0.30 × 3.5 - 0.02 × 0.0 - 0.16 × 7.0 = 0.49 ≈ 0.5 (ADL(4,1)).
$\widehat{InfC}_{2000:I|1999:IV}$ then is: 1.3 + 0.5 = 1.8 (AR(1)); 1.3 + 0.4 = 1.7 (AR(4)); 1.3 + 0.5 = 1.8 (ADL(4,1)).
The forecast error is: 0.3 (AR(1)); 0.4 (AR(4)); 0.3 (ADL(4,1)).
(d) Since the ADL(4,1) only included the lagged unemployment rate, the t-statistic replaces the
F-statistic typically used for this test. The t-statistic is (-2.256) and the F-statistic is $2.256^2 = 5.091$. Both
are statistically significant at the 5% level with a p-value of 0.026. Hence the null hypothesis that the
unemployment rate does not Granger-cause the inflation rate is rejected.
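The three forecasts in part (c) can be verified with a few lines of arithmetic. The sketch below uses the coefficient estimates and the 1999 data from the question; the variable names are introduced here for illustration.

```python
# Changes in Canadian inflation, most recent first: 1999:IV, III, II, I
d_inf = [-1.5, -1.4, 3.5, 0.0]
urate_1999q4 = 7.0   # unemployment rate in 1999:IV
inf_1999q4 = 1.3     # inflation rate in 1999:IV
actual_2000q1 = 2.1  # actual inflation rate in 2000:I

def weighted_sum(coefs, values):
    """Sum of coefficient times regressor value."""
    return sum(c * v for c, v in zip(coefs, values))

predicted_change = {
    "AR(1)": 0.002 + weighted_sum([-0.31], d_inf),
    "AR(4)": 0.021 + weighted_sum([-0.46, -0.39, -0.25, 0.03], d_inf),
    "ADL(4,1)": 1.279 + weighted_sum([-0.51, -0.44, -0.30, -0.02], d_inf)
                - 0.16 * urate_1999q4,
}

for model, change in predicted_change.items():
    forecast = inf_1999q4 + change
    error = actual_2000q1 - forecast
    print(f"{model}: change = {change:.3f}, forecast = {forecast:.2f}, error = {error:.2f}")
```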
3) There is some evidence that the Phillips curve has been unstable during the 1962 to 1999 period for the United
States, and in particular during the 1990s. You set out to investigate whether or not this instability also
occurred in other places. Canada is a particularly interesting case, due to its proximity to the United States and
the fact that many features of its economy are similar to those of the U.S.
(a) Reading up on some of the comparative economic performance literature, you find that Canadian
unemployment rates were roughly the same as U.S. unemployment rates from the 1920s to the early 1980s. The
accompanying figure shows that a gap opened between the unemployment rates of the two countries in 1982,
which has persisted to this date.
Inspection of the graph and data suggest that the break occurred during the second quarter of 1982. To
investigate whether the Canadian Phillips curve shows a break at that point, you estimate an ADL(4,4) model
for the sample period 1962:I-1999:IV and perform a Chow test. Specifically you postulate that the constant and
coefficients of the unemployment rates changed at that point. The F-statistic is 1.96. Find the critical value from
the F-table and test the null hypothesis that no break occurred at that time. Is there any reason why you should
be skeptical about the result regarding the break and using the Chow test to detect it?
(b) You consider alternative ways to test for a break in the relationship. The accompanying figure shows the
F-statistics testing for a break in the ADL(4,4) equation at different dates.
The QLR-statistic with 15% trimming is 3.11. Comment on the figure and test for the hypothesis of a break in
the ADL(4,4) regression.
(c) To test for the stability of the Canadian Phillips curve in the 1990s, you decide to perform a pseudo
out-of-sample forecasting. For the 24 quarters from 1994:I-1999:IV you use the ADL(4,4) model to calculate the
forecasted change in the inflation rate, the resulting forecasted inflation rate, and the forecast error. The
standard error of the ADL(4,4) for the estimation sample period 1962:1-1993:4 is 1.91 and the sample RMSFE is
1.70. The average forecast error for the 24 inflation rates is 0.003 and the sample standard deviation of the
forecast errors is 0.82. Calculate the t-statistic and test the hypothesis that the mean out-of-sample forecast
error is zero. Comment on the result and the accompanying figure of the actual and forecasted inflation rate.
Answer: (a) The critical value from the $F_{5,\infty}$ distribution is 1.85 at the 10% significance level, and 2.21 at the 5%
significance level. (The p-value is actually 0.088.) Hence, at the 10% significance level, you can reject
the null hypothesis that the constant and the four lagged unemployment rate coefficients remained
constant over the entire sample period, which suggests that a break occurred in 1982:2. There is not
sufficient evidence to reject the null hypothesis at the 5% significance level. However, the text
emphasizes that “[preliminary] estimation of the break date means that the usual F critical values cannot
be used for the Chow test for a break at that date.” This applies to the above example since the series was
analyzed before testing.
(b) The critical value for the QLR(5) statistic with 15% trimming is 3.26 at the 10% level. Hence you
cannot reject the null hypothesis of no break in the regression. Except for the peak at the end of 1982 and
the beginning of 1983, the F-statistic does not really come close to the critical value.
(c) The average forecast error is very small. The t-statistic is
$t = \frac{0.003}{0.82 / \sqrt{24}} = 0.018$
and therefore you cannot reject the hypothesis that the mean out-of-sample forecast error is zero. Indeed, you
get the same impression from the graph, which shows that there are very few periods of systematically
too large or small inflation rate forecasts. The conclusion is that the Canadian Phillips curve has done
well as a model for forecasting at the end of the sample. This result is quite different from the results in
the textbook for the U.S. Phillips curve.
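The t-statistic in part (c) is the mean forecast error divided by its standard error, which is the sample standard deviation over the square root of the number of forecasts. A minimal sketch, with a hypothetical function name:

```python
import math

def mean_forecast_error_t(mean_error: float, std_dev: float, n: int) -> float:
    """t-statistic for H0: the mean out-of-sample forecast error is zero."""
    return mean_error / (std_dev / math.sqrt(n))

# Values given in the question: 24 pseudo out-of-sample forecasts
t_stat = mean_forecast_error_t(0.003, 0.82, 24)
reject_at_5_percent = abs(t_stat) > 1.96

print(f"t = {t_stat:.3f}, reject at 5%: {reject_at_5_percent}")
```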
4) You collect monthly data on the money supply (M2) for the United States from 1962:1-2002:4 to forecast future
money supply behavior.
where LM2 and DLM2 are the log level and growth rate of M2.
(a) Using quarterly data, when analyzing inflation and unemployment in the United States, the textbook
converted log levels of variables into growth rates by differencing the log levels, and then multiplying these by
400. Given that you have monthly data, how would you proceed here?
(b) How would you go about testing for a stochastic trend in LM2 and DLM2? Be specific about how to decide
the number of lags to be included and whether or not to include a deterministic trend in your test. The textbook
found the (quarterly) inflation rate to have a unit root. Does this have any effect on your expectation about
whether or not the (monthly) money growth rate should be stationary?
(c) You decide to conduct an ADF unit root test for LM2, DLM2, and the change in the growth rate ΔDLM2.
This results in the following t-statistics on the parameter of interest.
Series    Specification    ADF t-statistic
LM2       with trend       -0.505
DLM2      without trend    -4.100
DLM2      with trend       -4.592
ΔDLM2     without trend    -8.897
Find the critical value at the 1%, 5%, and 10% level and decide which of the coefficients is significant. What is
the alternative hypothesis?
(d) In forecasting the money growth rate, you add lags of the monetary base growth rate (DLMB) to see if you
can improve on the forecasting performance of a chosen AR(10) model in DLM2. You perform a Granger
causality test on the 9 lags of DLMB and find a F-statistic of 2.31. Discuss the implications.
(e) Curious about the result in the previous question, you decide to estimate an ADL(10,10) for DLMB and
calculate the F-statistic for the Granger causality test on the 9 lag coefficients of DLM2. This turns out to be 0.66.
Discuss.
(f) Is there any a priori reason for you to be skeptical of the results? What other tests should you perform?
Answer: (a) To annualize monthly growth rates, you would need to multiply them by 1,200. The annualized
growth rate of money would be $1200 \times \Delta \ln(M2_t) = 1200 \times \Delta LM2_t$.
(b) The ADF statistic should be calculated to test for the presence of a unit root in each of the series. The
BIC information criterion can be used to determine the lag length, and homoskedasticity-only standard
errors, rather than heteroskedasticity-robust standard errors, should be considered for the regression.
Studies of the finite-sample properties of unit root tests have shown that it is better to use the AIC
criterion although it overestimates the lag length on average. Given that money growth determines the
inflation rate in the long-run, your expectation would be to also find a unit root for money growth.
(c) LM2 contains a time trend, and hence the critical values for an intercept and a time trend are relevant.
These are (-3.96), (-3.41), and (-3.12) for the three significance levels respectively. Hence you cannot
reject the null hypothesis of a unit root for LM2. The growth rate of money does not have a time trend
for the entire sample period, so the intercept-only critical values should be used. These are (-3.43),
(-2.86), and (-2.57) respectively. Hence you are able to reject the null hypothesis of a unit root for money growth
at the 1% significance level. The alternative hypothesis is that there is no unit root. However, failure to
reject the null hypothesis only means that there is “insufficient evidence to conclude that it is false.”
(d) The critical value for the null hypothesis that monetary base growth rates do not Granger cause money
supply growth rates is $F_{9,\infty} = 1.88$ at the 5% significance level, and 2.41 at the 1% significance level.
Hence you can reject the null hypothesis at the 5% level, but not at the 1% level.
(e) In this situation, you cannot reject the null hypothesis that the money supply growth does not
Granger cause monetary base growth. This makes sense if the Federal Reserve uses monetary base
growth as an instrument and money supply growth is not a target.
(f) It is somewhat surprising to find money growth not to contain a unit root when the inflation rate does.
It is also possible that the relationship has changed over time, as money markets have been liberalized
during the sample period. Hence it would help to test for breaks using the QLR statistic and pseudo
out-of-sample forecasts.
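The unit root decisions in part (c) follow one rule: the ADF test is one-sided, so the null of a unit root is rejected only when the statistic lies below (is more negative than) the critical value. The sketch below encodes the large-sample critical values quoted in the answer; the dictionary layout and function name are assumptions made for this illustration.

```python
# Large-sample ADF critical values as quoted in the answer to part (c)
CRITICAL_VALUES = {
    "intercept_only": {0.10: -2.57, 0.05: -2.86, 0.01: -3.43},
    "intercept_and_trend": {0.10: -3.12, 0.05: -3.41, 0.01: -3.96},
}

def adf_rejections(statistic: float, specification: str) -> dict:
    """For each significance level, report whether the unit root null is rejected."""
    return {level: statistic < cv
            for level, cv in CRITICAL_VALUES[specification].items()}

lm2 = adf_rejections(-0.505, "intercept_and_trend")  # log level of M2
dlm2 = adf_rejections(-4.100, "intercept_only")      # growth rate of M2

print("LM2:", lm2)
print("DLM2:", dlm2)
```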
5) Having learned in macroeconomics that consumption depends on disposable income, you want to determine
whether or not disposable income helps predict future consumption. You collect data for the sample period
1962:I to 1995:IV and plot the two variables.
(a) To determine whether or not past values of personal disposable income growth rates help to predict
consumption growth rates, you estimate the following relationship.
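$\widehat{\Delta LnC}_t = \underset{(0.484)}{1.695} + \underset{(0.099)}{0.126}\,\Delta LnC_{t-1} + \underset{(0.103)}{0.153}\,\Delta LnC_{t-2} + \underset{(0.103)}{0.294}\,\Delta LnC_{t-3} - \underset{(0.102)}{0.008}\,\Delta LnC_{t-4}$
$\qquad + \underset{(0.076)}{0.088}\,\Delta LnY_{t-1} - \underset{(0.078)}{0.031}\,\Delta LnY_{t-2} - \underset{(0.078)}{0.050}\,\Delta LnY_{t-3} - \underset{(0.074)}{0.091}\,\Delta LnY_{t-4}$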
The F-statistic of the Granger causality test for the exclusion of all four lags of the disposable income growth rate is 0.98. Find the critical
value for the 1%, the 5%, and the 10% level from the relevant table and make a decision on whether or not these
additional variables Granger cause the change in the growth rate of consumption.
(b) You are somewhat surprised about the result in the previous question and wonder, how sensitive it is with
regard to the lag length in the ADL(p,q) model. As a result, you calculate BIC and AIC of p and q from 0 to 6.
The results are displayed in the accompanying table:
p,q   BIC     AIC
0     5.061   5.039
1     5.052   4.988
2     5.095   4.989
3     5.110   4.960
4     5.165   4.972
5     5.206   4.973
6     5.270   4.992
Which values for p and q should you choose?
(c) Estimating an ADL(1,1) model gives you a t-statistic of 1.28 on the coefficient of lagged disposable income
growth. What does the Granger causality test suggest about the inclusion of lagged income growth as a
predictor of consumption growth?
Answer: (a) The critical value for $F_{4,\infty}$ is 3.32, 2.37, and 1.94 respectively. Since 0.98 is below all three, the decision is not to reject the
null hypothesis at any of these significance levels, including the 10% level.
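(b) The BIC is minimized at p = q = 1 (5.052), while the AIC is minimized at p = q = 3 (4.960). Because the BIC is a consistent estimator of the lag length, whereas the AIC tends to overestimate it, p = q = 1 is the natural choice.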
(c) For a single restriction, $t^2 = F$, and the critical value is therefore 1.96 for the t-statistic at the 5% level. Hence you
cannot reject the null hypothesis that the coefficient on lagged disposable income growth is zero, or that
disposable income growth does not Granger cause consumption growth.
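Lag selection by information criterion amounts to picking the lag length with the smallest criterion value. The sketch below applies that rule to the BIC and AIC values from the table in part (b); the dictionary names are introduced here for illustration.

```python
# Information criteria for ADL(p,q) with p = q, taken from the table in part (b)
bic_values = {0: 5.061, 1: 5.052, 2: 5.095, 3: 5.110, 4: 5.165, 5: 5.206, 6: 5.270}
aic_values = {0: 5.039, 1: 4.988, 2: 4.989, 3: 4.960, 4: 4.972, 5: 4.973, 6: 4.992}

# Each criterion selects the lag length that minimizes its value
lag_by_bic = min(bic_values, key=bic_values.get)
lag_by_aic = min(aic_values, key=aic_values.get)

print("lag length minimizing BIC:", lag_by_bic)
print("lag length minimizing AIC:", lag_by_aic)
```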
6) (Requires Internet Access for the test question)
The following question requires you to download data from the internet and to load it into a statistical
package such as STATA or EViews.
a. Your textbook estimates an AR(1) model (equation 14.7) for the change in the inflation rate using a
sample period 1962:I – 2004:IV. Go to the Stock and Watson companion website for the textbook and
download the data “Macroeconomic Data Used in Chapters 14 and 16.” Enter the data for consumer
price index, calculate the inflation rate, the acceleration of the inflation rate, and replicate the result on
page 526 of your textbook. Make sure to use the heteroskedasticity-robust standard error option for the
estimation.
b. Next find a website with more recent data, such as the Federal Reserve Economic Data (FRED) site at
the Federal Reserve Bank of St. Louis. Locate the data for the CPI, which will be monthly, and convert
the data into quarterly averages. Then, using a sample from 1962:I – 2009:IV, re-estimate the above
specification and comment on the changes that have occurred.
c. Based on the BIC, how many lags should be included in the forecasting equation for the change in the
inflation rate? Use the new data set and sample period to answer the question.
Answer: a. The EViews output would look as follows:
Dependent Variable: D2LP
Method: Least Squares
Date: 12/30/10 Time: 20:29
Sample: 1962Q1 2004Q4
Included observations: 172
White Heteroskedasticity-Consistent Standard Errors & Covariance

Variable      Coefficient   Std. Error   t-Statistic   Prob.
C                 0.017        0.127        0.135      0.893
D2LP(-1)         -0.238        0.097       -2.467      0.015

R-squared              0.056     Mean dependent var       0.017
Adjusted R-squared     0.051     S.D. dependent var       1.708
S.E. of regression     1.664     Akaike info criterion    3.868
Sum squared resid    470.691     Schwarz criterion        3.904
Log likelihood      -330.634     Hannan-Quinn criter.     3.883
F-statistic           10.157     Durbin-Watson stat       2.166
Prob(F-statistic)      0.002
b. Not much has changed. The intercept became smaller, but was statistically insignificant anyway. The
slope coefficient increased somewhat in absolute value (as did the regression R2 with it) and its t-statistic
also became stronger. Some of this is the result of data revisions (even for the old sample period the slope
coefficient increased somewhat), while part of it is due to the longer sample period.
Dependent Variable: D2LP
Method: Least Squares
Date: 12/30/10 Time: 21:19
Sample: 1962Q1 2009Q4
Included observations: 192
White Heteroskedasticity-Consistent Standard Errors & Covariance

Variable      Coefficient   Std. Error   t-Statistic   Prob.
C                 0.014        0.153        0.089      0.929
D2LP(-1)         -0.290        0.094       -3.070      0.002

R-squared              0.084     Mean dependent var       0.010
Adjusted R-squared     0.079     S.D. dependent var       2.203
S.E. of regression     2.114     Akaike info criterion    4.345
Sum squared resid    849.127     Schwarz criterion        4.379
Log likelihood      -415.161     Hannan-Quinn criter.     4.359
F-statistic           17.428     Durbin-Watson stat       2.203
Prob(F-statistic)      0.000
c. Using the BIC for p = 0, 1, 2, …, 6, the minimum continues to be at p = 2. Hence the BIC still favors an
AR(2).
7) Statistical inference was a concept that was not too difficult to understand when using cross-sectional data. For
example, it is obvious that a population mean is not the same as a sample mean (take weight of students at
your college/university as an example). With a bit of thought, it also became clear that the sample mean had a
distribution. This meant that there was uncertainty regarding the population mean given the sample
information, and that you had to consider confidence intervals when making statements about the population
mean. The same concept carried over into the two-dimensional analysis of a simple regression: knowing the
height-weight relationship for a sample of students, for example, allowed you to make statements about the
population height-weight relationship. In other words, it was easy to understand the relationship between a
sample and a population in cross-sections. But what about time-series? Why should you be allowed to make
statistical inference about some population, given a sample at hand (using quarterly data from 1962-2010, for
example)? Write an essay explaining the relationship between a sample and a population when using time
series.
Answer: Essays will differ by students. What is crucial here is the emphasis on stationarity or the concept that the
distribution remains constant over time. If the dependent variable and regressors are non-stationary,
then conventional hypothesis tests, confidence intervals, and forecasts can be unreliable. However, if
they are stationary, then it is plausible to argue that a sample will repeat itself again and again and
again, when getting additional data. It is in that sense that inference to a larger population can be made.
There are two concepts crucial to stationarity which are discussed in the textbook: (i) trends, and (ii)
breaks. Students should bring up methods for testing for stationarity and breaks, such as the DF and
ADF statistics, and the QLR test.
8) (Requires Internet access for the test question)
The following question requires you to download data from the internet and to load it into a statistical
package such as STATA or EViews.
a. Your textbook suggests using two test statistics to test for stationarity: DF and ADF. Test the null
hypothesis that inflation has a stochastic trend against the alternative that it is stationary by
performing the DF and ADF tests for a unit autoregressive root. That is, use equation (14.34) in your
textbook both with four lags and without any lags of the change in the inflation rate as regressors, for
the sample period 1962:I to 2004:IV. Go to the Stock and Watson companion website for the textbook and
download the data “Macroeconomic Data Used in Chapters 14 and 16.” Enter the data for the consumer
price index, calculate the inflation rate and the acceleration of the inflation rate, and replicate the result
on page 526 of your textbook. Make sure not to use the heteroskedasticity-robust standard error
option for the estimation.
b. Next find a website with more recent data, such as the Federal Reserve Economic Data (FRED) site at
the Federal Reserve Bank of St. Louis. Locate the data for the CPI, which will be monthly, and convert
the data into quarterly averages. Then, using a sample from 1962:I to 2009:IV, re-estimate the above
specification and comment on the changes that have occurred.
c. For the new sample period, find the DF statistic.
d. Finally, calculate the ADF statistic, allowing for the lag length of the inflation acceleration term to be
determined by either the AIC or the BIC.
Answer: a. For the sample period 1962:I — 2004:IV, the result is as follows:
Dependent Variable: D2LP
Method: Least Squares
Date: 12/31/10 Time: 10:44
Sample: 1962Q1 2004Q4
Included observations: 172

Variable      Coefficient   Std. Error   t-Statistic   Prob.
C                 0.51         0.21         2.37       0.02
DLP(-1)          -0.11         0.04        -2.69       0.01
D2LP(-1)         -0.19         0.08        -2.32       0.02
D2LP(-2)         -0.26         0.08        -3.15       0.00
D2LP(-3)          0.20         0.08         2.51       0.01
D2LP(-4)          0.01         0.08         0.13       0.90

R-squared              0.24     Mean dependent var       0.02
Adjusted R-squared     0.21     S.D. dependent var       1.71
S.E. of regression     1.51     Akaike info criterion    3.70
Sum squared resid    380.61     Schwarz criterion        3.81
Log likelihood      -312.37     Hannan-Quinn criter.     3.75
F-statistic           10.31     Durbin-Watson stat       1.99
Prob(F-statistic)      0.00
Hence the ADF statistic is -2.69. You cannot reject the null hypothesis of non-stationarity at the
5% level (critical value -2.86), but you could at the 10% level (critical value -2.57).
b. Not much has changed. The intercept rose from 0.51 to 0.62 and remains statistically significant at the
5% level. The coefficient on the lagged inflation level moved from -0.11 to -0.15 and the ADF statistic from
-2.69 to -2.75, so the null hypothesis of non-stationarity still cannot be rejected at the 5% level (critical
value -2.86) but can be rejected at the 10% level (critical value -2.57). Some of the change is the result of
data revisions, while part of it is due to the longer sample period.
Dependent Variable: D2LP
Method: Least Squares
Date: 12/31/10 Time: 11:20
Sample: 1962Q1 2009Q4
Included observations: 192

Variable      Coefficient   Std. Error   t-Statistic   Prob.
C                 0.62         0.26         2.36       0.02
DLP(-1)          -0.15         0.05        -2.75       0.01
D2LP(-1)         -0.29         0.08        -3.54       0.00
D2LP(-2)         -0.30         0.09        -3.45       0.00
D2LP(-3)          0.03         0.08         0.31       0.76
D2LP(-4)         -0.05         0.08        -0.62       0.54

R-squared              0.24     Mean dependent var       0.01
Adjusted R-squared     0.22     S.D. dependent var       2.20
S.E. of regression     1.95     Akaike info criterion    4.20
Sum squared resid    707.46     Schwarz criterion        4.31
Log likelihood      -397.64     Hannan-Quinn criter.     4.25
F-statistic           11.54     Durbin-Watson stat       2.00
Prob(F-statistic)      0.00
c. The DF statistic is obtained by simply regressing the change in the inflation rate on the lagged level of
the inflation rate. The t-statistic on the lagged inflation level is the DF statistic, which is -5.28, so the
null hypothesis of non-stationarity is rejected.
d. Both the AIC and the BIC have a minimum for two lags. For that case, the ADF statistic is -2.94 and
the null hypothesis of non-stationarity can therefore be rejected at the 5% level, but not at the 1% level.
14.3 Mathematical and Graphical Problems
1) (Requires Appendix material) Define the difference operator $\Delta = (1 - L)$, where L is the lag operator, such
that $L^j Y_t = Y_{t-j}$. In general, $\Delta_j^i = (1 - L^j)^i$, where i and j are typically omitted when they take the
value of 1. Show the expressions in Y only when applying the difference operator to the following expressions,
and give the resulting expression an economic interpretation, assuming that you are working with quarterly data:
(a) $\Delta_4 Y_t$
(b) $\Delta^2 Y_t$
(c) $\Delta_1 \Delta_4 Y_t$
(d) $\Delta_4^2 Y_t$
Answer: (a) $\Delta_4 Y_t = (1 - L^4) Y_t = Y_t - Y_{t-4}$. With quarterly data, this is the annual change. If Y is in
logarithms, then this is the annual growth rate.
(b) $\Delta^2 Y_t = (1 - L)^2 Y_t = (1 - 2L + L^2) Y_t = Y_t - 2Y_{t-1} + Y_{t-2}$
$= (Y_t - Y_{t-1}) - (Y_{t-1} - Y_{t-2}) = \Delta Y_t - \Delta Y_{t-1}$
This represents the change of the change in a variable, or the “acceleration.” If Y is in logarithms, then
this is the quarterly change in the growth rate. A good example would be the acceleration in the
quarterly inflation rate.
(c) $\Delta_1 \Delta_4 Y_t = (1 - L)(1 - L^4) Y_t = (1 - L - L^4 + L^5) Y_t = Y_t - Y_{t-1} - Y_{t-4} + Y_{t-5}$
$= (Y_t - Y_{t-4}) - (Y_{t-1} - Y_{t-5})$
This is the quarterly change in the annual change. If Y is in logarithms, then this is the quarterly
acceleration or change in the annual growth rate.
(d) $\Delta_4^2 Y_t = (1 - L^4)^2 Y_t = (1 - 2L^4 + L^8) Y_t = Y_t - 2Y_{t-4} + Y_{t-8}$
$= (Y_t - Y_{t-4}) - (Y_{t-4} - Y_{t-8}) = \Delta_4 Y_t - \Delta_4 Y_{t-4}$
This represents the change in the annual change. If Y is in logarithms, then this is the change in the
annual growth rate.
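The expansions of the lag polynomials in this answer can be checked numerically, since multiplying polynomials in L is a convolution of their coefficient vectors. The following is a small sketch; the helper name `diff_poly` is hypothetical and coefficients are stored in ascending powers of L:

```python
import numpy as np


def diff_poly(j, i=1):
    """Coefficients of (1 - L^j)^i in ascending powers of L."""
    base = np.zeros(j + 1)
    base[0], base[j] = 1.0, -1.0
    out = np.array([1.0])
    for _ in range(i):
        out = np.convolve(out, base)  # polynomial multiplication
    return out


d4 = diff_poly(4)                                # (a) 1 - L^4
d_sq = diff_poly(1, 2)                           # (b) 1 - 2L + L^2
d1_d4 = np.convolve(diff_poly(1), diff_poly(4))  # (c) 1 - L - L^4 + L^5
d4_sq = diff_poly(4, 2)                          # (d) 1 - 2L^4 + L^8

for name, coefs in [("a", d4), ("b", d_sq), ("c", d1_d4), ("d", d4_sq)]:
    print(name, coefs)
```

Entry k of each vector is the coefficient on $Y_{t-k}$ once the operator is applied to $Y_t$.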
2) The textbook displayed the accompanying four economic time series with “markedly different patterns.” For
each indicate what you think the sample autocorrelations of the level (Y) and change ($\Delta Y$) will be and explain
your reasoning.
[Figure: four time series plots, panels (a) to (d), not reproduced]
Answer: (a) There is strong positive autocorrelation in the federal funds rate, with sample autocorrelations
declining for higher lags. There are obvious long-term trends in the series in that the federal funds rate
was high during the first quarter of 1982, and high again in the second quarter of 1982. Similarly, it was
low during the first quarter of 1962 and again low in the second quarter of that year. Since inflationary
expectations and therefore the inflation rate itself play a large role in federal funds rate movements, it
should not be surprising to find a similar pattern in the autocorrelations for the inflation rate and the
federal funds rate. (The autocorrelations are 0.90, 0.83, 0.80 and 0.72 for lags one to four.) For the change
in the federal funds rate you would also expect a similar pattern in the autocorrelations as for the
inflation rate, i.e., a negative first autocorrelation. On average, an increase in the federal funds rate in
one quarter is associated with a decrease in the following quarter. (The autocorrelations are –0.14, -0.19,
0.24, -0.12.)
(b) (Different from the textbook, the figure here only displays the exchange rate behavior after the
collapse of the Bretton Woods system of fixed exchange rates.) As in the previous graph, there should be
positive autocorrelations reflecting long-term trends in the exchange rates. Students might point out that
due to purchasing power parity you could expect long-term exchange rate behavior to be similar to the
behavior of inflation rates. However, the inflation rate of the U.K. would also have to be considered. (The
actual autocorrelations are 0.93, 0.85, 0.79, and 0.72 for lags one to four.) Students may have difficulty
detecting the positive nature of the sample autocorrelations in the change of the exchange rate: positive
(negative) changes in the exchange rate tend to be followed by positive (negative) changes in the
following period. Perhaps students are able to see that the behavior of the exchange rate is somewhat
smoother than that of the federal funds rate. (The actual autocorrelations are 0.22, 0.14, 0.12, and 0.07 for
lags one to four.)
(c) Students should be able to identify the high autocorrelations in the level: typically a high level of real
GDP will be followed by a high level in the next period. In addition, there is to a large extent, a trend
increase. (The actual autocorrelations are 0.98, 0.96, 0.94, and 0.92 for one to four lags.) Since positive
growth rates in real GDP are typically followed by positive growth rates during the next quarter,
students should be able to see that the autocorrelations for the change in the logarithm of real GDP will
also be positive. (The actual autocorrelations are 0.29, 0.39, 0.40, and 0.36 for lags one to four.)
(d) Students should be able to see that the returns are essentially unpredictable, and that the level
autocorrelations should be very low. There are no long-term trends visible and a high return on a given
day is as likely to be followed by a high return the next day as a low return. (The actual autocorrelations
are 0.07, -0.01, -0.02, and 0.00 for lags one to four.) At the same time students should be able to see a
relatively strong negative first autocorrelation in the change in returns, since there are no long-term
trends in the level of returns. A strong positive day-to-day change must therefore be followed, on
average, by a strong negative change. Due to the unpredictability, these autocorrelations should also fall
off quite quickly. (The actual autocorrelations are -0.46, -0.04, -0.02, and 0.03 for lags one to four.)
3) You have decided to use the Dickey Fuller (DF) test on the United States aggregate unemployment rate (sample
period 1962:I – 1995:IV). As a result, you estimate the following AR(1) model
$\Delta UrateUS_t = 0.114 - 0.024\,UrateUS_{t-1}$, $R^2 = 0.0118$, SER = 0.3417
(0.121) (0.019)
You recall that your textbook mentioned that this form of the AR(1) is convenient because it allows you to
test for the presence of a unit root by using the t-statistic of the slope. Being adventurous, you decide to
estimate the original form of the AR(1) instead, which results in the following output
$UrateUS_t = 0.114 + 0.976\,UrateUS_{t-1}$, $R^2 = 0.9510$, SER = 0.3417
(0.121) (0.019)
You are surprised to find the constant, the standard errors of the two coefficients, and the SER unchanged,
while the regression $R^2$ increased substantially. Explain this increase in the regression $R^2$. Why should you
have been able to predict the change in the slope coefficient and the constancy of the standard errors of the two
coefficients and the SER?
Answer: There is no additional information in the second regression, hence the SSR, and therefore the SER, will
not change. The only difference is that the lagged unemployment rate has been added to both sides of
the first equation. This linear transformation changes the coefficient on the lagged dependent variable
from -0.024 to -0.024 + 1 = 0.976. The regression $R^2$ is defined as ESS/TSS or 1 - (SSR/TSS). The only
change here has been in the TSS, which is now calculated from a level rather than a difference. Since TSS
increases and SSR remains unchanged, SSR/TSS must decrease, and the regression $R^2$ will increase.
Finally, the heteroskedasticity-robust standard errors contain the residuals and other terms involving
the regressor, both of which have not changed between the two specifications. Hence the standard
errors should also remain unchanged.
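The invariance argued in this answer can be illustrated with a short simulation. This is only a sketch under stated assumptions: the data are simulated (not the unemployment series), the helper `ols` is hypothetical, and it reports homoskedasticity-only standard errors for brevity; the same argument applies to the robust ones.

```python
import numpy as np


def ols(y, x):
    """OLS of y on a constant and x: coefficients, standard errors, SER, R^2."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    ssr = resid @ resid
    ser = np.sqrt(ssr / (n - k))
    se = ser * np.sqrt(np.diag(np.linalg.inv(X.T @ X)))
    r2 = 1.0 - ssr / np.sum((y - y.mean()) ** 2)
    return beta, se, ser, r2


# Simulated persistent series (hypothetical data).
rng = np.random.default_rng(0)
y = np.empty(200)
y[0] = 5.0
for t in range(1, 200):
    y[t] = 0.1 + 0.98 * y[t - 1] + rng.normal(scale=0.3)

lag, level, diff = y[:-1], y[1:], np.diff(y)
b_diff, se_diff, ser_diff, r2_diff = ols(diff, lag)  # Dickey-Fuller form
b_lvl, se_lvl, ser_lvl, r2_lvl = ols(level, lag)     # level form

print("slopes:", b_diff[1], b_lvl[1])
print("SER:", ser_diff, ser_lvl)
print("R^2:", r2_diff, r2_lvl)
```

The two slopes differ by exactly one, the standard errors and SER coincide, and only the $R^2$ changes because the dependent variable (and hence the TSS) differs.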
4) Consider the standard AR(1) $Y_t = \beta_0 + \beta_1 Y_{t-1} + u_t$, where the usual assumptions hold.
(a) Show that $y_t = \beta_1 y_{t-1} + u_t$, where $y_t$ is $Y_t$ with the mean removed, i.e., $y_t = Y_t - E(Y_t)$. Show
that $E(y_t) = 0$.
(b) Show that the r-period ahead forecast is $E(y_{T+r|T}) = \beta_1^r y_T$. If $0 < \beta_1 < 1$, how does the r-period
ahead forecast behave as r becomes large? What is the forecast of $Y_{T+r|T}$ for large r?
(c) The median lag is the number of periods it takes a time series with zero mean to halve its current value (in
expectation), i.e., the solution r to $E(y_{T+r|T}) = 0.5\,y_T$. Show that in the present case this is given by
$r = -\frac{\log(2)}{\log(\beta_1)}$.
Answer: (a) $E(Y_t) = \beta_0 + \beta_1 E(Y_{t-1})$, since $E(u_t) = 0$. Therefore
$Y_t - E(Y_t) = \beta_1 [Y_{t-1} - E(Y_{t-1})] + u_t$ or $y_t = \beta_1 y_{t-1} + u_t$. Now
$y_t = \beta_1 y_{t-1} + u_t = \beta_1 (\beta_1 y_{t-2} + u_{t-1}) + u_t = \beta_1^2 y_{t-2} + u_t + \beta_1 u_{t-1}$.
Repeated substitution then results in
$y_t = \beta_1^{n+1} y_{t-(n+1)} + \sum_{i=0}^{n} \beta_1^i u_{t-i}$, or as $n \to \infty$, $y_t = \sum_{i=0}^{\infty} \beta_1^i u_{t-i}$.
Taking expectations on both sides of the equation results in $E(y_t) = 0$, since
$E(u_t) = E(u_{t-1}) = ... = E(u_{t-n}) = ... = 0$.
(b) $E(y_{T+1|T}) = \beta_1 y_T$ since $E(u_{T+1}) = 0$. $E(y_{T+2|T}) = \beta_1 E(y_{T+1|T}) = \beta_1^2 y_T$ and so on until
$E(y_{T+r|T}) = \beta_1^r y_T$. For large r, $E(y_{T+r|T}) = 0$. Performing similar repeated substitutions for $Y_t$
instead of $y_t$ results in $E(Y_{T+r|T}) = \frac{\beta_0}{1 - \beta_1} + \beta_1^r y_T$ and hence $E(Y_{T+r|T}) = \frac{\beta_0}{1 - \beta_1}$ for large r.
(c) $E(y_{T+r|T}) = \beta_1^r y_T = \frac{1}{2} y_T$ or $\beta_1^r = \frac{1}{2}$. Taking logs and solving for r then results in
$r \log(\beta_1) = -\log(2)$ or $r = -\frac{\log(2)}{\log(\beta_1)}$.
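As a quick numerical illustration of the median-lag formula in (c), the sketch below evaluates it for an assumed AR(1) coefficient of 0.9 (an arbitrary value chosen for illustration) and confirms that the expected deviation from the mean is halved after that many periods. The function name `median_lag` is hypothetical.

```python
import math


def median_lag(beta1):
    """Periods until the expected deviation from the mean is halved (0 < beta1 < 1)."""
    if not 0.0 < beta1 < 1.0:
        raise ValueError("beta1 must lie strictly between 0 and 1")
    return -math.log(2.0) / math.log(beta1)


r = median_lag(0.9)
halved = 0.9 ** r  # fraction of the initial deviation remaining after r periods
print("median lag:", r)
print("beta1 ** r:", halved)
```

The closer the coefficient is to one, the longer the median lag, which is one way to summarize the persistence of the series.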
5) Consider the following model
$Y_t = \beta_0 + \beta_1 X_t^e + u_t$
where the superscript “e” indicates expected values. This may represent an example where consumption
depended on expected, or “permanent,” income. Furthermore, let expected income be formed as follows:
$X_t^e = X_{t-1}^e + \lambda (X_{t-1} - X_{t-1}^e)$; $0 < \lambda < 1$
This particular type of expectation formation is called the “adaptive expectations hypothesis.”
(a) In the above expectation formation hypothesis, expectations are formed at the beginning of the period, say
the 1st of January if you had annual data. Give an intuitive explanation for this process.
(b) Transform the adaptive expectation hypothesis in such a way that the right hand side of the equation only
contains observable variables, i.e., no expectations.
(c) Show that by substituting the resulting equation from the previous question into the original equation, you
get an ADL(0,$\infty$) type equation. How are the coefficients of the regressors related to each other?
(d) Can you think of a transformation of the ADL(0,$\infty$) equation into an ADL(1,1) type equation, if you allowed
the error term to be a linear combination of $u_t$ and $u_{t-1}$?
Answer: (a) The term $(X_{t-1} - X_{t-1}^e)$ is the forecast error for the previous period. If no forecast error was made,
then the forecast for the current period is the same as for the previous period. If there was a forecast
error, then the forecast for the current period is adjusted by a fraction of that forecast error. Note also
that the adaptive expectations hypothesis can be rewritten as $X_t^e = (1 - \lambda) X_{t-1}^e + \lambda X_{t-1}$; $0 < \lambda < 1$, in
which case the expected value can be seen as a linear combination of the previous period’s forecast and
the previous period’s actual value.
(b) $X_t^e = (1 - \lambda) X_{t-1}^e + \lambda X_{t-1} = (1 - \lambda)[(1 - \lambda) X_{t-2}^e + \lambda X_{t-2}] + \lambda X_{t-1}$
$= (1 - \lambda)^2 X_{t-2}^e + \lambda X_{t-1} + \lambda (1 - \lambda) X_{t-2}$.
Repeated substitution results in
$X_t^e = (1 - \lambda)^{n+1} X_{t-(n+1)}^e + \lambda \sum_{i=0}^{n} (1 - \lambda)^i X_{t-i-1}$ or, as $n \to \infty$, $X_t^e = \lambda \sum_{i=0}^{\infty} (1 - \lambda)^i X_{t-i-1}$.
(c) $Y_t = \beta_0 + \beta_1 X_t^e + u_t = \beta_0 + \beta_1 (\lambda \sum_{i=0}^{\infty} (1 - \lambda)^i X_{t-i-1}) + u_t$ or
$Y_t = \delta_0 + \delta_1 X_{t-1} + \delta_2 X_{t-2} + ... + \delta_r X_{t-r} + ... + u_t$. Here $\delta_0 = \beta_0$, and
$\delta_i = \beta_1 \lambda (1 - \lambda)^{i-1}$; $i \geq 1$. The coefficients decline geometrically at rate $(1 - \lambda)$.
(d) Lagging both sides of $Y_t = \beta_0 + \beta_1 (\lambda \sum_{i=0}^{\infty} (1 - \lambda)^i X_{t-i-1}) + u_t$ and multiplying both sides by $(1 - \lambda)$
results in
$(1 - \lambda) Y_{t-1} = \beta_0 (1 - \lambda) + \beta_1 (\lambda \sum_{i=0}^{\infty} (1 - \lambda)^{i+1} X_{t-i-2}) + (1 - \lambda) u_{t-1}$. Finally, subtraction of this equation
from $Y_t = \beta_0 + \beta_1 (\lambda \sum_{i=0}^{\infty} (1 - \lambda)^i X_{t-i-1}) + u_t$ gives you
$Y_t = \beta_0 \lambda + \beta_1 \lambda X_{t-1} + (1 - \lambda) Y_{t-1} + (u_t - (1 - \lambda) u_{t-1})$.
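The transformation in (d) can be verified numerically. The sketch below writes `lam` for the adjustment coefficient in the adaptive expectations rule (a name chosen here for illustration), generates expectations recursively from simulated data with arbitrary parameter values, and checks that the resulting series satisfies the ADL(1,1) form with constant `beta0 * lam` exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
T, beta0, beta1, lam = 300, 2.0, 0.8, 0.3  # illustrative values only

x = rng.normal(size=T)
u = rng.normal(scale=0.1, size=T)

# Adaptive expectations: X^e_t = (1 - lam) * X^e_{t-1} + lam * X_{t-1}
xe = np.zeros(T)
for t in range(1, T):
    xe[t] = (1 - lam) * xe[t - 1] + lam * x[t - 1]

y = beta0 + beta1 * xe + u

# Transformed equation:
# Y_t = beta0*lam + beta1*lam*X_{t-1} + (1-lam)*Y_{t-1} + (u_t - (1-lam)*u_{t-1})
rhs = (beta0 * lam + beta1 * lam * x[:-1] + (1 - lam) * y[:-1]
       + (u[1:] - (1 - lam) * u[:-1]))
max_gap = np.max(np.abs(y[1:] - rhs))
print("largest discrepancy:", max_gap)
```

The discrepancy is at the level of floating-point rounding, which confirms that the infinite distributed lag collapses to one lag of Y and one lag of X at the cost of a moving-average error.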
6) The following two graphs give you a plot of the United States aggregate unemployment rate for the sample
period 1962:I to 1999:IV, and the (log) level of real United States GDP for the sample period 1962:I to 1995:IV.
You want to test for stationarity in both cases. Indicate whether or not you should include a time trend in your
Augmented Dickey-Fuller test and why.
Answer: Looking over the entire sample period, there does not appear to be a deterministic trend for the
unemployment rate. There is no need to include a time trend for the ADF test in this case. The log level
of real GDP, on the other hand, is clearly upward trended and a time trend should therefore be included.
7) (Requires Appendix material): Show that the AR(1) process $Y_t = a_1 Y_{t-1} + e_t$; $|a_1| < 1$, can be converted to
a MA($\infty$) process.
Answer: $Y_t = a_1 Y_{t-1} + e_t = a_1 (a_1 Y_{t-2} + e_{t-1}) + e_t = a_1^2 Y_{t-2} + e_t + a_1 e_{t-1}$. Repeated substitution then
results in $Y_t = a_1^{n+1} Y_{t-(n+1)} + e_t + a_1 e_{t-1} + ... + a_1^n e_{t-n}$, and for $n \to \infty$,
$Y_t = e_t + a_1 e_{t-1} + ... + a_1^q e_{t-q} + ...$ .
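A short simulation makes the substitution argument concrete. The sketch below (with an arbitrary coefficient of 0.7 and simulated shocks, both assumptions for illustration) builds an AR(1) series recursively and compares its last value with truncated moving-average sums of past shocks.

```python
import numpy as np

rng = np.random.default_rng(2)
a1, T = 0.7, 400  # illustrative values; |a1| < 1 is required
e = rng.normal(size=T)

# AR(1) recursion, started at Y_0 = e_0
y = np.empty(T)
y[0] = e[0]
for t in range(1, T):
    y[t] = a1 * y[t - 1] + e[t]


def ma_approx(q):
    """Truncated MA form of the last observation: sum_{i=0}^{q} a1^i * e_{T-1-i}."""
    i = np.arange(q + 1)
    return np.sum(a1 ** i * e[T - 1 - i])


err_long = abs(y[-1] - ma_approx(60))
err_full = abs(y[-1] - ma_approx(T - 1))
print("error with 60 lags:", err_long)
print("error with all lags:", err_full)
```

Because the weights $a_1^i$ shrink geometrically, a moderate number of lagged shocks already reproduces the AR(1) value to high accuracy.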
8) (Requires Appendix material) The long-run, stationary state solution of an ADL(p,q) model, which can be
written as $A(L)Y_t = \beta_0 + c(L)X_{t-1} + u_t$, where $a_0 = 1$, $a_j = -\beta_j$, and $c_j = \delta_j$, can be found by setting L = 1
in the two lag polynomials. Explain. Derive the long-run solution for the estimated ADL(4,4) of the change in
the inflation rate on unemployment:
$\Delta Inf_t = 1.32 - .36 \Delta Inf_{t-1} - 0.34 \Delta Inf_{t-2} + 0.7 \Delta Inf_{t-3} - 0.3 \Delta Inf_{t-4}$
$- 2.68 Unemp_{t-1} + 3.43 Unemp_{t-2} - 1.04 Unemp_{t-3} + .07 Unemp_{t-4}$
Assume that the inflation rate is constant in the long-run and calculate the resulting unemployment rate. What
does the solution represent? Is it reasonable to assume that this long-run solution is constant over the
estimation period 1962-1999? If not, how could you detect the instability?
Answer: In a stationary state equilibrium, variables do not change from one period to the next. Hence $X_{t-1} = X_{t-2}
= ... = X_{t-q}$. This is achieved in the above formulation by setting L = 1. With the inflation rate constant,
all changes in inflation are zero, so 0 = 1.32 + (-2.68 + 3.43 - 1.04 + .07) Unemp = 1.32 - 0.22 Unemp.
This solution represents the equilibrium rate of unemployment or NAIRU. In the above example it is 6%.
The NAIRU does not remain constant but instead is a function of various determining variables such as
the demographic composition of the labor force, the competitiveness of labor and product markets, the
generosity of the unemployment benefits system, etc. One way to detect instability is to test for breaks,
using a Chow test if the break date is known, or using the QLR statistic if the break date is unknown.
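The long-run calculation can be reproduced in a few lines. This sketch uses the coefficients from the estimated equation in the question; the variable names are illustrative.

```python
# Coefficients from the estimated ADL(4,4) in the question.
intercept = 1.32
unemp_coefs = [-2.68, 3.43, -1.04, 0.07]  # lags 1 to 4 of Unemp

# With inflation constant, every change in inflation is zero, so
# 0 = intercept + c(1) * Unemp, where c(1) is the sum of the lag coefficients.
c_at_one = sum(unemp_coefs)
nairu = -intercept / c_at_one
print("c(1) =", round(c_at_one, 2))
print("long-run unemployment rate =", round(nairu, 2))
```

Setting L = 1 is exactly what `sum(unemp_coefs)` does: it evaluates the lag polynomial on unemployment at one.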
9) You want to determine whether or not the unemployment rate for the United States has a stochastic trend
using the Augmented Dickey Fuller Test (ADF). The BIC suggests using 3 lags, while the AIC suggests 4 lags.
(a) Which of the two will you use for your choice of the optimal lag length?
(b) After estimating the appropriate equation, the t-statistic on the lag level unemployment rate is (–2.186)
(using a constant, but not a trend). What is your decision regarding the stochastic trend of the unemployment
rate series in the United States?
(c) Having worked in the previous exercise with the unemployment rate level, you repeat the exercise using the
difference in United States unemployment rates. Write down the appropriate equation to conduct the
Augmented Dickey-Fuller test here. The t-statistic on relevant coefficient turns out to be (-4.791). What is your
conclusion now?
Answer: (a) The BIC is a consistent estimator of the true lag length, while the AIC will overestimate the lag
length. The textbook suggests that if the researcher is concerned about too few lags, then the AIC can be
used as a reasonable alternative.
(b) The large-sample critical value of the ADF statistic is –2.57 at the 10% level. Hence you cannot reject
the null hypothesis of a unit root.
(c) $\Delta^2 UrateUS_t = \beta_0 + \delta \Delta UrateUS_{t-1} + \gamma_1 \Delta^2 UrateUS_{t-1}$
$+ \gamma_2 \Delta^2 UrateUS_{t-2} + \gamma_3 \Delta^2 UrateUS_{t-3} + u_t$
The critical value at the 1% level is –3.43, so that you can reject the null hypothesis of a unit root in the
change of the U.S. unemployment rate.
10) Consider the AR(1) model $Y_t = \beta_0 + \beta_1 Y_{t-1} + u_t$, $|\beta_1| < 1$.
(a) Find the mean and variance of $Y_t$.
(b) Find the first two autocovariances of $Y_t$.
(c) Find the first two autocorrelations of $Y_t$.
Answer: (a) Rewrite the AR(1) model as follows
$Y_t = \beta_0 + \beta_1 Y_{t-1} + u_t = \beta_0 + \beta_1 (\beta_0 + \beta_1 Y_{t-2} + u_{t-1}) + u_t$
$= \beta_0 (1 + \beta_1) + \beta_1^2 Y_{t-2} + u_t + \beta_1 u_{t-1}$.
Continuing the substitution indefinitely then results in
$Y_t = \beta_0 (1 + \beta_1 + \beta_1^2 + \beta_1^3 + ...) + \sum_{i=0}^{\infty} \beta_1^i u_{t-i}$.
Given the result for the sum of a geometric series, the final expression is
$Y_t = \frac{\beta_0}{1 - \beta_1} + \sum_{i=0}^{\infty} \beta_1^i u_{t-i}$.
To find the mean and the variance, first take expectations on both sides:
$E(Y_t) = \frac{\beta_0}{1 - \beta_1} + \sum_{i=0}^{\infty} \beta_1^i E(u_{t-i}) = \frac{\beta_0}{1 - \beta_1}$, since $E(u_t) = 0$ for all t.
To derive the variance, note that $Y_t - E(Y_t) = \sum_{i=0}^{\infty} \beta_1^i u_{t-i}$. Hence the variance is
$E(Y_t - E(Y_t))^2 = \sum_{i=0}^{\infty} (\beta_1^i)^2 E(u_{t-i}^2) = \sigma_u^2 \sum_{i=0}^{\infty} (\beta_1^2)^i = \frac{\sigma_u^2}{1 - \beta_1^2}$.
(b) The first two autocovariances are defined as $cov(Y_t, Y_{t-1})$ and $cov(Y_t, Y_{t-2})$. Using the fact that
$Y_t = \frac{\beta_0}{1 - \beta_1} + \sum_{i=0}^{\infty} \beta_1^i u_{t-i}$ and that the expected values for both $Y_t$ and $Y_{t-j}$ are the same, you get
$E[(Y_t - E(Y_t))(Y_{t-1} - E(Y_{t-1}))] = E[(\sum_{i=0}^{\infty} \beta_1^i u_{t-i})(\sum_{i=1}^{\infty} \beta_1^{i-1} u_{t-i})]$
$= var(u_t)(\beta_1 + \beta_1^3 + \beta_1^5 + ...) = var(u_t) \beta_1 (1 + \beta_1^2 + \beta_1^4 + ...) = \beta_1 \frac{\sigma_u^2}{1 - \beta_1^2}$.
Similarly $cov(Y_t, Y_{t-2}) = \beta_1^2 \frac{\sigma_u^2}{1 - \beta_1^2}$ (and, more generally, $cov(Y_t, Y_{t-j}) = \beta_1^j \frac{\sigma_u^2}{1 - \beta_1^2}$).
(c) Since $corr(Y_t, Y_{t-j}) = \frac{cov(Y_t, Y_{t-j})}{var(Y_t)}$, $corr(Y_t, Y_{t-1}) = \beta_1$ and $corr(Y_t, Y_{t-2}) = \beta_1^2$ (and, in general,
$corr(Y_t, Y_{t-j}) = \beta_1^j$).
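The closed-form autocovariances can be cross-checked against the infinite moving-average sums from which they are derived. The sketch below truncates those sums at a large number of terms; the function names and the parameter values (0.6 for the AR coefficient, 2.0 for the error variance) are assumptions made for illustration.

```python
def ar1_autocov_truncated(beta1, sigma2, j, terms=2000):
    """cov(Y_t, Y_{t-j}) from the MA form: sigma2 * sum_i beta1^(i+j) * beta1^i."""
    return sigma2 * sum(beta1 ** (i + j) * beta1 ** i for i in range(terms))


def ar1_autocov_closed(beta1, sigma2, j):
    """Closed form: beta1^j * sigma2 / (1 - beta1^2)."""
    return beta1 ** j * sigma2 / (1.0 - beta1 ** 2)


beta1, sigma2 = 0.6, 2.0  # illustrative values with |beta1| < 1
variance = ar1_autocov_closed(beta1, sigma2, 0)
autocorrs = [ar1_autocov_closed(beta1, sigma2, j) / variance for j in (1, 2)]

for j in range(3):
    print(j, ar1_autocov_truncated(beta1, sigma2, j), ar1_autocov_closed(beta1, sigma2, j))
print("first two autocorrelations:", autocorrs)
```

The error variance cancels in the ratio, so the autocorrelations depend only on the AR coefficient.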
11) Find data for real GDP ($Y_t$) for the United States for the time period 1959:I (first quarter) to 1995:IV. Next
generate two growth rates: the (annualized) quarterly growth rate of real GDP, $(\ln Y_t - \ln Y_{t-1}) \times 400$, and the
annual growth rate of real GDP, $(\ln Y_t - \ln Y_{t-4}) \times 100$. Which is more volatile? What is the reason for this?
Explain.
Answer:
The quarterly growth rate is more volatile because the annual growth rate is a moving average of the
annualized quarterly growth rates, and hence “wild swings” are smoothed out:
$(\ln Y_t - \ln Y_{t-4}) = (\ln Y_t - \ln Y_{t-1}) + (\ln Y_{t-1} - \ln Y_{t-2}) + (\ln Y_{t-2} - \ln Y_{t-3}) + (\ln Y_{t-3} - \ln Y_{t-4})$
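The moving-average relationship can be confirmed on any quarterly series. The sketch below uses a simulated log-GDP path (a random walk with drift, which is purely hypothetical data) rather than actual U.S. figures.

```python
import numpy as np

# Hypothetical quarterly log-GDP path (random walk with drift), for illustration only.
rng = np.random.default_rng(3)
log_y = np.cumsum(0.008 + 0.01 * rng.normal(size=148))

quarterly = 400 * (log_y[1:] - log_y[:-1])  # annualized quarterly growth, percent
annual = 100 * (log_y[4:] - log_y[:-4])     # annual growth, percent

# Four-quarter moving average of the annualized quarterly rate
moving_avg = np.convolve(quarterly, np.ones(4) / 4, mode="valid")

same = np.allclose(annual, moving_avg)
more_volatile = quarterly.std() > annual.std()
print("annual growth equals 4-quarter moving average:", same)
print("quarterly rate is more volatile:", more_volatile)
```

Averaging four quarterly rates cancels much of the quarter-to-quarter noise, which is why the annual series has the smaller standard deviation.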
12) You have collected data for real GDP (Y) and have estimated the following function:
$\widehat{\ln Y_t} = 7.866 + 0.00679 \times Zeit$
(0.007) (0.00008)
t = 1961:I to 2007:IV, $R^2$ = 0.98, SER = 0.036
where Zeit is a deterministic time trend, which takes on the value of 1 during the first quarter of 1961, and is
increased by one for each following quarter.
a. Interpret the slope coefficient. Does it make sense?
b. Interpret the regression $R^2$. Are you impressed by its value?
c. Do you think that given the regression $R^2$, you should use the equation to forecast real GDP beyond the sample
period?
Answer: a. The slope coefficient indicates the average growth rate per quarter, here about 0.68%, or roughly 2.7% at an
annual rate. Since 1896, the U.S. economy has grown at an annual rate of approximately 3%. As a result, observing a
quarterly growth rate of about 0.7% makes sense.
b. The regression R2 tells you that 98 percent of the variation in the log of real GDP is explained by the model.
Since the model only contains a deterministic time trend, this seems high on face value.
c. The logarithm of real GDP is bound to be non-stationary (using the ADF statistic, you would not be able to reject
the null hypothesis that the log of real GDP has a unit root). Hence this equation should not be used for forecasting
despite the very high regression R2 .
Chapter 15 Estimation of Dynamic Causal Effects
15.1 Multiple Choice
1) A distributed lag regression
A) is also called AR(p).
B) can also be used with cross-sectional data.
C) gives estimates of dynamic causal effects.
D) is sometimes referred to as ADL.
Answer: C
2) Heteroskedasticity- and autocorrelation-consistent standard errors
A) result in the OLS estimator being BLUE.
B) should be used when errors are autocorrelated.
C) are calculated when using the Cochrane-Orcutt iterative procedure.
D) have the same formula as the heteroskedasticity robust standard errors in cross-sections.
Answer: B
3) Sensitivity analysis of the results may include the following with the exception of
A) stability over time analysis of the estimated multipliers.
B) using homoskedasticity only rather than HAC standard errors.
C) investigation of omitted variable bias.
D) looking at different computations of the HAC standard errors.
Answer: B
4) A seasonal binary (or indicator or dummy) variable, in the case of monthly data,
A) is a binary variable that takes on the value of 1 for a given month and is 0 otherwise.
B) is a variable that has values of 1 to 12 in a given year.
C) is a variable that contains 1s during a given year and is 0 otherwise.
D) does not exist, since a month is not a season.
Answer: A
5) Ascertaining whether or not a regressor is strictly exogenous or exogenous ultimately requires all of the
following with the exception of
A) economic theory.
B) institutional knowledge.
C) expert judgment.
D) use of HAC standard errors.
Answer: D
6) In time series, the definition of causal effects
A) says that one variable helps predict another variable.
B) does not make much sense since there are not multiple subjects.
C) assumes that the same subject is being given different treatments at different points in time.
D) requires panel data.
Answer: C
7) The distributed lag model is given by
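A) $Y_t = \beta_0 + \beta_1 X_t + \beta_2 Y_{t-1} + u_t$.
B) $Y_t = \beta_0 + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + ... + \beta_r Y_{t-r} + u_t$.
C) $Y_t = \beta_0 + \beta_1 u_t + \beta_2 u_{t+1} + \beta_3 u_{t+2} + ... + \beta_{r+1} u_{t+r} + e_t$.
D) $Y_t = \beta_0 + \beta_1 X_t + \beta_2 X_{t-1} + \beta_3 X_{t-2} + ... + \beta_{r+1} X_{t-r} + u_t$.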
Answer: D
8) The concept of exogeneity is important because
A) it clarifies whether or not the variable is determined inside or outside your model.
B) maximum likelihood estimation is no longer valid.
C) under strict exogeneity, OLS may not be efficient as an estimator of dynamic causal effects.
D) endogenous variables are not stationary, but exogenous variables are.
Answer: C
9) The impact effect is the
A) zero period dynamic multiplier.
B) h period dynamic multiplier, h>0.
C) cumulative dynamic multiplier.
D) long-run cumulative dynamic multiplier.
Answer: A
10) Estimation of dynamic multipliers under strict exogeneity should be done by
A) instrumental variable methods.
B) OLS.
C) feasible GLS.
D) analyzing the stationarity of the multipliers.
Answer: C
11) Autocorrelation of the error terms
A) makes it impossible to calculate homoskedasticity only standard errors.
B) causes OLS to be no longer consistent.
C) causes the usual OLS standard errors to be inconsistent.
D) results in OLS being biased.
Answer: C
12) The long-run cumulative dynamic multiplier
A) cannot be calculated since in the long-run, we are all dead.
B) is the sum of all individual dynamic multipliers.
C) is the coefficient on Xt-r in the standard formulation of the distributed lag model.
D) is the difference between the coefficient on Xt-1 and Xt-r.
Answer: B
13) The concepts of exogeneity, strict exogeneity, and predeterminedness
A) are defined in such a way that strict exogeneity implies exogeneity.
B) can be used interchangeably.
C) are defined in such a way that exogeneity implies strict exogeneity.
D) correspond to endogeneity, strict endogeneity, and lagged endogenous variables.
Answer: A
14) GLS
A) results in smaller variances of the estimator than OLS if the regressors are strictly exogenous.
B) is the same as OLS using HAC standard errors.
C) can be used even if the regressors are not strictly exogenous.
D) can be used for time-series estimation, but not in cross-sectional data.
Answer: A
15) Quasi differences in Yt are defined as
A) $Y_t - Y_{t-1}$.
B) $Y_t - \phi_1 Y_{t-1}$.
C) $\Delta Y_t - \phi_1 \Delta Y_{t-1}$.
D) $\phi_1 (Y_t - Y_{t-1})$.
Answer: B
16) Infeasible GLS
A) requires too much memory even for today’s PCs.
B) uses complicated iterative techniques.
C) cannot be calculated since it also uses quasi differences for Xt.
D) assumes the parameters of the error autocorrelation process to be known.
Answer: D
17) The 95% confidence interval for the dynamic multipliers should be computed by using the estimated coefficient ±
A) 1.96 times the RMSFE.
B) 1.96 times the HAC standard errors.
C) 1.96, since the HAC errors are standardized.
D) 1.64 times the HAC standard errors since the alternative hypothesis is one-sided.
Answer: B
18) The Cochrane-Orcutt iterative method is
A) a special case of GLS estimation.
B) a method to compute HAC standard errors.
C) a special case of maximum likelihood estimation.
D) a grid search for the autoregressive parameters on the error process.
Answer: A
19) To convey information about the dynamic multipliers more effectively, you should
A) plot them.
B) discuss these carefully one at a time.
C) estimate them by maximum likelihood methods.
D) first make sure that they are stationary.
Answer: A
20) GLS involves
A) writing the model in differences and estimating it by OLS, using HAC standard errors.
B) truncating the sample at both ends of the period, then estimating by OLS using HAC standard errors.
C) checking the AIC rather than the BIC in choosing the maximum lag length of the regressors.
D) transforming the regression model so that the errors are homoskedastic and serially uncorrelated, and
then estimating the transformed regression model by OLS.
Answer: D
21) GLS is consistent and BLUE if
A) X is predetermined.
B) the error process is AR(1).
C) X is strictly exogenous.
D) all the roots are inside the unit circle.
Answer: C
22) The distributed lag model assumptions include all of the following with the exception of:
A) There is no perfect multicollinearity.
B) Xt is strictly exogenous.
C) $E(u_t \mid X_t, X_{t-1}, X_{t-2}) = 0$
D) The random variables Xt and Yt have a stationary distribution.
Answer: B
23) In the distributed lag model, the coefficient on the contemporaneous value of the regressor is called the
A) dynamic effect.
B) cumulative multiplier.
C) autoregressive error.
D) impact effect.
Answer: D
24) In the distributed lag model, the dynamic causal effect
A) is the sequence of coefficients on the current and lagged values of X.
B) is not the same as the dynamic multiplier.
C) is generated by choosing different truncation points for the HAC standard errors.
D) requires estimation of the model by Cochrane-Orcutt method.
Answer: A
25) HAC standard errors should be used because
A) they are convenient simplifications of the heteroskedasticity-robust standard errors.
B) conventional standard errors may result in misleading inference.
C) they are easier to calculate than the heteroskedasticity-robust standard errors and yet still allow you to
perform inference correctly.
D) when there is a structural break, then conventional standard errors result in misleading inference.
Answer: B
26) The interpretation of the coefficients in a distributed lag regression as causal dynamic effects hinges on
A) the assumption that X is exogenous
B) not having more than four lags when using quarterly data
C) using GLS rather than OLS
D) the use of monthly rather than annual data
Answer: A
27) Given the relationship between the two variables, the following is most likely to be exogenous:
A) the inflation rate and the short term interest rate: short-term interest rate is exogenous
B) U.S. rate of inflation and increases in oil prices: oil prices are exogenous
C) Australian exports and U.S. aggregate income: U.S. aggregate income is exogenous
D) change in inflation, lagged changes of inflation, and lags of unemployment: lags of unemployment are
exogenous
Answer: C
28) When Xt is strictly exogenous, the following estimator(s) of dynamic causal effects are available:
A) estimating an ADL model and calculating the dynamic multipliers from the estimated ADL coefficients
B) using GLS to estimate the coefficients of the distributed lag model
C) neither (a) nor (b)
D) (a) and (b)
Answer: D
29) In time series data, it is useful to think of a randomized controlled experiment
A) consisting of the same subject being given different treatments at different points in time
B) consisting of different subjects being given the same treatment at the same point in time
C) as being non-existent (this is a time series after all, and there are no real “parallel universes”)
D) consisting of at least two subjects being given different treatments at the same point in time
Answer: A
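30) Consider the distributed lag model $Y_t = \beta_0 + \beta_1 X_t + \beta_2 X_{t-1} + \beta_3 X_{t-2} + \dots + \beta_{r+1} X_{t-r} + u_t$. The dynamic causal effect
is
A) $\beta_0 + \beta_1$
B) $\beta_1 + \beta_2 + \dots + \beta_{r+1}$
C) $\beta_0 + \beta_1 + \dots + \beta_{r+1}$
D) $\beta_1$
Answer: B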
15.2 Essays and Longer Questions
1) To estimate dynamic causal effects, your textbook presents the distributed lag regression model, the
autoregressive distributed lag model, and a quasi-difference representation of the distributed lag model with
autoregressive errors. Using a simple example, such as a distributed lag model with only the current and past
value of X and an AR(1) model for the error term, discuss how these models are related. In each case suggest
estimation methods and evaluate the relative merit in using one rather than the other.
Answer: The student’s answer should follow the discussion in section 13.2-13.3 (distributed lag model) and 13.5
(autoregressive distributed lag model and quasi-difference representation of the distributed lag model
with autoregressive errors). Major points should include the assumption of exogeneity in the case
of the distributed lag model, which, together with the other distributed lag model assumptions, allows
the dynamic multipliers and cumulative dynamic multipliers to be estimated by OLS. Given the AR(1)
nature of the error term, the importance of using HAC standard errors should be stressed.
For the ADL and quasi-difference representation, the importance of the strictly exogenous regressor
assumption must be emphasized. The answer should include the derivation of the dynamic multipliers
from the OLS estimated ADL coefficients and the difference between the infeasible and feasible GLS
estimator. For the latter, the Cochrane-Orcutt procedure should be mentioned.
If the regressors are strictly exogenous, then GLS is asymptotically BLUE. However, since the ADL
specification requires estimation of fewer parameters, it may be preferred in practice. If there is no
convincing argument for the regressor being strictly exogenous, but an argument for exogeneity can be
made, then OLS estimation using HAC standard errors is the preferred method.
2) Your textbook presents as an example of a distributed lag regression the effect of the weather on the price of
orange juice. The authors mention U.S. income and Australian exports, oil prices and inflation, monetary policy
and inflation, and the Phillips curve as other candidates for distributed lag regression. Briefly discuss whether
or not the exogeneity assumption is likely to hold in each of these cases. Explain why it is so hard to come up
with good examples of distributed lag regressions in economics.
Answer: Students’ answers should follow the discussion of section 13.7 in the textbook. Although there is some
degree of simultaneity between Australian exports and U.S. income, the Australian economy is too small
relative to the American economy to present much of a feedback from a fall in exports. It is therefore
reasonable to assume that U.S. income is exogenous in a regression of Australian exports on U.S. income.
The situation is different for oil prices and inflation since it is reasonable to assume that OPEC member
countries analyze worldwide economic conditions, including inflation rates, when setting oil
prices. If this is the case, then oil prices are not exogenous. Monetary policy and inflation are other
examples where it cannot be assumed reasonably that the monetary base or the federal funds rate is
exogenous. The Federal Reserve takes into account current and future inflation rates when setting their
instrument, which is thereby endogenous. Finally, the Phillips curve is another example where it cannot
be assumed that the (lagged) unemployment rate is exogenous, since past values of the unemployment
rate were simultaneously determined with past inflation rates.
3) Money supply is linked to the monetary base by the money multiplier. Macroeconomic textbooks tell you that
the central bank cannot control the money supply, but it can control the monetary base. As a result, you decide
to specify a distributed lag equation of the growth in the money supply on the growth in the monetary base.
One of your peers tells you that this is not a good idea for modeling the relationship between the two variables.
What does she mean?
Answer: Although the monetary base is one of the determinants of the money supply, there are other factors,
such as interest rates, that have an effect on the money multiplier. Hence there is the problem of omitted
variables. If interest rates are correlated with the monetary base, then the OLS estimator will be
inconsistent. Furthermore, it is likely that due to financial innovations, dynamic causal effects have
changed over time. Finally there is the concern of simultaneous causality bias. If the Federal Reserve
changes the monetary base as a result of changes in the money supply, perhaps as a result of targeting,
then the monetary base becomes endogenous.
4) In your intermediate macroeconomics course, government expenditures and the money supply were treated as
exogenous, in the sense that the variables could be changed to conduct economic policy to influence target
variables, but that these variables would not react to changes in the economy as a result of some fixed rule. The
St. Louis Model, proposed by two researchers at the Federal Reserve in St. Louis, used this idea to test whether
monetary policy or fiscal policy was more effective in influencing output behavior. Although there were
various versions of this model, the basic specification was of the following type:
$$\ln(Y_t) = \beta_0 + \beta_1 \ln m_t + \dots + \beta_p \ln m_{t-p-1} + \beta_{p+1} \ln G_t + \dots + \beta_{p+q} \ln G_{t-q-1} + u_t$$
Assuming that money supply and government expenditures are exogenous, how would you estimate dynamic
causal effects? Why do you think this type of model is no longer used by most to calculate fiscal and monetary
multipliers?
Answer: If the money supply and government expenditures were exogenous, then a distributed lag model could
be used to estimate the dynamic multipliers and cumulative dynamic multipliers using OLS. The
coefficients in the above equation are then the dynamic multipliers. To obtain the h-period cumulative
dynamic multipliers, all coefficients over the h-periods have to be added up. There is an alternative form
for the above equation which allows for statistical testing of the cumulative dynamic multipliers. This
involves differencing the regressors with the exception of the last lag, p and q, in the above equation. The
coefficient on the p and q lagged regressor then represents the long-run cumulative multiplier. The OLS
estimator of the coefficients in the above equation is consistent. However, the errors are likely to be
autocorrelated since omitted variables from the above equation are probably serially correlated
themselves. In that case the OLS standard errors are inconsistent and statistical inference based on these
standard errors will be misleading. To avoid this problem, heteroskedasticity- and
autocorrelation-consistent standard errors can be calculated. The reason why this type of model is no
longer used by most to calculate fiscal and monetary multipliers is that researchers are not willing to
assume that the money supply and government expenditures are exogenous. Both monetary and fiscal
policymakers take into account current and expected future output growth when setting their policy
instruments, which are therefore endogenous.
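The "alternative form" mentioned in this answer can be illustrated with a worked identity. The sketch below assumes the simplest case of a single regressor with one lag; the labels $\delta_0$, $\delta_1$, $\delta_2$ are notation chosen here for illustration and do not appear in the question.

```latex
\begin{aligned}
Y_t &= \beta_0 + \beta_1 X_t + \beta_2 X_{t-1} + u_t \\
    &= \beta_0 + \beta_1 (X_t - X_{t-1}) + (\beta_1 + \beta_2) X_{t-1} + u_t \\
    &= \delta_0 + \delta_1 \Delta X_t + \delta_2 X_{t-1} + u_t,
\qquad \delta_0 = \beta_0,\ \delta_1 = \beta_1,\ \delta_2 = \beta_1 + \beta_2 .
\end{aligned}
```

In this rewritten regression the coefficient on the undifferenced last lag, $\delta_2$, is the long-run cumulative multiplier, so its HAC standard error and t-statistic can be read directly from the regression output.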
5) Your textbook mentions heteroskedasticity- and autocorrelation-consistent standard errors. Explain why you
should use this option in your regression package when estimating the distributed lag regression model. What
are the properties of the OLS estimator in the presence of heteroskedasticity and autocorrelation in the error
terms? Explain why it is likely to find autocorrelation in time series data. If the errors are autocorrelated, then
why not simply adjust for autocorrelation by using some non-linear estimation method such as
Cochrane-Orcutt?
Answer: In the presence of heteroskedasticity and/or autocorrelation in the errors, OLS estimation of the
regression coefficients is still consistent. However, the homoskedasticity-only and
heteroskedasticity-robust standard errors are inconsistent, and using them in the presence of serial
correlation results in misleading statistical inference. For example, confidence intervals do not contain
the true value with the stated probability in repeated samples. The solution is to adjust the
estimator of the standard errors by incorporating sample autocorrelation estimates. This results in the
heteroskedasticity- and autocorrelation-consistent (HAC) estimator of the variance of the estimator. For
this estimator to be consistent, a truncation parameter is introduced, so that not all T-1 sample
autocorrelations are used. Incorporating this idea into the HAC formula results in the Newey-West
variance estimator.
Autocorrelation in the errors is likely if there are omitted variables which are slowly changing over time.
Since the omitted variables are implicitly contained in the error term, this would result in autocorrelation
of the error term. For generalized least squares to have desirable properties, the regressors have to be
strictly (past, present, and future) exogenous, rather than just (past and present) exogenous. There are
very few truly exogenous variables in economics. Furthermore, most of the relationships between
economic time series contain simultaneous causality. As the example in the textbook on orange juice
prices and cold weather showed, it is even more difficult to find strictly exogenous variables.
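The HAC adjustment described in this answer can be sketched for the single-regressor case. The function below is an illustration, not the textbook's code: the name `hac_variance_slope` is hypothetical, no degrees-of-freedom correction is applied, and the weighting scheme (weights (m - j)/m on the first m - 1 sample autocorrelations) is the Newey-West form as assumed here.

```python
def hac_variance_slope(x, resid, m):
    """HAC variance of the OLS slope in a single-regressor model (illustrative sketch).

    x     : regressor values X_1..X_T
    resid : OLS residuals u_1..u_T
    m     : truncation parameter (m = 1 gives the heteroskedasticity-robust variance)
    """
    T = len(x)
    x_bar = sum(x) / T
    dev = [xi - x_bar for xi in x]
    v = [d * u for d, u in zip(dev, resid)]   # v_t = (X_t - X_bar) * u_t
    sum_v2 = sum(vi * vi for vi in v)
    sum_dev2 = sum(d * d for d in dev)

    # Heteroskedasticity-robust variance, without a degrees-of-freedom correction.
    var_robust = sum_v2 / sum_dev2 ** 2

    # Correction factor built from the first m-1 sample autocorrelations of v_t.
    f_hat = 1.0
    for j in range(1, m):
        rho_j = sum(v[t] * v[t - j] for t in range(j, T)) / sum_v2
        f_hat += 2.0 * (m - j) / m * rho_j
    return var_robust * f_hat
```

Setting `m = 1` uses no autocorrelations and reproduces the heteroskedasticity-robust variance; larger `m` brings in more sample autocorrelations with linearly declining weights.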
6) Your textbook presents as an example of a distributed lag regression the effect of the weather on the price of
orange juice. The authors mention U.S. income and Australian exports, oil prices and inflation, monetary policy
and inflation, and the Phillips curve as other potential candidates for distributed lag regression. You are
considering estimating the effect of minimum wages on teenage employment (employment population ratio)
using a time series of U.S. data. Write a short essay on whether a distributed lag model would be a suitable
tool to figure out dynamic causal effects in this case.
Answer: One of the first questions students must address is whether or not the X variable here is exogenous. In
studies of the labor market, e.g., in microeconomics, students learned that it is real wages that determine
employment, not nominal wages. Some authors have used relative wages as an explanatory variable,
where the denominator is average hourly earnings. Setting aside whether or not minimum wages are
exogenous, the students should then focus on whether the price index used to adjust nominal minimum
wages or average hourly earnings are exogenous. However, most students will focus only on the
numerator (nominal minimum wages) and will argue that minimum wages are typically set by the
legislature following some political process and may therefore be considered exogenous. Some will go
further and argue that the process of setting minimum wages will depend on the state of the business
cycle. For example, recent increases in minimum wages (2007, 2008, 2009) would most likely not have
occurred if legislators had anticipated teenage unemployment rates of over 25%. If
that is the case, then minimum wage legislation depends on the state of the business cycle and hence
teenage employment. As a result, minimum wages should not be considered exogenous.
15.3 Mathematical and Graphical Problems
1) One of the central predictions of neo-classical macroeconomic growth theory is that an increase in the growth
rate of the population causes at first a decline in the growth rate of real output per capita, but that subsequently
the growth rate returns to its natural level, itself determined by the rate of technological innovation. The
intuition is that, if the growth rate of the workforce increases, then more has to be saved to provide the new
workers with physical capital. However, accumulating capital takes time, so that output per capita falls in the
short run.
Under the assumption that population growth is exogenous, a number of regressions of the growth rate of
output per capita on current and lagged population growth were performed, as reported below. (A constant
was included in the regressions but is not reported. HAC standard errors are in brackets. BIC is listed at the
bottom of the table).
Regression of Growth Rate of Real Per-Capita GDP on Lags of Population Growth,
United States, 1825-2000
Entries are dynamic multipliers, with HAC standard errors in parentheses.

Lag number   (1)          (2)          (3)          (4)          (5)
0            -0.9 (1.3)   -1.1 (1.3)   -1.3 (1.7)   -0.2 (1.7)   -2.0 (1.5)
1             3.5 (1.6)    3.2 (1.6)    1.8 (1.6)    0.8 (1.5)   -
2            -1.3 (1.7)   -3.0 (1.6)   -2.2 (1.4)   -            -
3             0.2 (1.7)    1.5 (1.2)   -            -            -
4            -2.0 (1.5)   -            -            -            -
BIC          -234.4       -236.1       -238.5       -240.0       -241.8
(a) Which of these models is favored by the information criterion?
(b) How consistent are these estimates with the theory? Is this a fair test of the theory? Why or why not?
(c) Can you think of any improved data to test the theory?
Answer: (a) BIC has a minimum for no lag and this criterion therefore favors a static specification.
(b) The estimates tell us that there are no dynamic multipliers other than the contemporaneous or impact
effect. Even the impact effect is not statistically significant. It is unlikely that population growth is
exogenous and therefore this does not represent a fair test of the theory. In addition, there is omitted
variable bias with other relevant variables, such as the savings rate, education, etc. missing as regressors.
(c) Per capita output or income is likely to be a determinant of fertility. As a result, population growth is
not likely to be exogenous. Perhaps the working age population would be a better choice here, but data
for early periods are almost impossible to obtain.
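The reasoning in parts (a) and (b) can be checked with a few lines of arithmetic. The sketch below assumes the BIC values belong to regressions (1) through (5) in that order, and that -2.0 (1.5) is the impact effect and its HAC standard error in the static regression (5); the variable names are chosen here for illustration.

```python
# BIC values reported for regressions (1)-(5), read from the table above.
bic = {"(1)": -234.4, "(2)": -236.1, "(3)": -238.5, "(4)": -240.0, "(5)": -241.8}

# The information criterion favors the specification with the smallest BIC.
best = min(bic, key=bic.get)

# Impact effect and HAC standard error in the static regression (5).
coef, se = -2.0, 1.5
t_stat = coef / se
significant = abs(t_stat) > 1.96   # two-sided test at the 5% level

print(best, round(t_stat, 2), significant)
```

The static regression has the lowest BIC, and its t-statistic is smaller than 1.96 in absolute value, which matches the statement that even the impact effect is not statistically significant.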
2) The Gallup Poll frequently surveys the electorate to quantify the public’s opinion of the president. Since 1945,
Gallup settled on the following wording of its presidential poll: “Do you approve or disapprove of the way
(name) is handling his job as president?” Gallup has not changed its presidential question since then, and
respondents can answer “approve,” “disapprove,” or “no opinion.”
You want to see how this approval rating is related to the Michigan index of consumer sentiment (ICS). The
monthly survey, conducted with a minimum sample of 500, asks people if they feel “better/worse off” with
regard to current and future conditions.
(a) To estimate dynamic causal effects, you collect quarterly data from 1962:I – 1998:II for the United States.
You allow a binary variable for each presidency to capture the intrinsic popularity of the President.
Furthermore, you eliminate observations that include a change in party for the presidency by using a binary
variable, which takes on the value of one during the first quarter of the year after the election. Finally, a
friendly political scientist provides you with (i) an “events” variable, (ii) a “Vietnam” binary variable, and (iii) a
“honeymoon” variable, which measures the effect of a higher popularity of a president immediately following
the election. (The coefficients of these variables will not be reported here.)
Assuming that consumer sentiment is exogenous, you estimate the following two specifications (numbers in
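parentheses are heteroskedasticity- and autocorrelation-consistent standard errors):
$$\text{Approval}_t = \underset{(8.83)}{26.08} + \underset{(0.120)}{0.178} \times \text{ICS}_t + \underset{(0.135)}{0.232} \times \text{ICS}_{t-1}; \quad R^2 = 0.667,\ SER = 7.00$$
$$\text{Approval}_t = \underset{(8.17)}{26.08} + \underset{(0.120)}{0.178} \times \Delta\text{ICS}_t + \underset{(0.089)}{0.411} \times \text{ICS}_{t-1}; \quad R^2 = 0.667,\ SER = 7.00$$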
What is the difference between the two specifications? What is the advantage of estimating the second
equation, if any?
(b) Assuming that the errors follow an AR(1) process, you also estimate the following alternative:
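$$\text{Approval}_t = \underset{(5.84)}{-4.61} + \underset{(0.083)}{0.300} \times \text{ICS}_t - \underset{(0.099)}{0.070} \times \text{ICS}_{t-1} - \underset{(0.083)}{0.054} \times \text{ICS}_{t-2} + \underset{(0.057)}{0.776} \times \text{Approval}_{t-1};$$
$R^2 = 0.868$, $SER = 4.45$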
How is this specification related to the previous ones? What implicit assumptions did you have to make to
allow for desirable properties of the OLS estimator?
(c) You finally estimate the approval equation using the quasi-difference specification and the GLS estimator.
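$$\widetilde{\text{Approval}}_t = \underset{(5.84)}{-4.61} + \underset{(0.083)}{0.300} \times \widetilde{\text{ICS}}_t - \underset{(0.099)}{0.070} \times \widetilde{\text{ICS}}_{t-1}; \quad R^2 = 0.868,\ SER = 4.45$$
where $\tilde{Z}_t = Z_t - \hat{\phi}_1 Z_{t-1}$ and $\hat{\phi}_1 = 0.896$ (0.040).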
How is this equation related to the ones in (a) and (b)? What are the properties of the GLS estimator here,
under the assumption that ICS is strictly exogenous?
(d) Is it likely that the ICS is exogenous here? Strictly exogenous?
Answer: (a) If the regressor is exogenous, then the estimates in the first regression measure the impact effect and
the one-period dynamic multiplier of a change in consumer sentiment on approval ratings. The
coefficients in the second equation are cumulative dynamic multipliers, where the coefficient on ICSt-1
represents the long-run cumulative multiplier. The advantage of the second equation is that it allows for
testing cumulative dynamic multipliers.
(b) This is the ADL representation of the distributed lag model with first order autocorrelation. The
assumption is that ICS is a strictly exogenous regressor. If this is the case, then the dynamic multipliers
can be calculated from these estimates.
(c) This is the quasi-difference representation of the distributed lag model with autoregressive errors.
Given the restrictions on the parameters of the ADL model, it simply reorganizes the regressors. If ICS
were strictly exogenous, then GLS produces asymptotically efficient (BLUE) estimators.
(d) If approval ratings depend on economic variables, such as the inflation rate, the unemployment rate,
and income growth, then there is omitted variable bias, since these variables will be correlated with
consumer sentiment. Furthermore, if lower approval ratings (“popularity deficit”) result in stimulating
the economy, which in return will have an effect on consumer sentiment, then there is simultaneous
causality in addition. If a variable is not exogenous, then it is also not strictly exogenous.
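Part (b) of this answer states that dynamic multipliers can be calculated from the ADL estimates. A minimal sketch of that calculation follows, assuming the ADL coefficients reported in question (b) (0.300, -0.070, -0.054 on current and lagged ICS, and 0.776 on lagged approval). The function name `adl_dynamic_multipliers` is hypothetical, and the recursion is the standard one for a model with one lag of the dependent variable.

```python
def adl_dynamic_multipliers(delta, phi, horizon):
    """Multipliers implied by Y_t = a + phi*Y_{t-1} + sum_j delta[j]*X_{t-j} + e_t."""
    multipliers = []
    for h in range(horizon + 1):
        direct = delta[h] if h < len(delta) else 0.0          # direct effect of X_{t-h}
        carried = phi * multipliers[h - 1] if h > 0 else 0.0  # effect passed on via Y_{t-1}
        multipliers.append(direct + carried)
    return multipliers

delta = [0.300, -0.070, -0.054]   # coefficients on ICS_t, ICS_{t-1}, ICS_{t-2}
phi = 0.776                       # coefficient on Approval_{t-1}
mult = adl_dynamic_multipliers(delta, phi, 4)
print([round(m, 4) for m in mult])
```

If the restrictions implied by a distributed lag model with one lag and AR(1) errors held exactly, the multipliers beyond lag 1 would be zero; the unrestricted estimates need not satisfy this, so the computed sequence decays gradually instead.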
3) Consider the following distributed lag model $Y_t = \beta_0 + \beta_1 X_t + \beta_2 X_{t-1} + u_t$, where $u_t = \phi_1 u_{t-1} + \tilde{u}_t$, $\tilde{u}_t$ is serially
uncorrelated, and X is strictly exogenous.
(a) How many parameters are there to be estimated between the two equations?
(b) Using the two equations of the model above, derive the ADL form of the model.
(c) There are five regressors in the ADL model, namely Yt-1 , Xt, Xt-1 , Xt-2 and the constant. Estimating the
ADL model linearly will give you five coefficients. Can you derive the parameters of the original two equation
model from these five estimates? Why or why not?
(d) What alternative method do you have to retrieve the parameters of the two equation model?
Answer: (a) There are four parameters to be estimated: $\beta_0$, $\beta_1$, $\beta_2$, and $\phi_1$.
(b) The ADL form of the model is derived by multiplying the first equation by $\phi_1$ and lagging it, then
subtracting the resulting equation from the first equation, and using the AR(1) equation of the error term
to simplify the resulting specification.
$$Y_t = \beta_0 + \beta_1 X_t + \beta_2 X_{t-1} + u_t$$
$$-[\phi_1 Y_{t-1} = \phi_1 \beta_0 + \phi_1 \beta_1 X_{t-1} + \phi_1 \beta_2 X_{t-2} + \phi_1 u_{t-1}]$$
which, after collecting terms, results in
$$Y_t = \beta_0 (1 - \phi_1) + \phi_1 Y_{t-1} + \beta_1 X_t + (\beta_2 - \phi_1 \beta_1) X_{t-1} - \phi_1 \beta_2 X_{t-2} + (u_t - \phi_1 u_{t-1})$$
or
$$Y_t = \alpha_0 + \phi_1 Y_{t-1} + \delta_0 X_t + \delta_1 X_{t-1} + \delta_2 X_{t-2} + \tilde{u}_t.$$
(c) The original four parameters cannot be recovered uniquely without imposing restrictions, since in essence
you have five equations in four unknowns.
(d) The above model can be specified in quasi-differences, i.e.,
$$(Y_t - \phi_1 Y_{t-1}) = \beta_0 (1 - \phi_1) + \beta_1 (X_t - \phi_1 X_{t-1}) + \beta_2 (X_{t-1} - \phi_1 X_{t-2}) + \tilde{u}_t$$
or
$$\tilde{Y}_t = \alpha_0 + \beta_1 \tilde{X}_t + \beta_2 \tilde{X}_{t-1} + \tilde{u}_t.$$
The parameters now can be estimated using nonlinear least squares, or specifically, the
Cochrane-Orcutt, or the iterated Cochrane-Orcutt estimator.
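The counting argument in part (c) can be made explicit. Write the five ADL coefficients as $\alpha_0$ (constant), $\phi_1$ (on $Y_{t-1}$), and $\delta_0$, $\delta_1$, $\delta_2$ (on $X_t$, $X_{t-1}$, $X_{t-2}$), and the four original parameters as $\beta_0$, $\beta_1$, $\beta_2$ and the AR(1) coefficient $\phi_1$; these labels are assumed here for the illustration. Matching terms in the derivation of part (b) gives:

```latex
\begin{aligned}
\alpha_0 &= \beta_0 (1 - \phi_1), &
\delta_0 &= \beta_1, \\
\delta_1 &= \beta_2 - \phi_1 \beta_1, &
\delta_2 &= -\phi_1 \beta_2, \\
&\Rightarrow\quad \delta_2 = -\phi_1 (\delta_1 + \phi_1 \delta_0).
\end{aligned}
```

The last line is the single restriction that the five coefficients must satisfy. Unrestricted OLS estimates of the ADL model will generally not satisfy it exactly, which is why the four original parameters cannot be read off uniquely.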
4) A model that attracted quite a bit of interest in macroeconomics in the 1970s was the St. Louis model. The
underlying idea was to calculate fiscal and monetary impact and long run cumulative dynamic multipliers, by
relating output (growth) to government expenditure (growth) and money supply (growth). The assumption
was that both government expenditures and the money supply were exogenous. Estimation of a St. Louis type
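model using quarterly data from 1960:I-1995:IV results in the following output (HAC standard errors in
parentheses):
$$\begin{aligned}
\text{ygrowth}_t = {} & \underset{(0.004)}{0.018} + \underset{(0.079)}{0.006} \times \text{dmgrowth}_t + \underset{(0.091)}{0.235} \times \text{dmgrowth}_{t-1} + \underset{(0.087)}{0.344} \times \text{dmgrowth}_{t-2} \\
& + \underset{(0.097)}{0.385} \times \text{dmgrowth}_{t-3} + \underset{(0.069)}{0.425} \times \text{mgrowth}_{t-4} + \underset{(0.049)}{0.170} \times \text{dggrowth}_t - \underset{(0.068)}{0.044} \times \text{dggrowth}_{t-1} \\
& - \underset{(0.040)}{0.003} \times \text{dggrowth}_{t-2} - \underset{(0.051)}{0.079} \times \text{dggrowth}_{t-3} + \underset{(0.027)}{0.018} \times \text{ggrowth}_{t-4};
\end{aligned}$$
$R^2 = 0.346$, $SER = 0.03$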
where ygrowth is quarterly growth of real GDP, mgrowth is quarterly growth of real money supply (M2), and
ggrowth is quarterly growth of real government expenditures. “d” in front of ggrowth and mgrowth indicates a
change in the variable.
(a) Assuming that money and government expenditures are exogenous, what do the coefficients represent?
Calculate the h-period cumulative dynamic multipliers from these. How can you test for the statistical
significance of the cumulative dynamic multipliers and the long-run cumulative dynamic multiplier?
(b) Sketch the estimated dynamic and cumulative dynamic fiscal and monetary multipliers.
(c) For these coefficients to represent dynamic multipliers, the money supply and government expenditures
must be exogenous variables. Explain why this is unlikely to be the case. As a result, what importance should
you attach to the above results?
Answer: (a) In that case the coefficients represent dynamic multipliers.
Lag number   Monetary     Monetary     Fiscal       Fiscal
             Dynamic      Cumulative   Dynamic      Cumulative
             Multiplier   Multiplier   Multiplier   Multiplier
0             0.006        0.006        0.170        0.170
1             0.235        0.241       -0.044        0.126
2             0.344        0.585       -0.003        0.123
3             0.385        0.970       -0.079        0.044
4             0.425        1.395        0.018        0.062
To test for the significance of the cumulative dynamic multipliers and the long-run cumulative dynamic
multiplier, the equation must be reestimated with all regressors appearing in differences with the
exception of the longest lag. The coefficients of these regressors then represent cumulative dynamic
multipliers and t-statistics can be used to test for their statistical significance.
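The cumulative columns in the table of part (a) are running sums of the dynamic multiplier columns. A short sketch that reproduces them from the reported coefficients follows; the list names and the helper `cumulative` are chosen here for illustration.

```python
from itertools import accumulate

# Dynamic multipliers as listed in the answer table (lags 0 through 4).
monetary = [0.006, 0.235, 0.344, 0.385, 0.425]
fiscal = [0.170, -0.044, -0.003, -0.079, 0.018]

def cumulative(multipliers):
    """h-period cumulative multipliers: running sums, rounded to three decimals."""
    return [round(c, 3) for c in accumulate(multipliers)]

monetary_cum = cumulative(monetary)
fiscal_cum = cumulative(fiscal)
print(monetary_cum)
print(fiscal_cum)
```

The last element of each running sum is the long-run cumulative multiplier over the five lags included in the regression.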
(b) See the accompanying figures.
(c) There is little reason to believe that these government instruments are exogenous. Even if the
monetary base and those components of government expenditures which do not respond to business
cycle fluctuations had been chosen rather than the above regressors, these instruments would still respond to
changes in the growth rate of GDP. As a matter of fact, government reaction functions were also
estimated at the time to capture how government instruments respond to changes in target variables. As
a result, the regressors will be correlated with the error term, OLS estimation is inconsistent, and
inference not dependable. It is hard to imagine how useable information can be retrieved from these
numbers.
5) Your textbook used a distributed lag model with only the current value and one lag of X ($X_t$ and $X_{t-1}$), coupled with an AR(1)
error model, to derive a quasi-difference model in which the error term was serially uncorrelated.
(a) Instead use a static model $Y_t = \beta_0 + \beta_1 X_t + u_t$ here, where the error term follows an
AR(1). Derive the quasi-difference form. Explain why in the case of the infeasible GLS estimator you could
easily estimate the $\beta$s by OLS.
(b) Since $\phi_1$ (the autocorrelation parameter for $u_t$) is unknown, describe the Cochrane-Orcutt estimation
procedure.
(c) Explain how the iterated Cochrane-Orcutt estimator works in this situation. Iterations stop when there is
“convergence” in the estimates. What do you think is meant by that?
(d) Your textbook has pointed out that the iterated Cochrane-Orcutt GLS estimator is in fact the nonlinear least
squares estimator of the model. Given that $-1 < \phi_1 < 1$, suggest a “grid search” or some strategy to “nail down”
the value of $\hat{\phi}_1$ which minimizes the sum of squared residuals. This is the so-called Hildreth-Lu method.
Answer: (a) The quasi-difference model is derived by multiplying the equation by $\phi_1$ and lagging it, then
subtracting the resulting equation from the first equation, and using the AR(1) equation of the error term
to simplify the resulting specification.
$$Y_t = \beta_0 + \beta_1 X_t + u_t$$
$$-[\phi_1 Y_{t-1} = \phi_1 \beta_0 + \phi_1 \beta_1 X_{t-1} + \phi_1 u_{t-1}]$$
which results in
$$Y_t - \phi_1 Y_{t-1} = \beta_0 (1 - \phi_1) + \beta_1 X_t - \phi_1 \beta_1 X_{t-1} + (u_t - \phi_1 u_{t-1}).$$
Using the quasi-difference notation then yields
$$\tilde{Y}_t = \alpha_0 + \beta_1 \tilde{X}_t + \tilde{u}_t,$$
where $\alpha_0 = \beta_0 (1 - \phi_1)$.
If $\phi_1$ were known, then it would be possible to generate the quasi-differenced variables in a statistical
package and then estimate the coefficients by OLS using the transformed variables.
(b) In this case, nonlinear least squares has to be used to estimate the three parameters. One possible
feasible GLS estimator in this case is the Cochrane-Orcutt estimator. In the first step, $\phi_1$ is set to zero, in
which case $\beta_0$ and $\beta_1$ can be estimated by OLS. The resulting residuals are then used to calculate the
OLS estimator for $\phi_1$. This estimate, in turn, is used to generate the quasi-differenced variables, and OLS is then
employed to get the estimates of $\beta_0$ and $\beta_1$.
(c) The iterated Cochrane-Orcutt estimator continues the process described in (b). For example, in the
next step, a new set of residuals is used to update the previous estimate of $\phi_1$, which will generate a new
set of quasi-differenced variables and new estimates of $\beta_0$ and $\beta_1$. The iterations stop when the
estimates change from one round to the next by less than a very small number, which
can be chosen by the econometrician. This is then called convergence.
(d) Under the Hildreth-Lu method, the sum of squared residuals is computed for various values of $\phi_1$,
using quasi-differenced variables. For example, initially a coarse grid is chosen of -0.9, -0.8, -0.7, …, 0.7,
0.8, 0.9. For the value of $\phi_1$ which yields the smallest SSR, say 0.7, a new finer grid is chosen, such as
0.65, 0.66, 0.67, …, 0.73, 0.74, 0.75, and again the SSR is calculated for each of these values. The value of
$\phi_1$ which has the smallest SSR is retained and yet a finer grid around it is chosen, etc.
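The procedures in parts (b) through (d) can be sketched in code for the static model with AR(1) errors. Everything below is an illustration under stated assumptions: the function names are hypothetical, the data are simulated with made-up parameter values, the first observation is dropped when quasi-differencing (including in the initial OLS step), and the refined grid spans nine steps on either side of the current best value, which is slightly wider than the grid in the answer.

```python
import random

def ols(x, y):
    """Intercept and slope from a simple OLS regression of y on x."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    sxx = sum((a - xb) ** 2 for a in x)
    slope = sum((a - xb) * (b - yb) for a, b in zip(x, y)) / sxx
    return yb - slope * xb, slope

def quasi_diff(z, phi):
    """Quasi-differences z_t - phi * z_{t-1}; the first observation is dropped."""
    return [z[t] - phi * z[t - 1] for t in range(1, len(z))]

def fit_given_phi(x, y, phi):
    """OLS on quasi-differenced data; returns (beta0, beta1, SSR)."""
    xq, yq = quasi_diff(x, phi), quasi_diff(y, phi)
    a0, b1 = ols(xq, yq)                      # a0 estimates beta0 * (1 - phi)
    ssr = sum((yy - a0 - b1 * xx) ** 2 for xx, yy in zip(xq, yq))
    return a0 / (1.0 - phi), b1, ssr

def cochrane_orcutt(x, y, tol=1e-8, max_iter=200):
    """Iterated Cochrane-Orcutt for Y_t = b0 + b1*X_t + u_t with AR(1) errors."""
    phi = 0.0
    b0, b1, _ = fit_given_phi(x, y, phi)      # step 1: OLS with phi set to zero
    for _ in range(max_iter):
        u = [yy - b0 - b1 * xx for xx, yy in zip(x, y)]
        # Regress u_t on u_{t-1} (no intercept) to update phi.
        new_phi = sum(u[t] * u[t - 1] for t in range(1, len(u))) / sum(v * v for v in u[:-1])
        b0, b1, _ = fit_given_phi(x, y, new_phi)
        if abs(new_phi - phi) < tol:          # convergence: phi barely changes
            return b0, b1, new_phi
        phi = new_phi
    return b0, b1, phi

def hildreth_lu(x, y, rounds=3):
    """Grid search over phi in (-1, 1), refining around the SSR-minimizing value."""
    center, step = 0.0, 0.1
    for _ in range(rounds):
        grid = [center + k * step for k in range(-9, 10)]
        grid = [p for p in grid if -1.0 < p < 1.0]
        center = min(grid, key=lambda p: fit_given_phi(x, y, p)[2])
        step /= 10.0
    return center

# Simulated data with assumed parameter values, for illustration only.
random.seed(0)
T, beta0, beta1, phi_true = 2000, 1.0, 2.0, 0.7
x = [random.gauss(0, 1) for _ in range(T)]
u, y = 0.0, []
for t in range(T):
    u = phi_true * u + random.gauss(0, 1)
    y.append(beta0 + beta1 * x[t] + u)

b0_hat, b1_hat, phi_hat = cochrane_orcutt(x, y)
phi_grid = hildreth_lu(x, y)
print(b1_hat, phi_hat, phi_grid)
```

Both routines minimize the same sum of squared residuals of the quasi-differenced regression, so on well-behaved data the grid-search value and the iterated estimate of the autocorrelation parameter should be close to each other.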
6) (Requires Appendix material) Your textbook states that in “the distributed lag regression model, the error term
ut can be correlated with its lagged values. This autocorrelation arises, because, in time series data, the omitted
factors that comprise ut can themselves be serially correlated.”
(a) Give an example of what the authors have in mind.
(b) Consider the ADL model, where the X’s are strictly exogenous, and there is no autocorrelation (and/or
heteroskedasticity) in the error term.
$$Y_t = \beta_0^* + \beta_1 X_t + \beta_2 X_{t-1} + \beta_3 Y_{t-1} + \tilde{u}_t$$
How many coefficients are there to be estimated? Show that this model can be respecified using the lag
operator notation:
$$\phi(L) Y_t = \beta_0^* + \beta_1 \delta(L) X_t + \tilde{u}_t$$
where $\phi(L) = 1 - \beta_3 L$. What is $\delta(L)$ here?
(c) Assume heroically that $\beta_3 = -\beta_2 / \beta_1$, i.e., that there is a “common factor” in the lag polynomials
$\phi(L)$ and $\delta(L)$. Show that in this case the model becomes
$$Y_t = \beta_0 + \beta_1 X_t + u_t$$
where $\beta_0 = \dfrac{\beta_0^*}{1 - \beta_3}$ and $u_t = \dfrac{1}{1 - \beta_3 L} \tilde{u}_t$.
(d) Explain why autocorrelation in this model can be seen as a “simplification,” not a “nuisance.” Can you use
the F-test to test the above hypothesis? Why or why not?
Answer: (a) Taking the textbook example of the percentage change in the real price of orange juice and the
number of freezing degree days, the error term potentially contains other variables such as change in
tastes of the population, the price of substitutes, income, etc. Some of these variables may be hard to
measure, but all of these are bound to change slowly over time and are not likely to be correlated with
the weather variable.
(b) $(1 - \beta_3 L) Y_t = \beta_0^* + \beta_1 \left(1 + \frac{\beta_2}{\beta_1} L\right) X_t + \tilde{u}_t$, so $\theta(L) = 1 + \frac{\beta_2}{\beta_1} L$.
(c) Dividing both sides by $1 - \beta_3 L$ results in the above equation after cancellation.
(d) There is one fewer parameter to estimate. The restriction is nonlinear, so the F-test does not apply
here.
7) It has been argued that Canada’s aggregate output growth and unemployment rates are very sensitive to
United States economic fluctuations, while the opposite is not true.
(a) A researcher uses a distributed lag model to estimate dynamic causal effects of U.S. economic activity on
Canada. The results (HAC standard errors in parentheses) for the sample period 1961:I-1995:IV are:
$urcan_t = \underset{(0.83)}{-1.42} + \underset{(0.457)}{0.717}\, urus_t + \underset{(0.557)}{0.262}\, urus_{t-1} + \underset{(0.398)}{0.023}\, urus_{t-2} - \underset{(0.405)}{0.083}\, urus_{t-3}$
$\qquad - \underset{(0.504)}{0.726}\, urus_{t-4} + \underset{(0.385)}{1.267}\, urus_{t-5}$; $R^2 = 0.672$, SER = 1.444
where urcan is the Canadian unemployment rate, and urus is the United States unemployment rate.
Calculate the long-run cumulative dynamic multiplier.
(b) What are some of the omitted variables that could cause autocorrelation in the error terms? Are these
omitted variables likely to be uncorrelated with current and lagged values of the U.S. unemployment rate? Do
you think that the U.S. unemployment rate is exogenous in this distributed lag regression?
Answer: (a) The long-run cumulative dynamic multiplier is 1.460.
(b) Autocorrelation in the error term is the result of omitted variables which are serially correlated.
Canadian unemployment rates depend on Canadian labor market conditions and most likely on
Canadian aggregate demand variables in the short run. Prime candidates for slowly changing omitted
variables would be demographics, indicators of unemployment insurance generosity, changes in the
terms of trade, monetary policy indicators such as the real interest rate, etc. Some of these variables are
highly likely to be correlated with U.S. unemployment rates since demographics are similar between the
two countries and Canadian monetary policy often follows moves made by the Federal Reserve. A case
could be made that the U.S. unemployment rate is exogenous as a result of the relative size of the two
economies. However, due to the size of the trade between the two countries, this is not as easy to
support as if the dependent variable were the unemployment rate in Costa Rica, say.
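The long-run cumulative dynamic multiplier in part (a) is the sum of the coefficients on the current and lagged U.S. unemployment rate. A short sketch of the arithmetic, using the coefficients reported in the question (the variable names are illustrative only):

```python
import numpy as np

# Coefficients on urus_t, urus_{t-1}, ..., urus_{t-5} from the estimated regression.
coefs = np.array([0.717, 0.262, 0.023, -0.083, -0.726, 1.267])

# Cumulative dynamic multipliers: running sums of the distributed lag coefficients.
cumulative = np.cumsum(coefs)

# The long-run cumulative dynamic multiplier is the last running sum.
long_run = cumulative[-1]

print(np.round(cumulative, 3))
print(round(float(long_run), 3))
```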
8) Consider the following model $Y_t = \beta_0 + \beta_1 X^e_t + u_t$, where the superscript “e” indicates expected values. This may
represent an example where consumption depends on expected, or “permanent,” income. Furthermore, let
expected income be formed as follows:
$X^e_t = X^e_{t-1} + \lambda (X_t - X^e_{t-1})$; $0 < \lambda < 1$
(a) In the above expectation formation hypothesis, expectations are formed at the end of the period, say the 31st
of December, if you had annual data. Give an intuitive explanation for this process.
(b) Rewrite the expectations equation in the following form:
$X^e_t = (1 - \lambda) X^e_{t-1} + \lambda X_t$
Next, following the method used in your textbook, lag both sides of the equation and replace $X^e_{t-1}$. Repeat
this process by repeatedly substituting expressions for $X^e_{t-2}$, $X^e_{t-3}$, and so forth. Show that this results in the
following equation:
$X^e_t = \lambda X_t + \lambda(1-\lambda) X_{t-1} + \lambda(1-\lambda)^2 X_{t-2} + \dots + \lambda(1-\lambda)^n X_{t-n} + (1-\lambda)^{n+1} X^e_{t-n-1}$
Explain why it is reasonable to drop the last right hand side term as n becomes large.
(c) Substitute the above expression into the original model that related Y to $X^e_t$. Although you now have right
hand side variables that are all observable, what do you perceive as a potential problem here if you wanted to
estimate this distributed lag model without further restrictions?
(d) Lag both sides of the equation, multiply through by $(1-\lambda)$, and subtract this equation from the equation
found in (c). This is called a “Koyck transformation.” What does the resulting equation look like? What is the
error process? What is the impact effect (zero-period dynamic multiplier) of a unit change in X, and how does
it differ from the long-run cumulative dynamic multiplier?
Answer: (a) If the forecast error for the previous period, $(X_t - X^e_{t-1})$, was zero, then expectations are not changed
for the next period. If there was a non-zero forecast error, then expectations are changed by a fraction of
that forecast error.
(b) Substitution of $X^e_{t-1} = (1-\lambda) X^e_{t-2} + \lambda X_{t-1}$ into $X^e_t = (1-\lambda) X^e_{t-1} + \lambda X_t$ results in
$X^e_t = (1-\lambda)^2 X^e_{t-2} + \lambda X_t + \lambda(1-\lambda) X_{t-1}$. The process is then repeated for $X^e_{t-2}$, which gives
$X^e_t = (1-\lambda)^3 X^e_{t-3} + \lambda X_t + \lambda(1-\lambda) X_{t-1} + \lambda(1-\lambda)^2 X_{t-2}$, and so on. The last term involving the unobservable expectation can be
dropped for large n since $0 < \lambda < 1$.
(c) $Y_t = \beta_0 + \beta_1 X^e_t + u_t = \beta_0 + \beta_1\lambda X_t + \beta_1\lambda(1-\lambda) X_{t-1} + \beta_1\lambda(1-\lambda)^2 X_{t-2} + \dots + \beta_1\lambda(1-\lambda)^n X_{t-n} + u_t$.
For large n, this would require estimation of a large number of coefficients, potentially more than there
are observations available on lags of X.
(d) The Koyck transformation works as follows:
$Y_t = \beta_0 + \beta_1\lambda X_t + \beta_1\lambda(1-\lambda) X_{t-1} + \beta_1\lambda(1-\lambda)^2 X_{t-2} + \dots + \beta_1\lambda(1-\lambda)^n X_{t-n} + u_t$
$-[(1-\lambda) Y_{t-1} = (1-\lambda)\beta_0 + \beta_1\lambda(1-\lambda) X_{t-1} + \beta_1\lambda(1-\lambda)^2 X_{t-2} + \dots + \beta_1\lambda(1-\lambda)^n X_{t-n} + \beta_1\lambda(1-\lambda)^{n+1} X_{t-n-1} + (1-\lambda) u_{t-1}]$
which, after canceling terms, results in
$Y_t = \lambda\beta_0 + \beta_1\lambda X_t + (1-\lambda) Y_{t-1} + u_t - (1-\lambda) u_{t-1}$
where $\beta_1\lambda(1-\lambda)^{n+1} X_{t-n-1}$ has been dropped using the same argument as above. Note that the
error process is now a moving average. The impact effect is $\beta_1\lambda$, which is smaller than the long-run
cumulative dynamic multiplier $\beta_1$, since $0 < \lambda < 1$.
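The relationship between the impact effect and the long-run cumulative multiplier under geometrically declining lag weights can be checked numerically. The values of the slope, the adjustment parameter, and the truncation lag below are arbitrary assumptions chosen only for illustration.

```python
import numpy as np

beta1, lam, n = 2.0, 0.3, 200   # illustrative values, not estimates

# Geometric weights on X_t, X_{t-1}, ..., X_{t-n}: lam * (1 - lam)^i
weights = lam * (1.0 - lam) ** np.arange(n + 1)

multipliers = beta1 * weights      # dynamic multipliers at each lag
impact = multipliers[0]            # zero-period multiplier: beta1 * lam
long_run = multipliers.sum()       # approaches beta1 as n grows

print(impact, long_run)
```

Because the weights sum to one in the limit, the long-run multiplier converges to the slope itself, while the impact effect is only the fraction of it given by the adjustment parameter.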
9) The distributed lag regression model requires estimation of (r+1) coefficients in the case of a single explanatory
variable. In your textbook example of orange juice prices and cold weather, r = 18. With additional explanatory
variables, this number becomes even larger.
Consider the distributed lag regression model with a single regressor
$Y_t = \beta_0 + \beta_1 X_t + \beta_2 X_{t-1} + \beta_3 X_{t-2} + \dots + \beta_{r+1} X_{t-r} + u_t$
(a) Early econometric analysis of distributed lag regression models was interested in reducing the number of
parameters by approximating the coefficients by a polynomial of a suitable degree, i.e., $\beta_{i+1} \approx f(i)$ for i = 0, 1, …,
r. Let f(i) be a third degree polynomial, with coefficients $\alpha_0, \dots, \alpha_3$. Specify the equations for $\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$, and
$\beta_{r+1}$.
(b) Substitute these equations into the original distributed lag regression, and rearrange terms so that Y appears
as a linear function of $\beta_0$, $\alpha_0$, $\alpha_1$, $\alpha_2$, $\alpha_3$ and a transformation of $X_t, X_{t-1}, X_{t-2}, \dots, X_{t-r}$.
(c) Assume that the third-degree polynomial approximation is quite accurate. Then what is the advantage of
this polynomial lag technique?
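Answer: (a) For a third degree polynomial, $f(i) = \alpha_0 + \alpha_1 i + \alpha_2 i^2 + \alpha_3 i^3$. Then
$\beta_1 = f(0) = \alpha_0$
$\beta_2 = f(1) = \alpha_0 + \alpha_1 + \alpha_2 + \alpha_3$
$\beta_3 = f(2) = \alpha_0 + 2\alpha_1 + 4\alpha_2 + 8\alpha_3$
$\beta_4 = f(3) = \alpha_0 + 3\alpha_1 + 9\alpha_2 + 27\alpha_3$
...
$\beta_{r+1} = f(r) = \alpha_0 + r\alpha_1 + r^2\alpha_2 + r^3\alpha_3$
(b) Substitution into the original distributed lag regression yields
$Y_t = \beta_0 + \alpha_0 X_t + (\alpha_0 + \alpha_1 + \alpha_2 + \alpha_3) X_{t-1} + (\alpha_0 + 2\alpha_1 + 4\alpha_2 + 8\alpha_3) X_{t-2}$
$\qquad + \dots + (\alpha_0 + r\alpha_1 + r^2\alpha_2 + r^3\alpha_3) X_{t-r} + u_t$
and collecting terms in the coefficients results in
$Y_t = \beta_0 + \alpha_0 (X_t + X_{t-1} + X_{t-2} + \dots + X_{t-r}) + \alpha_1 (X_{t-1} + 2X_{t-2} + \dots + rX_{t-r})$
$\qquad + \alpha_2 (X_{t-1} + 4X_{t-2} + \dots + r^2 X_{t-r}) + \alpha_3 (X_{t-1} + 8X_{t-2} + \dots + r^3 X_{t-r}) + u_t$.
(c) By placing restrictions on the lag distribution and transforming the regressors, there are fewer
parameters to estimate, in this case five.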
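The transformed regressors in part (b) can be built directly from the lags of X. The following is a rough sketch under assumed values: the function name, the simulated data, and the polynomial coefficients are hypothetical, and the lag length of 18 simply echoes the orange juice example.

```python
import numpy as np

def almon_regressors(x, r, degree=3):
    """Return Z with Z[t, j] = sum_{i=0}^{r} i**j * x[t-i], for t = r, ..., T-1."""
    T = len(x)
    # Column i of `lags` holds X_{t-i}.
    lags = np.column_stack([x[r - i : T - i] for i in range(r + 1)])
    # powers[i, j] = i**j, so lags @ powers forms the weighted sums of lags.
    powers = np.vander(np.arange(r + 1), N=degree + 1, increasing=True)
    return lags @ powers

# Simulated example with an assumed cubic lag distribution.
rng = np.random.default_rng(5)
T, r = 2000, 18
x = rng.normal(size=T)
alpha = np.array([0.5, -0.05, 0.002, -0.00003])
powers = np.vander(np.arange(r + 1), N=4, increasing=True)
beta = powers @ alpha                               # implied lag coefficients
lags = np.column_stack([x[r - i : T - i] for i in range(r + 1)])
y = 1.0 + lags @ beta + 0.5 * rng.normal(size=T - r)

# Regress Y on a constant and the four transformed regressors (five parameters).
Z = almon_regressors(x, r)
coef = np.linalg.lstsq(np.column_stack([np.ones(T - r), Z]), y, rcond=None)[0]
beta_hat = powers @ coef[1:]                        # recover all r + 1 lag coefficients
print(np.round(beta_hat[:4], 3))
```

Only five parameters are estimated, yet all nineteen lag coefficients are recovered from the fitted polynomial.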
10) The distributed lag model relating orange juice prices to the Orlando weather reported in the text was of the
form
$\%ChgP_t = \beta_0 + \beta_1 FDD_t + \beta_2 FDD_{t-1} + \beta_3 FDD_{t-2} + \dots + \beta_{19} FDD_{t-18} + u_t$
(a) Suppose that an agricultural economist tells you that a freeze in December is more harmful than a freeze in
the other months. How would you modify the regression to incorporate this effect? How would you test for
this December effect?
(b) The same economist tells you that the damage caused by freezes is not well captured by the FDD variable.
She says that a single day with a temperature of 24° is more damaging than 8 days with a
temperature of 31°. How would you modify the regression to incorporate this effect?
Answer: (a) A binary variable can be added to the list of regressors, which takes on the value of one in December
and is zero otherwise. A t-statistic can be computed for the coefficient of the December binary variable,
using HAC standard errors. The t-statistic has a standard normal distribution in large samples.
(b) An additional regressor (TempFreeze) can be introduced, either by itself or interacted with FDD. To
capture the postulated effect, it might be specified as follows:
$TempFreeze_t = DFreeze_t \times \sum_{i=1}^{FDD} (Temp_t - 32^{\circ})^2$
where DFreeze is a binary variable that takes on the value of one for a month with freezing temperature, and
Temp is the minimum temperature for any monthly freezing degree day.
11) (Requires some calculus) In the following, assume that Xt is strictly exogenous and that economic theory
suggests that, in equilibrium, the following relationship holds between Y* and Xt, where the “*” indicates
equilibrium.
Y* = kXt
An error term could be added here by assuming that even in equilibrium, random variations from strict
proportionality might occur. Next let there be adjustment costs when changing Y, e.g. costs associated with
changes in employment for firms. As a result, an entity might be faced with two types of costs: being out of
equilibrium and the adjustment cost. Assume that these costs can be modeled by the following quadratic loss
function:
$L = \lambda_1 (Y_t - Y^*)^2 + \lambda_2 (Y_t - Y_{t-1})^2$
a. Minimize the loss function w.r.t. the only variable that is under the entity’s control, $Y_t$, and solve for $Y_t$.
b. Note that the two weights on $Y^*$ and $Y_{t-1}$ add up to one. To simplify notation, let the first weight be $\theta$
and the second weight $(1-\theta)$. Substitute the original expression for $Y^*$ into this equation. In terms of the
ADL(p,q) terminology, what are the values for p and q in this model?
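Answer: a. $Y_t = \frac{\lambda_1}{\lambda_1 + \lambda_2} Y^* + \frac{\lambda_2}{\lambda_1 + \lambda_2} Y_{t-1}$
b. $Y_t = \theta Y^* + (1-\theta) Y_{t-1} = \theta k X_t + (1-\theta) Y_{t-1} = \beta_1 Y_{t-1} + \delta_1 X_t$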
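The step behind part a. is the first-order condition of the quadratic loss. Writing the two loss weights as $\lambda_1$ and $\lambda_2$ (labels assumed here for illustration), a sketch of the derivation is:

```latex
\frac{\partial L}{\partial Y_t}
  = 2\lambda_1 (Y_t - Y^*) + 2\lambda_2 (Y_t - Y_{t-1}) = 0
\quad\Longrightarrow\quad
Y_t = \frac{\lambda_1}{\lambda_1 + \lambda_2}\, Y^*
    + \frac{\lambda_2}{\lambda_1 + \lambda_2}\, Y_{t-1}
```

The two weights are nonnegative fractions with a common denominator, which is why they add up to one.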
12) Your textbook estimates the initial relationship between the percentage change of real frozen OJ and the
freezing degree days as follows:
%ChgPt = -0.40 + 0.47 FDD t
(0.22) (0.13)
t = 1950:1 — 2000:12, R2 = 0.09, SER = 4.8
a. Calculate the t-statistic for the slope coefficient. Can you reject the null hypothesis that the coefficient
is zero in the population?
b. The above regression was estimated using HAC standard errors. When you re-estimate the regression
using homoskedasticity-only standard errors, the standard error of the slope coefficient drops to 0.06.
Calculate the t-statistic for the slope coefficient again. Which of the two standard errors should you
use for statistical inference?
Answer: a. The t-statistic is 3.62. Hence you can reject the null hypothesis at any reasonable level of significance.
b. The t-statistic has now increased to about 7.83 (0.47/0.06). In the presence of heteroskedasticity and/or
autocorrelation in the errors, OLS estimation of the regression coefficients is still consistent. However,
the homoskedasticity-only or heteroskedasticity-robust standard errors are inconsistent and use of
these in the presence of serial correlation results in misleading statistical inference. For example,
confidence intervals do not contain the true value in the postulated number of times in repeated
samples. The solution is to adjust the estimator for the standard errors by incorporating sample
autocorrelation estimates. This results in the heteroskedasticity- and autocorrelation-consistent (HAC)
estimator of the variance of the estimator. For this estimator to be consistent, a certain truncation
parameter is introduced, so that not all T-1 sample autocorrelations are used. Incorporating this idea
into the HAC formula results in the Newey-West variance estimator.
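The contrast between homoskedasticity-only and HAC standard errors can be illustrated with a small Newey-West computation. This is a sketch under stated assumptions, not the textbook's code: the function names and simulated series are hypothetical, the weights $(m-j)/m$ are the usual Newey-West kernel, and the truncation parameter follows the common rule of thumb $m = 0.75\,T^{1/3}$.

```python
import numpy as np

def ols_with_hac(X, y, m):
    """OLS with homoskedasticity-only and Newey-West (HAC) standard errors.

    m is the truncation parameter: autocovariances at lags 1..m-1 enter
    with weights (m - j) / m.
    """
    T, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    u = y - X @ beta
    V_homo = (u @ u) / (T - k) * XtX_inv     # homoskedasticity-only variance
    g = X * u[:, None]                       # rows are x_t * u_t
    S = g.T @ g                              # lag-0 term
    for j in range(1, m):
        G = g[j:].T @ g[:-j]                 # sum over t of g_t g_{t-j}'
        S += (m - j) / m * (G + G.T)
    V_hac = XtX_inv @ S @ XtX_inv
    return beta, np.sqrt(np.diag(V_homo)), np.sqrt(np.diag(V_hac))

def ar1(rng, T, rho):
    """Simulate a zero-mean AR(1) series with coefficient rho."""
    e = rng.normal(size=T)
    z = np.zeros(T)
    for t in range(1, T):
        z[t] = rho * z[t - 1] + e[t]
    return z

# Simulated data (assumed values): serially correlated regressor and error.
rng = np.random.default_rng(1)
T = 600
x = ar1(rng, T, 0.8)
u = ar1(rng, T, 0.8)
y = -0.4 + 0.5 * x + u
X = np.column_stack([np.ones(T), x])
m = int(0.75 * T ** (1 / 3))
beta, se_homo, se_hac = ols_with_hac(X, y, m)
print(beta, se_homo, se_hac)
```

With positively autocorrelated regressor and error, the HAC standard error of the slope is noticeably larger than the homoskedasticity-only one, so t-statistics based on the latter overstate significance.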
13) You are hired to forecast the unemployment rate in a geographical area that is peripheral to a large
metropolitan area in the United States. The area in question is called the Inland Empire (San Bernardino
County and Riverside County) and is situated east of Greater Los Angeles (Los Angeles County and Orange
County). While the area has a large population (it is the 14th largest metropolitan statistical area in the United
States), its economic activity relies heavily on that of the larger area it is attached to. For example, it is estimated
that approximately 20% of its workforce commutes into the Greater Los Angeles area for work and few
workers commute the other way. Furthermore, its logistics industry is heavily dependent on economic activity
in the Greater Los Angeles Area. As a result, you view the unemployment rate of the Greater Los Angeles Area
($ur^{GLA}$) to be exogenous in determining the unemployment rate in the Inland Empire ($ur^{IE}$). You estimate the
following distributed lag model, where numbers in parentheses are HAC standard errors:
$\Delta ur^{IE}_t = \underset{(0.00010)}{0.00002} + \underset{(0.06)}{0.74}\,\Delta ur^{GLA}_t - \underset{(0.06)}{0.04}\,\Delta ur^{GLA}_{t-1} - \underset{(0.06)}{0.01}\,\Delta ur^{GLA}_{t-2} + \underset{(0.06)}{0.07}\,\Delta ur^{GLA}_{t-3}$
$\qquad + \underset{(0.06)}{0.05}\,\Delta ur^{GLA}_{t-4} + \underset{(0.05)}{0.09}\,\Delta ur^{GLA}_{t-5} + \underset{(0.06)}{0.10}\,\Delta ur^{GLA}_{t-6}$
t = 1991:01-2009:12, $R^2$ = 0.60, SER = 0.001
a. What is the impact effect of a one percentage point increase (say from 0.06 to 0.07) of the
unemployment rate in the Greater Los Angeles area?
b. What is the long-run cumulative dynamic multiplier?
c. Why do you think the variables above appear in changes rather than in levels?
Answer: a. The unemployment rate in the Inland Empire will increase by 0.0074, or roughly three-quarters of a
percentage point.
b. The unemployment rate in the Inland Empire will increase by roughly one percentage point in the
long run.
c. The implication must be that the unemployment rates are not stationary over the sample period.
14) There is some economic research which suggests that oil prices play a central role in causing recessions in
developed countries. Some of this work suggests that it is only oil price increases that matter and even more
specifically, that it is the percentage point difference between oil prices at date t and the maximum value over
the previous year. Realizing that energy prices in general can fluctuate quite dramatically in both directions
and that geographic areas also benefit substantially from oil price decreases, you decide to estimate the
following distributed lag model using annual data (numbers in parentheses are HAC standard errors):
$\hat{Y}_t = \underset{(0.27)}{3.39} - \underset{(0.010)}{0.009}\,(P_{oil}/CPI)_t - \underset{(0.011)}{0.028}\,(P_{oil}/CPI)_{t-1}$
t = 1960-2008, $R^2$ = 0.15, SER = 1.88
a. What is the impact effect of a 25 percent increase in real oil prices?
b. What is the predicted cumulative change in GDP growth over two years of this effect?
c. The HAC F-statistic is 4.07. Can you reject the null hypothesis that oil price changes have no effect on
real GDP growth? What is the critical value you considered? Is there any reason why you should be
cautious using an F-test in this case, given the sample period?
Answer: a. GDP growth would decrease by almost a quarter of a percentage point.
b. The predicted decline in growth would be almost one percentage point (-0.925).
c. The critical value of $F_{2,\infty}$ is 3.00 at the 5% significance level. Hence you can reject the null hypothesis
that oil prices have no effect on real GDP growth. However, since the sample period involves only 50 or
so observations, it is not clear that the test statistic is actually F-distributed (small sample).
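The arithmetic behind parts a. and b. follows directly from the two estimated coefficients and the assumed 25 percent increase:

```latex
\text{impact effect: } -0.009 \times 25 = -0.225
\qquad
\text{two-year cumulative effect: } (-0.009 - 0.028) \times 25 = -0.925
```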
Chapter 16 Additional Topics in Time Series Regression
16.1 Multiple Choice
1) A vector autoregression
A) is the ADL model with an AR process in the error term.
B) is the same as a univariate autoregression.
C) is a set of k time series regressions, in which the regressors are lagged values of all k series.
D) involves errors that are autocorrelated but can be written in vector format.
Answer: C
2) A multiperiod regression forecast h periods into the future based on an AR(p) is computed
A) the same way as the iterated AR forecast.
B) by estimating the multiperiod regression $Y_t = \delta_0 + \delta_1 Y_{t-h} + \dots + \delta_p Y_{t-p-h+1} + u_t$, then using the estimated
coefficients to compute the forecast h periods in advance.
C) by estimating the multiperiod regression $Y_t = \delta_0 + \delta_1 Y_{t-h} + u_t$, then using the estimated coefficients to
compute the forecast h periods in advance.
D) by first computing the one-period ahead forecast, next using that to compute the two-period ahead
forecast, and so forth.
Answer: B
3) Multiperiod forecasting with multiple predictors
A) is the same as the iterated AR forecast method.
B) can use the iterated VAR forecast method.
C) will yield superior results when using the multiperiod regression forecast h periods into the future based
on p lags of each Yt , rather than the iterated VAR forecast method.
D) will always yield superior results using the iterated VAR since it takes all equations into account.
Answer: B
4) If Yt is I(2), then
A) $\Delta^2 Y_t$ is stationary.
B) $Y_t$ has a unit autoregressive root.
C) $\Delta Y_t$ is stationary.
D) $Y_t$ is stationary.
Answer: A
5) The following is not a consequence of Xt and Yt being cointegrated:
A) if $X_t$ and $Y_t$ are both I(1), then for some $\theta$, $Y_t - \theta X_t$ is I(0).
B) $X_t$ and $Y_t$ have the same stochastic trend.
C) in the expression $Y_t - \theta X_t$, $\theta$ is called the cointegrating coefficient.
D) if Xt and Yt are cointegrated then integrating one of the variables gives you the same result as integrating
the other.
Answer: D
6) One advantage of forecasts based on a VAR rather than separately forecasting the variables involved is
A) that VAR forecasts are easier to calculate.
B) you typically have knowledge of future values of at least one of the variables involved.
C) it can help to make the forecasts mutually consistent.
D) that VAR involves panel data.
Answer: C
7) The coefficients of the VAR are estimated by
A) using a simultaneous estimation method such as TSLS.
B) maximum likelihood.
C) panel methods.
D) estimating each of the equations by OLS.
Answer: D
8) Under the VAR assumptions, the OLS estimators are
A) consistent and have a joint normal distribution even in small samples.
B) BLUE.
C) consistent and have a joint normal distribution in large samples.
D) unbiased.
Answer: C
9) A VAR allows you to test joint hypothesis that involve restrictions across multiple equations by
A) computing a z-statistic.
B) computing the BIC but not the AIC.
C) using a stability test.
D) computing an F-statistic.
Answer: D
10) A VAR with five variables, 4 lags and constant terms for each equation will have a total of
A) 21 coefficients.
B) 100 coefficients.
C) 105 coefficients.
D) 84 coefficients.
Answer: C
11) You can determine the lag lengths in a VAR
A) by using confidence intervals.
B) by using critical values from the standard normal table.
C) by using either F-tests or information criteria.
D) with the help from economic theory and institutional knowledge.
Answer: C
12) The biggest conceptual difference between using VARs for forecasting and using them for structural modeling
is that
A) you need to use the Granger causality test for structural modeling.
B) structural modeling requires very specific assumptions derived from economic theory and institutional
knowledge of what is exogenous and what is not.
C) you can no longer use the information criteria to decide on the lag length.
D) structural modeling only allows a maximum of three equations in the VAR.
Answer: B
13) The error term in a multiperiod regression
A) is serially correlated.
B) causes OLS to be inconsistent.
C) is serially correlated, but less so the longer the forecast horizon.
D) is serially uncorrelated.
Answer: A
14) $\Delta^2 Y_t$
A) = $\Delta Y_t - \Delta Y_{t-1}$.
B) = $Y^2_t - Y^2_{t-1}$.
C) = $\Delta Y_t - \Delta Y_{t-2}$.
D) = $Y_t - Y_{t-2}$.
Answer: A
15) The order of integration
A) can never be zero.
B) is the number of times that the series needs to be differenced for it to be stationary.
C) is the value of $\phi_1$ in the quasi difference ($Y_t - \phi_1 Y_{t-1}$).
D) depends on the number of lags in the VAR specification.
Answer: B
16) To test the null hypothesis of a unit root, the ADF test
A) has higher power than the so-called DF-GLS test.
B) uses complicated iterative techniques.
C) cannot be calculated if the variable is integrated of order two or higher.
D) uses a t-statistic and a special critical value.
Answer: D
17) Unit root tests
A) use the standard normal distribution since they are based on the t-statistic.
B) cannot use the standard normal distribution for statistical inference. As a result the ADF statistic has its
own special table of critical values.
C) can use the standard normal distribution only when testing that the level variable is stationary, but not
the difference variable.
D) can use the standard normal distribution but only if HAC standard errors were computed.
Answer: B
18) In a VECM,
A) past values of $Y_t - \theta X_t$ help to predict future values of $Y_t$ and/or $X_t$.
B) errors are corrected for serial correlation using the Cochrane-Orcutt method.
C) current values of $Y_t - \theta X_t$ help to predict future values of $Y_t$ and/or $X_t$.
D) VAR techniques, such as information criteria, no longer apply.
Answer: A
19) The following is not an appropriate way to tell whether two variables are cointegrated:
A) see if the two variables are integrated of the same order.
B) graph the series and see whether they appear to have a common stochastic trend.
C) perform statistical tests for cointegration.
D) use expert knowledge and economic theory.
Answer: A
20) If Xt and Yt are cointegrated, then the OLS estimator of the coefficient in the cointegrating regression is
A) BLUE.
B) unbiased when using HAC standard errors.
C) unbiased even in small samples.
D) consistent.
Answer: D
21) Assume that you have used the OLS estimator in the cointegrating regression and test the residual for a unit
root using an ADF test. The resulting ADF test statistic has a
A) normal distribution in large samples.
B) non-normal distribution which requires ADF critical values for inference.
C) non-normal distribution which requires EG-ADF critical values for inference.
D) normal distribution when HAC standard errors are used.
Answer: C
22) The DOLS estimator has the following property if Xt and Yt are cointegrated:
A) it is BLUE even in small samples.
B) it is efficient in large samples.
C) it has a standard normal distribution when homoskedasticity-only standard errors are used.
D) it has a non-normal distribution in large samples when HAC standard errors are used.
Answer: B
23) Volatility clustering
A) is evident in most cross-sections.
B) implies that a series is serially correlated.
C) can mostly be found in studies of the labor market.
D) is evident in many financial time series.
Answer: D
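24) Using the ADL(1,1) regression $Y_t = \beta_0 + \beta_1 Y_{t-1} + \gamma_1 X_{t-1} + u_t$, the ARCH model for the regression error
assumes that $u_t$ is normally distributed with mean zero and variance $\sigma^2_t$, where
A) $\sigma^2_t = \alpha_0 + \alpha_1 u^2_{t-1} + \alpha_2 u^2_{t-2} + \dots + \alpha_p u^2_{t-p}$.
B) $\sigma^2_t = u^2_{t-1} + \dots + u^2_{t-p} + \phi_1 \sigma^2_{t-1} + \dots + \phi_q \sigma^2_{t-q}$.
C) $\sigma^2_t = \phi_1 \sigma^2_{t-1} + \dots + \phi_q \sigma^2_{t-q}$.
D) $\sigma^2_t = \alpha_0 + \alpha_1 u^2_{t-1} + \dots + \alpha_p u^2_{t-p} + \phi_1 \sigma^2_{t-1} + \dots + \phi_q \sigma^2_{t-q}$.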
Answer: A
25) ARCH and GARCH models are estimated using the
A) OLS estimation method.
B) the method of maximum likelihood.
C) DOLS estimation method.
D) VAR specification.
Answer: B
26) A VAR with k time series variables consists of
A) k equations, one for each of the variables, where the regressors in all equations are lagged values of all the
variables
B) a single equation, where the regressors are lagged values of all the variables
C) k equations, one for each of the variables, where the regressors in all equations are never more than one
lag of all the variables
D) k equations, one for each of the variables, where the regressors in all equations are current values of all the
variables
Answer: A
27) The BIC for the VAR is
A) $BIC(p) = \ln[\det(\hat{\Sigma}_u)] + k(kp+1)\frac{2}{T}$
B) $BIC(p) = \ln[\det(\hat{\Sigma}_u)] + k(p+1)\frac{\ln(T)}{T}$
C) $BIC(p) = \ln[\det(\hat{\Sigma}_u)] + k(kp+1)\frac{\ln(T)}{T}$
D) $BIC(p) = \ln[SSR(p)] + k(p+1)\frac{\ln(T)}{T}$
Answer: C
28) The lag length in a VAR using the BIC proceeds as follows: Among a set of candidate values of p, the estimated
lag length $\hat{p}$ is the value of p
A) For which the BIC exceeds the AIC
B) That maximizes BIC(p)
C) Cannot be determined here since a VAR is a system of equations, not a single one
D) That minimizes BIC(p)
Answer: D
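The coefficient count and the BIC-based lag choice for a VAR can be sketched as follows. This is an illustrative example only: the function names and the simulated two-variable VAR(1) are assumptions, each equation is estimated by OLS, and all candidate lag lengths are compared on the same estimation sample.

```python
import numpy as np

def n_coefficients(k, p):
    """Number of coefficients in a VAR with k variables, p lags and intercepts."""
    return k * (k * p + 1)

def var_bic(data, p, p_max):
    """BIC(p) = ln det(Sigma_u_hat) + k(kp+1) ln(T)/T, with OLS equation by equation."""
    T_full, k = data.shape
    Y = data[p_max:]                                   # common sample for all p
    lags = [data[p_max - j : T_full - j] for j in range(1, p + 1)]
    X = np.column_stack([np.ones(len(Y))] + lags)
    B = np.linalg.lstsq(X, Y, rcond=None)[0]           # one column per equation
    U = Y - X @ B
    T = len(Y)
    sigma_u = U.T @ U / T                              # residual covariance matrix
    return np.log(np.linalg.det(sigma_u)) + n_coefficients(k, p) * np.log(T) / T

# Simulated VAR(1) with two variables (assumed coefficient matrix).
rng = np.random.default_rng(2)
A = np.array([[0.5, 0.1], [0.2, 0.3]])
T = 500
data = np.zeros((T, 2))
for t in range(1, T):
    data[t] = A @ data[t - 1] + rng.normal(size=2)

p_max = 4
bic = {p: var_bic(data, p, p_max) for p in range(1, p_max + 1)}
best_p = min(bic, key=bic.get)                         # lag length minimizing BIC(p)
print(n_coefficients(5, 4), best_p)
```

The first printed number reproduces the count for a five-variable VAR with four lags and intercepts; the second is the lag length that minimizes the criterion on the simulated data.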
29) The dynamic OLS (DOLS) estimator of the cointegrating coefficient, if Yt and Xt are cointegrated,
A) is efficient in large samples
B) statistical inference about the cointegrating coefficient is valid
C) the t-statistic constructed using the DOLS estimator with HAC standard errors has a standard normal
distribution in large samples
D) all of the above
Answer: D
30) The EG-ADF test
A) is similar to the DF-GLS test
B) is a test for cointegration
C) has as a limitation that it can only test if two variables, but not more than two, are cointegrated
D) uses the ADF in the second step of its procedure
Answer: B
16.2 Essays and Longer Questions
1) “Heteroskedasticity typically occurs in cross-sections, while serial correlation is typically observed in
time-series data.” Discuss and critically evaluate this statement.
Answer: Serial correlation in cross-sections can occur by chance if the data is ordered using one of the regressors.
While it is easy to get rid of serial correlation in this case by simply “reshuffling” the data, the serial
correlation contains some information, such as a possible misspecification of functional form.
Serial correlation does occur typically in time-series data, but as the textbook emphasized, conditional
heteroskedasticity “shows up in many economic time series.” The ARCH and GARCH models are often
used when volatility clustering is present in financial time series, including the inflation rate. Hence this
special type of heteroskedasticity is observed in time-series data.
2) Some macroeconomic theories suggest that there is a short-run relationship between the inflation rate and the
unemployment rate. How would you go about forecasting these two variables? Suggest various alternatives
and discuss their advantages and disadvantages.
Answer: There are various methods available for forecasting the inflation rate and the unemployment rate. One
basic distinction is whether or not the two variables are forecasted separately, or jointly as a system of
two equations. Another distinction involves one period ahead forecasts vs. multiperiod forecasts.
Finally, if multiperiod forecasts are used, then there is a multiperiod forecasting regression method vs.
an iterated forecast method.
Univariate Regression Methods: Here either the change in the inflation rate or the unemployment rate is
modeled as an AR(p) and estimated by OLS. Observed values for the regressors are then substituted to
produce a one period ahead forecast. (The one period ahead forecast for the inflation rate can then also
be derived.) Statistical methods, such as the BIC or AIC can be used for choosing the number of lags.
There are two important properties of the forecasts: the best forecast of either the change of the inflation
rate or the unemployment rate depends only on the most recent p past values, and the errors are serially
uncorrelated. These follow from the OLS assumptions. The multiperiod regression method for making
an h-period ahead forecast of the change in inflation or unemployment rate using the AR(p) involves
regressing the variable on p of its lags, starting from (t-h), i.e., $Y_t = \delta_0 + \delta_1 Y_{t-h} + \dots + \delta_p Y_{t-p-h+1} + u_t$.
Since the error term is serially correlated for the multiperiod regression, HAC standard errors must be
used to have a reliable basis for inference. The iterated AR forecast method for the AR(p) is achieved by
forecasting one period ahead initially, then using the forecasted value for the two period ahead forecast,
and so on. More formally, the two-period ahead forecast is
$\hat{Y}_{t|t-2} = \hat{\beta}_0 + \hat{\beta}_1 \hat{Y}_{t-1|t-2} + \hat{\beta}_2 Y_{t-2} + \hat{\beta}_3 Y_{t-3} + \dots + \hat{\beta}_p Y_{t-p}$,
while the three-period ahead forecast is
$\hat{Y}_{t|t-3} = \hat{\beta}_0 + \hat{\beta}_1 \hat{Y}_{t-1|t-3} + \hat{\beta}_2 \hat{Y}_{t-2|t-3} + \hat{\beta}_3 Y_{t-3} + \dots + \hat{\beta}_p Y_{t-p}$, etc.
Multiple Predictors: If economic theory suggests that other variables could help forecast either the change
in the inflation rate or the unemployment rate, then lags of these variables can be included. The
Granger-causality test can be used to determine whether or not these additional variables belong in the
regression. The same methods that were used for the AR(p) model can be applied for the ADL(p,q)
model. For example, in the multiperiod forecasting using multivariate forecasts, all regressors must be
lagged h periods to produce the h-period ahead forecast. To forecast both the change in the inflation and
unemployment rate, regressions for each of the two dependent variables have to be estimated first, i.e.,
for both variables the following regression is estimated by OLS: $Y_t = \delta_0 + \delta_1 Y_{t-h} + \dots + \delta_p Y_{t-p-h+1} + \delta_{p+1} X_{t-h} + \dots + \delta_{2p} X_{t-p-h+1} + u_t$.
Then the estimated coefficients are used to make the h-period ahead
forecast. The iterated forecast method now involves making one-period ahead forecasts using the
estimated VAR specification, and using these forecasted values for both variables in subsequent
forecasts. The two period ahead forecast, for example, would be calculated as follows:
$\hat{Y}_{t|t-2} = \hat{\beta}_{10} + \hat{\beta}_{11} \hat{Y}_{t-1|t-2} + \hat{\beta}_{12} Y_{t-2} + \hat{\beta}_{13} Y_{t-3} + \dots + \hat{\beta}_{1p} Y_{t-p}$
$\qquad + \hat{\gamma}_{11} \hat{X}_{t-1|t-2} + \hat{\gamma}_{12} X_{t-2} + \hat{\gamma}_{13} X_{t-3} + \dots + \hat{\gamma}_{1p} X_{t-p}$.
The decision on which method to use depends on the quality of the specification. If the AR( p) or the
VAR is a good approximation to the underlying relationship, then the iterated forecast method is better.
Note that if multiple predictors are involved, the ADL is not an alternative, since the additional
predictors have to be forecasted themselves. However, if even one of the VAR equations is not a good
representation of the underlying process, then the multiperiod regression forecasts are more accurate on
average. Since the difference between the two methods is typically small, the textbook suggests using
the one “which is most conveniently implemented in your software.”
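The two multiperiod approaches discussed above, the direct multiperiod regression and the iterated AR forecast, can be compared in a short sketch. The function names and the simulated AR(1) series are hypothetical and chosen only to illustrate the mechanics for a univariate AR(p).

```python
import numpy as np

def fit_ar(y, p, h=1):
    """OLS of y_t on a constant and y_{t-h}, ..., y_{t-h-p+1}."""
    n = len(y)
    start = h + p - 1
    Y = y[start:]
    cols = [y[start - h - j : n - h - j] for j in range(p)]
    X = np.column_stack([np.ones(len(Y))] + cols)
    return np.linalg.lstsq(X, Y, rcond=None)[0]

def direct_forecast(y, p, h):
    """Multiperiod regression forecast h periods ahead."""
    coef = fit_ar(y, p, h=h)
    recent = y[-p:][::-1]                    # most recent value first
    return coef[0] + np.dot(coef[1:], recent)

def iterated_forecast(y, p, h):
    """Iterate the one-step AR(p) forward, feeding forecasts back in."""
    coef = fit_ar(y, p, h=1)
    recent = list(y[-p:][::-1])              # most recent value first
    for _ in range(h):
        y_hat = coef[0] + np.dot(coef[1:], recent[:p])
        recent.insert(0, y_hat)              # forecast becomes the newest "observation"
    return y_hat

# Simulated AR(1) data (assumed parameters).
rng = np.random.default_rng(4)
T = 500
y = np.zeros(T)
for t in range(1, T):
    y[t] = 1.0 + 0.6 * y[t - 1] + rng.normal()

f_direct = direct_forecast(y, p=2, h=4)
f_iterated = iterated_forecast(y, p=2, h=4)
print(f_direct, f_iterated)
```

At a horizon of one period the two methods coincide by construction; at longer horizons they typically differ only slightly when the AR model is a reasonable approximation.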
3) Think of at least five examples from economics where theory suggests that the variables involved are
cointegrated. For one of these cases, explain how you would test for cointegration between the variables
involved and how you could use this information to improve forecasting.
Answer: Answers will vary by student, but given the textbook example of the three-month and one-year interest
rates, you can expect students to list it. Consumption and income, real money balances, income and the
interest rate (or income velocity and the interest rate), purchasing power parity, inflation rates across
countries, are prime candidates.
I will use the example of real consumption and income to explain how to test for cointegration and how
to potentially incorporate the information into forecasting. Both (the log of) consumption and income
should be plotted over time to check whether they give the appearance of having a common stochastic
trend. Furthermore, economic theory suggests that they are proportional to each other, although the
factor of proportionality may depend on other variables. Under the null hypothesis, $C_t - \theta Y_t$ has a unit
root, where C is the log of consumption and Y is the log of disposable income. If $\theta$ were known, then the
DF or DF–GLS unit root tests could be employed here, but since it is not, the cointegrating coefficient has
to be estimated first by OLS, which is consistent if consumption and disposable income are cointegrated.
The residuals from the regression $C_t = \alpha + \theta Y_t + z_t$ are then subjected to a DF t-test with an
intercept and no time trend. The t-statistic is compared to the critical values for the EG–ADF statistic, and if it
is more negative than the critical value, then the null hypothesis is rejected in favor of consumption and
disposable income being cointegrated.
The lag of the estimated error correction term $(C_t - \hat{\theta} Y_t)$ can then be used as an additional regressor in a
VAR specification to predict both the growth rate of real consumption and the growth rate of real
disposable income. This specification is known as the vector error correction model (VECM).
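The two-step procedure described above can be sketched in Python as follows. This is a minimal illustration on simulated data, assuming a Dickey-Fuller regression without lag augmentation; the series, the helper `ols_with_se`, and the 5% critical value of -3.41 (taken to be the EG-ADF value for one regressor in the textbook table) are assumptions of the example.

```python
import random

def ols_with_se(x, y):
    """OLS of y on a constant and x; returns (intercept, slope, slope_se)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    slope_se = (ssr / (n - 2) / sxx) ** 0.5
    return intercept, slope, slope_se

# Simulated log income (random walk with drift) and log consumption that
# shares its stochastic trend; all numbers are illustrative.
random.seed(1)
income = [0.0]
for _ in range(299):
    income.append(income[-1] + 0.005 + random.gauss(0.0, 0.01))
consumption = [-0.2 + 1.0 * yi + random.gauss(0.0, 0.01) for yi in income]

# Step 1: estimate the cointegrating coefficient by OLS.
alpha_hat, theta_hat, _ = ols_with_se(income, consumption)
z = [c - alpha_hat - theta_hat * yi for c, yi in zip(consumption, income)]

# Step 2: Dickey-Fuller regression on the residuals (intercept, no trend):
# delta z_t = a + d * z_{t-1} + e_t; the test statistic is the t-ratio on d.
dz = [z[t] - z[t - 1] for t in range(1, len(z))]
_, d_hat, d_se = ols_with_se(z[:-1], dz)
eg_adf = d_hat / d_se

CRITICAL_5PCT = -3.41  # assumed EG-ADF critical value, one regressor
print(f"theta_hat = {theta_hat:.3f}, EG-ADF = {eg_adf:.2f}, reject = {eg_adf < CRITICAL_5PCT}")
```

In applied work the second step would normally include lagged differences of the residuals, with the lag length chosen by an information criterion.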
4) What role does the concept of cointegration and the order of integration play in modeling the relationship
between variables? Explain how tests of cointegration work.
Answer: Cointegration between two or more variables is a regression analysis concept to potentially reveal
long-run relationships among time series variables. Variables are said to be cointegrated if they share a
common stochastic trend. Most economic time series are I(1) variables, which means that they
have a unit autoregressive root and that the first difference in that variable is stationary. Since these
variables are often measured in logs, their first difference approximates growth rates. Cointegration
requires a common stochastic trend. Therefore, variables which are tested for cointegration must have
the same order of integration.
The concept of cointegration is also an effort to bring back long-run relationships between variables into
short-run forecasting techniques, such as VARs. Adding the error correction term from the cointegrating
relationship to the VARs results in the vector error correction model. Here all variables are stationary,
either because they have been differenced or because the common stochastic trend has been removed.
VECMs therefore combine short-run and long-run information. One way to think about the role of the
error correction term is that it provides an “anchor” which pulls the modeled relationships eventually
back to their long-run behavior, even if they are disturbed by shocks in the short run.
Cointegration also represents the return of the static regression model, i.e., regressions where no lags are
used. Testing for cointegration using the EG-ADF test requires first estimating a static regression between the
potentially cointegrated variables by OLS, and then conducting an ADF test on the residuals from
this regression. If the residuals do not have a unit root, then the variables are said to be cointegrated.
Since this is a two-step procedure, critical values for the ADF t-statistic are adjusted and are referred to as
the critical values for the EG-ADF statistic. Although the OLS estimator is consistent, it has a nonnormal
distribution and hence inference should not be conducted based on the t-statistic, even if HAC standard
errors are used. Alternative techniques to circumvent this problem, such as the DOLS estimator, which is
consistent and efficient in large samples, have been developed. The DOLS and another frequently used
technique, called the Johansen method, can be easily extended to multiple cointegrating relationships.
5) Carefully explain the difference between forecasting variables separately versus forecasting a vector of time
series variables. Mention how you choose optimal lag lengths in each case. Part of your essay should deal with
multiperiod forecasts and different methods that can be used in that situation. Finally address the difference
between VARs and VECMs.
Answer: When variables are forecasted separately, then single equations of the AR(p) type are typically
involved. If economic theory and/or institutional knowledge suggest that additional predictors should
be included, then forecasts can be potentially improved by estimating an ADL(p,q) model. For one
period ahead forecasts, these are identical to forecasts based on systems of equations. Lag lengths will be
chosen using the BIC or the AIC criterion.
There are three important reasons why VARs may be preferable for forecasting. One results from the
forecasting horizon. If forecasts are to be made two or more periods ahead, then if future values of the
additional predictors are to be used, these have to be forecasted themselves. This can be avoided by
choosing the multiperiod regression method. Here, in the case of an h period forecast, multiperiod
regressions are estimated where all predictors are lagged h periods or more. Second, using VAR
forecasting methods will make the forecasts for the variables involved mutually consistent. This is the
result of using the iterated VAR forecasts whereby the forecasted values are subsequently used to
forecast further ahead. Finally VAR models allow for restrictions across equations to be tested.
Multiperiod regression methods in general may be preferable over iterated forecasts if the AR(p),
ADL(p,q) or VAR models are incorrectly specified. In practice, the difference in forecasts tends to be very
small between the multiperiod regression and iterated forecast methods.
VAR models can be enhanced by incorporating long-run information in the form of error correction
terms. If some of the variables in the VAR model have a common stochastic trend, then this can be used
to improve the forecasts by including the error correction term, thereby turning the VAR model into a
VECM.
6) You have collected quarterly data for the unemployment rate (Unemp) in the United States, using a sample
period from 1962:I (first quarter) to 2009:IV (the data is collected at a monthly frequency, but you have taken
quarterly averages).
a.
Does economic theory suggest that the unemployment rate should be stationary?
b.
Testing the unemployment rate for stationarity, you run the following regression (where the lag length
was determined using the BIC; using the AIC instead does not change the outcome of the test, even
though it chooses 9 lags of the LHS variable):
$$\Delta Unemp_t = \underset{(0.01)}{0.217} - \underset{(0.0012)}{0.035}\,Unemp_{t-1} + \underset{(0.054)}{0.689}\,\Delta Unemp_{t-1}$$
Use the ADF statistic with an intercept only to test for stationarity. What is your decision?
c.
The standard errors reported above were homoskedasticity-only standard errors. Do you think you
could potentially improve on inference by allowing for HAC standard errors?
d.
An alternative test for a unit root, the DF-GLS, produces a test statistic of -2.75. Find the critical value
and decide whether or not to reject the null hypothesis. If the decision is different from (b), is there any
reason why you might prefer the DF-GLS test over the ADF test?
Answer: a. In macroeconomics or labor economics, you have learned about the natural rate of unemployment, or
the Non-Accelerating Inflation Rate of Unemployment (NAIRU). The idea here is that unemployment
rates may deviate from this equilibrium unemployment rate, but that, following a shock, the
unemployment rate will revert towards this equilibrium. Hence you might expect the difference between
the unemployment rate and the NAIRU, referred to by some as the cyclical unemployment rate, to be
stationary. Unfortunately the equilibrium unemployment rate is not a constant over time and may be
affected by demographics, the price of search (unemployment insurance benefits), and other variables. If
the NAIRU is not a constant over time, then the unemployment rate itself may not be stationary.
Furthermore, there is also the idea of hysteresis, which allows for the unemployment rate to move to a
new equilibrium rate once a shock hits the economy. The bottom line is that while there is some
guidance from economic theory, it is an empirical question whether or not the unemployment rate is
stationary.
b. The t-statistic for the ADF test is -2.84. The critical value at the 5% level is -2.86. Hence you can reject
the null hypothesis of a unit root for the unemployment rate at the 10% level, but (just) fail to reject the
null hypothesis at the 5% level. Most economists treat the unemployment rate as stationary.
c. The ADF statistic is computed using non-robust standard errors. It turns out that under the null
hypothesis of a unit root, the homoskedasticity-only standard errors generate a t-statistic that is robust to
heteroskedasticity.
d. The critical value for the DF-GLS test is -2.58 at the 1% level. Hence you can reject the null hypothesis
of a unit root using this test. The DF-GLS has a higher power when compared to the ADF test, and
hence should be preferred.
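The decisions in (b) and (d) amount to comparing each statistic with a set of one-sided critical values. A minimal Python sketch follows; the intercept-only critical values are assumed to be the large-sample values from the textbook tables, and the 5% DF-GLS value of -1.95 in particular does not appear in the answer above.

```python
# Assumed large-sample critical values for the intercept-only case.
ADF_CRITICAL = {"10%": -2.57, "5%": -2.86, "1%": -3.43}
DFGLS_CRITICAL = {"10%": -1.62, "5%": -1.95, "1%": -2.58}

def rejection_levels(statistic, critical_values):
    """Return the significance levels at which the unit-root null is rejected.

    The tests are one-sided: reject when the statistic is more negative
    than the critical value.
    """
    return [level for level, cv in critical_values.items() if statistic < cv]

print("ADF   :", rejection_levels(-2.84, ADF_CRITICAL))
print("DF-GLS:", rejection_levels(-2.75, DFGLS_CRITICAL))
```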
16.3 Mathematical and Graphical Problems
1) Consider the GARCH(1,1) model $\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \phi_1 \sigma_{t-1}^2$. Show that this model can be rewritten as
$$\sigma_t^2 = \frac{\alpha_0}{1-\phi_1} + \alpha_1 \left( u_{t-1}^2 + \phi_1 u_{t-2}^2 + \phi_1^2 u_{t-3}^2 + \phi_1^3 u_{t-4}^2 + \dots \right).$$
(Hint: use the GARCH(1,1) model but specify it for $\sigma_{t-1}^2$; substitute this expression into the original specification, and so on.) Explain intuitively the meaning of
the resulting formulation.
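Answer:
$$\begin{aligned}
\sigma_t^2 &= \alpha_0 + \alpha_1 u_{t-1}^2 + \phi_1 \sigma_{t-1}^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \phi_1 \left( \alpha_0 + \alpha_1 u_{t-2}^2 + \phi_1 \sigma_{t-2}^2 \right) \\
&= \alpha_0 (1 + \phi_1) + \alpha_1 \left( u_{t-1}^2 + \phi_1 u_{t-2}^2 \right) + \phi_1^2 \sigma_{t-2}^2 \\
&= \alpha_0 (1 + \phi_1 + \phi_1^2) + \alpha_1 \left( u_{t-1}^2 + \phi_1 u_{t-2}^2 + \phi_1^2 u_{t-3}^2 \right) + \phi_1^3 \sigma_{t-3}^2 .
\end{aligned}$$
Continuing with the substitutions infinitely and noting that, for $0 \le \phi_1 < 1$, the sum of the geometric series is
$1 + \phi_1 + \phi_1^2 + \phi_1^3 + \dots = \frac{1}{1-\phi_1}$, you finally arrive at
$$\sigma_t^2 = \frac{\alpha_0}{1-\phi_1} + \alpha_1 \left( u_{t-1}^2 + \phi_1 u_{t-2}^2 + \phi_1^2 u_{t-3}^2 + \phi_1^3 u_{t-4}^2 + \dots \right).$$
This expression states that the variances depend on a weighted average of past squared residuals, where the distant past
receives a smaller weight than more recently observed squared residuals.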
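The equivalence of the recursive and the distributed-lag forms can also be checked numerically. The Python sketch below simulates a GARCH(1,1) process with illustrative parameter values (an assumption of the example) and compares the recursively computed variance with the lag-sum formula truncated after K terms.

```python
import random

alpha0, alpha1, phi1 = 0.2, 0.3, 0.5  # illustrative parameter values

# Simulate squared residuals and the conditional variance recursively.
random.seed(2)
sigma2 = [alpha0 / (1 - alpha1 - phi1)]  # start at the unconditional variance
u2 = [sigma2[0] * random.gauss(0.0, 1.0) ** 2]
for _ in range(400):
    s2 = alpha0 + alpha1 * u2[-1] + phi1 * sigma2[-1]
    sigma2.append(s2)
    u2.append(s2 * random.gauss(0.0, 1.0) ** 2)

# Distributed-lag form, truncated after K terms (phi1**K is negligible).
K = 100
t = len(sigma2) - 1
lag_sum = sum(phi1 ** j * u2[t - 1 - j] for j in range(K))
sigma2_dl = alpha0 / (1 - phi1) + alpha1 * lag_sum

gap = abs(sigma2[t] - sigma2_dl)
print(f"recursive: {sigma2[t]:.6f}, distributed lag: {sigma2_dl:.6f}")
```

The weights `phi1 ** j` decline geometrically, which is the sense in which the distant past receives a smaller weight.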
2) You have collected quarterly data on inflation and unemployment rates for Canada from 1961:III to 1995:IV to
estimate a VAR(4) model of the change in the rate of inflation and the unemployment rate. The results are
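$$\Delta Inf_t = \underset{(.09)}{1.02} - \underset{(.09)}{.54}\,\Delta Inf_{t-1} - \underset{(.09)}{.46}\,\Delta Inf_{t-2} - \underset{(.08)}{.32}\,\Delta Inf_{t-3} - \underset{(.44)}{.01}\,\Delta Inf_{t-4}$$
$$\qquad - \underset{(.43)}{.76}\,Unemp_{t-1} + \underset{(.76)}{.20}\,Unemp_{t-2} - \underset{(.76)}{.16}\,Unemp_{t-3} + \underset{(.44)}{.59}\,Unemp_{t-4}, \qquad R^2 = .26.$$
$$Unemp_t = \underset{(.10)}{0.18} - \underset{(.016)}{.003}\,\Delta Inf_{t-1} - \underset{(.018)}{.016}\,\Delta Inf_{t-2} - \underset{(.017)}{.018}\,\Delta Inf_{t-3} - \underset{(.016)}{.010}\,\Delta Inf_{t-4}$$
$$\qquad + \underset{(.08)}{1.47}\,Unemp_{t-1} - \underset{(.14)}{.46}\,Unemp_{t-2} - \underset{(.14)}{.08}\,Unemp_{t-3} + \underset{(.08)}{.05}\,Unemp_{t-4}, \qquad R^2 = .980.$$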
(a) Explain how you would use the above regressions to conduct one period ahead forecasts.
(b) Should you test for cointegration between the change in the inflation rate and the unemployment rate and,
in the case of finding cointegration here, respecify the above model as a VECM?
(c) The Granger causality test yields the following F-statistics: 3.75 for the test that the coefficients on lagged
unemployment rate in the change of inflation equation are all zero; and 0.36 for the test that the coefficients on
lagged changes in the inflation rate are all zero. Based on these results, does unemployment Granger–cause
inflation? Does inflation Granger-cause unemployment?
Answer: (a) One period ahead forecasts are the same as for the ADL(4,4) models of the inflation rate and
unemployment rate. For example, forecasting the change in the inflation rate for 1996:I requires use of
the actual values for unemployment and change in inflation rates through 1995:IV. The unemployment
rate for 1996:I is forecasted in the same way using the second regression.
(b) Most economic theories suggest that there is no long-run relationship between the inflation rate and
the unemployment rate, or, stated differently, that the long-run Phillips curve is vertical. Hence
economic theory does not suggest testing for cointegration or using the error correction term in a
VECM.
(c) The critical value for the $F_{4,\infty}$ statistic is 3.32 at the 1% significance level, and 1.94 at the 10%
significance level. Based on the calculated F-statistics above you can reject the null hypothesis that
lagged unemployment rates do not Granger-cause the inflation rate, but you cannot reject the null
hypothesis that lagged inflation does not Granger-cause the unemployment rate.
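The comparison with the $F_{4,\infty}$ critical values can be reproduced with a short Python sketch. It relies on the fact that 4 times an $F_{4,\infty}$ statistic is chi-squared with 4 degrees of freedom, whose survival function has a simple closed form; the function name `f4_inf_pvalue` is hypothetical and chosen only for this illustration.

```python
import math

def f4_inf_pvalue(f_stat):
    """p-value of an F(4, infinity) statistic.

    4 * F is chi-squared with 4 degrees of freedom in large samples, and the
    chi-squared(4) survival function is exp(-x/2) * (1 + x/2).
    """
    x = 4.0 * f_stat
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)

for label, f_stat in [("Unemp -> change in Inf", 3.75), ("change in Inf -> Unemp", 0.36)]:
    print(f"{label}: F = {f_stat:.2f}, p-value = {f4_inf_pvalue(f_stat):.4f}")
```

A p-value below 0.01 for the first statistic and above 0.10 for the second corresponds to the decisions stated in (c).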
3) Purchasing power parity (PPP) postulates that the exchange rate between two countries equals the ratio of the
respective price indexes, or $ExchRate = \frac{P^f}{P}$ (where ExchRate is the foreign exchange rate between the two
countries, and P represents the price index, with f indicating the foreign country). The long-run version of PPP
implies that the exchange rate and the price ratio share a common trend.
(a) You collect monthly foreign exchange rate data from 1974:1 to 2002:4 for the U.S./U.K. exchange rate ($/£)
and you collect data on the Consumer Price Index for both countries. Explain how you would use the
Engle–Granger test statistic to investigate the long-run PPP hypothesis.
(b) One of your peers explains that there may be an easier way to test for the validity of PPP. She suggests
simply testing whether or not the “real” exchange rate, or competitiveness, is stationary. (The real exchange rate is
given by $ExchRate \times \frac{P}{P^f}$.) Is she correct? Explain. How would you implement her suggestion? Which
alternative test statistic is available?
Answer: (a) Using the Engle-Granger two-step procedure, the log of the exchange rate will be regressed on the
relative price ratio (log difference of the two prices). The residuals from this regression will then be
subjected to a Dickey-Fuller t-test with an intercept but no time trend. This is the EG-ADF procedure.
However, the OLS estimator of the coefficient in this regression is only consistent if the two variables are
cointegrated. Furthermore, inference can be misleading since the OLS estimator does not have a normal
distribution. If a test is performed on whether the coefficient of the price ratio is unity, then the DOLS
estimator should be used with HAC standard errors.
(b) If PPP holds, then the exchange rate and the relative price ratio will have a cointegrating coefficient of
$\theta = 1$. First the real exchange rate should be plotted to inspect visually whether or not the two variables
are cointegrated. To test this more formally, the real exchange rate should be tested for containing a unit
root, using the ADF statistic. If the null hypothesis is rejected, then this would suggest that PPP holds in
the long-run. Since the ADF test is not the most powerful test, the DF-GLS test can be used as an
alternative.
4) You have collected quarterly Canadian data on the unemployment and the inflation rate from 1962:I to 2001:IV.
You want to re-estimate the ADL(3,1) formulation of the Phillips curve using a GARCH(1,1) specification. The
results are as follows:
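$$\Delta Inf_t = \underset{(.48)}{1.17} - \underset{(.08)}{.56}\,\Delta Inf_{t-1} - \underset{(.10)}{.47}\,\Delta Inf_{t-2} - \underset{(.09)}{.31}\,\Delta Inf_{t-3} - \underset{(.06)}{.13}\,Unemp_{t-1}$$
$$\hat{\sigma}_t^2 = \underset{(.40)}{.86} + \underset{(.11)}{.27}\,u_{t-1}^2 + \underset{(.15)}{.53}\,\sigma_{t-1}^2 .$$
(a) Test the two coefficients for $u_{t-1}^2$ and $\sigma_{t-1}^2$ in the GARCH model individually for statistical significance.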
(b) Estimating the same equation by OLS results in
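$$\Delta Inf_t = \underset{(.54)}{1.19} - \underset{(.10)}{.51}\,\Delta Inf_{t-1} - \underset{(.11)}{.47}\,\Delta Inf_{t-2} - \underset{(.08)}{.28}\,\Delta Inf_{t-3} - \underset{(.07)}{.16}\,Unemp_{t-1}$$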
Briefly compare the estimates. Which of the two methods do you prefer?
(c) Given your results from the test in (a), what can you say about the variance of the error terms in the Phillips
Curve for Canada?
(d) The following figure plots the residuals along with bands of plus or minus one predicted standard
deviation (that is, $\pm \hat{\sigma}_t$) based on the GARCH(1,1) model.
Describe what you see.
Answer: (a) The two t-statistics are .27/.11 = 2.45 and .53/.15 = 3.53 respectively. Since they are normally distributed in large
samples you can use the standard normal distribution for significance testing and the construction of
confidence intervals. The first coefficient is statistically significant at the 5% level, while the second is
statistically significant at the 1% level.
(b) These are two estimation methods, OLS and Maximum Likelihood. The GARCH(1,1) model
produces very similar estimates for the lagged inflation and unemployment rates. The difference stems
from the fact that the two GARCH coefficients are (significantly) different from zero. Since they are
statistically significant, GARCH is the preferred model since it does not constrain the coefficients to zero.
(c) The tests in (a) suggest that the errors are not homoskedastic but conditionally heteroskedastic.
(d) There is changing volatility in the residuals. The conditional standard deviation bands are relatively
tight in the '60s but the uncertainty about inflation forecasts increases steadily. There are periods of
widening bands in the early '80s and '90s, and again at the end of the sample period. These follow
economic recessions.
5) Consider the following model $Y_t = \beta_0 + \beta_1 X_t + \beta_2 X_{t-1} + \beta_3 Y_{t-1} + u_t$, where $X_t$ is strictly exogenous. Show that
by imposing the restriction $\sum_{i=1}^{3} \beta_i = 1$, you can derive the following so-called Error Correction Mechanism
(ECM) model
$$\Delta Y_t = \beta_0 + \beta_1 \Delta X_t - \theta (Y - X)_{t-1} + u_t$$
where $\theta = \beta_1 + \beta_2$. What is the short-run (impact) response of a unit increase in X? What is the long-run
solution? Why do you think the term in parentheses in the above expression is called ECM?
Answer: Starting with $Y_t = \beta_0 + \beta_1 X_t + \beta_2 X_{t-1} + \beta_3 Y_{t-1} + u_t$, subtracting $Y_{t-1}$ from both sides, and adding and
subtracting $\beta_1 X_{t-1}$ on the right-hand side, results in $\Delta Y_t = \beta_0 + \beta_1 \Delta X_t + (\beta_1 + \beta_2) X_{t-1} - (1 - \beta_3) Y_{t-1} + u_t$.
Note that $\sum_{i=1}^{3} \beta_i = 1$ implies $\beta_1 + \beta_2 = 1 - \beta_3$. Since $\theta = \beta_1 + \beta_2$, then
$\Delta Y_t = \beta_0 + \beta_1 \Delta X_t - \theta (Y - X)_{t-1} + u_t$. The impact response is $\frac{\partial Y_t}{\partial X_t} = \beta_1$. The steady-state solution is
$Y = \frac{\beta_0 + \beta_1 g_X - g_Y}{\theta} + X$, where $g_Y$ and
$g_X$ are the steady-state growth rates of Y and X respectively (assuming that the model is in logs). $(Y - X)_{t-1}$
represents the amount of disequilibrium in the previous period. The term is sometimes referred to as
“Equilibrium Correction Mechanism” rather than “Error Correction Mechanism.” If the relationship is in
equilibrium in the previous period, then there is no additional movement in Y other than from the
short-run response.
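The algebra can be verified numerically: under the restriction, the levels equation and the ECM generate the same path for Y. The following Python sketch uses illustrative coefficient values and simulated data, both of which are assumptions of the example.

```python
import random

beta0, beta1, beta2 = 0.1, 0.4, 0.3   # illustrative values
beta3 = 1.0 - beta1 - beta2           # imposes beta1 + beta2 + beta3 = 1
theta = beta1 + beta2

random.seed(3)
x = [0.0]
for _ in range(200):
    x.append(x[-1] + random.gauss(0.0, 1.0))
u = [random.gauss(0.0, 1.0) for _ in x]

y_adl = [0.0]   # path generated by the levels (ADL) form
y_ecm = [0.0]   # path generated by the error-correction form
for t in range(1, len(x)):
    y_adl.append(beta0 + beta1 * x[t] + beta2 * x[t - 1] + beta3 * y_adl[-1] + u[t])
    dy = beta0 + beta1 * (x[t] - x[t - 1]) - theta * (y_ecm[-1] - x[t - 1]) + u[t]
    y_ecm.append(y_ecm[-1] + dy)

max_gap = max(abs(a - b) for a, b in zip(y_adl, y_ecm))
print(f"largest difference between the two paths: {max_gap:.2e}")
```

Any difference between the two paths is floating-point rounding only.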
6) Your textbook states that there “are three ways to decide if two variables can plausibly be modeled as
cointegrated: use expert knowledge and economic theory, graph the series and see whether they appear to
have a common stochastic trend, and perform statistical tests for cointegration. All three ways should be used
in practice.” Accordingly you set out to check whether (the log of) consumption and (the log of) personal
disposable income are cointegrated. You collect data for the sample period 1962:I to 1995:IV and plot the two
variables.
(a) Using the first two methods to examine the series for cointegration, what do you think the likely answer is?
(b) You begin your numerical analysis by testing for a stochastic trend in the variables, using an Augmented
Dickey-Fuller test. The t-statistic for the coefficient of interest is as follows:
Variable (with lag of 1)    t-statistic
LnYpd                       -1.93
ΔLnYpd                      -5.24
LnC                         -2.20
ΔLnC                        -4.31
where LnYpd is (the log of) personal disposable income, and LnC is (the log of) real consumption. The estimated
equation included an intercept for the two growth rates, and, in addition, a deterministic trend for the level
variables. For each case make a decision about the stationarity of the variables based on the critical value of the
Augmented Dickey-Fuller test statistic. Why do you think a trend was included for level variables?
(c) Using the first step of the EG–ADF procedure, you get the following result:
$$\ln C_t = -0.24 + 1.017\,\ln Ypd_t$$
Should you interpret this equation? Would you be impressed if you were told that the regression $R^2$ was 0.998
and that the t-statistic for the slope was 266.06? Why or why not?
(d) The Dickey–Fuller test for the residuals for the cointegrating regressions results in a t-statistic of (–3.64).
State the null and alternative hypothesis and make a decision based on the result.
(e) You want to investigate if the slope of the cointegrating vector is one. To do so, you use the DOLS estimator
and HAC standard errors. The slope coefficient is 1.024 with a standard error of 0.009. Can you reject the null
hypothesis that the slope equals one?
Answer: (a) There are economic theories which postulate that real consumption and real personal disposable
income are proportional to each other in equilibrium. The above figure also suggests that the (log)
difference between the two series is stationary, or that they appear to have a common stochastic trend.
(b) The graph suggests the presence of a time trend. The critical value is (-3.12) at the 10% significance level
and (-3.96) at the 1% level. Hence you cannot reject the null hypothesis that the log levels of
consumption and disposable income contain a unit root. You are able to reject the null hypothesis for the
difference in both variables. Hence both series are I(1).
(c) The equation is estimated using OLS, which is only consistent if consumption and disposable income
are cointegrated. But even if the null hypothesis of a unit root can be rejected, the t-statistic does not
have a normal distribution, even when using HAC standard errors. As a result, inference can be
misleading. The high regression $R^2$ is not surprising, given that the two variables are I(1). This could be
an example of a spurious regression. However, alternative estimators are available, such as DOLS, which
is consistent and efficient in large samples and statistical inference on the coefficient of disposable
income is valid if HAC standard errors are used. Alternatively, the Johansen procedure can be used.
(d) Under the null hypothesis, the residuals from the above regression have a unit root. The EG–ADF
critical value is (-3.41) at the 5% significance level and (-3.96) at the 1% level, so with a statistic of (-3.64) the
null hypothesis is rejected at the 5% level (although not at the 1% level) in favor of the alternative hypothesis
that consumption and disposable income are cointegrated over this period.
(e) The t-statistic on the null hypothesis is 2.67. Hence you can reject the null hypothesis at the 5%
significance level.
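The t-statistic in (e) is the distance of the DOLS estimate from the hypothesized value, measured in HAC standard errors. A minimal Python sketch, with a hypothetical helper name:

```python
def t_statistic(estimate, null_value, std_error):
    """t-statistic for H0: coefficient = null_value."""
    return (estimate - null_value) / std_error

t = t_statistic(1.024, 1.0, 0.009)
print(f"t = {t:.2f}, reject at 5%: {abs(t) > 1.96}")
```

The comparison with 1.96 uses the standard normal distribution, which is appropriate here because the DOLS estimator with HAC standard errors permits normal-based inference in large samples.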
7) Your textbook so far considered variables for cointegration that are integrated of the same order. For example,
the log of consumption and personal disposable income might both be I(1) variables, and the error correction
term would be I(0), if consumption and personal disposable income were cointegrated.
(a) Do you think that it makes sense to test for cointegration between two variables if they are integrated of
different orders? Explain.
(b) Would your answer change if you have three variables, two of which are I(1) while the third is I(0)? Can you
think of an example in this case?
Answer: (a) To test for cointegration requires that the two variables have the same stochastic trend. If one variable
is I(1) while the other is I(0), then obviously they do not have the same stochastic trend and therefore
cannot be cointegrated.
(b) In this case there would possibly be cointegration between the two I(1) variables, but not between all
three variables. This does not imply that the third variable could not enter into the relationship. Think,
for example, about a money demand relationship between the (log of) real money balances, income, and
the nominal interest rate. It may well be that in some samples the nominal interest rate is I(0), while real
money balances and income are I(1). Finding real money balances and income to be cointegrated does
not imply that the nominal interest rate does not enter the money demand function. There is simply no
need for the interest rate to enter the cointegrating relation because it is I(0). The cointegrating relation
only involves zero-frequency relationships between the first differences of real money balances and
income, and the zero-frequency component of the first difference of the interest rate is non-existent.
8) For the United States, there is somewhat conflicting evidence whether or not the inflation rate has a unit
autoregressive root. For example, for the sample period 1962:I to 1999:IV using the ADF statistic, you cannot
reject at the 5% significance level that inflation contains a stochastic trend. However the null hypothesis can be
rejected at the 10% significance level. The DF-GLS test rejects the null hypothesis at the five percent level. This
result turns out to be sensitive to the number of lags chosen and the sample period.
(a) Somewhat intrigued by these findings, you decide to repeat the exercise using Canadian data. Letting the
AIC choose the lag length of the ADF regression, which turns out to be three, the ADF statistic is (-1.91). What
is your decision regarding the null hypothesis?
(b) You also calculate the DF-GLS statistic, which turns out to be (-1.23). Can you reject the null hypothesis in
this case?
(c) Is it possible for the two test statistics to yield different answers and if so, why?
Answer: (a) For the Canadian data, the null hypothesis cannot be rejected even at the 10% significance level.
Hence for the chosen sample period and lag length, the Canadian inflation rate seems to have a
stochastic trend.
(b) The critical value for the DF-GLS statistic is (-1.62) at the 10% significance level. Hence the DF-GLS
test comes to the same conclusion as the test based on the ADF statistic: there is evidence of a stochastic
trend.
(c) The two test statistics can come to different conclusions, although this is not the case with the
Canadian inflation rate. The reason is that the DF-GLS test has more power.
9) You have collected time series for various macroeconomic variables to test if there is a single cointegrating
relationship among multiple variables. Formulate the null hypothesis and compare the EG–ADF statistic to its
critical value.
(a) Canadian unemployment rate, Canadian Inflation Rate, United States unemployment rate, United States
inflation rate; t = (-3.374).
(b) Approval of United States presidents (Gallup poll), cyclical unemployment rate, inflation rate, Michigan
Index of Consumer Sentiment; t = (-3.837).
(c) The log of real GDP, log of real government expenditures, log of real money supply (M2); t = (-2.23).
(d) Briefly explain how you could potentially improve on VAR(p) forecasts by using a cointegrating vector.
Answer: (a) The null hypothesis of a unit root in the error correction term cannot be rejected even at the 10% level.
Hence there is little support for a single cointegrating relationship between these four variables.
(b) The critical value is (-4.20) at the 10% significance level. Hence you cannot reject the null hypothesis
of the error correction term having a unit root.
(c) Since the critical value for three variables is (-3.84) at the 10% significance level, there does not seem
to be a cointegrating relationship between the three variables.
(d) Adding the error correction term from the cointegrating relationship between variables to the
VAR(p) model results in a vector error correction model (VECM). The advantage of this model over a
VAR model is that it incorporates both short-run and long-run information into the forecasting
equation.
10) There has been much talk recently about the convergence of inflation rates between many of the OECD
economies. You want to see if there is evidence of this closer to home by checking whether or not Canada’s
inflation rate and the United States’ inflation rate are cointegrated.
(a) You begin your numerical analysis by testing for a stochastic trend in the variables, using an Augmented
Dickey-Fuller test. The t-statistic for the coefficient of interest is as follows:
Variable (with lag of 1)    t-statistic
InfCan                      -1.93
ΔInfCan                     -6.38
InfUS                       -2.37
ΔInfUS                      -5.63
where InfCan is the Canadian inflation rate, and InfUS is the United States inflation rate. The estimated
equation included an intercept. For each case make a decision about the stationarity of the variables based on
the critical value of the Augmented Dickey-Fuller test statistic.
(b) Your test for cointegration results in an EG–ADF statistic of (–7.34). Can you reject the null hypothesis of a
unit root for the residuals from the cointegrating regression?
(c) Using a working hypothesis that the two inflation rates are cointegrated, you want to test whether or not the
slope coefficient equals one. To do so you estimate the cointegrating equation using the DOLS estimator with
HAC standard errors. The coefficient on the U.S. inflation rate has a value of 0.45 with a standard error of 0.13.
Can you reject the null hypothesis that the slope equals unity?
(d) Even if you could not reject the null hypothesis of a unit slope, would that have been sufficient evidence to
establish convergence?
Answer: (a) The critical value for the ADF is (-2.57) at the 10% significance level for the sample period. Therefore
you cannot reject the null hypothesis that there is a unit root for both inflation rates. However, given the
critical value for the ADF statistic of (-3.43) you can reject the null hypothesis for the difference or the
acceleration in the inflation rates at the 1% significance level. Both price levels appear to be I(2) variables.
(b) Given the critical value of (-3.96) for the EG-ADF statistic, you can reject the null hypothesis of a unit
root in favor of the two inflation rates being cointegrated.
(c) The DOLS estimator allows for statistical inference on the coefficient using the standard normal
distribution. Since 0.45 is more than four standard errors below unity (t = (0.45 - 1)/0.13 = -4.23), you can reject the null
hypothesis of that regression coefficient being one.
(d) Finding a unit slope would not be sufficient for convergence, since it would allow for a constant
difference between the two inflation rates. To have convergence you would need that difference to be
zero.
11) You have re-estimated the two variable VAR model of the change in the inflation rate and the unemployment
rate presented in your textbook using the sample period 1982:I (first quarter) to 2009:IV. To see if the
conclusions regarding Granger causality have changed, you conduct an F-test for this new sample period. The
results are as follows: The F-statistic testing the null hypothesis that the coefficients on $Unemp_{t-1}$, $Unemp_{t-2}$,
$Unemp_{t-3}$, and $Unemp_{t-4}$ are zero in the inflation equation (Equation 16.5 in your textbook) is 6.04. The
F-statistic testing the hypothesis that the coefficients on the four lags of $\Delta Inf_t$ are zero in the unemployment
equation (Equation 16.6 in your textbook) is 0.80.
a.
What is the critical value of the F-statistic in both cases?
b.
Do you think that the unemployment rate Granger-causes changes in the inflation rate?
c.
Do you think that the change in the inflation rate Granger-causes the unemployment rate?
Answer: a. The critical value at the 5% level is $F_{4,\infty} = 2.37$.
b. Given the value of the Granger causality statistic, which is greater than the critical value, you can
reject the null hypothesis, meaning that the unemployment rate is a useful predictor for the change in
the inflation rate. Hence the unemployment rate Granger-causes changes in inflation.
c. In this case, the Granger causality statistic does not exceed the critical value, and hence the conclusion
is that the change in the inflation rate does not Granger-cause the unemployment rate.
12) Consider the following estimated AR(1) model of the change in the inflation rate:
$\widehat{\Delta Inf}_t = 0.05 - 0.31\,\Delta Inf_{t-1}$
(0.14) (0.07)
t = 1982:I to 2009:IV, $R^2$ = 0.10, SER = 2.4
a. Calculate the one-quarter-ahead forecast of both $\Delta Inf_{2010:I}$ and $Inf_{2010:I}$ (the inflation rate in 2009:IV was
2.6 percent, and the change in the inflation rate for that quarter was -1.04).
b. Calculate the forecast for 2010:II using the iterated multiperiod AR forecast both for the change in the
inflation rate and the inflation rate.
c. What alternative method could you have used to forecast two quarters ahead? Write down the
equation for the two-period ahead forecast, using parameters instead of numerical coefficients, which
you would have used.
Answer: a. $\widehat{\Delta Inf}_{2010:I|2009:IV} = 0.05 - 0.31\,\Delta Inf_{2009:IV} = 0.05 - 0.31 \times (-1.04) \approx 0.4$
The forecast is therefore that the inflation rate would increase by 0.4 percentage points, and the
forecast inflation rate for 2010:I would therefore be 3.0 percent.
b. $\widehat{\Delta Inf}_{2010:II|2009:IV} = 0.05 - 0.31\,\widehat{\Delta Inf}_{2010:I|2009:IV} = 0.05 - 0.31 \times 0.4 \approx -0.1$
The forecast is for the inflation rate to decline by 0.1 percentage points. The
forecasted level would therefore be 2.9 percent.
c. The alternative would have been to use the “Direct Multiperiod Forecasts” method. The
estimated equation would have been $\widehat{\Delta Inf}_{2010:II|2009:IV} = \hat{\delta}_0 + \hat{\delta}_1\,\Delta Inf_{2009:IV}$.
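The iterated forecast in parts a and b can be reproduced with a short script. This is only a sketch: the coefficients and the 2009:IV values come from the question, while the function and variable names are illustrative choices.

```python
# Iterated AR(1) forecast for the change in inflation, using the
# coefficients and data quoted in the question.
INTERCEPT = 0.05      # estimated intercept
SLOPE = -0.31         # estimated coefficient on the lagged change
INF_2009Q4 = 2.6      # inflation rate in 2009:IV (percent)
DINF_2009Q4 = -1.04   # change in the inflation rate in 2009:IV


def iterated_forecasts(last_change, last_level, steps):
    """Return a list of (forecast change, forecast level) pairs."""
    results = []
    change, level = last_change, last_level
    for _ in range(steps):
        change = INTERCEPT + SLOPE * change  # AR(1) recursion on the change
        level = level + change               # cumulate changes into levels
        results.append((change, level))
    return results


forecasts = iterated_forecasts(DINF_2009Q4, INF_2009Q4, steps=2)
for quarter, (change, level) in zip(["2010:I", "2010:II"], forecasts):
    print(f"{quarter}: change = {change:.2f}, level = {level:.2f}")
```

Rounded to one decimal place, the script gives the same values as the answer: 0.4 and 3.0 for 2010:I, then -0.1 and 2.9 for 2010:II.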
13) You have collected quarterly data for real GDP (Y) for the United States for the period 1962:I (first quarter) to
2009:IV.
a. Testing the log of GDP for stationarity, you run the following regression (where the lag length was
determined using the AIC):
$\widehat{\Delta \ln Y_t} = 0.03 - 0.0024 \ln Y_{t-1} + 0.253\,\Delta \ln Y_{t-1} + 0.167\,\Delta \ln Y_{t-2}$
(0.03) (0.0014) (0.072) (0.072)
t = 1962:I to 2009:IV, $R^2$ = 0.16, SER = 0.008
Use the ADF statistic with an intercept only to test for stationarity. What is your decision?
b. You have decided to test the growth rate of real GDP for stationarity for the same sample period. The
regression is as follows:
$\widehat{\Delta^2 \ln Y_t} = 0.0041 - 0.543\,\Delta \ln Y_{t-1} - 0.186\,\Delta^2 \ln Y_{t-1}$
(0.0009) (0.082) (0.071)
t = 1962:I to 2009:IV, $R^2$ = 0.36, SER = 0.008
Use the ADF statistic with an intercept only to test for stationarity. What is your decision?
c. Using the orders of integration terminology, what order of integration is the log level of real GDP? The
growth rate?
d. Given that the SER hardly changed in the second equation, why is the regression $R^2$ larger?
Answer: a. The t-statistic for the ADF test is -1.77. The critical value at the 5% level is -2.86. Hence you cannot
reject the null hypothesis of a unit root for the log level of real GDP.
b. The t-statistic for the ADF test is -6.65. The critical value at the 5% level is -2.86. Hence you can reject
the null hypothesis of a unit root for the (quarterly) growth rate of real GDP.
c. The log of real GDP is I(1), the growth rate is I(0); the growth rate is stationary.
d. The TSS must have increased, since $R^2 = 1 - SSR/TSS$: with the SER (and hence the SSR) nearly
unchanged, a larger $R^2$ requires a larger TSS.
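As a rough check of parts a and b, the ADF t-statistics can be recomputed from the printed coefficients and standard errors. The sketch below uses only the rounded values shown in the two regressions, so it gives about -1.71 and -6.62 rather than the -1.77 and -6.65 in the answer. The gap presumably reflects rounding in the printed coefficients, and the test decisions are the same. The function names are illustrative.

```python
# ADF t-statistics from the reported (rounded) coefficients and standard errors.
ADF_CRITICAL_5PCT = -2.86  # intercept-only ADF critical value quoted in the answer


def adf_t_statistic(coefficient, standard_error):
    """t-statistic on the lagged level term in the ADF regression."""
    return coefficient / standard_error


def rejects_unit_root(t_stat, critical_value=ADF_CRITICAL_5PCT):
    """The ADF test is one-sided: reject only for sufficiently negative values."""
    return t_stat < critical_value


t_log_level = adf_t_statistic(-0.0024, 0.0014)  # regression in part a
t_growth = adf_t_statistic(-0.543, 0.082)       # regression in part b

print(f"log level: t = {t_log_level:.2f}, reject = {rejects_unit_root(t_log_level)}")
print(f"growth:    t = {t_growth:.2f}, reject = {rejects_unit_root(t_growth)}")
```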
14) Economic theory suggests that the law of one price holds. Applying this concept to foreign and domestic goods
implies that goods will sell for the same price across countries. The consumer price index is the price for a
basket of goods, and is calculated for countries as a whole. Hence in the absence of barriers to trade, and large
transportation costs (and the fact that not all goods are traded) you should observe Purchasing Power Parity
(PPP) between two countries, or $ExchRate \times P = P^f$, where ExchRate is the foreign exchange rate between the two
countries, and P represents the price index, with f indicating the foreign country. Dividing both sides of the
equation by the domestic price level then gives you the standard formulation for PPP: $ExchRate = P^f / P$. If PPP
holds in the long run, then the exchange rate and the price ratio should share a common trend. Since it is a
long-run concept, cointegration provides an interesting way to test for it.
a. Using monthly data for the U.S./U.K. exchange rate ($/£) and the respective price indexes, you estimate
the following regression:
$ExchRate_t = 0.44 + 0.69\,(\ln P^{US} - \ln P^{UK})$
Collecting the residuals from this regression and using an ADF test for cointegration, you find a
t-statistic of -2.71. Can you reject the null hypothesis of no cointegration? What is the critical value?
b. Was it good econometric practice to test for cointegration right away? What else should you have done
before proceeding with the EG-ADF test?
Answer: a. The critical value is -3.41 and hence the EG-ADF test cannot reject the null hypothesis of no
cointegration.
b. For the regression to establish cointegration, you should test first whether or not the LHS and RHS
variables are of the same order of integration. It is well known that exchange rates follow a random walk
and are therefore I(1) variables, but price indexes are typically of the same order of integration for
countries with similar inflation rates such as the U.K. and the U.S. Hence the RHS variable will likely be
stationary or I(0). (The ADF statistic for the exchange rate is -2.18 while the log price difference has an
ADF statistic of -4.67.)
Chapter 17 The Theory of Linear Regression with One Regressor
17.1 Multiple Choice
1) All of the following are good reasons for an applied econometrician to learn some econometric theory, with the
exception of
A) turning your statistical software from a “black box” into a flexible toolkit from which you are able to
select the right tool for a given job.
B) understanding econometric theory lets you appreciate why these tools work and what assumptions are
required for each tool to work properly.
C) learning how to invert a 4×4 matrix by hand.
D) helping you recognize when a tool will not work well in an application and when it is time for you to look
for a different econometric approach.
Answer: C
2) Finite-sample distributions of the OLS estimator and t-statistics are complicated, unless
A) the regressors are all normally distributed.
B) the regression errors are homoskedastic and normally distributed, conditional on X1 ,... Xn.
C) the Gauss-Markov Theorem applies.
D) the regressor is also endogenous.
Answer: B
3) If, in addition to the least squares assumptions made in the previous chapter on the simple regression model,
the errors are homoskedastic, then the OLS estimator is
A) identical to the TSLS estimator.
B) BLUE.
C) inconsistent.
D) different from the OLS estimator in the presence of heteroskedasticity.
Answer: B
4) When the errors are heteroskedastic, then
A) WLS is efficient in large samples, if the functional form of the heteroskedasticity is known.
B) OLS is biased.
C) OLS is still efficient as long as there is no serial correlation in the error terms.
D) weighted least squares is efficient.
Answer: A
5) The following is not part of the extended least squares assumptions for regression with a single regressor:
A) $\mathrm{var}(u_i | X_i) = \sigma_u^2$.
B) $E(u_i | X_i) = 0$.
C) the conditional distribution of $u_i$ given $X_i$ is normal.
D) $\mathrm{var}(u_i | X_i) = \sigma_{u,i}^2$.
Answer: D
6) The extended least squares assumptions are of interest, because
A) they will often hold in practice.
B) if they hold, then OLS is consistent.
C) they allow you to study additional theoretical properties of OLS.
D) if they hold, we can no longer calculate confidence intervals.
Answer: C
7) Asymptotic distribution theory is
A) not practically relevant, because we never have an infinite number of observations.
B) only of theoretical interest.
C) of interest because it tells you what the distribution approximately looks like in small samples.
D) the distribution of statistics when the sample size is very large.
Answer: D
8) Besides the Central Limit Theorem, the other cornerstone of asymptotic distribution theory is the
A) normal distribution.
B) OLS estimator.
C) Law of Large Numbers.
D) Slutsky’s theorem.
Answer: C
9) The link between the variance of $\bar{Y}$ and the probability that $\bar{Y}$ is within $\pm\delta$ of $\mu_Y$ is provided by
A) Slutsky’s theorem.
B) the Central Limit Theorem.
C) the Law of Large Numbers.
D) Chebychev’s inequality.
Answer: D
10) It is possible for an estimator of $\mu_Y$ to be inconsistent while
A) converging in probability to $\mu_Y$.
B) $S_n \xrightarrow{p} \mu_Y$.
C) unbiased.
D) $\Pr[|S_n - \mu_Y| \geq \delta] \to 0$.
Answer: C
11) Slutsky’s theorem combines the Law of Large Numbers
A) with continuous functions.
B) and the normal distribution.
C) and the Central Limit Theorem.
D) with conditions for the unbiasedness of an estimator.
Answer: C
12) An implication of $\sqrt{n}(\hat{\beta}_1 - \beta_1) \xrightarrow{d} N\left(0, \frac{\mathrm{var}(v_i)}{[\mathrm{var}(X_i)]^2}\right)$ is that
A) $\hat{\beta}_1$ is unbiased.
B) $\hat{\beta}_1$ is consistent.
C) OLS is BLUE.
D) there is heteroskedasticity in the errors.
Answer: B
13) Under the five extended least squares assumptions, the homoskedasticity-only t-statistic in this chapter
A) has a Student t distribution with n-2 degrees of freedom.
B) has a normal distribution.
C) converges in distribution to a $\chi^2_{n-2}$ distribution.
D) has a Student t distribution with n degrees of freedom.
Answer: A
14) You need to adjust $s_{\hat{u}}^2$ by the degrees of freedom to ensure that $s_{\hat{u}}^2$ is
A) an unbiased estimator of $\sigma_u^2$.
B) a consistent estimator of $\sigma_u^2$.
C) efficient in small samples.
D) F-distributed.
Answer: A
15) $E\left[\frac{1}{n-2}\sum_{i=1}^{n} \hat{u}_i^2\right]$
A) is the expected value of the homoskedasticity-only standard errors.
B) $= \sigma_u^2$.
C) exists only asymptotically.
D) $= \sigma_u^2/(n-2)$.
Answer: B
16) The Gauss-Markov Theorem proves that
A) the OLS estimator is t distributed.
B) the OLS estimator has the smallest mean square error.
C) the OLS estimator is unbiased.
D) with homoskedastic errors, the OLS estimator has the smallest variance in the class of linear and unbiased
estimators, conditional on X1 ,…, Xn.
Answer: D
17) The following is not one of the Gauss-Markov conditions:
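A) $\mathrm{var}(u_i | X_1, \ldots, X_n) = \sigma_u^2$, $0 < \sigma_u^2 < \infty$ for $i = 1, \ldots, n$,
B) the errors are normally distributed.
C) $E(u_i u_j | X_1, \ldots, X_n) = 0$, $i = 1, \ldots, n$, $j = 1, \ldots, n$, $i \neq j$
D) $E(u_i | X_1, \ldots, X_n) = 0$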
Answer: B
18) The class of linear conditionally unbiased estimators consists of
A) all estimators of $\beta_1$ that are linear functions of $Y_1, \ldots, Y_n$ and that are unbiased, conditional on $X_1, \ldots, X_n$.
B) OLS, WLS, and TSLS.
C) those estimators that are asymptotically normally distributed.
D) all estimators of $\beta_1$ that are linear functions of $X_1, \ldots, X_n$ and that are unbiased, conditional on $X_1, \ldots, X_n$.
Answer: A
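19) The OLS estimator is a linear estimator, $\hat{\beta}_1 = \sum_{i=1}^{n} \hat{a}_i Y_i$, where $\hat{a}_i =$
A) $\dfrac{X_i - \bar{X}}{\sum_{j=1}^{n} (X_j - \bar{X})^2}$.
B) $\dfrac{1}{n}$.
C) $\dfrac{X_i - \bar{X}}{\sum_{j=1}^{n} (X_j - \bar{X})}$.
D) $\dfrac{X_i}{\sum_{j=1}^{n} (X_j - \bar{X})^2}$.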
Answer: A
20) If the errors are heteroskedastic, then
A) the OLS estimator is still BLUE as long as the regressors are nonrandom.
B) the usual formula cannot be used for the OLS estimator.
C) your model becomes overidentified.
D) the OLS estimator is not BLUE.
Answer: D
21) Estimation by WLS
A) although harder than OLS, will always produce a smaller variance.
B) does not mean that you should use homoskedasticity-only standard errors on the transformed equation.
C) requires quite a bit of knowledge about the conditional variance function.
D) makes it very hard to interpret the coefficients, since the data is now weighted and not any longer in its
original form.
Answer: C
22) The WLS estimator is called infeasible WLS estimator when
A) the memory required to compute it on your PC is insufficient.
B) the conditional variance function is not known.
C) the numbers used to compute the estimator get too large.
D) calculating the weights requires you to take a square root of a negative number.
Answer: B
23) Feasible WLS does not rely on the following condition:
A) the conditional variance depends on a variable which does not have to appear in the regression function.
B) estimating the conditional variance function.
C) the key assumptions for OLS estimation have to apply when estimating the conditional variance function.
D) the conditional variance depends on a variable which appears in the regression function.
Answer: D
24) In practice, the most difficult aspect of feasible WLS estimation is
A) knowing the functional form of the conditional variance.
B) applying the WLS rather than the OLS formula.
C) finding an econometric package that actually calculates WLS.
D) applying WLS when you have a log-log functional form.
Answer: A
25) The advantage of using heteroskedasticity-robust standard errors is that
A) they are easier to compute than the homoskedasticity-only standard errors.
B) they produce asymptotically valid inferences even if you do not know the form of the conditional
variance function.
C) it makes the OLS estimator BLUE, even in the presence of heteroskedasticity.
D) they do not unnecessarily complicate matters, since in real-world applications, the functional form of the
conditional variance can easily be found.
Answer: B
26) Homoskedasticity means that
A) $\mathrm{var}(u_i | X_i) = \sigma_{u_i}^2$
B) $\mathrm{var}(X_i) = \sigma_u^2$
C) $\mathrm{var}(u_i | X_i) = \sigma_u^2$
D) $\mathrm{var}(u_i | X_i) = \hat{\sigma}_{u_i}^2$
Answer: C
27) In order to use the t-statistic for hypothesis testing and constructing a 95% confidence interval as ±1.96
standard errors, the following three assumptions have to hold:
A) the conditional mean of ui , given Xi is zero; (Xi ,Yi), i = 1,2, …, n are i.i.d. draws from their joint
distribution; Xi and ui have four moments
B) the conditional mean of ui , given Xi is zero; (Xi ,Yi), i = 1,2, …, n are i.i.d. draws from their joint
distribution; homoskedasticity
C) the conditional mean of ui , given Xi is zero; (Xi ,Yi), i = 1,2, …, n are i.i.d. draws from their joint
distribution; the conditional distribution of ui given Xi is normal
D) none of the above
Answer: A
28) If the variance of u is quadratic in X, then it can be expressed as
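A) $\mathrm{var}(u_i | X_i) = \theta_0^2$
B) $\mathrm{var}(u_i | X_i) = \theta_0 + \theta_1 X_i^{1/2}$
C) $\mathrm{var}(u_i | X_i) = \theta_0 + \theta_1 X_i^2$
D) $\mathrm{var}(u_i | X_i) = \sigma_u^2$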
Answer: C
29) In practice, you may want to use the OLS estimator instead of the WLS because
A) heteroskedasticity is seldom a realistic problem
B) OLS is easier to calculate
C) heteroskedasticity robust standard errors can be calculated
D) the functional form of the conditional variance function is rarely known
Answer: D
30) If the functional form of the conditional variance function is incorrect, then
A) the standard errors computed by WLS regression routines are invalid
B) the OLS estimator is biased
C) instrumental variable techniques have to be used
D) the regression R2 can no longer be computed
Answer: A
31) Suppose that the conditional variance is $\mathrm{var}(u_i | X_i) = \lambda h(X_i)$ where $\lambda$ is a constant and h is a known function.
The WLS estimator is
A) the same as the OLS estimator since the function is known
B) can only be calculated if you have at least 100 observations
C) the estimator obtained by first dividing the dependent variable and regressor by the square root of h and
then regressing this modified dependent variable on the modified regressor using OLS
D) the estimator obtained by first dividing the dependent variable and regressor by h and then regressing
this modified dependent variable on the modified regressor using OLS
Answer: C
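32) The large-sample distribution of $\hat{\beta}_1$ is
A) $\sqrt{n}(\hat{\beta}_1 - \beta_1) \xrightarrow{d} N\left(0, \frac{\mathrm{var}(v_i)}{[\mathrm{var}(X_i)]^2}\right)$ where $v_i = (X_i - \mu_X) u_i$
B) $\sqrt{n}(\hat{\beta}_1 - \beta_1) \xrightarrow{d} N\left(0, \frac{\mathrm{var}(v_i)}{[\mathrm{var}(X_i)]^2}\right)$ where $v_i = u_i$
C) $\sqrt{n}(\hat{\beta}_1 - \beta_1) \xrightarrow{d} N\left(0, \frac{\mathrm{var}(v_i)}{[\mathrm{var}(X_i)]^2}\right)$ where $v_i = X_i u_i$
D) $\sqrt{n}(\hat{\beta}_1 - \beta_1) \xrightarrow{d} N\left(0, \frac{\sigma_u^2}{[\mathrm{var}(X_i)]^2}\right)$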
Answer: A
33) (Requires Appendix material) If X and Y are jointly normally distributed and are uncorrelated,
A) then their product is chi-square distributed with n-2 degrees of freedom
B) then they are independently distributed
C) then their ratio is t-distributed
D) none of the above is true
Answer: B
34) Assume that $\mathrm{var}(u_i | X_i) = \theta_0 + \theta_1 X_i^2$. One way to estimate $\theta_0$ and $\theta_1$ consistently is to regress
A) $\hat{u}_i$ on $X_i^2$ using OLS
B) $\hat{u}_i^2$ on $X_i^2$ using OLS
C) $\hat{u}_i^2$ on $X_i$ using OLS
D) $\hat{u}_i^2$ on $X_i^2$ using OLS but suppressing the constant (“restricted least squares”)
Answer: B
35) Assume that the variance depends on a third variable, $W_i$, which does not appear in the regression function,
and that $\mathrm{var}(u_i | X_i, W_i) = \theta_0 + \theta_1 \frac{1}{W_i}$. One way to estimate $\theta_0$ and $\theta_1$ consistently is to regress
A) $\hat{u}_i$ on $W_i^2$ using OLS
B) $\hat{u}_i$ on $\frac{1}{W_i}$ using OLS
C) $\hat{u}_i^2$ on $\frac{X_i}{W_i}$ using OLS
D) $\hat{u}_i^2$ on $\frac{1}{W_i}$ using OLS
Answer: D
17.2 Essays and Longer Questions
1) Discuss the properties of the OLS estimator when the regression errors are homoskedastic and normally
distributed. What can you say about the distribution of the OLS estimator when these features are absent?
Answer: In the initial discussion of the OLS estimator, it was established that if the three least squares
assumptions hold, then the OLS estimator is unbiased, consistent, and has an asymptotically normal
distribution. Small sample properties are more difficult to establish, at least in the case when the
regressors are random variables. If the assumption of homoskedasticity is added to the previous
assumptions, then the OLS estimator is efficient in the class of linear and conditionally unbiased
estimators. This result is known as the Gauss-Markov Theorem. Since the proof depends on the
assumption of homoskedasticity, OLS is not efficient in its absence. In that case, an alternative estimator,
WLS, is efficient in large samples. However, the result depends on knowing the functional form of the
heteroskedasticity, so that the parameters can be estimated. If the functional form is unknown, which is
the case in virtually all real-world applications, then the standard errors computed by WLS routines under a
misspecified variance function result in invalid statistical inference.
If the conditional distribution of the errors is normal, then a small sample distribution for the OLS
estimator can be derived using the homoskedasticity-only standard errors. The resulting t-statistic now
follows a Student t distribution.
2) What does the Gauss-Markov theorem prove? Without giving mathematical details, explain how the proof
proceeds. What is its importance?
Answer: The Gauss-Markov Theorem proves that in the class of linear and unbiased estimators the OLS
estimator has the smallest variance or is BLUE. The proof first establishes the conditions under which a
linear estimator is unbiased. It then derives the variance of the estimator. The smallest variance property
is then established by showing that the conditional variance of any other linear and unbiased estimator
exceeds that of the OLS estimator, unless they are the same. To show this it is assumed that the OLS
weights and the weights of any other linear estimator differ by some amount. Substitution of this
condition into the conditional variance formula for any linear and unbiased estimator then shows that
the resulting variance exceeds that of the OLS estimator unless the difference in the weights is zero.
Hence OLS is BLUE. The Gauss-Markov Theorem gave the major justification for the widespread use of
the OLS estimator.
3) One of the earlier textbooks in econometrics, first published in 1971, compared “estimation of a parameter to
shooting at a target with a rifle. The bull’s-eye can be taken to represent the true value of the parameter, the
rifle the estimator, and each shot a particular estimate.” Use this analogy to discuss small and large sample
properties of estimators. How do you think the author approached the $n \to \infty$ condition? (Depending on your
view of the world, feel free to substitute guns with bow and arrow, or missile.)
Answer: Unbiasedness: the shots produce a scatter, but the center of the scatter is the bull’s-eye. If the rifle
produces a scatter of shots that is centered on another point, then the gun is biased.
Efficiency: Requires comparison with other unbiased guns. Looking at the scatters produced by the
shots, the smallest scatter is the one from the efficient gun.
BLUE: Remove all guns which are not linear and/or biased. The gun among these remaining ones which
produces the smallest scatter is the BLUE gun.
Consistency: $n \to \infty$ is the condition as you march towards the bull’s-eye, i.e., the distance becomes
shorter as $n \to \infty$. A shot fired from a consistent gun hits the bull’s-eye with increasing probability as you
get closer to the bull’s-eye. Or, perhaps even better, you might want to substitute “being very close to
the bull’s-eye” for “hitting the bull’s-eye.”
4) “I am an applied econometrician and therefore should not have to deal with econometric theory. There will be
others who I leave that to. I am more interested in interpreting the estimation results.” Evaluate.
Answer: Being presented with regression output and interpreting these uncritically does not allow the applied
econometrician to understand the limitations of the tool. As a result, the interpretation may be false as
might be the case in rejecting hypotheses when standard statistical inference does not apply in the
situation at hand. In particular, having knowledge of econometric theory allows the econometrician to
check whether or not the assumptions, which are necessary for statistical properties to hold, apply in a
given situation. Knowing when to apply and when not to apply certain techniques is essential in
conducting statistical inference, such as hypothesis testing and using confidence intervals. If the applied
econometrician understands the limitations of certain estimation techniques, such as OLS, then she will
be able to look for alternative approaches rather than blindly applying techniques by pushing “buttons”
in econometric software. The above statement therefore seems short-sighted.
5) “One should never bother with WLS. Using OLS with robust standard errors gives correct inference, at least
asymptotically.” True, false, or a bit of both? Explain carefully what the quote means and evaluate it critically.
Answer: WLS is a special case of the GLS estimator. Furthermore, OLS is a special case of the WLS estimator. Both
will produce different estimates of the intercept and the coefficients of the other regressors, and different
estimates of their standard errors. WLS has the advantage over OLS, that it is (asymptotically) more
efficient than OLS. However, the efficiency result depends on knowing the conditional variance
function. When this is the case, the parameters can be estimated and the weights can be specified.
Unfortunately in practice, as Stock and Watson put it, “the functional form of the conditional variance
function is rarely known.” Using an incorrect functional form for the estimation of the parameters results
in incorrect statistical inference. The bottom line is that WLS should be used in those rare instances
where the functional form is known, but not otherwise. Estimation of the parameters using OLS with
heteroskedasticity-robust standard errors, on the other hand, leads to asymptotically valid inferences
even for the case where the functional form of the heteroskedasticity is not known. It therefore seems
that for real world applications the above statement is true.
17.3 Mathematical and Graphical Problems
1) Consider the model $Y_i = \beta_1 X_i + u_i$, where $u_i = c X_i^2 e_i$ and all of the X’s and e’s are i.i.d. and distributed N(0,1).
(a) Which of the Extended Least Squares Assumptions are satisfied here? Prove your assertions.
(b) Would an OLS estimator of $\beta_1$ be efficient here?
(c) How would you estimate $\beta_1$ by WLS?
Answer: (a) The extended least squares assumptions are:
1. $E(u_i | X_i) = E(c X_i^2 e_i | X_i) = 0$ (conditional mean zero): this holds here since the X’s and e’s are i.i.d.
with $E(e_i) = 0$;
2. $(X_i, Y_i)$, i = 1,…, n are independent and identically distributed (i.i.d.) draws from their joint
distribution: this applies here;
3. $(X_i, u_i)$ have nonzero finite fourth moments: this follows from the normal distribution, which has
moments of all orders;
4. $\mathrm{var}(u_i | X_i) = \sigma_u^2$ (homoskedasticity): this fails since $\mathrm{var}(u_i | X_i) = c^2 X_i^4$; and
5. The conditional distribution of $u_i$ given $X_i$ is normal (normal errors): this holds since, given $X_i$,
$u_i = c X_i^2 e_i$ is a constant multiple of the normal variable $e_i$.
(b) Since the model is heteroskedastic, OLS is not efficient; WLS offers efficiency gains.
(c) You would weight each observation by $1/X_i^2$, i.e., regress $Y_i / X_i^2$ on $1/X_i$.
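The efficiency gain in (b) and the transformation in (c) can be illustrated with a small Monte Carlo experiment. This is a sketch under assumed values: the true slope, the constant c, the sample size, and the number of replications are hypothetical choices, not part of the question.

```python
import random
import statistics

BETA1 = 2.0   # hypothetical true slope
C = 1.0       # hypothetical scale constant c in u_i = c * X_i**2 * e_i


def simulate_once(rng, n):
    """Draw one sample and return (OLS estimate, WLS estimate) of beta_1."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    es = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [BETA1 * x + C * x**2 * e for x, e in zip(xs, es)]

    # OLS through the origin: sum(X*Y) / sum(X^2).
    ols = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

    # WLS: regress Y/X^2 on 1/X (no intercept); the transformed error is c*e_i.
    y_t = [y / x**2 for x, y in zip(xs, ys)]
    x_t = [1.0 / x for x in xs]
    wls = sum(a * b for a, b in zip(x_t, y_t)) / sum(a * a for a in x_t)
    return ols, wls


rng = random.Random(0)
draws = [simulate_once(rng, n=200) for _ in range(500)]
ols_estimates = [d[0] for d in draws]
wls_estimates = [d[1] for d in draws]
ols_var = statistics.pvariance(ols_estimates)
wls_var = statistics.pvariance(wls_estimates)
print(f"variance of OLS estimates: {ols_var:.5f}")
print(f"variance of WLS estimates: {wls_var:.5f}")
```

Both estimators center on the true slope, but the WLS estimates should be far less dispersed across replications, in line with the efficiency argument.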
2) (Requires Appendix material) This question requires you to work with Chebychev’s Inequality.
(a) State Chebychev’s Inequality.
(b) Chebychev’s Inequality is sometimes stated in the form “The probability that a random variable is further
than k standard deviations from its mean is less than $1/k^2$.” Deduce this form. (Hint: choose $\delta$ artfully.)
(c) If X is distributed N(0,1), what is the probability that X is more than two standard deviations from its mean? Three?
What is the Chebychev bound for these values?
(d) It is sometimes said that the Chebychev inequality is not “sharp.” What does that mean?
Answer: (a) $\Pr(|V - \mu_V| \geq \delta) \leq \mathrm{var}(V)/\delta^2$, where V is a random variable and $\delta$ is any positive constant.
(b) In the statement of the result, choose $\delta = k\sigma$, where $\sigma^2 = \mathrm{var}(V)$. Then
$\Pr(|V - \mu_V| \geq k\sigma) \leq \sigma^2/(k\sigma)^2 = 1/k^2$.
(c) 0.046 and 0.0027 respectively. (The smallest/largest z-value in Table 1 of the textbook is –2.99/2.99.
Using these values, the second number modifies to 0.0028.) Chebychev’s inequality gives 0.25 and 0.11,
respectively.
(d) This means that, for some distributions, the probability that a random variable is further
than k standard deviations away from its mean is much less than $1/k^2$.
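The numbers in (c) can be checked directly. The sketch below uses the identity Pr(|Z| ≥ k) = erfc(k/√2) for a standard normal Z and compares it with the Chebychev bound 1/k²; the function names are illustrative.

```python
import math


def normal_two_sided_tail(k):
    """Pr(|Z| >= k) for a standard normal Z, via the complementary error function."""
    return math.erfc(k / math.sqrt(2.0))


def chebychev_bound(k):
    """Chebychev upper bound on Pr(|V - mu| >= k * sigma)."""
    return 1.0 / k**2


for k in (2, 3):
    print(f"k = {k}: exact = {normal_two_sided_tail(k):.4f}, "
          f"bound = {chebychev_bound(k):.2f}")
```

The exact normal probabilities are far below the bounds, which is what part (d) means by the inequality not being sharp.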
3) For this question you may assume that linear combinations of normal variates are themselves normally
distributed. Let a, b, and c be non-zero constants.
(a) X and Y are independently distributed as $N(a, \sigma^2)$. What is the distribution of (bX+cY)?
(b) If $X_1, \ldots, X_n$ are distributed i.i.d. as $N(a, \sigma_X^2)$, what is the distribution of $\frac{1}{n}\sum_{i=1}^{n} X_i$?
(c) Draw this distribution for different values of n. What is the asymptotic distribution of this statistic?
(d) Comment on the relationship between your diagram and the concept of consistency.
(e) Let $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$. What is the distribution of $\sqrt{n}(\bar{X} - a)$? Does your answer depend on n?
Answer: (a) $E(bX + cY) = bE(X) + cE(Y) = a(b + c)$; $\mathrm{var}(bX + cY) = b^2\,\mathrm{var}(X) + c^2\,\mathrm{var}(Y) = (b^2 + c^2)\sigma^2$.
Hence (bX+cY) is distributed $N(a(b + c), \sigma^2(b^2 + c^2))$.
(b) From (a) it follows that this is distributed as $N(a, \sigma_X^2/n)$.
(c) The curves will be normal curves centered on a, but becoming spike-like as n grows.
(d) The diagram shows that, as n grows, the probability distribution concentrates on a. The probability of
observing a value of $\frac{1}{n}\sum_{i=1}^{n} X_i$ appreciably different from a becomes small as n grows. This is consistency.
(e) $\sqrt{n}(\bar{X} - a)$ is distributed $N(0, \sigma_X^2)$. This does not depend on n, in contrast to the large-sample
non-normal case where this distribution is only approached as n grows.
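Parts (b) and (e) can be illustrated by simulation. The sketch below assumes hypothetical values a = 1 and σ_X = 2 and a handful of sample sizes. It shows the variance of the sample mean shrinking like σ_X²/n while the variance of √n(X̄ − a) stays near σ_X².

```python
import random
import statistics

A = 1.0        # hypothetical mean a
SIGMA = 2.0    # hypothetical standard deviation of each X_i


def simulate_means(rng, n, reps):
    """Return `reps` sample means, each computed from n i.i.d. N(A, SIGMA^2) draws."""
    return [sum(rng.gauss(A, SIGMA) for _ in range(n)) / n for _ in range(reps)]


rng = random.Random(1)
results = {}
for n in (10, 100, 400):
    means = simulate_means(rng, n, reps=2000)
    var_mean = statistics.pvariance(means)  # should be near SIGMA^2 / n
    var_scaled = n * var_mean               # variance of sqrt(n) * (Xbar - a)
    results[n] = (var_mean, var_scaled)
    print(f"n = {n:3d}: var(Xbar) = {var_mean:.4f}, "
          f"var(sqrt(n)(Xbar - a)) = {var_scaled:.3f}")
```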
4) Consider the model $Y_i = \beta_1 X_i + u_i$, where the $X_i$ and the $u_i$ are mutually independent i.i.d. random variables
with finite fourth moment and $E(u_i) = 0$.
(a) Let $\hat{\beta}_1$ denote the OLS estimator of $\beta_1$. Show that
$\sqrt{n}(\hat{\beta}_1 - \beta_1) = \dfrac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n} X_i u_i}{\frac{1}{n}\sum_{i=1}^{n} X_i^2}$.
(b) What is the mean and the variance of $\frac{1}{\sqrt{n}}\sum_{i=1}^{n} X_i u_i$? Assuming that the Central Limit Theorem holds, what is
its limiting distribution?
(c) Deduce the limiting distribution of $\sqrt{n}(\hat{\beta}_1 - \beta_1)$. State what theorems are necessary for your deduction.
Answer: (a) The OLS estimator in this case is $\hat{\beta}_1 = \dfrac{\sum_{i=1}^{n} X_i Y_i}{\sum_{i=1}^{n} X_i^2}$. Substituting for $Y_i$ into the estimator and
re-arranging terms then gives the above expression.
(b) The mean is zero and the variance is obtained from
$\mathrm{var}\left(\frac{1}{\sqrt{n}}\sum_{i=1}^{n} X_i u_i\right) = \frac{1}{n}\, n\, \mathrm{var}(X_i u_i) = \sigma_u^2 E(X_i^2)$.
If the Central Limit Theorem holds, then this will be distributed $N(0, \sigma_u^2 E(X_i^2))$.
(c) Let $\sqrt{n}(\hat{\beta}_1 - \beta_1) = \dfrac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n} X_i u_i}{\frac{1}{n}\sum_{i=1}^{n} X_i^2} = \dfrac{x_N}{b_N}$, say. Then $x_N$ approaches $N(0, \sigma_u^2 E(X_i^2))$ in distribution, and
$b_N$ approaches $E(X_i^2)$ in probability (Law of Large Numbers). It follows that $\frac{x_N}{b_N}$ approaches $\frac{x}{b}$ in distribution, which is
distributed $N(0, \sigma_u^2 / E(X_i^2))$ (Slutsky’s theorem).
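The limiting variance in (c) can be checked by simulation. The following is a sketch under assumed distributions: X_i ~ N(1, 1), so that E(X_i²) = 2, and normal errors with standard deviation 2. None of these values come from the question.

```python
import math
import random
import statistics

BETA1 = 1.5     # hypothetical true slope
SIGMA_U = 2.0   # hypothetical error standard deviation
# X_i ~ N(1, 1) is an illustrative choice, so E(X_i^2) = var + mean^2 = 2.
E_X2 = 2.0


def scaled_error(rng, n):
    """Return sqrt(n) * (beta1_hat - beta1) for one simulated sample."""
    xs = [rng.gauss(1.0, 1.0) for _ in range(n)]
    us = [rng.gauss(0.0, SIGMA_U) for _ in range(n)]
    ys = [BETA1 * x + u for x, u in zip(xs, us)]
    beta1_hat = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return math.sqrt(n) * (beta1_hat - BETA1)


rng = random.Random(2)
draws = [scaled_error(rng, n=500) for _ in range(2000)]
simulated_var = statistics.pvariance(draws)
theoretical_var = SIGMA_U**2 / E_X2   # sigma_u^2 / E(X_i^2)
print(f"simulated variance:   {simulated_var:.3f}")
print(f"theoretical variance: {theoretical_var:.3f}")
```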
5) (Requires Appendix material) If the Gauss-Markov conditions hold, then OLS is BLUE. In addition, assume
here that X is nonrandom. Your textbook proves the Gauss-Markov theorem by using the simple regression
model $Y_i = \beta_0 + \beta_1 X_i + u_i$ and assuming a linear estimator $\tilde{\beta}_1 = \sum_{i=1}^{n} a_i Y_i$. Substitution of the simple regression
model into this expression then results in two conditions for the unbiasedness of the estimator:
$\sum_{i=1}^{n} a_i = 0$ and $\sum_{i=1}^{n} a_i X_i = 1$.
The variance of the estimator is $\mathrm{var}(\tilde{\beta}_1 | X_1, \ldots, X_n) = \sigma_u^2 \sum_{i=1}^{n} a_i^2$.
Different from your textbook, use the Lagrangian method to minimize the variance subject to the two
constraints. Show that the resulting weights correspond to the OLS weights.
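Answer: Define the Lagrangian as follows:
$L = \sigma_u^2 \sum_{i=1}^{n} a_i^2 - \lambda_1 \sum_{i=1}^{n} a_i - \lambda_2 \left(\sum_{i=1}^{n} a_i X_i - 1\right)$.
To obtain the first order conditions, take the (n+2) derivatives with respect to the n weights and the two
Lagrange multipliers and set these to zero.
$\dfrac{\partial L}{\partial a_i} = 0 = 2 a_i \sigma_u^2 - \lambda_1 - \lambda_2 X_i$; i = 1,..., n
$\dfrac{\partial L}{\partial \lambda_1} = 0 \;\Rightarrow\; \sum_{i=1}^{n} a_i = 0$
$\dfrac{\partial L}{\partial \lambda_2} = 0 \;\Rightarrow\; \sum_{i=1}^{n} a_i X_i - 1 = 0$
Using the summation operator on both sides of the first equation and bringing the first constraint into
play then gives $\lambda_1 = -\lambda_2 \bar{X}$. Using this result in the first equation to eliminate the first Lagrange
multiplier results in the following conditions for the n weights: $2 a_i \sigma_u^2 = \lambda_2 (X_i - \bar{X})$. To bring the second
constraint into play, multiply both sides by $X_i$ and use the summation operator on both sides again:
$2 \sigma_u^2 \sum_{i=1}^{n} a_i X_i = \lambda_2 \sum_{i=1}^{n} (X_i - \bar{X}) X_i$ or $2 \sigma_u^2 = \lambda_2 \sum_{i=1}^{n} (X_i - \bar{X})^2$.
Substituting the result for the second Lagrange multiplier, $\lambda_2 = \dfrac{2 \sigma_u^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$, into $2 a_i \sigma_u^2 = \lambda_2 (X_i - \bar{X})$ then gives
$2 a_i \sigma_u^2 = \dfrac{2 \sigma_u^2}{\sum_{j=1}^{n} (X_j - \bar{X})^2} (X_i - \bar{X})$ and after simplifying $a_i = \dfrac{X_i - \bar{X}}{\sum_{j=1}^{n} (X_j - \bar{X})^2}$.
But these are the OLS weights, since the OLS slope estimator is defined as follows:
$\hat{\beta}_1 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X}) Y_i}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \sum_{i=1}^{n} w_i Y_i$, where $w_i = \dfrac{X_i - \bar{X}}{\sum_{j=1}^{n} (X_j - \bar{X})^2}$.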
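The first-order conditions and both constraints can be checked numerically for the candidate weights. The sketch below uses arbitrary regressor values and an assumed error variance of one; neither comes from the question.

```python
import random

SIGMA2_U = 1.0  # hypothetical error variance; it cancels out of the weights

rng = random.Random(3)
xs = [rng.uniform(0.0, 10.0) for _ in range(8)]  # arbitrary fixed regressors
n = len(xs)
x_bar = sum(xs) / n
ssx = sum((x - x_bar) ** 2 for x in xs)

# Candidate solution from the derivation.
weights = [(x - x_bar) / ssx for x in xs]
lambda2 = 2.0 * SIGMA2_U / ssx
lambda1 = -lambda2 * x_bar

# First-order conditions: 2*a_i*sigma^2 - lambda1 - lambda2*X_i = 0 for every i.
foc = [2.0 * a * SIGMA2_U - lambda1 - lambda2 * x for a, x in zip(weights, xs)]
sum_a = sum(weights)                               # constraint: should be 0
sum_ax = sum(a * x for a, x in zip(weights, xs))   # constraint: should be 1

print("largest FOC residual:", max(abs(v) for v in foc))
print("sum a_i     =", sum_a)
print("sum a_i X_i =", sum_ax)
```

Because the objective is a convex quadratic and the constraints are linear, a point that satisfies these conditions is the constrained minimum.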
6) Your textbook states that an implication of the Gauss-Markov theorem is that the sample average, $\bar{Y}$, is the
most efficient linear estimator of $E(Y_i)$ when $Y_1, \ldots, Y_n$ are i.i.d. with $E(Y_i) = \mu_Y$ and $\mathrm{var}(Y_i) = \sigma_Y^2$. This follows
from the regression model with no slope and the fact that the OLS estimator is BLUE.
Provide a proof by assuming a linear estimator in the Y’s, $\tilde{\mu} = \sum_{i=1}^{n} a_i Y_i$.
(a) State the condition under which this estimator is unbiased.
(b) Derive the variance of this estimator.
(c) Minimize this variance subject to the constraint (condition) derived in (a) and show that the sample mean is
BLUE.
Answer: (a) $E(\tilde{\mu}) = E\left(\sum_{i=1}^{n} a_i Y_i\right) = \sum_{i=1}^{n} a_i E(Y_i) = \mu_Y \sum_{i=1}^{n} a_i$. Hence for this to be an unbiased estimator, the
following condition must hold: $\sum_{i=1}^{n} a_i = 1$.
(b) $\mathrm{var}(\tilde{\mu}) = E(\tilde{\mu} - E(\tilde{\mu}))^2 = E\left(\sum_{i=1}^{n} a_i Y_i - \mu_Y\right)^2 = \sum_{i=1}^{n} a_i^2 E(Y_i - \mu_Y)^2 = \sigma_Y^2 \sum_{i=1}^{n} a_i^2$.
(c) Define the Lagrangian $L = \sigma_Y^2 \sum_{i=1}^{n} a_i^2 - \lambda \left(\sum_{i=1}^{n} a_i - 1\right)$, where $\lambda$ is the Lagrange multiplier. To obtain
the first order conditions, minimize L with respect to the n weights and the Lagrange multiplier, and
solve the resulting (n+1) equations in the (n+1) unknowns.
$\dfrac{\partial L}{\partial a_i} = 0 = 2 \sigma_Y^2 a_i - \lambda$; i = 1,..., n
$\dfrac{\partial L}{\partial \lambda} = 0 \;\Rightarrow\; \sum_{i=1}^{n} a_i - 1 = 0$
Summing the first equation, $2 \sigma_Y^2 \sum_{i=1}^{n} a_i = n\lambda$, and bringing in the second equation subsequently, results
in $\lambda = \dfrac{2 \sigma_Y^2}{n}$. Substituting this result into the first equation then gives $2 \sigma_Y^2 a_i = \dfrac{2 \sigma_Y^2}{n}$; i = 1,..., n, or $a_i = \dfrac{1}{n}$; i
= 1,..., n. Since these are also the OLS weights, then OLS is BLUE.
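A quick numerical illustration of (c): among weights that sum to one, equal weights give the smallest value of σ_Y² Σ a_i². The sketch below compares the sample-mean weights with randomly drawn unbiased weights; the variance, the sample size, and the range of the random weights are hypothetical choices.

```python
import random

SIGMA2_Y = 4.0  # hypothetical variance of each Y_i
N = 10


def estimator_variance(weights):
    """Variance of sum(a_i * Y_i) for i.i.d. Y_i: sigma_Y^2 * sum(a_i^2)."""
    return SIGMA2_Y * sum(a * a for a in weights)


equal_weights = [1.0 / N] * N
sample_mean_var = estimator_variance(equal_weights)  # sigma_Y^2 / N

rng = random.Random(4)
other_vars = []
for _ in range(1000):
    raw = [rng.uniform(-1.0, 2.0) for _ in range(N)]
    total = sum(raw)
    if abs(total) < 1e-6:
        continue  # skip draws that cannot be rescaled to sum to one
    weights = [r / total for r in raw]  # rescale so the weights sum to one
    other_vars.append(estimator_variance(weights))

print(f"variance with equal weights: {sample_mean_var:.4f}")
print(f"smallest variance among random unbiased weights: {min(other_vars):.4f}")
```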
7) (Requires Appendix material) State and prove Chebychev’s Inequality.
Answer: The proof here reproduces the relevant section from Appendix 15.2 of the textbook. Chebychev's
inequality uses the variance of the random variable V to bound the probability that V is farther than $\pm\delta$
from its mean, where $\delta$ is a positive constant:
$\Pr(|V - \mu_V| \geq \delta) \leq \mathrm{var}(V)/\delta^2$ (Chebychev's inequality).
Proof. Let $W = V - \mu_V$, let f be the p.d.f. of W, and let $\delta$ be any positive number. Now,
$E(W^2) = \int_{-\infty}^{\infty} w^2 f(w)\,dw$
$= \int_{-\infty}^{-\delta} w^2 f(w)\,dw + \int_{-\delta}^{\delta} w^2 f(w)\,dw + \int_{\delta}^{\infty} w^2 f(w)\,dw$
$\geq \int_{-\infty}^{-\delta} w^2 f(w)\,dw + \int_{\delta}^{\infty} w^2 f(w)\,dw$
$\geq \delta^2 \left[\int_{-\infty}^{-\delta} f(w)\,dw + \int_{\delta}^{\infty} f(w)\,dw\right]$
$= \delta^2 \Pr(|W| \geq \delta)$,
where the first equality is the definition of $E(W^2)$, the second equality holds because the range of
integration divides up the real line, the first inequality holds because the term that was dropped is
nonnegative, the second inequality holds because $w^2 \geq \delta^2$ over the range of integration, and the final
equality holds by the definition of $\Pr(|W| \geq \delta)$. Substituting $W = V - \mu_V$ into the final expression, noting
that $E(W^2) = E[(V - \mu_V)^2] = \mathrm{var}(V)$, and rearranging yields the inequality.
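The bound can be checked on a concrete case. The following sketch assumes a fair six-sided die as the random variable V (an illustrative choice, not from the textbook) and compares the exact tail probability with the Chebychev bound for a few values of delta.

```python
from fractions import Fraction

values = [1, 2, 3, 4, 5, 6]   # outcomes of a fair die (illustrative assumption)
prob = Fraction(1, 6)

mean = sum(v * prob for v in values)
var = sum((v - mean) ** 2 * prob for v in values)

def tail_probability(delta):
    """Exact Pr(|V - mean| >= delta)."""
    return sum(prob for v in values if abs(v - mean) >= delta)

def chebychev_bound(delta):
    """Upper bound var(V) / delta^2 from Chebychev's inequality."""
    return var / (delta * delta)

for delta in (Fraction(3, 2), Fraction(2), Fraction(5, 2)):
    print(delta, tail_probability(delta), chebychev_bound(delta))
```

The bound is loose for small delta (it can exceed one) and tightens as delta grows.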
8) Consider the simple regression model $Y_i = \beta_0 + \beta_1 X_i + u_i$ where $X_i > 0$ for all i, and the conditional variance is
$\mathrm{var}(u_i | X_i) = \theta X_i^2$ where $\theta$ is a known constant with $\theta > 0$.
(a) Write the weighted regression as $\tilde{Y}_i = \beta_0 \tilde{X}_{0i} + \beta_1 \tilde{X}_{1i} + \tilde{u}_i$. How would you construct $\tilde{Y}_i$, $\tilde{X}_{0i}$ and $\tilde{X}_{1i}$?
(b) Prove that $\tilde{u}_i$ is homoskedastic.
(c) Which coefficient is the intercept in the modified regression model? Which is the slope?
(d) When interpreting the regression results, which of the two equations should you use, the original or the
modified model?
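Answer: (a) $\tilde{Y}_i = \frac{Y_i}{X_i}$, $\tilde{X}_{0i} = \frac{1}{X_i}$, and $\tilde{X}_{1i} = \frac{X_i}{X_i} = 1$.
(b) $\mathrm{var}(\tilde{u}_i | X_i) = \mathrm{var}\left(\frac{u_i}{X_i} \Big| X_i\right) = \frac{\mathrm{var}(u_i | X_i)}{X_i^2} = \frac{\theta X_i^2}{X_i^2} = \theta$, which is constant.
(c) The coefficient on $\tilde{X}_{1i}$ is now the intercept, while the coefficient on $\tilde{X}_{0i}$ is the slope.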
(d) The modified model is simply used to obtain estimates of the original model. The modified model
should therefore not be used for interpretation.
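A small sketch of the transformation in part (a), under the assumption of noise-free hypothetical data generated from Y = 2 + 3X. It divides every variable by X, runs OLS on the two transformed regressors without an extra constant, and recovers the coefficients of the original model. The function names are illustrative, not from the textbook.

```python
def ols_two_regressors_no_constant(x0, x1, y):
    """OLS of y on x0 and x1 (no added constant) via the 2x2 normal equations."""
    s00 = sum(a * a for a in x0)
    s01 = sum(a * b for a, b in zip(x0, x1))
    s11 = sum(b * b for b in x1)
    s0y = sum(a * c for a, c in zip(x0, y))
    s1y = sum(b * c for b, c in zip(x1, y))
    det = s00 * s11 - s01 * s01
    b0 = (s11 * s0y - s01 * s1y) / det
    b1 = (s00 * s1y - s01 * s0y) / det
    return b0, b1

def wls_by_transformation(x, y):
    """Divide Y, the constant, and X by X_i, then run OLS on the transformed data."""
    y_t = [yi / xi for xi, yi in zip(x, y)]
    x0_t = [1.0 / xi for xi in x]   # transformed constant: 1 / X_i
    x1_t = [1.0 for _ in x]         # transformed regressor: X_i / X_i = 1
    return ols_two_regressors_no_constant(x0_t, x1_t, y_t)

x = [1.0, 2.0, 4.0, 5.0]             # hypothetical regressor values, all positive
y = [2.0 + 3.0 * xi for xi in x]     # hypothetical noise-free outcomes
beta0, beta1 = wls_by_transformation(x, y)
print(beta0, beta1)
```

The coefficient on 1/X estimates the original intercept and the coefficient on the column of ones estimates the original slope, which is why the results are read in terms of the original model.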
9) (Requires Appendix material) Your textbook considers various distributions such as the standard normal, t, $\chi^2$,
and F distribution, and relationships between them.
(a) Using statistical tables, give examples that the following relationship holds: $F_{n_1,\infty} = \frac{\chi^2_{n_1}}{n_1}$.
(b) $t_\infty$ is distributed standard normal, and the square of the t-distribution with $n_2$ degrees of freedom equals
the value of the F distribution with (1, $n_2$) degrees of freedom. Why does this relationship between the t and F
distribution hold?
Answer: (a) For example, the critical value at the 10% significance level for the $F_{30,\infty}$ distribution is 1.34. The critical value at the 10%
significance level for the $\chi^2_{30}$ distribution is 40.26, and dividing by 30 results in 1.34.
(b) The textbook states that if $W_1$ and $W_2$ are independent random variables with chi-squared
distributions and respective degrees of freedom $n_1$ and $n_2$, then the random variable
$F = \frac{W_1/n_1}{W_2/n_2}$
has an F distribution with ($n_1$, $n_2$) degrees of freedom. This distribution is denoted $F_{n_1,n_2}$. For the
t-distribution, the following holds: Let Z have a standard normal distribution, let W have a $\chi^2_m$
distribution, and let Z and W be independently distributed. Then the random variable
$t = \frac{Z}{\sqrt{W/m}}$
has a Student t distribution with m degrees of freedom, denoted $t_m$. Squaring this term gives $t^2 = \frac{Z^2}{W/m}$.
But if $Z_1, Z_2, \ldots, Z_n$ are n i.i.d. standard normal random variables, then the random variable
$W = \sum_{i=1}^{n} Z_i^2$
has a chi-squared distribution with n degrees of freedom. Hence $Z^2$, the square of a standard normal
variable, has a chi-squared distribution with one degree of freedom. This gives $t^2 = \frac{Z^2/1}{W/m} = F_{1,m}$.
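The two table relationships can be checked with simple arithmetic. The sketch below uses the chi-squared value quoted in the answer and, as an added illustration, the familiar 5% two-sided standard normal critical value of 1.96, whose square matches the 5% critical value of 3.84 for the $F_{1,\infty}$ (and $\chi^2_1$) distribution.

```python
# Critical value of the chi-squared distribution with 30 d.o.f. at the 10% level,
# as quoted in the answer to part (a).
chi2_30_cv_10pct = 40.26
f_30_inf_cv_10pct = round(chi2_30_cv_10pct / 30, 2)

# 5% two-sided critical value of the standard normal (t with infinite d.o.f.).
z_cv_5pct_two_sided = 1.96
f_1_inf_cv_5pct = round(z_cv_5pct_two_sided ** 2, 2)

print(f_30_inf_cv_10pct, f_1_inf_cv_5pct)
```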
10) Consider estimating a consumption function from a large cross-section sample of households. Assume that
households at lower income levels do not have as much discretion for consumption variation as households
with high income levels. After all, if you live below the poverty line, then almost all of your income is spent on
necessities, and there is little room to save. On the other hand, if your annual income was $1 million, you could
save quite a bit if you were a frugal person, or spend it all, if you prefer. Sketch what the scatterplot between
consumption and income would look like in such a situation. What functional form do you think could
approximate the conditional variance $\mathrm{var}(u_i | Income)$?
Answer: See the accompanying figure. $\mathrm{var}(u_i | Income)$ could be $a + b \times Income$ or $a + b \times Income^2$. Hence there would
be heteroskedasticity.
Chapter 18 The Theory of Multiple Regression
18.1 Multiple Choice
1) The extended least squares assumptions in the multiple regression model include four assumptions from
Chapter 6 (ui has conditional mean zero; (Xi,Yi), i = 1,…, n are i.i.d. draws from their joint distribution; Xi and
ui have nonzero finite fourth moments; there is no perfect multicollinearity). In addition, there are two further
assumptions, one of which is
A) heteroskedasticity of the error term.
B) serial correlation of the error term.
C) homoskedasticity of the error term.
D) invertibility of the matrix of regressors.
Answer: C
2) The difference between the central limit theorems for scalar and vector-valued random variables is
A) that n approaches infinity in the central limit theorem for scalars only.
B) the conditions on the variances.
C) that single random variables can have an expected value but vectors cannot.
D) the homoskedasticity assumption in the former but not the latter.
Answer: B
3) The Gauss-Markov theorem for multiple regression states that the OLS estimator
A) has the smallest variance possible for any linear estimator.
B) is BLUE if the Gauss-Markov conditions for multiple regression hold.
C) is identical to the maximum likelihood estimator.
D) is the most commonly used estimator.
Answer: B
4) The GLS assumptions include all of the following, with the exception of
A) the Xi are fixed in repeated samples.
B) Xi and ui have nonzero finite fourth moments.
C) $E(UU'|X) = \Omega(X)$, where $\Omega(X)$ is n×n matrix-valued that can depend on X.
D) $E(U|X) = 0_n$.
Answer: A
5) The multiple regression model can be written in matrix form as follows:
A) Y = X .
B) Y = X + U.
C) Y = X + U.
D) $Y = X\beta + U$.
Answer: D
6) The linear multiple regression model can be represented in matrix notation as $Y = X\beta + U$, where X is of order
n×(k+1). k represents the number of
A) regressors.
B) observations.
C) regressors excluding the “constant” regressor for the intercept.
D) unknown regression coefficients.
Answer: C
7) The multiple regression model in matrix form $Y = X\beta + U$ can also be written as
A) $Y_i = \beta_0 + X_i'\beta + u_i$, i = 1,…, n.
B) $Y_i = X_i'\beta_i$, i = 1,…, n.
C) $Y_i = X_i\beta + u_i$, i = 1,…, n.
D) $Y_i = X_i'\beta + u_i$, i = 1,…, n.
Answer: D
8) The assumption that X has full column rank implies that
A) the number of observations equals the number of regressors.
B) binary variables are absent from the list of regressors.
C) there is no perfect multicollinearity.
D) none of the regressors appear in natural logarithm form.
Answer: C
9) One implication of the extended least squares assumptions in the multiple regression model is that
A) feasible GLS should be used for estimation.
B) $E(U|X) = I_n$.
C) $X'X$ is singular.
D) the conditional distribution of U given X is $N(0_n, \sigma_u^2 I_n)$.
Answer: D
10) One of the properties of the OLS estimator is
A) $X\hat{\beta} = 0_{k+1}$.
B) that the coefficient vector $\hat{\beta}$ has full rank.
C) $X'(Y - X\hat{\beta}) = 0_{k+1}$.
D) $(X'X)^{-1} = X'Y$
Answer: C
11) Minimization of $\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - \ldots - b_k X_{ki})^2$ results in
A) $X'Y = X\hat{\beta}$.
B) $X\hat{\beta} = 0_{k+1}$.
C) $X'(Y - X\hat{\beta}) = 0_{k+1}$.
D) $R\beta = r$.
Answer: C
12) The Gauss-Markov theorem for multiple regression proves that
A) MX is an idempotent matrix.
B) the OLS estimator is BLUE.
C) the OLS residuals and predicted values are orthogonal.
D) the variance-covariance matrix of the OLS estimator is $\sigma_u^2 (X'X)^{-1}$.
Answer: B
13) The GLS estimator is defined as
A) $(X'\Omega^{-1}X)^{-1}(X'\Omega^{-1}Y)$.
B) $(X'X)^{-1}X'Y$.
C) $A'Y$.
D) $(X'X)^{-1}X'U$.
Answer: A
14) The OLS estimator
A) has the multivariate normal asymptotic distribution in large samples.
B) is t-distributed.
C) has the multivariate normal distribution regardless of the sample size.
D) is F-distributed.
Answer: A
15) $\hat{\beta} - \beta$
A) cannot be calculated since the population parameter is unknown.
B) $= (X'X)^{-1}X'U$.
C) $= \hat{Y} - Y$.
D) $= \beta + (X'X)^{-1}X'U$
Answer: B
16) The heteroskedasticity-robust estimator of $\Sigma_{\sqrt{n}(\hat{\beta} - \beta)}$ is obtained
A) from $(X'X)^{-1}X'U$.
B) by replacing the population moments in its definition by the identity matrix.
C) from feasible GLS estimation.
D) by replacing the population moments in its definition by sample moments.
Answer: D
17) A joint hypothesis that is linear in the coefficients and imposes a number of restrictions can be written as
A) $(X'X)^{-1}X'Y$.
B) $R\beta = r$.
C) $\hat{\beta} - \beta$.
D) $R\beta = 0$.
Answer: B
18) Let there be q joint hypotheses to be tested. Then the dimension of r in the expression
$R\beta = r$ is
A) q × 1.
B) q × (k+1).
C) (k+1) × 1.
D) q.
Answer: A
19) The formulation $R\beta = r$ to test hypotheses
A) allows for restrictions involving both multiple regression coefficients and single regression coefficients.
B) is F-distributed in large samples.
C) allows only for restrictions involving multiple regression coefficients.
D) allows for testing linear as well as nonlinear hypotheses.
Answer: A
20) Let $P_X = X(X'X)^{-1}X'$ and $M_X = I_n - P_X$. Then $M_X M_X =$
A) $X(X'X)^{-1}X' - P_X$.
B) $M_X^2$
C) $I_n$.
D) $M_X$.
Answer: D
21) In the case when the errors are homoskedastic and normally distributed, conditional on X, then
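A) $\hat{\beta}$ is distributed $N(\beta, \Sigma_{\hat{\beta}})$, where $\Sigma_{\hat{\beta}} = \Sigma_{\sqrt{n}(\hat{\beta} - \beta)}/n = Q_X^{-1}\Sigma_V Q_X^{-1}/n$.
B) $\hat{\beta}$ is distributed $N(\beta, \Sigma_{\hat{\beta}|X})$, where $\Sigma_{\hat{\beta}|X} = \sigma_u^2 I_{(k+1)}$.
C) $\hat{\beta}$ is distributed $N(\beta, \Sigma_{\hat{\beta}|X})$, where $\Sigma_{\hat{\beta}|X} = \sigma_u^2 (X'X)^{-1}$.
D) $\hat{U} = P_X Y$ where $P_X = X(X'X)^{-1}X'$.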
Answer: C
22) An estimator of $\beta$ is said to be linear if
A) it can be estimated by least squares.
B) it is a linear function of Y1 ,…, Yn .
C) there are homoskedasticity-only errors.
D) it is a linear function of X1 ,…, Xn .
Answer: B
23) The leading example of sampling schemes in econometrics that do not result in independent observations is
A) cross-sectional data.
B) experimental data.
C) the Current Population Survey.
D) when the data are sampled over time for the same entity.
Answer: D
24) The presence of correlated error terms creates problems for inference based on OLS. These can be overcome by
A) using HAC standard errors.
B) using heteroskedasticity-robust standard errors.
C) reordering the observations until the correlation disappears.
D) using homoskedasticity-only standard errors.
Answer: A
25) The GLS estimator
A) is always the more efficient estimator when compared to OLS.
B) is the OLS estimator of the coefficients in a transformed model, where the errors of the transformed
model satisfy the Gauss-Markov conditions.
C) cannot handle binary variables, since some of the transformations require division by one of the
regressors.
D) produces identical estimates for the coefficients, but different standard errors.
Answer: B
26) The extended least squares assumptions in the multiple regression model include four assumptions from
Chapter 6 (ui has conditional mean zero; (Xi,Yi), i = 1,…, n are i.i.d. draws from their joint distribution; Xi and ui
have nonzero finite fourth moments; there is no perfect multicollinearity). In addition, there are two further
assumptions, one of which is
A) heteroskedasticity of the error term.
B) serial correlation of the error term.
C) the conditional distribution of ui given Xi is normal.
D) invertibility of the matrix of regressors.
Answer: C
27) The OLS estimator for the multiple regression model in matrix form is
A) $(X'X)^{-1}X'Y$
B) $X(X'X)^{-1}X' - P_X$
C) $(X'X)^{-1}X'U$
D) $(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y$
Answer: A
28) To prove that the OLS estimator is BLUE requires the following assumption
A) $(X_i, Y_i)$, i = 1, …, n are i.i.d. draws from their joint distribution
B) Xi and ui have nonzero finite fourth moments
C) the conditional distribution of ui given Xi is normal
D) none of the above
Answer: D
29) The TSLS estimator is
A) $(X'X)^{-1}X'Y$
B) $(X'Z(Z'Z)^{-1}Z'X)^{-1}X'Z(Z'Z)^{-1}Z'Y$
C) $(X'\Omega^{-1}X)^{-1}(X'\Omega^{-1}Y)$
D) $(X'P_Z)^{-1}P_Z Y$
Answer: B
30) The homoskedasticity-only F-statistic is
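A) $\frac{(R\hat{\beta} - r)'[R(X'X)^{-1}R']^{-1}(R\hat{\beta} - r)/q}{s_{\hat{u}}^2}$
B) $\frac{(R\hat{\beta} - r)'[R(X'X)^{-1}R']^{-1}(R\hat{\beta} - r)}{s_{\hat{u}}^2}$
C) $(R\hat{\beta} - r)'[R\hat{\Sigma}_{\hat{\beta}}R']^{-1}(R\hat{\beta} - r)/q$
D) $\frac{\hat{U}'P_Z\hat{U}}{\hat{U}'M_Z\hat{U}}$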
Answer: A
18.2 Essays and Longer Questions
1) Write an essay on the difference between the OLS estimator and the GLS estimator.
Answer: Answers will vary by student, but some of the following points should be made.
The multiple regression model is
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \ldots + \beta_k X_{ki} + u_i$, i = 1, …, n
which, in matrix form, can be written as $Y = X\beta + U$. The OLS estimator is derived by minimizing the
squared prediction mistakes and results in the following formula: $\hat{\beta} = (X'X)^{-1}X'Y$. There are two GLS
estimators. The infeasible GLS estimator is $\hat{\beta}^{GLS} = (X'\Omega^{-1}X)^{-1}(X'\Omega^{-1}Y)$. Since $\Omega$ is typically
unknown, the estimator cannot be calculated, and hence its name. However, a feasible GLS estimator
can be calculated if $\Omega$ is a known function of a number of parameters which can be estimated. Once
these parameters have been estimated, they can then be used to calculate $\hat{\Omega}$, the estimator of $\Omega$. The
feasible GLS estimator is defined as $\hat{\beta}^{GLS} = (X'\hat{\Omega}^{-1}X)^{-1}(X'\hat{\Omega}^{-1}Y)$.
There are six extended least squares assumptions:
1. $E(u_i | X_i) = 0$ ($u_i$ has conditional mean zero);
2. $(X_i, Y_i)$, i = 1, …, n are independently and identically distributed (i.i.d.) draws from their joint distribution;
3. $X_i$ and $u_i$ have nonzero finite fourth moments;
4. X has full column rank (there is no perfect multicollinearity);
5. $\mathrm{var}(u_i | X_i) = \sigma_u^2$ (homoskedasticity);
6. the conditional distribution of $u_i$ given $X_i$ is normal (normal errors).
These assumptions imply $E(U|X) = 0_n$ and $E(UU'|X) = \sigma_u^2 I_n$, the Gauss-Markov conditions for
multiple regression. If these hold, then OLS is BLUE. If assumptions 5 and 6 do not hold, but
assumptions 1 to 4 still hold, then OLS is consistent and asymptotically normally distributed. Small
sample statistics can be derived for the case where the errors are i.i.d. and normally distributed,
conditional on X.
The GLS assumptions are
1. $E(U|X) = 0_n$;
2. $E(UU'|X) = \Omega(X)$, where $\Omega(X)$ is n×n matrix-valued that can depend on X;
3. $X_i$ and $u_i$ have nonzero finite fourth moments;
4. X has full column rank (there is no perfect multicollinearity).
The major differences between the two sets of assumptions relevant to the estimators themselves are that
(i) GLS allows for homoskedastic errors to be serially correlated (dropping assumption 2 of OLS list),
and (ii) there is the possibility that the errors are heteroskedastic (adding assumption 2 to GLS list). For
the case of independent sampling, replacing $E(UU'|X) = \Omega(X)$ with $E(UU'|X) = \sigma_u^2 I_n$ turns the GLS
estimator into the OLS estimator.
In the case of the infeasible GLS estimator, the model can be transformed in such a way that the
Gauss-Markov assumptions apply to the transformed model, if the four GLS assumptions hold. In that
case, GLS is BLUE and therefore more efficient than the OLS estimator. This is of little practical value
since the estimator typically cannot be computed. The result also holds if an estimator of $\Omega$ exists.
However, for the feasible GLS estimator to be consistent, the first GLS assumption must apply, which is
much stronger than the first OLS assumption, particularly in time series applications. It is therefore
possible for the OLS estimator to be consistent while the GLS estimator is not consistent.
2) Give several economic examples of how to test various joint linear hypotheses using matrix notation. Include
specifications of $R\beta = r$ where you test for (i) all coefficients other than the constant being zero, (ii) a subset of
coefficients being zero, and (iii) equality of coefficients. Talk about the possible distributions involved in
finding critical values for your hypotheses.
Answer: Answers will vary by student. Many restrictions involve the equality of coefficients across different
types of entities in cross-sections (“stability”).
Using earnings functions, students may suggest testing for the presence of regional effects, as in the
textbook example at the end of Chapter 5 (exercises). The textbook tested jointly for the presence of
interaction effects in the student achievement example at the end of Chapter 6. Students may want to test
for the equality of returns to education and on-the-job training. The panel chapter allowed for the
presence of fixed effects, the presence of which can be tested for. Testing for constant returns to scale in
production functions is also frequently mentioned.
Consider the multiple regression model with k regressors plus the constant. Let R be of order q × (k+1),
where q is the number of restrictions. Then to test (i) for all coefficients other than the constant to be
zero, $H_0: \beta_1 = 0, \beta_2 = 0, \ldots, \beta_k = 0$ vs. $H_1: \beta_j \neq 0$ for at least one j, j = 1, ..., k, you have $R = [0_{k \times 1} \; I_k]$ and $r =
0_{k \times 1}$. In large samples, the test will produce the overall regression F-statistic, which has an $F_{k,\infty}$
distribution. In case (ii), reorder the variables so that the regressors with non-zero coefficients appear
first, followed by the regressors with coefficients that are hypothesized to be zero. This leads to the
following formulation
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \ldots + \beta_{k-q}X_{k-q,i} + \beta_{k-q+1}X_{k-q+1,i} + \beta_{k-q+2}X_{k-q+2,i} + \ldots + \beta_k X_{ki} + u_i$,
i = 1, …, n. $R = [0_{q \times (k-q+1)} \; I_q]$ and $r = 0_{q \times 1}$. In large samples, the test will produce an F-statistic, which
has an $F_{q,\infty}$ distribution. In (iii), assume that the task at hand is to test the equality of two coefficients,
say $H_0: \beta_1 = \beta_2$ vs. $H_1: \beta_1 \neq \beta_2$, as in section 5.8 of the textbook.
Then R = [0 1 -1 0 … 0], r = 0 and q = 1. This is a single restriction, and the F-statistic is the square of
the corresponding t-statistic. Hence critical values can be found either from $F_{q,\infty}$ or from the standard
normal table, after taking the square root.
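The three restriction matrices can be written out mechanically. The sketch below is illustrative only: the helper names and the example coefficient vector (k = 3) are hypothetical, and the code simply builds R for cases (i) to (iii) and evaluates $R\beta$.

```python
def zeros(rows, cols):
    return [[0] * cols for _ in range(rows)]

def r_all_slopes_zero(k):
    """Case (i): R = [0_{k x 1}  I_k], testing beta_1 = ... = beta_k = 0."""
    R = zeros(k, k + 1)
    for j in range(k):
        R[j][j + 1] = 1
    return R

def r_last_q_zero(k, q):
    """Case (ii): R = [0_{q x (k-q+1)}  I_q], testing that the last q coefficients are zero."""
    R = zeros(q, k + 1)
    for j in range(q):
        R[j][k - q + 1 + j] = 1
    return R

def r_equal_first_two_slopes(k):
    """Case (iii): R = [0 1 -1 0 ... 0], testing beta_1 = beta_2."""
    R = zeros(1, k + 1)
    R[0][1], R[0][2] = 1, -1
    return R

def apply(R, beta):
    """Compute the vector R * beta."""
    return [sum(r * b for r, b in zip(row, beta)) for row in R]

beta = [5, 2, 2, 0]  # hypothetical (beta_0, beta_1, beta_2, beta_3)
print(apply(r_all_slopes_zero(3), beta))         # nonzero entries: (i) fails for this beta
print(apply(r_last_q_zero(3, 1), beta))          # zero: (ii) holds with q = 1
print(apply(r_equal_first_two_slopes(3), beta))  # zero: (iii) holds
```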
3) Define the GLS estimator and discuss its properties when $\Omega$ is known. Why is this estimator sometimes called
infeasible GLS? What happens when $\Omega$ is unknown? What would the $\Omega$ matrix look like for the case of
independent sampling with heteroskedastic errors, where $\mathrm{var}(u_i | X_i) = c\,h(X_i) = \sigma^2 X_{1i}^2$? Since the inverse of the
error variance-covariance matrix is needed to compute the GLS estimator, find $\Omega^{-1}$. The textbook shows that
the original model $Y = X\beta + U$ will be transformed into $\tilde{Y} = \tilde{X}\beta + \tilde{U}$, where $\tilde{Y} = FY$, $\tilde{X} = FX$, and $\tilde{U} = FU$, and
$F'F = \Omega^{-1}$. Find F in the above case, and describe what effect the transformation has on the original data.
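Answer: $\hat{\beta}^{GLS} = (X'\Omega^{-1}X)^{-1}(X'\Omega^{-1}Y)$. The key point for the GLS estimator with known $\Omega$ is that $\Omega$ is used to
create a transformed regression model such that the resulting error term satisfies the Gauss-Markov
conditions. In that case, GLS is BLUE. However, since $\Omega$ is typically unknown, the estimator cannot be
calculated, and is therefore sometimes referred to as infeasible GLS. If $\Omega$ is unknown, then a feasible GLS
estimator can be calculated if $\Omega$ is a known function of a number of parameters which can be estimated.
Once the parameters have been estimated, they can then be used to calculate $\hat{\Omega}$, which is the estimator of $\Omega$. The feasible GLS estimator is then
$\hat{\beta}^{GLS} = (X'\hat{\Omega}^{-1}X)^{-1}(X'\hat{\Omega}^{-1}Y)$.
In the above example of heteroskedasticity,
$$E(UU'|X) = \Omega(X) = \sigma^2 \begin{pmatrix} X_{11}^2 & 0 & \cdots & 0 \\ 0 & X_{12}^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & X_{1n}^2 \end{pmatrix},$$
$$\Omega^{-1}(X) = \frac{1}{\sigma^2} \begin{pmatrix} \frac{1}{X_{11}^2} & 0 & \cdots & 0 \\ 0 & \frac{1}{X_{12}^2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{X_{1n}^2} \end{pmatrix}, \quad F = \frac{1}{\sigma} \begin{pmatrix} \frac{1}{X_{11}} & 0 & \cdots & 0 \\ 0 & \frac{1}{X_{12}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{X_{1n}} \end{pmatrix}.$$
The transformation in effect scales all variables for observation i by $1/X_{1i}$ (the constant factor $1/\sigma$ is common to all observations and does not affect the estimates).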
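Because all three matrices are diagonal, the relationship between them can be verified element by element. The sketch below assumes hypothetical values for $\sigma$ and for the regressor $X_{1i}$, and stores each diagonal matrix as a list of its diagonal entries.

```python
from fractions import Fraction

sigma = Fraction(2)                              # hypothetical sigma, so sigma^2 = 4
x1 = [Fraction(1), Fraction(3), Fraction(5)]     # hypothetical X_{1i} values

omega_diag = [sigma ** 2 * x ** 2 for x in x1]   # diagonal of Omega
omega_inv_diag = [1 / w for w in omega_diag]     # diagonal of Omega^{-1}
f_diag = [1 / (sigma * x) for x in x1]           # diagonal of F

# For diagonal matrices, F'F and F Omega F' reduce to element-wise products.
ftf_diag = [f * f for f in f_diag]
f_omega_ft_diag = [f * w * f for f, w in zip(f_diag, omega_diag)]

print(ftf_diag == omega_inv_diag, f_omega_ft_diag)
```

The second check shows that the transformed errors FU have an identity variance-covariance matrix, which is the sense in which the transformed model satisfies the Gauss-Markov conditions.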
4) Consider the multiple regression model from Chapter 5, where k = 2 and the assumptions of the multiple
regression model hold.
(a) Show what the X matrix and the $\beta$ vector would look like in this case.
(b) Having collected data for 104 countries of the world from the Penn World Tables, you want to estimate the
effect of the population growth rate ($X_{1i}$) and the saving rate ($X_{2i}$) (average investment share of GDP from
1980 to 1990) on GDP per worker (relative to the U.S.) in 1990. What are your expected signs for the regression
coefficient? What is the order of the $(X'X)$ here?
(c) You are asked to find the OLS estimator for the intercept and slope in this model using the formula
$\hat{\beta} = (X'X)^{-1}X'Y$. Since you are more comfortable in inverting a 2×2 matrix (the inverse of a 2×2 matrix is
$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$),
you decide to write the multiple regression model in deviations from mean form. Show what the X matrix, the
$(X'X)$ matrix, and the $X'Y$ matrix would look like now.
(Hint: use small letters to indicate deviations from mean, i.e., $z_i = Z_i - \bar{Z}$, and note that
$Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \hat{u}_i$
$\bar{Y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{X}_1 + \hat{\beta}_2 \bar{X}_2$.
Subtracting the second equation from the first, you get
$y_i = \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i} + \hat{u}_i$)
(d) Show that the slope for the population growth rate is given by
$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} y_i x_{1i} \sum_{i=1}^{n} x_{2i}^2 - \sum_{i=1}^{n} y_i x_{2i} \sum_{i=1}^{n} x_{1i}x_{2i}}{\sum_{i=1}^{n} x_{1i}^2 \sum_{i=1}^{n} x_{2i}^2 - \left(\sum_{i=1}^{n} x_{1i}x_{2i}\right)^2}$
(e) The various sums needed to calculate the OLS estimates are given below:
$\sum_{i=1}^{n} y_i^2 = 8.3103$; $\sum_{i=1}^{n} x_{1i}^2 = 0.0122$; $\sum_{i=1}^{n} x_{2i}^2 = 0.6422$;
$\sum_{i=1}^{n} y_i x_{1i} = -0.2304$; $\sum_{i=1}^{n} y_i x_{2i} = 1.5676$; $\sum_{i=1}^{n} x_{1i}x_{2i} = -0.0520$
Find the numerical values for the effect of population growth and the saving rate on per capita income and
interpret these.
(f) Indicate how you would find the intercept in the above case. Is this coefficient of interest in the
interpretation of the determinants of per capita income? If not, then why estimate it?
Answer: (a) $X = \begin{pmatrix} 1 & X_{11} & X_{21} \\ 1 & X_{12} & X_{22} \\ \vdots & \vdots & \vdots \\ 1 & X_{1n} & X_{2n} \end{pmatrix}$, and $\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix}$.
(b) You would expect the population growth rate to have a negative coefficient, and the saving rate to
have a positive coefficient. The order of $X'X$ is 3×3.
(c) With all sums running from i = 1 to n,
$X = \begin{pmatrix} x_{11} & x_{21} \\ x_{12} & x_{22} \\ \vdots & \vdots \\ x_{1n} & x_{2n} \end{pmatrix}$, $X'X = \begin{pmatrix} \sum x_{1i}^2 & \sum x_{1i}x_{2i} \\ \sum x_{1i}x_{2i} & \sum x_{2i}^2 \end{pmatrix}$, $X'Y = \begin{pmatrix} \sum y_i x_{1i} \\ \sum y_i x_{2i} \end{pmatrix}$.
(d) $\begin{pmatrix} \sum x_{1i}^2 & \sum x_{1i}x_{2i} \\ \sum x_{1i}x_{2i} & \sum x_{2i}^2 \end{pmatrix}^{-1} = \frac{1}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i}x_{2i}\right)^2} \begin{pmatrix} \sum x_{2i}^2 & -\sum x_{1i}x_{2i} \\ -\sum x_{1i}x_{2i} & \sum x_{1i}^2 \end{pmatrix}$.
Post multiplying this expression with $\begin{pmatrix} \sum y_i x_{1i} \\ \sum y_i x_{2i} \end{pmatrix}$ results in the two least squares estimators
$\begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{pmatrix} = \begin{pmatrix} \frac{\sum y_i x_{1i} \sum x_{2i}^2 - \sum y_i x_{2i} \sum x_{1i}x_{2i}}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i}x_{2i}\right)^2} \\ \frac{\sum y_i x_{2i} \sum x_{1i}^2 - \sum y_i x_{1i} \sum x_{1i}x_{2i}}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i}x_{2i}\right)^2} \end{pmatrix}$, and hence gives the formula for $\hat{\beta}_1$.
(e) $\hat{\beta}_1 = \frac{-0.2304 \times 0.6422 - (1.5676 \times (-0.0520))}{0.0122 \times 0.6422 - (-0.0520)^2} = -12.953$.
$\hat{\beta}_2 = \frac{1.5676 \times 0.0122 - ((-0.2304) \times (-0.0520))}{0.0122 \times 0.6422 - (-0.0520)^2} = 1.393$.
A reduction of the population growth rate by one percentage point increases the per capita income relative to the
United States by roughly 0.13. An increase in the saving rate by ten percentage points increases per capita income
relative to the United States by roughly 0.14.
(f) The first order condition for the OLS estimator in the case of k = 2 is
$\sum_{i=1}^{n} Y_i = n\hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} X_{1i} + \hat{\beta}_2 \sum_{i=1}^{n} X_{2i}$, which, after dividing by n, results in $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}_1 - \hat{\beta}_2 \bar{X}_2$. The
intercept is only of interest if there are observations close to the origin, which is not the case here. If it is
set to zero, then the regression is forced through the origin, instead of being allowed to choose a level.
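The arithmetic in part (e) can be reproduced directly from the sums given in the question. In the sketch below the variable names are illustrative; with the sums rounded as printed, the results come out at roughly -12.95 and 1.39, which agrees with the reported values up to rounding.

```python
# Sums of squares and cross products in deviations-from-mean form, as given in part (e).
s_x1x1 = 0.0122    # sum of x1^2
s_x2x2 = 0.6422    # sum of x2^2
s_x1x2 = -0.0520   # sum of x1 * x2
s_yx1 = -0.2304    # sum of y * x1
s_yx2 = 1.5676     # sum of y * x2

# Determinant of the 2x2 matrix X'X.
det = s_x1x1 * s_x2x2 - s_x1x2 ** 2

# Slopes from (X'X)^{-1} X'Y written out element by element.
beta1_hat = (s_yx1 * s_x2x2 - s_yx2 * s_x1x2) / det   # population growth rate
beta2_hat = (s_yx2 * s_x1x1 - s_yx1 * s_x1x2) / det   # saving rate

print(round(beta1_hat, 2), round(beta2_hat, 2))
```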
5) In Chapter 10 of your textbook, panel data estimation was introduced. Panel data consist of observations on the
same n entities at two or more time periods T. For two variables, you have
(Xit, Yit), i = 1,..., n and t = 1,..., T
where n could be the U.S. states. The example in Chapter 10 used annual data from 1982 to 1988 for the fatality
rate and beer taxes. Estimation by OLS, in essence, involved “stacking” the data.
(a) What would the variance-covariance matrix of the errors look like in this case if you allowed for
homoskedasticity-only standard errors? What is its order? Use an example of a linear regression with one
regressor of 4 U.S. states and 3 time periods.
(b) Does it make sense that errors in New Hampshire, say, are uncorrelated with errors in Massachusetts
during the same time period (“contemporaneously”)? Give examples why this correlation might not be zero.
(c) If this correlation was known, could you find an estimator which was more efficient than OLS?
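Answer: (a) Under the extended least squares assumptions, $E(UU'|X) = \sigma_u^2 I_n$.
In the above example of 4 U.S. states and 3 time periods, the identity matrix will be of order 12 × 12, or
(nT) × (nT) in general. Specifically, $E(UU'|X) = \sigma_u^2 I_{12}$, a diagonal matrix with $\sigma_u^2$ in each of the 12 diagonal positions and zeros elsewhere.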
(b) It is reasonable to assume that a shock to an adjacent state would have an effect on its neighboring
state, particularly when the shock affects the larger of the two such as the case in Massachusetts. Other
examples may be Texas and Arkansas, Michigan and Indiana, California and Arizona, New York and
New Jersey, etc. A negative oil price shock, which affects the demand for automobiles produced in
Michigan, will have repercussions for suppliers located not only in Michigan, but also elsewhere.
(c) In case of a known variance-covariance matrix of the error terms, the GLS estimator
$\hat{\beta}^{GLS} = (X'\Omega^{-1}X)^{-1}(X'\Omega^{-1}Y)$ could be used. The variance-covariance matrix would be of the form
(There is a subtle issue here for the case of a feasible GLS estimator, where the variances and covariances
have to be estimated. It can be shown, in that case, that the GLS estimator does not exist unless n ≤ T,
which is not the case for most panels. It is easier to see that the variance-covariance matrix is singular for
n>T if the data is stacked by time period.)
18.3 Mathematical and Graphical Problems
1) Your textbook derives the OLS estimator as
$\hat{\beta} = (X'X)^{-1}X'Y$.
Show that the estimator does not exist if there are fewer observations than the number of explanatory variables,
including the constant. What is the rank of $X'X$ in this case?
Answer: In order for a matrix to be invertible, it must have full rank. Since $X'X$ is of order (k + 1) × (k + 1), then in
order to invert $X'X$, it must have rank (k+1). In the case of a product such as $X'X$, the rank is less than or
equal to the rank of $X'$ or X, whichever is smaller. X is of order n × (k + 1), and assuming that there is no
perfect multicollinearity, will have either rank n or rank (k+1), whichever is the smaller of the two. Hence
if there are fewer observations than the number of explanatory variables (including the constant), then
the rank of X will be n (< k+1), and the rank of $X'X$ is also n (< k+1). Hence $X'X$ does not have full rank,
and therefore cannot be inverted. The OLS estimator does not exist as a result.
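A small numerical illustration of the rank argument, assuming a hypothetical data matrix with n = 2 observations and k + 1 = 3 regressors (including the constant). The 3×3 matrix $X'X$ then has a zero determinant, so it cannot be inverted.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Hypothetical data: two observations, a constant plus two regressors.
X = [[1, 2, 5],
     [1, 4, 3]]

XtX = matmul(transpose(X), X)
print(XtX, det3(XtX))
```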
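2) Assume that the data looks as follows:
$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}$, $U = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}$, $X = \begin{pmatrix} X_{11} \\ X_{12} \\ \vdots \\ X_{1n} \end{pmatrix}$, and $\beta = (\beta_1)$.
Using the formula for the OLS estimator $\hat{\beta} = (X'X)^{-1}X'Y$, derive the formula for $\hat{\beta}_1$, the only slope in this
“regression through the origin.”
Answer: In this case, $X'Y = \sum_{i=1}^{n} X_{1i}Y_i$, and $X'X = \sum_{i=1}^{n} X_{1i}^2$. Hence
$\hat{\beta} = (X'X)^{-1}X'Y = \frac{\sum_{i=1}^{n} X_{1i}Y_i}{\sum_{i=1}^{n} X_{1i}^2}$.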
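The resulting formula is a ratio of two sums, which takes only a few lines to compute. The data below are hypothetical and serve only to exercise the formula.

```python
def slope_through_origin(x, y):
    """OLS slope in a regression without intercept: sum(X*Y) / sum(X^2)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

x = [1.0, 2.0, 3.0]   # hypothetical regressor values
y = [2.0, 4.1, 5.9]   # hypothetical outcomes, roughly Y = 2X

print(slope_through_origin(x, y))
```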
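3) Write the following three linear equations in matrix format Ax = b, where x is a 3×1 vector containing q, p, and
y, A is a 3×3 matrix of coefficients, and b is a 3×1 vector of constants.
q = 5 + 3p – 2y
q = 10 – p + 10y
p = 6y
Answer: $A = \begin{pmatrix} -3 & 1 & 2 \\ 1 & 1 & -10 \\ 1 & 0 & -6 \end{pmatrix}$, $x = \begin{pmatrix} p \\ q \\ y \end{pmatrix}$, $b = \begin{pmatrix} 5 \\ 10 \\ 0 \end{pmatrix}$ or $\begin{pmatrix} -3 & 1 & 2 \\ 1 & 1 & -10 \\ 1 & 0 & -6 \end{pmatrix} \begin{pmatrix} p \\ q \\ y \end{pmatrix} = \begin{pmatrix} 5 \\ 10 \\ 0 \end{pmatrix}$.
4) Let $Y = \begin{pmatrix} -2 \\ 3 \\ 10 \\ 2 \\ 2 \end{pmatrix}$ and $X = \begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 3 \\ 1 & -1 \\ 1 & 2 \end{pmatrix}$.
Find $X'X$, $X'Y$, $(X'X)^{-1}$ and finally $(X'X)^{-1}X'Y$.
Answer: $X'X = \begin{pmatrix} 5 & 5 \\ 5 & 15 \end{pmatrix}$, $X'Y = \begin{pmatrix} 15 \\ 35 \end{pmatrix}$, $(X'X)^{-1} = \begin{pmatrix} 0.3 & -0.1 \\ -0.1 & 0.1 \end{pmatrix}$, and $(X'X)^{-1}X'Y = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$.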
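The calculation in problem 4 can be reproduced exactly with rational arithmetic. The sketch below takes Y = (-2, 3, 10, 2, 2)' and a regressor matrix consisting of a column of ones and the column (0, 1, 3, -1, 2)', forms $X'X$ and $X'Y$, inverts the 2×2 matrix with the adjugate formula, and multiplies.

```python
from fractions import Fraction

Y = [-2, 3, 10, 2, 2]
X = [[1, 0], [1, 1], [1, 3], [1, -1], [1, 2]]

# X'X and X'Y built directly from their definitions.
XtX = [[sum(r[i] * r[j] for r in X) for j in range(2)] for i in range(2)]
XtY = [sum(r[i] * y for r, y in zip(X, Y)) for i in range(2)]

# Inverse of a 2x2 matrix: swap the diagonal, negate the off-diagonal, divide by det.
(a, b), (c, d) = XtX
det = Fraction(a * d - b * c)
XtX_inv = [[d / det, -b / det], [-c / det, a / det]]

beta_hat = [sum(XtX_inv[i][j] * XtY[j] for j in range(2)) for i in range(2)]
print(XtX, XtY, beta_hat)
```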
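5) $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$, $B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}$, and $C = \begin{pmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \end{pmatrix}$;
show that $(A+B)' = A' + B'$ and $(AC)' = C'A'$.
Answer: $(A+B) = \begin{pmatrix} a_{11} + b_{11} & a_{12} + b_{12} \\ a_{21} + b_{21} & a_{22} + b_{22} \end{pmatrix}$, $(A+B)' = \begin{pmatrix} a_{11} + b_{11} & a_{21} + b_{21} \\ a_{12} + b_{12} & a_{22} + b_{22} \end{pmatrix}$,
$A' = \begin{pmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{pmatrix}$, $B' = \begin{pmatrix} b_{11} & b_{21} \\ b_{12} & b_{22} \end{pmatrix}$, $A' + B' = \begin{pmatrix} a_{11} + b_{11} & a_{21} + b_{21} \\ a_{12} + b_{12} & a_{22} + b_{22} \end{pmatrix}$.
$(AC) = \begin{pmatrix} a_{11}c_{11} + a_{12}c_{21} & a_{11}c_{12} + a_{12}c_{22} & a_{11}c_{13} + a_{12}c_{23} \\ a_{21}c_{11} + a_{22}c_{21} & a_{21}c_{12} + a_{22}c_{22} & a_{21}c_{13} + a_{22}c_{23} \end{pmatrix}$,
$(AC)' = \begin{pmatrix} a_{11}c_{11} + a_{12}c_{21} & a_{21}c_{11} + a_{22}c_{21} \\ a_{11}c_{12} + a_{12}c_{22} & a_{21}c_{12} + a_{22}c_{22} \\ a_{11}c_{13} + a_{12}c_{23} & a_{21}c_{13} + a_{22}c_{23} \end{pmatrix}$, $C' = \begin{pmatrix} c_{11} & c_{21} \\ c_{12} & c_{22} \\ c_{13} & c_{23} \end{pmatrix}$, $A' = \begin{pmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{pmatrix}$,
$C'A' = \begin{pmatrix} a_{11}c_{11} + a_{12}c_{21} & a_{21}c_{11} + a_{22}c_{21} \\ a_{11}c_{12} + a_{12}c_{22} & a_{21}c_{12} + a_{22}c_{22} \\ a_{11}c_{13} + a_{12}c_{23} & a_{21}c_{13} + a_{22}c_{23} \end{pmatrix}$.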
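6) Write the following four restrictions in the form $R\beta = r$, where the hypotheses are to be tested simultaneously.
$\beta_3 = 2\beta_5$,
$\beta_1 + \beta_2 = 1$,
$\beta_4 = 0$,
$\beta_2 = -\beta_6$.
Can you write the following restriction $\beta_2 = -\frac{\beta_3}{\beta_1}$ in the same format? Why not?
Answer: $\begin{pmatrix} 0 & 0 & 0 & 1 & 0 & -2 & 0 \\ 0 & 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \\ \beta_5 \\ \beta_6 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}$.
The restriction $\beta_2 = -\frac{\beta_3}{\beta_1}$ cannot be written in the same format because it is nonlinear.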
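One way to check a restriction matrix is to evaluate $R\beta$ for a coefficient vector known to satisfy the four restrictions. The sketch below assumes a coefficient vector $(\beta_0, \ldots, \beta_6)$ with an intercept and rows ordered as in the question; the numerical values of beta are hypothetical.

```python
R = [
    [0, 0, 0, 1, 0, -2, 0],  # beta_3 - 2 * beta_5 = 0
    [0, 1, 1, 0, 0, 0, 0],   # beta_1 + beta_2 = 1
    [0, 0, 0, 0, 1, 0, 0],   # beta_4 = 0
    [0, 0, 1, 0, 0, 0, 1],   # beta_2 + beta_6 = 0
]
r = [0, 1, 0, 0]

def restrictions_hold(beta):
    """Return True if R * beta equals r."""
    return [sum(a * b for a, b in zip(row, beta)) for row in R] == r

# Hypothetical coefficients chosen to satisfy all four restrictions.
beta_ok = [7, 0.25, 0.75, 4, 0, 2, -0.75]
# All-zero coefficients violate beta_1 + beta_2 = 1.
beta_bad = [0, 0, 0, 0, 0, 0, 0]

print(restrictions_hold(beta_ok), restrictions_hold(beta_bad))
```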
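7) Using the model $Y = X\beta + U$, and the extended least squares assumptions, derive the OLS estimator $\hat{\beta}$. Discuss
the conditions under which $X'X$ is invertible.
Answer: The derivation copies the relevant parts of section 16.1 of the textbook. The model is $Y = X\beta + U$, where
$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}$, $U = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}$, $X = \begin{pmatrix} 1 & X_{11} & \cdots & X_{k1} \\ 1 & X_{12} & \cdots & X_{k2} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & X_{1n} & \cdots & X_{kn} \end{pmatrix}$, and $\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}$.
Y is the n×1 dimensional vector of n observations on the dependent variable, X is the n×(k + 1)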
dimensional matrix of n observations on the k+1 regressors (including the “constant” regressor for the
intercept), U is the n×1 dimensional vector of the n error terms, and $\beta$ is the (k+1)×1 dimensional vector
of the k+1 unknown regression coefficients.
The extended least squares assumptions are:
$E(u_i | X_i) = 0$ ($u_i$ has conditional mean zero);
$(X_i, Y_i)$, i = 1, ..., n are independently and identically distributed (i.i.d.) draws from their joint
distribution;
$X_i$ and $u_i$ have nonzero finite fourth moments;
X has full column rank (there is no perfect multicollinearity);
$\mathrm{var}(u_i | X_i) = \sigma_u^2$ (homoskedasticity);
the conditional distribution of $u_i$ given $X_i$ is normal (normal errors).
The OLS estimator minimizes the sum of squared prediction mistakes,
$\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - \ldots - b_k X_{ki})^2$.
The derivative of the sum of squared prediction mistakes with respect to the jth regression coefficient,
$b_j$, is
$\frac{\partial}{\partial b_j} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - \ldots - b_k X_{ki})^2 = -2 \sum_{i=1}^{n} X_{ji}(Y_i - b_0 - b_1 X_{1i} - \ldots - b_k X_{ki})$ for j = 0, ..., k, where, for
j = 0, $X_{0i} = 1$ for all i. The formula for the OLS estimator is obtained by taking the derivative of the sum
of squared prediction mistakes with respect to each element of the coefficient vector, setting these
derivatives to zero, and solving for the estimator $\hat{\beta}$. The derivative on the right-hand side of the above
equation is the jth element of the k+1 dimensional vector, $-2X'(Y - Xb)$, where b is the k+1 dimensional
vector consisting of $b_0, \ldots, b_k$. There are k+1 such derivatives, each corresponding to an element of b.
Combined, these yield the system of k+1 equations that constitute the first order conditions for the OLS
estimator that, when set to zero, define the OLS estimator $\hat{\beta}$. That is, $\hat{\beta}$ solves the system of k+1
equations,
$X'(Y - X\hat{\beta}) = 0_{k+1}$,
or, equivalently, $X'Y = X'X\hat{\beta}$. Solving this system of equations yields the OLS estimator $\hat{\beta}$ in matrix
form:
$\hat{\beta} = (X'X)^{-1}X'Y$,
where $(X'X)^{-1}$ is the inverse of the matrix $X'X$.
$X'X$ is invertible as long as it has full rank. This requires that there are at least as many observations as regressors
(including the constant), and that there is no perfect multicollinearity among the regressors.
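The first order conditions say that the OLS residuals are orthogonal to every column of X. The sketch below checks this on a small data set with five observations, a constant, and one regressor (the numbers match those used in problem 4 of this section, for which the OLS estimate is (1, 2)'), and confirms that nearby coefficient values give a larger sum of squared residuals.

```python
Y = [-2, 3, 10, 2, 2]
X = [[1, 0], [1, 1], [1, 3], [1, -1], [1, 2]]
beta_hat = [1, 2]  # OLS estimate (X'X)^{-1} X'Y for these data

def fitted(b, row):
    return sum(bj * xj for bj, xj in zip(b, row))

residuals = [y - fitted(beta_hat, row) for row, y in zip(X, Y)]

# Left-hand side of the normal equations X'(Y - X beta_hat), one entry per column of X.
normal_equations = [sum(row[j] * u for row, u in zip(X, residuals)) for j in range(2)]

def ssr(b):
    """Sum of squared prediction mistakes at coefficient vector b."""
    return sum((y - fitted(b, row)) ** 2 for row, y in zip(X, Y))

print(residuals, normal_equations, ssr(beta_hat))
```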
8) Prove that under the extended least squares assumptions the OLS estimator
^
is unbiased and that its
2
-1
u (X X) .
variance-covariance matrix is
Answer: Start the proof by relating the OLS estimator to the errors:

$$\hat{\beta} = (X'X)^{-1}X'Y = (X'X)^{-1}X'(X\beta + U) = \beta + (X'X)^{-1}X'U.$$

To prove the unbiasedness of the OLS estimator, take the conditional expectation of both sides of the expression:

$$E(\hat{\beta}|X) = \beta + E[(X'X)^{-1}X'U|X] = \beta + (X'X)^{-1}X'E(U|X).$$

Since $E(U|X) = 0$ (from extended least squares assumptions 1 and 2),

$$E(\hat{\beta}|X) = \beta.$$

To find the variance-covariance matrix $\mathrm{var}(\hat{\beta}|X) = E[(\hat{\beta} - \beta)(\hat{\beta} - \beta)'|X]$, we have

$$E[(X'X)^{-1}X'UU'X(X'X)^{-1}|X] = (X'X)^{-1}X'E(UU'|X)X(X'X)^{-1},$$

and following the extended least squares assumptions 1, 2, and 5,

$$\mathrm{var}(\hat{\beta}|X) = \sigma_u^2 (X'X)^{-1}X'X(X'X)^{-1} = \sigma_u^2 (X'X)^{-1}.$$
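A small Monte Carlo experiment can illustrate both results. This is only a sketch under assumed values: X is held fixed across replications, the errors are drawn as i.i.d. normal with a hypothetical standard deviation, and the coefficient vector is made up.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, sigma_u = 40, 20000, 2.0          # hypothetical design, not from the text
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # held fixed (conditioning on X)
beta = np.array([1.0, 0.5])                # hypothetical true coefficients

# Each column of U is one draw of the error vector; each column of Y is one sample.
U = rng.normal(scale=sigma_u, size=(n, reps))
Y = (X @ beta)[:, None] + U

# OLS estimates for all replications at once; result has shape (2, reps).
beta_hats = np.linalg.solve(X.T @ X, X.T @ Y)

mean_beta_hat = beta_hats.mean(axis=1)             # should be close to beta
sim_cov = np.cov(beta_hats)                        # simulated variance-covariance matrix
theory_cov = sigma_u**2 * np.linalg.inv(X.T @ X)   # sigma_u^2 (X'X)^{-1}
```

With many replications, `mean_beta_hat` should sit near `beta` and `sim_cov` near `theory_cov`, up to simulation noise.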
9) For the OLS estimator $\hat{\beta} = (X'X)^{-1}X'Y$ to exist, $X'X$ must be invertible. This is the case when X has full rank. What is the rank of a matrix? What is the rank of the product of two matrices? Is it possible that X could have rank n? What would be the rank of $X'X$ in the case n<(k+1)? Explain intuitively why the OLS estimator does not exist in that situation.
Answer: The rank of a matrix is the maximum number of linearly independent rows or columns. In general, in the case of a rectangular matrix, the maximum number of linearly independent columns is also equal to the maximum number of linearly independent rows. In the case of X, it can be, at most, either n or (k+1), whichever is smaller. The rank of the product of two matrices will be, at most, the minimum of the ranks of the two matrices in the product. In the case of $X'X$, both matrices will have, at most, either rank n or (k+1), whichever is smaller. Since $X'X$ is a square matrix of order (k+1)×(k+1), it must have full rank in order to be invertible. In the absence of perfect multicollinearity, the rank will be (k+1) as long as (k+1) ≤ n. If there are fewer observations than regressors (including the constant), then the rank will be at most n, which is less than (k+1), so $X'X$ will not have full rank and cannot be inverted. (In the special case where there are exactly as many observations as regressors, including the constant, X can have rank n = k+1 and $X'X$ remains invertible.) Intuitively you have to have as many independent equations as there are unknowns to find a unique solution. This is not the case when you have n<(k+1).
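The rank argument can be illustrated with a small numerical sketch. The dimensions and the random regressors below are assumptions chosen only to produce a case with n<(k+1).

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 3, 4  # hypothetical case: fewer observations than regressors plus constant
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # n x (k+1)
XtX = X.T @ X                                               # (k+1) x (k+1)

rank_X = np.linalg.matrix_rank(X)      # at most min(n, k+1)
rank_XtX = np.linalg.matrix_rank(XtX)  # cannot exceed the rank of X
```

Here `rank_XtX` falls short of k+1, so the (k+1)×(k+1) matrix `XtX` is singular and the OLS formula cannot be applied.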
10) In order for a matrix A to have an inverse, its determinant cannot be zero. Derive the determinant of the
following matrices:
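$$A = \begin{pmatrix} 3 & 6 \\ -2 & 1 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & -1 & 2 \\ 1 & 0 & 3 \\ 4 & 0 & 2 \end{pmatrix}, \qquad X'X \text{ where } X = \begin{pmatrix} 1 & 10 \end{pmatrix}$$

Answer: det(A) = 15, det(B) = -10, det(X'X) = 0.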
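These determinants can be confirmed numerically. The sketch below enters the three matrices from the question and uses NumPy's determinant routine; the variable names are illustrative.

```python
import numpy as np

A = np.array([[3.0, 6.0],
              [-2.0, 1.0]])
B = np.array([[1.0, -1.0, 2.0],
              [1.0, 0.0, 3.0],
              [4.0, 0.0, 2.0]])
X = np.array([[1.0, 10.0]])  # a 1 x 2 row vector, so X'X is 2 x 2
XtX = X.T @ X

det_A = np.linalg.det(A)      # 3*1 - 6*(-2)
det_B = np.linalg.det(B)      # expand along the second column
det_XtX = np.linalg.det(XtX)  # the rows of X'X are proportional, so it is singular
```

Because determinants are computed in floating point, the results should be compared with a tolerance rather than for exact equality.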
11) Your textbook shows that the following matrix ($M_X = I_n - P_X$) is a symmetric idempotent matrix. Consider a different matrix A, which is defined as follows: $A = I - \frac{1}{n}\iota\iota'$ and $\iota = (1, 1, \dots, 1)'$.
a. Show what the elements of A look like.
b. Show that A is a symmetric idempotent matrix.
c. Show that $A\iota = 0$.
d. Show that $A\hat{U} = \hat{U}$, where $\hat{U}$ is the vector of OLS residuals from a multiple regression.
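Answer: a.

$$A = \begin{pmatrix} 1 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 1 \end{pmatrix} - \frac{1}{n}\begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}\begin{pmatrix} 1 & 1 & \dots & 1 \end{pmatrix} = \begin{pmatrix} 1-1/n & -1/n & \dots & -1/n \\ -1/n & 1-1/n & \dots & -1/n \\ \vdots & \vdots & \ddots & \vdots \\ -1/n & -1/n & \dots & 1-1/n \end{pmatrix}$$

b. Symmetry:

$$A' = \begin{pmatrix} 1-1/n & -1/n & \dots & -1/n \\ -1/n & 1-1/n & \dots & -1/n \\ \vdots & \vdots & \ddots & \vdots \\ -1/n & -1/n & \dots & 1-1/n \end{pmatrix}' = \begin{pmatrix} 1-1/n & -1/n & \dots & -1/n \\ -1/n & 1-1/n & \dots & -1/n \\ \vdots & \vdots & \ddots & \vdots \\ -1/n & -1/n & \dots & 1-1/n \end{pmatrix} = A$$

Idempotency:

$$A \times A = \left(I - \frac{1}{n}\iota\iota'\right)\left(I - \frac{1}{n}\iota\iota'\right) = \left(I - \frac{1}{n}\iota\iota'\right) - \frac{1}{n}\iota\iota' + \frac{1}{n^2}\iota\iota'\iota\iota'$$

But

$$\iota\iota' \times \iota\iota' = \begin{pmatrix} 1 & 1 & \dots & 1 \\ 1 & 1 & \dots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \dots & 1 \end{pmatrix}\begin{pmatrix} 1 & 1 & \dots & 1 \\ 1 & 1 & \dots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \dots & 1 \end{pmatrix} = \begin{pmatrix} n & n & \dots & n \\ n & n & \dots & n \\ \vdots & \vdots & \ddots & \vdots \\ n & n & \dots & n \end{pmatrix},$$

and

$$\frac{1}{n^2}\begin{pmatrix} n & n & \dots & n \\ n & n & \dots & n \\ \vdots & \vdots & \ddots & \vdots \\ n & n & \dots & n \end{pmatrix} = \frac{1}{n}\begin{pmatrix} 1 & 1 & \dots & 1 \\ 1 & 1 & \dots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \dots & 1 \end{pmatrix} = \frac{1}{n}\iota\iota'.$$

This means that the last two terms in the above equation cancel each other, and therefore A×A = A, that is, A is idempotent.

c. $A\iota = \left(I - \frac{1}{n}\iota\iota'\right)\iota = \iota - \frac{1}{n}\iota\iota'\iota = \iota - \iota = 0$ since $\iota'\iota = n$.

d. $A\hat{U} = \left(I - \frac{1}{n}\iota\iota'\right)\hat{U} = \hat{U} - \frac{1}{n}\iota\iota'\hat{U} = \hat{U}$ since $\iota'\hat{U} = 0$.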
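Parts b through d can be checked numerically. The sketch below is illustrative only: the dimension n, the simulated regressors, and the variable names are assumptions, and the regression is assumed to include a constant so that the residuals sum to zero.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6                                    # hypothetical number of observations
iota = np.ones((n, 1))                   # n x 1 vector of ones
A = np.eye(n) - (iota @ iota.T) / n      # A = I - (1/n) iota iota'

# OLS residuals from a regression that includes a constant (simulated data).
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = rng.normal(size=n)
beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
U_hat = Y - X @ beta_hat

is_symmetric = np.allclose(A, A.T)                 # part b: A' = A
is_idempotent = np.allclose(A @ A, A)              # part b: A x A = A
A_iota_is_zero = np.allclose(A @ iota, 0.0)        # part c: A iota = 0
A_keeps_residuals = np.allclose(A @ U_hat, U_hat)  # part d: A U_hat = U_hat
```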
12) Write down, in general, the variance-covariance matrix for the multiple regression error term U. Using the assumptions $\mathrm{cov}(u_i, u_j | X_i, X_j) = 0$ and $\mathrm{var}(u_i | X_i) = \sigma_u^2$, show that the variance-covariance matrix can be written as $\sigma_u^2 I_n$.
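Answer:

$$(\text{var-cov})\left(\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}\Bigg|X\right) = E\left(\begin{pmatrix} u_1 - E(u_1) \\ u_2 - E(u_2) \\ \vdots \\ u_n - E(u_n) \end{pmatrix}\begin{pmatrix} u_1 - E(u_1) \\ u_2 - E(u_2) \\ \vdots \\ u_n - E(u_n) \end{pmatrix}'\Bigg|X\right) = E\left(\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}\begin{pmatrix} u_1 & u_2 & \dots & u_n \end{pmatrix}\Bigg|X\right)$$

$$= E\left(\begin{pmatrix} u_1^2 & u_1u_2 & \dots & u_1u_n \\ u_2u_1 & u_2^2 & \dots & u_2u_n \\ \vdots & \vdots & \ddots & \vdots \\ u_nu_1 & u_nu_2 & \dots & u_n^2 \end{pmatrix}\Bigg|X\right) = \begin{pmatrix} \sigma_u^2 & 0 & \dots & 0 \\ 0 & \sigma_u^2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \sigma_u^2 \end{pmatrix} = \sigma_u^2 I_n$$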
13) Consider the following symmetric and idempotent matrix A: $A = I - \frac{1}{n}\iota\iota'$ and $\iota = (1, 1, \dots, 1)'$.
a. Show that by postmultiplying this matrix by the vector Y (the LHS variable of the OLS regression), you convert all observations of Y into deviations from the mean.
b. Derive the expression Y'AY. What is the order of this expression? Under what other name have you encountered this expression before?
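Answer: a. Note that $\frac{1}{n}\iota'Y = \bar{Y}$. Given this result, if you premultiply Y with A, you get

$$AY = \left(I - \frac{1}{n}\iota\iota'\right)Y = Y - \iota\bar{Y} = \begin{pmatrix} Y_1 - \bar{Y} \\ Y_2 - \bar{Y} \\ \vdots \\ Y_n - \bar{Y} \end{pmatrix}$$

b. Note that $Y'A'AY = Y'AAY = Y'AY = \sum_{i=1}^{n}(Y_i - \bar{Y})^2$. This is a scalar which is called the variation in Y or the Total Sum of Squares (TSS).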
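A short numerical sketch of both parts follows. The five values of Y are made up for illustration and the variable names are assumptions.

```python
import numpy as np

Y = np.array([2.0, 4.0, 4.0, 6.0, 9.0])  # made-up observations with mean 5
n = Y.size
iota = np.ones(n)
A = np.eye(n) - np.outer(iota, iota) / n  # A = I - (1/n) iota iota'

deviations = A @ Y                    # part a: each element is Y_i minus the mean
quad_form = Y @ A @ Y                 # part b: the scalar Y'AY
tss = np.sum((Y - Y.mean()) ** 2)     # total sum of squares computed directly
```

The quadratic form `quad_form` and the directly computed `tss` agree, which is the matrix statement of part b.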
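14) Consider the following population regression function: $Y = X\beta + U$ where

$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}, \quad U = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}$$

Given the following information on population growth rates (Y) and education (X) for 86 countries

$$\sum_{i=1}^{n} Y_i = 1.594, \quad \sum_{i=1}^{n} X_i = 449.6, \quad \sum_{i=1}^{n} Y_i^2 = 0.03982, \quad \sum_{i=1}^{n} X_i^2 = 3{,}022.76, \quad \sum_{i=1}^{n} X_iY_i = 6.4697$$

a) find $X'X$, $X'Y$, $(X'X)^{-1}$ and finally $(X'X)^{-1}X'Y$.
b) Interpret the slope and, if necessary, the intercept.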
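Answer: a.

$$X'X = \begin{pmatrix} n & \sum_{i=1}^{n} X_i \\ \sum_{i=1}^{n} X_i & \sum_{i=1}^{n} X_i^2 \end{pmatrix} = \begin{pmatrix} 86 & 449.6 \\ 449.6 & 3022.76 \end{pmatrix}$$

$$X'Y = \begin{pmatrix} \sum_{i=1}^{n} Y_i \\ \sum_{i=1}^{n} X_iY_i \end{pmatrix} = \begin{pmatrix} 1.594 \\ 6.4697 \end{pmatrix}$$

$$(X'X)^{-1} = \frac{1}{86 \times 3022.76 - 449.6^2}\begin{pmatrix} 3022.76 & -449.6 \\ -449.6 & 86 \end{pmatrix}$$

$$(X'X)^{-1}X'Y = \begin{pmatrix} 0.0331 \\ -0.0028 \end{pmatrix}$$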
b. According to these results, five more years of education will lower population growth rates by roughly one percentage point (5 × 0.0028 ≈ 0.014).
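The matrix calculations in part a can be reproduced from the five sums given in the question. The sketch below uses the rounded sums exactly as printed, so the estimates should land close to the reported values (roughly 0.033 for the intercept and -0.0028 for the slope); small differences in the last digit can arise from that rounding. The variable names are illustrative.

```python
import numpy as np

n = 86
sum_x, sum_y = 449.6, 1.594        # sums of X_i and Y_i from the question
sum_x2, sum_xy = 3022.76, 6.4697   # sums of X_i^2 and X_i*Y_i from the question

XtX = np.array([[n, sum_x],
                [sum_x, sum_x2]])
XtY = np.array([sum_y, sum_xy])

# Closed-form inverse of a 2 x 2 matrix: swap the diagonal, negate the
# off-diagonal, and divide by the determinant.
det = n * sum_x2 - sum_x**2
XtX_inv = np.array([[sum_x2, -sum_x],
                    [-sum_x, n]]) / det

beta_hat = XtX_inv @ XtY  # (intercept, slope)
```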
15) You have obtained data on test scores and student-teacher ratios in region A and region B of your state. Region B, on average, has lower student-teacher ratios than region A. You decide to run the following regression

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$$

where $X_1$ is the class size in region A, $X_2$ is the difference in class size between regions A and B, and $X_3$ is the class size in region B. Your regression package shows a message indicating that it cannot estimate the above equation. What is the problem here and how can it be fixed? Explain the problem in terms of the rank of the X matrix.
Answer: There is perfect multicollinearity here, in that $X_2 = X_1 - X_3$; hence the X matrix (and the X'X matrix) does not have full rank (rank = 3 here, not 4). If X'X is singular, you cannot invert it, since its determinant is zero. Dropping one of the three explanatory variables allows you to estimate the above equation.
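The rank deficiency can be seen in a small simulated example. The class sizes below are randomly generated placeholders, not data from the question, and the variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30
X1 = rng.uniform(15, 30, size=n)  # hypothetical class sizes, region A
X3 = rng.uniform(15, 30, size=n)  # hypothetical class sizes, region B
X2 = X1 - X3                      # difference: an exact linear combination

# Constant plus all three regressors: four columns but only three are independent.
X_full = np.column_stack([np.ones(n), X1, X2, X3])
rank_full = np.linalg.matrix_rank(X_full)

# Dropping one of the three regressors (here X2) restores full column rank.
X_reduced = np.column_stack([np.ones(n), X1, X3])
rank_reduced = np.linalg.matrix_rank(X_reduced)
```

In this sketch `rank_full` is smaller than the number of columns of `X_full`, while `rank_reduced` equals the number of columns of `X_reduced`, so the reduced regression can be estimated.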