Sample exam questions.
1. Using data from the Survey of Consumer Finances between 1983 and 2007 (the survey is conducted every three years), I used OLS to examine the determinants of a household’s credit card debt. Credit card debt was converted into 1983 dollars using the CPI. The results from two specifications are presented in the table below.
Coefficients are from an OLS regression of a household’s real credit card balance. Numbers in parentheses are t-statistics.
                                                   (1)          (2)
Year dummies (1983 excluded)
  1986                                           309.5        308.1
                                               (3.666)      (3.655)
  1989                                           266.8        276.9
                                               (4.404)      (4.577)
  1992                                           318.8        351.6
                                               (5.367)      (5.927)
  1995                                           518.5        558.9
                                               (8.800)      (9.492)
  1998                                           749.1        800.6
                                               (12.71)      (13.59)
  2001                                           657.9        706.9
                                               (11.19)      (12.02)
  2004                                           839.6        894.6
                                               (14.28)      (15.22)
  2007                                            1290         1343
                                               (21.88)      (22.78)
Real income                                  -4.31e-05    -4.58e-05
                                              (-15.28)     (-16.22)
Age                                             -11.79       -14.01
                                              (-21.48)     (-22.32)
Years of education                               68.40        58.64
                                               (22.46)      (18.93)
Black                                           -211.0       -92.56
                                              (-6.940)     (-3.000)
Marital status dummies (never married omitted)
  Married                                           --        550.6
                                                            (20.69)
  Widowed                                           --        200.2
                                                            (4.569)
  Divorced                                          --        340.2
                                                            (10.28)
Constant                                         40.34       -168.6
                                               (0.556)     (-2.292)
Observations                                    152055       152055
a. Based upon specification (1), if the omitted year dummy were switched from 1983 to 2007, what would be the new estimate of
   i. the coefficient on the constant term?
   ii. the coefficient on the 2004 dummy?
b. When controls for marital status are added to the regression, the coefficient on the black dummy changes from -211 to -92.56. What does this tell you about the relationship between marital status and race? Justify your conclusion.
c. Suppose you want to test the hypothesis that, controlling for observed characteristics, credit card balances are significantly different in the last three years of the sample (2001, 2004, and 2007). Explain how you could construct a Wald test of this hypothesis (not an F-test). In your answer, define the matrices that you use in describing the construction of the Wald statistic, the degrees of freedom for the test statistic, and the values of the test statistic that would lead you to reject the null hypothesis. In writing out your matrices, assume that the order in the coefficient vector (call it B) and the corresponding variance-covariance matrix (call it V) matches that given in the above table (i.e., the coefficient on the 1986 year dummy is the first element of B, the coefficient on the intercept is the last element of B, etc.).
d. Suppose that many people in the SCF had their income “imputed” (i.e., estimated), so that income is reported with some error. Explain how this would bias
   i. the estimated effect of income on credit card debt.
   ii. the estimated effect of education on credit card debt.
   Provide the basis for your conclusions about the direction of the biases.
e. Suppose that you want to test whether the effect of income on credit card debt differs by marital status. Explain how you could test this hypothesis. Describe the regression you would estimate and the statistic you would calculate to test the hypothesis. Be sure to describe the distribution of the test statistic (including the degrees of freedom) and the conditions under which the null hypothesis would be rejected.
f. Suppose that the errors in the credit card debt equation are correlated among people who live in the same state. If the standard errors of the coefficients are corrected for clustering by state, are the t-statistics for the coefficients likely to rise or fall? Which variables are likely to experience the greatest change in the t-statistic? Explain what would determine the size of the change in the t-statistic for a specific variable.
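As a way to check the arithmetic behind part (a), the sketch below re-bases the year dummies of specification (1). It relies only on the fact that changing the omitted category shifts every dummy coefficient and the constant by the coefficient of the new base year, leaving fitted values unchanged. The function name `rebase` and the variable names are illustrative, not part of the original exam.

```python
# Coefficients copied from specification (1) in the table above.
constant_1983_base = 40.34
year_coefs = {1986: 309.5, 1989: 266.8, 1992: 318.8, 1995: 518.5,
              1998: 749.1, 2001: 657.9, 2004: 839.6, 2007: 1290.0}


def rebase(constant, coefs, old_base, new_base):
    """Return (new_constant, new_coefs) after changing the omitted category.

    Fitted values are unchanged: the new constant absorbs the coefficient
    of the new base year, and every other dummy is measured relative to it.
    """
    shift = coefs[new_base]
    new_constant = constant + shift
    new_coefs = {year: b - shift for year, b in coefs.items() if year != new_base}
    new_coefs[old_base] = -shift  # the old base year now gets its own dummy
    return new_constant, new_coefs


new_constant, new_coefs = rebase(constant_1983_base, year_coefs, 1983, 2007)
print("constant with 2007 omitted:", round(new_constant, 2))
print("2004 dummy with 2007 omitted:", round(new_coefs[2004], 1))
```

The same shift applies to standard errors only through the full variance-covariance matrix, which the table does not report, so the sketch covers point estimates only.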
2. Using OLS, I estimated a regression model of the determinants of real pension income using CPS data for people aged 65 and over. The regression includes an intercept term and dummies that reflect gender, age, marital status, and race.
                                                Coefficient    Std. Err.   t-statistic
Intercept                                          6969.901     62.24044        111.98
Female                                            -4577.719     40.21796       -113.82
Age dummies (65-75 omitted)
  Age 76-85                                       -984.0998      40.6494        -24.21
  Age 86 or over                                  -1515.369     71.27117        -21.26
Marital status dummies (single omitted)
  Married                                         -143.5389     60.20101         -2.38
  Widowed                                         -531.9046     63.50952         -8.38
Race dummies (white non-Hispanic omitted)
  Black                                           -902.5294     68.50944        -13.17
  Hispanic                                        -1063.545      192.058         -5.54
a. Based on the estimates above, what is the predicted value of real pension income for a single, white, non-Hispanic female?
b. If the regression was re-estimated with the dummy variable for age 65-75 included and the dummy for age 86 and over excluded, what would be the estimated coefficient for
   i. the intercept.
   ii. the age 65-75 dummy.
   iii. the age 76-85 dummy.
c. Suppose that you wanted to test the null hypothesis that the effect of age (captured by the age group dummies) on pension income is identical for men and women. Carefully explain how you could test this hypothesis. Be precise about what regression you would run, how you would form a relevant test statistic and the distribution of the test statistic. Under what conditions would you reject the null?
d. Suppose that you believe that the variance of the residual in the pension income equation varies with gender and race. Carefully explain the two steps of an iterative GLS process that would allow you to correct for this form of heteroskedasticity. Be precise about the steps in your process, how the GLS estimator is formed, and how the variance of the GLS estimator is computed.
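The arithmetic in parts (a) and (b) can be sketched as follows. This is one possible check rather than a model answer: part (a) does not state the person's age, so the sketch assumes the omitted 65-75 group, and the dictionary keys and the helper `predict` are illustrative names.

```python
# Coefficients copied from the pension-income table above.
coef = {
    "intercept": 6969.901,
    "female": -4577.719,
    "age_76_85": -984.0998,
    "age_86_plus": -1515.369,
    "married": -143.5389,
    "widowed": -531.9046,
    "black": -902.5294,
    "hispanic": -1063.545,
}


def predict(**dummies):
    """Predicted pension income; any dummy not passed is treated as 0."""
    return coef["intercept"] + sum(coef[name] * value
                                   for name, value in dummies.items())


# Part (a): single, white non-Hispanic female.
# ASSUMPTION: age 65-75 (the omitted group), since the question gives no age.
pred_female = predict(female=1)

# Part (b): make "86 or over" the omitted age group instead of 65-75.
shift = coef["age_86_plus"]
rebased = {
    "intercept": coef["intercept"] + shift,       # now the 86+ baseline
    "age_65_75": -shift,                          # 65-75 relative to 86+
    "age_76_85": coef["age_76_85"] - shift,       # 76-85 relative to 86+
}

print("predicted income:", round(pred_female, 3))
for name, value in rebased.items():
    print(name, round(value, 4))
```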
3. In a 1997 study,[1] David Card examined the effect of regulatory reform in the airline industry on the wages of airline workers. He uses wage data and personal characteristics of workers inside and outside of the airline industry.
a. In one regression, Card regresses the log of hourly wages on an intercept and a dummy indicating whether the worker is in the airline industry. The coefficient on the airline dummy was .458 and the standard error on the coefficient was .004. What is the economic interpretation of the coefficient .458?
b. In a second regression, Card regresses the log of the hourly wage on the airline dummy and adds controls for worker education and experience. The coefficient on the airline dummy drops from .458 to .310. Why do you think the coefficient on the airline dummy fell? Be sure to give details on the relationship between relevant variables that would cause the coefficient to move in the observed direction. Also, provide the econometric basis for your conclusion.
Suppose that in Card’s data, experience is measured with error. Moreover, suppose that men, on average, have more experience than women.
c. If the Card regression in (b) includes a dummy variable indicating whether a person is male along with the experience variable, how will the measurement error affect the coefficient on experience? Justify your answer.
d. How will the measurement error in the experience variable affect the coefficient on the male dummy? Justify your answer.
Card was interested in how deregulation of the airline industry affected the wages of airline workers relative to the rest of the population. To test for the effect of deregulation, he had data for years before and after deregulation occurred and defined a dummy variable (call it DEREG) that was unity after deregulation and zero before.
e. Explain how Card could test whether the DEREG period caused airline wages to fall relative to other workers’ wages. Be sure to define any additional variables that would have to be created, the regression that would have to be estimated, and the appropriate test statistic for the null hypothesis.
f. Explain how Card could test whether the DEREG period caused the wages of airline pilots to fall more than wages of other airline workers. Be sure to define any additional variables that would have to be created, the regression that would have to be estimated, and the appropriate test statistic for the null hypothesis.
[1] David Card, “Deregulation and Labor Earnings in the Airline Industry,” in Regulatory Reform and Labor Markets, edited by James Peoples, Kluwer Publishers, 1997.
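For parts (a) and (b) of question 3, it may help to see what the two airline coefficients imply in percentage terms. A dummy coefficient in a log-wage regression is a difference in log points, and the exact proportional difference is exp(b) - 1. The sketch below applies that standard conversion to the two reported coefficients; it assumes both regressions use the log wage as the dependent variable, and the function name is illustrative.

```python
import math


def log_points_to_percent(beta):
    """Exact percent difference implied by a dummy coefficient in a log regression."""
    return 100.0 * (math.exp(beta) - 1.0)


raw_gap = log_points_to_percent(0.458)       # airline dummy, no controls
adjusted_gap = log_points_to_percent(0.310)  # with education and experience controls

print(f"raw airline wage gap:      {raw_gap:.1f}%")
print(f"adjusted airline wage gap: {adjusted_gap:.1f}%")
```

For coefficients this large, reading the coefficient directly as a percentage noticeably understates the gap, which is why the exact conversion is used here.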
4. The regression analysis below was performed using STATA. The variable definitions are as follows:

smokpaks = number of packs of cigarettes smoked per month;
age      = years of age;
educyrs  = years of education;
male     = 1 if person is male, = 0 otherwise.

. sort male;
. by male: summ smokpaks age educyrs;

-> male = 0
    Variable |    Obs        Mean
         age |   1605    55.69782
     educyrs |   1605    11.63614

-> male = 1
    Variable |    Obs        Mean
         age |   5418    56.95847
     educyrs |   5418    11.99022

. reg smokpaks age educyrs if male==1;

    smokpaks |      Coef.   Std. Err.        t
         age |  -.2596912    .0324445    -8.00
     educyrs |  -.3802783    .0490776    -7.75
       _cons |   25.94481    2.012668    12.89

. reg smokpaks age educyrs if male==0;

    smokpaks |      Coef.   Std. Err.        t
         age |  -.1520891    .0797061    -1.91
     educyrs |  -.1402861    .0787958    -1.78
       _cons |    15.7352    4.652143     3.38
Use the above results to answer the following questions.
a. What is the predicted level of smoking for a 40-year-old male with 12 years of education? Explain.
b. What is the difference in the average level of smoking between men and women? Explain.
c. How much of the gender difference in smoking can be “explained” by the differences in education? Explain.
d. How much of the gender difference in smoking can be “explained” by age and education combined? Explain.
e. Suppose that married people are less likely to smoke and notice that no control for marriage is included in the regression. Will this cause the portion of the gender gap in smoking that is “explained” by age differences to be over- or under-stated? Explain and be sure to outline any assumptions that are necessary to determine the answer.
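The calculations behind parts (a) through (d) can be sketched as below. Two points are assumptions rather than anything stated in the exam: the group means of smokpaks are recovered from the fact that an OLS regression with an intercept passes through the sample means, and the "explained" portion is valued at the male coefficients, which is one common convention (using the female coefficients gives a different split). All names are illustrative.

```python
# Estimates copied from the Stata output above.
male = {"_cons": 25.94481, "age": -0.2596912, "educyrs": -0.3802783}
female = {"_cons": 15.7352, "age": -0.1520891, "educyrs": -0.1402861}
mean_male = {"age": 56.95847, "educyrs": 11.99022}
mean_female = {"age": 55.69782, "educyrs": 11.63614}


def fitted(coefs, x):
    """Fitted packs per month for characteristics x under coefficients coefs."""
    return coefs["_cons"] + sum(coefs[k] * x[k] for k in x)


# Part (a): 40-year-old male with 12 years of education.
pred_a = fitted(male, {"age": 40, "educyrs": 12})

# Part (b): OLS with an intercept passes through the means, so evaluating
# each equation at its own group means recovers the group's average smoking.
avg_male = fitted(male, mean_male)
avg_female = fitted(female, mean_female)
gap = avg_male - avg_female

# Parts (c) and (d): difference in mean characteristics valued at the
# MALE coefficients (an assumed convention, not the only possible choice).
explained = {k: (mean_male[k] - mean_female[k]) * male[k] for k in mean_male}
explained_total = sum(explained.values())
unexplained = gap - explained_total

print("prediction for part (a):", round(pred_a, 4))
print("male avg, female avg, gap:", round(avg_male, 4), round(avg_female, 4), round(gap, 4))
print("explained by education:", round(explained["educyrs"], 4))
print("explained by age + education:", round(explained_total, 4))
print("unexplained:", round(unexplained, 4))
```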