Chapter 1: Economic Questions and Data
Multiple Choice for the Web

1) Analyzing the behavior of unemployment rates across U.S. states in March of 2010 is an example of using
a. time series data.
b. panel data.
c. cross-sectional data.
d. experimental data.

2) Studying inflation in the United States from 1970 to 2010 is an example of using
a. randomized controlled experiments.
b. time series data.
c. panel data.
d. cross-sectional data.

3) Analyzing the effect of minimum wage changes on teenage employment across the 48 contiguous U.S. states from 1980 to 2010 is an example of using
a. time series data.
b. panel data.
c. having a treatment group vs. a control group, since only teenagers receive minimum wages.
d. cross-sectional data.

4) Econometrics can be defined as follows, with the exception of
a. the science of testing economic theory.
b. fitting mathematical economic models to real-world data.
c. a set of tools used for forecasting future values of economic variables.
d. measuring the height of economists.

5) The accompanying graph
[Figure: U.S. unemployment rate in percent, plotted over the years 1965-1995]
is an example of
a. experimental data.
b. cross-sectional data.
c. a time series.
d. longitudinal data.

6) One of the primary advantages of using econometrics over typical results from economic theory is that
a. it potentially provides you with quantitative answers for a policy problem rather than simply suggesting the direction (positive/negative) of the response.
b. teaching you how to use statistical packages.
c. learning how to invert a 4 by 4 matrix.
d. all of the above.

7) In a randomized controlled experiment
a. you control for the effect that random numbers are not truly randomly generated.
b. there is a control group and a treatment group.
c. you control for random answers.
d. the control group receives treatment on even days only.

8) The reason why economists do not use experimental data more frequently is for all of the following reasons except that real-world experiments
a. with humans are difficult to administer.
b. are often unethical.
c. cannot be executed in economics.
d. have flaws relative to ideal randomized controlled experiments.

9) The most frequently used experimental or observational data in econometrics are of the following type:
a. randomly generated data.
b. time series data.
c. panel data.
d. cross-sectional data.

10) In the graph below, the vertical axis represents average real GDP growth for 65 countries over the period 1960-1995, and the horizontal axis shows the average trade share within these countries.
[Figure: scatter plot of average real GDP growth (vertical axis, roughly -2 to 8 percent) against average trade share (horizontal axis, 0 to 2), with each of the 65 countries labeled]
This is an example of
a. experimental data.
b. cross-sectional data.
c. a time series.
d. longitudinal data.

Chapter 4: Linear Regression with One Regressor
Multiple Choice for the Web

1) Binary variables
a. are generally used to control for outliers in your sample.
b. can take on more than two values.
c. exclude certain individuals from your sample.
d. can take on only two values.

2) In the simple linear regression model, the regression slope
a. indicates by how many percent Y increases, given a one percent increase in X.
b. when multiplied with the explanatory variable will give you the predicted Y.
c. indicates by how many units Y increases, given a one unit increase in X.
d. represents the elasticity of Y on X.

3) The regression R² is a measure of
a. whether or not X causes Y.
b. the goodness of fit of your regression line.
c. whether or not ESS > TSS.
d. the square of the determinant of R.

4) In the simple linear regression model Yi = β0 + β1 Xi + ui,
a. the intercept is typically small and unimportant.
b. β0 + β1 Xi represents the population regression function.
c. the absolute value of the slope is typically between 0 and 1.
d. β̂0 + β̂1 Xi represents the sample regression function.
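The simple-regression questions above all turn on the OLS slope and the sample regression function. A minimal sketch, with made-up data, of how the slope and intercept are computed:

```python
# OLS by hand: slope b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2),
# intercept b0 = ybar - b1 * xbar; the data below are made up.
def ols(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
         sum((xi - xbar) ** 2 for xi in x)
    b0 = ybar - b1 * xbar
    return b0, b1

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # roughly y = 2x
b0, b1 = ols(x, y)               # b1 near 2: a one-unit change in X raises Y by about 2 units
```

The slope answers question 2c directly: it measures the change in Y in units of Y per one-unit change in X, not a percentage.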
5) E(ui | Xi) = 0 says that
a. dividing the error by the explanatory variable results in a zero (on average).
b. the sample regression function residuals are unrelated to the explanatory variable.
c. the sample mean of the Xs is much larger than the sample mean of the errors.
d. the conditional distribution of the error given the explanatory variable has a zero mean.

6) Assume that you have collected a sample of observations from over 100 households and their consumption and income patterns. Using these observations, you estimate the regression Ci = β0 + β1 Yi + ui, where C is consumption and Y is disposable income. The estimate of β1 will tell you
a. ΔIncome / ΔConsumption
b. the amount you need to consume to survive
c. Consumption / Income
d. ΔConsumption / ΔIncome

7) In which of the following relationships does the intercept have a real-world interpretation?
a. the relationship between the change in the unemployment rate and the growth rate of real GDP ("Okun's Law")
b. the demand for coffee and its price
c. test scores and class size
d. weight and height of individuals

8) The OLS residuals, ûi, are sample counterparts of the population
a. regression function slope
b. errors
c. regression function's predicted values
d. regression function intercept

9) Changing the units of measurement, e.g. measuring test scores in 100s, will do all of the following EXCEPT for changing the
a. residuals
b. numerical value of the slope estimate
c. interpretation of the effect that a change in X has on the change in Y
d. numerical value of the intercept

10) To decide whether the slope coefficient indicates a "large" effect of X on Y, you look at the
a. size of the slope coefficient
b. regression R²
c. economic importance implied by the slope coefficient
d. value of the intercept

Chapter 5: Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals
Multiple Choice for the Web

1) The t-statistic is calculated by dividing
a. the OLS estimator by its standard error.
b. the slope by the standard deviation of the explanatory variable.
c. the estimator minus its hypothesized value by the standard error of the estimator.
d. the slope by 1.96.

2) Imagine that you were told that the t-statistic for the slope coefficient of the regression line TestScore^ = 698.9 − 2.28 × STR was 4.38. What are the units of measurement for the t-statistic?
a. points of the test score
b. number of students per teacher
c. TestScore / STR
d. standard deviations

3) The 95% confidence interval for β1 is the interval
a. (β̂1 − 1.96 SE(β̂1), β̂1 + 1.96 SE(β̂1))
b. (β̂1 − 1.645 SE(β̂1), β̂1 + 1.645 SE(β̂1))
c. (β1 − 1.96 SE(β1), β1 + 1.96 SE(β1))
d. (β̂1 − 1.96, β̂1 + 1.96)

4) A binary variable is often called a
a. dummy variable
b. dependent variable
c. residual
d. power of a test

5) If the errors are heteroskedastic, then
a. OLS is BLUE
b. WLS is BLUE if the conditional variance of the errors is known up to a constant factor of proportionality
c. LAD is BLUE if the conditional variance of the errors is known up to a constant factor of proportionality
d. OLS is efficient

6) Using the textbook example of 420 California school districts and the regression of test scores on the student-teacher ratio, you find that the standard error on the slope coefficient is 0.51 when using the heteroskedasticity-robust formula, while it is 0.48 when employing the homoskedasticity-only formula. When calculating the t-statistic, the recommended procedure is to
a. use the homoskedasticity-only formula because the t-statistic becomes larger
b. first test for homoskedasticity of the errors and then make a decision
c. use the heteroskedasticity-robust formula
d. make a decision depending on how different the estimate of the slope is under the two procedures
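The chapter revolves around t-statistics and 95% confidence intervals. A small arithmetic sketch using the quiz's test-score regression numbers (slope −2.28, standard error 0.52):

```python
# t-statistic and 95% confidence interval for a slope estimate; the numbers
# (slope -2.28, SE 0.52) come from the quiz's test-score regression.
def t_stat(estimate, hypothesized, se):
    return (estimate - hypothesized) / se

def conf_int_95(estimate, se):
    return estimate - 1.96 * se, estimate + 1.96 * se

t = t_stat(-2.28, 0.0, 0.52)       # about -4.38: reject H0: beta1 = 0 at the 5% level
lo, hi = conf_int_95(-2.28, 0.52)  # about (-3.30, -1.26); zero is not in the interval
```

Note the equivalence the two computations illustrate: the 95% interval excludes zero exactly when |t| > 1.96.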
7) Consider the estimated equation from your textbook:
TestScore^ = 698.9 − 2.28 × STR, R² = 0.051, SER = 18.6
(standard errors: intercept 10.4, slope 0.52)
The t-statistic for the slope is approximately
a. 4.38
b. 67.20
c. 0.52
d. 1.76

8) You have collected data for the 50 U.S. states and estimated the following relationship between the change in the unemployment rate from the previous year (Δur) and the growth rate of the respective state real GDP (gy). The results are as follows:
Δur^ = 2.81 − 0.23 × gy, R² = 0.36, SER = 0.78
(standard errors: intercept 0.12, slope 0.04)
Assuming that the estimator has a normal distribution, the 95% confidence interval for the slope is approximately the interval
a. [2.57, 3.05]
b. [−0.31, 0.15]
c. [−0.31, −0.15]
d. [−0.33, −0.13]

9) Using 143 observations, assume that you had estimated a simple regression function and that your estimate for the slope was 0.04, with a standard error of 0.01. You want to test whether or not the estimate is statistically significant. Which of the following decisions is the only correct one:
a. you decide that the coefficient is small and hence most likely is zero in the population
b. the slope is statistically significant since it is four standard errors away from zero
c. the response of Y given a change in X must be economically important since it is statistically significant
d. since the slope is very small, so must be the regression R²

10) You extract approximately 5,000 observations from the Current Population Survey (CPS) and estimate the following regression function:
AHE^ = 3.32 + 0.45 × Age, R² = 0.02, SER = 8.66
(standard errors: intercept 1.00, slope 0.04)
where AHE is average hourly earnings, and Age is the individual's age. Given the specification, your 95% confidence interval for the effect of changing age by 5 years is approximately
a. [$1.96, $2.54]
b. [$2.32, $4.32]
c. [$1.35, $5.30]
d. cannot be determined given the information provided

Chapter 6: Linear Regression with Multiple Regressors
Multiple Choice for the Web

1) In the multiple regression model, the adjusted R², R̄²,
a. cannot be negative.
b. will never be greater than the regression R².
c. equals the square of the correlation coefficient r.
d. cannot decrease when an additional explanatory variable is added.

2) If you had a two-regressor regression model, then omitting one variable which is relevant
a. will have no effect on the coefficient of the included variable if the correlation between the excluded and the included variable is negative.
b. will always bias the coefficient of the included variable upwards.
c. can result in a negative value for the coefficient of the included variable, even though the coefficient will have a significant positive effect on Y if the omitted variable were included.
d. makes the sum of the product between the included variable and the residuals different from 0.

3) Under the least squares assumptions for the multiple regression problem (zero conditional mean for the error term, all Xi and Yi being i.i.d., all Xi and ui having finite fourth moments, no perfect multicollinearity), the OLS estimators for the slopes and intercept
a. have an exact normal distribution for n > 25.
b. are BLUE.
c. have a normal distribution in small samples as long as the errors are homoskedastic.
d. are unbiased and consistent.

4) The following OLS assumption is most likely violated by omitted variable bias:
a. E(ui | Xi) = 0
b. (Xi, Yi), i = 1, …, n are i.i.d. draws from their joint distribution
c. there are no outliers for (Xi, ui)
d. there is heteroskedasticity

5) The adjusted R², or R̄², is given by
a. [(n − 1) / (n − k − 1)] × SSR/TSS
b. 1 − [(n − 1) / (n − k − 1)] × ESS/TSS
c. 1 − [(n − 1) / (n − k − 1)] × SSR/TSS
d. ESS/TSS
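Question 5's formula is easy to check numerically. A quick sketch with made-up sums of squares:

```python
# R^2 and adjusted R^2 from the sums of squares; the numbers are made up.
def r_squared(ssr, tss):
    return 1 - ssr / tss

def adj_r_squared(ssr, tss, n, k):
    # 1 - [(n - 1) / (n - k - 1)] * SSR / TSS
    return 1 - (n - 1) / (n - k - 1) * ssr / tss

r2 = r_squared(40.0, 100.0)                 # 0.60
r2_bar = adj_r_squared(40.0, 100.0, 30, 4)  # 0.536: the k = 4 regressors are penalized
```

Since (n − 1)/(n − k − 1) > 1 whenever k ≥ 1, the adjusted R̄² is always below R², which is question 1b.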
6) Consider the multiple regression model with two regressors X1 and X2, where both variables are determinants of the dependent variable. When omitting X2 from the regression, there will be omitted variable bias for β̂1
a. if X1 and X2 are correlated
b. always
c. if X2 is measured in percentages
d. only if X2 is a dummy variable

7) The dummy variable trap is an example of
a. imperfect multicollinearity
b. something that is of theoretical interest only
c. perfect multicollinearity
d. something that does not happen to university or college students

8) Imperfect multicollinearity
a. is not relevant to the field of economics and business administration
b. only occurs in the study of finance
c. means that the least squares estimator of the slope is biased
d. means that two or more of the regressors are highly correlated

9) Consider the multiple regression model with two regressors X1 and X2, where both variables are determinants of the dependent variable. You first regress Y on X1 only and find no relationship. However, when regressing Y on X1 and X2, the slope coefficient β̂1 changes by a large amount. This suggests that your first regression suffers from
a. heteroskedasticity
b. perfect multicollinearity
c. omitted variable bias
d. dummy variable trap

10) Imperfect multicollinearity
a. implies that it will be difficult to estimate precisely one or more of the partial effects using the data at hand
b. violates one of the four least squares assumptions in the multiple regression model
c. means that you cannot estimate the effect of at least one of the Xs on Y
d. suggests that a standard spreadsheet program does not have enough power to estimate the multiple regression model

Chapter 7: Hypothesis Tests and Confidence Intervals in Multiple Regression
Multiple Choice for the Web

1) When testing joint hypotheses, you should
a. use t-statistics for each hypothesis and reject the null hypothesis if all of the restrictions fail
b. use the F-statistic and reject all the hypotheses if the statistic exceeds the critical value
c. use t-statistics for each hypothesis and reject the null hypothesis once the statistic exceeds the critical value for a single hypothesis
d. use the F-statistic and reject at least one of the hypotheses if the statistic exceeds the critical value

2) In the multiple regression model, the t-statistic for testing that the slope is significantly different from zero is calculated
a. by dividing the estimate by its standard error.
b. from the square root of the F-statistic.
c. by multiplying the p-value by 1.96.
d. using the adjusted R² and the confidence interval.

3) If you wanted to test, using a 5% significance level, whether or not a specific slope coefficient is equal to one, then you should
a. subtract 1 from the estimated coefficient, divide the difference by the standard error, and check if the resulting ratio is larger than 1.96.
b. add and subtract 1.96 from the slope and check if that interval includes 1.
c. see if the slope coefficient is between 0.95 and 1.05.
d. check if the adjusted R² is close to 1.

4) When there are two coefficients, the resulting confidence sets are
a. rectangles
b. ellipses
c. squares
d. trapezoids

5) All of the following are true, with the exception of one condition:
a. a high R² or R̄² does not mean that the regressors are a true cause of the dependent variable
b. a high R² or R̄² does not mean that there is no omitted variable bias
c. a high R² or R̄² always means that an added variable is statistically significant
d. a high R² or R̄² does not necessarily mean that you have the most appropriate set of regressors

6) You have estimated the relationship between test scores and the student-teacher ratio under the assumption of homoskedasticity of the error terms. The regression output is as follows: TestScore^ = 698.9 − 2.28 × STR, and the standard error on the slope is 0.48. The homoskedasticity-only "overall" regression F-statistic for the hypothesis that the regression R² is zero is approximately
a. 0.96
b. 1.96
c. 22.56
d. 4.75
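For question 6, with a single restriction the homoskedasticity-only F-statistic is just the squared t-statistic on the slope. Checking the arithmetic:

```python
# With one regressor, the "overall" homoskedasticity-only F-statistic equals
# the squared t-statistic on the slope (slope -2.28, SE 0.48 from the quiz).
t = -2.28 / 0.48   # -4.75
f = t ** 2         # 22.5625
```

This is also the content of question 2b: for a single hypothesis, t = ±√F.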
7) Consider a regression with two variables, in which X1i is the variable of interest and X2i is the control variable. Conditional mean independence requires
a. E(ui | X1i, X2i) = E(ui | X2i)
b. E(ui | X1i, X2i) = E(ui | X1i)
c. E(ui | X1i) = E(ui | X2i)
d. E(ui) = E(ui | X2i)

8) The homoskedasticity-only F-statistic and the heteroskedasticity-robust F-statistic typically are
a. the same
b. different
c. related by a linear function
d. a multiple of each other (the heteroskedasticity-robust F-statistic is 1.96 times the homoskedasticity-only F-statistic)

9) Consider the following regression output where the dependent variable is test scores and the two explanatory variables are the student-teacher ratio and the percent of English learners: TestScore^ = 698.9 − 1.10 × STR − 0.650 × PctEL. You are told that the t-statistic on the student-teacher ratio coefficient is 2.56. The standard error therefore is approximately
a. 0.25
b. 1.96
c. 0.650
d. 0.43

10) The critical value of F(4, ∞) at the 5% significance level is
a. 3.84
b. 2.37
c. 1.94
d. cannot be calculated because in practice you will not have an infinite number of observations

Chapter 8: Nonlinear Regression Functions
Multiple Choice for the Web

1) The interpretation of the slope coefficient in the model ln(Yi) = β0 + β1 ln(Xi) + ui is as follows:
a. a 1% change in X is associated with a β1% change in Y.
b. a change in X by one unit is associated with a β1 change in Y.
c. a change in X by one unit is associated with a 100 × β1% change in Y.
d. a 1% change in X is associated with a change in Y of 0.01 × β1.

2) A nonlinear function
a. makes little sense, because variables in the real world are related linearly.
b. can be adequately described by a straight line between the dependent variable and one of the explanatory variables.
c. is a concept that only applies to the case of a single or two explanatory variables since you cannot draw a line in four dimensions.
d. is a function with a slope that is not constant.

3) A polynomial regression model is specified as:
a. Yi = β0 + β1 Xi + β2 Xi² + ⋯ + βr Xi^r + ui
b. Yi = β0 + β1 Xi + β1² Xi + ⋯ + β1^r Xi + ui
c. Yi = β0 + β1 Xi + β2 Yi² + ⋯ + βr Yi^r + ui
d. Yi = β0 + β1 X1i + β2 X2i + β3 (X1i × X2i) + ui

4) The best way to interpret polynomial regressions is to
a. take a derivative of Y with respect to the relevant X.
b. plot the estimated regression function and to calculate the estimated effect on Y associated with a change in X for one or more values of X.
c. look at the t-statistics for the relevant coefficients.
d. analyze the standard error of the estimated effect.

5) In the log-log model, the slope coefficient indicates
a. the effect that a unit change in X has on Y.
b. the elasticity of Y with respect to X.
c. ΔY / ΔX.
d. (ΔY / Y) / (ΔX / X).

6) In the model ln(Yi) = β0 + β1 Xi + ui, the elasticity of E(Y | X) with respect to X is
a. β1 X
b. β1
c. β0 + β1 X
d. cannot be calculated because the function is non-linear

7) Assume that you had estimated the following quadratic regression model: TestScore^ = 607.3 + 3.85 × Income − 0.0423 × Income². If income increased from 10 to 11 ($10,000 to $11,000), then the predicted effect on test scores would be
a. 3.85
b. 3.85 − 0.0423
c. cannot be calculated because the function is non-linear
d. 2.96

8) Consider the polynomial regression model of degree r, Yi = β0 + β1 Xi + β2 Xi² + ⋯ + βr Xi^r + ui. The null hypothesis that the regression is linear, against the alternative that it is a polynomial of degree r, corresponds to
a. H0: βr = 0 vs. H1: βr ≠ 0
b. H0: β1 = 0 vs. H1: β1 ≠ 0
c. H0: β2 = 0, β3 = 0, …, βr = 0 vs. H1: all βj ≠ 0, j = 2, …, r
d. H0: β2 = 0, β3 = 0, …, βr = 0 vs. H1: at least one βj ≠ 0, j = 2, …, r

9) Consider the following least squares specification between test scores and district income: TestScore^ = 557.8 + 36.42 × ln(Income). According to this equation, a 1% increase in income is associated with an increase in test scores of
a. 0.36 points
b. 36.42 points
c. 557.8 points
d. cannot be determined from the information given here
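Question 7's nonlinear effect can be checked directly by evaluating the fitted quadratic at the two income values and differencing:

```python
# Predicted effect of raising Income from 10 to 11 in the quiz's quadratic model
# TestScore-hat = 607.3 + 3.85 * Income - 0.0423 * Income^2.
def predict(income):
    return 607.3 + 3.85 * income - 0.0423 * income ** 2

effect = predict(11) - predict(10)   # 3.85 - 0.0423 * (121 - 100), about 2.96
```

This is the "calculate the estimated effect for one or more values of X" approach from question 4b: unlike a linear model, the effect of a one-unit change here depends on the starting value of Income.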
10) Consider the population regression of log earnings [Yi, where Yi = ln(Earningsi)] against two binary variables: whether a worker is married (D1i, where D1i = 1 if the ith person is married), the worker's gender (D2i, where D2i = 1 if the ith person is female), and the product of the two binary variables: Yi = β0 + β1 D1i + β2 D2i + β3 (D1i × D2i) + ui. The interaction term
a. allows the population effect on log earnings of being married to depend on gender
b. does not make sense since it could be zero for married males
c. indicates the effect of being married on log earnings
d. cannot be estimated without the presence of a continuous variable

Chapter 9: Assessing Studies Based on Multiple Regression
Multiple Choice for the Web

1) A survey of earnings contains an unusually high fraction of individuals who state their weekly earnings in 100s, such as 300, 400, 500, etc. This is an example of
a. errors-in-variables bias.
b. sample selection bias.
c. simultaneous causality bias.
d. companies that typically bargain with workers in 100s of dollars.

2) In the case of a simple regression, where the independent variable is measured with i.i.d. error (σ²X is the variance of the true regressor, σ²w the variance of the measurement error, and →p denotes convergence in probability),
a. β̂1 →p β1
b. β̂1 →p [σ²X / (σ²X + σ²w)] × β1
c. β̂1 →p [σ²w / (σ²X + σ²w)] × β1
d. β̂1 →p [(σ²X + σ²w) / σ²X] × β1

3) In the case of errors-in-variables bias,
a. maximum likelihood estimation must be used.
b. the OLS estimator is consistent if the variance in the unobservable variable is relatively large compared to the variance in the measurement error.
c. the OLS estimator is consistent, but no longer unbiased in small samples.
d. binary variables should not be used as independent variables.

4) Comparing the California test scores to test scores in Massachusetts is appropriate for external validity if
a. Massachusetts also allowed beach walking to be an appropriate P.E. activity.
b. the two income distributions were very similar.
c. the student-to-teacher ratio did not differ by more than five on average.
d. the institutional settings in California and Massachusetts, such as organization in classroom instruction and curriculum, were similar in the two states.

5) In the case of errors-in-variables bias, the precise size and direction of the bias depend on
a. the sample size in general.
b. the correlation between the measured variable and the measurement error.
c. the size of the regression R².
d. whether the good in question is price elastic.

6) The question of reliability/unreliability of a multiple regression depends on
a. internal but not external validity
b. the quality of your statistical software package
c. internal and external validity
d. external but not internal validity

7) A statistical analysis is internally valid if
a. all t-statistics are greater than |1.96|
b. the regression R² > 0.05
c. the population is small, say less than 2,000, and can be observed
d. the statistical inferences about causal effects are valid for the population studied

8) Internal validity means that
a. the estimator of the causal effect should be unbiased and consistent
b. the estimator of the causal effect should be efficient
c. inferences and conclusions can be generalized from the population to other populations
d. OLS estimation has been used in your statistical package

9) Threats to internal validity lead to
a. perfect multicollinearity
b. the inability to transfer data sets into your statistical package
c. failures of one or more of the least squares assumptions
d. a false generalization to the population of interest

10) The true causal effect might not be the same in the population studied and the population of interest because
a. of differences in characteristics of the populations
b. of geographical differences
c. the study is out of date
d. of all of the above
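Question 2's attenuation result can be illustrated by simulation. A sketch with made-up variances, assuming classical (i.i.d., mean-zero) measurement error:

```python
import random

# plim of the OLS slope under classical measurement error:
# beta1 * var_X / (var_X + var_w), i.e. attenuation toward zero.
def attenuation_factor(var_x, var_w):
    return var_x / (var_x + var_w)

random.seed(0)
n, beta1 = 20000, 2.0
x = [random.gauss(0, 1) for _ in range(n)]   # true regressor, variance 1
w = [random.gauss(0, 1) for _ in range(n)]   # measurement error, variance 1
y = [beta1 * xi for xi in x]                 # no regression error, for clarity
x_obs = [xi + wi for xi, wi in zip(x, w)]    # observed, mismeasured regressor

# OLS slope of y on the mismeasured x_obs
mx = sum(x_obs) / n
my = sum(y) / n
b1 = sum((a - mx) * (b - my) for a, b in zip(x_obs, y)) / \
     sum((a - mx) ** 2 for a in x_obs)
# b1 lands near attenuation_factor(1, 1) * beta1 = 1.0, not the true 2.0
```

With equal variances the estimated slope is cut roughly in half, which is why errors-in-variables bias shrinks estimates toward zero rather than in an arbitrary direction.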
Chapter 10: Regression with Panel Data
Multiple Choice for the Web

1) The Fixed Effects regression model
a. has n different intercepts.
b. the slope coefficients are allowed to differ across entities, but the intercept is "fixed" (remains unchanged).
c. has "fixed" (repaired) the effect of heteroskedasticity.
d. in a log-log model may include logs of the binary variables, which control for the fixed effects.

2) In the Fixed Time Effects regression model, you should exclude one of the binary variables for the time periods when an intercept is present in the equation
a. because the first time period must always be excluded from your data set.
b. because there are already too many coefficients to estimate.
c. to avoid perfect multicollinearity.
d. to allow for some changes between time periods to take place.

3) When you add state fixed effects to a simple regression model for U.S. states over a certain time period, and the regression R² increases significantly, then it is safe to assume that
a. the included explanatory variables, other than the state fixed effects, are unimportant.
b. state fixed effects account for a large amount of the variation in the data.
c. the coefficients on the other included explanatory variables will not change.
d. time fixed effects are unimportant.

4) In the panel regression analysis of beer taxes on traffic deaths, the estimation period is 1982-1988 for the 48 contiguous U.S. states. To test for the significance of entity fixed effects, you should calculate the F-statistic and compare it to the critical value from your F(q, ∞) distribution, where q equals
a. 48.
b. 54.
c. 7.
d. 47.

5) In the panel regression analysis of beer taxes on traffic deaths, the estimation period is 1982-1988 for the 48 contiguous U.S. states. To test for the significance of time fixed effects, you should calculate the F-statistic and compare it to the critical value from your F(q, ∞) distribution, which equals (at the 5% level)
a. 2.01.
b. 2.10.
c. 2.80.
d. 2.64.

6) Assume that for the T = 2 time periods case, you have estimated a simple regression in changes model and found a statistically significant positive intercept. This implies
a. a negative mean change in the LHS variable in the absence of a change in the RHS variable, since you subtract the earlier period from the later period
b. that the panel estimation approach is flawed, since differencing the data eliminates the constant (intercept) in a regression
c. a positive mean change in the LHS variable in the absence of a change in the RHS variable
d. that the RHS variable changed between the two subperiods

7) HAC standard errors and clustered standard errors are related as follows:
a. they are the same
b. clustered standard errors are one type of HAC standard error
c. they are the same if the data is differenced
d. clustered standard errors are the square root of HAC standard errors

8) In panel data, the regression error
a. is likely to be correlated over time within an entity
b. should be calculated taking into account heteroskedasticity but not autocorrelation
c. only exists for the case of T > 2
d. fits all of the three descriptions above

9) It is advisable to use clustered standard errors in panel regressions because
a. without clustered standard errors, the OLS estimator is biased
b. hypothesis testing can proceed in a standard way even if there are few entities (n is small)
c. they are easier to calculate than homoskedasticity-only standard errors
d. the fixed effects estimator is asymptotically normally distributed when n is large

10) If Xit is correlated with Xis for different values of s and t, then
a. Xit is said to be autocorrelated
b. the OLS estimator cannot be computed
c. statistical inference cannot proceed in a standard way even if clustered standard errors are used
d. this is not of practical importance since these correlations are typically weak in applications
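The entity fixed-effects estimator behind questions 1 and 4 can be computed with the "within" (entity-demeaning) transformation. A toy sketch with a made-up two-entity panel, where entity B has both a higher intercept and higher X values, so pooled OLS is biased:

```python
from collections import defaultdict

# (entity, x, y): entity A has intercept 0, entity B intercept 10; slope 2 in both.
panel = [("A", 1.0, 2.0), ("A", 2.0, 4.0), ("A", 3.0, 6.0),
         ("B", 4.0, 18.0), ("B", 5.0, 20.0), ("B", 6.0, 22.0)]

def slope(pairs):
    mx = sum(x for x, _ in pairs) / len(pairs)
    my = sum(y for _, y in pairs) / len(pairs)
    return sum((x - mx) * (y - my) for x, y in pairs) / \
           sum((x - mx) ** 2 for x, _ in pairs)

# Pooled OLS ignores the entity intercepts and is biased upward here.
b_pooled = slope([(x, y) for _, x, y in panel])   # about 4.57, not 2

# Within transformation: demean x and y by entity, then pool.
means = defaultdict(lambda: [0.0, 0.0, 0])
for e, x, y in panel:
    m = means[e]
    m[0] += x; m[1] += y; m[2] += 1
demeaned = [(x - means[e][0] / means[e][2], y - means[e][1] / means[e][2])
            for e, x, y in panel]
b_within = slope(demeaned)                        # recovers the common slope, 2.0
```

Demeaning by entity is algebraically equivalent to including the n entity-intercept dummies of question 1a, which is why the within estimator removes the bias from the entity-level confounder.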