Given name: ____________________  Student #: ______________________
Family name: ___________________  Section #: ______________________

BUEC 333 FINAL

Multiple Choice (2 points each)

1) The Durbin-Watson test is only valid:
a) with models that exclude an intercept
b) with models that include a lagged dependent variable
c) with models displaying multiple orders of autocorrelation
d) all of the above
e) none of the above

2) If q is an unbiased estimator of Q, then:
a) Q is the mean of the sampling distribution of q
b) q is the mean of the sampling distribution of Q
c) Var[q] = Var[Q] / n, where n = the sample size
d) q = Q
e) a and c

3) The OLS estimator of the variance of the slope coefficient in the regression model with one independent variable:
a) will be smaller when there is less variation in ei
b) will be smaller when there are fewer observations
c) will be smaller when there is less variation in X
d) will be smaller when there are more independent variables
e) none of the above

4) Suppose the assumptions of the CLRM apply and you have used OLS to estimate a slope coefficient as 2.43. If the true value of this slope is 3.05, then the OLS estimator:
a) has bias of 0.62
b) has bias of –0.62
c) is unbiased
d) not enough information
e) none of the above

5) Which of the following is not a linear regression model:
a) Yi = αXi + γXi² + εi
b) Yi = β0 + β1Xi^β2 + εi
c) log(Yi) = β0 + β1log(Xi) + εi
d) Yi = β0 + β1log(Xi) + εi
e) none of the above

6) To be useful for hypothesis testing, a test statistic must:
a) be computable using sample data
b) have a known sampling distribution when the alternative hypothesis is true
c) have a known sampling distribution when the null hypothesis is false
d) a and b only
e) none of the above

7) Impure serial correlation:
a) is the same as pure serial correlation
b) can be detected with residual plots
c) is caused by mis-specification of the regression model
d) b and c
e) none of the above

8) The power of a test statistic should become larger as the:
a) sample size becomes larger
b) type II error becomes larger
c) null becomes closer to being true
d) significance level becomes larger
e) none of the above

9) Suppose you want to test the following hypothesis at the 5% level of significance:
H0: μ = μ0
H1: μ ≠ μ0
Which of the following statements is/are true?
a) the probability of erroneously failing to reject the null hypothesis when it is true is 0.05
b) the probability of erroneously failing to reject the null hypothesis when it is false is 0.05
c) the probability of erroneously rejecting the null hypothesis when it is true is 0.05
d) the probability of erroneously rejecting the null hypothesis when it is false is 0.05
e) none of the above

10) Suppose upon running a regression, EViews reports a value of the explained sum of squares of 1648 and an R2 of 0.80. What is the value of the residual sum of squares in this case?
a) 0
b) 412
c) 1318.4
d) unknown as it is incalculable
e) none of the above

11) The power of a test is the probability that you:
a) reject the null when it is true
b) reject the null when it is false
c) fail to reject the null when it is false
d) fail to reject the null when it is true
e) none of the above
12) In a regression explaining earnings, you include a single independent variable: an individual's number of years of education. You know that more educated people earn more. You also know that more educated people drink more. In this case, the OLS estimate of the effect of education on earnings will likely be:
a) negatively biased
b) positively biased
c) unbiased
d) not enough information
e) none of the above

13) The consequences of multicollinearity are that the OLS estimates:
a) will be biased while the standard errors will remain unaffected
b) will be biased while the standard errors will be smaller
c) will be unbiased while the standard errors will remain unaffected
d) will be unbiased while the standard errors will be smaller
e) none of the above

14) In the regression specification Yi = β0 + β1Xi + εi, which of the following is a justification for including epsilon?
a) it accounts for potential non-linearity in the functional form
b) it captures the influence of all omitted explanatory variables
c) it incorporates measurement error in Y
d) it reflects randomness in outcomes
e) all of the above

15) In order for our independent variables to be labelled “exogenous”, which of the following must be true:
a) E(εi) = 0
b) Cov(Xi, εi) = 0
c) Cov(εi, εj) = 0
d) Var(εi) = σ2
e) none of the above

16) The F test of overall significance:
a) is based on a test statistic that has an F distribution with k and n-k-1 degrees of freedom
b) is based on a test statistic that has an F distribution with n-k-1 and k degrees of freedom
c) helps to detect whether relevant variables have been omitted from the model
d) a and c
e) b and c

17) The OLS estimator is said to be BUE when:
a) Assumptions 1 through 6 are satisfied and errors are normally distributed
b) Assumptions 1 through 3 are satisfied and errors are normally distributed
c) Assumptions 1 through 6 are satisfied
d) Assumptions 1 through 3 are satisfied
e) errors are normally distributed

18) The RESET test is designed to detect problems associated with:
a) specification error of an unknown form
b) heteroskedasticity
c) multicollinearity
d) serial correlation
e) none of the above

19) Omitting a constant term from our regression will likely lead to:
a) a lower R2, a lower F statistic, and unbiased estimates of the independent variables
b) a higher R2, a lower F statistic, and biased estimates of the independent variables
c) a higher R2, a lower F statistic, and unbiased estimates of the independent variables
d) a higher R2, a higher F statistic, and biased estimates of the independent variables
e) none of the above

20) If two random variables X and Y are independent,
a) their joint distribution equals the product of their conditional distributions
b) the conditional distribution of X given Y equals the marginal distribution of X
c) their variance is zero
d) a and c
e) a, b, and c

Short Answer #1 (10 points)

Suppose we specify the following regression model on the determination of incomes in British Columbia:

ln(wi) = β0 + β1 Educationi + β2 FirstNationsi + β3 Malei + β4 Northi + β5 (Educationi × Malei) + β6 (Malei × Northi) + εi

The dependent variable is the log of wages for individual i. Education is years of education for individual i. FirstNations is a dummy variable equal to 1 if an individual self-identifies as being of First Nations origin and equal to 0 if an individual does not. Male is a dummy variable equal to 1 if an individual self-identifies as being male and equal to 0 if an individual self-identifies as being female. North is a dummy variable equal to 1 if an individual resides in northern British Columbia and equal to 0 if an individual resides in southern British Columbia.

Verbally explain the following:
a) What null hypothesis are you testing when you test β2 = 0?
b) What null hypothesis are you testing when you test β3 = 0?
c) What null hypothesis are you testing when you test β5 = 0?
d) What null hypothesis are you testing when you test β6 = 0?

It is best to think of this in terms of the possible combinations of ethnic, gender, and regional origins:

First Nations, male, northern British Columbia: β0 + β2 + β3 + β4 + β6
First Nations, male, southern British Columbia: β0 + β2 + β3
First Nations, female, northern British Columbia: β0 + β2 + β4
First Nations, female, southern British Columbia: β0 + β2
Non-First Nations, male, northern British Columbia: β0 + β3 + β4 + β6
Non-First Nations, male, southern British Columbia: β0 + β3
Non-First Nations, female, northern British Columbia: β0 + β4
Non-First Nations, female, southern British Columbia: β0

a) The impact of ethnic origin on wages is zero for everyone.
b) The impact of gender identification on wages for someone in southern British Columbia is zero.
c) The impact of education on wages is the same for a male as for a female.
d) The impact of living in northern British Columbia on wages is the same for a male as for a female.

Short Answer #2 (10 points)

The first half of the course was dedicated to developing the least squares estimator. The rest of the course was dedicated to considering those instances when problems with the least squares estimator arise. Underlying the discussion were the six assumptions of the classical linear model.

a) Name the six assumptions and explain what each of them means.
b) Some of these assumptions are necessary for the OLS estimator to be unbiased. Some of these assumptions are necessary for the OLS estimator to be “best”. Explain the distinction between these two concepts.
c) Indicate which of the six assumptions are necessary for the OLS estimator to be unbiased and which of the six assumptions are necessary for the OLS estimator to be “best”.
d) In general, would you prefer your estimates to be biased with small sampling variance or unbiased with a larger sampling variance? Explain your answer.

a) The six assumptions are:
1. The regression model is linear in the coefficients, is correctly specified, and has an additive error term.
2. The error term has zero population mean, or E(εi) = 0.
3. All independent variables are uncorrelated with the error term, or Cov(Xi, εi) = 0 for each independent variable Xi (we say there is no endogeneity).
4. Errors are uncorrelated across observations, or Cov(εi, εj) = 0 for any two observations i and j (we say there is no serial correlation).
5. The error term has a constant variance, or Var(εi) = σ2 for every i (we say there is no heteroskedasticity).
6. No independent variable is a perfect linear function of any other independent variable (we say there is no perfect collinearity).

b) Unbiasedness is the property whereby the expected value of an estimator equals the population parameter of interest. “Best” relates to the size of the sampling variance of any such unbiased estimator: the lower the sampling variance, the better.

c) Of the assumptions listed above, the first three are required for unbiasedness. Assumptions four through six are necessary for the OLS estimator to be “best”.

d) It is probably better to have an estimator that is centered on the population parameter “on average” than one that is “wrong” but very precisely estimated. Thus, bias is the greater sin than inefficiency (although I am open to students persuasively arguing the opposite if we think of a small-bias-versus-large-variance case).
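To make the bias-versus-variance trade-off in part d) concrete, the following is a minimal simulation sketch, not part of the original answer key; the population values, the shrinkage factor of 0.8, and the sample size are hypothetical choices used only to compare the mean squared error of an unbiased estimator with that of a biased but less variable one.

```python
import numpy as np

# Hypothetical illustration of the bias/variance trade-off from part d).
# Population: mean mu = 5, sd = 10. Two estimators of mu from a sample of n = 30:
#   - the sample mean (unbiased, larger sampling variance)
#   - a "shrunk" mean 0.8 * xbar (biased toward zero, smaller sampling variance)
rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 10.0, 30, 10_000

xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
shrunk = 0.8 * xbar

for name, est in [("sample mean", xbar), ("shrunk mean", shrunk)]:
    bias = est.mean() - mu
    var = est.var()
    mse = ((est - mu) ** 2).mean()   # MSE = bias^2 + variance
    print(f"{name:12s}  bias={bias:6.3f}  variance={var:6.3f}  MSE={mse:6.3f}")
```

Depending on the numbers chosen, either estimator can have the lower mean squared error, which is exactly the judgement call part d) asks students to weigh.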
Short Answer #3 (10 points)

Consider the results from running a regression of SCORE (out of 100) on a BUEC 333 exam on GPA and a dummy variable equal to 1 if a student was born in Tennessee (USA) and 0 if a student was born elsewhere.

Dependent variable: SCORE
Method: Least Squares
Sample: 1 100
Included observations: 100

Variable      Coefficient   Std. Error   t-Statistic   Prob.
GPA           8.03          ?            2.00          0.00
TENNESSEE     -3.69         3.03         -1.22         0.23
C             55.00         6.67         8.25          0.00

R-squared: 0.12
Adjusted R-squared: 0.10
Mean dependent variable: 76.22
S.D. dependent variable: 12.78

a) How would you interpret the coefficient estimate for TENNESSEE?
b) Approximately what number should appear in the Std. Error column for GPA?
c) What score would you forecast for a student born in Sydney, Australia with a GPA of 3.0? Explain your calculation.
d) If approximately 20% of the data are from Tennessee, what is the approximate average GPA of the sample? Explain your thinking (you can also round to the first decimal place in your explanation).

a) Holding GPA constant, students from Tennessee score on average 3.69 points lower.

b) Under the null that βGPA = 0, the t value is given by t = (β̂GPA − βH0)/s.e.(β̂GPA) = β̂GPA/s.e.(β̂GPA) = 2.00. In this case, this means that the value of the standard error for GPA should be equal to 8.03/2.00 = 4.015.

c) SCORE-hat = 55.00 + 8.03*3 = 79.09, since TENNESSEE = 0 for a student born in Sydney.

d) By construction, the regression line passes through the sample averages. So, rounding to one decimal place,
76.2 = 55.0 + 8.0*(average GPA) − 3.7*0.2
76.2 = 54.3 + 8.0*(average GPA)
8.0*(average GPA) = 21.9
average GPA = 2.74

Short Answer #4 (10 points)

In this course, we have repeatedly considered the linear regression model with one independent variable:

Yi = β0 + β1Xi + εi

We have also seen that OLS defines the set of estimators that minimize the sum of squared residuals:

β̂1 = Σi(Xi − X̄)(Yi − Ȳ) / Σi(Xi − X̄)²    and    β̂0 = Ȳ − β̂1X̄

a) Suppose that β1-hat = -2. What must be the sign of the sample covariance between X and Y? Explain your reasoning.
b) Now, derive an expression for β1-hat as a function of the following sample statistics: the correlation between X and Y (rXY); the standard deviation of Y (sY); and the standard deviation of X (sX).
c) Given your answer in b), suppose that β1-hat = -2 and sY = 3. Is it possible that sX = 2? Explain.
d) Given your answer in b), suppose that β1-hat = -2 and sY = 3. Is it possible that sX = 1? Explain.
e) The estimated variance of β1-hat is generally given as

Var-hat(β̂1) = s² / Σi(Xi − X̄)²

Explain why the estimated variance of β1-hat can also be written as

Var-hat(β̂1) = s² / [(n − 1)*sX²]

a) The first step is recognizing that

β̂1 = Cov(X, Y)/Var(X) = sXY/sX²

So if β1-hat = -2, then the sample covariance must be negative, since variances are positive for our purposes.

b) Using the expression above,

β̂1 = Cov(X, Y)/Var(X) = sXY/sX²

We need to make use of the definition of the sample correlation given on the last page of the exam, namely rXY = sXY/(sX sY). Rearranging, sXY = rXY*sX*sY, so

β̂1 = sXY/sX² = rXY*sX*sY/sX² = rXY*(sY/sX)
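As a quick numerical check of the identity derived in part b), which parts c) and d) below rely on, here is an illustrative sketch with made-up data; the true slope of −2, the noise level, and the sample size are arbitrary choices, not anything from the exam.

```python
import numpy as np

# Illustrative check (hypothetical data) that the OLS slope equals r_XY * s_Y / s_X,
# the identity derived in part b) and used in parts c) and d).
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 3.0 - 2.0 * x + rng.normal(size=200)          # true slope of -2 plus noise

beta1_ols = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

r_xy = np.corrcoef(x, y)[0, 1]
s_x, s_y = x.std(ddof=1), y.std(ddof=1)
beta1_from_corr = r_xy * s_y / s_x

print(beta1_ols, beta1_from_corr)                 # the two numbers coincide
```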
c) If β1-hat = -2 and sY = 3, then

β̂1 = rXY*(sY/sX), so −2 = rXY*(3/sX), which gives rXY = −2sX/3.

In this case, the standard deviation of X could not be 2, since this would imply that the correlation was −4/3. This value for the correlation is less than −1, which is impossible by construction/definition.

d) If β1-hat = -2 and sY = 3, then, as above, rXY = −2sX/3. In this case, the standard deviation of X could be 1, since this would imply that the correlation was −2/3. This value for the correlation falls in the feasible range of (−1, +1).

e) From the last page of the final exam,

sX² = [1/(n − 1)] Σi(Xi − X̄)²

We are also given the information that

Var-hat(β̂1) = s² / Σi(Xi − X̄)²

Since Σi(Xi − X̄)² = (n − 1)*sX², substituting into the expression above gives

Var-hat(β̂1) = s² / [(n − 1)*sX²]

Short Answer #5 (10 points)

Suppose that for all fourth year seminar courses in Economics the length of term papers is uniformly distributed from 10 to 14 pages (also assume that page length is discrete). Suppose we also survey a random sample of 55 term papers across classes, as we are interested in the average length of the research papers.

a) In words, define what Xi will be in this case.
b) Calculate the value of µX and σX in this case.
c) How should we characterize the distribution of X-bar in this case?
d) Describe how you would find the probability that an individual paper is longer than 12 pages. Use a diagram to support your answer.
e) Describe how you would find the probability that the average length of the 55 papers is longer than 12 pages. Use a diagram to support your answer.

a) Xi will be the number of pages in one individual paper.

b) Since it is a uniform distribution across the outcomes of 10, 11, 12, 13, and 14, we know the associated probability for any outcome is 0.20. So,

µX = (1/5)*10 + (1/5)*11 + (1/5)*12 + (1/5)*13 + (1/5)*14 = 60/5 = 12

σX² = Σi (Xi − µX)² pi = (1/5)[(10 − 12)² + (11 − 12)² + (12 − 12)² + (13 − 12)² + (14 − 12)²]
    = (1/5)[4 + 1 + 0 + 1 + 4] = (1/5)(10) = 2

σX = √2

c) We know that, regardless of the underlying distribution, X-bar should be approximately normally distributed with a mean equal to µX and a sampling variance equal to σX²/n. That is,

X-bar ~ N(12, 2/55)

d) This would amount to consulting the probability distribution of individual paper lengths, which we know is uniform and should look something like the figure below. [Figure: uniform distribution over 10 to 14 pages, each outcome with probability 0.20.] It would then be a matter of adding up the probability to the right of 12 (that is, 13 and 14 pages). In this case, we can see that this would be equal to 40%.

e) This would amount to consulting the probability density function for the distribution of the average of paper lengths, which we know is normal and should look something like the figure below. [Figure: normal density centred at 12.] It would then be a matter of adding up the area of the pdf to the right of 12. In this case, we can see that this would be equal to 50%.
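The calculations in Short Answer #5 can also be verified by simulation. The sketch below is illustrative only and not part of the original key; the seed and number of replications are arbitrary. It draws many samples of 55 papers from the discrete uniform distribution on 10 to 14 pages and checks the mean of 12, the variance of 2/55 for the sample mean, and the two probabilities from parts d) and e).

```python
import numpy as np

# Illustrative simulation for Short Answer #5: pages are uniform on {10, ..., 14},
# so mu = 12 and sigma^2 = 2, and with n = 55 the sample mean is approximately
# N(12, 2/55) by the central limit theorem.
rng = np.random.default_rng(2)
pages = np.arange(10, 15)
reps, n = 100_000, 55

draws = rng.choice(pages, size=(reps, n))          # reps samples of 55 papers each
xbar = draws.mean(axis=1)

print("P(individual paper > 12 pages):", (draws[:, 0] > 12).mean())   # about 0.40
print("P(average of 55 papers > 12):  ", (xbar > 12).mean())          # about 0.50
print("mean, variance of X-bar:", xbar.mean(), xbar.var())            # about 12 and 2/55 = 0.036
```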
Short Answer #6 (10 points)

Consider the following set of regression results generated from a gravity model of international trade for the period from 1950 to 2000:

REGRESSION A
Linear regression
Number of obs = 6628
F(6, 6622) = .
Prob > F = 0.0000
R-squared = 0.9970
Root MSE = 2.2473

logtrade      Coef.       Robust Std. Err.   t        P>|t|    [95% Conf. Interval]
loggdpprod    2.018092    .0089598           225.24   0.000    2.000528     2.035656
ldist         -1.27416    .0277619           -45.90   0.000    -1.328583    -1.219738
fixed         .161583     .0986606           1.64     0.102    -.0318237    .3549896
language      .4565153    .0845662           5.40     0.000    .2907382     .6222923
empire        1.002045    .1541489           6.50     0.000    .6998638     1.304227
border        1.514522    .0884167           17.13    0.000    1.341197     1.687848

REGRESSION B
Linear regression
Number of obs = 6628
F(6, 6621) = 3328.98
Prob > F = 0.0000
R-squared = 0.7385
Root MSE = 1.8071

logtrade      Coef.       Robust Std. Err.   t        P>|t|    [95% Conf. Interval]
loggdpprod    1.407351    .0134916           104.31   0.000    1.380903     1.433799
ldist         -1.694913   .0232049           -73.04   0.000    -1.740402    -1.649424
fixed         1.205402    .0933823           12.91    0.000    1.022342     1.388461
language      .6835998    .0714326           9.57     0.000    .5435689     .8236306
empire        .6023074    .1348587           4.47     0.000    .3379409     .8666739
border        .6093824    .0795588           7.66     0.000    .4534216     .7653432
_cons         18.84188    .3180509           59.24    0.000    18.21839     19.46536

The dependent and independent variables are the same as those from Homework #2. Namely,

logtrade = the natural log of the product of trade12 and trade21
loggdpprod = the natural log of the product of gdp1 and gdp2
ldist = the natural log of the distance separating country 1 and country 2
fixed = a dummy variable equal to 1 if country 1 and country 2 have a fixed nominal exchange rate and 0 otherwise
language = a dummy variable equal to 1 if country 1 and country 2 share the same language and 0 otherwise (e.g., if Canada is country 1 and the US is country 2, then language = 1; if Canada is country 1 and India is country 2, then language = 1)
empire = a dummy variable equal to 1 if country 1 and country 2 were in the same empire either now or in the past and 0 otherwise (e.g., if Canada is country 1 and the US is country 2, then empire = 0; if Canada is country 1 and India is country 2, then empire = 1)
border = a dummy variable equal to 1 if country 1 and country 2 share a border and 0 otherwise (e.g., if Canada is country 1 and the US is country 2, then border = 1; if Canada is country 1 and India is country 2, then border = 0)

a) From Regression A, interpret the coefficient on ldist.
b) From Regression A, interpret the confidence interval for ldist. Your answer should include reference to both population parameters and statistical significance.
c) From Regression B, perform a test of joint significance for the model, using a 1% level of significance, and explain the results. Do the results change at the 5% level of significance?
d) From Regression A, interpret the value of the R-squared. Are the R-squareds from Regression A and Regression B comparable?
e) Of the two candidate regressions, which should be your preferred specification and why?

a) This is simply the elasticity of bilateral trade with respect to distance. It says that, holding all else constant, for every 1% increase in the distance separating two countries, the level of bilateral trade between them falls by 1.27416%.

b) There is a 95% probability that confidence intervals constructed in this fashion over repeated samples will include the true value of the population parameter βDISTANCE. Also, the fact that the CI does not extend into positive values means that we reject, at the 5% level, the null hypothesis that the true coefficient is equal to zero.

c) This test is implicitly being performed in the calculation of the F-statistic and, in particular, its associated p-value. That Prob > F is 0.0000 suggests we can reject the null hypothesis of joint insignificance at the 1% level. As to the 5% level, the logic of hypothesis testing implies that any null which is rejected at the 1% level will be rejected at the 5% level as well (a numerical check of the relevant critical values appears after these answers).

d) This is the proportion of the variation in bilateral trade explained by variation in our independent variables. Thus, we ostensibly capture nearly 100% of the variation in bilateral trade in this specification. However, this specification is suspect as it contains no constant term. Because of this, we also know that the total sum of squares used in computing R-squared differs between Regression A and Regression B. Therefore, the R-squareds are not comparable even though the two regressions have the same dependent variable.

e) Regression B, as it contains a constant term, even though it registers a lower R-squared. We have argued that all specifications should contain a constant. This is primarily because we know that by excluding a constant term we are potentially biasing the estimates of the coefficients attached to our independent variables. This result is clearly seen in how the values of these estimates diverge across the two specifications.
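As a numerical check of the rejection logic in part c), the following sketch, which is illustrative only and assumes scipy is available, computes the 1% and 5% critical values of the F distribution with 6 and 6621 degrees of freedom and compares them with the F statistic reported for Regression B.

```python
from scipy import stats

# Illustrative check of part c): with 6 numerator and 6621 denominator degrees of
# freedom, the 1% critical value exceeds the 5% critical value, so any F statistic
# large enough to reject at the 1% level also rejects at the 5% level.
f_stat = 3328.98                                   # reported F for Regression B
crit_1pct = stats.f.ppf(0.99, dfn=6, dfd=6621)     # roughly 2.8
crit_5pct = stats.f.ppf(0.95, dfn=6, dfd=6621)     # roughly 2.1

print(crit_1pct, crit_5pct, f_stat > crit_1pct, f_stat > crit_5pct)
```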
Useful Formulas:

µX = E(X) = Σi xi pi
σX² = Var(X) = E[(X − µX)²] = Σi (xi − µX)² pi
Pr(X = x) = Σi Pr(X = x, Y = yi)
Pr(Y = y | X = x) = Pr(X = x, Y = y) / Pr(X = x)
E[Y | X = x] = Σi yi Pr(Y = yi | X = x)
Var(Y | X = x) = Σi (yi − E[Y | X = x])² Pr(Y = yi | X = x)
E[Y] = Σi E[Y | X = xi] Pr(X = xi)
E[a + bX + cY] = a + bE(X) + cE(Y)
Var(a + bY) = b²Var(Y)
Var(aX + bY) = a²Var(X) + b²Var(Y) + 2abCov(X, Y)
σXY = Cov(X, Y) = Σj Σi (xj − µX)(yi − µY) Pr(X = xj, Y = yi)
Corr(X, Y) = ρXY = Cov(X, Y) / √[Var(X)·Var(Y)]
Cov(a + bX + cV, Y) = bCov(X, Y) + cCov(V, Y)
E(Y²) = Var(Y) + [E(Y)]²
E(XY) = Cov(X, Y) + E(X)E(Y)
X̄ = (1/n) Σi xi
s² = [1/(n − 1)] Σi (xi − x̄)²
sXY = [1/(n − 1)] Σi (xi − x̄)(yi − ȳ)
rXY = sXY / (sX sY)
Z = (X̄ − µ) / (s/√n)
X̄ ~ N(µ, σ²/n)

For the linear regression model Yi = β0 + β1Xi + εi:
β̂1 = Σi (Xi − X̄)(Yi − Ȳ) / Σi (Xi − X̄)²    and    β̂0 = Ȳ − β̂1X̄
Ŷi = β̂0 + β̂1X1i + β̂2X2i + … + β̂kXki
TSS = Σi (Yi − Ȳ)²,  RSS = Σi ei²,  ESS = TSS − RSS
R² = ESS/TSS = 1 − RSS/TSS
Adjusted R² = 1 − [Σi ei²/(n − k − 1)] / [Σi (Yi − Ȳ)²/(n − 1)]
s² = Σi ei²/(n − k − 1),  where E(s²) = σ²
Var-hat(β̂1) = s² / Σi (Xi − X̄)²
Z = (β̂j − βH0) / √Var(β̂j) ~ N(0, 1)
t = (β̂1 − βH0) / s.e.(β̂1) ~ t(n − k − 1)
Pr[β̂j − t*α/2 · s.e.(β̂j) ≤ βj ≤ β̂j + t*α/2 · s.e.(β̂j)] = 1 − α
F = [ESS/k] / [RSS/(n − k − 1)] = [ESS/RSS] · [(n − k − 1)/k]
d = Σt=2..T (et − et−1)² / Σt=1..T et² ≈ 2(1 − ρ)
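To show several of the formulas above in action, here is a brief illustrative sketch with hypothetical data; the true intercept and slope and the sample size are arbitrary. It computes the OLS coefficients, R-squared, the estimated standard error of the slope, and the Durbin-Watson d statistic directly from the formula sheet.

```python
import numpy as np

# Illustrative application (hypothetical data) of several formulas from the sheet above:
# the OLS slope and intercept, R-squared, the estimated variance of the slope, and the
# Durbin-Watson d statistic computed from the residuals.
rng = np.random.default_rng(3)
n = 100
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)

beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()

e = y - (beta0 + beta1 * x)                        # residuals
tss = np.sum((y - y.mean()) ** 2)
rss = np.sum(e ** 2)
r_squared = 1 - rss / tss

k = 1
s2 = rss / (n - k - 1)                             # s^2, with E(s^2) = sigma^2
var_beta1 = s2 / np.sum((x - x.mean()) ** 2)

d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)       # Durbin-Watson d, about 2 with no serial correlation

print(beta0, beta1, r_squared, np.sqrt(var_beta1), d)
```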