Given name:____________________ Student #:______________________
Family name:___________________ Section #:______________________

BUEC 333 FINAL

Multiple Choice (2 points each)

1.) Suppose that in the simple linear regression model Yi = β0 + β1Xi + εi on 100 observations, you calculate that R2 = 0.5, the sample covariance of X and Y is 10, and the sample variance of X is 15. Then the least squares estimator of β1 is:
a) not calculable using the information given
b) 1/3
c) 1 / 3
d) 2/3
e) none of the above

2.) The Durbin-Watson test is only valid:
a) with models that include an intercept
b) with models that include a lagged dependent variable
c) with models displaying multiple orders of autocorrelation
d) all of the above
e) none of the above

3.) Suppose you have a random sample of 10 observations from a normal distribution with mean = 10 and variance = 2. The sample mean (x-bar) is 8 and the sample variance is 3. The sampling distribution of x-bar has
a.) mean 8 and variance 3
b.) mean 8 and variance 0.3
c.) mean 10 and variance 0.2
d.) mean 10 and variance 2
e.) none of the above

4.) From a gravity model of trade, you estimate that $\Pr[-0.9828 \le \beta_{distance} \le -0.7982] = 95\%$. This allows you to state that:
a.) there is a 95% chance that all potential estimates of the coefficient on distance are in this range
b.) you can reject the null hypothesis that the true coefficient on distance is equal to zero at the 5% level of significance
c.) there is a 5% chance that some of the potential estimates of the coefficient on distance fall outside of this range
d.) all of the above
e.) none of the above

5.) Suppose you compute a sample statistic q to estimate a population quantity Q. Which of the following is/are false?
[1] the variance of Q is zero
[2] if q is an unbiased estimator of Q, then q = Q
[3] if q is an unbiased estimator of Q, then q is the mean of the sampling distribution of Q
[4] a 95% confidence interval for q contains Q with 95% probability
a.) 2 only
b.) 3 only
c.) 
2 and 3
d.) 2, 3, and 4
e.) 1, 2, 3, and 4

6.) In order for our independent variables to be labelled “exogenous”, which of the following must be true:
a.) E(εi) = 0
b.) Cov(Xi,εi) = 0
c.) Cov(εi,εj) = 0
d.) Var(εi) = σ2
e.) none of the above

7.) If (correlated) omitted independent variables are serially correlated, then:
a) least squares coefficient estimates are biased
b) GLS coefficient estimates are biased
c) least squares standard errors are wrong
d) ordinary least squares is not BLUE
e) all of the above

8.) We saw the claim that the value of X1 is an unbiased estimator of the population mean because E(X1) = μ. Now consider the estimator (X1 + X2)*2. Is this another unbiased estimator of the population mean?
a.) answer depends on the underlying distribution of X
b.) this is a biased estimator of the population mean
c.) this is an unbiased estimator of the population mean
d.) there is insufficient information to answer this question
e.) none of the above

9.) To be useful for hypothesis testing, a test statistic must:
a.) be computable using sample data
b.) have a known sampling distribution when the null hypothesis is true
c.) have a known sampling distribution when the null hypothesis is false
d.) a and b only
e.) none of the above

10.) Adding an irrelevant explanatory variable that is uncorrelated with the other independent variables causes:
a.) bias and no change in variance
b.) bias and an increase in variance
c.) no bias and no change in variance
d.) no bias and an increase in variance
e.) none of the above

11.) A newspaper reports a poll estimating the proportion u of the adult population in favour of legalizing marijuana as 65%, but qualifies this result by saying that “this result is accurate within plus or minus 3 percentage points (19 times out of twenty).” What does this mean?
a.) the probability is 95% that u lies between 62% and 68%
b.) the probability is 95% that u is equal to 65%
c.) 
95% of estimates calculated from samples of this size will lie between 62% and 68%
d.) not enough information
e.) none of the above

12.) Omitting a relevant explanatory variable that is correlated with the other independent variables causes:
a.) no bias and no change in variance
b.) no bias and an increase in variance
c.) no bias and a decrease in variance
d.) bias
e.) none of the above

13.) The OLS estimator of the variance of the slope coefficient in the regression model with one independent variable:
a.) will be smaller when there is less variation in ei
b.) will be smaller when there are fewer observations
c.) will be smaller when there is less variation in X
d.) will be smaller when there are more independent variables
e.) none of the above

14.) The central limit theorem tells us that the sampling distribution of the sample mean:
a.) is always normal
b.) is always normal in large samples
c.) approaches normality as the sample size increases
d.) is normal in Monte Carlo simulations
e.) none of the above

15.) Suppose you compute a sample statistic q to estimate a population quantity Q. Which of the following is/are true?
[1] the variance of Q is zero
[2] if q is an unbiased estimator of Q, then q = Q
[3] if q is an unbiased estimator of Q, then q is the mean of the sampling distribution of Q
[4] a 95% confidence interval for q contains Q with 95% probability
a.) 1 only
b.) 2 only
c.) 2 and 3
d.) 2, 3, and 4
e.) 1, 2, 3, and 4

16.) If the covariance between two random variables X and Y is zero, then
a.) X and Y are independent
b.) Knowing the value of X provides no information about the value of Y
c.) E(X) = E(Y) = 0
d.) a and b are true
e.) none of the above

17.) Given the equation for the F statistic, we can say that it is
a.) decreasing in R2, decreasing in n, and decreasing in k
b.) increasing in R2, increasing in n, and increasing in k
c.) decreasing in R2, increasing in n, and decreasing in k
d.) increasing in R2, increasing in n, and decreasing in k
e.) 
none of the above

18.) In the Capital Asset Pricing Model (CAPM),
a.) β measures the sensitivity of the expected return of a portfolio to systematic risk
b.) β measures the sensitivity of the expected return of a portfolio to specific risk
c.) β is greater than one
d.) α is less than zero
e.) R2 is meaningless

19.) If a random variable X has a normal distribution with mean μ and variance σ2 then:
a.) X takes positive values only
b.) $(X - \mu)/\sigma^2$ has a standard normal distribution
c.) $(X - \mu)^2/\sigma^2$ has a chi-squared distribution with n degrees of freedom
d.) $(X - \mu)/(s/\sqrt{n})$ has a t distribution with n-1 degrees of freedom
e.) none of the above

20.) Suppose the assumptions of the CLRM apply and you have used OLS to estimate a slope coefficient as 2.43. If the true value of this slope is 3.05, then the OLS estimator
a.) has bias of 0.62
b.) has bias of -0.62
c.) is unbiased
d.) not enough information
e.) none of the above

Short Answer #1 (10 points)

According to the Canada Revenue Agency, the average length of time for an individual to complete a CRA Income Tax Return is 10.53 hours with a standard deviation of 2.00 hours. The distribution of this variable, however, is unknown. Suppose we randomly sample 360 taxpayers.
a.) In words, explain what Xi equals.
b.) In words, explain what X-bar equals.
c.) Now, tell me how X-bar is distributed; that is, tell me the type of distribution and its parameters.
d.) Would you be surprised if the 360 taxpayers finished their Income Tax Return in an average of more than 12 hours? Explain why or why not in complete sentences.
e.) Would you be surprised if one taxpayer out of the 360 taxpayers finished his Income Tax Return in more than 12 hours? Explain why or why not in complete sentences.

a.) Xi is the time taken by the i-th sampled individual to complete an Income Tax Return, that is, one of the observations underlying the sample.
b.) X-bar is the sample average calculated from the 360 individual Xi’s.
c.) 
Because our sample size exceeds 30, we can invoke the Central Limit Theorem. Therefore, we can state with reasonable assurance that X-bar will be approximately normally distributed with a mean equal to mu (that is, the population mean or 10.53 hours) and a variance equal to sigma-squared divided by n (that is, the variance of the underlying individual observations in the population divided by 360, or 4/360 = 1/90 ≈ 0.0111 hours squared). The standard deviation of X-bar is therefore 2.00 divided by the square root of 360, or about 0.105 hours. Thus, X-bar ~ N(10.53, 0.0111).
d.) We would be very surprised by this result as X-bar is very tightly distributed around the population mean in this particular case. An average of more than 12 hours would be about 14 standard deviations above the population mean (= 1.47/0.105).
e.) We would not be very surprised by this result as the Xi’s are distributed fairly widely around the population mean in this particular case. A value of more than 12 hours would only be about 0.74 standard deviations above the population mean (= 1.47/2.00).

Page intentionally left blank. Use this space for rough work or the continuation of an answer.

Short Answer #2 (10 points)

Suppose we have a linear regression model with one independent variable and no intercept:
Yi = βXi + εi
Suppose also that εi satisfies the six classical assumptions.
a.) Verbally, explain the steps necessary to derive the least squares estimator.
b.) Formally, derive a mathematical expression for this estimator given your answer in part a).

For part a)
1.) First, we have to define our residual as the difference between that which is observed and that which is predicted by the regression. In this way, the residual is best thought of as a prediction error, that is, something we would like to make as small as possible.
2.) Next, we need to define a minimization problem. Because our residuals will likely be both positive and negative, simply considering their sum is unsatisfactory as these will tend to cancel one another out. 
Additionally, minimizing the sum of residuals does not generally yield a unique answer. A better way forward is to minimize the sum of the squared “prediction errors”, which will yield a unique answer and which will penalize us for making big errors.
3.) We must take the derivative of the sum of squared residuals with respect to beta-hat and set it equal to zero. This first order condition establishes the value of beta-hat for which the sum of squared residuals “bottoms out” and is, thus, minimized.
4.) Finally, we must solve for the value of beta-hat which is consistent with this first order condition, thus yielding our least squares estimator.

For part b)

$$e_i = Y_i - \hat{\beta} X_i$$

$$\min_{\hat{\beta}} \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( Y_i - \hat{\beta} X_i \right)^2 = \sum_{i=1}^{n} Y_i^2 - 2 \sum_{i=1}^{n} Y_i \hat{\beta} X_i + \sum_{i=1}^{n} \left( \hat{\beta} X_i \right)^2$$

This allows us to derive the following first order condition:

$$\frac{\partial \sum_{i=1}^{n} e_i^2}{\partial \hat{\beta}} = -2 \sum_{i=1}^{n} Y_i X_i + 2 \hat{\beta} \sum_{i=1}^{n} X_i^2 = 0$$

$$-\sum_{i=1}^{n} Y_i X_i + \hat{\beta} \sum_{i=1}^{n} X_i^2 = 0$$

$$\hat{\beta} \sum_{i=1}^{n} X_i^2 = \sum_{i=1}^{n} Y_i X_i$$

$$\hat{\beta} = \frac{\sum_{i=1}^{n} Y_i X_i}{\sum_{i=1}^{n} X_i^2}$$

Short Answer #3 (10 points)

There are at least two different possible approaches to the problem of building a model of the costs of production of electric power. Model I hypothesizes that per-unit costs (C) as a function of the number of kilowatt-hours produced (Q) continually and smoothly fall as production is increased, but they fall at a decreasing rate. Model II hypothesizes that per-unit costs (C) decrease fairly steadily as production (Q) increases across plant type, but costs start at a higher level for hydroelectric plants than for other kinds of facilities.
a.) What functional form would you recommend for estimating Model I? Write out a specific equation.
b.) What functional form would you recommend for estimating Model II? Write out a specific equation.
c.) Would R2 be a reasonable way to compare the overall fits of the two equations? Why or why not?

a.) 
A number of forms are possible, but a log-log form would perhaps be the most appropriate:

$$\ln(C_t) = \beta_0 + \beta_1 \ln(Q_t) + \varepsilon_t$$

Whatever functional form is chosen, it has to satisfy the conditions that the first and second derivatives of per-unit cost with respect to Q are, respectively, negative and positive. In the case of the equation above,

$$\frac{\partial C}{\partial Q} = \frac{\partial C}{\partial \ln C} \cdot \frac{\partial \ln C}{\partial \ln Q} \cdot \frac{\partial \ln Q}{\partial Q} = C \cdot \beta_1 \cdot \frac{1}{Q} = \frac{\beta_1 C}{Q} < 0 \quad \Rightarrow \quad \beta_1 < 0$$

$$\frac{\partial^2 C}{\partial Q^2} = \frac{\beta_1 (\beta_1 - 1) C}{Q^2} > 0 \quad \text{whenever } \beta_1 < 0$$

b.) A number of forms are possible, but a linear form with a dummy variable (Dt) capturing a different intercept term for hydroelectric plants would perhaps be the most appropriate:

$$C_t = \beta_0 + \beta_1 Q_t + \beta_2 D_t + \varepsilon_t$$

Whatever functional form is chosen, it has to satisfy the conditions that the first derivative of per-unit cost with respect to Q is constant and negative (β1 < 0) and that β2 > 0.
c.) Answers may vary depending on the functional form indicated. In the example above, R2 is not appropriate for comparing the overall fits of the two equations because the functional form of the dependent variable changes (ln(C) versus C) and, thus, so does the value of TSS.

Short Answer #4 (10 points)

Consider the regression results below where the dependent variable is the amount of time in minutes that individuals spend traveling from home to work. The sample consists of workers across Canada. It also contains information on individuals’ earnings, years of schooling, age, sex, and place of birth.

Dependent variable: Canadian commuting times

Independent variable                   Coefficient   p-value
Total earnings in 2012 (in $1000s)          0.0225     0.000
Years of schooling                         -0.0344     0.162
Age                                         0.0183     0.007
Female                                     -3.1650     0.000
Africa                                      4.0390     0.000
Asia                                        1.1200     0.000
Australasia                                 1.2630     0.066
Europe                                     -0.5527     0.635
Latin America                               2.039      0.000
Intercept                                  27.64       0.000

R2               0.0081
Adjusted R2      0.0080
F-statistic      104.2
p-value of F     0.000
Observations     115089

a.) 
How many of the independent variables are statistically significant? Which ones are not?
b.) Does the whole set of independent variables have a reliable collective effect on the dependent variable? Explain your answer.
c.) Consider the values of the R2 and adjusted R2 of the regression. Tell me what these mean individually and collectively.
d.) Interpret the coefficient associated with the variable on total earnings in 2012. Do the sign and magnitude of this coefficient seem reasonable? Why or why not?
e.) The sample for this regression includes individuals in many different cities of Canada. Does this seem like a good idea? Why or why not? What would you suggest as an alternative?

a.) (ANSWERS MAY VARY DEPENDING ON SIGNIFICANCE LEVEL SPECIFIED; THEREFORE, FULL MARKS ONLY TO THOSE WHO TELL US WHAT VALUE OF ALPHA THEY ARE USING) The regression contains nine independent variables, of which six have p-values less than or equal to .05 and are, thus, statistically significant. “Years of schooling”, “Australasia” and “Europe” are not statistically significant.
b.) Yes. The p-value associated with the F-statistic is less than .05, indicating that the ensemble of nine explanatory variables has a reliable collective effect on the dependent variable.
c.) Individually, the students need to define the two in terms of the amount of variation in commuting times explained by the independent variables (adjusted or not adjusted for the degrees of freedom). Collectively, the results imply that the number of observations far exceeds the number of explanatory variables. Consequently, adjusting the R2 for the number of explanatory variables has very little effect.
d.) The slope associated with earnings in $1,000s is .0225. Its p-value is less than .05, so this slope is statistically significant. 
It indicates that each increase of $1,000 in annual earnings is associated with an increase in the commuting time to work of a little more than two one-hundredths of a minute, or slightly more than a second. In other words, an increase of a little less than $50,000 in annual earnings is associated with an increase of one minute in commuting time. This may be plausible because people with higher earnings typically live in more expensive housing, lots of expensive housing is in the suburbs, and suburban residents typically have longer commutes than city residents.
e.) (ANSWERS MAY VARY; FULL CREDIT FOR ANY WELL-REASONED ARGUMENT) This seems like a potentially bad idea. The transportation systems in different cities can be very different. Two people with similar characteristics living in different cities might therefore have very different commuting times. Consequently, it is probably misleading to have a regression for commuting times that does not, somehow, account for the differences in commuting times across metropolitan areas.

Short Answer #5 (10 points)

Consider the regression results below where the dependent variable is the natural log of annual earnings for single, childless men with high-school education or less. The sample consists of workers in Vancouver over the years from 2003 to 2012. It also contains information on individuals’ age (as a set of dummy variables capturing a range of ages), their status as a “visible minority” (that is, whether or not they are Caucasian), their status as “Aboriginal” (that is, whether or not they are of First Nations origin), and whether or not they possess a “High School Degree”.
Dependent variable: natural log of annual earnings

                        (1) OLS                 (2) OLS with            (3) OLS with
                                                Newey-West SEs          Newey-West SEs
Independent variable    Coef.   S.E.    t       Coef.   S.E.    t       Coef.   S.E.    t
Age from 30-34           0.29   0.16    1.80     0.29   0.13    2.20     0.26   0.14    1.88
Age from 35-39           0.19   0.18    1.10     0.19   0.17    1.13     0.21   0.17    1.23
Age from 40-44           0.25   0.16    1.51     0.25   0.15    1.66     0.24   0.15    1.65
Age from 45-49           0.12   0.17    0.71     0.12   0.17    0.70     0.13   0.17    0.79
Age from 50-54           0.08   0.17    0.46     0.08   0.18    0.43     0.06   0.18    0.32
Age from 55-59           0.37   0.19    2.02     0.37   0.20    1.83     0.38   0.21    1.84
Age from 60-64           0.35   0.30    1.15     0.35   0.26    1.34     0.33   0.27    1.24
Visible minority         0.07   0.19    0.36     0.07   0.19    0.73     0.02   0.09    0.29
Aboriginal              -0.54   0.20   -2.74    -0.54   0.27   -2.01    -0.49   0.26   -1.88
High school degree                                                       0.33   3.09    0.00
Intercept               10.18   0.12   84.44    10.18   0.11   89.22     9.96   0.14   73.62

R2                       0.03                    0.03                    0.05
F-statistic             89.21                   89.21                  108.98
DW statistic             0.98                    1.78                    1.78
p-value of F             0.000                   0.000                   0.000
Observations             4160                    4160                    4160

For a.) through e.), consider only the output in the first and second columns and assume that with 4160 observations, the t distribution is functionally the same as the standard normal distribution.
a.) Why are the coefficients the same, but the standard errors different in the first and second column?
b.) Which set of estimates do you think are more reliable? Explain.
c.) What is the test statistic for the hypothesis that Aboriginal and Caucasian men have the same earnings against the alternative that they do not? Can you reject this hypothesis? Hint: use the “rule of thumb” that 2.00 is a sufficiently large critical value.
d.) Do you reject the hypothesis that these two groups have the same log-earnings against the alternative that Aboriginal men have lower log-earnings?
e.) 
Are the R-squared values too low? Should we ignore these results?
f.) The third column reports the results for a regression just like that reported in the second column, but it adds a dummy variable equal to 1 if an individual has a high school diploma. Why is the coefficient on “Aboriginal” now smaller in absolute value than in the second column?

a.) The second column simply corrects for pure serial correlation. Inherently, this is a problem with the standard errors and not the values of the coefficients themselves as OLS remains unbiased.
b.) The results in the first column indicate that serial correlation is a potential problem as the Durbin-Watson test statistic is 0.98. This is far below the value of about 2.00 expected when there is no serial correlation, which points to positive serial correlation. Therefore, we prefer the results in the second column.
c.) This is simply the t-statistic right off the table as Caucasian men are the omitted category. In the first column, this is -2.74. In the second column, this is -2.01. These are larger in absolute value than the rule of thumb for the 5% critical value of 2.00, so one can reject with either.
d.) The 5% critical value for a one-sided test must be even lower than 2.00 (about 1.645 in absolute value under the standard normal), so we still reject this hypothesis regardless of whether we consider the first or second column.
e.) R-squared measures the explained variation in the regression. A low R-squared suggests that there is a lot of other stuff going on which we are not accounting for. However, this does not mean that our independent variables do not matter. In fact, the p-value of the F statistic strongly suggests otherwise.
f.) “High school degree” must be correlated with both the natural log of earnings and “Aboriginal”. Given that “High school degree”=1 is for high-school completion and “High school degree”=0 is for non-completion, the coefficient on “High school degree” should be positive. 
So, it must be negatively correlated with Aboriginal status for adding it to the regression to reduce the measured disparity. Alternatively put, high school completion is less likely for Aboriginal men, so controlling for it reduces the amount of disparity in earnings we see in comparison with not controlling for it.

Short Answer #6 (10 points)

The first half of the course was dedicated to developing the least squares estimator. The rest of the course was dedicated to considering those instances when problems with the least squares estimator arise. Underlying the discussion, there were the six assumptions of the classical linear model.
a.) Name the six assumptions and explain what each of them means.
b.) Some of these assumptions are necessary for the OLS estimator to be unbiased. Some of these assumptions are necessary for the OLS estimator to be “best”. Explain the distinction between these two concepts.
c.) Indicate which of the six assumptions are necessary for the OLS estimator to be unbiased and which of the six assumptions are necessary for the OLS estimator to be “best”.
d.) In general, would you prefer your estimates to be biased but efficient or unbiased but not efficient? Explain your answer.

a) The regression model is linear in the coefficients, is correctly specified, and has an additive error term.
The error term has zero population mean, or E(εi) = 0.
All independent variables are uncorrelated with the error term, or Cov(Xi,εi) = 0 for each independent variable Xi (we say there is no endogeneity).
Errors are uncorrelated across observations, or Cov(εi,εj) = 0 for two observations i and j (we say there is no serial correlation).
The error term has a constant variance, or Var(εi) = σ2 for every i (we say there is no heteroskedasticity).
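No independent variable is a perfect linear function of any other independent variable (we say there is no perfect collinearity).
b) Unbiasedness relates to the property whereby the expected value of an estimator is equal to the population parameter of interest. “Best” relates to the size of the sampling variance of any such estimator: the lower, the better. An estimator is “best” when it has the smallest sampling variance among the linear unbiased estimators.
c) Of the assumptions listed above, the first three are required for unbiasedness. Four through six are necessary for the OLS estimator to be “best”.
d) It is probably better to have an indication that your estimator is centered on the population parameter “on average” rather than be “wrong” but very precisely estimated. Thus, bias is the greater sin than inefficiency (although I am open to students persuasively arguing the opposite if we think of a small bias versus large variance case).

Useful Formulas:

$$E(X) = \mu_X = \sum_{i=1}^{k} p_i x_i$$

$$Var(X) = E\left[ (X - \mu_X)^2 \right] = \sum_{i=1}^{k} (x_i - \mu_X)^2 p_i$$

$$\Pr(X = x) = \sum_{i=1}^{k} \Pr(X = x, Y = y_i)$$

$$\Pr(Y = y \mid X = x) = \frac{\Pr(X = x, Y = y)}{\Pr(X = x)}$$

$$E(Y) = \sum_{i=1}^{m} E(Y \mid X = x_i) \Pr(X = x_i)$$

$$E(Y \mid X = x) = \sum_{i=1}^{k} y_i \Pr(Y = y_i \mid X = x)$$

$$Var(Y \mid X = x) = \sum_{i=1}^{k} \left[ y_i - E(Y \mid X = x) \right]^2 \Pr(Y = y_i \mid X = x)$$

$$E(a + bX + cY) = a + bE(X) + cE(Y)$$

$$Cov(X, Y) = \sum_{i=1}^{k} \sum_{j=1}^{m} (x_j - \mu_X)(y_i - \mu_Y) \Pr(X = x_j, Y = y_i)$$

$$Corr(X, Y) = \rho_{XY} = \frac{Cov(X, Y)}{\sqrt{Var(X) Var(Y)}}$$

$$Var(a + bY) = b^2 Var(Y)$$

$$Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab\, Cov(X, Y)$$

$$E(Y^2) = Var(Y) + E(Y)^2$$

$$Cov(a + bX + cV, Y) = b\, Cov(X, Y) + c\, Cov(V, Y)$$

$$E(XY) = Cov(X, Y) + E(X)E(Y)$$

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} x_i \qquad s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \qquad s_{XY} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}} \qquad Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \qquad r_{XY} = \frac{s_{XY}}{s_X s_Y}$$

For the linear regression model $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$,

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$

$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \cdots + \hat{\beta}_k X_{ki}$$

$$R^2 = \frac{ESS}{TSS} = \frac{TSS - RSS}{TSS} = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum_i e_i^2}{\sum_i (Y_i - \bar{Y})^2}$$

$$\bar{R}^2 = 1 - \frac{\sum_i e_i^2 / (n - k - 1)}{\sum_i (Y_i - \bar{Y})^2 / (n - 1)}$$

$$\widehat{Var}[\hat{\beta}_1] = \frac{\sum_i e_i^2 / (n - k - 1)}{\sum_i (X_i - \bar{X})^2}$$

$$s^2 = \frac{\sum_i e_i^2}{n - k - 1} \quad \text{where } E[s^2] = \sigma^2$$

$$Z = \frac{\hat{\beta}_j - \beta_H}{\sqrt{Var[\hat{\beta}_j]}} \sim N(0, 1)$$

$$t = \frac{\hat{\beta}_1 - \beta_H}{s.e.(\hat{\beta}_1)} \sim t_{n-k-1}$$

$$\Pr\left[ \hat{\beta}_j - t^*_{\alpha/2}\, s.e.(\hat{\beta}_j) \le \beta_j \le \hat{\beta}_j + t^*_{\alpha/2}\, s.e.(\hat{\beta}_j) \right] = 1 - \alpha$$

$$d = \frac{\sum_{t=2}^{T} (e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2} \approx 2(1 - \rho)$$

$$F = \frac{ESS / k}{RSS / (n - k - 1)} = \frac{ESS}{RSS} \cdot \frac{(n - k - 1)}{k}$$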
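
The slope formulas used in this exam can be checked numerically. The following is a minimal Python sketch, not part of the original exam: the function names are hypothetical and chosen only for illustration. It covers the ratio of the sample covariance to the sample variance (as in Multiple Choice question 1), the no-intercept estimator derived in Short Answer #2, and the with-intercept estimator from the formula sheet.

```python
# Illustrative helpers only; the names below are assumptions, not course material.

def slope_from_moments(s_xy, s_xx):
    """Slope of a simple regression with intercept: sample Cov(X, Y) / sample Var(X)."""
    return s_xy / s_xx


def slope_no_intercept(x, y):
    """Least squares slope for Y_i = beta * X_i + e_i: sum(X_i * Y_i) / sum(X_i ** 2)."""
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    return sum_xy / sum_xx


def ols_with_intercept(x, y):
    """Return (beta0_hat, beta1_hat) for Y_i = beta0 + beta1 * X_i + e_i."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the slope formula, in deviations from the means.
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    den = sum((xi - x_bar) ** 2 for xi in x)
    beta1_hat = num / den
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat


if __name__ == "__main__":
    # Multiple Choice question 1: covariance 10, variance of X 15.
    print(slope_from_moments(10, 15))
    # Made-up data lying exactly on Y = 2X, so the no-intercept slope is 2.
    print(slope_no_intercept([1, 2, 3], [2, 4, 6]))
    # Made-up data lying exactly on Y = 1 + 2X.
    print(ols_with_intercept([1, 2, 3], [3, 5, 7]))
```

The data in the usage section are invented so that the fitted lines are exact; with real data the same functions return the least squares estimates, but the residuals will not be zero.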