ECN 405 Exam I Key

1. Population Regression Model: $y = \beta_0 + \beta_1 x + u$
Systematic Component: $\beta_0 + \beta_1 x$
Stochastic Component: $u$
Parameters: $\beta_0$ (the intercept) and $\beta_1$ (the slope)

2. Experimental data are obtained via a controlled experiment. In such a setting, the researcher/econometrician varies the level of the explanatory variable (x) randomly and observes the resulting value of the dependent variable (y). Since x values are assigned randomly, they should not be correlated with other determinants of y lurking in the error term (u). Thus experimental data should typically satisfy the zero conditional mean assumption. Unfortunately, however, we almost never have experimental data to work with in econometric problems. Instead, econometric problems usually involve observational data. The problem with observational data is that the researcher has no control over the value of x: its value is determined simultaneously with y. As a consequence, x values may be correlated with other factors that determine y but are not observed directly and, consequently, are in the error term. If x is correlated with any element of the error term, then the zero conditional mean assumption is violated and $\beta_1$, the pure effect of x on y, cannot be estimated without bias. Problem Set #1, question 5 and Wooldridge end-of-chapter question 2.11 give good illustrations of the difference between the two types of data.

3. This is a slightly recycled version of Wooldridge end-of-chapter question 2.1 (p. 61). See the Chapter 2 Solutions on the course web page for one possible way to answer the question.

4. First, note that
$$\sum_{i=1}^{n} x_i (y_i - \bar{y}) = \sum_{i=1}^{n} (x_i y_i - x_i \bar{y}) = \sum_{i=1}^{n} x_i y_i - \bar{y} \sum_{i=1}^{n} x_i,$$
by summation rules 3 and 2. Then, multiplying the last term by n/n, we get
$$\sum_{i=1}^{n} x_i y_i - n \bar{y} \, \frac{\sum_{i=1}^{n} x_i}{n} = \sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}.$$
Second, note that
$$\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} (x_i y_i - x_i \bar{y} - \bar{x} y_i + \bar{x}\bar{y}) = \sum_{i=1}^{n} x_i y_i - \bar{y} \sum_{i=1}^{n} x_i - \bar{x} \sum_{i=1}^{n} y_i + n \bar{x}\bar{y},$$
where summation rule 3 was applied to distribute the sum sign, summation rule 2 was applied in order to get terms 2 and 3, and summation rule 1 was applied in order to obtain the last term. Then, multiplying both terms 2 and 3 by n/n, we get
$$\sum_{i=1}^{n} x_i y_i - n \bar{x}\bar{y} - n \bar{x}\bar{y} + n \bar{x}\bar{y} = \sum_{i=1}^{n} x_i y_i - n \bar{x}\bar{y}.$$
Both expressions reduce to $\sum_{i=1}^{n} x_i y_i - n \bar{x}\bar{y}$, so they are equal. QED.

5. $R^2 = .9$; $R^2 = .1$.

6. True. By definition, errors are homoskedastic if $Var(u \mid x) = \sigma^2$. This means that the variation of observed y from predicted y is the same regardless of the value of x. See Figure 2.8 in the text. Here's a two-dimensional, hand-drawn rendering: [hand-drawn figure not reproduced]

7. Definitely false. The variance of $\hat{\beta}_1$ decreases as the sample size, n, increases. From the formula sheet, $Var(\hat{\beta}_1) = \sigma^2 / SST_x$, where $SST_x = \sum_{i=1}^{n} (x_i - \bar{x})^2$. All terms in $SST_x$ are positive by virtue of the fact that they are squared. Therefore, increasing the sample puts additional positive terms in the sum of squared deviations, thereby increasing $SST_x$. Since $SST_x$ is in the denominator of $Var(\hat{\beta}_1)$, an increase in $SST_x$ induced by an increase in sample size necessarily shrinks the variance of $\hat{\beta}_1$. This result should seem quite intuitive: a larger sample size implies you have more information about the population, and more information about the population should, of course, buy you more precise estimates of the population parameters.

8. $\log(y) = \beta_0 + \beta_1 x$. Log-level model: y is measured in log terms; x is said to be measured in "level" terms. We use the log-level model whenever we believe a change in x will induce a constant percentage change in y, rather than a constant absolute change in y (as is the case in a level-level model). A log-level model allows us to capture a non-linear relationship between y and x using a linear regression model. Hence, linear regression allows for much more flexibility in functional form than one might initially presume.
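For the curious, here's a minimal NumPy sketch of the log-level idea. The data and the true model are made up (not from the course dataset); the point is that the fitted slope times 100 reads as an approximate percentage change in y per one-unit change in x.

```python
# Hypothetical data: fit log(y) on x and read 100*b1 as the approximate
# percent change in y per one-unit change in x.
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0, 10, n)
u = rng.normal(0, 0.1, n)
y = np.exp(1.0 + 0.05 * x + u)   # true model: log(y) = 1 + .05x + u

ly = np.log(y)
# OLS slope and intercept from the usual formulas
b1 = np.sum((x - x.mean()) * (ly - ly.mean())) / np.sum((x - x.mean()) ** 2)
b0 = ly.mean() - b1 * x.mean()
print(f"b1 = {b1:.4f}: a 1-unit rise in x ~ a {100 * b1:.1f}% rise in y")
```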
9. a. SLR.1 requires that the model be "linear in the parameters," which is the case in the model presented. SLR.1 does not require that the model be linear in the dependent and independent variables.

b. The slope coefficient of the log-log model is the elasticity of the predicted value of the dependent variable with respect to the regressor. In the problem at hand, this implies that the elasticity of predicted CEO salary with respect to firm sales is .257. So a 1% increase in firm sales is associated with a .257% increase in predicted CEO salary. Based on this elasticity, you could also say that a 10% increase in firm sales is associated with a 2.57% increase in predicted CEO salary.

10. Partially differentiating $\hat{y}$ with respect to $x_1$ yields $\hat{\beta}_1$, which is the change in the predicted value of y due to a 1-unit change in $x_1$, holding $x_2$ constant. It measures the pure effect of $x_1$ on y. Similarly, partial differentiation of $\hat{y}$ with respect to $x_2$ yields $\hat{\beta}_2$, which is interpreted as the change in predicted y due to a 1-unit change in $x_2$, holding $x_1$ constant. We no doubt know from other econ classes that such ceteris paribus effects are of the utmost importance in economic analysis. Multiple regression analysis allows us to conduct ceteris paribus analysis, and this is a key reason why we view regression as such a powerful tool in the economist's tool kit.

11. a. Given k = 5, there are six parameters to be estimated (don't forget $\beta_0$!). Hence the SSR is a function of $\{\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_5\}$. As a consequence, SSR must be partially differentiated with respect to each of these choice variables. By implication, there will be six normal equations (or first-order conditions). They are listed on p. 74 for the curious.

b. SST measures variation in the dependent variable about its own mean. It is a constant; the number of regressors has no effect on SST. An additional regressor will reduce SSR (or possibly leave it the same, though a reduction in SSR is most likely). The reason is that when the sixth regressor is added, it must at the very least be possible to achieve the same SSR that was found with five regressors: the computer has the option of choosing $\hat{\beta}_6 = 0$ and using the same set $\{\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_5\}$ as in the five-regressor case, which would result in exactly the same SSR. So if the sixth regressor has no explanatory power whatsoever, SSR will be the same as it was when only five regressors were included. On the other hand, if the sixth regressor has even a small amount of explanatory power, then $\hat{\beta}_6 \ne 0$, which implies some reduction in the unexplained variation in y measured by SSR. If a sixth regressor is added, then $R^2$ cannot decrease and will most likely increase rather than stay the same. $R^2$ will increase as long as SSR decreases when the sixth regressor is added. This follows because $R^2 = 1 - SSR/SST$: clearly, if SSR decreases and SST is constant, then the right-hand side must increase. Adding regressors will generally drive $R^2$ up; the real question is whether $R^2$ increases meaningfully.
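A quick simulation makes 11b concrete. Everything here is invented (five informative regressors plus one pure-noise column); the takeaway is that appending a regressor can only hold SSR fixed or lower it, so $R^2$ can only hold or rise.

```python
# Simulated data: adding a sixth (pure noise) regressor cannot increase SSR,
# and therefore cannot decrease R^2 = 1 - SSR/SST.
import numpy as np

rng = np.random.default_rng(1)
n = 200
X5 = rng.normal(size=(n, 5))
y = 2.0 + X5 @ np.array([1.0, -0.5, 0.3, 0.0, 0.8]) + rng.normal(size=n)

def ssr(X, y):
    Xc = np.column_stack([np.ones(len(y)), X])   # prepend intercept column
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    return np.sum((y - Xc @ beta) ** 2)

sst = np.sum((y - y.mean()) ** 2)                # SST ignores the regressors
X6 = np.column_stack([X5, rng.normal(size=n)])   # append a noise regressor
ssr5, ssr6 = ssr(X5, y), ssr(X6, y)
print(f"SSR: {ssr5:.2f} -> {ssr6:.2f} (weakly smaller)")
print(f"R^2: {1 - ssr5 / sst:.4f} -> {1 - ssr6 / sst:.4f} (weakly larger)")
```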
Longer Problems

1. a. Start from the OLS estimator
$$\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} = \frac{\sum_i (x_i - \bar{x})\, y_i}{\sum_i (x_i - \bar{x})^2},$$
and note that $SST_x = \sum_i (x_i - \bar{x})^2$ and $y_i = \beta_0 + \beta_1 x_i + u_i$. Substituting for $y_i$ in the numerator and for $\sum_i (x_i - \bar{x})^2$ in the denominator of the term after the second equality yields
$$\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(\beta_0 + \beta_1 x_i + u_i)}{SST_x}.$$
Doing the multiplication implied inside the summation, then invoking summation rule 3 to distribute the summation operator to the terms inside the sum and summation rule 2 to move multiplicative constants outside the relevant summation operators, gives
$$\hat{\beta}_1 = \frac{1}{SST_x}\left\{ \beta_0 \sum_{i=1}^{n} (x_i - \bar{x}) + \beta_1 \sum_{i=1}^{n} x_i (x_i - \bar{x}) + \sum_{i=1}^{n} (x_i - \bar{x}) u_i \right\}.$$
Note that the first sum is 0 since $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$. Furthermore, note that $\beta_1 \sum_i x_i (x_i - \bar{x}) = \beta_1 \sum_i (x_i - \bar{x})^2 = \beta_1 SST_x$. Substituting these results into $\hat{\beta}_1$ and re-arranging yields the desired expression:
$$\hat{\beta}_1 = \frac{\beta_1 SST_x + \sum_{i=1}^{n} (x_i - \bar{x}) u_i}{SST_x} = \beta_1 + \frac{\sum_{i=1}^{n} (x_i - \bar{x}) u_i}{SST_x}.$$

b. To prove that $\hat{\beta}_1$ is unbiased, we must show that $E(\hat{\beta}_1) = \beta_1$. Assume a given sample of data; hence we'll take expectations conditional on the sampled values of the regressor. Now take the expectation of both sides of the most recent expression for $\hat{\beta}_1$:
$$E(\hat{\beta}_1) = E\left[\beta_1 + \frac{\sum_{i=1}^{n} (x_i - \bar{x}) u_i}{SST_x}\right].$$
From the formula sheet, apply rule of expectation #2 to the right-hand-side expression. This yields
$$E(\hat{\beta}_1) = E(\beta_1) + E\left[\frac{\sum_{i=1}^{n} (x_i - \bar{x}) u_i}{SST_x}\right].$$
$\beta_1$ is a parameter and, therefore, is constant, so $E(\beta_1) = \beta_1$ by expectation rule #1. Moreover, given that we're conditioning on the sampled values of x, $SST_x$ is treated as a constant, as is any $(x_i - \bar{x})$ in the numerator of the second term on the right-hand side. Now apply rule of expectation #3 to that second term. This results in
$$E(\hat{\beta}_1) = \beta_1 + \frac{1}{SST_x} \sum_{i=1}^{n} (x_i - \bar{x}) E(u_i).$$
By SLR.4, $E(u \mid x) = E(u) = 0$, therefore
$$E(\hat{\beta}_1) = \beta_1 + \frac{1}{SST_x} \sum_{i=1}^{n} (x_i - \bar{x}) \cdot 0 = \beta_1.$$
Invoking SLR.2, the above result holds for any randomly selected sample, hence $\hat{\beta}_1$ is an unbiased estimator of $\beta_1$.

c. Answers will vary, but here's how I would answer. $\hat{\beta}_1$ is unlikely to equal $\beta_1$ for any particular random sample of size n that we might draw. The unbiasedness property implies, however, that if we repeatedly drew random samples of size n from the population and ran the regression for each such sample, we would find that the average value of $\hat{\beta}_1$ across the many samples of size n would equal the parameter $\beta_1$.
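The repeated-sampling story in 1c is easy to see by simulation. This sketch uses an invented population model ($\beta_0 = 3$, $\beta_1 = .7$, made-up error variance): the average of the slope estimates across many samples lands essentially on top of the true $\beta_1$, even though any individual estimate misses it.

```python
# Monte Carlo check of unbiasedness: average many OLS slope estimates,
# each from a fresh random sample, and compare to the true slope b1.
import numpy as np

rng = np.random.default_rng(2)
b0, b1, n, reps = 3.0, 0.7, 50, 10_000
estimates = np.empty(reps)
for r in range(reps):
    x = rng.uniform(0, 10, n)
    y = b0 + b1 * x + rng.normal(0, 2, n)
    estimates[r] = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# Any single estimate can miss b1; the average across samples should not.
print(f"true b1 = {b1}, mean of {reps} estimates = {estimates.mean():.4f}")
```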
2. a. By SLR.5, $Var(u \mid x) = \sigma^2$. Furthermore, it follows from the general definition of variance that $Var(u \mid x) = E[(u - E(u \mid x))^2 \mid x] = E(u^2 \mid x) - [E(u \mid x)]^2$. By virtue of SLR.4, $[E(u \mid x)]^2 = 0$, therefore $Var(u \mid x) = \sigma^2 = E(u^2 \mid x)$. Since the conditional variance of u given any x is the constant $\sigma^2$, it follows that the unconditional expectation of $u^2$, i.e., $E(u^2)$, is equal to $\sigma^2$. In other words, the error variance $\sigma^2$ is the average squared error in the population. We don't observe the population, we observe the sample. Furthermore, we don't observe the errors in the sample; rather, we observe the residuals. We nevertheless think of the residuals as estimates of the errors. Hence a reasonable approach to estimating the average squared error in the population is to use the average squared residual in the sample.

b. To show that the proposed estimator is biased, show that $E\left[(1/n)\sum_{i=1}^{n} \hat{u}_i^2\right] \ne \sigma^2$:
$$E\left[\frac{1}{n}\sum_{i=1}^{n} \hat{u}_i^2\right] = \frac{1}{n} E\left[\sum_{i=1}^{n} \hat{u}_i^2\right] = \frac{1}{n}(n-2)\sigma^2 = \frac{n-2}{n}\,\sigma^2 \ne \sigma^2.$$
What this shows is that the average of the average squared residual across repeated samples of size n does not equal the parameter it is intended to estimate. Therefore such an estimator is deemed to be biased.

c. $\frac{n-2}{n}\,\sigma^2 < \sigma^2$, therefore the proposed estimator would systematically under-estimate the true error variance in repeated sampling.

d. As n increases, $\frac{n-2}{n} \to 1$, hence the degree of bias decreases as the sample size gets larger. Recall the two examples given in class. First, consider a really small sample, say, n = 3. If we repeatedly drew samples of n = 3 from the population, the average value of the proposed estimator would be $\frac{3-2}{3}\sigma^2 = (1/3)\sigma^2 \approx .333\,\sigma^2$. In other words, in this scenario the estimator is very likely to be much smaller than the true error variance. Second, consider a relatively large sample of, say, n = 1000. If we repeatedly drew samples of this size from the population, the average value of the proposed estimator across such samples would be $\frac{1000-2}{1000}\sigma^2 = (998/1000)\sigma^2 = .998\,\sigma^2 \approx \sigma^2$. In the second scenario, the average value of the proposed estimator and the true error variance are virtually indistinguishable. A moral of this story is that the degrees-of-freedom correction is important in small-sample settings but not in large-sample settings.
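Here's a simulation version of 2b-2d, with an invented model and $\sigma^2 = 4$. With n = 10, the naive estimator averages out near $(n-2)/n \cdot \sigma^2 = 3.2$, while the degrees-of-freedom-corrected estimator $SSR/(n-2)$ averages out near 4.

```python
# Compare the biased estimator SSR/n with the unbiased SSR/(n-2)
# across many small samples drawn from a known model with sigma^2 = 4.
import numpy as np

rng = np.random.default_rng(3)
n, reps, sigma2 = 10, 20_000, 4.0
naive, corrected = np.empty(reps), np.empty(reps)
for r in range(reps):
    x = rng.uniform(0, 10, n)
    y = 1.0 + 0.5 * x + rng.normal(0, np.sqrt(sigma2), n)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    ssr = np.sum((y - b0 - b1 * x) ** 2)         # sum of squared residuals
    naive[r], corrected[r] = ssr / n, ssr / (n - 2)

print(f"naive mean:     {naive.mean():.3f} (theory: {(n - 2) / n * sigma2:.3f})")
print(f"corrected mean: {corrected.mean():.3f} (theory: {sigma2:.3f})")
```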
3. a. $\widehat{math10} = 37.36 + .00246\, expend$, with n = 408 and $R^2 = .033$.

b. $\hat{\beta}_1 = .00246$ implies that one extra dollar spent per student in the school district is associated with an increase of .00246 of a percentage point in the pass rate on the test. Not much!

c. According to the table of regression results, $SE(\hat{\beta}_1) = .00066$. In other words, across repeated samples, we estimate the typical amount of discrepancy between $\beta_1$ and $\hat{\beta}_1$ to be .00066. This seems like a pretty good degree of precision!

d. $R^2 = .033$ implies that 3.3% of the total variation in the pass rate on the test is explained by variation in school-district spending per student. So variation in the pass rate is explained largely by factors other than school-district spending. Hmmm.

e. $\hat{\sigma} = 10.332$. This can be read directly from the regression output at the entry for Root MSE. Alternatively, the value could be calculated from the ANOVA table as the square root of 106.748728, which is the entry corresponding to the Mean Square Residual. $\hat{\sigma}$ is otherwise known as the standard error of the regression. It aims to estimate the typical amount of vertical distance between the observed and the predicted value of the dependent variable.

f. Recall from Problem Set 2, Question 7 that linear transformation of the dependent variable to $c_1 y$ and of the independent variable to $c_2 x$ yields a slope coefficient of $\tilde{\beta}_1 = (c_1/c_2)\hat{\beta}_1$ and an intercept coefficient of $\tilde{\beta}_0 = c_1 \hat{\beta}_0$, where $\hat{\beta}_1$ and $\hat{\beta}_0$ are the relevant regression coefficients from the original regression. In the problem at hand, the dependent variable is not transformed, hence $c_1 = 1$. The independent variable expend is transformed by dividing by 100, putting expend in hundred-dollar units rather than dollar units. Accordingly, $c_2 = 1/100$. The implication is that the intercept in the regression using the transformed independent variable is the same as in the original regression, i.e., $\tilde{\beta}_0 = 1 \cdot \hat{\beta}_0 = 37.36$. In other words, whether we measure spending per student in dollar units or hundred-dollar units, the predicted pass rate is 37.36% if expenditure per student is 0. The implication for the slope coefficient in the new regression is $\tilde{\beta}_1 = \frac{1}{1/100} \times .00246 = 100 \times .00246 = .246$. In other words, dividing expend by 100 has the effect of moving the decimal point over two places. The interpretation of the slope coefficient in the transformed regression is that a $100 increase in spending per student is associated with a .246 percentage-point increase in the test's pass rate.

It should, on reflection, seem that the best way to define the spending variable here is to use the measure set in hundred-dollar units. With the variable defined as it was originally, the interpretation seems a bit strained: relative to the mean of expend (= $4,376.58), a $1 change is infinitesimal, as is the implied impact on the pass rate of .00246 of a percentage point. The moral of this story is that proper choice of units of measurement for the variables can lead to simpler, more straightforward interpretation of the resulting regression coefficients.

$R^2$ remains at .033 in the transformed regression, since linear transformation of the variables does not alter the relative amount of variation in the dependent variable explained by variation in the independent variable.
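To see part f mechanically, here's a sketch in the spirit of the expend/math10 regression, but with simulated numbers (the real course dataset is not used, so the printed coefficients will not match the key). Rescaling the regressor by 1/100 multiplies the slope by 100 and leaves the intercept and $R^2$ untouched.

```python
# Rescaling the regressor changes the slope's units but not the fit:
# x/100 => slope*100, same intercept, same R^2.
import numpy as np

def ols(x, y):
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    r2 = 1 - np.sum((y - b0 - b1 * x) ** 2) / np.sum((y - y.mean()) ** 2)
    return b0, b1, r2

rng = np.random.default_rng(4)
expend = rng.uniform(3000, 7000, 408)                  # spending in dollars
math10 = 37 + 0.0025 * expend + rng.normal(0, 10, 408)

for scale, units in [(1, "dollars"), (1 / 100, "hundreds of dollars")]:
    b0, b1, r2 = ols(expend * scale, math10)
    print(f"x in {units}: intercept = {b0:.2f}, slope = {b1:.5f}, R^2 = {r2:.3f}")
```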