Study Guide for Exam #2

This Study Guide is a supplementary study tool to help you better prepare for the exam. It should not, however, be the only source of information that you use to study. First and foremost, read your lecture notes, redo the problems we have solved in class, and go over the concepts and the formulas. You should also solve as many additional problems as you can; the more practice you get, the better.

1 List of Definitions

A binary variable is a variable that can take only two values, 0 and 1. A binary variable is also called an indicator variable or a dummy variable.

The error term u_i is homoskedastic if the variance of the conditional distribution of u_i given X_i is constant for i = 1, ..., n and, in particular, does not depend on X_i. Otherwise, the error term is heteroskedastic. Mathematically,

\mathrm{var}(u_i \mid X_i = x) = \sigma_u^2, \quad i = 1, \ldots, n    (1)

If the regressor is correlated with a variable that has been omitted from the analysis and that determines, in part, the dependent variable, then the OLS estimator will have omitted variable bias:

\hat{\beta}_1 \xrightarrow{\;p\;} \beta_1 + \rho_{Xu}\,\frac{\sigma_u}{\sigma_X}    (2)

The population regression line/function in the context of multiple regression is given by:

E(Y_i \mid X_{1i}, \ldots, X_{ki}) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki}    (3)

The coefficient β_0 is the intercept; the coefficient β_1 is the slope coefficient of X_1, or the coefficient on X_1, and so on.

The multiple linear regression model still allows for deviations from the population regression line due to remaining additional factors (including chance), captured in u_i:

Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i, \quad i = 1, \ldots, n    (4)

Y_i is the ith observation on the dependent variable; X_{1i}, ..., X_{ki} are the ith observations on each of the k regressors; and u_i is the error term. β_1 is the slope coefficient on X_1, β_2 is the slope coefficient on X_2, and so on. The coefficient β_1 is the expected difference in Y associated with a unit difference in X_1, holding constant the other regressors X_2, ..., X_k. The intercept β_0 is the expected value of Y when all the X's equal 0.

The estimators of the coefficients β_0, β_1, ..., β_k that minimize the sum of squared mistakes are called the ordinary least squares (OLS) estimators of β_0, β_1, ..., β_k and are denoted β̂_0, β̂_1, ..., β̂_k. The OLS regression line is the straight line constructed using the OLS estimators: β̂_0 + β̂_1 X_{1i} + ... + β̂_k X_{ki}.

The predicted value of Y_i given X_{1i}, ..., X_{ki}, based on the OLS regression line, is Ŷ_i = β̂_0 + β̂_1 X_{1i} + ... + β̂_k X_{ki}. The OLS residual for the ith observation is the difference between Y_i and its OLS predicted value; that is, the OLS residual is û_i = Y_i − Ŷ_i.

Zero conditional mean: E(u_i | X_{1i}, ..., X_{ki}) = 0. This assumption is implied if X_{1i}, ..., X_{ki} are randomly assigned or are as-if randomly assigned.

The regressors are said to exhibit perfect multicollinearity if one of the regressors is a perfect linear function of the other regressors. The dummy variable trap arises when the set of regressors includes a complete set of dummy variables (indicator variables) for all possible outcomes, in addition to estimating the intercept. Imperfect multicollinearity means that two or more of the regressors are highly correlated, in the sense that there is a linear function of the regressors that is highly correlated with another regressor.

R-squared (R²) captures the proportion of the variation in the dependent variable that is explained by the model (i.e., by the chosen regressors). Equivalently, R² is 1 minus the fraction of the variance of Y_i not explained by the regressors:

R^2 = \frac{ESS}{TSS}    (5)

R^2 = 1 - \frac{SSR}{TSS}    (6)
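To make the R² decomposition concrete, here is a minimal sketch (not part of the original guide) that fits a multiple regression by OLS on simulated data using Python's statsmodels and recovers R² both as ESS/TSS and as 1 − SSR/TSS. The data, coefficient values, and variable names are purely illustrative.

```python
# A minimal sketch: how R^2 relates to ESS, SSR, and TSS (simulated data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
u = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + u            # "true" coefficients are illustrative

X = sm.add_constant(np.column_stack([x1, x2]))   # adds the intercept column
results = sm.OLS(y, X).fit()

ess = results.ess               # explained sum of squares
ssr = results.ssr               # sum of squared residuals
tss = results.centered_tss      # total sum of squares

print("R^2 = ESS/TSS     :", ess / tss)
print("R^2 = 1 - SSR/TSS :", 1 - ssr / tss)
print("results.rsquared  :", results.rsquared)   # all three should agree
```

All three printed numbers should coincide, which is a quick way to check hand calculations on exam-style problems.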
The adjusted R̄² accounts for the number of regressors and imposes a small penalty for adding regressors, which is offset only if they have actual explanatory power:

\bar{R}^2 = 1 - \frac{n-1}{n-k-1}\,\frac{SSR}{TSS} = 1 - \frac{s_{\hat{u}}^2}{s_Y^2}    (7)

because

s_{\hat{u}}^2 = \frac{SSR}{n-k-1} = \frac{1}{n-k-1}\sum_{i=1}^{n} \hat{u}_i^2    (8)

and

s_Y^2 = \frac{TSS}{n-1}    (9)

The standard error of the regression (SER) is an estimator of the standard deviation of the regression error u_i. The SER measures the spread of the observations around the fitted regression line, calculated in the same units as the dependent variable:

s_{\hat{u}}^2 = \frac{1}{n-k-1}\sum_{i=1}^{n} \hat{u}_i^2 = \frac{SSR}{n-k-1}    (10)

SER = s_{\hat{u}}    (11)

A control variable is a regressor included to hold constant factors that, if neglected, could lead to omitted variable bias in the coefficient on the variable of interest.

The F-statistic is used to test a joint hypothesis about regression coefficients. In the q = 2 restriction case with H_0: β_1 = 0 and β_2 = 0,

F = \frac{1}{q}\left(\frac{t_1^2 + t_2^2 - 2\hat{\rho}_{t_1,t_2}\, t_1 t_2}{1 - \hat{\rho}_{t_1,t_2}^2}\right)    (12)

where ρ̂_{t_1,t_2} is an estimator of the correlation between the two t-statistics; in large samples this F-statistic is distributed F_{q,∞} under the null hypothesis.

The special homoskedasticity-only F-statistic can be expressed in terms of the improvement in fit of the regression (e.g., as measured by the decrease in the sum of squared residuals or the increase in R²):

F = \frac{(R^2_{Unrestricted} - R^2_{Restricted})/q}{(1 - R^2_{Unrestricted})/(n - k_{Unrestricted} - 1)}    (13)

F = \frac{(SSR_{Restricted} - SSR_{Unrestricted})/q}{SSR_{Unrestricted}/(n - k_{Unrestricted} - 1)}    (14)

In large samples, p-values are computed and interpreted analogously, except that they use the F_{q,∞} distribution. Let F^{act} denote the value of the F-statistic actually computed. Because the F-statistic has a large-sample F_{q,∞} distribution under the null hypothesis, the p-value is

p\text{-value} = \Pr[F_{q,\infty} > F^{act}]    (15)

The p-value can be evaluated using a table of the F_{q,∞} distribution.

A nonlinear regression function is a nonlinear function of the independent variables. The function f(X) is linear if the slope of f(X) is the same for all values of X; if the slope depends on the value of X, then f(X) is nonlinear. Nonlinear population regression models are of the form

Y_i = f(X_{1i}, X_{2i}, \ldots, X_{ki}) + u_i, \quad i = 1, \ldots, n    (16)

where f(X_{1i}, X_{2i}, ..., X_{ki}) is the population nonlinear regression function, a possibly nonlinear function of the independent variables, and u_i is the error term.

The expected change in Y, ΔY, associated with a change in X_1, holding X_2, ..., X_k constant, is the difference between the value of the population regression function before and after changing X_1, holding X_2, ..., X_k constant. That is, the expected change in Y is the difference:

\Delta Y = f(X_1 + \Delta X_1, X_2, \ldots, X_k) - f(X_1, X_2, \ldots, X_k)    (17)

The estimator of this unknown population difference is the difference between the predicted values in these two cases. Let f̂(X_{1i}, X_{2i}, ..., X_{ki}) be the predicted value of Y based on the estimator f̂ of the population regression function. Then the predicted change in Y is

\Delta \hat{Y} = \hat{f}(X_1 + \Delta X_1, X_2, \ldots, X_k) - \hat{f}(X_1, X_2, \ldots, X_k)    (18)

Let r denote the highest power of X that is included in the regression. The polynomial regression model of degree r is

Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \cdots + \beta_r X_i^r + u_i    (19)
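The following sketch (again with simulated, illustrative data and variable names) estimates a polynomial regression of degree r = 2 by OLS and computes the predicted change ΔŶ of equation (18). Because the fitted function is nonlinear, the predicted change depends on the starting value of X.

```python
# A sketch of a quadratic regression (polynomial of degree r = 2) and of the
# predicted change in Y from a one-unit change in X; data are simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 300
x = rng.uniform(0, 10, size=n)
y = 5 + 1.5 * x - 0.1 * x**2 + rng.normal(size=n)   # true curvature is illustrative

X = sm.add_constant(np.column_stack([x, x**2]))      # regressors: X and X^2
results = sm.OLS(y, X).fit()
b0, b1, b2 = results.params

def f_hat(x0):
    """Predicted value of Y at x0 based on the estimated quadratic."""
    return b0 + b1 * x0 + b2 * x0**2

# Predicted change in Y when X rises by one unit, i.e.
# Delta Y_hat = f_hat(X + 1) - f_hat(X); it differs across starting values of X.
print("Delta Y_hat at X = 4:", f_hat(5) - f_hat(4))
print("Delta Y_hat at X = 8:", f_hat(9) - f_hat(8))   # differs because f_hat is nonlinear
```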
When Δx is small, the difference between the logarithm of x + Δx and the logarithm of x is approximately Δx/x, the percentage change in x divided by 100; that is, ln(x + Δx) − ln(x) ≈ Δx/x.

When Y is not in logarithms but X is, this is sometimes referred to as a linear-log model:

Y_i = \beta_0 + \beta_1 \ln(X_i) + u_i    (20)

When Y is in logarithms but X is not, this is referred to as a log-linear model:

\ln(Y_i) = \beta_0 + \beta_1 X_i + u_i    (21)

When both X and Y are specified in logarithms, this is referred to as a log-log model:

\ln(Y_i) = \beta_0 + \beta_1 \ln(X_i) + u_i    (22)

We can modify the multiple regression model by introducing the product of two binary variables as another regressor:

Y_i = \beta_0 + \beta_1 D_{1i} + \beta_2 D_{2i} + \beta_3 (D_{1i} \times D_{2i}) + u_i    (23)

The product D_{1i} × D_{2i} is called an interaction term or an interacted regressor, and the population regression model is called a binary variable regression model.

We can modify the multiple regression model by introducing the product of a binary variable and a continuous variable as another regressor:

Y_i = \beta_0 + \beta_1 X_i + \beta_2 D_i + \beta_3 (X_i \times D_i) + u_i    (24)

The product X_i × D_i is called an interaction term or an interacted regressor, and the population regression model above illustrates the possibility of an interaction between a continuous variable and a binary variable.

We can modify the multiple regression model by introducing the product of two continuous variables as another regressor:

Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 (X_{1i} \times X_{2i}) + u_i    (25)

The product X_{1i} × X_{2i} is called an interaction term or an interacted regressor, and the population regression model above illustrates the possibility of an interaction between two continuous variables. The interaction term allows the effect of a unit change in X_1 to depend on X_2.

The chi-squared distribution with m degrees of freedom (χ²_m) is the distribution of the sum of m squared independent standard normal random variables.

The F distribution is the distribution of the ratio of two independently distributed chi-squared random variables, each divided by its degrees of freedom. If W_1 ∼ χ²_m and W_2 ∼ χ²_n are independently distributed, that is,

\Pr(W_1 \le w_1 \mid W_2 = w_2) = \Pr(W_1 \le w_1)    (26)

then

\frac{W_1/m}{W_2/n} \sim F_{m,n}    (27)

When the denominator degrees of freedom is large enough, the F_{m,n} distribution can be approximated by the F_{m,∞} distribution. The F_{m,∞} distribution is the distribution of a chi-squared random variable with m degrees of freedom, W, divided by m: W/m is distributed F_{m,∞}.

2 List of Key Concepts and Applications

2.1 Statistical Inference in Multiple Regression

• To test the hypothesis H_0: β_j = β_{j,0} against the two-sided alternative H_1: β_j ≠ β_{j,0}, we have to:

1. Compute the standard error SE(β̂_j).

2. Compute the t-statistic,

t^{act} = \frac{\hat{\beta}_j - \beta_{j,0}}{SE(\hat{\beta}_j)} \approx N(0,1) \text{ in large samples under } H_0    (28)

3. Compute the p-value.

Specifically, we reject the null (H_0: β_j = β_{j,0}) at the 5% significance level whenever

1. p-value = 2Φ(−|t^{act}|) ≤ 0.05;

2. |t^{act}| ≥ 1.96;

3. β_{j,0} falls outside the 95% confidence interval [β̂_j − 1.96 SE(β̂_j), β̂_j + 1.96 SE(β̂_j)].

2.2 Testing Joint Hypotheses

– E.g., H_0: β_1 = 0 and β_2 = 0 vs. H_1: β_1 ≠ 0 and/or β_2 ≠ 0.

– E.g., H_0: β_1 = β_2 vs. H_1: β_1 ≠ β_2.

2.3 Testing Multiple Restrictions Involving Single Coefficients

• To test the hypothesis H_0: β_j = β_{j,0}, β_m = β_{m,0}, ... against the alternative H_1: one or more of the q restrictions does not hold, we compute the F-statistic. In the q = 2 restriction case with H_0: β_1 = 0 and β_2 = 0,

F = \frac{1}{q}\left(\frac{t_1^2 + t_2^2 - 2\hat{\rho}_{t_1,t_2}\, t_1 t_2}{1 - \hat{\rho}_{t_1,t_2}^2}\right)    (29)

where ρ̂_{t_1,t_2} is an estimator of the correlation between the two t-statistics. Because the F-statistic has a large-sample F_{q,∞} distribution under the null hypothesis, the p-value is

p\text{-value} = \Pr[F_{q,\infty} > F^{act}]    (30)

If the error term is homoskedastic, the F-statistic takes the following forms:

F = \frac{(R^2_{Unrestricted} - R^2_{Restricted})/q}{(1 - R^2_{Unrestricted})/(n - k_{Unrestricted} - 1)}    (31)

F = \frac{(SSR_{Restricted} - SSR_{Unrestricted})/q}{SSR_{Unrestricted}/(n - k_{Unrestricted} - 1)}    (32)
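As one way to carry out the joint test H_0: β_1 = 0 and β_2 = 0 in software, the sketch below uses statsmodels: the built-in f_test reports a heteroskedasticity-robust F-statistic and p-value when robust standard errors are requested, and the homoskedasticity-only F is then recomputed by hand from the R²'s of the restricted and unrestricted regressions, mirroring equation (31). The data, variable names, and number of regressors are illustrative.

```python
# A sketch of the joint test H0: beta1 = 0 and beta2 = 0 (q = 2 restrictions).
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(2)
n = 500
x1, x2, x3 = rng.normal(size=(3, n))
y = 1 + 0.3 * x1 + 0.2 * x2 + 1.0 * x3 + rng.normal(size=n)

df = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})
X_unres = sm.add_constant(df)
unres = sm.OLS(y, X_unres).fit(cov_type="HC1")     # heteroskedasticity-robust SEs
print(unres.f_test("x1 = 0, x2 = 0"))              # robust F-statistic and p-value

# Homoskedasticity-only F from the R^2 of the restricted and unrestricted fits.
res = sm.OLS(y, sm.add_constant(df[["x3"]])).fit() # restricted model: drop x1, x2
q = 2
k_unres = 3                                        # regressors in the unrestricted model
F_homo = ((unres.rsquared - res.rsquared) / q) / \
         ((1 - unres.rsquared) / (n - k_unres - 1))
p_value = stats.f.sf(F_homo, q, n - k_unres - 1)   # close to F_{q, infinity} for large n
print("homoskedasticity-only F:", F_homo, " p-value:", p_value)
```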
2.4 Testing Single Restrictions Involving Multiple Coefficients

H_0: \beta_1 = \beta_2    (33)

vs.

H_1: \beta_1 \neq \beta_2    (34)

– Test the restriction directly.

– Transform the model and then test the restriction.

2.5 A General Approach to Modeling Nonlinearities Using Multiple Regression

1. Identify a possible nonlinear relationship.

2. Specify a nonlinear function, and estimate its parameters by OLS.

3. Determine whether the nonlinear model improves upon a linear model.

4. Plot the estimated nonlinear regression function.

5. Estimate the effect on Y of a change in X.

• You should be able to correctly interpret regression results from STATA.

• You should be able to calculate, if need be, and interpret measures of the goodness of fit of a given regression model (e.g., SER, R², R̄²).

• You should be able to 1) detect the presence of heteroskedasticity in the data and 2) propose modifications of standard regression methods to accommodate heteroskedastic errors (see the sketch below).

• You should be able to propose nonlinear regression models that improve upon linear regression models, and to interpret the estimated coefficients in the context of nonlinear regression models.

• You should be able to provide policy recommendations based on empirical tests.
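As a sketch of the heteroskedasticity bullet above: the Breusch-Pagan test is one common way to detect heteroskedasticity (the course may emphasize a different diagnostic, such as visual inspection of residual plots), and heteroskedasticity-robust (HC1) standard errors are one standard accommodation. The data are simulated so that the error variance grows with the regressor; all names are illustrative.

```python
# A sketch of detecting heteroskedasticity and using robust standard errors.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(3)
n = 400
x = rng.uniform(1, 5, size=n)
u = rng.normal(scale=x, size=n)        # error variance grows with x: heteroskedastic
y = 2 + 1.0 * x + u

X = sm.add_constant(x)
ols_homo = sm.OLS(y, X).fit()                    # homoskedasticity-only SEs
ols_robust = sm.OLS(y, X).fit(cov_type="HC1")    # heteroskedasticity-robust SEs

# Breusch-Pagan: regress squared residuals on the regressors; a small p-value
# is evidence of heteroskedasticity.
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(ols_homo.resid, X)
print("Breusch-Pagan LM p-value:", lm_pvalue)

print("homoskedasticity-only SE on slope:", ols_homo.bse[1])
print("robust (HC1) SE on slope        :", ols_robust.bse[1])
```

With heteroskedastic errors the two sets of standard errors typically differ, and inference should then be based on the robust ones.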