Notes for a 2-hour exam

Appendix – Basic formulas

Mean:
 Population: μ = E(X) = Σj xj f(xj)
 Sample: x̄ = n⁻¹ Σi xi
Variance:
 Population: Var(X) = σ² = E(X²) − μ²
 Sample: s² = 1/(n−1) Σi (xi − x̄)²
Standard deviation/error:
 Population: sd(X) = σ = √Var(X)
 Sample: s = √s²
Covariance:
 Population: Cov(X, Y) = E[(X − μX)(Y − μY)]
 Sample: sxy = 1/(n−1) Σi (xi − x̄)(yi − ȳ)
Correlation coefficient:
 Population: Corr(X, Y) = Cov(X, Y)/[sd(X)·sd(Y)], lies in [−1; 1]
 Sample: rxy = sxy/(sx·sy)
95% confidence interval for large n (> 120): ȳ ± 1.96·se(ȳ)

Capital letters: estimators. Lowercase letters: estimates.
Σi (xi − x̄) = 0, i.e. if the mean is subtracted from each observation, the sum of these deviations equals zero.

Finite sample: the properties hold for a sample of any size, no matter how small or large.
Unbiasedness: an estimator W of θ is an unbiased estimator if E(W) = θ for all possible values of θ.
Efficiency: if W1 and W2 are two unbiased estimators of θ, W1 is efficient relative to W2 when Var(W1) ≤ Var(W2) for all θ.
Consistency: let Wn be an estimator of θ based on a sample Y1, Y2, …, Yn of size n. Then Wn is a consistent estimator of θ if for every ε > 0, P(|Wn − θ| > ε) → 0 as n → ∞.

Chapter 2 – The Simple Regression Model

SLR1 – zero conditional mean assumption: E(u|x) = E(u) = 0
SLR1 gives us the population regression function (PRF): E(y|x) = β0 + β1x
The sample regression function: ŷ = β̂0 + β̂1x

Ordinary least squares (OLS) estimates:
Fitted value: ŷi = β̂0 + β̂1xi
Residuals: ûi = yi − ŷi
β̂0 = ȳ − β̂1x̄
β̂1 = Σi (xi − x̄)(yi − ȳ) / Σi (xi − x̄)²
(ûi is not the same as ui – the residuals are computed from the data, while the errors are never observable)

The 3 most important algebraic properties of OLS residuals:
1. The sum, and therefore the sample average, of the OLS residuals is zero: Σi ûi = 0
2. The sample covariance between the independent variable and the OLS residuals is zero: Σi xi ûi = 0
3. The point (x̄, ȳ) is always on the OLS regression line

SST, SSE, SSR and R²:
Total sum of squares: SST = Σi (yi − ȳ)² – the total sample variation in yi
Explained sum of squares: SSE = Σi (ŷi − ȳ)² – the sample variation in ŷi
Residual sum of squares: SSR = Σi ûi² – the sample variation in ûi
SST = SSE + SSR
R² = SSE/SST = 1 − SSR/SST

Summary of functional forms involving logarithms:
Model        Dependent variable   Independent variable   Interpretation of β1
Level-level  y                    x                      Δy = β1·Δx
Level-log    y                    log(x)                 Δy = (β1/100)·%Δx
Log-level    log(y)               x                      %Δy = (100·β1)·Δx
Log-log      log(y)               log(x)                 %Δy = β1·%Δx

Homoskedasticity: Because Var(u|x) = E(u²|x) − [E(u|x)]² and E(u|x) = 0, σ² = E(u²|x), which means σ² is also the unconditional expectation of u². Therefore σ² = E(u²) = Var(u), because E(u) = 0.
Var(u|x) = Var(y|x) – heteroskedasticity is present whenever Var(y|x) is a function of x.
σ² = the error variance; σ = the standard deviation of the error.
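A minimal numerical sketch of the simple-regression formulas above (the data, seed and coefficient values are made up for illustration): it computes β̂1 and β̂0 by hand and checks the residual identities Σûi = 0, SST = SSE + SSR and the two equivalent R² expressions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(5.0, 2.0, size=200)
u = rng.normal(0.0, 1.0, size=200)
y = 1.0 + 0.5 * x + u                       # assumed population model: beta0=1, beta1=0.5

beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()

y_fit = beta0_hat + beta1_hat * x           # fitted values
u_hat = y - y_fit                           # residuals

SST = np.sum((y - y.mean()) ** 2)
SSE = np.sum((y_fit - y.mean()) ** 2)
SSR = np.sum(u_hat ** 2)

print(beta0_hat, beta1_hat)
print(u_hat.sum())                          # ~0: residuals sum to zero
print(np.allclose(SST, SSE + SSR))          # True: SST = SSE + SSR
print(SSE / SST, 1 - SSR / SST)             # two equivalent ways to compute R-squared
```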
Variance and standard deviation of the estimates under SLR1-5:
Var(β̂1) = σ² / Σi (xi − x̄)² = σ²/SSTx
Var(β̂0) = σ² · n⁻¹ Σi xi² / Σi (xi − x̄)²
σ² ↑ → Var(β̂1) ↑
Variability in xi ↑ → Var(β̂1) ↓
n ↑ → Var(β̂1) ↓

Estimation of the standard error of β̂1:
se(β̂1) = σ̂/√SSTx = σ̂ / (Σi (xi − x̄)²)^(1/2)

Error variance and standard error of the regression:
Estimate of the error variance σ²: σ̂² = 1/(n−2) · Σi ûi² = SSR/(n−2)
Estimate of the standard error of the regression: σ̂ = √σ̂²

Chapter 3 – Multiple Regression Analysis: Estimation

The linear regression model in matrix form:
y = Xβ + u, where y is (n×1), X is (n×(k+1)), β is ((k+1)×1) and u is (n×1) [rows × columns]
The OLS estimator: β̂ = (X'X)⁻¹(X'y)

OLS fitted/predicted values: ŷ = β̂0 + β̂1x1 + β̂2x2 + … + β̂kxk
OLS written in terms of changes: Δŷ = β̂1Δx1 + β̂2Δx2 + … + β̂kΔxk (notice the intercept drops out)
OLS written in terms of a change in x1, holding all other independent variables fixed: Δŷ = β̂1Δx1
OLS written in terms of changes in x1 and x2 (same units), holding all other independent variables fixed: Δŷ = β̂1Δx1 + β̂2Δx2

First order conditions for the OLS estimators:
The OLS residuals are defined as in the simple regression case (ûi = yi − ŷi) and have the same properties:
1. The sum, and therefore the sample average, of the OLS residuals is zero: Σi ûi = 0, which implies that the sample average of the fitted values equals ȳ
2. The sample covariance between each independent variable and the OLS residuals is zero: Σi xij ûi = 0
3. The point (x̄1, x̄2, …, x̄k, ȳ) is always on the OLS regression line

Assumptions: MLR1-5
MLR1: Linear in parameters – the model in the population can be written as y = β0 + β1x1 + β2x2 + … + βkxk + u
MLR2: Random sample – we have a random sample of n observations, {(xi1, xi2, …, xik, yi): i = 1, 2, …, n}, following the population model in MLR1 (random sample = i.i.d. – independent and identically distributed)
MLR3: No perfect collinearity – in the sample, none of the independent variables is constant, and there are no exact linear relationships among the independent variables (if MLR3 is not met the model suffers from perfect collinearity)
MLR4: Zero conditional mean – the error u has an expected value of zero given any values of the independent variables: E(u|x1, x2, …, xk) = 0
3 ways for MLR4 to be violated: 1) the functional relationship between the independent variables and the dependent variable is misspecified, 2) omitting an important variable, 3) measurement error in an explanatory variable
Under MLR1-4 the OLS estimators are unbiased estimators of the population parameters: E(β̂j) = βj, j = 0, 1, …, k
(An estimate cannot be unbiased, but the procedure by which the estimate is obtained can be unbiased when we view the procedure as being applied across all possible random samples.)
MLR5: Homoskedasticity – the error u has the same variance given any values of the explanatory variables: Var(u|x1, x2, …, xk) = σ²
MLR1-5 = the Gauss-Markov assumptions
MLR1 and MLR4 give: E(y|x) = β0 + β1x1 + β2x2 + … + βkxk
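A minimal sketch of the matrix formula β̂ = (X'X)⁻¹X'y from the start of this chapter, on simulated data (the regressors, coefficients and seed are made up). It also checks the first-order conditions: the residuals average to zero and are orthogonal to each regressor.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.0 * x1 - 0.5 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])        # n x (k+1) design matrix
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)     # (X'X)^(-1) X'y via a linear solve

u_hat = y - X @ beta_hat                         # OLS residuals
print(beta_hat)                                  # close to (2, 1, -0.5)
print(u_hat.mean())                              # ~0: residuals average to zero
print(X[:, 1] @ u_hat, X[:, 2] @ u_hat)          # ~0: regressors orthogonal to residuals
```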
Estimation of the variances and standard errors in multiple regression analysis:
Estimated sampling variance of the slope estimators: v̂ar(β̂j) = σ̂² / [SSTj(1 − Rj²)]
where SSTj = Σi (xij − x̄j)² is the total sample variation in xj and Rj² is the R-squared from regressing xj on all other independent variables.
(Valid under MLR1-5)
The error variance: σ̂² ↑ → v̂ar(β̂j) ↑ (reduce σ² by adding more explanatory variables)
The total sample variation in xj: SSTj ↑ → v̂ar(β̂j) ↓ (increase the sample variation in xj by increasing the sample size)
The linear relationships among the independent variables: Rj² ↑ → v̂ar(β̂j) ↑ (avoid too much multicollinearity – e.g. collect more data)

Omitted variables only cause bias if they are correlated with the independent variables in the model; therefore, including irrelevant variables is not a good idea because it will most likely make the multicollinearity problem bigger (v̂ar(β̂j) ↑). If omitted variables are correlated with independent variables in the model, they should of course be included to avoid bias.

The estimate of the error variance: σ̂² = SSR/(n − k − 1)
The estimate of the standard error of the regression: σ̂ = √σ̂²
The estimate of the standard error of β̂j under MLR5: se(β̂j) = σ̂/[SSTj(1 − Rj²)]^(1/2)
Under MLR1-5 OLS gives an unbiased estimator of σ²: E(σ̂²) = σ²

Normality assumption – MLR6
MLR6: Normality – the population error u is independent of the explanatory variables x1, x2, …, xk and is normally distributed with zero mean and variance σ²: u ~ Normal(0, σ²).
(A strong assumption that is problematic in several cases – e.g. if y takes on only a few values.)
MLR1-6: the classical linear model (CLM) assumptions
Under CLM1-6, β̂j ~ Normal[βj, Var(β̂j)], and therefore (β̂j − βj)/sd(β̂j) ~ Normal(0, 1).
In addition, any linear combination of β̂0, β̂1, β̂2, …, β̂k is also normally distributed.

Assumptions:
MLR1-4: OLS is LUE (a linear unbiased estimator)
MLR1-5: OLS is BLUE
CLM1-6: the OLS estimators have the smallest variance among all unbiased estimators – not only in comparison to linear estimators
(MLR6 is a strong assumption that is only necessary with small samples, in order to know the sampling distribution for inference.)

Bias
Omitted variable bias – the simple case
Bias when omitting the explanatory variable x2 from the model y = β0 + β1x1 + β2x2 + u:
Bias(β̂1) = E(β̂1) − β1 = β2·δ̂1
β̂1 comes from the underspecified model without x2
β1 and β2 come from the specified model with x2
δ̂1 is the slope from the simple regression of x2 on x1
(if x1 and x2 are uncorrelated in the sample, then β̂1 is unbiased)

Summary of bias in β̂1 when x2 is omitted in estimating the equation y = β0 + β1x1 + β2x2 + u:
          Corr(x1, x2) > 0    Corr(x1, x2) < 0
β2 > 0    Positive bias       Negative bias
β2 < 0    Negative bias       Positive bias

Upward bias in β̂1: E(β̂1) > β1
Downward bias in β̂1: E(β̂1) < β1
Biased toward zero: the case where E(β̂1) is closer to zero than β1

Omitted variable bias – the more general case with multiple regressors in the estimated model
Bias when omitting the explanatory variable x3 from the model y = β0 + β1x1 + β2x2 + β3x3 + u:
Assume that x2 is uncorrelated with x1 and x3 – then we can study the bias in β̂1 as if x2 were absent from the model. This means we can use the above equation and table (now just with x3 instead of x2).
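A small sketch of the standard error formula se(β̂j) = σ̂/[SSTj(1 − Rj²)]^(1/2) from earlier in this chapter, using the auxiliary regression of xj on the other regressors. The data generating process (correlated x1 and x2, coefficient values, seed) is assumed purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)               # x1 and x2 deliberately correlated
y = 1.0 + 0.8 * x1 + 0.3 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ b
k = 2
sigma2_hat = (u_hat @ u_hat) / (n - k - 1)       # SSR / (n - k - 1)

# auxiliary regression of x1 on the other independent variables (here just x2)
Z = np.column_stack([np.ones(n), x2])
g = np.linalg.solve(Z.T @ Z, Z.T @ x1)
R1_sq = 1 - np.sum((x1 - Z @ g) ** 2) / np.sum((x1 - x1.mean()) ** 2)
SST1 = np.sum((x1 - x1.mean()) ** 2)

se_b1 = np.sqrt(sigma2_hat / (SST1 * (1 - R1_sq)))
print(se_b1)                                     # grows as R1_sq (multicollinearity) grows
```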
Chapter 4 – Multiple Regression Analysis: Inference

t and F tests and confidence intervals
Under CLM1-6 or with large samples: t = (β̂j − βj)/se(β̂j) ~ t(n−k−1), where k + 1 is the number of unknown parameters in the population model (k slope parameters and the intercept).

F = [(SSRr − SSRur)/q] / [SSRur/(n − k − 1)] ~ F(q, n − k − 1)
The restricted model (r) always has fewer parameters than the unrestricted model (ur).
q is the number of exclusion restrictions to test (= dfr − dfur)

F = [(Rur² − Rr²)/q] / [(1 − Rur²)/(n − k − 1)] ~ F(q, n − k − 1)

Since SSRr can be no smaller than SSRur, the F statistic is always nonnegative.
The F statistic is often useful for testing exclusion of a group of variables when the variables in the group are highly correlated (when the multicollinearity makes it difficult to uncover the partial effects).
It can be shown that the F statistic for testing exclusion of a single variable is equal to the square of the corresponding t statistic.

A 95% confidence interval: β̂j ± c·se(β̂j), where the constant c is the 97.5th percentile in a t(n−k−1) distribution.

Testing hypotheses about a single linear combination of the parameters:
H0: β1 = β2  =>  β1 − β2 = 0
t = (β̂1 − β̂2)/se(β̂1 − β̂2)
se(β̂1 − β̂2) = {[se(β̂1)]² + [se(β̂2)]² − 2Cov(β̂1, β̂2)}^(1/2)
To get the right standard error for the test it is easiest to estimate a new model where we define a new parameter as the difference between β1 and β2 – do this by including x1 + x2 in the equation instead of x2, and then the estimate and standard error on x1 can be used for the test (see page 142).

Significance level – the probability of rejecting H0 when it is in fact true (at a 5% level we will mistakenly reject H0 when it is true 5% of the time).
One-sided test: the critical value is the 95th percentile in a t distribution with n − k − 1 degrees of freedom.
Two-sided test: the critical value is the 97.5th percentile in a t distribution with n − k − 1 degrees of freedom.

The p-value: the smallest significance level at which the null hypothesis would be rejected. The p-value is the probability of observing a t statistic as extreme as we did if the null hypothesis is true – small p-values are evidence against the null.
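A minimal sketch of the exclusion-restriction F test above, F = [(SSRr − SSRur)/q] / [SSRur/(n − k − 1)], on simulated data (model, coefficients and the choice that x2 and x3 are irrelevant are all made up for illustration).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 300
x = rng.normal(size=(n, 3))
y = 1.0 + 0.5 * x[:, 0] + 0.2 * x[:, 1] + rng.normal(size=n)   # x3 truly irrelevant

def ssr(y, X):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return np.sum((y - X @ b) ** 2)

ones = np.ones((n, 1))
ssr_ur = ssr(y, np.column_stack([ones, x]))          # unrestricted: all three regressors
ssr_r = ssr(y, np.column_stack([ones, x[:, 0]]))     # restricted: drops x2 and x3
q, k = 2, 3
F = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))
p_value = stats.f.sf(F, q, n - k - 1)                # upper tail of F(q, n-k-1)
print(F, p_value)
```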
Chapter 5 – Multiple Regression Analysis: OLS Asymptotics

Consistency = asymptotic unbiasedness
If an estimator is consistent, then the distribution of β̂j becomes more and more tightly distributed around βj as the sample size grows. As n tends to infinity, the distribution of β̂j collapses to the single point βj (i.e. plim β̂j = βj).

MLR4': Zero mean and zero correlation – E(u) = 0 and Cov(xj, u) = 0 for j = 1, 2, …, k.
MLR4' is weaker than MLR4: MLR4 requires that any function of the xj is uncorrelated with u, while MLR4' requires only that each xj is uncorrelated with u.
OLS is biased but consistent under MLR4' if E(u|x1, …, xk) depends on any of the xj.
But if we only assume MLR4', MLR1 need not represent the population regression function (PRF), and we face the possibility that some nonlinear function of the xj, such as xj², could be correlated with the error u. This means that we have neglected nonlinearities in the model that could help us better explain y; if we knew that, we would usually include such nonlinear functions. That is, most of the time we hope to get a good estimate of the PRF, and so MLR4 (the 'normal' one) is natural (we use MLR4' with IV, where we have no interest in modelling the PRF).

Inconsistency in the estimators – asymptotic bias
Correlation between u and any of the xj causes all of the OLS estimators to be biased and inconsistent (if the independent variables in the model are correlated – which is usually the case). Any bias persists as the sample size grows – the problem does not go away with more observations.
The inconsistency in β̂1 (sometimes called the asymptotic bias) is: plim β̂1 − β1 = Cov(x1, u)/Var(x1)
Because Var(x1) is positive, the inconsistency in β̂1 is positive if Cov(x1, u) is positive and negative if Cov(x1, u) is negative.

Asymptotic analog of omitted variable bias – the simple case:
Suppose the true model is y = β0 + β1x1 + β2x2 + u and we omit x2. Then
plim β̂1 = β1 + β2·δ1, where δ1 = Cov(x1, x2)/Var(x1)
β̂1 comes from the underspecified model without x2
β1 and β2 come from the specified model with x2

Asymptotic normality
Even though the yi are not from a normal distribution (MLR6), under MLR1-5 we can use the central limit theorem to conclude that the OLS estimators satisfy asymptotic normality, which means they are approximately normally distributed in large enough sample sizes.
σ̂² is a consistent estimator of σ² – an asymptotic analysis can now show that Var(β̂j) shrinks to zero at the rate 1/n; this is why a large sample size is better. The standard error can be expected to shrink at a rate that is the inverse of the square root of the sample size (cj/√n, where cj is a positive constant that does not depend on the sample size).

Lagrange multiplier (LM) statistic for q exclusion restrictions:
(Works under MLR1-5 with a large sample. Same hypotheses as with F tests.)
o Regress y on the restricted set of independent variables and save the residuals, ũ
o Regress ũ on all of the independent variables and obtain the R-squared, Rũ²
o Compute LM = n·Rũ²
o Compare LM to the appropriate critical value in a χ²(q) distribution

Auxiliary regression – a regression that is used to compute a test statistic but whose coefficients are not of direct interest.
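A minimal sketch of the LM test just described, on simulated data (model, seed and the restriction that x2 and x3 are excluded under H0 are assumptions for illustration): restricted regression, auxiliary regression of ũ on all regressors, then LM = n·Rũ² compared with χ²(q).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 300
x = rng.normal(size=(n, 3))
y = 1.0 + 0.5 * x[:, 0] + rng.normal(size=n)          # x2, x3 irrelevant under H0

def resid(y, X):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return y - X @ b                                  # OLS residuals

ones = np.ones((n, 1))
u_tilde = resid(y, np.column_stack([ones, x[:, 0]]))  # step 1: restricted regression
X_all = np.column_stack([ones, x])
e = resid(u_tilde, X_all)                             # step 2: regress u_tilde on all regressors
R2_u = 1 - np.sum(e ** 2) / np.sum((u_tilde - u_tilde.mean()) ** 2)

q = 2
LM = n * R2_u                                         # step 3
print(LM, stats.chi2.sf(LM, q))                       # step 4: compare with chi2(q)
```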
Chapter 6 – Multiple Regression Analysis: Further Issues

Changing the units of measurement:
If xj is multiplied by c, its coefficient is divided by c. If the dependent variable is multiplied by c, all OLS coefficients are multiplied by c.

Using logarithmic functional forms:
Log-level model: as the change in log(y) becomes larger and larger, the approximation %Δy ≈ 100·Δlog(y) becomes more and more inaccurate. The exact percentage change in the predicted y is given by:
%Δŷ = 100·[exp(β̂2Δx2) − 1]
Simply using the coefficient (multiplied by 100) gives an estimate that is always between the absolute values of the estimates for an increase and a decrease. If we are especially interested in an increase or a decrease, we can use the calculation based on the equation above.
Reasons for using log models: when y > 0, models using log(y) as the dependent variable often satisfy the CLM assumptions more closely than models using the level of y. Moreover, taking logs usually narrows the range of the variable, which makes estimates less sensitive to outlying or extreme observations on the dependent or independent variables.
When a variable is a positive dollar amount or a large positive whole number, the log is often taken. Variables that are measured in years usually appear in their original form. A variable that is a proportion or a percent usually appears in level form (giving a percentage point change interpretation), but can also appear in log form.
Log cannot be used if a variable takes on zero or negative values.

Models with quadratics:
If we write the estimated model as ŷ = β̂0 + β̂1x + β̂2x², then we have the approximation:
Δŷ ≈ (β̂1 + 2β̂2x)Δx, for 'small' Δx
This says that the slope of the relationship between x and y depends on the value of x. β̂1 can be interpreted as the approximate slope in going from x = 0 to x = 1.
Turning point: x* = |β̂1/(2β̂2)|
β̂1 positive and β̂2 negative – x has a diminishing effect on y, parabolic shape.
β̂1 negative and β̂2 positive – x has an increasing effect on y, U-shape.
β̂1 and β̂2 with the same sign – there is no turning point for values x > 0. Both positive: the smallest expected value of y is at x = 0, and increases in x always have a positive effect on y. Both negative: the largest expected value of y is at x = 0, and increases in x always have a negative effect on y.

Models with interaction terms:
If we write the estimated model as ŷ = β̂0 + β̂1x1 + β̂2x2 + β̂3x1x2, then β̂2 is the partial effect of x2 on y when x1 = 0:
Δŷ = (β̂2 + β̂3x1)Δx2
To estimate the effect of x2, plug in interesting values of x1 – e.g. the mean.

Predicting y when log(y) is the dependent variable:
ŷ = exp(σ̂²/2)·exp(log(y)-hat), where log(y)-hat is the fitted value from the regression with log(y) as the dependent variable

Chapter 7 – Multiple Regression Analysis with Qualitative Information: Binary/Dummy Variables

Difference in intercepts – dummy variable:
wage-hat = β̂0 + β̂1educ + β̂2female
Then the intercept for males is β̂0 and the intercept for females is β̂0 + β̂2.
If the regression model needs to have different intercepts for, say, g groups or categories, we need to include g − 1 dummy variables in the model along with an intercept. The intercept for the base group is the overall intercept in the model, and the dummy variable coefficient for a particular group represents the estimated difference in intercepts between that group and the base group.

Interactions among dummy variables:
wage-hat = β̂0 + β̂1married + β̂2female + β̂3female·married
Then we can obtain the estimated wage differential among all four groups, but here we must be careful to plug in the correct combination of zeros and ones. Setting female = 0 and married = 0 corresponds to the group single men, which is the base group, since this eliminates the other three parameters. We can find the intercept for married men by setting female = 0 and married = 1.

Difference in slopes – interactions between dummy and quantitative variables:
wage-hat = β̂0 + β̂1educ + β̂2female + β̂3female·educ
Then β̂2 measures the difference in intercepts between women and men, and β̂3 measures the difference in the return to education between women and men.

Chow test – testing for differences in the regression function across groups:
We test the null hypothesis that two populations or groups follow the same regression function, against the alternative that one or more of the slopes differ across groups (this can also be done by adding all of the interactions and computing the F statistic).
Chow statistic (e.g. with two groups):
F = {[SSRpooled − (SSR1 + SSR2)] / (SSR1 + SSR2)} · {[n − 2(k+1)] / (k+1)}
where n is the total number of observations, k is the number of explanatory variables, SSRpooled is the SSR from estimating the equation on the pooled sample, and SSR1 + SSR2 is the unrestricted SSR from estimating the equation separately for each group. (A numerical sketch follows the LPM notes below.)

A binary dependent variable – the linear probability model (LPM):
P(y=1|x) = E(y|x): the probability of success, that is the probability that y = 1, is the same as the expected value of y.
In the LPM, βj measures the change in the probability of success when xj increases by one unit, holding other factors fixed.
Var(y|x) = p(x)[1 − p(x)]
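A minimal sketch of the two-group Chow statistic above, on simulated data (the group split, the coefficient values and the fact that only the slope differs across groups are assumptions for illustration).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, k = 400, 1
x = rng.normal(size=n)
g = rng.uniform(size=n) < 0.5                        # group indicator (True/False)
y = 1.0 + 0.5 * x + 0.4 * g * x + rng.normal(size=n) # slope differs across groups

def ssr(y, x):
    X = np.column_stack([np.ones(len(x)), x])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return np.sum((y - X @ b) ** 2)

ssr_pooled = ssr(y, x)                               # restricted: one equation for everyone
ssr_1, ssr_2 = ssr(y[g], x[g]), ssr(y[~g], x[~g])    # unrestricted: one equation per group
F = ((ssr_pooled - (ssr_1 + ssr_2)) / (ssr_1 + ssr_2)) * ((n - 2 * (k + 1)) / (k + 1))
print(F, stats.f.sf(F, k + 1, n - 2 * (k + 1)))
```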
Chapter 8 – Heteroskedasticity

Heteroskedasticity-robust standard error for β̂j:
v̂ar(β̂j) = Σi r̂ij²·ûi² / SSRj²
where r̂ij is the i-th residual from regressing xj on all other independent variables and SSRj is the sum of squared residuals from that regression.
Useful only with large samples. The robust standard errors can be either larger or smaller than the usual ones – as an empirical matter they are often found to be larger.

Wald and LM tests:
The F statistic robust to heteroskedasticity = the Wald statistic.
LM statistic robust to heteroskedasticity:
o Obtain the residuals ũ from the restricted model
o Regress each of the independent variables excluded under the null on all of the included variables; if there are q excluded variables, this leads to q sets of residuals (r̃1, r̃2, …, r̃q)
o Find the product of each r̃j and ũ (for all observations)
o Run the regression of 1 on r̃1ũ, r̃2ũ, …, r̃qũ without an intercept. The heteroskedasticity-robust LM statistic is n − SSR1, where SSR1 is just the usual sum of squared residuals from the final regression. Under H0, LM is distributed approximately as χ²(q).

Testing for heteroskedasticity:
The tests have asymptotic justification under MLR1-4. We take the null hypothesis to be that assumption MLR5 is true.
- The Breusch-Pagan test
o Estimate the model by OLS and obtain the residuals. Compute the squared residuals û²
o Run the regression of û² on the independent variables of the model
o Form either the F or the LM statistic. F statistic = the test for overall significance of this regression. LM statistic = n times the R-squared from this regression, ~ χ²(k)
- The White test
o Estimate the model by OLS as usual. Obtain the OLS residuals û and fitted values ŷ. Compute the squared OLS residuals û² and the squared fitted values ŷ²
o Run the regression û² = δ0 + δ1ŷ + δ2ŷ² + error. Keep the R-squared from this regression
o Form either the F or the LM statistic. F statistic = the test for overall significance of this regression. LM statistic = n times the R-squared from this regression, ~ χ²(2)

Generalized least squares (GLS) estimators:
- Weighted least squares (WLS) estimators are more efficient than OLS estimators if we know the form of the variance (as a function of explanatory variables): Var(u|x) = σ²h(x), where h(x) is some function of the explanatory variables that determines the heteroskedasticity. To get WLS estimates we divide the model by √hi.
- Feasible generalized least squares (FGLS) estimators – here we model the function h and use the data to estimate the unknown parameters in this model. This results in an estimate of each hi, denoted ĥi, and we weight the model by 1/ĥi (a numerical sketch follows at the end of this chapter's notes):
o Run the regression of y on x1, x2, …, xk and obtain the residuals û
o Create log(û²) by first squaring the OLS residuals and then taking the natural log
o Run the regression of log(û²) on x1, x2, …, xk and obtain the fitted values, ĝ
o Exponentiate the fitted values: ĥ = exp(ĝ)
o Estimate the model by WLS using weights 1/ĥ
The squared residual for observation i gets weighted by 1/ĥi. If instead we first transform all variables and run OLS, each variable gets multiplied by 1/√ĥi, including the intercept.
FGLS estimators are biased but consistent and asymptotically more efficient than OLS.
If OLS and WLS produce statistically significant estimates that differ in sign, or if the difference in the magnitudes of the estimates is practically large, we should be suspicious. Typically this indicates that one of the other Gauss-Markov assumptions is false. If MLR4 is not met, then OLS and WLS have different expected values and probability limits.
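A minimal sketch of the FGLS recipe above, on simulated data with an error variance that grows in x (the variance function exp(0.5x), the coefficients and the seed are all assumptions for illustration): estimate h(x) from log(û²), then run WLS by dividing every variable, including the intercept, by √ĥ.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1000
x = rng.uniform(1.0, 5.0, size=n)
y = 1.0 + 0.5 * x + rng.normal(0.0, np.sqrt(np.exp(0.5 * x)), size=n)  # heteroskedastic errors

X = np.column_stack([np.ones(n), x])
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
u_hat = y - X @ b_ols

# model log(u_hat^2) as a linear function of the regressors, then exponentiate the fit
d = np.linalg.lstsq(X, np.log(u_hat ** 2), rcond=None)[0]
h_hat = np.exp(X @ d)

# WLS: transform y and every column of X (incl. the intercept) by 1/sqrt(h_hat), then OLS
w = 1.0 / np.sqrt(h_hat)
b_fgls = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]
print(b_ols, b_fgls)                                  # both consistent; FGLS more efficient
```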
Chapter 9 – More on Specification and Data Issues

Endogenous explanatory variables: x is correlated with u. Exogenous explanatory variables: x is not correlated with u.

RESET test for functional form misspecification:
The test builds on the fact that if the model satisfies MLR4, then no nonlinear function of the independent variables (such as ŷ² and ŷ³) should be significant when added to the equation. In a RESET test, polynomials in the OLS fitted values are added to the equation – normally squared and cubed terms:
y = β0 + β1x1 + β2x2 + … + βkxk + δ1ŷ² + δ2ŷ³ + error
The null hypothesis is that the model is correctly specified. Thus RESET is the F statistic (F(2, n−k−3)) for testing H0: δ1 = δ2 = 0 in the above auxiliary equation. (A numerical sketch follows at the end of this chapter's notes.)

Test against nonnested alternatives:
Construct a comprehensive model that contains each model as a special case and then use F tests to test the restrictions that lead to each of the models. Problem – a clear winner need not emerge.

Using proxy variables for unobserved explanatory variables:
A proxy variable is something that is related to the unobserved variable that we would like to control for in our analysis.
Assumptions needed for proxy variables to provide consistent estimators:
Model: y = β0 + β1x1 + β2x2 + β3x3* + u, where x3* is unobserved. Proxy: x3
o The proxy should explain at least some of the variation in x3*. That is, in the equation x3* = δ0 + δ3x3 + v3, a t test of δ3 should be significant.
o The error u is uncorrelated with x1, x2 and x3*. In addition, u is uncorrelated with x3.
o The variation not explained in the above-mentioned equation (v3) must not be correlated with the other variables in the model (x1 and x2) or the proxy variable (x3).

Using lagged dependent variables as proxy variables:
We suspect one or more of the independent variables is correlated with an omitted variable, but we have no idea how to obtain a proxy for that omitted variable. Using a lagged dependent variable in a cross-sectional equation increases the data requirements, but it also provides a simple way to account for historical factors that cause current differences in the dependent variable and that are difficult to account for in other ways.

Properties of OLS under measurement error:
Measurement error in the dependent variable – the usual assumption is that the measurement error in y is statistically independent of each explanatory variable. If this is true, then the OLS estimators are unbiased and consistent.
Measurement error in an explanatory variable: x1* is not observed – instead we have a measure of it; call it x1.
The measurement error in the population is e1 = x1 − x1*.
We assume that u is uncorrelated with x1 and x1*, and that E(e1) = 0.
What happens when we simply replace x1* with x1? It depends on the assumptions we make about the measurement error:
- Cov(x1, e1) = 0 (which implies Cov(x1*, e1) ≠ 0). Here OLS estimation with x1 in place of x1* produces a consistent estimator of β1.
- Cov(x1*, e1) = 0 – the classical errors-in-variables (CEV) assumption (which implies Cov(x1, e1) ≠ 0). Here the variable we include in the model (x1) will be correlated with the error term (u − β1e1). Thus in the CEV case the OLS regression gives biased and inconsistent estimators.
We can determine the amount of inconsistency in the simple OLS model:
plim β̂1 = β1·[σ²x1* / (σ²x1* + σ²e1)]
plim β̂1 is always closer to zero than β1 when the CEV assumptions are met.
- If e1 is correlated with both x1* and x1, OLS is inconsistent.
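A minimal sketch of the RESET test described above, on simulated data (a linear data generating process is assumed, so the test should usually not reject): add ŷ² and ŷ³ to the equation and F-test H0: δ1 = δ2 = 0.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, k = 300, 2
x = rng.normal(size=(n, k))
y = 1.0 + 0.5 * x[:, 0] - 0.3 * x[:, 1] + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
y_fit = X @ b
ssr_r = np.sum((y - y_fit) ** 2)                      # restricted: original model

X_aux = np.column_stack([X, y_fit ** 2, y_fit ** 3])  # add polynomials in the fitted values
b_aux = np.linalg.lstsq(X_aux, y, rcond=None)[0]
ssr_ur = np.sum((y - X_aux @ b_aux) ** 2)

F = ((ssr_r - ssr_ur) / 2) / (ssr_ur / (n - k - 3))   # RESET ~ F(2, n-k-3) under H0
print(F, stats.f.sf(F, 2, n - k - 3))
```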
Chapter 13 – Pooling Cross Sections across Time: Simple Panel Data Methods

Pooling independent cross sections across time:
One reason for using independently pooled cross sections is to increase the sample size. By pooling random samples drawn from the same population but at different points in time, we get more precise estimators and test statistics with more power. Pooling is helpful in this regard only insofar as the relationship between the dependent variable and at least some of the independent variables remains constant over time.
Typically we allow the intercept to differ across time, at a minimum, by including dummy variables for the time periods (the earliest year is typically chosen as the base group).

Policy analysis with pooled cross sections – natural experiments:
To control for systematic differences between the control and treatment groups, we need two years of data, one before the policy change and one after the change. Thus our sample is usefully broken down into four groups: the control group before the change, the control group after the change, the treatment group before the change and the treatment group after the change.
Call C the control group and T the treatment group, letting dT equal unity for those in the treatment group and zero otherwise. Then, letting d2 denote a dummy variable for the second time period, the equation of interest is:
y = β0 + δ0·d2 + β1·dT + δ1·d2·dT + other factors
δ1 measures the effect of the policy. Without other factors in the regression, δ̂1 will be the difference-in-differences estimator (a numerical sketch follows below):
δ̂1 = (ȳ2,T − ȳ2,C) − (ȳ1,T − ȳ1,C)

Two-period panel data analysis:
In most applications, the main reason for collecting panel data is to allow for the unobserved effect, ai, to be correlated with the explanatory variables.
yit = β0 + δ0·d2t + β1·xit + ai + uit, t = 1, 2
t denotes the time period; d2t is a dummy variable that equals zero when t = 1 and one when t = 2 – it does not change across i, which is why it has no i subscript. ai captures all unobserved time-constant factors that affect y; it is called an unobserved effect or fixed effect. The error uit is often called the idiosyncratic or time-varying error, because it represents unobserved factors that change over time and affect y.
Because ai is constant over time, we can difference the data across the two years and thereby 'difference away' ai:
yi2 = (β0 + δ0) + β1xi2 + ai + ui2   (t = 2)
yi1 = β0 + β1xi1 + ai + ui1   (t = 1)
(yi2 − yi1) = δ0 + β1(xi2 − xi1) + (ui2 − ui1), or Δyi = δ0 + β1Δxi + Δui
This is called the first-differenced equation, and the estimators are called first-differenced (FD) estimators. (The intercept is the change in the intercept from t = 1 to t = 2.)
We can analyze the equation using the methods already developed, provided the key assumptions are satisfied – the most important being that Δui is uncorrelated with Δxi. Another crucial condition is that Δxi must have some variation across i.
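A minimal sketch of the difference-in-differences estimator above, on simulated data (the group assignment, the "policy effect" of 2.0 and the seed are made up): δ̂1 is computed both from the four group means and as the coefficient on d2·dT in the dummy regression, and the two agree.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 2000
d2 = rng.integers(0, 2, size=n)                     # time period dummy
dT = rng.integers(0, 2, size=n)                     # treatment group dummy
y = 1.0 + 0.5 * d2 + 0.3 * dT + 2.0 * d2 * dT + rng.normal(size=n)

# difference-in-differences from the four group means
did = (y[(d2 == 1) & (dT == 1)].mean() - y[(d2 == 1) & (dT == 0)].mean()) \
    - (y[(d2 == 0) & (dT == 1)].mean() - y[(d2 == 0) & (dT == 0)].mean())

# same number as the coefficient on d2*dT in the regression y on d2, dT and d2*dT
X = np.column_stack([np.ones(n), d2, dT, d2 * dT])
b = np.linalg.lstsq(X, y, rcond=None)[0]
print(did, b[3])                                    # both close to 2.0
```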
Assumptions for pooled OLS using first differences: FD1-6
FD1: For each i the model is yit = β1xit1 + … + βkxitk + ai + uit, t = 1, …, T
FD2: We have a random sample from the cross section
FD3: Each explanatory variable changes over time (for at least some i), and there are no perfect linear relationships among the explanatory variables
FD4: For each t, the expected value of the idiosyncratic error given the explanatory variables in all time periods and the unobserved effect is zero: E(uit|Xi, ai) = 0
Xi denotes the explanatory variables for all time periods for cross-sectional observation i; thus it contains xitj
Under FD1-4 the first-difference estimators are unbiased (FD4 is stronger than necessary – if E(Δuit|Xi) = 0, then the FD estimators are unbiased)
FD5: The variance of the differenced errors, conditional on all explanatory variables, is constant: Var(Δuit|Xi) = σ², t = 2, …, T
FD6: For all t ≠ s, the differences in the idiosyncratic errors are uncorrelated (conditional on all the explanatory variables): Cov(Δuit, Δuis|Xi) = 0, t ≠ s
FD5 ensures that the differenced errors are homoskedastic. FD6 states that the differenced errors are serially uncorrelated, which means that the uit follow a random walk across time (random walks are covered in Chapter 11).
Under FD1-6 the FD estimators of the βj are BLUE.

Chapter 15 – Instrumental Variables Estimation and Two Stage Least Squares

IV – only one endogenous variable and one instrument
2SLS – one or multiple endogenous variables and more than one instrument

Assumptions for the instrumental variable z for x:
1. z is uncorrelated with u: Cov(z, u) = 0
2. z is correlated with the endogenous variable x: Cov(z, x) ≠ 0
IV estimators are consistent when the assumptions are met, but never unbiased, which is why large samples are preferred.

The instrumental variables (IV) estimator of β1 (simple regression):
β̂1 = Σi (zi − z̄)(yi − ȳ) / Σi (zi − z̄)(xi − x̄)   (the sample analog of Cov(z, y)/Cov(z, x))

Homoskedasticity assumption (simple regression): needed for inference – now it is stated conditional on the instrumental variable z: E(u²|z) = σ²

Asymptotic variance of β̂1 (simple regression):
v̂ar(β̂1) = σ̂² / (SSTx·R²x,z)
The resulting standard error can be used for t tests, but F tests are not valid, since the R-squared from IV estimation can be negative because SSR for IV can actually be larger than SST.
The IV variance is always larger than the OLS variance (since R²x,z is always less than one, and this is the only thing that differs from the OLS formula – simple regression). The more highly correlated z is with x, the closer R²x,z is to one, and the smaller is the variance of the IV estimator. In the case that z = x, R²x,z = 1 and we get the OLS variance, as expected.

Weak correlation between z and x:
Weak correlation between z and x can have serious consequences: the IV estimator can have a large asymptotic bias even if z and u are only moderately correlated.
plim β̂1,IV = β1 + [Corr(z, u)/Corr(z, x)]·(σu/σx)
Thus, even if we focus only on consistency, it is not necessarily better to use IV than OLS if the correlation between z and x is smaller than that between z and u.
Nothing prevents the explanatory variable or the IV from being binary variables.

IV estimation of the multiple regression model:
A minor additional assumption is that there are no perfect linear relationships among the exogenous variables; this is analogous to the assumption of no perfect collinearity in the context of OLS.
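A minimal sketch of the simple IV estimator above, on simulated data where x is endogenous (correlated with u through an assumed data generating process) and z is a valid instrument; OLS is biased while IV recovers β1 ≈ 0.5.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 2000
z = rng.normal(size=n)                               # instrument: correlated with x, not with u
u = rng.normal(size=n)
x = 0.8 * z + 0.5 * u + rng.normal(size=n)           # endogenous regressor
y = 1.0 + 0.5 * x + u

beta1_ols = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta1_iv = np.sum((z - z.mean()) * (y - y.mean())) / np.sum((z - z.mean()) * (x - x.mean()))
print(beta1_ols, beta1_iv)                           # OLS biased upward here, IV close to 0.5
```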
Reduced form equation: we write an endogenous variable in terms of all exogenous variables.
- Used when testing the assumption Cov(z, x) ≠ 0

2SLS – a single endogenous variable and more than one instrument:
E.g. with three instruments: Cov(z1, u) = Cov(z2, u) = Cov(z3, u) = 0, and at least one of the instruments should be correlated with the endogenous variable.
Since each of z1, z2 and z3 is uncorrelated with u, any linear combination is also uncorrelated with u. To find the best IV, we choose the linear combination that is most highly correlated with the endogenous variable = the fitted values from regressing the endogenous variable on all exogenous variables in the model plus the instruments.

2SLS – multiple endogenous explanatory variables:
We need at least as many excluded exogenous variables (instruments) as there are included endogenous explanatory variables in the structural equation.

IV solution to errors-in-variables problems:
In the CEV case (x1 = x1* + e1 and Cov(x1, e1) ≠ 0), what we need is an IV for x1. Such an IV must be correlated with x1, uncorrelated with u and uncorrelated with the measurement error e1.
One possibility is to obtain a second measurement of x1*, call it z1 – x1 and z1 both mismeasure x1*, but their measurement errors are uncorrelated. Certainly x1 and z1 are correlated through their dependence on x1*, so we can use z1 as an IV for x1.
An alternative is to use other exogenous variables as IVs for a potentially mismeasured variable.

Testing for endogeneity of a single explanatory variable (a numerical sketch follows below):
o Estimate the reduced form of the endogenous variable y2 by regressing it on all exogenous variables (including those in the structural equation and the additional IVs). Obtain the residuals v̂2
o Add v̂2 to the structural equation (which includes y2) and test for significance of v̂2 using an OLS regression. If the coefficient on v̂2 is statistically different from zero, we conclude that y2 is endogenous (because v2 and u are correlated). We might want to use a heteroskedasticity-robust t test.

Testing overidentification restrictions:
Even in models with additional explanatory variables, the second requirement (Cov(z, x) ≠ 0) can be tested using a t test (with just one instrument) or an F test (when there are multiple instruments). In the context of the simple IV estimator we noted that the exogeneity requirement (Cov(z, u) = 0) cannot be tested. However, if we have more instruments than we need, we can effectively test whether some of them are uncorrelated with the structural error.
The idea is that, if all instruments are exogenous, the 2SLS residuals should be uncorrelated with the instruments, up to sampling error. The test is valid when the homoskedasticity assumption holds.
o Estimate the structural equation by 2SLS and obtain the residuals û1
o Regress û1 on all exogenous variables. Obtain the R-squared, R1²
o Under the null hypothesis that all IVs are uncorrelated with u1, n·R1² ~ χ²(q), where q is the number of instrumental variables from outside the model minus the total number of endogenous explanatory variables. If n·R1² exceeds the 5% critical value in the χ²(q) distribution, we reject H0 and conclude that at least some of the IVs are not exogenous.
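A minimal sketch of the regression-based test for endogeneity of y2 described above, on simulated data (the structural model, the instruments z1 and z2 and the built-in endogeneity are all assumptions for illustration): regress y2 on the exogenous variables and instruments, add the residual v̂2 to the structural equation, and inspect its t statistic.

```python
import numpy as np

rng = np.random.default_rng(10)
n = 1000
x1 = rng.normal(size=n)                              # exogenous regressor
z1, z2 = rng.normal(size=n), rng.normal(size=n)      # instruments
u = rng.normal(size=n)
y2 = 0.5 * x1 + 0.7 * z1 + 0.7 * z2 + 0.6 * u + rng.normal(size=n)   # endogenous by construction
y1 = 1.0 + 0.5 * y2 + 0.3 * x1 + u

def ols(y, X):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return b, y - X @ b

# reduced form of y2 on all exogenous variables (x1) plus the instruments
Z = np.column_stack([np.ones(n), x1, z1, z2])
_, v2_hat = ols(y2, Z)

# structural equation augmented with v2_hat
X = np.column_stack([np.ones(n), y2, x1, v2_hat])
b, u_hat = ols(y1, X)
sigma2 = (u_hat @ u_hat) / (n - X.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
print(b[3] / se[3])                                  # large |t| => evidence that y2 is endogenous
```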
Assumptions: 2SLS1-5
2SLS.1: Linear in parameters – the model in the population can be written as y = β0 + β1x1 + β2x2 + … + βkxk + u. The instrumental variables are denoted zj
2SLS.2: Random sample – we have a random sample on y, the xj and the zj
2SLS.3: i) There are no perfect linear relationships among the instrumental variables. ii) The rank condition for identification holds
2SLS.4: The error term u has zero mean, and each IV is uncorrelated with u (remember that any xj that is uncorrelated with u also acts as an IV)
Under assumptions 2SLS.1-4 the 2SLS estimator is consistent.
2SLS.5: Homoskedasticity – let z denote the collection of all instrumental variables. Then E(u²|z) = σ²
Under assumptions 2SLS.1-5 the 2SLS estimator is asymptotically efficient in the class of IV estimators that use linear combinations of the exogenous variables as instruments.

Instrumental variables estimation in matrix form:
The instrumental variables estimator in the just identified case:
β̂IV = (Z'X)⁻¹Z'y
The predicted values of the regressors used in 2SLS:
X̂ = Z(Z'Z)⁻¹Z'X = PZ·X, where PZ = Z(Z'Z)⁻¹Z'
- By this construction the predicted X's are exogenous and can be used in a second OLS step
The instrumental variables estimator in the overidentified case (2SLS):
β̂2SLS = (X̂'X)⁻¹X̂'y = (X'Z(Z'Z)⁻¹Z'X)⁻¹X'Z(Z'Z)⁻¹Z'y = (X'PZX)⁻¹X'PZy
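A minimal sketch of the matrix formulas above, on simulated data (the instruments, the endogeneity structure and the coefficients are assumptions for illustration): β̂IV = (Z'X)⁻¹Z'y when the single instrument z1 exactly identifies the model, and β̂2SLS = (X̂'X)⁻¹X̂'y with X̂ = PZ·X when both z1 and z2 are used.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 1000
z1, z2 = rng.normal(size=n), rng.normal(size=n)
u = rng.normal(size=n)
x1 = 0.8 * z1 + 0.4 * z2 + 0.5 * u + rng.normal(size=n)   # endogenous regressor
y = 1.0 + 0.5 * x1 + u

X = np.column_stack([np.ones(n), x1])

# just identified: the intercept plus z1 as instrument for x1
Z_just = np.column_stack([np.ones(n), z1])
beta_iv = np.linalg.solve(Z_just.T @ X, Z_just.T @ y)      # (Z'X)^(-1) Z'y

# overidentified: both z1 and z2; X_hat = P_Z X, then beta = (X_hat'X)^(-1) X_hat'y
Z = np.column_stack([np.ones(n), z1, z2])
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)              # projection of X on the instrument space
beta_2sls = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)
print(beta_iv, beta_2sls)                                  # both close to (1.0, 0.5)
```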