Lecture 1
Mark Hendricks
University of Chicago
September 2012
Outline
OLS
OLS Inference
OLS Asymptotics
Linear regression model
Consider a linear regression model involving two variables, y and x:

y = β0 + βx + ε

- y is referred to as the regressand, or explained variable.
- x is referred to as the regressor, covariate, or explanatory variable.
- β0 and β are the (constant) parameters of the model.
Multivariate linear regression
In a multivariate regression model with k regressors,

y = β0 + β1 x1 + β2 x2 + … + βk xk + ε
  = β0 + Σ_{j=1}^k βj xj + ε
  = x′β + ε

- The last line defines x such that its first element is the constant 1, and the first element of β is β0.
- Including the regression constant in the vector notation will simplify the algebra, as we will always consider the case where the first regressor is a constant.
Data from the regression model
A sample of n observations is denoted as (yi, xi) for i = 1, 2, …, n:

yi = xi′β + εi

where

xi ≡ (1, x_{i,1}, x_{i,2}, …, x_{i,k})′,    β ≡ (β0, β1, β2, …, βk)′
Sample regression
Consider a sample estimate of β, denoted by b.

- Define ei = yi − xi′b. Thus,

  yi = xi′b + ei

- This is the sample counterpart to the population regression equation.
Ordinary least squares
The ordinary least squares estimator of β minimizes the sum of squared sample errors:

b ≡ arg min_{bo} Σ_{i=1}^n (ei)²
  = arg min_{bo} Σ_{i=1}^n (yi − xi′bo)²
OLS problem
Rewrite the OLS problem in matrix notation,

b ≡ arg min_{bo} e′e
  = arg min_{bo} (Y − X bo)′(Y − X bo)

where

X ≡ (x1′, x2′, …, xn′)′ (the n-row matrix stacking the xi′),    Y ≡ (y1, y2, …, yn)′,    e ≡ (e1, e2, …, en)′
Assumption: Full-rank
Assumption 1: X′X is full rank.

Equivalently, assume that there is no exact linear relationship among any of the regressors.

- Clearly, the existence of the OLS estimator requires that this assumption be satisfied.
- Multicollinearity refers to the case where this assumption fails.
OLS estimate
Solving the minimization problem above gives the OLS estimate:

b = (X′X)⁻¹ X′Y

- This estimate yields sample residuals of

  e = Y − X(X′X)⁻¹X′Y
    = (I − X(X′X)⁻¹X′) Y

- Thus e is orthogonal to X.
- Equivalently, the in-sample correlation between xi and ei is zero.
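As a concrete illustration, here is a minimal numpy sketch on hypothetical simulated data (not from the lecture) that computes b via the normal equations and checks the residual orthogonality:

```python
import numpy as np

# Hypothetical simulated sample: a constant plus k = 2 regressors.
rng = np.random.default_rng(0)
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # first column is the constant
beta = np.array([1.0, 0.5, -2.0])
Y = X @ beta + rng.normal(size=n)

# OLS estimate b = (X'X)^{-1} X'Y, computed via a linear solve rather than
# an explicit inverse for numerical stability.
b = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ b

print(b)                      # close to beta
print(np.abs(X.T @ e).max())  # X'e = 0 up to rounding error
```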
OLS as linear projection
Using the derivation of b and e, we can write the regression equation as a linear projection,

Y = PY + MY

where M and P are defined as

P ≡ X(X′X)⁻¹X′
M ≡ In − X(X′X)⁻¹X′

- P is an n × n matrix which projects data into the space spanned by the column vectors of X.
- M is an n × n matrix which projects data into the space orthogonal to X.
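Continuing the simulated example above, a quick numerical check of these projection properties (a sketch, not part of the lecture):

```python
# Projection matrices built from the simulated X above.
P = X @ np.linalg.solve(X.T @ X, X.T)   # X (X'X)^{-1} X'
M = np.eye(n) - P

assert np.allclose(P @ Y + M @ Y, Y)    # Y = PY + MY
assert np.allclose(P @ Y, X @ b)        # PY are the fitted values
assert np.allclose(M @ Y, e)            # MY are the residuals
assert np.allclose(P @ P, P)            # projections are idempotent
assert np.allclose(M @ X, 0)            # M annihilates the columns of X
```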
OLS as method of moments
Suppose the population correlation between x and ε is zero:

0 = E[xε]
0 = E[x(y − x′β)]

Thus,

β = (E[xx′])⁻¹ E[xy]

If we estimate β by simply replacing population moments with sample moments,

b = (X′X)⁻¹ X′Y

which is the same as the OLS estimator above.
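In code, the sample-moment version coincides exactly with the normal-equations estimate, since the 1/n factors cancel (continuing the snippet above):

```python
# Sample-moment version of the estimator: (1/n) sums replace expectations.
Sxx = (X.T @ X) / n      # sample analogue of E[xx']
Sxy = (X.T @ Y) / n      # sample analogue of E[xy]
b_mm = np.linalg.solve(Sxx, Sxy)

assert np.allclose(b_mm, b)   # identical to OLS: the 1/n factors cancel
```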
Assumption: Exogeneity
Assumption 2: ε is exogenous to the regressors, x:

E[ε | x] = 0

The exogeneity assumption,

- implies that ε is uncorrelated with x.
- implies that ε is uncorrelated with any function of x.
- does NOT imply that ε is independent of x.
Conditional expectation
It is generally true that given any variable y along with a vector of variables, x, we can write

y = E[y | x] + ε

where ε is simply the prediction error,

ε = y − E[y | x]

Under the exogeneity assumption,

E[y | x] = x′β
Projection and conditional expectation
- Using just the assumption of full-rank, linear projection still gives the best linear conditional forecast:

  b = arg min_{γ} E[(y − x′γ)²]

  That is, the linear projection x′b minimizes the mean-squared error.
- Contrast with assuming linearity and exogeneity. Then the linear projection equals the conditional expectation:

  E[y | x] = x′b
Unbiased statistics
An estimate is unbiased if its expectation equals the population
value.
- Suppose we have a variable x with population mean µx.
- Consider the sample mean estimator, x̄, for a sample of size n.
- Verify that x̄ is unbiased:

  E[x̄] = E[(1/n) Σ_{i=1}^n xi]
       = (1/n) Σ_{i=1}^n E[xi]
       = µx
Is OLS estimate unbiased?
Check if the OLS estimator is unbiased:

E[b] = E[(X′X)⁻¹ X′Y]
     = E[(X′X)⁻¹ X′(Xβ + ε)]
     = β + E[(X′X)⁻¹ X′ε]

But E[b] = β if, and only if,

E[(X′X)⁻¹ X′ε] = 0

which is guaranteed by the exogeneity assumption but not simply by orthogonality.
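A Monte Carlo sketch of this result under assumed exogeneity (errors drawn independently of the regressors, a hypothetical setup): averaging the estimate across many redrawn samples recovers β.

```python
# Monte Carlo: redraw (X, eps) with eps independent of X (hence exogenous)
# and average the OLS estimates across samples.
reps = 5000
draws = np.zeros((reps, 3))
for s in range(reps):
    Xs = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
    eps = rng.normal(size=50)                  # independent of Xs
    ys = Xs @ beta + eps
    draws[s] = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)

print(draws.mean(axis=0))   # approximately beta = [1.0, 0.5, -2.0]
```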
Variance of estimator
Continue with the example of the sample mean estimator, x̄, for sample size n.

- Suppose var[x] = σ².
- What is the variance of the sample mean estimator, x̄?

  var[x̄] = σ²/n
Variance of OLS estimator
The variance of the OLS estimator is

var[b | x] = (X′X)⁻¹ X′ΣX (X′X)⁻¹

where Σ = E[εε′ | x].

- So far, no assumptions have been made regarding the distribution of the residuals, ε, beyond their orthogonality.
- Inference regarding regression estimates is heavily influenced by the covariance of the residuals.
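The sandwich form is straightforward to compute. Σ is unobserved, so a plug-in estimate is needed; one common choice (White's heteroscedasticity-robust estimator diag(ei²), an assumption here rather than anything from the lecture) gives, continuing the simulated example:

```python
# Sandwich form of var[b|x] with a plug-in estimate of Sigma.
XtX_inv = np.linalg.inv(X.T @ X)
Sigma_hat = np.diag(e**2)                 # White's plug-in: diag(e_i^2)
V = XtX_inv @ X.T @ Sigma_hat @ X @ XtX_inv
print(np.sqrt(np.diag(V)))                # robust standard errors for b
```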
Heteroscedasticity and autocorrelation of residuals
- Heteroscedasticity refers to the case where the residuals have distinct variances but are uncorrelated,

  Σ = diag(σ1², σ2², …, σn²)

- Autocorrelation refers to the case where residuals are correlated. In this case, Σ is not diagonal.
Assumption: Homoscedastic and orthogonal residuals
Assumption 3: The residuals are uncorrelated across
observations, with identical variances,
Σ = E[εε′ | x] = σ² In
Gauss-Markov Theorem
With these assumptions, the OLS estimator, b, is the minimum
variance linear unbiased estimator of β.
- The assumption on Σ simplifies the variance of the OLS estimator,

  var[b | x] = (X′X)⁻¹ X′(σ² In)X (X′X)⁻¹
             = σ² (X′X)⁻¹

- Any other linear, unbiased estimator of β will have larger variance.
- This is known as the Gauss-Markov theorem. It depends on the above assumptions regarding linearity, exogeneity, full-rank, and residual covariance structure.
OLS Inference
The distribution of the OLS estimates is required in order to assess
statistical significance.
- Above, the mean and variance of b were derived without making any distributional assumptions.
Assumption: Normality of residuals
Assumption 4:
The residuals, ε, are normally distributed:

ε | x ∼ N(0, Σ)
Distribution of OLS estimator
Assumptions 1, 2, 3, 4 imply

b | x ∼ N(β, Ω)

where

Ω = σ² (X′X)⁻¹

Often, these 4 assumptions are referred to as the classical regression model.

- Note that many results were derived without any distributional assumptions.
OLS Z-test
Testing the significance of an element of b would simply be a z-test:

(bj − βj) / (σ ωjj) ∼ Z

where ωjj² is the (j, j) element of (X′X)⁻¹. Thus, σ² ωij² is the (i, j) element of Ω.

- However, this statistic depends on σ, which is unknown.
- Instead one must use the sample estimate of the variance,

  s² = e′e / (n − k − 1)
OLS t-test
The t-test assesses statistical significance of an element of b:

(bj − βj) / (s ωjj) ∼ t(n − k − 1)

which depends only on observable data as well as the hypothesized value, βj.
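For instance, continuing the snippets above, the t-statistics for the hypothesis βj = 0:

```python
# t-statistics for H0: beta_j = 0, using s^2 = e'e / (n - k - 1).
s2 = (e @ e) / (n - k - 1)
se = np.sqrt(s2 * np.diag(XtX_inv))   # s * omega_jj for each j
print(b / se)                         # compare to t(n - k - 1) critical values
```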
R-squared
The R-squared, or coefficient of determination, in a regression is defined as

R²_{y,x} = regression sum of squares / total sum of squares
         = 1 − error sum of squares / total sum of squares

Algebraically, this is

R²_{y,x} = b² Σ_{i=1}^n (xi − x̄)² / Σ_{i=1}^n (yi − ȳ)²
         = 1 − e′e / Σ_{i=1}^n (yi − ȳ)²
R-squared versus correlation
Intuitively, the R-squared is the square of the correlation between y
and the projection of y onto x.
R²_{y,x} = [corr(Y, PY)]²

In a univariate regression of y on x,

R²_{y,x} = [corr(y, x)]²
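A quick numerical confirmation of the two equivalent definitions, continuing the simulated example:

```python
# R-squared two ways: from the sums of squares, and as the squared
# correlation between Y and the fitted values PY = X b.
tss = np.sum((Y - Y.mean())**2)
r2 = 1 - (e @ e) / tss
r2_corr = np.corrcoef(Y, X @ b)[0, 1]**2
assert np.isclose(r2, r2_corr)
```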
Caveat: Regressing on a constant
The interpretation and formula for R-squared do not hold if there is no constant regressor.

- Recall that the first element of x is the constant 1, and the first column of X is a column of 1’s.
- Including a constant in the regression is equivalent to running a regression with demeaned data.
- Running a regression on just a constant regressor, and nothing else, would simply pick up the mean of the data.
- Thus, simply shifting the sample data will improve the fit as it lines up closer to the sample mean.

Note that throughout the derivations above, the constant regressor guarantees that the mean of e will be zero.
OLS F-test
An F-test will determine the joint significance of the linear regression:

[R²_{y,x} / (1 − R²_{y,x})] · [(n − k − 1) / (k − 2)] ∼ F(k − 2, n − k − 1)

Namely, this tests whether all coefficients are jointly equal to zero.

- Note the use of the R-squared stat.
- It is simple to generalize this to test whether b is jointly equal to a non-zero hypothesis vector, β*.
Problems with multicollinearity
If regressor j is highly correlated with the other regressors, then the variance of the coefficient estimate can be written as

var[bj | x] = (1 / (1 − R²_{xj, x[−j]})) · σ² / Σ_{i=1}^n (x_{i,j} − x̄j)²

where R²_{xj, x[−j]} denotes the R-squared from regressing xj on the remaining regressors.
Variance inflation factor
The term

1 / (1 − R²_{xj, x[−j]})

is known as the variance inflation factor.

- The VIF is closely related to the condition number of X′X.
- The condition number in linear algebra measures the sensitivity of inverting a matrix.
- It compares the largest and smallest eigenvalues.
- Most software packages will warn the user if the condition number (VIF) is too big.
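A sketch of the VIF computation via the auxiliary regression, continuing the simulated example:

```python
# VIF for regressor j: regress x_j on the remaining regressors (and the
# constant), then compute 1 / (1 - R^2) from that auxiliary regression.
def vif(X, j):
    xj = X[:, j]
    X_rest = np.delete(X, j, axis=1)
    g = np.linalg.solve(X_rest.T @ X_rest, X_rest.T @ xj)
    u = xj - X_rest @ g
    r2_aux = 1 - (u @ u) / np.sum((xj - xj.mean())**2)
    return 1 / (1 - r2_aux)

print([vif(X, j) for j in (1, 2)])   # near 1: the simulated regressors are independent
```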
Variation in regressors
The sample variance of xj reduces the variance of the OLS estimate bj.

- Again write

  var[bj | x] = (1 / (1 − R²_{xj, x[−j]})) · σ² / Σ_{i=1}^n (x_{i,j} − x̄j)²

- The denominator of the second term is the scaled sample variance.
- If there is not enough variation in the regressor data, then the OLS estimation cannot precisely estimate βj.
Non-normality
Applications often do not satisfy Assumption 4, upon which the
inference results relied.
- However, the asymptotic distribution of the OLS estimate is still normal under much weaker assumptions.
- In practice, inference often relies on having large data sets and appealing to the asymptotic results.
Assumption: Orthogonality of population residuals
Assumption 5: The population residuals are uncorrelated with the regressors:

E[xε] = 0

- This assumption is much weaker than Assumption 2.
- This is a restriction on the population variables, not the fitted estimates, which have zero correlation by construction.
Consistency
A sample statistic is consistent if it converges to the true
population value in probability.
- Suppose that Assumptions 1 and 5 hold.
- Then the OLS estimator, b, is consistent:

  plim b = β

- In practice, more attention is paid to having a consistent estimator than an unbiased estimator, due to the weaker assumption.
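A hypothetical simulation of consistency (with non-normal, here Student-t, errors that are still uncorrelated with x): the estimation error shrinks as the sample grows.

```python
# Consistency: the estimate converges to beta as the sample size grows.
for n_big in (100, 10_000, 1_000_000):
    Xs = np.column_stack([np.ones(n_big), rng.normal(size=(n_big, 2))])
    ys = Xs @ beta + rng.standard_t(df=5, size=n_big)
    bs = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)
    print(n_big, np.abs(bs - beta).max())   # shrinks toward zero
```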
Asymptotic distribution of OLS
Under Assumptions 1, 3, and 5, the OLS estimate is asymptotically normal,

b | x ∼asym N(β, Ω)

where

Ω = σ² (X′X)⁻¹
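A sketch illustrating this asymptotic result under an assumed skewed error distribution (centered exponential draws, chosen here for illustration): the standardized slope estimate behaves like a standard normal even though the errors are far from normal.

```python
# Asymptotic normality with skewed errors: standardize b_1 by its
# theoretical standard deviation sigma * omega_11 (sigma^2 = 1 here).
reps, n_s = 5000, 500
z = np.zeros(reps)
for s in range(reps):
    Xs = np.column_stack([np.ones(n_s), rng.normal(size=n_s)])
    eps = rng.exponential(size=n_s) - 1.0    # mean zero, variance one, skewed
    ys = Xs @ np.array([1.0, 0.5]) + eps
    bs = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)
    omega2 = np.linalg.inv(Xs.T @ Xs)[1, 1]
    z[s] = (bs[1] - 0.5) / np.sqrt(omega2)
print(z.mean(), z.std())   # approximately 0 and 1, as N(0, 1) predicts
```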
Heteroscedastic and autocorrelated inference
For many applications, particularly in time-series, Assumption 3 is
clearly false.
- For practical purposes, this is not a big problem for inference.
- Standard errors which correct for heteroscedasticity and autocorrelation are widely employed.
- We come back to these later.