ECMT1020 Introduction to Econometrics
Semester 1, 2024
Formulas and statistical tables

Two random variables. Let $X$ and $Y$ be two random variables with expected values $\mu_X$ and $\mu_Y$, and variances $\sigma_X^2$ and $\sigma_Y^2$.

1. Covariance: $\sigma_{XY} := \operatorname{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E(XY) - E(X)E(Y)$.

2. Correlation coefficient: $\rho_{XY} := \operatorname{Corr}(X, Y) = \dfrac{\sigma_{XY}}{\sigma_X \sigma_Y}$.

Expected value, variance, and covariance rules. In the following, $b$ is any constant and $X, Y, V, W$ are any random variables.

1. $E(X + Y) = E(X) + E(Y)$.
2. $E(b) = b$.
3. $E(bX) = b\,E(X)$.
4. $\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X, Y)$.
5. $\operatorname{Var}(b) = 0$.
6. $\operatorname{Var}(bX) = b^2 \operatorname{Var}(X)$.
7. $\operatorname{Cov}(X, Y) = \operatorname{Cov}(Y, X)$.
8. $\operatorname{Cov}(X, V + W) = \operatorname{Cov}(X, V) + \operatorname{Cov}(X, W)$.
9. $\operatorname{Cov}(X, bY) = b\operatorname{Cov}(X, Y)$.
10. $\operatorname{Cov}(X, b) = 0$.

Estimators. Let $X$ and $Y$ be two random variables, let $\{X_1, \ldots, X_n\}$ be a sample of $X$, and let $\{Y_1, \ldots, Y_n\}$ be a sample of $Y$. Below are the commonly used sample estimators.

1. Sample mean: $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$.

2. Sample variance: $\hat{\sigma}_X^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$.

3. Sample covariance: $\hat{\sigma}_{XY} = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$.

4. Sample correlation: $\hat{\rho}_{XY} = \dfrac{\hat{\sigma}_{XY}}{\hat{\sigma}_X \hat{\sigma}_Y} = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2} \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$.

Hypothesis tests of a normal sample. Let $\{X_1, \ldots, X_n\}$ be a random sample of $X$, which follows a normal distribution with mean $\mu$ and variance $\sigma^2$. We would like to test $H_0: \mu = \mu_0$ for some $\mu_0$.

1. If $\sigma^2$ is known, we use a $z$ statistic,
$$z = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu_0}{\sigma_X / \sqrt{n}},$$
which follows the standard normal distribution under the null hypothesis.

2. If $\sigma^2$ is unknown (meaning it needs to be estimated), we use a $t$ statistic,
$$t = \frac{\bar{X} - \mu_0}{\hat{\sigma}_{\bar{X}}} = \frac{\bar{X} - \mu_0}{\hat{\sigma}_X / \sqrt{n}},$$
which follows the $t$ distribution with $n - 1$ degrees of freedom under the null hypothesis.

Simple regression analysis. Consider the simple regression model $Y = \beta_1 + \beta_2 X + u$, which satisfies the CLRM assumptions. We fit the regression by the OLS procedure using a random sample of $(X, Y)$ with $n$ observations.

1. OLS estimators:
$$\hat{\beta}_2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}, \qquad \hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}.$$

2. The variances of $\hat{\beta}_1$ and $\hat{\beta}_2$:
$$\sigma_{\hat{\beta}_1}^2 = \sigma_u^2 \left( \frac{1}{n} + \frac{\bar{X}^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \right), \qquad \sigma_{\hat{\beta}_2}^2 = \frac{\sigma_u^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}.$$

Multiple regression analysis. Consider a multiple regression model with $k - 1$ explanatory variables,
$$Y = \beta_1 + \beta_2 X_2 + \cdots + \beta_k X_k + u,$$
which satisfies the CLRM assumptions. Given a sample of $n$ observations, the fitted regression is $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X_2 + \cdots + \hat{\beta}_k X_k$. Note that simple regression is a special case of multiple regression with $k = 2$.

1. Goodness of fit:
$$R^2 = \frac{\text{ESS}}{\text{TSS}}, \qquad \text{adjusted } R^2 = 1 - \frac{\text{RSS}/(n-k)}{\text{TSS}/(n-1)}.$$

2. $t$ statistic for testing $H_0: \beta_j = \beta_j^0$:
$$t = \frac{\hat{\beta}_j - \beta_j^0}{\text{s.e.}(\hat{\beta}_j)} \sim t_{n-k}, \qquad j = 1, 2, \ldots, k,$$
where $t_{n-k}$ denotes the $t$ distribution with $n - k$ degrees of freedom.

3. $F$ statistic for testing the joint explanatory power of the regression model:
$$F(k-1, n-k) = \frac{\text{ESS}/(k-1)}{\text{RSS}/(n-k)} = \frac{R^2/(k-1)}{(1-R^2)/(n-k)} \sim F_{k-1,\,n-k},$$
where $F_{k-1,\,n-k}$ denotes the $F$ distribution with degrees of freedom $k - 1$ and $n - k$.

4. "Generalized" $F$ statistic for testing a general null hypothesis of linear restrictions on the parameters (restricted model against unrestricted model):
$$F(\text{extra DF}, \text{DF remaining}) = \frac{\text{improvement in fit} / \text{extra DF}}{\text{RSS remaining} / \text{DF remaining}}.$$

5. For a regression model with two explanatory variables, $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$, the standard error of the OLS estimator $\hat{\beta}_2$ is
$$\text{s.e.}(\hat{\beta}_2) = \sqrt{ \frac{\hat{\sigma}_u^2}{\sum_{i=1}^{n} (X_{2i} - \bar{X}_2)^2} \times \frac{1}{1 - \hat{\rho}_{X_2, X_3}^2} },$$
where $\hat{\sigma}_u^2 = \frac{1}{n-k} \sum_{i=1}^{n} \hat{u}_i^2$ is the unbiased estimator of $\sigma_u^2$, and $\hat{\rho}_{X_2, X_3}$ is the sample correlation between $X_2$ and $X_3$.
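The one-sample tests above are easy to check numerically. Below is a minimal Python sketch (the data vector and $\mu_0$ are made up for illustration; numpy and scipy are assumed available) that computes the $t$ statistic directly from the formula and cross-checks it against scipy's built-in routine.

```python
import numpy as np
from scipy import stats

# Hypothetical sample; in practice this would be your data.
x = np.array([4.2, 5.1, 4.8, 5.6, 4.9, 5.3, 4.7, 5.0])
mu0 = 5.0                          # value of mu under H0: mu = mu0
n = len(x)

xbar = x.mean()                    # sample mean, (1/n) * sum(X_i)
s2 = x.var(ddof=1)                 # sample variance with the 1/(n-1) correction
se = np.sqrt(s2 / n)               # estimated standard error of the sample mean

t_stat = (xbar - mu0) / se         # t = (Xbar - mu0) / (sigma_hat / sqrt(n))
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)   # two-sided p-value from t_{n-1}

print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
# Cross-check against scipy's built-in one-sample t test:
print(stats.ttest_1samp(x, mu0))
```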
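Likewise, the OLS formulas can be verified directly in the simple regression case. The following sketch simulates a data set (the parameter values, sample size, and seed are arbitrary illustration choices) and computes $\hat{\beta}_1$, $\hat{\beta}_2$, their standard errors, $R^2$, adjusted $R^2$, and the $t$ and $F$ statistics exactly as defined above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated data for illustration: Y = beta1 + beta2*X + u, made-up parameters.
n = 50
X = rng.uniform(0, 10, n)
u = rng.normal(0, 2, n)
Y = 1.0 + 0.5 * X + u
k = 2                                # number of parameters (intercept + slope)

Xbar, Ybar = X.mean(), Y.mean()
Sxx = np.sum((X - Xbar) ** 2)

# OLS estimators from the formulas above.
b2 = np.sum((X - Xbar) * (Y - Ybar)) / Sxx
b1 = Ybar - b2 * Xbar

# Residuals and the unbiased estimator of sigma_u^2.
u_hat = Y - b1 - b2 * X
RSS = np.sum(u_hat ** 2)
sigma2_hat = RSS / (n - k)

# Standard errors from the variance formulas, with sigma_u^2 replaced by its estimate.
se_b1 = np.sqrt(sigma2_hat * (1 / n + Xbar ** 2 / Sxx))
se_b2 = np.sqrt(sigma2_hat / Sxx)

# Goodness of fit and the F statistic for joint explanatory power.
TSS = np.sum((Y - Ybar) ** 2)
R2 = 1 - RSS / TSS                   # equals ESS/TSS since TSS = ESS + RSS
R2_adj = 1 - (RSS / (n - k)) / (TSS / (n - 1))
F = (R2 / (k - 1)) / ((1 - R2) / (n - k))

# t statistic for H0: beta2 = 0, with its two-sided p-value from t_{n-k}.
t_b2 = b2 / se_b2
p_b2 = 2 * stats.t.sf(abs(t_b2), df=n - k)

print(f"b1={b1:.3f} (se {se_b1:.3f}), b2={b2:.3f} (se {se_b2:.3f})")
print(f"R2={R2:.3f}, adj R2={R2_adj:.3f}, F={F:.3f}, t(b2)={t_b2:.3f}, p={p_b2:.4f}")
```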
Heteroskedasticity. Consider a linear regression with $k$ parameters and sample size $n$.

1. Test statistic of the Goldfeld-Quandt test:
$$F(n^* - k, n^* - k) = \frac{\text{RSS}_2}{\text{RSS}_1},$$
where $n^*$ is the size of the first and the last subsamples, and $\text{RSS}_1$ and $\text{RSS}_2$ are the residual sums of squares from the subregressions using the first and last subsamples, respectively.

2. White test statistic for heteroskedasticity:
$$W = nR^2 \sim \chi^2_{k-1},$$
where $R^2$ is taken from the auxiliary regression of the squared OLS residuals on the explanatory variables; the degrees of freedom equal the number of regressors in that auxiliary regression, here $k - 1$.
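Both tests can be illustrated numerically as well. The sketch below simulates a heteroskedastic data set (all parameter values, the subsample size $n^*$, and the sorting variable are arbitrary illustration choices); since the sheet does not spell out the White auxiliary regression, the sketch regresses the squared OLS residuals on the explanatory variable, which is one version consistent with the stated $\chi^2_{k-1}$ degrees of freedom.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated data with variance increasing in X (heteroskedastic by construction).
n = 60
X = np.sort(rng.uniform(1, 10, n))            # sorted by the suspect variable
Y = 2.0 + 0.8 * X + rng.normal(0, 0.5 * X)    # error sd proportional to X
k = 2                                         # parameters: intercept + slope

def ols_rss(x, y):
    """Fit y = b1 + b2*x by OLS; return the residual sum of squares and residuals."""
    b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b1 = y.mean() - b2 * x.mean()
    resid = y - b1 - b2 * x
    return np.sum(resid ** 2), resid

# --- Goldfeld-Quandt test: compare RSS of the first and last subsamples, ---
# --- omitting the middle observations.                                   ---
n_star = 22                                   # subsample size (illustrative choice)
rss1, _ = ols_rss(X[:n_star], Y[:n_star])
rss2, _ = ols_rss(X[-n_star:], Y[-n_star:])
F = rss2 / rss1                               # F(n* - k, n* - k) under H0
p_gq = stats.f.sf(F, n_star - k, n_star - k)
print(f"Goldfeld-Quandt: F = {F:.3f}, p = {p_gq:.4f}")

# --- White-type test: W = n * R^2 from the auxiliary regression. ---
_, resid = ols_rss(X, Y)
u2 = resid ** 2
# Auxiliary regression of squared residuals on the explanatory variable.
rss_aux, _ = ols_rss(X, u2)
tss_aux = np.sum((u2 - u2.mean()) ** 2)
R2_aux = 1 - rss_aux / tss_aux
W = n * R2_aux                                # ~ chi-squared with k - 1 df under H0
p_w = stats.chi2.sf(W, k - 1)
print(f"White test: W = {W:.3f}, p = {p_w:.4f}")
```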