Contents

1 Regression Topics
  1.1 Linear Transformation
  1.2 Binary Variable
  1.3 Relaxing the Assumptions
  1.4 Heteroskedasticity
  1.5 Non-Constant Term
  1.6 Serial Correlation
  1.7 Omitted Variables
  1.8 Multicollinearity and Non-Linear Effects

2 Other Econometric Models
  2.1 Binary dependent variable regression - LPM
  2.2 Binary dependent variable regression - Logistic
  2.3 Binary dependent variable regression - Probit Model
  2.4 Instrumental variable regression
  2.5 Panel data regression
  2.6 Causal inference and interaction effects

3 Time Series
  3.1 Lag Operator
  3.2 AR Model
  3.3 MA Model
  3.4 Out-of-sample Forecasting
  3.5 Non-stationary Time Series
  3.6 Martingale Series

4 Additional Propositions
  4.1 Omitted variables - redundancy
  4.2 Heteroskedasticity - HC standard errors
  4.3 Sample forecasting

*All rights reserved.
*With thanks to Cheng-Che Hsu, PhD; Shu-Sheng Chen, PhD; Shi-Hsun Hsu, PhD; and Chung-Min Kuan, PhD.
1 Regression Topics

1.1 Linear Transformation

Basic linear transformation: let Xi* = aXi + b and Yi* = cYi + d. Then

  β̂* = (c/a) β̂ ,  α̂* = cα̂ + d − (bc/a) β̂

proof:
X̄* = aX̄ + b , Ȳ* = cȲ + d

  β̂* = S_{X*Y*} / S²_{X*} = (ac S_XY) / (a² S²_X) = (c/a) β̂
  α̂* = Ȳ* − β̂* X̄* = cȲ + d − (c/a) β̂ (aX̄ + b) = cα̂ + d − (bc/a) β̂

• Σ (ûi*)² = c² Σ (ûi)² , i.e. SSE* = c² × SSE

Demean: let Yi* = Yi − Ȳ and Xi* = Xi − X̄. Then α̂* = 0 , β̂* = β̂.

proof:
Yi* + Ȳ = α̂ + β̂(Xi* + X̄) ⇒ Yi* = (α̂ + β̂X̄ − Ȳ) + β̂ Xi* = β̂ Xi* , since α̂ + β̂X̄ = Ȳ.

Standardize: let Yi* = (Yi − Ȳ)/S_Y and Xi* = (Xi − X̄)/S_X. Then α̂* = 0 , β̂* = r_XY.

proof:
Yi* S_Y + Ȳ = α̂ + β̂(Xi* S_X + X̄) ⇒ Yi* S_Y = (α̂ + β̂X̄ − Ȳ) + β̂ Xi* S_X
⇒ Yi* = β̂ (S_X/S_Y) Xi* = r_XY Xi* , since α̂ + β̂X̄ = Ȳ and β̂ S_X/S_Y = r_XY.

• A linear transformation does not change the correlation coefficient:
  R*² = R² , r²_XY = r²_{X*Y*}

1.2 Binary Variable

Estimation by analogy (easy to use):

proof:
Y = α + βX + u
α = E(Y | X = 0) , α + β = E(Y | X = 1)
β = E(Y | X = 1) − E(Y | X = 0)
E(Y | X = 0) is estimated by Ȳ0 , E(Y | X = 1) by Ȳ1

Equivalence with the two-sample t test: write σ̂² ≡ Sp² (the pooled variance). With a binary regressor, Σ (Xi − X̄)² = n0 n1 / n , so

  SE(β̂) = sqrt( σ̂² / Σ (Xi − X̄)² ) = sqrt( Sp² (1/n0 + 1/n1) )

  φ* = (β̂ − 0) / SE(β̂) = [ Ȳ1 − Ȳ0 − (µ1 − µ0) ] / sqrt( Sp² (1/n0 + 1/n1) ) , with µ1 − µ0 = 0 under H0,

which is exactly the pooled two-sample t statistic for testing µ1 = µ0.

1.3 Relaxing the Assumptions

• Relax two assumptions of the Gauss-Markov theorem:
  1. X is a random variable (not fixed in repeated samples).
  2. ui need not follow a normal distribution.

• Three objects to study: E(β̂ | X) , Var(β̂ | X) , plim β̂.

  E(β̂ | X) = β = E(β̂) ,  Var(β̂ | X) = σ² / Σ (Xi − X̄)²

proof:
β̂ = β + Σ di ui , where di = (Xi − X̄) / Σ (Xj − X̄)² , so that
  Σ di = 0 , Σ di Xi = 1 , Σ di² = 1 / Σ (Xi − X̄)²

  E(β̂ | X) = β + Σ di E(ui | X) = β
  Var(β̂ | X) = Σ di² Var(ui | X) = σ² Σ di² = σ² / Σ (Xi − X̄)²

  plim β̂ = β + plim [ Σ (Xi − X̄) ui / n ] / [ Σ (Xi − X̄)² / n ] = β + Cov(X, u) / Var(X) = β  (when Cov(X, u) = 0)

• When E(u | X) = 0 , the OLS estimator is unbiased and consistent.
• When E(u | X) = k (a constant), Cov(X, u) = Cov(X, E(u | X)) = Cov(X, k) = 0 , so β̂ is still consistent.

1.4 Heteroskedasticity

Definition: the conditional variance of the error term is a function of X:

  Var(ui | Xi) = σ² h(Xi)

• The OLS estimator is no longer BLUE.
• The usual t tests and F tests are invalid.
• Although the OLS estimator is not BLUE, it is still unbiased and consistent.

Testing for heteroskedasticity - Breusch-Pagan test: estimate

  ûi² = δ0 + δ1 X1i + δ2 X2i + ... + δk Xki + vi

1. H0 : Var(u | X) = σ²  against  H1 : Var(u | X) ≠ σ²
2. φ = nR² , asymptotically χ²(k)
3. Given α , RR = { φ | φ > χ²_α(k) }
4. When φ* > χ²_α(k) , reject H0 and conclude that Var(u | X) ≠ σ².

HC standard error:

  SE(β̂)_HC = sqrt( [n/(n−2)] Σ (Xi − X̄)² ûi² / [ Σ (Xi − X̄)² ]² )

• Usually SE(β̂)_HC > SE(β̂) , so with HC standard errors the test statistic is usually smaller, and significance and power are weaker.
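As a quick numerical check of the expressions above, here is a minimal Python sketch (the simulated data and all variable names are illustrative assumptions, not part of the notes): it computes β̂ by OLS, then the conventional and HC standard errors from the two formulas in Section 1.4, and typically finds SE_HC > SE_OLS when the error variance grows with X.

```python
import numpy as np

rng = np.random.default_rng(0)            # illustrative simulated data
n = 200
X = rng.uniform(0, 10, n)
u = rng.normal(0, 1 + 0.3 * X)            # error variance grows with X -> heteroskedasticity
Y = 1.0 + 2.0 * X + u

Xc = X - X.mean()
beta_hat = (Xc * (Y - Y.mean())).sum() / (Xc ** 2).sum()
alpha_hat = Y.mean() - beta_hat * X.mean()
u_hat = Y - alpha_hat - beta_hat * X

# Conventional OLS standard error: sqrt( sigma^2 / sum (Xi - Xbar)^2 )
sigma2 = (u_hat ** 2).sum() / (n - 2)
se_ols = np.sqrt(sigma2 / (Xc ** 2).sum())

# Heteroskedasticity-consistent standard error from Section 1.4
se_hc = np.sqrt((n / (n - 2)) * (Xc ** 2 * u_hat ** 2).sum() / ((Xc ** 2).sum() ** 2))

print(se_ols, se_hc)                      # se_hc is typically the larger of the two here
```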
WLS - Weighted Least Squares

Regression model: Yi = α + βXi + ui , Var(ui | Xi) = h(Xi) σ². Dividing through by sqrt(h(Xi)):

  Yi / sqrt(h(Xi)) = α / sqrt(h(Xi)) + β Xi / sqrt(h(Xi)) + ui / sqrt(h(Xi))

The transformed error ui / sqrt(h(Xi)) has constant conditional variance σ², so OLS on the transformed model is efficient.

1.5 Non-Constant Term

Regression model with no intercept: Yi = βXi + ui

  β̂ = Σ Xi Yi / Σ Xi² = Σ di Yi , with di = Xi / Σ Xj²
     = Σ di (βXi + ui) = β Σ di Xi + Σ di ui = β + Σ di ui

  E(β̂) = β + Σ di E(ui) = β ... unbiased

Note: Σ di = Σ Xi / Σ Xi² ≠ 0 in general.

• This is the so-called "regression with zero intercept"; the fitted line must pass through (0, 0).
• Characteristic: SST ≠ SSR + SSE in general, because Σ ûi ≠ 0.

1.6 Serial Correlation

Cause and effects: Yt = α + βXt + ut , Cov(ut, ut−s) ≠ 0

• The OLS estimator is no longer BLUE, that is, the OLS estimator is inefficient.
• t-tests and F-tests are invalid.
• The OLS estimator is still unbiased and consistent.

Durbin-Watson test: tests whether the error terms follow an AR(1) process

  AR(1) : ut = ρ ut−1 + εt

Procedure:
1. H0 : ρ = 0  against  H1 : ρ ≠ 0
2. φ_d = Σ_{t=2}^T (ût − ût−1)² / Σ_{t=1}^T ût² ≃ 2(1 − ρ̂)
3. RR = { φ_d | φ_d < dL or φ_d > 4 − dL } , 0 < dL < dU < 2
4. When φ_d < dL or φ_d > 4 − dL ⇒ reject H0
   When dU < φ_d < 4 − dU ⇒ do not reject H0
   When dL < φ_d < dU or 4 − dU < φ_d < 4 − dL ⇒ inconclusive

Disadvantages and limitations:
• The Durbin-Watson test can only detect AR(1) errors.
• The test has an inconclusive (undefined) region.
• The regressors cannot include lagged dependent variables Yt−p , p ≥ 1.

1.7 Omitted Variables

Characteristics:
• Omitting Z matters when Cov(X, Z) ≠ 0 and γ ≠ 0.
• If the model omits such a variable, the error term becomes endogenous (correlated with X).

Model: Y = α + βX + γZ + u

  SLR : Ŷ = α̂ + β̂X ... total effect
  MLR : Ỹ = α̃ + β̃X + γ̃Z ... direct effect
  Auxiliary regression : Ẑ = θ̂ + δ̂X ... indirect effect

  β̂ = β̃ + δ̂ γ̃

According to the WLLN and CMT:

  β̂ = [ Σ (Xi − X̄)(Yi − Ȳ)/n ] / [ Σ (Xi − X̄)²/n ]
     →p Cov(X, Y)/Var(X) = Cov(X, α + βX + γZ + u)/Var(X) = β + γ Cov(X, Z)/Var(X)

• γ Cov(X, Z) > 0 → β̂ is biased upward
• γ Cov(X, Z) = 0 → β̂ is unbiased
• γ Cov(X, Z) < 0 → β̂ is biased downward

Bias:

  β̂ = Σ (Xi − X̄)(Yi − Ȳ) / Σ (Xi − X̄)² = Σ di Yi = Σ di (α + βXi + γZi + ui)
     = β + γ Σ di Zi + Σ di ui

  Σ di Zi = Σ (Xi − X̄)(Zi − Z̄) / Σ (Xi − X̄)² → given Xi and Zi , this is a constant

  E(β̂ | X, Z) = E( β + γ Σ di Zi + Σ di ui | X, Z ) = β + γ S_XZ / S²_X

1.8 Multicollinearity and Non-Linear Effects

How to detect severe multicollinearity?
• Other coefficient estimates change substantially when an explanatory variable is added or dropped.
• The F-test is significant but the individual t-tests are not.
• VIF_j = 1 / (1 − R²_j) , where R²_j is the R² from regressing Xj on the other regressors; when VIF_j > 10 , severe multicollinearity exists in the model (a small computation is sketched below).
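A minimal sketch of the VIF rule, assuming simulated data (all names and values are illustrative, not from the notes): each VIF_j is computed exactly as 1/(1 − R²_j) from the auxiliary regression of column j on the remaining columns.

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R2_j), where R2_j comes from regressing column j
    of X on the other columns (with an intercept)."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ coef
        r2 = 1.0 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Illustrative data: x2 is nearly a linear function of x1, so both get large VIFs.
rng = np.random.default_rng(1)
x1 = rng.normal(size=300)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=300)
x3 = rng.normal(size=300)
print(vif(np.column_stack([x1, x2, x3])))   # first two entries well above 10
```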
Linear effect & linearity in parameters

• Linear effect:
  ∂E(Y | X)/∂X = constant
• Linear in parameters:
  ∂E(Y | X)/∂β is not a function of β

Polynomial regression models

• Quadratic regression model:
  Y = β0 + β1 X + β2 X² + u ⇒ ∂E(Y | X)/∂X = β1 + 2β2 X (not constant)
• Reciprocal regression model:
  Y = α + β (1/X) + u ⇒ ∂E(Y | X)/∂X = −β / X² (not constant)
• Other models:
  Y = α X^β e^u ⇒ ln Y = ln α + β ln X + u ⇒ ∂E(ln Y | ln X)/∂ ln X = β (constant)
  Y = c − e^{−(α + βX)} + u ⇒ not linear in parameters, so it cannot be estimated by OLS

Logarithmic regression models

• Log-Linear model: ln Y = α1 + β1 X + u → when X changes by 1 unit, ΔY/Y changes by β1 (about 100β1 %).
• Linear-Log model: Y = α2 + β2 ln X + u → when X changes by 100%, Y changes by β2 units.
• Log-Log model: ln Y = α3 + β3 ln X + u → when X changes by 100%, Y changes by 100β3 %; β3 is the elasticity:

  β3 = (ΔY/Y) / (ΔX/X) = (ΔY/ΔX)(X/Y) = E_YX

2 Other Econometric Models

2.1 Binary dependent variable regression - LPM

Introduction:

  p = E(Yi | Xi) = P(Yi = 1 | Xi) = G(Xi β)

Y is a binary variable: Y ~ Bernoulli(p) , P(Y = 1) = p = E(Y) , Var(Y) = p(1 − p).
Let Yi | Xi ~ Bernoulli[G(α + βXi)] , so that

  E(Yi | Xi) = G(α + βXi) , Var(Yi | Xi) = G(α + βXi)[1 − G(α + βXi)]

• Heteroskedasticity is built in, so this regression is not efficient (not BLUE), but it is still unbiased and consistent.

Linear Probability Model (LPM):

  P(Yi = 1 | Xi) = G(α + βXi) = α + βXi

• Heteroskedasticity exists in this model, so we should use SE_HC for testing.

Disadvantages of the LPM:
• Assuming a linear effect on a probability is not appropriate.
• Under heteroskedasticity the OLS coefficients are not efficient and the usual test statistics do not have the t distribution.
  ◦ To increase the efficiency of the estimator, we use WLS or GLS:

    Yi = α + βXi + ui , ui = Yi − E(Yi | Xi) = Yi − α − βXi
    Var(ui | Xi) = (α + βXi)[1 − (α + βXi)] = h(Xi)

    Let vi = ui / sqrt(h(Xi)) → Yi / sqrt(h(Xi)) = α / sqrt(h(Xi)) + β Xi / sqrt(h(Xi)) + vi

    Target function = Σ [ Yi − (α̂ + β̂Xi) ]² / { (α̂ + β̂Xi)[1 − (α̂ + β̂Xi)] }

• The fitted probability may be > 1 or < 0.

Maximum Likelihood method:

  w ~ Bernoulli(p) , f(w) = p^w (1 − p)^{1−w} , w = 0, 1

yi is binary with P(yi = 1 | xi) = G(xi β) , so

  f(yi | xi) = G(xi β)^{yi} [1 − G(xi β)]^{1−yi} , yi = 0, 1

  L(β) = Π f(yi | xi) = Π G(xi β)^{yi} [1 − G(xi β)]^{1−yi}

  ln L(β) = Σ { yi ln G(xi β) + (1 − yi) ln[1 − G(xi β)] }

2.2 Binary dependent variable regression - Logistic

  P(Yi = 1 | Xi) = G(α + βXi) = 1 / (1 + e^{−(α + βXi)})

• The LPM fitted value α + βXi ranges over (−∞, ∞) while P(Yi = 1 | Xi) ∈ [0, 1], so we transform the probability so that its range is (−∞, ∞):

  p ∈ [0, 1] ⇒ p/(1 − p) ∈ [0, ∞) ⇒ ln[ p/(1 − p) ] ∈ (−∞, ∞)
  ⇒ set ln[ p/(1 − p) ] = α + βXi ⇒ p = 1 / (1 + e^{−(α + βXi)})

• p/(1 − p) is called the odds.
• Maximum likelihood with the logistic link: the derivative of the link is

  g(z) = ∂G(z)/∂z = e^{−z} / (1 + e^{−z})² = [ 1/(1 + e^{−z}) ][ 1 − 1/(1 + e^{−z}) ] = G(z)[1 − G(z)]
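To make the maximum-likelihood step concrete, here is a minimal Newton-Raphson sketch in Python that maximizes the Bernoulli log-likelihood from Section 2.1 with the logistic G(z); the simulated data, variable names, and parameter values are illustrative assumptions, not part of the notes.

```python
import numpy as np

def logit_mle(X, y, iters=25):
    """Newton-Raphson maximization of
    ln L(b) = sum_i [ y_i ln G(x_i b) + (1 - y_i) ln(1 - G(x_i b)) ]
    with the logistic link G(z) = 1 / (1 + exp(-z))."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ b))              # G(x_i b)
        W = p * (1.0 - p)                             # g(z) = G(z)[1 - G(z)]
        b = b + np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
    return b

# Illustrative data (names and true parameter values are made up for the example)
rng = np.random.default_rng(2)
x = rng.normal(size=500)
X = np.column_stack([np.ones(500), x])                # intercept and one regressor
p_true = 1.0 / (1.0 + np.exp(-(-0.5 + 1.2 * x)))
y = rng.binomial(1, p_true)
print(logit_mle(X, y))                                # roughly (-0.5, 1.2)
```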
2.3 Binary dependent variable regression - Probit Model

  P(Yi = 1 | Xi) = Φ(α + βXi)

• The probit model assumes there is a latent variable

  Yi* | Xi = α + βXi + ui , with ui ~ N(0, 1) ⇒ Yi* | Xi ~ N(α + βXi , 1)

Yi* cannot be observed, but we can observe the binary variable Yi:

  Yi = 1 if Yi* > 0 ,  Yi = 0 if Yi* ≤ 0

  P(Yi = 1 | Xi) = P(Yi* > 0 | Xi) = P[ Yi* − (α + βXi) > −(α + βXi) ]
                 = P[ Z > −(α + βXi) ] = P( Z < α + βXi ) = Φ(α + βXi)

Because Φ(α + βXi) ∈ (0, 1) , we take G(α + βXi) = Φ(α + βXi) = p. The marginal effect is

  ∂p/∂x = (∂p/∂z)(∂z/∂x) = g(z) × β = β × ϕ(α + βx)

• Goodness of fit: McFadden's pseudo-R² = 1 − ln L̂_Full / ln L̂_Intercept , where
  ln L̂_Full is the log-likelihood of the full model and ln L̂_Intercept is the log-likelihood of the intercept-only model.

2.4 Instrumental variable regression

• Endogeneity: in empirical research we cannot accept a biased and inconsistent estimator, but inefficiency is acceptable.

  ◦ When Cov(X, u) ≠ 0 :

    plim β̂ = Cov(X, Y)/Var(X) = Cov(X, α + βX + u)/Var(X)
            = [ β Var(X) + Cov(X, u) ] / Var(X) = β + Cov(X, u)/Var(X) ≠ β ... inconsistent

  ◦ Sources of endogeneity:
    1. Omitted variables
    2. Measurement error
    3. Simultaneity

• Omitted variables:

  Y = α + βX + γZ + u , Cov(X, u) = Cov(Z, u) = 0. Let ε = γZ + u. Then
  Cov(X, ε) = γ Cov(X, Z) + Cov(X, u) = γ Cov(X, Z) ≠ 0 when Cov(X, Z) ≠ 0.

• Measurement error in X:

  X* = observed variable, X = true variable (cannot be observed). Let X* = X + v. Then
  Y = α + βX + u = α + β(X* − v) + u = α + βX* + (u − βv). Let ε = u − βv.
  Cov(X*, ε) = Cov(X + v, u − βv) = Cov(X, u) − β Cov(X, v) + Cov(v, u) − β Var(v) = −β Var(v) ≠ 0
  (under the classical assumptions Cov(X, u) = Cov(X, v) = Cov(u, v) = 0).

• Measurement error in Y:

  Y* = observed variable, Y = true variable. Let Y* = Y + v. Then
  Y* = α + βX + (u + v) = α + βX + ε
  Cov(X, ε) = Cov(X, u + v) = 0 ... OLS estimators are still consistent and unbiased.
  Var(ε) = Var(u) + Var(v) > Var(u) ... SE(β̂) will be larger.

• If X̃ is the best predictor of X and v = X − X̃ with Cov(X̃, v) = 0 , the OLS estimator is still consistent and unbiased.

• Two-stage estimation (2SLS): regress X on the instrument in the first stage and use the fitted values in the second stage (a minimal numerical sketch is given at the end of these notes).

2.5 Panel data regression

2.6 Causal inference and interaction effects

3 Time Series

3.1 Lag Operator

3.2 AR Model

3.3 MA Model

3.4 Out-of-sample Forecasting

3.5 Non-stationary Time Series

3.6 Martingale Series

4 Additional Propositions

4.1 Omitted variables - redundancy

4.2 Heteroskedasticity - HC standard errors

4.3 Sample forecasting
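Sketch referenced from Section 2.4: a minimal two-stage least squares example in Python. The simulated data, variable names, and parameter values are illustrative assumptions; the only point is that the second-stage slope recovers β while plain OLS does not when Cov(X, u) ≠ 0.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
z = rng.normal(size=n)                        # instrument: related to x, unrelated to u
u = rng.normal(size=n)
x = 0.8 * z + 0.6 * u + rng.normal(size=n)    # Cov(x, u) != 0 -> x is endogenous
y = 1.0 + 2.0 * x + u                         # true beta = 2

def slope(a, b):
    """OLS slope of b on a (with intercept), i.e. Cov(a, b) / Var(a)."""
    return np.cov(a, b)[0, 1] / np.var(a, ddof=1)

beta_ols = slope(x, y)                        # inconsistent: beta + Cov(x, u)/Var(x)

# Stage 1: regress x on z and form fitted values x_hat
x_hat = x.mean() + slope(z, x) * (z - z.mean())
# Stage 2: regress y on x_hat
beta_2sls = slope(x_hat, y)                   # consistent for beta

print(beta_ols, beta_2sls)                    # roughly 2.3 vs 2.0
```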