HS499: Introduction to Applied Econometrics
HANDOUT - 4 [REGRESSION ANALYSIS - I]
Dr. Bhavesh Garg

Contents

1 Two Variable Regression Model
1.1 The Method of Ordinary Least Squares
1.2 Derivation of the Least Squares Estimates
1.3 CLRM: Assumptions Underlying the Method of Least Squares

1 Two Variable Regression Model

1.1 The Method of Ordinary Least Squares

Recalling the PRF,

Yi = β1 + β2 Xi + ui    (1)

and the SRF,

Yi = β̂1 + β̂2 Xi + ûi    (2)

Yi = Ŷi + ûi    (3)

where Ŷi is the estimated value of Yi.

Rewriting equation (3),

ûi = Yi − Ŷi = Yi − β̂1 − β̂2 Xi    (4)

which implies that the ûi (the residuals) are the differences between the actual values Yi and the estimated values Ŷi.

For a given set of observations on Y and X, we need to determine the SRF such that it is as close as possible to the actual Y. In other words, we would like to choose the SRF so that the sum of the residuals, Σ ûi = Σ (Yi − Ŷi), is as small as possible.

However, if we adopt the criterion of minimising Σ ûi, then all the residuals in Figure 1 receive equal weight. The problem with this is that even widely scattered residuals, such as û1 = 20, û2 = −5, û3 = 5 and û4 = −20, add up to zero, even though û1 and û4 lie far from the SRF. To overcome this limitation, we employ the least squares criterion, under which the SRF is chosen so that

Σ ûi² = Σ (Yi − Ŷi)² = Σ (Yi − β̂1 − β̂2 Xi)²    (5)

is as small as possible. Squaring the ûi gives more weight to û1 and û4 and prevents large residuals of opposite sign from cancelling each other algebraically.

[Figure 1: Plot of the deviations of the residuals from the SRF]

1.2 Derivation of the Least Squares Estimates

From equation (5) we know that

Σ ûi² = Σ (Yi − β̂1 − β̂2 Xi)²    (6)

Now differentiating the above equation partially w.r.t. β̂1 and β̂2 and setting each derivative to zero,

∂(Σ ûi²)/∂β̂1 = 0  ⇒  −2 Σ (Yi − β̂1 − β̂2 Xi) = 0    (7)

∂(Σ ûi²)/∂β̂2 = 0  ⇒  −2 Σ (Yi − β̂1 − β̂2 Xi) Xi = 0    (8)

We also know that X̄ = Σ Xi / n, therefore Σ Xi = n X̄; similarly, Σ Yi = n Ȳ.

Furthermore,

Σ (Xi − X̄)² = Σ (Xi² + X̄² − 2 Xi X̄)
            = Σ Xi² + n X̄² − 2 X̄ Σ Xi
            = Σ Xi² + n X̄² − 2 n X̄²
            = Σ Xi² − n X̄²

Taking equation (7),

−2 Σ (Yi − β̂1 − β̂2 Xi) = 0
⇒ Σ Yi − n β̂1 − β̂2 Σ Xi = 0
⇒ n Ȳ − n β̂1 − β̂2 n X̄ = 0
⇒ Ȳ − β̂1 − β̂2 X̄ = 0
⇒ β̂1 = Ȳ − β̂2 X̄    (9)

Taking equation (8),

Σ (Yi − β̂1 − β̂2 Xi) Xi = 0
⇒ Σ Xi Yi − β̂1 Σ Xi − β̂2 Σ Xi² = 0
⇒ Σ Xi Yi − (Ȳ − β̂2 X̄) n X̄ − β̂2 Σ Xi² = 0
⇒ Σ Xi Yi − n X̄ Ȳ + n β̂2 X̄² − β̂2 Σ Xi² = 0
⇒ Σ Xi Yi − n X̄ Ȳ = β̂2 (Σ Xi² − n X̄²)
⇒ β̂2 = (Σ Xi Yi − n X̄ Ȳ) / (Σ Xi² − n X̄²)
⇒ β̂2 = Σ (Xi − X̄)(Yi − Ȳ) / Σ (Xi − X̄)²

or, writing xi = Xi − X̄ and yi = Yi − Ȳ in deviation form,

β̂2 = Σ xi yi / Σ xi²    (10)
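To make equations (9) and (10) concrete, here is a minimal Python sketch that computes β̂1 and β̂2 directly from the deviation-form formulas. The data set is hypothetical, chosen purely for illustration.

import numpy as np

# Hypothetical sample of n = 5 observations (for illustration only).
X = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
Y = np.array([15.0, 24.0, 30.0, 45.0, 52.0])

X_bar, Y_bar = X.mean(), Y.mean()

# Deviation form: xi = Xi - X_bar, yi = Yi - Y_bar.
x = X - X_bar
y = Y - Y_bar

# Equation (10): beta2_hat = sum(xi * yi) / sum(xi ** 2).
beta2_hat = np.sum(x * y) / np.sum(x ** 2)

# Equation (9): beta1_hat = Y_bar - beta2_hat * X_bar.
beta1_hat = Y_bar - beta2_hat * X_bar

# Fitted values and residuals, as in equations (3) and (4).
Y_hat = beta1_hat + beta2_hat * X
u_hat = Y - Y_hat

print("beta1_hat:", beta1_hat)
print("beta2_hat:", beta2_hat)
print("SSR:", np.sum(u_hat ** 2))  # the quantity minimised in equation (5)

By construction, no other choice of β̂1 and β̂2 yields a smaller sum of squared residuals for this sample, which is exactly what the least squares criterion in equation (5) demands.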
1.3 CLRM: Assumptions Underlying the Method of Least Squares

Our objective is not only to estimate β̂1 and β̂2 but also to draw inferences about the true β1 and β2; that is, we want to know how close the estimates β̂1 and β̂2 are to their counterparts in the population, β1 and β2, or how close the estimate Ŷi is to Yi. Thus, we need to make certain assumptions about the manner in which the Yi are generated. The model Yi = β1 + β2 Xi + ui tells us that Yi depends on both Xi and ui. Therefore, unless we specify how Xi and ui are generated, we cannot make any statistical inference about Yi, and thereby about β1 and β2. With this in view, the CLRM makes 10 assumptions:

Assumption 1: The model is linear in the parameters, though it may or may not be linear in the variables.

Yi = β1 + β2 Xi + ui

Assumption 2: The X values are fixed in repeated sampling.

Assumption 3: The X values are independent of the error term; that is, the X variable and the error term are independent: Cov(Xi, ui) = 0.

Cov(Xi, ui) = E{[ui − E(ui)][Xi − E(Xi)]}
            = E[ui (Xi − E(Xi))]    since E(ui) = 0
            = E(ui Xi) − E(Xi) E(ui)
            = E(ui Xi)
            = 0

Assumption 4: Zero mean value of the disturbance term ui. The mean value of the random error term is zero: E(ui) = 0.

Assumption 5: Homoscedasticity, or equal variance of ui. Given the value of Xi, the conditional variances of ui are identical:

Var(ui | Xi) = E[ui − E(ui | Xi)]² = E(ui² | Xi) = σ²

where the second equality follows from the zero-mean condition in Assumption 4.

Assumption 6: No autocorrelation between the disturbances. Given any two X values Xi and Xj, i ≠ j, the correlation between any two disturbances ui and uj is zero:

Cov(ui, uj | Xi, Xj) = E{[ui − E(ui)] | Xi}{[uj − E(uj)] | Xj}
                     = E(ui | Xi) E(uj | Xj)
                     = 0

Assumption 7: The number of observations n must be greater than the number of parameters to be estimated.

Assumption 8: Variability in the X values: the X values in a given sample must not all be the same.

Assumption 9: The regression model is correctly specified; in other words, there is no specification bias in the model.

Assumption 10: No perfect multicollinearity: there is no perfect linear relationship among the explanatory variables.
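As a numerical companion to the first-order conditions (7) and (8), the sketch below (again with hypothetical data) verifies that the OLS residuals sum to zero and are orthogonal to X, the sample analogues of E(ui) = 0 in Assumption 4 and Cov(Xi, ui) = 0 in Assumption 3.

import numpy as np

# Hypothetical data (illustration only).
X = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
Y = np.array([5.0, 9.0, 10.0, 15.0, 18.0, 20.0])

# OLS estimates via equations (9) and (10).
x = X - X.mean()
y = Y - Y.mean()
beta2_hat = np.sum(x * y) / np.sum(x ** 2)
beta1_hat = Y.mean() - beta2_hat * X.mean()

u_hat = Y - (beta1_hat + beta2_hat * X)

# Equation (7) implies sum(u_hat) = 0 (up to floating-point error).
print("sum of residuals:", np.sum(u_hat))

# Equation (8) implies sum(u_hat * X) = 0, so the residuals are
# orthogonal to X; hence their sample covariance with X is also zero.
print("sum of u_hat * X:", np.sum(u_hat * X))
print("sample cov(u_hat, X):", np.mean(u_hat * (X - X.mean())))

Note that these are mechanical properties of the fitted residuals, guaranteed by the normal equations for any sample; the CLRM assumptions, by contrast, concern the unobserved disturbances ui.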