The Geometry of Linear Regression
Guochang Zhao
RIEM, SWUFE
Fall 2016 (November 3, 2016)

The Geometry of Vector Spaces: Some notation

- Denote the real line as R, and the set of n-vectors as R^n.
- Denote a Euclidean space in n dimensions as E^n.
- Addition, subtraction, and multiplication in Euclidean space are defined exactly as for n × 1 vectors.
- Scalar product: ⟨x, y⟩ ≡ x^T y. The notation on the left is generally used in the context of the geometry of vectors, while the notation on the right is generally used in the context of matrix algebra.
- The scalar product is commutative: ⟨x, y⟩ = ⟨y, x⟩.
- The length, or norm, of a vector x is simply ||x|| = (x^T x)^{1/2}.
- In scalar terms, it is ||x|| = (Σ_{i=1}^{n} x_i^2)^{1/2}.

Pythagoras' Theorem

[Figure 2.1: Pythagoras' Theorem. A right-angled triangle with hypotenuse AC and the other two sides, AB and BC, of lengths x_1 and x_2 respectively. The squares on each of the three sides are drawn, and the area of the square on the hypotenuse is x_1^2 + x_2^2, in accordance with the theorem.]

- Proof of Pythagoras' Theorem: a beautiful proof, not often found in geometry texts, is shown in Figure 2.2. Two squares of equal area are drawn. Each square contains four copies of the same right-angled triangle. The square on the left also contains the squares on the two shorter sides of the triangle, while the square on the right contains the square on the hypotenuse.

[Figure 2.2: Proof of Pythagoras' Theorem.]
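As a quick numerical check of these definitions (a sketch added for illustration, not part of the original slides; the vectors are arbitrary and NumPy is assumed to be available):

```python
import numpy as np

# Two arbitrary vectors in E^3 (any n works the same way).
x = np.array([3.0, 0.0, 4.0])
y = np.array([1.0, 2.0, -2.0])

# Scalar product <x, y> = x^T y; it is commutative.
print(x @ y, y @ x)                        # -5.0  -5.0

# Norm ||x|| = (x^T x)^{1/2} = (sum_i x_i^2)^{1/2}.
print(np.sqrt(x @ x), np.linalg.norm(x))   # 5.0  5.0

# Pythagoras' Theorem: if x and z are orthogonal (x^T z = 0),
# then ||x + z||^2 = ||x||^2 + ||z||^2.
z = np.array([0.0, 7.0, 0.0])
print(x @ z)                               # 0.0, so x and z are orthogonal
print((x + z) @ (x + z), x @ x + z @ z)    # 74.0  74.0
```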
Pythagoras' Theorem

- Any vector x ∈ E^2 has two components, usually denoted x_1 and x_2; these two components can be interpreted as the Cartesian coordinates of the vector in the plane.

[Figure 2.3: A vector x in E^2, drawn as an arrow from the origin O to the point with Cartesian coordinates (x_1, x_2).]

The Geometry of Vector Spaces: Addition of vectors

[Figure 2.4: Addition of vectors. The sum x + y is obtained by placing y at the tip of x, that is, by completing the parallelogram formed by x and y.]
Multiplication by a scalar

[Figure 2.5: Multiplication by a scalar. The vectors x and αx lie along the same line through the origin.]

- The vector αx is always parallel to x. For the norm, we have
      ||αx|| = ⟨αx, αx⟩^{1/2} = |α| (x^T x)^{1/2} = |α| · ||x||.

The geometry of scalar products

- The scalar product of two vectors x and y, whether in E^2 or E^n, can be expressed geometrically in terms of the lengths of the two vectors and the angle between them, and this result turns out to be very useful.
- If the angle between two vectors is 0, they must be parallel, so that, for example, y = αx. Then
      ⟨x, y⟩ = ⟨x, αx⟩ = α (x^T x) = α · ||x||^2.
- If α > 0, it follows that ⟨x, y⟩ = ||x|| · ||y||. This result is true only if x and y are parallel and point in the same direction (rather than in opposite directions).
The geometry of scalar products

- Consider two vectors, w and z, both of length 1, and let θ denote the angle between them.

[Figure 2.6: The angle θ between two vectors, with w along OA and z along OB, both of unit length.]

- By elementary trigonometry, the coordinates of z must be (cos θ, sin θ): the angle BOA is given, by the usual trigonometric rule, by the ratio of the length of the opposite side AB to that of the hypotenuse OB, and this ratio is sin θ / 1 = sin θ, so the angle BOA is indeed equal to θ.
- The scalar product of w and z is
      ⟨w, z⟩ = w^T z = w_1 z_1 + w_2 z_2 = z_1 = cos θ,
  because w_1 = 1 and w_2 = 0.
- More generally, let x = αw and y = γz, for positive scalars α and γ.
- Then ||x|| = α, ||y|| = γ, and
      ⟨x, y⟩ = x^T y = αγ w^T z = αγ ⟨w, z⟩ = ||x|| · ||y|| · cos θ.
- This relation is the same in E^n as in E^2.
- If two nonzero vectors have a zero scalar product, they are at right angles; they are said to be orthogonal or, less commonly, perpendicular.
- Since the cosine function can take on values only between -1 and 1, a consequence of the above equation is that |x^T y| ≤ ||x|| · ||y||. This is the Cauchy-Schwarz inequality.
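A short numerical illustration of the angle formula and the Cauchy-Schwarz inequality (added as a sketch, not from the original slides; the example vectors are arbitrary):

```python
import numpy as np

x = np.array([2.0, 1.0, 0.0])
y = np.array([1.0, 3.0, 2.0])

# cos(theta) = <x, y> / (||x|| ||y||), so the angle can be recovered with arccos.
cos_theta = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
print(cos_theta, np.arccos(cos_theta))

# Cauchy-Schwarz: |x^T y| <= ||x|| ||y||.
print(abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y))   # True

# Orthogonal vectors: zero scalar product, i.e. theta = pi/2.
w = np.array([1.0, 0.0, 0.0])
z = np.array([0.0, 5.0, 0.0])
print(w @ z, np.arccos(0.0))                                 # 0.0  1.5707...
```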
Subspaces of Euclidean Space

- A subspace has a dimension lower than n.
- A subspace of particular interest is the one for which the columns of X provide the basis vectors: the subspace ∆(x_1, x_2, ..., x_k) consists of every vector that can be formed as a linear combination of the x_i, i = 1, ..., k. Formally, it is defined as
      ∆(x_1, x_2, ..., x_k) ≡ { z ∈ E^n | z = Σ_{i=1}^{k} b_i x_i, b_i ∈ R }.
- Here, x_1, x_2, ..., x_k are the basis vectors, and they span this subspace ∆.
- Correspondingly, ∆ is called the column space of X or, less formally, the span of X, or the span of the x_i; it is also written S(X).

[Figure 2.7: The spaces S(X) and S⊥(X).]

- The orthogonal complement of S(X) in E^n, which is denoted S⊥(X), is the set of all vectors w in E^n that are orthogonal to everything in S(X). This means that, for every z in S(X), ⟨w, z⟩ = w^T z = 0. Formally,
      S⊥(X) ≡ { w ∈ E^n | w^T z = 0 for all z ∈ S(X) }.
- If S(X) is k-dimensional, S⊥(X) is (n − k)-dimensional.

Subspaces of Euclidean Space

- A 2-dimensional subspace in E^2:

[Figure 2.8: A 2-dimensional subspace, showing the vectors b_1 x_1, b_2 x_2, and their sum b_1 x_1 + b_2 x_2, with θ the angle between x_1 and x_2.]

- Any vector in S(x_1, x_2) can be drawn in the plane of Figure 2.8. Consider, for instance, the linear combination z ≡ b_1 x_1 + b_2 x_2. We could draw the vector z by computing its length and the angle it makes with x_1; alternatively, we could apply the rules for adding vectors geometrically, illustrated in Figure 2.4, to the vectors b_1 x_1 and b_2 x_2. The figure illustrates the case b_1 = 2/3 and b_2 = 1/2.
- In precisely the same way, we can represent any three vectors by arrows in 3-dimensional space.

The geometry of the linear regression model

[Figure 2.9: The geometry of the linear regression model. The vector y is the sum of Xβ, which lies in S(X), and the disturbance vector u.]

Linear independence

- Assumption 1.3 (X has full column rank) is equivalent to requiring that the columns of X be linearly independent.
- Linear independence is different from statistical independence.
- The vectors x_1 through x_k are said to be linearly dependent if there is a vector x_j, 1 ≤ j ≤ k, and coefficients c_i such that
      x_j = Σ_{i ≠ j} c_i x_i.
- Or, in other words, there exist coefficients b_i, at least one of which is nonzero, such that
      Σ_{i=1}^{k} b_i x_i = 0.

Linear independence

- In matrix form, linear dependence means that Xb = 0 for some nonzero k-vector b with typical element b_i.
- Why does linear dependence mean that X^T X is not invertible?
- Consider any vector a, and suppose that X^T X a = c.
- If X^T X were invertible, we would have
      (X^T X)^{-1} c = a.                      (*)
- However, linear dependence (Xb = 0 with b nonzero) also means that X^T X (a + b) = c, which would give
      (X^T X)^{-1} c = a + b.                  (#)
- Equations (*) and (#) cannot hold at the same time.
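To see concretely why linearly dependent columns make X^T X singular, here is a small sketch (not in the original slides; the data are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=10)
x2 = rng.normal(size=10)
x3 = 2.0 * x1 - 0.5 * x2            # x3 is an exact linear combination of x1, x2

X = np.column_stack([x1, x2, x3])   # columns are linearly dependent
print(np.linalg.matrix_rank(X))     # 2, not 3: X does not have full column rank

XtX = X.T @ X
# X^T X inherits the rank deficiency, so it has a (numerically) zero eigenvalue
# and (X^T X)^{-1} does not exist.
print(np.linalg.matrix_rank(XtX))        # 2
print(np.linalg.eigvalsh(XtX).min())     # ~0 up to floating-point error
```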
The Geometry of OLS estimation

- The matrix product Xβ can be written as
      Xβ = [x_1 x_2 ··· x_k] [β_1, β_2, ..., β_k]^T = x_1 β_1 + x_2 β_2 + ··· + x_k β_k = Σ_{i=1}^{k} β_i x_i.
- Because of linear independence, any element of the subspace ∆(X) = ∆(x_1, ..., x_k) can be written as Xβ for some β.
- Thus, every n-vector Xβ belongs to ∆(X), which is, in general, a k-dimensional subspace of E^n.
- The vector Xβ̂ constructed using the OLS estimator β̂ belongs to this subspace.
- The estimator β̂ was obtained by solving the equation
      X^T (y − Xβ̂) = 0.

The Geometry of OLS estimation

[Figure 2.10: Residuals and fitted values. The vector y is the hypotenuse, Xβ̂ is the vector of fitted values, û is the residual vector, and θ is the angle between y and Xβ̂.]

[Figure 2.11: Linear regression in three dimensions. a) y projected on two regressors; b) the span S(x_1, x_2) of the regressors; c) the vertical plane through y.]
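A numerical sketch of this geometry (added here, not from the slides; the simulated data and the seed are arbitrary): the fitted values Xβ̂ lie in S(X), the residuals û are orthogonal to every column of X, and Pythagoras' Theorem links the three lengths.

```python
import numpy as np

rng = np.random.default_rng(42)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

# OLS from the normal equations X^T (y - X beta_hat) = 0.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

fitted = X @ beta_hat      # X beta_hat, lies in S(X)
resid = y - fitted         # u_hat, lies in S_perp(X)

print(np.allclose(X.T @ resid, 0.0))                        # True: X^T u_hat = 0
print(np.isclose(y @ y, fitted @ fitted + resid @ resid))   # True: ||y||^2 = ||X b||^2 + ||u_hat||^2
```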
Orthogonal Projections

- A projection is a mapping that takes each point of E^n into a point in a subspace of E^n, while leaving all points in that subspace unchanged.
- The subspace is called the invariant subspace of the projection.
- An orthogonal projection maps any point into the point of the subspace that is closest to it. If a point is already in the invariant subspace, it is mapped into itself.
- Algebraically, an orthogonal projection on to a given subspace can be performed by premultiplying the vector to be projected by a suitable projection matrix. For S(X), the two n × n projection matrices are
      P_X ≡ X (X^T X)^{-1} X^T  and  M_X ≡ I_n − P_X.

Orthogonal Projections

- Geometrically, the orthogonal decomposition y = P_X y + M_X y can be represented by a right-angled triangle, with y as the hypotenuse and P_X y and M_X y as the other two sides.
- In terms of projections, Pythagoras' Theorem can be written as
      ||y||^2 = ||P_X y||^2 + ||M_X y||^2.
- For any orthogonal projection matrix P_X and any vector y ∈ E^n, ||P_X y|| ≤ ||y||.
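A minimal sketch of these projection matrices (not part of the slides; X and y are simulated): P_X and M_X are symmetric and idempotent, they sum to the identity, and projecting never lengthens a vector.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 20, 3
X = rng.normal(size=(n, k))
y = rng.normal(size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T   # P_X, an n x n matrix
M = np.eye(n) - P                      # M_X = I_n - P_X

# Symmetric, idempotent, and mutually annihilating: P = P^T, PP = P, PM = 0.
print(np.allclose(P, P.T), np.allclose(P @ P, P), np.allclose(P @ M, 0.0))

# Orthogonal decomposition y = P_X y + M_X y, and Pythagoras' Theorem.
print(np.isclose(y @ y, (P @ y) @ (P @ y) + (M @ y) @ (M @ y)))   # True

# Projection never lengthens a vector: ||P_X y|| <= ||y||.
print(np.linalg.norm(P @ y) <= np.linalg.norm(y))                 # True
```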
The Frisch-Waugh-Lovell Theorem

- Adding a constant does not affect the slope coefficient.

[Figure 2.12: The regressors ι and x, the shifted regressor z = x + cι, and the fitted value ŷ written either as β̂_1 ι + β̂_2 x or as α̂_1 ι + α̂_2 z.]

The Frisch-Waugh-Lovell Theorem

- Orthogonal regressors may be omitted.

[Figure: ŷ = α̂_1 ι + α̂_2 z when the regressors ι and z are orthogonal.]

The group of regressors

- Partition X (n × k) into X_1 (n × k_1) and X_2 (n × k_2), with k_1 + k_2 = k; then we have
      y = X_1 β_1 + X_2 β_2 + u.                                   (2.34)
- If all the regressors in X_1 are orthogonal to all the regressors in X_2, so that X_2^T X_1 = 0, then the OLS estimate β̂_1 obtained from the above equation is the same as the one obtained from
      y = X_1 β_1 + u_1.                                           (2.35)
  The same holds for β̂_2.
- In addition, we also have
      P_1 P_X = P_X P_1 = P_1,  where  P_1 ≡ P_{X_1} = X_1 (X_1^T X_1)^{-1} X_1^T.
- Thus, using X_1^T X_2 = 0,
      P_1 y = P_1 P_X y = P_1 (X_1 β̂_1 + X_2 β̂_2) = P_1 X_1 β̂_1 = X_1 β̂_1.

The Frisch-Waugh-Lovell Theorem

- If we transform the regressor matrix in (2.34) by adding X_1 A to X_2, where A is a k_1 × k_2 matrix, and leaving X_1 as it is, we obtain the regression
      y = X_1 α_1 + (X_2 + X_1 A) α_2 + u.                         (2.38)
- The α̂_2 from Eq. (2.38) is the same as β̂_2 from Eq. (2.34).
- Define M_1 ≡ I − P_1, so that we can run the regression
      y = X_1 α_1 + M_1 X_2 α_2 + u
        = X_1 α_1 + (X_2 − X_1 (X_1^T X_1)^{-1} X_1^T X_2) α_2 + u.
- α̂_2 will be unchanged if we omit X_1, because M_1 X_2 is orthogonal to X_1. This regression is a special case of Eq. (2.38), with A = −(X_1^T X_1)^{-1} X_1^T X_2.

The Frisch-Waugh-Lovell Theorem

- The following two regressions,
      y = X_1 α_1 + M_1 X_2 β_2 + u,                               (2.39)
      y = M_1 X_2 β_2 + v,                                         (2.40)
  yield the same estimate β̂_2.
- But they have different residuals. The regression
      M_1 y = M_1 X_2 β_2 + residuals                              (2.41)
  is called the FWL regression.

The Frisch-Waugh-Lovell Theorem

- The OLS estimates of β_2 from regressions (2.34) and (2.41) are numerically identical.
- The residuals from regressions (2.34) and (2.41) are numerically identical.
- Proof (sketch): write the OLS decomposition of (2.34) as y = X_1 β̂_1 + X_2 β̂_2 + û. Premultiplying by M_1 gives M_1 y = M_1 X_2 β̂_2 + û, since M_1 X_1 = 0 and M_1 û = û (û is orthogonal to X_1). Because û is also orthogonal to M_1 X_2 (as X_2^T M_1 û = X_2^T û = 0), this is exactly the OLS decomposition of M_1 y on M_1 X_2, so the FWL regression yields the same β̂_2 and the same residuals û.
- Remark: the FWL Theorem allows us to obtain explicit expressions, based on (X_2^T M_1 X_2)^{-1} X_2^T M_1 y, for subsets of the parameter estimates of a linear regression.
- Application and assignment: use the FWL Theorem to handle data with seasonal variation and a time trend.
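A numerical illustration of the FWL Theorem (a sketch on made-up data, not part of the original slides): the estimate of β_2 and the residuals from the full regression (2.34) coincide with those from the FWL regression (2.41).

```python
import numpy as np

rng = np.random.default_rng(7)
n, k1, k2 = 100, 2, 2
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, k1 - 1))])
X2 = rng.normal(size=(n, k2))
y = X1 @ np.array([1.0, 0.5]) + X2 @ np.array([-1.0, 2.0]) + rng.normal(size=n)

# Full regression (2.34): y on [X1, X2].
X = np.hstack([X1, X2])
beta = np.linalg.solve(X.T @ X, X.T @ y)
beta2_full = beta[k1:]
resid_full = y - X @ beta

# FWL regression (2.41): M1 y on M1 X2, where M1 = I - X1 (X1^T X1)^{-1} X1^T.
M1 = np.eye(n) - X1 @ np.linalg.inv(X1.T @ X1) @ X1.T
Z = M1 @ X2
beta2_fwl = np.linalg.solve(Z.T @ Z, Z.T @ (M1 @ y))
resid_fwl = M1 @ y - Z @ beta2_fwl

print(np.allclose(beta2_full, beta2_fwl))   # True: identical estimates of beta_2
print(np.allclose(resid_full, resid_fwl))   # True: identical residuals
```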