Gram-Schmidt and QR Decomposition Example

Suppose that
$$
X=\begin{pmatrix}
1 & 1 & 1\\
2 & 1 & 2\\
3 & 2 & 2\\
4 & 2 & 1
\end{pmatrix}
$$
As on the slides, let
$$
X_l = \text{the matrix made of the first } l \text{ columns of } X
$$
and consider replacing $X$ with $Z$ having orthogonal columns for which $C(Z_l)=C(X_l)$ for all $l$.

Take as the first column of $Z$
$$
z_1=x_1=\begin{pmatrix}1\\2\\3\\4\end{pmatrix}
$$
Then take
$$
z_2=x_2-\frac{\langle x_2,z_1\rangle}{\langle z_1,z_1\rangle}\,z_1
=\begin{pmatrix}1\\1\\2\\2\end{pmatrix}-\frac{17}{30}\begin{pmatrix}1\\2\\3\\4\end{pmatrix}
=\begin{pmatrix}\frac{13}{30}\\ -\frac{4}{30}\\ \frac{9}{30}\\ -\frac{8}{30}\end{pmatrix}
$$
Note that
$$
\langle z_1,z_2\rangle=\frac{13-8+27-32}{30}=0
$$
and that $x_1$ and $x_2$ are both linear combinations of $z_1,z_2$, i.e. $C(Z_2)=C(X_2)$.

Then take
$$
z_3=x_3-\frac{\langle x_3,z_1\rangle}{\langle z_1,z_1\rangle}\,z_1-\frac{\langle x_3,z_2\rangle}{\langle z_2,z_2\rangle}\,z_2
=\begin{pmatrix}1\\2\\2\\1\end{pmatrix}-\frac{15}{30}\begin{pmatrix}1\\2\\3\\4\end{pmatrix}-\frac{15/30}{330/900}\begin{pmatrix}\frac{13}{30}\\ -\frac{4}{30}\\ \frac{9}{30}\\ -\frac{8}{30}\end{pmatrix}
=\begin{pmatrix}-\frac{2}{22}\\ \frac{26}{22}\\ \frac{2}{22}\\ -\frac{14}{22}\end{pmatrix}
$$
It is easy to see that $\langle z_1,z_3\rangle=0$ and $\langle z_2,z_3\rangle=0$ and with
$$
Z=\begin{pmatrix}
1 & \frac{13}{30} & -\frac{2}{22}\\
2 & -\frac{4}{30} & \frac{26}{22}\\
3 & \frac{9}{30} & \frac{2}{22}\\
4 & -\frac{8}{30} & -\frac{14}{22}
\end{pmatrix}
$$
$C(Z_3)=C(Z)=C(X)=C(X_3)$.

Notice that for the upper triangular matrix $U$ with entries
$$
U_{kj}=\begin{cases}
\frac{\langle z_k,x_j\rangle}{\langle z_k,z_k\rangle} & \text{if } j>k\\
1 & \text{if } j=k\\
0 & \text{otherwise}
\end{cases}
$$
one has
$$
X=\begin{pmatrix}
1 & 1 & 1\\
2 & 1 & 2\\
3 & 2 & 2\\
4 & 2 & 1
\end{pmatrix}
=\begin{pmatrix}
1 & \frac{13}{30} & -\frac{2}{22}\\
2 & -\frac{4}{30} & \frac{26}{22}\\
3 & \frac{9}{30} & \frac{2}{22}\\
4 & -\frac{8}{30} & -\frac{14}{22}
\end{pmatrix}
\begin{pmatrix}
1 & \frac{17}{30} & \frac{15}{30}\\
0 & 1 & \frac{15}{11}\\
0 & 0 & 1
\end{pmatrix}
=ZU
$$
Further, with
$$
D=\operatorname{diag}\left(\langle z_1,z_1\rangle^{1/2},\ \langle z_2,z_2\rangle^{1/2},\ \langle z_3,z_3\rangle^{1/2}\right)
=\operatorname{diag}\left(\sqrt{30},\ \sqrt{11/30},\ \sqrt{20/11}\right)
$$
one has
$$
X=ZD^{-1}DU=QR
$$
for $Q=ZD^{-1}$ and $R=DU$. The matrix
$$
Q=\begin{pmatrix}
1 & \frac{13}{30} & -\frac{2}{22}\\
2 & -\frac{4}{30} & \frac{26}{22}\\
3 & \frac{9}{30} & \frac{2}{22}\\
4 & -\frac{8}{30} & -\frac{14}{22}
\end{pmatrix}
\begin{pmatrix}
\frac{1}{\sqrt{30}} & 0 & 0\\
0 & \sqrt{\frac{30}{11}} & 0\\
0 & 0 & \sqrt{\frac{11}{20}}
\end{pmatrix}
=\left(\frac{1}{\sqrt{30}}z_1,\ \sqrt{\frac{30}{11}}\,z_2,\ \sqrt{\frac{11}{20}}\,z_3\right)
$$
is a version of $Z$ with columns of norm 1 (that thus form an orthonormal basis for $C(X)$).

For any vector of responses/targets $Y$, with $q_j$ the columns of $Q$ and $\hat{Y}$ the projection of $Y$ onto $C(X)$,
$$
\hat{Y}=\sum_{j=1}^{3}\langle Y,q_j\rangle\, q_j=QQ'Y
$$
since
$$
Q'Y=\begin{pmatrix}\langle Y,q_1\rangle\\ \langle Y,q_2\rangle\\ \langle Y,q_3\rangle\end{pmatrix}
$$
Thus
$$
\hat{Y}=QQ'Y=QRR^{-1}Q'Y=XR^{-1}Q'Y
$$
which means that the ordinary least squares coefficient vector is
$$
\hat{b}^{\text{OLS}}=R^{-1}Q'Y
$$
and the OLS predictor of $y$ for an arbitrary input $x$ is $\hat{f}(x)=\hat{y}^{\text{OLS}}=x'\hat{b}^{\text{OLS}}=x'R^{-1}Q'Y$.
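A small numerical check of this example can be run in NumPy. The sketch below is not part of the original notes: it carries out classical Gram-Schmidt on the $X$ above, forms $Z$, $D$, $U$, $Q=ZD^{-1}$ and $R=DU$, and confirms that $X=QR$ and that $R^{-1}Q'Y$ matches the usual least squares solution. The response vector `Y` and names such as `b_hat` are illustrative assumptions, not taken from the example.

```python
import numpy as np

# The example's design matrix (columns x1, x2, x3)
X = np.array([[1., 1., 1.],
              [2., 1., 2.],
              [3., 2., 2.],
              [4., 2., 1.]])

# Classical Gram-Schmidt: z_l = x_l minus its projections onto z_1,...,z_{l-1}
Z = np.zeros_like(X)
for l in range(X.shape[1]):
    z = X[:, l].copy()
    for k in range(l):
        z -= (X[:, l] @ Z[:, k]) / (Z[:, k] @ Z[:, k]) * Z[:, k]
    Z[:, l] = z

# D = diag(||z_1||, ||z_2||, ||z_3||); Q = Z D^{-1} has orthonormal columns
D = np.diag(np.linalg.norm(Z, axis=0))
Q = Z @ np.linalg.inv(D)

# U is unit upper triangular with U_{kj} = <z_k, x_j>/<z_k, z_k> for j >= k,
# so X = Z U and R = D U gives X = Q R
U = np.triu(Z.T @ X / np.sum(Z * Z, axis=0)[:, None])
R = D @ U

print(np.allclose(X, Q @ R))             # True: X = QR
print(np.allclose(Q.T @ Q, np.eye(3)))   # True: columns of Q are orthonormal

# OLS via the decomposition: b_hat = R^{-1} Q'Y
# (Y is an arbitrary illustrative response vector, not from the notes)
Y = np.array([1., 0., 2., 1.])
b_hat = np.linalg.solve(R, Q.T @ Y)
print(np.allclose(b_hat, np.linalg.lstsq(X, Y, rcond=None)[0]))  # True
```

In practice one would call `numpy.linalg.qr` directly; classical Gram-Schmidt is written out here only to mirror the steps of the example.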