MATH 311 Topics in Applied Mathematics
Lecture 18: Least squares problems. Norms and inner products.

Overdetermined system of linear equations:

x + 2y = 3                x + 2y = 3
3x + 2y = 5      ⇐⇒      −4y = −4
x + y = 2.09              −y = −0.91

No solution: the system is inconsistent.

Assume that a solution (x₀, y₀) does exist, but the system is not quite accurate; namely, there may be some errors in the right-hand sides.

Problem. Find a good approximation of (x₀, y₀).

One approach is the least squares fit. Namely, we look for a pair (x, y) that minimizes the sum
(x + 2y − 3)² + (3x + 2y − 5)² + (x + y − 2.09)².

Least squares solution

System of linear equations:

a₁₁x₁ + a₁₂x₂ + ⋯ + a₁ₙxₙ = b₁
a₂₁x₁ + a₂₂x₂ + ⋯ + a₂ₙxₙ = b₂        ⇐⇒  Ax = b
⋯⋯⋯⋯⋯⋯⋯⋯⋯⋯⋯⋯
aₘ₁x₁ + aₘ₂x₂ + ⋯ + aₘₙxₙ = bₘ

For any x ∈ ℝⁿ define a residual r(x) = b − Ax. The least squares solution x to the system is the one that minimizes ‖r(x)‖ (or, equivalently, ‖r(x)‖²):

‖r(x)‖² = Σᵢ₌₁ᵐ (aᵢ₁x₁ + aᵢ₂x₂ + ⋯ + aᵢₙxₙ − bᵢ)².

Let A be an m×n matrix and let b ∈ ℝᵐ.

Theorem. A vector x̂ is a least squares solution of the system Ax = b if and only if it is a solution of the associated normal system AᵀAx = Aᵀb.

Proof: Ax is an arbitrary vector in R(A), the column space of A. Hence the length of r(x) = b − Ax is minimal if Ax is the orthogonal projection of b onto R(A), that is, if r(x) is orthogonal to R(A). We know that R(A)^⊥ = N(Aᵀ), the nullspace of the transpose matrix. Thus x̂ is a least squares solution if and only if
Aᵀ r(x̂) = 0  ⇐⇒  Aᵀ(b − Ax̂) = 0  ⇐⇒  AᵀAx̂ = Aᵀb.

Problem. Find the least squares solution to
x + 2y = 3
3x + 2y = 5
x + y = 2.09

In matrix form:
\[ \begin{pmatrix} 1 & 2 \\ 3 & 2 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 3 \\ 5 \\ 2.09 \end{pmatrix} \]
The associated normal system:
\[ \begin{pmatrix} 1 & 3 & 1 \\ 2 & 2 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 2 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 & 3 & 1 \\ 2 & 2 & 1 \end{pmatrix} \begin{pmatrix} 3 \\ 5 \\ 2.09 \end{pmatrix} \]
\[ \Longleftrightarrow \ \begin{pmatrix} 11 & 9 \\ 9 & 9 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 20.09 \\ 18.09 \end{pmatrix} \ \Longleftrightarrow \ \begin{cases} x = 1 \\ y = 1.01 \end{cases} \]

Problem. Find the constant function that is the least squares fit to the following data:

x    | 0 | 1 | 2 | 3
f(x) | 1 | 0 | 1 | 2

f(x) = c  ⟹  c = 1, c = 0, c = 1, c = 2  ⟹
\[ \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} (c) = \begin{pmatrix} 1 \\ 0 \\ 1 \\ 2 \end{pmatrix} \]
The normal system:
\[ (1, 1, 1, 1) \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} (c) = (1, 1, 1, 1) \begin{pmatrix} 1 \\ 0 \\ 1 \\ 2 \end{pmatrix} \ \Longrightarrow \ c = \tfrac{1}{4}(1 + 0 + 1 + 2) = 1 \]
(the arithmetic mean of the data values).

Problem. Find the linear polynomial that is the least squares fit to the same data:

x    | 0 | 1 | 2 | 3
f(x) | 1 | 0 | 1 | 2

f(x) = c₁ + c₂x  ⟹
c₁ = 1
c₁ + c₂ = 0
c₁ + 2c₂ = 1
c₁ + 3c₂ = 2
⟹
\[ \begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 1 \\ 2 \end{pmatrix} \]
The normal system:
\[ \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ 1 \\ 2 \end{pmatrix} \]
\[ \Longleftrightarrow \ \begin{pmatrix} 4 & 6 \\ 6 & 14 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} 4 \\ 8 \end{pmatrix} \ \Longleftrightarrow \ \begin{cases} c_1 = 0.4 \\ c_2 = 0.4 \end{cases} \]
Hence the least squares fit is f(x) = 0.4 + 0.4x.
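The two worked examples above can be reproduced numerically. The following is a minimal sketch, assuming NumPy is available (the variable names and the use of numpy.linalg are choices made here, not part of the lecture): it solves both normal systems and cross-checks the first against NumPy's built-in least squares routine.

```python
import numpy as np

# Overdetermined system: x + 2y = 3, 3x + 2y = 5, x + y = 2.09
A = np.array([[1.0, 2.0],
              [3.0, 2.0],
              [1.0, 1.0]])
b = np.array([3.0, 5.0, 2.09])

# Least squares solution via the normal system A^T A x = A^T b
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
print(x_hat)                                   # -> [1.   1.01]

# Same answer from NumPy's built-in least squares solver
print(np.linalg.lstsq(A, b, rcond=None)[0])    # -> [1.   1.01]

# Linear polynomial fit f(x) = c1 + c2*x to the data (0,1), (1,0), (2,1), (3,2)
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 0.0, 1.0, 2.0])
c = np.linalg.solve(X.T @ X, X.T @ y)
print(c)                                       # -> [0.4 0.4], i.e. f(x) = 0.4 + 0.4x
```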
Norm

The notion of norm generalizes the notion of length of a vector in ℝⁿ.

Definition. Let V be a vector space. A function α : V → ℝ is called a norm on V if it has the following properties:
(i) α(x) ≥ 0, and α(x) = 0 only for x = 0 (positivity);
(ii) α(rx) = |r| α(x) for all r ∈ ℝ (homogeneity);
(iii) α(x + y) ≤ α(x) + α(y) (triangle inequality).

Notation. The norm of a vector x ∈ V is usually denoted ‖x‖. Different norms on V are distinguished by subscripts, e.g., ‖x‖₁ and ‖x‖₂.

Examples. V = ℝⁿ, x = (x₁, x₂, …, xₙ) ∈ ℝⁿ.

• ‖x‖∞ = max(|x₁|, |x₂|, …, |xₙ|). Positivity and homogeneity are obvious. The triangle inequality:
|xᵢ + yᵢ| ≤ |xᵢ| + |yᵢ| ≤ maxⱼ |xⱼ| + maxⱼ |yⱼ|  ⟹  maxⱼ |xⱼ + yⱼ| ≤ maxⱼ |xⱼ| + maxⱼ |yⱼ|.

• ‖x‖₁ = |x₁| + |x₂| + ⋯ + |xₙ|. Positivity and homogeneity are obvious. The triangle inequality:
|xᵢ + yᵢ| ≤ |xᵢ| + |yᵢ|  ⟹  Σⱼ |xⱼ + yⱼ| ≤ Σⱼ |xⱼ| + Σⱼ |yⱼ|.

• ‖x‖ₚ = (|x₁|^p + |x₂|^p + ⋯ + |xₙ|^p)^{1/p}, p > 0.

Theorem. ‖x‖ₚ is a norm on ℝⁿ for any p ≥ 1.

Remark. ‖x‖₂ is the Euclidean length of x.

Definition. A normed vector space is a vector space endowed with a norm. The norm defines a distance function on the normed vector space: dist(x, y) = ‖x − y‖. Then we say that a sequence x₁, x₂, … converges to a vector x if dist(x, xₙ) → 0 as n → ∞.

[Figure: unit circles ‖x‖ = 1 in ℝ² for four norms: ‖x‖ = (x₁² + x₂²)^{1/2} (black), ‖x‖ = (½ x₁² + x₂²)^{1/2} (green), ‖x‖ = |x₁| + |x₂| (blue), ‖x‖ = max(|x₁|, |x₂|) (red).]

Examples. V = C[a, b], f : [a, b] → ℝ.
• ‖f‖∞ = max_{a≤x≤b} |f(x)|.
• ‖f‖₁ = ∫ₐᵇ |f(x)| dx.
• ‖f‖ₚ = (∫ₐᵇ |f(x)|^p dx)^{1/p}, p > 0.

Theorem. ‖f‖ₚ is a norm on C[a, b] for any p ≥ 1.

Inner product

The notion of inner product generalizes the notion of dot product of vectors in ℝⁿ.

Definition. Let V be a vector space. A function β : V × V → ℝ, usually denoted β(x, y) = ⟨x, y⟩, is called an inner product on V if it is positive, symmetric, and bilinear. That is, if
(i) ⟨x, x⟩ ≥ 0, and ⟨x, x⟩ = 0 only for x = 0 (positivity);
(ii) ⟨x, y⟩ = ⟨y, x⟩ (symmetry);
(iii) ⟨rx, y⟩ = r⟨x, y⟩ (homogeneity);
(iv) ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩ (distributive law).

An inner product space is a vector space endowed with an inner product.

Examples. V = ℝⁿ.
• ⟨x, y⟩ = x · y = x₁y₁ + x₂y₂ + ⋯ + xₙyₙ.
• ⟨x, y⟩ = d₁x₁y₁ + d₂x₂y₂ + ⋯ + dₙxₙyₙ, where d₁, d₂, …, dₙ > 0.
• ⟨x, y⟩ = (Dx) · (Dy), where D is an invertible n×n matrix.

Remarks. (a) Invertibility of D is necessary to show that ⟨x, x⟩ = 0 ⟹ x = 0. (b) The second example is a particular case of the third one when D = diag(d₁^{1/2}, d₂^{1/2}, …, dₙ^{1/2}).

Problem. Find an inner product on ℝ² such that ⟨e₁, e₁⟩ = 2, ⟨e₂, e₂⟩ = 3, and ⟨e₁, e₂⟩ = −1, where e₁ = (1, 0), e₂ = (0, 1).

Let x = (x₁, x₂), y = (y₁, y₂) ∈ ℝ². Then x = x₁e₁ + x₂e₂ and y = y₁e₁ + y₂e₂. Using bilinearity, we obtain
⟨x, y⟩ = ⟨x₁e₁ + x₂e₂, y₁e₁ + y₂e₂⟩
   = x₁⟨e₁, y₁e₁ + y₂e₂⟩ + x₂⟨e₂, y₁e₁ + y₂e₂⟩
   = x₁y₁⟨e₁, e₁⟩ + x₁y₂⟨e₁, e₂⟩ + x₂y₁⟨e₂, e₁⟩ + x₂y₂⟨e₂, e₂⟩
   = 2x₁y₁ − x₁y₂ − x₂y₁ + 3x₂y₂.

Examples. V = C[a, b].
• ⟨f, g⟩ = ∫ₐᵇ f(x)g(x) dx.
• ⟨f, g⟩ = ∫ₐᵇ f(x)g(x)w(x) dx, where w is bounded, piecewise continuous, and w > 0 everywhere on [a, b]; w is called the weight function.

Theorem. Suppose ⟨x, y⟩ is an inner product on a vector space V. Then
⟨x, y⟩² ≤ ⟨x, x⟩⟨y, y⟩  for all x, y ∈ V.

Proof: For any t ∈ ℝ let vₜ = x + ty. Then ⟨vₜ, vₜ⟩ = ⟨x, x⟩ + 2t⟨x, y⟩ + t²⟨y, y⟩. The right-hand side is a quadratic polynomial in t (provided that y ≠ 0). Since ⟨vₜ, vₜ⟩ ≥ 0 for all t, the discriminant D is nonpositive. But D = 4⟨x, y⟩² − 4⟨x, x⟩⟨y, y⟩.

Cauchy-Schwarz Inequality: |⟨x, y⟩| ≤ √⟨x, x⟩ √⟨y, y⟩.

Corollary 1. |x · y| ≤ |x| |y| for all x, y ∈ ℝⁿ. Equivalently, for all xᵢ, yᵢ ∈ ℝ,
(x₁y₁ + ⋯ + xₙyₙ)² ≤ (x₁² + ⋯ + xₙ²)(y₁² + ⋯ + yₙ²).

Corollary 2. For any f, g ∈ C[a, b],
(∫ₐᵇ f(x)g(x) dx)² ≤ ∫ₐᵇ |f(x)|² dx · ∫ₐᵇ |g(x)|² dx.

Norms induced by inner products

Theorem. Suppose ⟨x, y⟩ is an inner product on a vector space V. Then ‖x‖ = √⟨x, x⟩ is a norm.

Proof: Positivity is obvious. Homogeneity: ‖rx‖ = √⟨rx, rx⟩ = √(r²⟨x, x⟩) = |r| √⟨x, x⟩ = |r| ‖x‖. Triangle inequality (follows from the Cauchy-Schwarz inequality):
‖x + y‖² = ⟨x + y, x + y⟩ = ⟨x, x⟩ + ⟨x, y⟩ + ⟨y, x⟩ + ⟨y, y⟩
   ≤ ⟨x, x⟩ + |⟨x, y⟩| + |⟨y, x⟩| + ⟨y, y⟩
   ≤ ‖x‖² + 2‖x‖ ‖y‖ + ‖y‖² = (‖x‖ + ‖y‖)².

Examples.
• The length of a vector in ℝⁿ, |x| = (x₁² + x₂² + ⋯ + xₙ²)^{1/2}, is the norm induced by the dot product x · y = x₁y₁ + x₂y₂ + ⋯ + xₙyₙ.
• The norm ‖f‖₂ = (∫ₐᵇ |f(x)|² dx)^{1/2} on the vector space C[a, b] is induced by the inner product ⟨f, g⟩ = ∫ₐᵇ f(x)g(x) dx.
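The inner product constructed in the problem above can be written as ⟨x, y⟩ = xᵀGy with the symmetric matrix G having entries 2, −1, −1, 3. Below is a minimal sketch, assuming NumPy (the helper name inner is introduced here for illustration), that evaluates the inner product this way and spot-checks the Cauchy-Schwarz inequality on random vectors.

```python
import numpy as np

# Gram matrix of the inner product with <e1,e1> = 2, <e2,e2> = 3, <e1,e2> = -1
G = np.array([[ 2.0, -1.0],
              [-1.0,  3.0]])

def inner(x, y):
    # <x,y> = 2*x1*y1 - x1*y2 - x2*y1 + 3*x2*y2, written as x^T G y
    return x @ G @ y

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=2), rng.normal(size=2)
    # Cauchy-Schwarz: <x,y>^2 <= <x,x><y,y> (small tolerance for round-off)
    assert inner(x, y) ** 2 <= inner(x, x) * inner(y, y) + 1e-12
print("Cauchy-Schwarz held on all sampled pairs")
```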
Angle

Since |⟨x, y⟩| ≤ ‖x‖ ‖y‖, we can define the angle between nonzero vectors in any vector space with an inner product (and the induced norm):
∠(x, y) = arccos( ⟨x, y⟩ / (‖x‖ ‖y‖) ).
Then ⟨x, y⟩ = ‖x‖ ‖y‖ cos ∠(x, y).

In particular, vectors x and y are orthogonal (denoted x ⊥ y) if ⟨x, y⟩ = 0.

[Figure: vectors x and y at a right angle, with x + y as the hypotenuse.]

Pythagorean Theorem: x ⊥ y ⟹ ‖x + y‖² = ‖x‖² + ‖y‖².

Proof: ‖x + y‖² = ⟨x + y, x + y⟩ = ⟨x, x⟩ + ⟨x, y⟩ + ⟨y, x⟩ + ⟨y, y⟩ = ⟨x, x⟩ + ⟨y, y⟩ = ‖x‖² + ‖y‖².

[Figure: parallelogram with sides x and y and diagonals x + y and x − y.]

Parallelogram Identity: ‖x + y‖² + ‖x − y‖² = 2‖x‖² + 2‖y‖².

Proof: ‖x + y‖² = ⟨x + y, x + y⟩ = ⟨x, x⟩ + ⟨x, y⟩ + ⟨y, x⟩ + ⟨y, y⟩. Similarly, ‖x − y‖² = ⟨x, x⟩ − ⟨x, y⟩ − ⟨y, x⟩ + ⟨y, y⟩. Then
‖x + y‖² + ‖x − y‖² = 2⟨x, x⟩ + 2⟨y, y⟩ = 2‖x‖² + 2‖y‖².
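To close, a minimal numerical sketch, assuming NumPy (the helper names norm and angle are introduced here for illustration), of the angle formula, the Pythagorean Theorem, and the Parallelogram Identity for the norm induced by the dot product on ℝⁿ.

```python
import numpy as np

def norm(x):
    # Norm induced by the dot product: ||x|| = sqrt(<x,x>)
    return np.sqrt(x @ x)

def angle(x, y):
    # angle(x, y) = arccos( <x,y> / (||x|| ||y||) )
    return np.arccos((x @ y) / (norm(x) * norm(y)))

x = np.array([1.0, 0.0])
y = np.array([0.0, 2.0])
print(angle(x, y))                              # -> pi/2, since x and y are orthogonal

# Pythagorean Theorem: ||x + y||^2 = ||x||^2 + ||y||^2 when x ⊥ y
print(norm(x + y) ** 2, norm(x) ** 2 + norm(y) ** 2)   # -> 5.0  5.0

# Parallelogram Identity holds for any u, v, since the norm comes from an inner product
rng = np.random.default_rng(1)
u, v = rng.normal(size=3), rng.normal(size=3)
lhs = norm(u + v) ** 2 + norm(u - v) ** 2
rhs = 2 * norm(u) ** 2 + 2 * norm(v) ** 2
print(np.isclose(lhs, rhs))                     # -> True
```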