MATH 304 Linear Algebra. Lecture 19: Least squares problems (continued). Norms and inner products.

Orthogonal projection
Theorem 1 Let V be a subspace of Rⁿ. Then any vector x ∈ Rⁿ is uniquely represented as x = p + o, where p ∈ V and o ∈ V⊥.

In the above expansion, p is called the orthogonal projection of the vector x onto the subspace V.

Theorem 2 ‖x − v‖ > ‖x − p‖ for any v ≠ p in V.

Thus ‖o‖ = ‖x − p‖ = min_{v∈V} ‖x − v‖ is the distance from the vector x to the subspace V.
[Figure: the orthogonal decomposition x = p + o, with p in the subspace V and o in V⊥.]
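As a concrete numerical illustration (an addition, not from the lecture; it assumes NumPy and an example subspace of R³ spanned by the columns of a matrix A), the following sketch computes p and o and checks that o is orthogonal to V:

```python
import numpy as np

# Subspace V of R^3 spanned by the columns of A (an assumed example).
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
x = np.array([1.0, 2.0, 3.0])

# Orthogonal projection p of x onto V: p = A c, where c solves
# the normal system (A^T A) c = A^T x.
c = np.linalg.solve(A.T @ A, A.T @ x)
p = A @ c
o = x - p  # the component of x in V-perp

print(p, o)
print(A.T @ o)  # ~ [0, 0]: o is orthogonal to every column of A
```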
Least squares solution
System of linear equations:

  a11 x1 + a12 x2 + · · · + a1n xn = b1
  a21 x1 + a22 x2 + · · · + a2n xn = b2
  · · · · · · · · · · · · · · · · · · ·
  am1 x1 + am2 x2 + · · · + amn xn = bm

⇐⇒ Ax = b
For any x ∈ Rⁿ define a residual r(x) = b − Ax. The least squares solution x to the system is the one that minimizes ‖r(x)‖ (or, equivalently, ‖r(x)‖²).

  ‖r(x)‖² = Σ_{i=1}^{m} (ai1 x1 + ai2 x2 + · · · + ain xn − bi)²
Let A be an m×n matrix and let b ∈ Rᵐ.
Theorem A vector x̂ is a least squares solution of the system Ax = b if and only if it is a solution of the associated normal system AᵀAx = Aᵀb.
Proof: Ax is an arbitrary vector in R(A), the column space of A. Hence the length of r(x) = b − Ax is minimal if Ax is the orthogonal projection of b onto R(A), that is, if r(x) is orthogonal to R(A).
We know that {row space}⊥ = {nullspace} for any matrix. In particular, R(A)⊥ = N(Aᵀ), the nullspace of the transpose matrix of A. Thus x̂ is a least squares solution if and only if
  Aᵀr(x̂) = 0 ⇐⇒ Aᵀ(b − Ax̂) = 0 ⇐⇒ AᵀAx̂ = Aᵀb.
Corollary The normal system AᵀAx = Aᵀb is always consistent.
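As an added numerical illustration (not from the lecture), here is a minimal NumPy sketch that solves a least squares problem both through the normal system AᵀAx = Aᵀb and through a library routine; it reuses the data from the linear fit example below, so both methods return (0.4, 0.4):

```python
import numpy as np

# An overdetermined 4x2 system Ax = b (same data as the linear fit below).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 0.0, 1.0, 2.0])

# Least squares via the normal system A^T A x = A^T b
# (assumes A has full column rank, so A^T A is invertible).
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Least squares via NumPy's built-in solver (uses an SVD internally).
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_normal, np.allclose(x_normal, x_lstsq))  # [0.4 0.4] True
```

Solving the normal system directly is fine for small, well-conditioned problems, but forming AᵀA squares the condition number of A, which is why library routines prefer QR or SVD factorizations.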
Problem. Find the constant function that is the least squares fit to the following data:

  x    | 0 | 1 | 2 | 3
  f(x) | 1 | 0 | 1 | 2

f(x) = c =⇒

  c = 1
  c = 0
  c = 1
  c = 2

⇐⇒

  [ 1 ]       [ 1 ]
  [ 1 ] (c) = [ 0 ]
  [ 1 ]       [ 1 ]
  [ 1 ]       [ 2 ]

Multiplying both sides by Aᵀ = (1, 1, 1, 1) gives the normal system:

               [ 1 ]                    [ 1 ]
  (1, 1, 1, 1) [ 1 ] (c) = (1, 1, 1, 1) [ 0 ]
               [ 1 ]                    [ 1 ]
               [ 1 ]                    [ 2 ]

⇐⇒ 4c = 4 ⇐⇒ c = (1 + 0 + 1 + 2)/4 = 1 (the arithmetic mean of the data values).
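An added two-line NumPy check confirms that the least squares constant fit is the arithmetic mean of the data values:

```python
import numpy as np

b = np.array([1.0, 0.0, 1.0, 2.0])
A = np.ones((4, 1))  # design matrix for f(x) = c

c, *_ = np.linalg.lstsq(A, b, rcond=None)
print(c[0], b.mean())  # both 1.0
```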
Problem. Find the linear polynomial that is the least squares fit to the following data:

  x    | 0 | 1 | 2 | 3
  f(x) | 1 | 0 | 1 | 2

f(x) = c1 + c2 x =⇒

  c1       = 1
  c1 + c2  = 0
  c1 + 2c2 = 1
  c1 + 3c2 = 2

⇐⇒

  [ 1 0 ]          [ 1 ]
  [ 1 1 ] [ c1 ] = [ 0 ]
  [ 1 2 ] [ c2 ]   [ 1 ]
  [ 1 3 ]          [ 2 ]

Multiplying both sides by the transpose of the coefficient matrix gives the normal system:

  [ 1 1 1 1 ] [ 1 0 ]          [ 1 1 1 1 ] [ 1 ]
  [ 0 1 2 3 ] [ 1 1 ] [ c1 ] = [ 0 1 2 3 ] [ 0 ]
              [ 1 2 ] [ c2 ]               [ 1 ]
              [ 1 3 ]                      [ 2 ]

⇐⇒  [ 4  6 ] [ c1 ] = [ 4 ]  ⇐⇒  c1 = 0.4, c2 = 0.4
     [ 6 14 ] [ c2 ]   [ 8 ]
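An added NumPy sketch (not from the lecture) that forms this normal system explicitly and reproduces c1 = c2 = 0.4:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
b = np.array([1.0, 0.0, 1.0, 2.0])

A = np.column_stack([np.ones_like(x), x])  # columns: 1, x

G = A.T @ A   # [[4, 6], [6, 14]]
h = A.T @ b   # [4, 8]
c = np.linalg.solve(G, h)
print(c)  # [0.4 0.4]
```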
Problem. Find the quadratic polynomial that is the least squares fit to the following data:

  x    | 0 | 1 | 2 | 3
  f(x) | 1 | 0 | 1 | 2

f(x) = c1 + c2 x + c3 x² =⇒

  c1             = 1
  c1 + c2 + c3   = 0
  c1 + 2c2 + 4c3 = 1
  c1 + 3c2 + 9c3 = 2

⇐⇒

  [ 1 0 0 ] [ c1 ]   [ 1 ]
  [ 1 1 1 ] [ c2 ] = [ 0 ]
  [ 1 2 4 ] [ c3 ]   [ 1 ]
  [ 1 3 9 ]          [ 2 ]

Multiplying both sides by the transpose of the coefficient matrix gives the normal system:

  [ 1 1 1 1 ] [ 1 0 0 ]          [ 1 1 1 1 ] [ 1 ]
  [ 0 1 2 3 ] [ 1 1 1 ] [ c1 ]   [ 0 1 2 3 ] [ 0 ]
  [ 0 1 4 9 ] [ 1 2 4 ] [ c2 ] = [ 0 1 4 9 ] [ 1 ]
              [ 1 3 9 ] [ c3 ]               [ 2 ]

⇐⇒  [  4  6 14 ] [ c1 ]   [  4 ]       c1 = 0.9
     [  6 14 36 ] [ c2 ] = [  8 ]  ⇐⇒  c2 = −1.1
     [ 14 36 98 ] [ c3 ]   [ 22 ]       c3 = 0.5
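As an added check (not from the lecture), NumPy's polynomial fitter solves the same least squares problem:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 0.0, 1.0, 2.0])

# np.polyfit returns the highest-degree coefficient first.
c3, c2, c1 = np.polyfit(x, y, deg=2)
print(c1, c2, c3)  # 0.9, -1.1, 0.5
```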
Norm
The notion of norm generalizes the notion of length of a vector in Rⁿ.
Definition. Let V be a vector space. A function
α : V → R is called a norm on V if it has the
following properties:
(i) α(x) ≥ 0, α(x) = 0 only for x = 0 (positivity)
(ii) α(r x) = |r | α(x) for all r ∈ R (homogeneity)
(iii) α(x + y) ≤ α(x) + α(y) (triangle inequality)
Notation. The norm of a vector x ∈ V is usually denoted ‖x‖. Different norms on V are distinguished by subscripts, e.g., ‖x‖₁ and ‖x‖₂.
Examples. V = Rⁿ, x = (x1, x2, . . . , xn) ∈ Rⁿ.

• ‖x‖∞ = max(|x1|, |x2|, . . . , |xn|).
Positivity and homogeneity are obvious. The triangle inequality:
  |xi + yi| ≤ |xi| + |yi| ≤ maxⱼ |xj| + maxⱼ |yj|
  =⇒ maxⱼ |xj + yj| ≤ maxⱼ |xj| + maxⱼ |yj|

• ‖x‖₁ = |x1| + |x2| + · · · + |xn|.
Positivity and homogeneity are obvious. The triangle inequality: |xi + yi| ≤ |xi| + |yi|
  =⇒ Σⱼ |xj + yj| ≤ Σⱼ |xj| + Σⱼ |yj|
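An added numerical spot check of the triangle inequality for both norms, using NumPy's built-in vector norms:

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=5), rng.normal(size=5)

for p in (1, np.inf):
    lhs = np.linalg.norm(x + y, ord=p)
    rhs = np.linalg.norm(x, ord=p) + np.linalg.norm(y, ord=p)
    print(p, lhs <= rhs)  # True for both norms
```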
• ‖x‖p = (|x1|^p + |x2|^p + · · · + |xn|^p)^(1/p), p > 0.

Remark. ‖x‖₂ = the Euclidean length of x.

Theorem ‖x‖p is a norm on Rⁿ for any p ≥ 1.

Positivity and homogeneity are still obvious (and hold for any p > 0). The triangle inequality for p ≥ 1 is known as the Minkowski inequality:

  (|x1 + y1|^p + |x2 + y2|^p + · · · + |xn + yn|^p)^(1/p)
    ≤ (|x1|^p + · · · + |xn|^p)^(1/p) + (|y1|^p + · · · + |yn|^p)^(1/p).
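An added numerical illustration of why p ≥ 1 matters: for p < 1 the triangle inequality can fail (a sketch, with a hand-picked pair of vectors):

```python
import numpy as np

def p_norm(v, p):
    return (np.abs(v) ** p).sum() ** (1.0 / p)

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

for p in (0.5, 1, 2, 3):
    lhs = p_norm(x + y, p)
    rhs = p_norm(x, p) + p_norm(y, p)
    print(p, lhs <= rhs)  # False for p = 0.5, True for p >= 1
```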
Normed vector space
Definition. A normed vector space is a vector
space endowed with a norm.
The norm defines a distance function on the normed vector space: dist(x, y) = ‖x − y‖.
Then we say that a sequence x1, x2, . . . converges to a vector x if dist(x, xn) → 0 as n → ∞. Also, we say that a vector x is a good approximation of a vector x0 if dist(x, x0) is small.
Unit circle: ‖x‖ = 1

[Figure: unit circles in R² for four norms, plotted together:
  ‖x‖ = (x1² + x2²)^(1/2) (black),
  ‖x‖ = (½ x1² + x2²)^(1/2) (green),
  ‖x‖ = |x1| + |x2| (blue),
  ‖x‖ = max(|x1|, |x2|) (red).]
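To reproduce a picture like this, here is a short matplotlib sketch (an added example, not from the lecture; the norm-to-color pairing follows the list above):

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 2 * np.pi, 400)
d = np.column_stack([np.cos(t), np.sin(t)])  # directions around the circle

norms = {
    "black": lambda v: np.sqrt(v[:, 0]**2 + v[:, 1]**2),
    "green": lambda v: np.sqrt(0.5 * v[:, 0]**2 + v[:, 1]**2),
    "blue":  lambda v: np.abs(v[:, 0]) + np.abs(v[:, 1]),
    "red":   lambda v: np.maximum(np.abs(v[:, 0]), np.abs(v[:, 1])),
}

for color, n in norms.items():
    r = 1.0 / n(d)  # by homogeneity, r*u has unit norm when r = 1/norm(u)
    plt.plot(r * d[:, 0], r * d[:, 1], color=color)

plt.gca().set_aspect("equal")
plt.show()
```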
Examples. V = C[a, b], f : [a, b] → R.

• ‖f‖∞ = max_{a≤x≤b} |f(x)|.

• ‖f‖₁ = ∫ₐᵇ |f(x)| dx.

• ‖f‖p = (∫ₐᵇ |f(x)|^p dx)^(1/p), p > 0.

Theorem ‖f‖p is a norm on C[a, b] for any p ≥ 1.
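As an added illustration, these norms can be approximated numerically, for instance for f(x) = sin x on [0, π]; the sketch below assumes SciPy is available:

```python
import numpy as np
from scipy.integrate import quad

a, b = 0.0, np.pi
f = np.sin

# sup-norm, approximated on a fine grid
xs = np.linspace(a, b, 10_001)
sup_norm = np.abs(f(xs)).max()  # = 1

one_norm, _ = quad(lambda x: abs(f(x)), a, b)  # = 2

p = 2
p_norm = quad(lambda x: abs(f(x)) ** p, a, b)[0] ** (1 / p)  # sqrt(pi/2)

print(sup_norm, one_norm, p_norm)
```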
Inner product
The notion of inner product generalizes the notion of dot product of vectors in Rⁿ.
Definition. Let V be a vector space. A function β : V × V → R, usually denoted β(x, y) = ⟨x, y⟩, is called an inner product on V if it is positive, symmetric, and bilinear. That is, if
(i) ⟨x, x⟩ ≥ 0, and ⟨x, x⟩ = 0 only for x = 0 (positivity)
(ii) ⟨x, y⟩ = ⟨y, x⟩ (symmetry)
(iii) ⟨rx, y⟩ = r⟨x, y⟩ (homogeneity)
(iv) ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩ (distributive law)
An inner product space is a vector space endowed with an inner product.
Examples. V = Rⁿ.
• ⟨x, y⟩ = x · y = x1 y1 + x2 y2 + · · · + xn yn.
• ⟨x, y⟩ = d1 x1 y1 + d2 x2 y2 + · · · + dn xn yn, where d1, d2, . . . , dn > 0.
• ⟨x, y⟩ = (Dx) · (Dy), where D is an invertible n×n matrix.
Remarks. (a) Invertibility of D is necessary to show that ⟨x, x⟩ = 0 =⇒ x = 0.
(b) The second example is a particular case of the third one when D = diag(√d1, √d2, . . . , √dn).
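An added numerical sketch of the third example, spot-checking the four axioms with random vectors (the matrix D here is an arbitrary invertible choice for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
D = np.array([[2.0, 1.0],
              [0.0, 1.0]])  # invertible: det = 2

def inner(x, y):
    return (D @ x) @ (D @ y)

x, y, z = rng.normal(size=(3, 2))
r = 3.0

print(inner(x, x) >= 0)                                        # positivity
print(np.isclose(inner(x, y), inner(y, x)))                    # symmetry
print(np.isclose(inner(r * x, y), r * inner(x, y)))            # homogeneity
print(np.isclose(inner(x + y, z), inner(x, z) + inner(y, z)))  # distributive
```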
Counterexamples. V = R².
• ⟨x, y⟩ = x1 y1 − x2 y2.
Let v = (1, 2); then ⟨v, v⟩ = 1² − 2² = −3.
⟨x, y⟩ is symmetric and bilinear, but not positive.
• ⟨x, y⟩ = 2x1 y1 + x1 x2 + 2x2 y2 + y1 y2.
v = (1, 1), w = (1, 0) =⇒ ⟨v, w⟩ = 3, ⟨2v, w⟩ = 8.
⟨x, y⟩ is positive and symmetric, but not bilinear.
• ⟨x, y⟩ = x1 y1 + x1 y2 − x2 y1 + x2 y2.
v = (1, 1), w = (1, 0) =⇒ ⟨v, w⟩ = 0, ⟨w, v⟩ = 2.
⟨x, y⟩ is positive and bilinear, but not symmetric.
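These failures are easy to verify numerically; an added sketch (the helper functions b and s are just names for the second and third candidate products above):

```python
import numpy as np

v, w = np.array([1.0, 2.0]), np.array([1.0, 0.0])
u = np.array([1.0, 1.0])

# Not positive: <x, y> = x1 y1 - x2 y2
print(v[0] * v[0] - v[1] * v[1])  # -3, so <v, v> < 0

# Not bilinear: <x, y> = 2 x1 y1 + x1 x2 + 2 x2 y2 + y1 y2
def b(x, y):
    return 2 * x[0] * y[0] + x[0] * x[1] + 2 * x[1] * y[1] + y[0] * y[1]
print(b(u, w), b(2 * u, w))  # 3 and 8, not 3 and 6

# Not symmetric: <x, y> = x1 y1 + x1 y2 - x2 y1 + x2 y2
def s(x, y):
    return x[0] * y[0] + x[0] * y[1] - x[1] * y[0] + x[1] * y[1]
print(s(u, w), s(w, u))  # 0 and 2
```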