MATH 311
Topics in Applied Mathematics
Lecture 18: Least squares problems.
Norms and inner products.
Overdetermined system of linear equations:

  x + 2y = 3                 x + 2y = 3
  3x + 2y = 5      ⇐⇒          −4y = −4
  x + y = 2.09                  −y = −0.91

No solution: the system is inconsistent (after elimination, the second equation gives y = 1 while the third gives y = 0.91).
Assume that a solution (x0, y0) does exist, but the system is not quite accurate; namely, there may be some errors in the right-hand sides.
Problem. Find a good approximation of (x0, y0).
One approach is the least squares fit. Namely, we look for a pair (x, y) that minimizes the sum
  (x + 2y − 3)² + (3x + 2y − 5)² + (x + y − 2.09)².
Least squares solution
System of linear equations:

  a11 x1 + a12 x2 + · · · + a1n xn = b1
  a21 x1 + a22 x2 + · · · + a2n xn = b2
  · · · · · · · · ·
  am1 x1 + am2 x2 + · · · + amn xn = bm

⇐⇒ Ax = b
For any x ∈ Rn define the residual r(x) = b − Ax.
The least squares solution x to the system is the one that minimizes ‖r(x)‖ (or, equivalently, ‖r(x)‖²):

  ‖r(x)‖² = Σ_{i=1}^m (ai1 x1 + ai2 x2 + · · · + ain xn − bi)²
Let A be an m×n matrix and let b ∈ Rm.
Theorem. A vector x̂ is a least squares solution of the system Ax = b if and only if it is a solution of the associated normal system AᵀAx = Aᵀb.
Proof: Ax is an arbitrary vector in R(A), the column space of A. Hence the length of r(x) = b − Ax is minimal if Ax is the orthogonal projection of b onto R(A), that is, if r(x) is orthogonal to R(A).
We know that R(A)⊥ = N(Aᵀ), the nullspace of the transpose matrix. Thus x̂ is a least squares solution if and only if
  Aᵀr(x̂) = 0 ⇐⇒ Aᵀ(b − Ax̂) = 0 ⇐⇒ AᵀAx̂ = Aᵀb.
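To see the theorem in action, here is a minimal NumPy sketch (mine, not part of the lecture): it solves the normal system for an arbitrary overdetermined system and checks that the residual is orthogonal to the column space of A.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 2))   # 6 equations, 2 unknowns: overdetermined
b = rng.standard_normal(6)

# Solve the normal system A^T A x = A^T b
# (A^T A is invertible when A has full column rank).
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# The residual r(x̂) = b − A x̂ is orthogonal to R(A),
# so A^T r(x̂) should vanish up to rounding error.
r = b - A @ x_hat
print(A.T @ r)  # ≈ [0, 0]
```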
Problem. Find the least squares solution to

  x + 2y = 3
  3x + 2y = 5
  x + y = 2.09

In matrix form:

  [1 2] [x]   [3   ]
  [3 2] [y] = [5   ]
  [1 1]       [2.09]

The normal system AᵀAx = Aᵀb:

  [1 3 1] [1 2] [x]   [1 3 1] [3   ]
  [2 2 1] [3 2] [y] = [2 2 1] [5   ]
          [1 1]               [2.09]

  [11 9] [x]   [20.09]          x = 1
  [ 9 9] [y] = [18.09]   ⇐⇒    y = 1.01
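The arithmetic can be checked with NumPy's built-in least squares routine; a minimal sketch using the numbers above:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 2.0],
              [1.0, 1.0]])
b = np.array([3.0, 5.0, 2.09])

# np.linalg.lstsq minimizes ||b - Ax|| directly.
x, y = np.linalg.lstsq(A, b, rcond=None)[0]
print(x, y)  # 1.0  1.01 (up to rounding)
```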
Problem. Find the constant function that is the least squares fit to the following data:

  x    | 0 1 2 3
  f(x) | 1 0 1 2

f(x) = c  =⇒

  [1]       [1]        c = 1
  [1] (c) = [0]  =⇒   c = 0
  [1]       [1]        c = 1
  [1]       [2]        c = 2

The normal system:

  [1 1 1 1] [1]             [1]
            [1] (c) = [1 1 1 1] [0]
            [1]             [1]
            [1]             [2]

  4c = 4  =⇒  c = (1 + 0 + 1 + 2)/4 = 1   (the arithmetic mean of the data values)
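A quick numerical check of this fact (a sketch, not from the slides): fitting a constant by least squares is the same as taking the mean.

```python
import numpy as np

f = np.array([1.0, 0.0, 1.0, 2.0])
A = np.ones((4, 1))               # design matrix for f(x) = c

c = np.linalg.lstsq(A, f, rcond=None)[0][0]
assert np.isclose(c, f.mean())    # c = 1.0, the arithmetic mean
```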
Problem. Find the linear polynomial that is the least squares fit to the following data:

  x    | 0 1 2 3
  f(x) | 1 0 1 2

f(x) = c1 + c2x  =⇒

  [1 0]        [1]          c1       = 1
  [1 1] [c1]   [0]   =⇒    c1 + c2  = 0
  [1 2] [c2] = [1]          c1 + 2c2 = 1
  [1 3]        [2]          c1 + 3c2 = 2

The normal system:

  [1 1 1 1] [1 0]        [1 1 1 1] [1]
  [0 1 2 3] [1 1] [c1]   [0 1 2 3] [0]
            [1 2] [c2] =           [1]
            [1 3]                  [2]

  [4  6] [c1]   [4]           c1 = 0.4
  [6 14] [c2] = [8]   ⇐⇒     c2 = 0.4
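The same coefficients can be checked numerically; np.polyfit computes the least squares polynomial of a given degree (a sketch, not from the slides):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
f = np.array([1.0, 0.0, 1.0, 2.0])

# Degree-1 least squares fit; coefficients come highest power first.
c2, c1 = np.polyfit(x, f, 1)
print(c1, c2)  # 0.4  0.4, matching the normal-system solution above
```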
Norm
The notion of norm generalizes the notion of length
of a vector in Rn .
Definition. Let V be a vector space. A function
α : V → R is called a norm on V if it has the
following properties:
(i) α(x) ≥ 0; α(x) = 0 only for x = 0 (positivity)
(ii) α(rx) = |r| α(x) for all r ∈ R (homogeneity)
(iii) α(x + y) ≤ α(x) + α(y) (triangle inequality)
Notation. The norm of a vector x ∈ V is usually denoted ‖x‖. Different norms on V are distinguished by subscripts, e.g., ‖x‖₁ and ‖x‖₂.
Examples. V = Rn, x = (x1, x2, . . . , xn) ∈ Rn.
• ‖x‖∞ = max(|x1|, |x2|, . . . , |xn|).
Positivity and homogeneity are obvious.
The triangle inequality:
  |xi + yi| ≤ |xi| + |yi| ≤ max_j |xj| + max_j |yj|
  =⇒ max_j |xj + yj| ≤ max_j |xj| + max_j |yj|
• ‖x‖₁ = |x1| + |x2| + · · · + |xn|.
Positivity and homogeneity are obvious.
The triangle inequality: |xi + yi| ≤ |xi| + |yi|
  =⇒ Σ_j |xj + yj| ≤ Σ_j |xj| + Σ_j |yj|
Examples. V = Rn, x = (x1, x2, . . . , xn) ∈ Rn.
• ‖x‖p = (|x1|^p + |x2|^p + · · · + |xn|^p)^{1/p}, p > 0.
Theorem. ‖x‖p is a norm on Rn for any p ≥ 1.
Remark. ‖x‖₂ = Euclidean length of x.
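All of these norms are available in NumPy through the ord parameter of np.linalg.norm; a short sketch with an arbitrary example vector:

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])

print(np.linalg.norm(x, ord=np.inf))  # max norm: 4.0
print(np.linalg.norm(x, ord=1))       # 1-norm: 8.0
print(np.linalg.norm(x, ord=2))       # Euclidean norm: sqrt(26)
print(np.linalg.norm(x, ord=3))       # general p-norm with p = 3
```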
Definition. A normed vector space is a vector
space endowed with a norm.
The norm defines a distance function on the normed vector space: dist(x, y) = ‖x − y‖.
Then we say that a sequence x1, x2, . . . converges to a vector x if dist(x, xn) → 0 as n → ∞.
Unit circle: ‖x‖ = 1
[Figure: unit circles of four norms]
  ‖x‖ = (x1² + x2²)^{1/2}        (black)
  ‖x‖ = (½ x1² + x2²)^{1/2}      (green)
  ‖x‖ = |x1| + |x2|              (blue)
  ‖x‖ = max(|x1|, |x2|)          (red)
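The figure can be reproduced with a short matplotlib sketch (my reconstruction, with the colors as listed above): each curve is the set of points with ‖x‖ = 1, obtained by rescaling directions on the Euclidean circle.

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 2 * np.pi, 400)
u, v = np.cos(t), np.sin(t)   # directions around the origin

# Evaluate each norm on the direction (u, v); by homogeneity,
# (u, v) divided by its norm lies on that norm's unit circle.
norms = {
    "black": np.sqrt(u**2 + v**2),
    "green": np.sqrt(0.5 * u**2 + v**2),
    "blue":  np.abs(u) + np.abs(v),
    "red":   np.maximum(np.abs(u), np.abs(v)),
}
for color, n in norms.items():
    plt.plot(u / n, v / n, color=color)
plt.gca().set_aspect("equal")
plt.title("Unit circles of four norms")
plt.show()
```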
Examples. V = C[a, b], f : [a, b] → R.
• ‖f‖∞ = max_{a≤x≤b} |f(x)|.
• ‖f‖₁ = ∫ₐᵇ |f(x)| dx.
• ‖f‖p = ( ∫ₐᵇ |f(x)|^p dx )^{1/p}, p > 0.
Theorem. ‖f‖p is a norm on C[a, b] for any p ≥ 1.
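These function norms can be approximated numerically on a fine grid; a sketch for the example function f(x) = sin x on [0, π] (the function is my choice, not from the lecture):

```python
import numpy as np

a, b = 0.0, np.pi
x = np.linspace(a, b, 10_001)
f = np.sin(x)

norm_inf = np.max(np.abs(f))                 # sup norm: 1.0
norm_1 = np.trapz(np.abs(f), x)              # ∫|f| dx: 2.0  (np.trapezoid on NumPy >= 2.0)
norm_2 = np.sqrt(np.trapz(np.abs(f)**2, x))  # L2 norm: sqrt(pi/2)
```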
Inner product
The notion of inner product generalizes the notion
of dot product of vectors in Rn .
Definition. Let V be a vector space. A function β : V × V → R, usually denoted β(x, y) = ⟨x, y⟩, is called an inner product on V if it is positive, symmetric, and bilinear. That is, if
(i) ⟨x, x⟩ ≥ 0; ⟨x, x⟩ = 0 only for x = 0 (positivity)
(ii) ⟨x, y⟩ = ⟨y, x⟩ (symmetry)
(iii) ⟨rx, y⟩ = r⟨x, y⟩ (homogeneity)
(iv) ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩ (distributive law)
An inner product space is a vector space endowed with an inner product.
Examples. V = Rn.
• ⟨x, y⟩ = x · y = x1y1 + x2y2 + · · · + xnyn.
• ⟨x, y⟩ = d1x1y1 + d2x2y2 + · · · + dnxnyn, where d1, d2, . . . , dn > 0.
• ⟨x, y⟩ = (Dx) · (Dy), where D is an invertible n×n matrix.
Remarks. (a) Invertibility of D is necessary to show that ⟨x, x⟩ = 0 =⇒ x = 0.
(b) The second example is a particular case of the third one when D = diag(d1^{1/2}, d2^{1/2}, . . . , dn^{1/2}).
Problem. Find an inner product on R2 such that ⟨e1, e1⟩ = 2, ⟨e2, e2⟩ = 3, and ⟨e1, e2⟩ = −1, where e1 = (1, 0), e2 = (0, 1).
Let x = (x1, x2), y = (y1, y2) ∈ R2.
Then x = x1e1 + x2e2, y = y1e1 + y2e2.
Using bilinearity, we obtain
  ⟨x, y⟩ = ⟨x1e1 + x2e2, y1e1 + y2e2⟩
  = x1⟨e1, y1e1 + y2e2⟩ + x2⟨e2, y1e1 + y2e2⟩
  = x1y1⟨e1, e1⟩ + x1y2⟨e1, e2⟩ + x2y1⟨e2, e1⟩ + x2y2⟨e2, e2⟩
  = 2x1y1 − x1y2 − x2y1 + 3x2y2.
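Equivalently, ⟨x, y⟩ = xᵀGy for the Gram matrix G = [[2, −1], [−1, 3]]; a small NumPy sketch (mine) verifying the prescribed values and the positivity of G:

```python
import numpy as np

G = np.array([[2.0, -1.0],
              [-1.0, 3.0]])

def ip(x, y):
    # <x, y> = x^T G y = 2*x1*y1 - x1*y2 - x2*y1 + 3*x2*y2
    return x @ G @ y

e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(ip(e1, e1), ip(e2, e2), ip(e1, e2))  # 2.0  3.0  -1.0

# Positivity: all eigenvalues of the symmetric matrix G are positive.
print(np.linalg.eigvalsh(G))
```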
Examples. V = C[a, b].
• ⟨f, g⟩ = ∫ₐᵇ f(x)g(x) dx.
• ⟨f, g⟩ = ∫ₐᵇ f(x)g(x)w(x) dx, where w is bounded, piecewise continuous, and w > 0 everywhere on [a, b]; w is called the weight function.
Theorem. Suppose ⟨x, y⟩ is an inner product on a vector space V. Then
  ⟨x, y⟩² ≤ ⟨x, x⟩⟨y, y⟩ for all x, y ∈ V.
Proof: For any t ∈ R let vt = x + ty. Then
  ⟨vt, vt⟩ = ⟨x, x⟩ + 2t⟨x, y⟩ + t²⟨y, y⟩.
The right-hand side is a quadratic polynomial in t (provided that y ≠ 0). Since ⟨vt, vt⟩ ≥ 0 for all t, the discriminant D is nonpositive. But
  D = 4⟨x, y⟩² − 4⟨x, x⟩⟨y, y⟩.
Cauchy-Schwarz Inequality:
  |⟨x, y⟩| ≤ √⟨x, x⟩ · √⟨y, y⟩.
Corollary 1. |x · y| ≤ |x| |y| for all x, y ∈ Rn.
Equivalently, for all xi, yi ∈ R,
  (x1y1 + · · · + xnyn)² ≤ (x1² + · · · + xn²)(y1² + · · · + yn²).
Corollary 2. For any f, g ∈ C[a, b],
  ( ∫ₐᵇ f(x)g(x) dx )² ≤ ∫ₐᵇ |f(x)|² dx · ∫ₐᵇ |g(x)|² dx.
Norms induced by inner products
Theorem. Suppose ⟨x, y⟩ is an inner product on a vector space V. Then ‖x‖ = √⟨x, x⟩ is a norm.
Proof: Positivity is obvious. Homogeneity:
  ‖rx‖ = √⟨rx, rx⟩ = √(r²⟨x, x⟩) = |r| √⟨x, x⟩.
Triangle inequality (follows from the Cauchy-Schwarz inequality):
  ‖x + y‖² = ⟨x + y, x + y⟩
  = ⟨x, x⟩ + ⟨x, y⟩ + ⟨y, x⟩ + ⟨y, y⟩
  ≤ ⟨x, x⟩ + |⟨x, y⟩| + |⟨y, x⟩| + ⟨y, y⟩
  ≤ ‖x‖² + 2‖x‖ ‖y‖ + ‖y‖² = (‖x‖ + ‖y‖)².
Examples. • The length of a vector in Rn,
  |x| = √(x1² + x2² + · · · + xn²),
is the norm induced by the dot product
  x · y = x1y1 + x2y2 + · · · + xnyn.
• The norm ‖f‖₂ = ( ∫ₐᵇ |f(x)|² dx )^{1/2} on the vector space C[a, b] is induced by the inner product
  ⟨f, g⟩ = ∫ₐᵇ f(x)g(x) dx.
Angle
Since |⟨x, y⟩| ≤ ‖x‖ ‖y‖, we can define the angle between nonzero vectors in any vector space with an inner product (and induced norm):
  ∠(x, y) = arccos( ⟨x, y⟩ / (‖x‖ ‖y‖) ).
Then ⟨x, y⟩ = ‖x‖ ‖y‖ cos ∠(x, y).
In particular, vectors x and y are orthogonal (denoted x ⊥ y) if ⟨x, y⟩ = 0.
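In floating point arithmetic the ratio ⟨x, y⟩/(‖x‖ ‖y‖) can land slightly outside [−1, 1], so it is worth clipping before taking arccos; a sketch for the dot product on Rn (my example values):

```python
import numpy as np

def angle(x, y):
    # Angle between nonzero vectors under the dot product.
    c = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(c, -1.0, 1.0))

x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])
print(np.degrees(angle(x, y)))  # 45.0
```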
[Figure: right triangle with legs x, y and hypotenuse x + y]
Pythagorean Theorem:
  x ⊥ y =⇒ ‖x + y‖² = ‖x‖² + ‖y‖²
Proof:
  ‖x + y‖² = ⟨x + y, x + y⟩
  = ⟨x, x⟩ + ⟨x, y⟩ + ⟨y, x⟩ + ⟨y, y⟩
  = ⟨x, x⟩ + ⟨y, y⟩ = ‖x‖² + ‖y‖².
[Figure: parallelogram with sides x, y and diagonals x + y, x − y]
Parallelogram Identity:
  ‖x + y‖² + ‖x − y‖² = 2‖x‖² + 2‖y‖²
Proof: ‖x+y‖² = ⟨x+y, x+y⟩ = ⟨x, x⟩ + ⟨x, y⟩ + ⟨y, x⟩ + ⟨y, y⟩.
Similarly, ‖x−y‖² = ⟨x, x⟩ − ⟨x, y⟩ − ⟨y, x⟩ + ⟨y, y⟩.
Then ‖x+y‖² + ‖x−y‖² = 2⟨x, x⟩ + 2⟨y, y⟩ = 2‖x‖² + 2‖y‖².