Jim Lambers
MAT 419/519
Summer Session 2011-12
Lecture 13 Notes
These notes correspond to Section 4.1 in the text.
Least Squares Fit
One of the most fundamental problems in science and engineering is data fitting: constructing a function that, in some sense, conforms to given data points. One type of data-fitting technique is interpolation. Interpolation techniques, of any kind, construct functions that agree exactly with the data. That is, given points $(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)$, interpolation yields a function $f(x)$ such that $f(x_i) = y_i$ for $i = 1, 2, \ldots, m$.
However, fitting the data exactly may not be the best approach to describing the data with
a function. High-degree polynomial interpolation can yield oscillatory functions that behave very
differently than a smooth function from which the data is obtained. Also, it may be pointless to
try to fit data exactly, for if it is obtained by previous measurements or other computations, it may
be erroneous. Therefore, we consider another notion of what constitutes a “best fit” of given data
by a function.
One alternative approach to data fitting is to solve the minimax problem, which is the problem
of finding a function f (x) of a given form for which
$$\max_{1 \le i \le m} |f(x_i) - y_i|$$
is minimized. However, this is a very difficult problem to solve.
Another approach is to minimize the total absolute deviation of f (x) from the data. That is,
we seek a function f (x) of a given form for which
$$\sum_{i=1}^{m} |f(x_i) - y_i|$$
is minimized. However, we cannot apply standard minimization techniques to this function, because, like the absolute value function that it employs, it is not differentiable.
This defect is overcome by considering the problem of finding f (x) of a given form for which
$$\sum_{i=1}^{m} [f(x_i) - y_i]^2$$
is minimized. This is known as the least squares problem. We will first show how this problem is
solved for the case where f (x) is a linear function of the form f (x) = a1 x + a0 , and then generalize
this solution to other types of functions.
When f (x) is linear, the least squares problem is the problem of finding constants a0 and a1
such that the function
$$E(a_0, a_1) = \sum_{i=1}^{m} (a_1 x_i + a_0 - y_i)^2$$
is minimized. In order to minimize this function of a0 and a1 , we must compute its partial derivatives
with respect to a0 and a1 . This yields
$$\frac{\partial E}{\partial a_0} = \sum_{i=1}^{m} 2(a_1 x_i + a_0 - y_i), \qquad
\frac{\partial E}{\partial a_1} = \sum_{i=1}^{m} 2(a_1 x_i + a_0 - y_i)\, x_i.$$
At a minimum, both of these partial derivatives must be equal to zero. This yields the system of
linear equations
$$m a_0 + \left( \sum_{i=1}^{m} x_i \right) a_1 = \sum_{i=1}^{m} y_i,$$
$$\left( \sum_{i=1}^{m} x_i \right) a_0 + \left( \sum_{i=1}^{m} x_i^2 \right) a_1 = \sum_{i=1}^{m} x_i y_i.$$
These equations are called the normal equations.
Using the formula for the inverse of a 2 × 2 matrix,
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix},$$
we obtain the solutions
$$a_0 = \frac{\left( \sum_{i=1}^{m} x_i^2 \right) \left( \sum_{i=1}^{m} y_i \right) - \left( \sum_{i=1}^{m} x_i \right) \left( \sum_{i=1}^{m} x_i y_i \right)}{m \sum_{i=1}^{m} x_i^2 - \left( \sum_{i=1}^{m} x_i \right)^2}, \qquad
a_1 = \frac{m \sum_{i=1}^{m} x_i y_i - \left( \sum_{i=1}^{m} x_i \right) \left( \sum_{i=1}^{m} y_i \right)}{m \sum_{i=1}^{m} x_i^2 - \left( \sum_{i=1}^{m} x_i \right)^2}.$$
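These closed-form formulas translate directly into code. The following is a minimal Python/NumPy sketch (the function and variable names are illustrative, not from the text); applied to the data in Table 1 below, it should reproduce the coefficients computed in the example that follows.

```python
import numpy as np

def linear_least_squares(x, y):
    """Least-squares line y = a1*x + a0, using the closed-form formulas above."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = len(x)
    sx, sy = x.sum(), y.sum()
    sxx, sxy = (x * x).sum(), (x * y).sum()
    denom = m * sxx - sx**2            # m*sum(x_i^2) - (sum(x_i))^2
    a0 = (sxx * sy - sx * sxy) / denom
    a1 = (m * sxy - sx * sy) / denom
    return a0, a1
```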
Example We wish to find the linear function y = a1 x + a0 that best approximates the data shown
in Table 1, in the least-squares sense. Using the summations
$$\sum_{i=1}^{m} x_i = 56.2933, \quad \sum_{i=1}^{m} y_i = 73.8373, \quad \sum_{i=1}^{m} x_i^2 = 380.5426, \quad \sum_{i=1}^{m} x_i y_i = 485.9487,$$
we obtain
$$a_0 = \frac{380.5426 \cdot 73.8373 - 56.2933 \cdot 485.9487}{10 \cdot 380.5426 - 56.2933^2} = \frac{742.5703}{636.4906} = 1.1667,$$
$$a_1 = \frac{10 \cdot 485.9487 - 56.2933 \cdot 73.8373}{10 \cdot 380.5426 - 56.2933^2} = \frac{702.9438}{636.4906} = 1.1044.$$
 i      xi        yi
 1      2.0774    3.3123
 2      2.3049    3.8982
 3      3.0125    4.6500
 4      4.7092    6.5576
 5      5.5016    7.5173
 6      5.8704    7.0415
 7      6.2248    7.7497
 8      8.4431   11.0451
 9      8.7594    9.8179
10      9.3900   12.2477
Table 1: Data points (xi , yi ), for i = 1, 2, . . . , 10, to be fit by a linear function
We conclude that the linear function that best fits this data in the least-squares sense is
y = 1.1044x + 1.1667.
The data, and this function, are shown in Figure 1.
It is interesting to note that if we define the m × 2 matrix A, the 2-vector a, and the m-vector
y by
$$A = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_m \end{pmatrix}, \qquad
a = \begin{pmatrix} a_0 \\ a_1 \end{pmatrix}, \qquad
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix},$$
then a is the solution to the system of equations
$$A^T A a = A^T y.$$
These equations are the normal equations defined earlier, written in matrix-vector form. They arise
from the problem of finding the vector a such that
$$\|Aa - y\|$$
is minimized, where, for any vector $u$, $\|u\|$ is the magnitude, or length, of $u$.
This magnitude is equivalent to the square root of the expression we originally intended to minimize,
$$\sum_{i=1}^{m} (a_1 x_i + a_0 - y_i)^2,$$
but we will see that the normal equations also characterize the solution a, an n-vector, to the more general linear least squares problem of minimizing $\|Aa - y\|$ for any matrix A that is m × n, where m ≥ n, and whose columns are linearly independent.

Figure 1: Data points $(x_i, y_i)$ (circles) and least-squares line (solid line)
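In this matrix-vector form, the computation is a few lines of NumPy. The sketch below (with illustrative names, not from the text) builds A for the linear case and solves the normal equations directly; library routines such as numpy.linalg.lstsq solve the same minimization without forming $A^T A$ explicitly.

```python
import numpy as np

def fit_line_normal_equations(x, y):
    """Solve A^T A a = A^T y for the line f(x) = a0 + a1*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones_like(x), x])   # columns: 1 and x_i
    a0, a1 = np.linalg.solve(A.T @ A, A.T @ y)  # normal equations
    return a0, a1
```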
We now consider the problem of finding a polynomial of degree n that gives the best least-squares fit. As before, let $(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)$ be given data points that need to be approximated by a polynomial of degree n. We assume that n < m − 1, for otherwise, we can use polynomial interpolation to fit the points exactly.
Let the least-squares polynomial have the form
$$p_n(x) = \sum_{j=0}^{n} a_j x^j.$$
Our goal is to minimize the sum of squares of the deviations of $p_n(x)$ from each y-value,
$$E(a) = \sum_{i=1}^{m} [p_n(x_i) - y_i]^2 = \sum_{i=1}^{m} \left( \sum_{j=0}^{n} a_j x_i^j - y_i \right)^2,$$
where a is a column vector of the unknown coefficients of $p_n(x)$,
$$a = \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{pmatrix}.$$
Differentiating this function with respect to each $a_k$ yields
$$\frac{\partial E}{\partial a_k} = \sum_{i=1}^{m} 2 \left( \sum_{j=0}^{n} a_j x_i^j - y_i \right) x_i^k, \qquad k = 0, 1, \ldots, n.$$
Setting each of these partial derivatives equal to zero yields the system of equations
$$\sum_{j=0}^{n} \left( \sum_{i=1}^{m} x_i^{j+k} \right) a_j = \sum_{i=1}^{m} x_i^k y_i, \qquad k = 0, 1, \ldots, n.$$
These are the normal equations. They are a generalization of the normal equations previously
defined for the linear case, where n = 1. Solving this system yields the coefficients $\{a_j\}_{j=0}^{n}$ of the least-squares polynomial $p_n(x)$.
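The system above can be assembled directly from power sums of the data. A brief NumPy sketch, with hypothetical names (not from the text):

```python
import numpy as np

def poly_least_squares(x, y, n):
    """Degree-n least-squares polynomial via the normal equations above.

    G[k, j] = sum_i x_i^(j+k) and b[k] = sum_i x_i^k * y_i.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    G = np.array([[np.sum(x**(j + k)) for j in range(n + 1)]
                  for k in range(n + 1)])
    b = np.array([np.sum(x**k * y) for k in range(n + 1)])
    return np.linalg.solve(G, b)   # coefficients a_0, ..., a_n
```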
As in the linear case, the normal equations can be written in matrix-vector form
$$A^T A a = A^T y,$$
where
$$A = \begin{pmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^n \\ 1 & x_2 & x_2^2 & \cdots & x_2^n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_m & x_m^2 & \cdots & x_m^n \end{pmatrix}, \qquad
a = \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{pmatrix}, \qquad
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}.$$
The normal equations can be used to compute the coefficients of any linear combination of functions $\{\phi_j(x)\}_{j=0}^{n}$ that best fits data in the least-squares sense, provided that these functions are linearly independent. In this general case, the entries of the matrix A are given by $a_{ij} = \phi_j(x_i)$, for i = 1, 2, . . . , m and j = 0, 1, . . . , n.
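In code, the same recipe works for any choice of basis functions: build A column by column from the $\phi_j$ evaluated at the $x_i$, then solve the normal equations. A minimal sketch (the function names and the example basis are illustrative, not from the text):

```python
import numpy as np

def basis_least_squares(x, y, basis):
    """Least-squares coefficients a_j for f(x) = sum_j a_j * phi_j(x)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([phi(x) for phi in basis])   # A[i, j] = phi_j(x_i)
    return np.linalg.solve(A.T @ A, A.T @ y)

# Example: the monomial basis {1, x, x^2} recovers the quadratic fit below.
quadratic_basis = [lambda t: np.ones_like(t), lambda t: t, lambda t: t**2]
```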
Example We wish to find the quadratic function $y = a_2 x^2 + a_1 x + a_0$ that best approximates the data shown in Table 2, in the least-squares sense. By defining
$$A = \begin{pmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ \vdots & \vdots & \vdots \\ 1 & x_{10} & x_{10}^2 \end{pmatrix}, \qquad
a = \begin{pmatrix} a_0 \\ a_1 \\ a_2 \end{pmatrix}, \qquad
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_{10} \end{pmatrix},$$
 i      xi        yi
 1      2.0774    2.7212
 2      2.3049    3.7798
 3      3.0125    4.8774
 4      4.7092    6.6596
 5      5.5016   10.5966
 6      5.8704    9.8786
 7      6.2248   10.5232
 8      8.4431   23.3574
 9      8.7594   24.0510
10      9.3900   27.4827
Table 2: Data points (xi , yi ), for i = 1, 2, . . . , 10, to be fit by a quadratic function
and solving the normal equations
$$A^T A a = A^T y,$$
we obtain the coefficients
$$a_0 = 4.7681, \qquad a_1 = -1.5193, \qquad a_2 = 0.4251,$$
and conclude that the quadratic function that best fits this data in the least-squares sense is
$$y = 0.4251 x^2 - 1.5193 x + 4.7681.$$
The data, and this function, are shown in Figure 2.
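As a quick check (not part of the original notes), NumPy's polyfit applied to the data of Table 2 should reproduce these coefficients up to rounding; polyfit returns the coefficients from highest degree to lowest.

```python
import numpy as np

# Data from Table 2.
x = np.array([2.0774, 2.3049, 3.0125, 4.7092, 5.5016,
              5.8704, 6.2248, 8.4431, 8.7594, 9.3900])
y = np.array([2.7212, 3.7798, 4.8774, 6.6596, 10.5966,
              9.8786, 10.5232, 23.3574, 24.0510, 27.4827])

# Expect approximately [0.4251, -1.5193, 4.7681].
print(np.polyfit(x, y, 2))
```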
Least-squares fitting can also be used to fit data with functions that are not linear combinations
of functions such as polynomials. Suppose we believe that given data points can best be matched
to an exponential function of the form $y = b e^{ax}$, where the constants a and b are unknown. Taking
the natural logarithm of both sides of this equation yields
ln y = ln b + ax.
If we define $z = \ln y$ and $c = \ln b$, then the problem of fitting the original data points $\{(x_i, y_i)\}_{i=1}^{m}$ with an exponential function is transformed into the problem of fitting the data points $\{(x_i, z_i)\}_{i=1}^{m}$ with a linear function of the form $c + ax$, for unknown constants a and c.
Similarly, suppose the given data is believed to approximately conform to a function of the form
$y = b x^a$, where the constants a and b are unknown. Taking the natural logarithm of both sides of
this equation yields
ln y = ln b + a ln x.
Figure 2: Data points (xi , yi ) (circles) and quadratic least-squares fit (solid curve)
If we define $z = \ln y$, $c = \ln b$ and $w = \ln x$, then the problem of fitting the original data points $\{(x_i, y_i)\}_{i=1}^{m}$ with a constant times a power of x is transformed into the problem of fitting the data points $\{(w_i, z_i)\}_{i=1}^{m}$ with a linear function of the form $c + aw$, for unknown constants a and c.
Example We wish to find the exponential function y = beax that best approximates the data
shown in Table 3, in the least-squares sense. By defining
$$A = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_5 \end{pmatrix}, \qquad
c = \begin{pmatrix} c \\ a \end{pmatrix}, \qquad
z = \begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_5 \end{pmatrix},$$
where c = ln b and zi = ln yi for i = 1, 2, . . . , 5, and solving the normal equations
AT Ac = AT z,
we obtain the coefficients
$$a = 0.4040, \qquad b = e^c = e^{-0.2652} = 0.7670,$$
i      xi        yi
1      2.0774    1.4509
2      2.3049    2.8462
3      3.0125    2.1536
4      4.7092    4.7438
5      5.5016    7.7260
Table 3: Data points (xi , yi ), for i = 1, 2, . . . , 5, to be fit by an exponential function
and conclude that the exponential function that best fits this data in the least-squares sense is
$$y = 0.7670\, e^{0.4040 x}.$$
The data, and this function, are shown in Figure 3.
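Again as a check (not part of the original notes), the transformed fit can be carried out with NumPy on the data of Table 3: fit a line to $(x_i, \ln y_i)$, then exponentiate the intercept to recover b.

```python
import numpy as np

# Data from Table 3.
x = np.array([2.0774, 2.3049, 3.0125, 4.7092, 5.5016])
y = np.array([1.4509, 2.8462, 2.1536, 4.7438, 7.7260])

# Fit z = ln(y) with a line z = a*x + c, then b = e^c.
a, c = np.polyfit(x, np.log(y), 1)
print(a, np.exp(c))   # expect approximately 0.4040 and 0.7670
```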
It can be seen from the preceding discussion and examples that the normal equations can be
used to solve any problem that requires finding the vector x ∈ Rn that minimizes
$$\|b - Ax\|,$$
where b ∈ Rm , m ≥ n, and A is an m × n matrix with linearly independent columns, regardless of
the interpretation of these columns.
To see this, we define the function
$$\varphi(x) = \|b - Ax\|^2, \qquad x \in \mathbb{R}^n.$$
Then, it can be shown through differentiation that
$$\nabla \varphi(x) = 2(A^T A x - A^T b), \qquad H_\varphi(x) = 2 A^T A.$$
If $x \ne 0$, then $Ax \ne 0$ because A has linearly independent columns. It follows that
$$x \cdot A^T A x = (Ax) \cdot (Ax) = \|Ax\|^2 > 0,$$
so $H_\varphi(x)$ is positive definite on $\mathbb{R}^n$. This leads to the following theorem.
Theorem Let A be an m × n matrix with linearly independent columns, and let $b \in \mathbb{R}^m$. Then the vector $x^*$ defined by
$$x^* = (A^T A)^{-1} A^T b,$$
that solves the normal equations $A^T A x = A^T b$, is the strict global minimizer of
$$\|b - Ax\|, \qquad x \in \mathbb{R}^n.$$
Figure 3: Data points (xi , yi ) (circles) and exponential least-squares fit (solid curve)
The matrix
$$A^+ = (A^T A)^{-1} A^T$$
is called the pseudo-inverse, or generalized inverse, of A. When A is a square, invertible matrix, then $A^+ = A^{-1}$. Otherwise, $A^+$ is the matrix that, as closely as possible, serves as an inverse of A. It should be noted that the condition that A has linearly independent columns is essential, so that $A^T A$ is invertible.
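The pseudo-inverse is easy to experiment with numerically. In the sketch below (the matrix and right-hand side are made up for illustration), $(A^T A)^{-1} A^T$ agrees with NumPy's pinv whenever the columns of A are linearly independent, and $A^+ b$ is the least-squares solution.

```python
import numpy as np

# Hypothetical full-column-rank A and right-hand side b.
A = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, 5.0], [1.0, 7.0]])
b = np.array([1.0, 2.0, 4.0, 5.0])

A_plus = np.linalg.inv(A.T @ A) @ A.T    # (A^T A)^{-1} A^T
x_star = A_plus @ b                      # minimizer of ||b - Ax||

print(np.allclose(A_plus, np.linalg.pinv(A)))   # True for these columns
print(x_star)
```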
Exercises
1. Chapter 4, Exercise 1
2. Chapter 4, Exercise 4
3. Chapter 4, Exercise 7
4. Chapter 4, Exercise 10