Polynomial Curve of Best Fit

Assume that we have collected a set of n data points S = {(x1, y1), (x2, y2), …, (xn, yn)} from an
experiment or from measuring some physical situation. We can plot the data (often called a scatter
plot) but usually the points will not fall on a single straight line. Previously we used linear interpolation
to connect all the points. We saw that the result can be “spiky” and the pieces can have different
slopes. An alternative procedure is to try to determine one line which comes closest to all the points,
but need not go through (interpolate) any of the points. The line in this case is called the line of best
fit to the data. (Note that this idea is very different from interpolation, which requires that the curve
pass through all of the points.)
Here we discuss a way to determine the line of best fit graphically and algebraically within MATLAB.
We describe how the equation of the line of best fit is computed, but omit the details. In addition we
show how to determine a quadratic of best fit when the data seems to have a parabolic shape.
We used the following example previously when we discussed linear interpolation. First we use
MATLAB to approximate and determine the line of best fit and then discuss and compare quadratic fits.
(There are cubic fits, quartic fits, quintic fits, and in general polynomial fits of degree n.)
Example: The following is a set of temperature measurements taken from the cylinder head of a new
engine that is being tested for possible use in a race car. (Source: D. Etter, Engineering Problem
Solving in MATLAB)
Time, t            0    1    2    3    4    5
Temperature, ºF    0   20   60   68   77  110
We use the following code to generate the scatter plot shown in the next figure.
>> t=0:5;
>> temp=[0 20 60 68 77 110];
>> plot(t,temp,'*k')
Imagine using a ruler to approximate a line that comes closest to all the points but need not
interpolate any point. You could probably make a good guess, but your guess is unlikely to be
exactly the line of best fit.
We need to agree on what “best fit” means. The usual meaning of “best fit” requires that the line
of best fit y = mx + b be constructed so that the square root of the sum of the squares of the
vertical distances from the data points to the line is minimized. Yes, this is a max-min computation.
As such, it requires the use of (partial) derivatives and the solution of a set of equations. We will use
MATLAB to perform these computations.
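The minimization described above can be written out as follows. This is a sketch of the standard least squares formulation, not a derivation taken from this handout; the symbol E for the quantity being minimized is our own notation.

```latex
% Quantity minimized over the slope m and intercept b
E(m,b) = \sqrt{\sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr)^2}

% Minimizing E is equivalent to minimizing E^2. Setting the partial
% derivatives of E^2 with respect to m and b equal to zero gives two
% linear equations in the two unknowns m and b:
m \sum_{i=1}^{n} x_i^2 + b \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i
m \sum_{i=1}^{n} x_i + b\, n = \sum_{i=1}^{n} y_i
```

Solving this pair of equations gives the slope and intercept of the least squares line.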
For now let’s see how we can approximate the line of best fit using an interactive MATLAB routine
called lsqgame. (The line of best fit is often referred to as the least squares line.)
In this interactive 'game' you get two guesses for the least squares line by using the mouse to select
two points that are then connected to generate your approximate least squares line. Here is an
example of the graphs generated in lsqgame for the time-temperature data.
The routine permits a second try, as shown next.
The line of best fit can be computed within lsqgame, as shown in the next figure. Note that both the
blue and magenta lines were rather good approximations.
MATLAB provides an easy method for computing the equation and graphing the line of best fit once
you have created the scatter plot. Here is our scatter plot for the example and the code that generated
it. Read the directions on the graph.
>> t=0:5;
>> temp=[0 20 60 68 77 110];
>> plot(t,temp,'*k') %Making the scatter plot
See the next graph.
If you want to see an equation of the line of best fit, click on the arrow at the bottom of the Basic
Fitting drop down menu. See below.
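The same line can also be obtained from the command line. The following is a minimal sketch, assuming the built-in functions polyfit, polyval, and norm rather than the Basic Fitting menu; the variable names p, fitted, and resnorm are our own.

```matlab
% Time-temperature data from the example
t = 0:5;
temp = [0 20 60 68 77 110];

% Fit a polynomial of degree 1: p(1) is the slope m, p(2) is the intercept b
p = polyfit(t, temp, 1);

% Evaluate the fitted line at the data points
fitted = polyval(p, t);

% Square root of the sum of the squares of the vertical distances
resnorm = norm(temp - fitted);

fprintf('y = %.4f*t + %.4f\n', p(1), p(2));
fprintf('norm of the residual = %.4f\n', resnorm);
```

For this data the line comes out to approximately y = 20.83t + 3.76. To overlay it on the scatter plot, one option is plot(t,temp,'*k',t,fitted,'-b').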
If the scatter plot seems to indicate that a curve would produce a better model of the data, we have a
variety of options. The one we discuss here is the quadratic of best fit, or quadratic least squares fit.
There is another “game” we can use to get a feel for creating the quadratic; the MATLAB file is
quadgame, and it is played the same way lsqgame is played. Again, MATLAB has an easy way to
compute the corresponding equation of the parabola of best fit and graph it.
Example: Use the following code to create a scatter plot of the data in the following table. The figure
shows both a quadratic and a linear least squares fit to the data. We used the Basic Fitting drop down
menu obtained by clicking on Tools. In addition, the equations for each least squares fit are shown.
(Source: Hanselman and Littlefield, Mastering MATLAB, 2012.)
>> x=0:.1:1;
>> y=[-0.447 1.978 3.28 6.16 7.08 7.34 7.66 9.56 9.48 9.30 11.2];
>> plot(x,y,'*k')
>> title('Quadratic and Linear Least Squares')
Inspect the Basic Fitting menu, particularly the boxes that are checked. These indicate how the
graph was obtained.
As we indicated previously, “best fit” requires that the polynomial (linear, quadratic, cubic, etc.) of
best fit be constructed so that the square root of the sum of the squares of the vertical distances from
the curve of best fit to the data points is minimized. The quantity “square root of the sum of the
squares of vertical distances” is called the norm of the residual. In the Basic Fitting menu, clicking
the arrow at the bottom will give the equation of the curve of best fit and display the norm of the
residual. For our example, the norm of the residual of the quadratic fit is smaller than the norm of
the residual of the linear fit.
If we think of the norm of the residual as an “error measure”, then the quadratic is a “better” fit to the
data.
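The comparison of the two norms can be reproduced at the command line. This is a sketch under the assumption that polyfit, polyval, and norm are used in place of the Basic Fitting menu; the names p1, p2, r1, and r2 are our own.

```matlab
% Data from the quadratic example
x = 0:.1:1;
y = [-0.447 1.978 3.28 6.16 7.08 7.34 7.66 9.56 9.48 9.30 11.2];

% Least squares fits of degree 1 (line) and degree 2 (parabola)
p1 = polyfit(x, y, 1);
p2 = polyfit(x, y, 2);

% Norm of the residual for each fit
r1 = norm(y - polyval(p1, x));
r2 = norm(y - polyval(p2, x));

fprintf('linear:    norm of the residual = %.4f\n', r1);
fprintf('quadratic: norm of the residual = %.4f\n', r2);
```

The coefficients in p1 and p2 are listed from the highest power down, so p2(1) multiplies x^2.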
Will this always be the case? Explain. (Hint: think of the minimization process for the quadratic fit
as a search over all polynomials of degree 2 or less.)
Could you find a best fit of a polynomial of degree n so that the norm of the residual is zero?
Explain.
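One way to explore these questions before answering them is a small numerical experiment. The sketch below is only an experiment on the time-temperature data, not an explanation; it assumes polyfit, polyval, and norm, and the names maxdeg and resnorms are our own.

```matlab
% Time-temperature data from the first example
t = 0:5;
temp = [0 20 60 68 77 110];

% Try every degree from 1 up to (number of points - 1)
maxdeg = numel(t) - 1;
resnorms = zeros(1, maxdeg);

for d = 1:maxdeg
    p = polyfit(t, temp, d);                    % least squares fit of degree d
    resnorms(d) = norm(temp - polyval(p, t));   % norm of the residual
    fprintf('degree %d: norm of the residual = %.6g\n', d, resnorms(d));
end
```

Watch how the printed norms change as the degree increases, and what happens when the degree reaches one less than the number of data points.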
Warning: A least squares line or quadratic is not guaranteed to produce a good
approximation for every data set.