Determining a best-fit straight line in Mathematica.

Mathematica offers several methods for fitting equations to data, and when fitting to a line is required, there
are three convenient options. Please feel free to use this notebook as a template for your own work.
All three options require the data to be presented as a list of lists, {{x1, y1}, {x2, y2}, ..., {xn, yn}}. You could
enter the data like this, but I find it easier to enter two separate lists, one for x and another for y, and then
use the Transpose function to do the necessary rearrangement. Let's input some data. I've chosen some
values for the molar heat capacity (in J mol⁻¹ K⁻¹) of carbon monoxide as a function of the kelvin temperature. (From an example in Chapter 22 of D.A. McQuarrie, Mathematical Methods for Physical Chemistry)
xData = {600., 650., 700., 750., 800., 850., 900., 950., 1000.}
yData = {30.93, 31.54, 31.32, 32.18, 32.25, 32.27, 33.41, 33.21, 33.97}
{600., 650., 700., 750., 800., 850., 900., 950., 1000.}
{30.93, 31.54, 31.32, 32.18, 32.25, 32.27, 33.41, 33.21, 33.97}
Now do the transpose.
fitData = Transpose[{xData, yData}]
600.     30.93
650.     31.54
700.     31.32
750.     32.18
800.     32.25
850.     32.27
900.     33.41
950.     33.21
1000.    33.97
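If you want to confirm that the rearrangement did what you expected, a quick check along these lines might help. This is only a sketch; the names xColumn and yColumn are my own illustrative choices and are not used elsewhere in this notebook.

```mathematica
(* Same data as above, repeated so this cell runs on its own *)
xData = {600., 650., 700., 750., 800., 850., 900., 950., 1000.};
yData = {30.93, 31.54, 31.32, 32.18, 32.25, 32.27, 33.41, 33.21, 33.97};
fitData = Transpose[{xData, yData}];

(* Nine {x, y} pairs should give dimensions {9, 2} *)
Dimensions[fitData]

(* Pull the columns back out; xColumn and yColumn are illustrative names *)
xColumn = fitData[[All, 1]];
yColumn = fitData[[All, 2]];

(* Both comparisons should return True if nothing was scrambled *)
{xColumn == xData, yColumn == yData}
```

The same Part syntax, fitData[[All, 1]], is handy later if you only have the paired form of the data and need the x values by themselves.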
Fit
Fit finds a least squares fit of the data as a linear combination of a specified list of functions. For a
straight line, y = a + bx, the parameters are a and b, and the list of functions is {1,x}. You have to tell Fit
the data, the list of functions, and the variable being fit. You get back an equation for the best fit line. You
do not get estimates of the errors in the fit parameters, and if you want residuals, you have to calculate
them yourself. Plotting your best fit line and the data points is not too bad.
FittingStraightLines.nb
usingFit = Fit[fitData, {1, x}, x]
0.00714667 x + 26.6249
Show[ListPlot[fitData, PlotStyle -> Red], Plot[usingFit, {x, 500., 1000.}]]
[Plot: the data points in red with the best-fit line drawn through them.]
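For a straight line, the least squares coefficients can also be written in closed form: the slope is the sum of (x - mean x)(y - mean y) divided by the sum of (x - mean x)^2, and the intercept is mean y minus the slope times mean x. The sketch below works that out by hand as a sanity check. The names xBar, yBar, slope, and intercept are my own, and I am assuming the ordinary (unweighted) least squares that Fit uses by default.

```mathematica
(* Same data as above, repeated so this cell runs on its own *)
xData = {600., 650., 700., 750., 800., 850., 900., 950., 1000.};
yData = {30.93, 31.54, 31.32, 32.18, 32.25, 32.27, 33.41, 33.21, 33.97};

(* Means of the two columns *)
xBar = Mean[xData];
yBar = Mean[yData];

(* Closed-form least squares slope and intercept for y = a + b x *)
slope = Total[(xData - xBar) (yData - yBar)]/Total[(xData - xBar)^2];
intercept = yBar - slope*xBar;

(* Should agree with the coefficients in usingFit, about 26.6249 and 0.00714667 *)
{intercept, slope}
```

This is not how I would normally do a fit, but it is a useful reminder of what Fit is computing for you.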
Here's how to calculate the y values predicted by the model and the residuals.
yCalcFit = Table[usingFit /. x -> xData[[i]], {i, 1, Length[xData]}]
{30.9129, 31.2702, 31.6276, 31.9849, 32.3422, 32.6996, 33.0569, 33.4142, 33.7716}
residualsFit = yData - yCalcFit
{0.0171111, 0.269778, -0.307556, 0.195111, -0.0922222, -0.429556, 0.353111, -0.204222, 0.198444}
TableForm[Transpose[{xData, yData, yCalcFit, residualsFit}],
TableHeadings -> {None, {"x", "y Obs.", "y Calc.", "Obs. - Calc."}}]
x        y Obs.   y Calc.    Obs. - Calc.
600.     30.93    30.9129     0.0171111
650.     31.54    31.2702     0.269778
700.     31.32    31.6276    -0.307556
750.     32.18    31.9849     0.195111
800.     32.25    32.3422    -0.0922222
850.     32.27    32.6996    -0.429556
900.     33.41    33.0569     0.353111
950.     33.21    33.4142    -0.204222
1000.    33.97    33.7716     0.198444
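The Table construction above is not the only way to get the calculated values. Because Plus and Times are Listable, replacing x with the whole list of x values should evaluate the line at every point in one step. Here is a minimal sketch; the names yCalcShort, residualsShort, and sumSquares are my own, and it assumes x has no value assigned in your session.

```mathematica
(* Same data and fit as above, repeated so this cell runs on its own *)
xData = {600., 650., 700., 750., 800., 850., 900., 950., 1000.};
yData = {30.93, 31.54, 31.32, 32.18, 32.25, 32.27, 33.41, 33.21, 33.97};
fitData = Transpose[{xData, yData}];
usingFit = Fit[fitData, {1, x}, x];

(* Substitute the entire list for x; the arithmetic threads over the list *)
yCalcShort = usingFit /. x -> xData;

(* Residuals, observed minus calculated, as before *)
residualsShort = yData - yCalcShort;

(* Sum of squared residuals, the quantity a least squares fit minimizes *)
sumSquares = Total[residualsShort^2]
```

The residuals should match residualsFit above; the sum of squares is a single number you can use to compare this line against any other candidate line.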
FindFit
FindFit finds a least squares fit of the data to a specified equation that contains adjustable parameters,
which are themselves specified as a list. For a straight line, y = a + bx, you give the equation as a+b*x
(you don't include the "y="), and the parameters are specified as {a, b}. As with Fit, you have to tell
FindFit the data, and the variable being fit, too. You get back a rule specifying the best fit parameters.
Once again, you do not get estimates of the errors in the fit parameters, and if you want residuals, you have
to calculate them yourself, but this is a bit simpler than it is with Fit. Plotting your best fit line and the data
points is nearly the same as before: you repeat your straight line equation and apply the rule to it.
usingFindFit = FindFit[fitData, a + b*x, {a, b}, x]
{a -> 26.6249, b -> 0.00714667}
Show[ListPlot[fitData, PlotStyle -> Red],
Plot[a + b*x /. usingFindFit, {x, 500., 1000.}]]
[Plot: the data points in red with the best-fit line drawn through them.]
Here's how to calculate the y values predicted by the model and the residuals using FindFit.
yCalcFindFit = (a + b*xData) /. usingFindFit
{30.9129, 31.2702, 31.6276, 31.9849, 32.3422, 32.6996, 33.0569, 33.4142, 33.7716}
residualsFindFit = yData - yCalcFindFit
{0.0171111, 0.269778, -0.307556, 0.195111, -0.0922222, -0.429556, 0.353111, -0.204222, 0.198444}
TableForm[Transpose[{xData, yData, yCalcFindFit, residualsFindFit}],
TableHeadings -> {None, {"x", "y Obs.", "y Calc.", "Obs. - Calc."}}]
x        y Obs.   y Calc.    Obs. - Calc.
600.     30.93    30.9129     0.0171111
650.     31.54    31.2702     0.269778
700.     31.32    31.6276    -0.307556
750.     32.18    31.9849     0.195111
800.     32.25    32.3422    -0.0922222
850.     32.27    32.6996    -0.429556
900.     33.41    33.0569     0.353111
950.     33.21    33.4142    -0.204222
1000.    33.97    33.7716     0.198444
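Since the straight line equation has to be typed again every time the rule is applied, one option is to store it in a variable once. The following is just a sketch of that idea; model and bestLine are names I made up for illustration, and it assumes a, b, and x have no values assigned in your session.

```mathematica
(* Same data as above, repeated so this cell runs on its own *)
xData = {600., 650., 700., 750., 800., 850., 900., 950., 1000.};
yData = {30.93, 31.54, 31.32, 32.18, 32.25, 32.27, 33.41, 33.21, 33.97};
fitData = Transpose[{xData, yData}];

(* Store the model expression once; "model" is an illustrative name *)
model = a + b*x;
usingFindFit = FindFit[fitData, model, {a, b}, x];

(* Apply the fitted rules to get the best-fit line as an expression in x *)
bestLine = model /. usingFindFit;

(* Reuse bestLine for plotting and for the calculated values *)
Show[ListPlot[fitData, PlotStyle -> Red], Plot[bestLine, {x, 500., 1000.}]]
yCalcFromModel = bestLine /. x -> xData;
residualsFromModel = yData - yCalcFromModel
```

Once the rules have been applied, bestLine is the same kind of expression that Fit returns, so anything you did with usingFit should work with it too.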
Linear Model Fit
This is perhaps the most powerful of the three methods, and it can provide the most information. Of course,
it is also the most complicated. LinearModelFit was a new feature starting in Mathematica 7, but the
same functionality was available in Mathematica 6 using the "Linear Regression" package. You just tell
LinearModelFit that you want to fit your data to the function x, and that x is the variable; the constant term is included by default. You get back a
"FittedModel." You can get a "normal" form of the equation using Normal. At this point, you could just
proceed as above using Fit, since the normal form is just what Fit gave you. There are, however, more
options available to you. For starters, the result from LinearModelFit is a function of the variable you
specified. This makes plotting a snap.
usingLinearModelFit = LinearModelFit[fitData, x, x]
FittedModel[ 0.00714667 x + 26.6249 ]
normalResult = Normal[usingLinearModelFit]
0.00714667 x + 26.6249
Show[ListPlot[fitData, PlotStyle -> Red],
Plot[usingLinearModelFit[x], {x, 500., 1000.}]]
[Plot: the data points in red with the best-fit line drawn through them.]
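Because the fitted model acts like a function, you can also evaluate it at a single x value, and you can ask it for the bare coefficients. A short sketch follows; the temperature 725. is an arbitrary value I picked for illustration, not one taken from the data set.

```mathematica
(* Same data and fit as above, repeated so this cell runs on its own *)
xData = {600., 650., 700., 750., 800., 850., 900., 950., 1000.};
yData = {30.93, 31.54, 31.32, 32.18, 32.25, 32.27, 33.41, 33.21, 33.97};
fitData = Transpose[{xData, yData}];
usingLinearModelFit = LinearModelFit[fitData, x, x];

(* The coefficients as a plain list, in the order {intercept, slope} *)
usingLinearModelFit["BestFitParameters"]

(* Predicted heat capacity at one temperature; 725. is an illustrative value *)
usingLinearModelFit[725.]
```

Evaluating the model this way saves you from writing a replacement rule, which is the main convenience over the expression returned by Fit.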
You can also get the calculated values and the residuals directly.
yCalcLinearModelFit = usingLinearModelFit["PredictedResponse"]
{30.9129, 31.2702, 31.6276, 31.9849, 32.3422, 32.6996, 33.0569, 33.4142, 33.7716}
residualsLinearModelFit = usingLinearModelFit["FitResiduals"]
{0.0171111, 0.269778, -0.307556, 0.195111, -0.0922222, -0.429556, 0.353111, -0.204222, 0.198444}
TableForm[
Transpose[{xData, yData, yCalcLinearModelFit, residualsLinearModelFit}],
TableHeadings -> {None, {"x", "y Obs.", "y Calc.", "Obs. - Calc."}}]
x        y Obs.   y Calc.    Obs. - Calc.
600.     30.93    30.9129     0.0171111
650.     31.54    31.2702     0.269778
700.     31.32    31.6276    -0.307556
750.     32.18    31.9849     0.195111
800.     32.25    32.3422    -0.0922222
850.     32.27    32.6996    -0.429556
900.     33.41    33.0569     0.353111
950.     33.21    33.4142    -0.204222
1000.    33.97    33.7716     0.198444
The biggest advantage of LinearModelFit is that you can get estimates of the errors in the fitted parameters and things like the covariance matrix and the correlation matrix.
usingLinearModelFit["ParameterTable"]
     Estimate      Standard Error   t-Statistic   P-Value
1    26.6249       0.614874         43.3014       9.14598×10^-10
x    0.00714667    0.000758777      9.41867       0.0000317067
usingLinearModelFit["ParameterErrors"]
{0.614874, 0.000758777}
usingLinearModelFit["CovarianceMatrix"]
 0.37807         -0.000460593
-0.000460593      5.75742×10^-7
usingLinearModelFit["CorrelationMatrix"]
 1.          -0.987228
-0.987228     1.
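These outputs are related to one another: the parameter errors are the square roots of the diagonal of the covariance matrix, and the off-diagonal correlation is the covariance divided by the product of the two errors. The sketch below checks this; the names cov, errors, and corr12 are mine. I have also included the "ParameterConfidenceIntervals" property, which I am assuming is evaluated at its default confidence level.

```mathematica
(* Same data and fit as above, repeated so this cell runs on its own *)
xData = {600., 650., 700., 750., 800., 850., 900., 950., 1000.};
yData = {30.93, 31.54, 31.32, 32.18, 32.25, 32.27, 33.41, 33.21, 33.97};
fitData = Transpose[{xData, yData}];
usingLinearModelFit = LinearModelFit[fitData, x, x];

(* Standard errors recovered from the covariance matrix diagonal *)
cov = usingLinearModelFit["CovarianceMatrix"];
errors = Sqrt[Diagonal[cov]];

(* The difference from "ParameterErrors" should be zero to within rounding *)
errors - usingLinearModelFit["ParameterErrors"]

(* Off-diagonal correlation; compare with "CorrelationMatrix" above *)
corr12 = cov[[1, 2]]/(errors[[1]]*errors[[2]])

(* Confidence intervals for the intercept and slope, default confidence level *)
usingLinearModelFit["ParameterConfidenceIntervals"]
```

With the values shown above, corr12 should come out close to -0.987, the same number that appears off the diagonal of the correlation matrix.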
Summary
Mathematica can easily be used to fit data to a straight line. Of the three methods presented, FindFit is
probably the quickest, and the one I turn to first if I'm just playing around with data. For serious work that is
to be reported, I would use LinearModelFit: it is the only one of the three that provides the information
necessary for a complete specification of the results, including error estimates.