Determining a best-fit straight line in Mathematica

Mathematica offers several methods for fitting equations to data, and when fitting to a line is required, there are three convenient options. Please feel free to use this notebook as a template for your own work.

All three options require the data to be presented as a list of lists, {{x1, y1}, {x2, y2}, ..., {xn, yn}}. You could enter the data like this, but I find it easier to enter as two separate lists, one for x and another for y, and then use the Transpose function to do the necessary rearrangement.

Let's input some data. I've chosen some values for the molar heat capacity (in J mol⁻¹ K⁻¹) of carbon monoxide as a function of the kelvin temperature. (From an example in Chapter 22 of D.A. McQuarrie, Mathematical Methods for Physical Chemistry)

    xData = {600., 650., 700., 750., 800., 850., 900., 950., 1000.}
    yData = {30.93, 31.54, 31.32, 32.18, 32.25, 32.27, 33.41, 33.21, 33.97}

    {600., 650., 700., 750., 800., 850., 900., 950., 1000.}
    {30.93, 31.54, 31.32, 32.18, 32.25, 32.27, 33.41, 33.21, 33.97}

Now do the transpose.

    fitData = Transpose[{xData, yData}]

    {{600., 30.93}, {650., 31.54}, {700., 31.32}, {750., 32.18}, {800., 32.25}, {850., 32.27}, {900., 33.41}, {950., 33.21}, {1000., 33.97}}

Fit

Fit finds a least squares fit of the data as a linear combination of a specified list of functions. For a straight line, y = a + bx, the parameters are a and b, and the list of functions is {1, x}. You have to tell Fit the data, the list of functions, and the variable being fit. You get back an equation for the best fit line. You do not get estimates of the errors in the fit parameters, and if you want residuals, you have to calculate them yourself. Plotting your best fit line and the data points is not too bad.
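As a cross-check on what a least squares fit to y = a + bx means, the parameters can also be computed from the standard closed-form expressions: the slope is b = Sxy/Sxx, and the intercept follows from the fact that the fitted line passes through the point of means. The sketch below is self-contained (it re-enters the data), and the names xs, ys, xBar, yBar, slope and intercept are my own choices for illustration, not part of the notebook.

```mathematica
(* Illustrative cross-check; all variable names here are hypothetical. *)
xs = {600., 650., 700., 750., 800., 850., 900., 950., 1000.};
ys = {30.93, 31.54, 31.32, 32.18, 32.25, 32.27, 33.41, 33.21, 33.97};

xBar = Mean[xs];
yBar = Mean[ys];

(* slope = Sxy / Sxx, using sums of deviations from the means *)
slope = Total[(xs - xBar) (ys - yBar)]/Total[(xs - xBar)^2];

(* the least squares line passes through (xBar, yBar) *)
intercept = yBar - slope xBar;

{intercept, slope}
```

For this data set the result should agree with the intercept and slope that Fit reports, to the displayed precision.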
FittingStraightLines.nb

    usingFit = Fit[fitData, {1, x}, x]

    0.00714667 x + 26.6249

    Show[ListPlot[fitData, PlotStyle -> Red], Plot[usingFit, {x, 500., 1000.}]]

    [Plot: the data points in red with the best-fit line drawn through them.]

Here's how to calculate the y values predicted by the model and the residuals.

    yCalcFit = Table[usingFit /. x -> xData[[i]], {i, 1, Length[xData]}]

    {30.9129, 31.2702, 31.6276, 31.9849, 32.3422, 32.6996, 33.0569, 33.4142, 33.7716}

    residualsFit = yData - yCalcFit

    {0.0171111, 0.269778, -0.307556, 0.195111, -0.0922222, -0.429556, 0.353111, -0.204222, 0.198444}

    TableForm[Transpose[{xData, yData, yCalcFit, residualsFit}], TableHeadings -> {None, {"x", "y Obs.", "y Calc.", "Obs. - Calc."}}]

    x       y Obs.   y Calc.   Obs. - Calc.
    600.    30.93    30.9129    0.0171111
    650.    31.54    31.2702    0.269778
    700.    31.32    31.6276   -0.307556
    750.    32.18    31.9849    0.195111
    800.    32.25    32.3422   -0.0922222
    850.    32.27    32.6996   -0.429556
    900.    33.41    33.0569    0.353111
    950.    33.21    33.4142   -0.204222
    1000.   33.97    33.7716    0.198444

FindFit

FindFit finds a least squares fit of the data to a specified equation that contains adjustable parameters, which are themselves specified as a list. For a straight line, y = a + bx, you give the equation as a+b*x (you don't include the "y="), and the parameters are specified as {a, b}. As with Fit, you have to tell FindFit the data and the variable being fit, too. You get back a list of rules specifying the best fit parameters. Once again, you do not get estimates of the errors in the fit parameters, and if you want residuals, you have to calculate them yourself, but this is a bit simpler than it is with Fit. Plotting your best fit line and the data points is nearly the same; you need to repeat your straight line equation and apply the rules to it.

    usingFindFit = FindFit[fitData, a + b*x, {a, b}, x]

    {a -> 26.6249, b -> 0.00714667}

    Show[ListPlot[fitData, PlotStyle -> Red], Plot[a + b*x /. usingFindFit, {x, 500., 1000.}]]

    [Plot: the data points in red with the best-fit line drawn through them.]

Here's how to calculate the y values predicted by the model and the residuals using FindFit.

    yCalcFindFit = (a + b*xData) /. usingFindFit

    {30.9129, 31.2702, 31.6276, 31.9849, 32.3422, 32.6996, 33.0569, 33.4142, 33.7716}

    residualsFindFit = yData - yCalcFindFit

    {0.0171111, 0.269778, -0.307556, 0.195111, -0.0922222, -0.429556, 0.353111, -0.204222, 0.198444}

    TableForm[Transpose[{xData, yData, yCalcFindFit, residualsFindFit}], TableHeadings -> {None, {"x", "y Obs.", "y Calc.", "Obs. - Calc."}}]

    x       y Obs.   y Calc.   Obs. - Calc.
    600.    30.93    30.9129    0.0171111
    650.    31.54    31.2702    0.269778
    700.    31.32    31.6276   -0.307556
    750.    32.18    31.9849    0.195111
    800.    32.25    32.3422   -0.0922222
    850.    32.27    32.6996   -0.429556
    900.    33.41    33.0569    0.353111
    950.    33.21    33.4142   -0.204222
    1000.   33.97    33.7716    0.198444

Linear Model Fit

This is perhaps the most powerful of the three methods, and it can provide the most information. Of course, it is also the most complicated. LinearModelFit was a new feature starting in Mathematica 7, but the same functionality was available in Mathematica 6 using the "Linear Regression" package.

You just tell LinearModelFit that you want to fit your data to the function x (a constant term is included by default), and that x is the variable. You get back a "FittedModel." You can get a "normal" form of the equation using Normal. At this point, you could just proceed as above using Fit, since the normal form is just what Fit gave you. There are, however, more options available to you. For starters, the result from LinearModelFit is a function of the variable you specified. This makes plotting a snap.
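Because the FittedModel behaves like a function and also answers property queries by name, a few one-liners cover most everyday needs. The following is a minimal, self-contained sketch; the names xs, ys and lm are my own, and the property strings shown ("BestFitParameters", "RSquared", "Properties") are ones I believe LinearModelFit supports in current versions, so check the output of lm["Properties"] on your installation.

```mathematica
(* Illustrative sketch; xs, ys and lm are hypothetical names. *)
xs = {600., 650., 700., 750., 800., 850., 900., 950., 1000.};
ys = {30.93, 31.54, 31.32, 32.18, 32.25, 32.27, 33.41, 33.21, 33.97};

lm = LinearModelFit[Transpose[{xs, ys}], x, x];

(* evaluate the fitted line at a single x value *)
lm[700.]

(* best-fit parameters as a list: {intercept, slope} *)
lm["BestFitParameters"]

(* coefficient of determination *)
lm["RSquared"]

(* list every property name the model object understands *)
lm["Properties"]
```

Evaluating lm at one of the tabulated temperatures should match the corresponding "y Calc." entry from the earlier fits.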
    usingLinearModelFit = LinearModelFit[fitData, x, x]

    FittedModel[0.00714667 x + 26.6249]

    normalResult = Normal[usingLinearModelFit]

    0.00714667 x + 26.6249

    Show[ListPlot[fitData, PlotStyle -> Red], Plot[usingLinearModelFit[x], {x, 500., 1000.}]]

    [Plot: the data points in red with the best-fit line drawn through them.]

You can also get the calculated values and the residuals directly.

    yCalcLinearModelFit = usingLinearModelFit["PredictedResponse"]

    {30.9129, 31.2702, 31.6276, 31.9849, 32.3422, 32.6996, 33.0569, 33.4142, 33.7716}

    residualsLinearModelFit = usingLinearModelFit["FitResiduals"]

    {0.0171111, 0.269778, -0.307556, 0.195111, -0.0922222, -0.429556, 0.353111, -0.204222, 0.198444}

    TableForm[Transpose[{xData, yData, yCalcLinearModelFit, residualsLinearModelFit}], TableHeadings -> {None, {"x", "y Obs.", "y Calc.", "Obs. - Calc."}}]

    x       y Obs.   y Calc.   Obs. - Calc.
    600.    30.93    30.9129    0.0171111
    650.    31.54    31.2702    0.269778
    700.    31.32    31.6276   -0.307556
    750.    32.18    31.9849    0.195111
    800.    32.25    32.3422   -0.0922222
    850.    32.27    32.6996   -0.429556
    900.    33.41    33.0569    0.353111
    950.    33.21    33.4142   -0.204222
    1000.   33.97    33.7716    0.198444

The biggest advantage of LinearModelFit is that you can get estimates of the errors in the fitted parameters and things like the covariance matrix and the correlation matrix.

    usingLinearModelFit["ParameterTable"]

         Estimate      Standard Error   t-Statistic   P-Value
    1    26.6249       0.614874         43.3014       9.14598 × 10^-10
    x    0.00714667    0.000758777      9.41867       0.0000317067

    usingLinearModelFit["ParameterErrors"]

    {0.614874, 0.000758777}

    usingLinearModelFit["CovarianceMatrix"]

     0.37807        -0.000460593
    -0.000460593     5.75742 × 10^-7

    usingLinearModelFit["CorrelationMatrix"]

     1.          -0.987228
    -0.987228     1.

Summary

Mathematica can easily be used to fit data to a straight line. Of the three methods presented, FindFit is probably the quickest, and the one I turn to first if I'm just playing around with data.
For serious work that is to be reported, LinearModelFit is the only one of the three that provides the information necessary for a complete specification of the results, including error estimates, so it is what I would use for that purpose.
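For reporting, it can help to gather the estimates and their standard errors into one small table. The sketch below is one possible way to do that, not the only one; it is self-contained, the names xs, ys, lmReport, params and errs are hypothetical, and the "ParameterConfidenceIntervals" property (at the default 95% confidence level) is included on the assumption that your version of Mathematica provides it.

```mathematica
(* Illustrative reporting sketch; all names are hypothetical. *)
xs = {600., 650., 700., 750., 800., 850., 900., 950., 1000.};
ys = {30.93, 31.54, 31.32, 32.18, 32.25, 32.27, 33.41, 33.21, 33.97};

lmReport = LinearModelFit[Transpose[{xs, ys}], x, x];

params = lmReport["BestFitParameters"];  (* {intercept a, slope b} *)
errs = lmReport["ParameterErrors"];      (* standard errors, same order *)

(* one row per parameter: estimate and standard error *)
TableForm[Transpose[{params, errs}],
  TableHeadings -> {{"a", "b"}, {"Estimate", "Std. Error"}}]

(* confidence intervals for a and b at the default confidence level *)
lmReport["ParameterConfidenceIntervals"]
```

The estimate and standard error columns should repeat the first two columns of the ParameterTable output.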