Curve Fitting Using Excel
CD 2 – Curve Fitting using Linear Regression
Purpose: To introduce curve fitting to experimental data.
Graph given data and decide if a linear fit is appropriate.
Find the "Best Line" through given data using the least squares method.
Transform data to obtain a straight line.
Provide fits to experimental data in the measurements lab.
Assignment: See page 16 of handout for the homework assignment.
Introduction to Linear Equations and Linearization
Engineering often involves reducing physical behaviors to predictable relationships. This can be
done using a theoretical derivation, which results in an equation, or by analyzing experimental
data. In the latter case, it is more convenient if a line rather than a series of points can be used to
demonstrate the behavior. This requires that an equation be determined that best represents the
data so a curve can be drawn. For empirical relations, this results in an equation that can be used
for engineering analyses. Other times, the equation determined from experimental data can be
compared with the one given by theory to test the validity of the
theoretical approach (or to spotlight erroneous data). This process is known as curve fitting.
The simplest type of curve fit is that of linear data, that is, points that follow a linear relation such
as y = mx + b. A relation such as y = x^2 or y = 1/x is nonlinear. However, many nonlinear
relations can be reduced or plotted in a linear fashion so that a linear curve fit can be applied.
In an experimental process, there are errors inherent in any measurement technique. These errors
produce data that are close to linear but do not fall exactly on a line. If graphing shows that the points
are close to linear, then we can assume that the errors are to blame and proceed to make a
linear "best fit."
The easiest way to determine a best-fit line is to plot the data and draw a straight line through
the points using a straightedge. This approach can lead to considerable variation in where the line is
drawn, however, especially if the data are not exactly linear. Do you connect the end points?
Should the line pass through the middle of the data? How do you account for outliers? A more
consistent and mathematically grounded approach is required for engineering purposes.
Method of Least Squares
There are several mathematical ways of determining the best fit line that lend themselves to easy
use with a spreadsheet program. The one we will use is called the Method of Least Squares. It
begins with a definition of how close a line is to being a perfect fit. This definition says that we
should figure the vertical distance from each point to the line we choose. We should then square
all these distances and add them together. The smaller this sum of squared errors is, the better
our line. We then need a mathematical way of finding the line for which the sum of squared
errors is less than for every other line. That line is then our "best fit" line.
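As a small illustration of this definition, the Python sketch below computes the sum of squared vertical errors for two candidate lines through the same points. The function name `sum_squared_errors` and the sample points are made up for this example and are not part of the handout's data.

```python
def sum_squared_errors(points, m, b):
    """Sum of squared vertical distances from each point to the line y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in points)


# Hypothetical sample points (illustrative only, not from the handout).
points = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.9)]

sse_a = sum_squared_errors(points, m=1.0, b=0.0)  # candidate line y = x
sse_b = sum_squared_errors(points, m=0.5, b=1.0)  # candidate line y = 0.5x + 1

print(f"line A: {sse_a:.3f}")
print(f"line B: {sse_b:.3f}")
```

By this measure, the candidate with the smaller sum is the better of the two lines. The least squares method finds the line for which no other choice of m and b gives a smaller sum.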
To put this in mathematical symbols we begin by representing our N data points by their coordinates,
(x1, y1), (x2, y2), ... , (xN, yN)
and we describe our line by the formula for any line,
y = mx+ b
where m is the slope and b is the y-intercept. We can then graph the points and the line together
as shown below.
The vertical arrows show the errors for two points. The error for any point i at (xi, yi) would be
yi - (mxi + b), and the sum of these squared errors would be

$$E = \sum_{i=1}^{N} \left[ y_i - (m x_i + b) \right]^2$$
What we want to do is find m and b for the line so that the error is the smallest value possible.
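First we determine the following four summations,

$$\sum_{i=1}^{N} x_i, \qquad \sum_{i=1}^{N} y_i, \qquad \sum_{i=1}^{N} x_i^2, \qquad \sum_{i=1}^{N} x_i y_i$$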
We will call these sums sum of x, sum of y, sum of x squared, and sum of xy. We can then
apply the least squares theorem to find m and b.
Method of Least Squares Theorem
The line y = mx + b through the N points (x1, y1), ..., (xN, yN) that minimizes the total squared
error has the slope m and intercept b given by

$$m = \frac{N \sum x_i y_i - \sum x_i \sum y_i}{N \sum x_i^2 - \left( \sum x_i \right)^2}, \qquad b = \frac{\sum y_i - m \sum x_i}{N}$$

Notice that you must compute the slope m first since the equation for the intercept b uses this value.
If we have our data points in columns in Excel it is easy to compute the four summations, and
easy to find m and b for a curve fit line.
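For readers who want to see where the theorem comes from, here is a brief sketch of the standard calculus argument (it is not part of the original handout). Writing the total squared error as E(m, b) and setting its partial derivatives with respect to m and b to zero gives two linear equations in the two unknowns:

```latex
\begin{align*}
E(m,b) &= \sum_{i=1}^{N} \bigl[\, y_i - (m x_i + b) \,\bigr]^2 \\[4pt]
\frac{\partial E}{\partial m} = -2 \sum_{i=1}^{N} x_i \bigl[\, y_i - (m x_i + b) \,\bigr] = 0
  \quad &\Longrightarrow \quad
  m \sum x_i^2 + b \sum x_i = \sum x_i y_i \\[4pt]
\frac{\partial E}{\partial b} = -2 \sum_{i=1}^{N} \bigl[\, y_i - (m x_i + b) \,\bigr] = 0
  \quad &\Longrightarrow \quad
  m \sum x_i + b N = \sum y_i
\end{align*}
```

Eliminating b from this pair gives the slope in terms of N and the four sums, and the second equation then gives the intercept once m is known. This is why the slope is computed first.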
Checking the Theorem
We can check the theorem by using the points (0,-1), (1, 1), and (2, 3). If you substitute them
into the equation y =2x -1 you will see that they all fit, so they are all exactly on the line with
slope 2 and intercept -1. If we apply the least squares theorem we should come up with m = 2
and b = -1.
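First we compute N and the four sums for these three points:

$$N = 3, \qquad \sum x_i = 0 + 1 + 2 = 3, \qquad \sum y_i = -1 + 1 + 3 = 3$$

$$\sum x_i^2 = 0 + 1 + 4 = 5, \qquad \sum x_i y_i = 0 + 1 + 6 = 7$$

Then we apply the least squares theorem to find m and b:

$$m = \frac{(3)(7) - (3)(3)}{(3)(5) - (3)^2} = \frac{12}{6} = 2, \qquad b = \frac{3 - (2)(3)}{3} = -1$$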
So we see that we have m = 2 and b = -1 as we expected.
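The same check can be sketched in a few lines of Python. The function name `least_squares_line` is chosen for this example. It uses the same slope and intercept expressions as the spreadsheet formulas given later in this handout.

```python
def least_squares_line(points):
    """Return (m, b) for the least-squares line y = m*x + b through the points."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_x2 = sum(x * x for x, _ in points)
    sum_xy = sum(x * y for x, y in points)

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        # Happens when every x value is the same (or there are too few points).
        raise ValueError("x values do not vary; the slope is undefined")

    m = (n * sum_xy - sum_x * sum_y) / denominator  # slope is computed first
    b = (sum_y - m * sum_x) / n                     # intercept uses the slope
    return m, b


m, b = least_squares_line([(0, -1), (1, 1), (2, 3)])
print(m, b)  # 2.0 -1.0
```

The guard on the denominator is an addition for this sketch. The handout's formulas assume the x values are not all identical.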
Ohm's Law
Recall our example of Ohm's law from
CD 1. Here we compared our experimental
data with Ohm's law theory. To obtain a
better comparison, we will now fit a line to
the same data.
Since we know that Ohm's law is linear
(V=IR, voltage V is linear with current I
with R a constant resistance), we should be
able to apply the method of least squares
directly to the data without any alteration.
Here are the steps that we need to complete:
1. Determine which variable is our x and which is our y.
2. Calculate the number of points, N.
3. Tabulate the values of x^2 and xy.
4. Calculate the sums of x, y, x^2 and xy.
5. Calculate the slope, m.
6. Calculate the y-intercept, b.
7. Use the equation y=mx+b to plot a line.
We will see that this is relatively simple. Our result should look something similar to our original
plot of experimental data and Ohm's law, below.
Ohm's Law Data
The data you need is located in the spreadsheet file on-line.
Download it to begin graphing the data.
You will need the first worksheet, titled Ohm's Law. The
spreadsheet should look like the table shown to the right.
Optionally, you can re-enter this data by hand, such as you
might if you had recorded the data during an experiment.
Notice that we have reserved columns C and D for later use.
These will be our columns for x^2 and xy.
First, enter the titles of the columns in cells C6 and D6. We
will call them 'x^2' and 'xy'.
Next, enter the formula in cell C7 to calculate the square of x. Click on cell C7 and enter =A7^2
(or alternatively, =A7*A7). After hitting enter, the value 48.5809 should appear. If it does not,
make sure you have entered the formula correctly.
Now use the Fill Down command under the Edit menu.
To do this, highlight cells C7 through C15 and select
the Fill » Down command in the Edit menu. This
will copy the formula down through cell C15. Thus,
in cell C14, you should have the formula =A14^2. Click
on this cell to check that the procedure was executed
correctly.
Once you have done this, repeat the procedure for cells
D7 through D15. In this case, your formula will be
=A7*B7. Your values should appear similar to the ones
to the right.
Formatting the Cells
Depending upon individual settings in Excel, the significant figures in the cells may be either too
many or too few. This is certainly the case in the columns C and D on the previous page. To
change the formatting, highlight the cells you wish to format and select the Format Cells
command under the Format » Cells... menu (this may be in a different location in different
versions of Excel, so keep looking for it). The following window should appear.
Select Number from the file tabs and Number in the
category menu. This will allow you to select the
number of decimal places in the cells.
For our spreadsheet, we selected 3 decimal places for
cells C7 through C15 and 4 decimal places for cells D7
through D15. This allows us to see the numerical
values without too much clutter.
Note that formatting the cell doesn't actually change the
value of the cell or how it appears on a plot. It only
changes how Excel displays the value to the user.
Finding the Sums
The sums of x and y are easy to compute, but we also need the sum of x squared and the sum of xy,
as well as N, m, and b. These can be placed below each column, but
we will add them to the right so we can keep them well labeled (you can place them wherever
you like, but placing them in the same location will help you keep track of the correct values). In
each case, we will sum the column for one of our four quantities using the SUM() function.
Thus, the formula entries will be =SUM(A7:A15), =SUM(B7:B15), =SUM(C7:C15), and
=SUM(D7:D15), respectively, as shown below.
As mentioned, we also have to count the number of points, N. This is easy enough to do by hand
with the few points we have here, but we want to automate it so it will work for a
large number of points. Here we use the COUNT() function. Since all of the columns have the same
number of values, we could use this on any column. For column A, the formula would simply
be =COUNT(A7:A15). It should return a value of 9. (Note that sometimes we will discard end
points in linear regression analyses. In that case, we do not include those points in our count
since we did not include them in our sums.)
Now we need to compute the slope m and y-intercept b. If the sums and equations are in the
cells shown (G5 holds N, G7 the sum of x, G8 the sum of y, G9 the sum of x squared, and G10
the sum of xy), then the formulas will be =(G5*G10-G7*G8)/(G5*G9-G7^2) in cell G13 for the
slope and =(G8-G13*G7)/G5 in cell G15 for the intercept. Make sure these formulae are
correct. Any error will produce incorrect values for m and b.
Finishing Up
Our spreadsheet should now look like the one below. If your values do not match with the ones
shown, go back and check your formula entries very carefully.
In order to graph our line and points together it is convenient to add a column after B. We add
this column to determine the points that will lie on our best fit line. The experimental points are
labeled y so we'll label these curve fit points y'. To add the column for y' simply select any cell
in the column C and then pull down the Insert menu and select Columns. You can add this
anywhere, but it is helpful to do it next to y as it will make graphing easier later on. (Note that
Excel automatically changed the formulae in column G when they were moved to column H so
you don’t have to.)
Now we will fill it in with points that follow the line y' = mx + b with our new values of m and
b from the least squares formulae. In the new column which is now C enter =$H$13*A7+$H$15
in cell C7 and then Fill » Down through C15. Pay special attention to the extra "$" symbols in
the cell addresses. These dollar signs keep these cell addresses fixed (absolute rather than
relative references) so that when you copy the cells the references to m and b don't shift to new
cell locations. Check the various cells in column C to observe this behavior.
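For comparison, the whole worksheet procedure can be sketched in Python. The current and voltage readings below are hypothetical stand-ins, not the handout's data, and the variable names are chosen for this example. The point to notice is that m and b are computed once and reused on every row, which is what the "$" references accomplish in Excel.

```python
# Hypothetical current (x) and voltage (y) readings; NOT the handout's data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

# Helper columns, like the x^2 and xy columns in the worksheet.
x2 = [xi ** 2 for xi in x]
xy = [xi * yi for xi, yi in zip(x, y)]

# Summary cells: the number of points and the four sums.
N = len(x)
sum_x, sum_y, sum_x2, sum_xy = sum(x), sum(y), sum(x2), sum(xy)

# Slope and intercept, computed once (the role of cells $H$13 and $H$15).
m = (N * sum_xy - sum_x * sum_y) / (N * sum_x2 - sum_x ** 2)
b = (sum_y - m * sum_x) / N

# The y' column: every row reuses the same m and b; only x changes.
y_fit = [m * xi + b for xi in x]

for xi, yi, yfi in zip(x, y, y_fit):
    print(f"{xi:4.1f} {yi:6.2f} {yfi:6.2f}")
```

A quick sanity check on any such fit is that the differences y - y' should add up to approximately zero, since the best-fit line balances the points above and below it.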
Your spreadsheet should now look like this
Plotting the Results
Now we need to plot both the data and the curve fit. As a rule of presentation, experimental data
are always plotted using points without lines while curve fits and theoretical curves are plotted
using lines without points. This allows one to quickly determine what type of data is presented
on a graph.
To graph both the measured points and the best fit line, highlight the three columns (x, y, and y'),
select Insert » Chart, and make the graph as you have before. Make sure you select an X-Y
(Scatter) plot for your chart.
Once it is finished, right click on one of the data points from the best-fit line (series two, which
is y') and select Format » Data Series (or double click on one of the points). A dialog box will
then appear.
Select a custom line style with a single straight line and no marker for the curve fit. Repeat this
procedure for the experimental data, except with the reverse options. (Alternatively, you can
select these options while making the graph the first time.)
The Curve Fit Plot
Your final plot should now look something like the following.
You now have your original data points displayed as points and your best fit curve displayed as
what it is: a line.
Compare it with our original volt plot where Ohm's law is plotted directly and notice how the
line differs slightly. The differences are due to the errors introduced in the measurement.
Automobile Drag
The linear regression analysis only works for
linear relationships. As previously mentioned,
however, it is often possible to reduce a nonlinear relation to a linear one.
For example, the aerodynamic drag on a car is a
function of the square of the velocity.
D = f(V^2)
The force required to overcome aerodynamic drag thus quadruples if the speed of the car
doubles. This can be seen in the relationship shown in the spreadsheet called Automobile Drag.
Instead of plotting drag as a function of velocity, we can plot it as a function of velocity squared.
While drag varies non-linearly with velocity, it varies linearly with velocity squared.
The velocity here is in m/s (meters per second) while the drag force is in N (newtons). For quick
reference, 1 m/s ~ 2 mph while 4.5 N ~ 1 pound (force). (These are approximate.)
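To see how rough these quick-reference figures are, the sketch below uses the standard conversion factors (1 mph = 0.44704 m/s and 1 pound-force = 4.4482216152605 N). The function names are chosen for this example.

```python
MPS_PER_MPH = 0.44704              # 1 mile per hour in meters per second
NEWTONS_PER_LBF = 4.4482216152605  # 1 pound-force in newtons


def mps_to_mph(speed_mps):
    """Convert a speed from m/s to mph."""
    return speed_mps / MPS_PER_MPH


def newtons_to_lbf(force_n):
    """Convert a force from newtons to pounds-force."""
    return force_n / NEWTONS_PER_LBF


print(round(mps_to_mph(1.0), 3))      # 2.237
print(round(newtons_to_lbf(4.5), 3))  # 1.012
```

So "1 m/s is about 2 mph" underestimates by roughly 10 percent, while "4.5 N is about 1 pound" is within about 1 percent.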
Changing the Relation
To change the non-linear relation into a linear one, we have to add a new x column which we
will call x'. Technically, we are plotting drag as a function of the square of velocity. So insert a
new column between columns A and B called velocity squared. We enter the formula =A6^2
and fill down to get the following:
Note that when we plot drag versus velocity squared, we have a nice linear relation, where our
curve fit can be determined by
y' = mx' + b = m(x^2) + b
Now we have to complete the linear regression analysis using our new variable.
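The transformation can be sketched in Python as follows. The drag values here are generated from a made-up relation D = 0.4 V^2, purely for illustration. The coefficient 0.4, the velocities, and the function names are assumptions and are not the handout's Automobile Drag data.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = m*x + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b


# Hypothetical velocities (m/s) and drag forces (N) from D = 0.4 * V^2.
velocity = [5.0, 10.0, 15.0, 20.0, 25.0, 30.0]
drag = [0.4 * v ** 2 for v in velocity]

# The new x' column: velocity squared.
x_prime = [v ** 2 for v in velocity]

# Fit a straight line to drag versus x' instead of drag versus velocity.
m, b = fit_line(x_prime, drag)


def predicted_drag(v):
    """Return to the original variable: y' = m*(v^2) + b."""
    return m * v ** 2 + b


print(f"m = {m:.4f}, b = {b:.4f}")
```

Because the fit is done against x', the resulting m and b describe a straight line in x' but a parabola in the original velocity variable.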
We have two options in preparing our sums. The first is to re-enter the formulae by hand. This is
time consuming, but accurate. The second is to copy and paste our formulae cells from the
Ohm's Law spreadsheet into the Automobile Drag spreadsheet. You have to make sure that
you increase the cell ranges and change the necessary cell values, however. Thus, when copying
and pasting make sure you increase all cell ranges to the last data row and change the sum
relations to use the velocity squared column B instead of the velocity column A.
The sheet above shows that we have increased some of our columns, but not all of them yet. We
may also need to reformat our columns if our numbers are too large to display. In that case, we
would end up with ######### symbols as shown in the rightmost columns.
Begin changing the columns one by one (using the Fill » Down command, of course), or re-enter the formulae by hand if you choose to do so.
The Final Values
When completed, your spreadsheet should look like the following. Since the relationship looks
very linear, your y' values should match very closely with your y values. If they do not, you have
done something wrong.
Note that the y-intercept b is zero. This agrees very well with the data previously presented.
Once our values are completed, we can plot the data and return it to the original relationship.
As an aside, the power requirements for a car can be determined from the aerodynamic
drag (neglecting rolling resistance), knowing that
required power = force x velocity = drag x velocity
Thus, power is a function of velocity cubed. Determining the drag will then allow you to
calculate the power required of a motor to move at a given speed.
In SI units, this power is given in watts (1 W = 1 N·m/s). This can be converted to the horsepower
rating used on US cars using the conversion 1 hp ≈ 745.7 W.
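A short Python sketch of this aside follows. The drag coefficient k = 0.4 in D = k V^2 is a made-up value for illustration, and the function names are chosen for this example. The conversion uses 745.7 W per horsepower, a slightly more precise value for mechanical horsepower.

```python
WATTS_PER_HP = 745.7  # mechanical horsepower in watts (rounded)


def drag_force(velocity_mps, k=0.4):
    """Aerodynamic drag in newtons, assuming D = k * V^2 (k is hypothetical)."""
    return k * velocity_mps ** 2


def required_power_watts(velocity_mps, k=0.4):
    """Power needed to overcome drag: power = drag x velocity."""
    return drag_force(velocity_mps, k) * velocity_mps


def watts_to_hp(power_watts):
    """Convert watts to horsepower."""
    return power_watts / WATTS_PER_HP


p15 = required_power_watts(15.0)
p30 = required_power_watts(30.0)

print(f"15 m/s: {p15:.0f} W = {watts_to_hp(p15):.1f} hp")
print(f"30 m/s: {p30:.0f} W = {watts_to_hp(p30):.1f} hp")
print(f"ratio: {p30 / p15:.1f}")
```

Doubling the speed multiplies the required power by eight, which is the velocity-cubed dependence described above.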
The final plots should look like the one shown here.
Note that the curve fit and data fall along the same
line. If we were to plot the drag data and curve fit
versus velocity instead of velocity squared, we
would have something similar to the plot below.
Now we have an exact match to our original nonlinear curve!
The final curve fit with the data is shown here.
And that is all there is to linear regression and curve fitting!
As a last note, some versions of Excel do come with a linear regression package. To use this
properly, however, you have to know what you are doing, which is why you have gone through
these exercises. When performing curve fits in the future, you can resort to using a pre-packaged
routine or doing it yourself now that you know how easy it is.
Complete the following problems:
1. Complete your plots and spreadsheets for the Ohm's Law and Automobile Drag
tutorials. (For the latter, you only need to turn in the linear drag versus velocity squared
plot.)
2. In an industrial process, the depth of the fluid in a tank of cooling liquid is measured over
time. The following depth measurements are recorded:
Time [hours]
Depth [feet]
Plot this data and perform a best fit using the method of least squares. Using this fit,
predict how long it will take before the cooling fluid supply is completely depleted.
3. Your spreadsheet should also contain a worksheet with the winning times for the
Olympic 200 m dash. Plot the data and determine the best fit using the method of least
squares. Predict the winning time for 2004 using your best fit. How close does this come
to the actual winning time? Note: In some years (1916, 1940 and 1944), the Olympics
were skipped due to world political problems (i.e., war). Thus the years are not equally
spaced, so you must use an XY-scatter graph instead of a Line Graph. Because the skipped years
are still listed in the worksheet, however, you must delete those rows for your fit to be valid.
For each problem, turn in the required plots and the spreadsheets used to create them. The
assignment is due at the start of class one week from the day it is assigned.