USING THE EXCEL CHART WIZARD TO CREATE CURVE FITS

USING THE EXCEL CHART WIZARD TO CREATE CURVE FITS (DATA ANALYSIS).
Note to physics students: Even if this tutorial is not given as an assignment, you are responsible for
knowing the material contained here on how to use EXCEL. The examples contained here
demonstrate how this feature works in EXCEL, and they demonstrate the general scientific technique of
using curve fits to extract information from experimental data.
Several types of curve fits are demonstrated in the examples. The simplest and most common type, a
straight line fit, is not included as a specific example, but its features are discussed. The author has done
this to emphasize that the technique is more general, especially for students who have done only straight
line fits before.
You should read and complete the "EXCEL SPREADSHEET TUTORIAL" before attempting this
exercise.
TABLE OF CONTENTS:
I. INTRODUCTION: - Motivation and Our First Example.
II. OUR FIRST FIT: - Position vs. Time; A 2nd Order Polynomial Fit.
A. Getting Started
B. Adding the Best Fit Curve
II. OUR SECOND FIT: - A Simple Power Law Fit.
A. Basic Form
B. Kepler's Third Law Example
C. A Word of Caution
III. THIRD EXAMPLE: - An Exponential Function.
A. Basic Form
B. Radioactive Decay Example
C. A Word of Caution
IV. OTHER TYPES OF FITS.
V. SUMMARY OF DATA RESTRICTIONS.
VI. BAD FITS.
A. The Wrong Function
B. Bad Data
VII. CONCLUDING REMARKS.
I. INTRODUCTION: - Motivation and Our First Example.
In the "EXCEL SPREADSHEET TUTORIAL" we created a plot from numbers that were
calculated from a set of formulas. We plotted the position vs. time and velocity vs. time of a falling
body using the kinematics equations for constant acceleration (position = y, velocity = v):
$$ y = y_0 + v_{0y} t + \frac{1}{2} a_y t^2 ; $$
$$ v_y = v_{0y} + a_y t . $$
The position function y(t) is "quadratic" in terms of the independent variable time. In other
words the position equation is a simple polynomial with a constant term $y_0$, a term proportional to $t$
($v_{0y} t$), and a term proportional to $t^2$ ($\frac{1}{2} a_y t^2$). A quadratic function is also referred to as a "2nd degree
polynomial" or "polynomial of order 2". The "degree" or "order" of a polynomial is the highest power
that the independent variable is raised to (for example, a 3rd order polynomial would also include a $t^3$
term). The velocity v(t) is "linear" in the variable t, or a "1st order polynomial".
These equations predict what the position and velocity of a falling body would be, based on the
physics involved and the assumption that the acceleration of the falling object is constant. This gives us
a prediction that the position is a quadratic function of time and that the velocity is a linear function of
time. The values that we assumed we knew ($a_y = -9.8$ m/s$^2$, $v_{0y} = 0$, and $y_0 = 0$) are referred to as
parameters. We assumed we knew the parameters, plugged them into the formulas, calculated v and y
for various values of t, and plotted the result. This result is a mathematical model, or theoretical result,
based on the assumption of constant acceleration.
Instead, in the laboratory you would measure the physical quantities involved and test to see
how well they fit these models. If measurements fit the models well enough, you should be able to
determine experimentally what the parameters are from your measurements. In our example here, we
would drop some object and measure its position and time at various intervals as it fell. The
measurements would not necessarily exactly match our predicted quadratic equation for y vs. t, but they
should be very close to it as long as our assumption of constant acceleration is valid, and if we take
careful measurements.
Even if the measurements deviate slightly from the model, it does not mean our assumption is
wrong. There is always a certain amount of uncertainty (sometimes referred to as "error") in taking a
measurement that comes from the limited precision of any measurement device. This could account for
some of the deviation. Also, we may have done the measurements incorrectly and this could cause
some deviation. We don't intend to discuss measurement uncertainty ("error") in detail in this tutorial,
but you should remember that when the deviation comes from you taking the measurements incorrectly
it is unacceptable and should be corrected. This type of mistake is NOT what we mean by "error" when
you discuss sources of "error" in your experiment.
Let's examine some hypothetical position vs. time measurements. We want to test and see if
they closely fit a function of the form:
$$ y = A + Bt + Ct^2 . $$
If so, then we are confident that we are measuring something that has constant acceleration. We can
relax the assumption that we know what the acceleration is, and that we know what the initial position
and velocity are. Instead, we can calculate numerical values for A, B, and C directly from our
measurements that make a quadratic function y(t) that passes as close as possible to all of the data
points. There is a rigorous mathematical way to do this, and the curve that we get is called a "best fit
curve". You may have done this before in a statistics course or another science course fitting data to a
straight line, but the method is more general and other types of curves can be fit also. We will not
concern ourselves with the math involved, since EXCEL will do it for us.
In the second half of this tutorial we will show you how to use EXCEL to get the best fit curve,
but first let us see how it looks in general. Suppose the data table in Figure 1 below contains our
measurements of position vs. time of a falling object. Next to the table is shown what the graph of these
measurements with the best fit curve from EXCEL looks like. The data points are the small square
boxes on the graph. The equation displayed on the graph is a quadratic function that EXCEL calculated
as a best fit to the data (x is Time, y is Position), and the curve you see on the graph is a plot of this
function. Note that I have followed the proper conventions for titling and labeling the graph, and for
labeling the data table, as explained in the SPREADSHEET TUTORIAL.
Time (s)    Position (m)
0           0
1           -4.5
2           -20
3           -43.5
4           -81
5           -120
6           -180
7           -240
8           -325
9           -385
10          -500

[Graph: "Position vs. Time Data With Best Fit Curve". Horizontal axis: Time (s), from 0.00 to 10.00. Vertical axis: Position (m), from 0 to -500. Displayed on the graph: $y = -4.9441x^2 + 0.086x - 0.0245$ and $R^2 = 0.9987$.]
Figure 1 - Graph of Position vs. Time Data of Falling Body With Best Fit Curve From Excel.
There are several things to be aware of about this result:
1. The data points do not fall exactly on the curve, but are very close to it. I will not prove it,
but the ones that seem to fall right on the curve do not match it exactly (if they do it is
coincidence). Even though our first data point is y = 0 at t = 0, the best fit curve does NOT
equal 0 at t = 0. The number called "$R^2$" on the graph describes how closely the curve fits the
data. It will always be a number between 0 and 1. In our case, $R^2 = 0.9987$. The closer $R^2$ is to
1, the closer the data is to the best fit curve in general. In fact the curve we got is the "best fit"
because it is the unique function of the type we chose (2nd order polynomial) that makes $R^2$ as
close to 1 as possible. No other possible values of the parameters A, B, and C
can do better, based on the value of $R^2$. This number is usually referred to as the "coefficient of
determination". As before, we will not discuss the mathematics used to calculate it here.
2. It is important to know that the result you get is an EXPERIMENTAL result, NOT
theoretical. We chose to fit the data to a 2nd degree polynomial based on theory, but the results
you get for the parameters A, B, and C are calculated directly from the data. Think of this as a
fancy way to get "average" values for A, B, and C from the measurements. Never refer to the
results of a best fit as "theory". We only used the model as a starting point to know what
general type of curve to fit our data to. Also, even though our data fits a 2nd degree polynomial
very well, we would not say that it "proves" that the acceleration is constant. We would only
say that our experimental result "verifies" our assumption of constant acceleration to a certain
level of precision.
Any result you get from analyzing a best fit curve in a physics lab is an experimental value, even
if your lab manual does not explicitly call it that. However, we need to compare the values we got for
A, B, and C to the theory to see what they mean physically. In our case we compare:
$$ y = y_0 + v_{0y} t + \frac{1}{2} a_y t^2 $$
to:
$$ y = A + Bt + Ct^2 . $$
Here are the results (values for A, B, and C are read directly from the curve in Figure 1, and I have
included the proper units):
1) Experimental value for $y_0$ = A = -0.0245 m.
2) Experimental value for $v_{0y}$ = B = 0.086 m/s.
3) Experimental value for $a_y$ = 2C = 2(-4.9441 m/s$^2$) = -9.8882 m/s$^2$.
In the next section of the tutorial we will see how to do this example in EXCEL.
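If you want to cross-check the EXCEL result outside of EXCEL, the following is a minimal Python sketch of the same idea. It is not part of the EXCEL procedure, and the helper name polyfit2 is hypothetical (made up for this illustration). It assumes an ordinary least-squares fit of y = A + Bt + Ct^2 to the Figure 1 data, solved from the normal equations, and it should land close to the equation and R^2 value shown on the graph.

```python
# Sketch only: ordinary least-squares fit of y = A + B*t + C*t**2.
# polyfit2 is a hypothetical helper written for this example, not an EXCEL feature.

def polyfit2(xs, ys):
    """Return (A, B, C) that minimize the sum of squared residuals."""
    # Power sums that appear in the normal equations.
    S = [sum(x ** k for x in xs) for k in range(5)]
    T = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented 3x4 matrix: row i is [S[i], S[i+1], S[i+2] | T[i]].
    M = [[S[i + j] for j in range(3)] + [T[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(3):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return tuple(M[i][3] / M[i][i] for i in range(3))

# Data table from Figure 1.
times = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
positions = [0, -4.5, -20, -43.5, -81, -120, -180, -240, -325, -385, -500]

A, B, C = polyfit2(times, positions)

# R^2 = 1 - (sum of squared residuals) / (total sum of squares about the mean).
mean_y = sum(positions) / len(positions)
ss_res = sum((y - (A + B * t + C * t * t)) ** 2 for t, y in zip(times, positions))
ss_tot = sum((y - mean_y) ** 2 for y in positions)
r_squared = 1 - ss_res / ss_tot

# Physical interpretation: y0 = A, v0y = B, ay = 2C.
acceleration = 2 * C

print(f"y = {C:.4f} x^2 + {B:.4f} x + {A:.4f}")
print(f"R^2 = {r_squared:.4f}")
print(f"experimental a_y = {acceleration:.4f} m/s^2")
```

The last three lines mirror the comparison made above: the fitted constant term plays the role of the initial position, the linear coefficient the initial velocity, and twice the quadratic coefficient the acceleration.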
II. OUR FIRST FIT: - Position vs. Time; A 2nd Order Polynomial Fit.
A. Getting Started: In general, the following first few steps will be done for ANY curve fit
(except with different data, labels, and type of curve each time). Do the following:
1). Start the program EXCEL and start with a blank spreadsheet.
2). Give your spreadsheet a description.
I have chosen to give mine the description "Problem: Create a 2nd Order polynomial fit
to our Position vs. Time measurements".
3). Create a data table of your experimental values that you are going to fit. Remember
which column the "x-axis" values go in for graphing purposes.
Use the table from Figure 1. DO NOT calculate these numbers from a formula. The
numbers in the table are like the numbers you would get from measurements.
4). Use the "Chart Wizard" to create an "XY (scatter)" graph of your data.
You should only see data points. You should not see any lines connecting your data
points on the graph if you chose the graph type "XY (scatter)". Figure 2 on the next
page depicts what my spreadsheet looks like now.
Figure 2 - EXCEL Data and Graph Before Adding the Best Fit Curve.
Yours should look something like this. If not, repeat these steps. Reread the "EXCEL
SPREADSHEET TUTORIAL" if needed. Now we will add the best fit curve.
B. Adding the Best Fit Curve: We can now fit a "2nd Order Polynomial" to this data.
Click on the chart if it is not already selected.
Choose Add Trendline... from the Chart menu.
Click on the Type tab.
You will see some small windows with pictures of various types of functions and their name
below them. Click on the window labeled Polynomial.
Type 2 in the textbox titled Order.
Now click on the Options tab.
The only options we are concerned with are 3 at the bottom:
1. Set intercept should be OFF (if checkmark is shown, click box to remove).
2. Display equation on chart should be ON (checkmark is visible).
3. Display R-squared value on chart should be ON (checkmark is visible).
Click on OK or press Enter to finish.
The best fit equation should now appear on the chart. We can reformat it so it is more readable.
Double click on the equation, and a window titled "Format Data Labels" appears, with 4 tabs at
the top: "Patterns", "Font", "Number" and "Alignment".
Under the "Number" tab there is a list of formats for displaying the numbers in the equation. I
have chosen the one called "Number" with 4 decimal places shown. Under the "Font" tab you
can adjust the font type and size, etc. I have chosen a larger font of about 14pt. Click OK to
close the window when you are done.
When you first clicked on the equation, a box appeared around it (single click to make it
reappear if it is gone). By clicking and holding the mouse near the upper left corner of the box,
you can drag the equation to a clear place on your graph. My result looks like this now:
Figure 3 - Data and graph after adding best fit curve.
II. OUR SECOND FIT: - A Simple Power Law Fit.
A. Basic Form: The basic form of a power law is an equation of the form $y = Cx^B$. Another
way to say this is that "y is proportional to x raised to the B power". There are 2 parameters,
the coefficient C and the exponent B.
For example, the period T (time for one complete oscillation) of a simple pendulum is a function
of the length L of the pendulum given by:
$$ T = 2\pi \sqrt{\frac{L}{g}} $$
Here g is the magnitude of acceleration due to gravity. Remember that a square root is the same
as the exponent 1/2. Also, remember that the square root of g in the denominator is the same as
raising g to the exponent -1/2. This is a power law since it can be written in the form:
$$ T = (2\pi g^{-1/2}) L^{1/2} $$
If we measured the periods T of several pendulums of different lengths L and plotted T vs. L,
we could then do a power law fit of our data. If the relation above is verified we expect to get
an exponent B approximately equal to 1/2 from the fit. We would also get an experimental
value for g from the coefficient C since:
$$ C = 2\pi g^{-1/2} ; $$
therefore (solve for g)
$$ g = \left( \frac{2\pi}{C} \right)^2 . $$
The exponent on the independent variable of a power law can also be negative, as in $y = C/x$,
which can be rewritten as $y = Cx^{-1}$.
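The algebra for getting g from the coefficient of the pendulum power law can be checked with a few lines of Python. This is only a sketch: the function name g_from_coefficient is hypothetical, and the value g = 9.8 m/s^2 is an assumed input used to test the algebra, not a measurement.

```python
import math

def g_from_coefficient(C):
    """Invert C = 2*pi*g**(-1/2) to get g = (2*pi/C)**2."""
    return (2 * math.pi / C) ** 2

# Assumed value of g, used only to check that the inversion is consistent.
g_assumed = 9.8  # m/s^2

# Coefficient the power law T = C * L**(1/2) would have for that g.
C_expected = 2 * math.pi / math.sqrt(g_assumed)

# Feeding that coefficient back in should return the g we started with.
g_recovered = g_from_coefficient(C_expected)

print(f"C = {C_expected:.4f} s/m^0.5, recovered g = {g_recovered:.4f} m/s^2")
```

In a real experiment C would come from the power law fit of your own T vs. L data, and g_from_coefficient(C) would then be your experimental value of g.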
B. Kepler's Third Law Example: Here is a data table containing experimental values of the
orbital period T (time that it takes a planet to orbit the sun) and orbital radius R (average
distance from the sun to the planet) for the 9 planets in our solar system:
Planet      Distance from Sun (m)    Orbital period (s)
Mercury     5.79x10^10               7.60x10^6
Venus       1.08x10^11               1.94x10^7
Earth       1.496x10^11              3.156x10^7
Mars        2.28x10^11               5.94x10^7
Jupiter     7.78x10^11               3.74x10^8
Saturn      1.43x10^12               9.35x10^8
Uranus      2.87x10^12               2.64x10^9
Neptune     4.50x10^12               5.22x10^9
Pluto       5.91x10^12               7.82x10^9
Figure 4 - Orbital radius and period data for the 9 planets.
Kepler's 3rd Law states that the square of T is proportional to the cube of R. This is written as:
$$ T = \frac{2\pi}{\sqrt{G M_S}} R^{3/2} $$
$M_S$ is the mass of the Sun in kilograms. G is the universal gravitational force constant with the
known value $G = 6.67 \times 10^{-11}$ N m$^2$/kg$^2$. This is a power law for T vs. R.
Now we will fit the data to a simple power law. Do the basic steps in EXCEL of making an
"XY (scatter)" plot of T vs. R data. To enter numbers in scientific notation in EXCEL you use
"e" for the power of 10, as you would in most computer programs. For example, the orbital distance in
meters for the Earth is $1.496 \times 10^{11}$. This would be entered into EXCEL as 1.496e11 (notice
there are no spaces).
Once you have the scatter plot, then:
Click on the chart if it is not already selected.
Choose Add Trendline... from the Chart menu.
Click on the Type tab.
Click on the small window labeled Power.
Now click on the Options tab.
The only options we are concerned with are 3 at the bottom:
1. Set intercept should be OFF (if checkmark is shown, click box to remove).
2. Display equation on chart should be ON (checkmark is visible).
3. Display R-squared value on chart should be ON (checkmark is visible).
Click on OK or press Enter to finish.
We should change the format of the numbers displayed in the equation to "Scientific". If you
left it as "Number" the coefficient C on the result would display "0.0000" to 4 decimal places.
Its value is $5.51 \times 10^{-10}$ to 3 significant figures in scientific notation. In "Number" format you
would need 12 decimal places to see this since it is 0.000000000551. The result is shown
below, with numbers on the equation showing 5 decimal places in scientific notation.
Figure 5 - Power law fit of orbital period vs. orbital distance for the 9 planets.
The exponent on the result is 1.49962 to 5 decimal places, or 1.50 when rounded to 3 significant
figures. We expected 3/2 which equals 1.5, and the data used was given to 3 sig. figs. $R^2$ is
given as $9.99999 \times 10^{-1}$ which is the same as 0.999999, a value very close to 1. Now let's use C to
calculate the mass of the Sun. From examining Kepler's 3rd Law we see that:
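$$ C = \frac{2\pi}{\sqrt{G M_S}} $$
Solving for $M_S$ and plugging in C from the fit:
$$ M_S(\text{experimental}) = \frac{4\pi^2}{C^2 G} = \frac{4 \cdot (3.14159)^2}{(5.51129 \times 10^{-10})^2 \cdot (6.67 \times 10^{-11})}\ \text{kg} = 1.949 \times 10^{30}\ \text{kg} $$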
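The Kepler's Third Law example can also be reproduced outside of EXCEL. The Python sketch below assumes that the power law fit is done as a straight-line least-squares fit of ln(T) against ln(R); that is an assumption about the method, and the helper name fit_power_law is hypothetical. Under that assumption the exponent and coefficient should come out close to the values read from Figure 5.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = C * x**B via a straight-line fit of ln(y) vs. ln(x).

    Assumption for this sketch: least squares is applied to the logarithms.
    Requires every x and y to be positive.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    B = sxy / sxx                          # slope of the log-log line = exponent
    C = math.exp(mean_y - B * mean_x)      # intercept of the log-log line = ln(C)
    return C, B

# Data from Figure 4 (Mercury through Pluto).
radius_m = [5.79e10, 1.08e11, 1.496e11, 2.28e11, 7.78e11,
            1.43e12, 2.87e12, 4.50e12, 5.91e12]
period_s = [7.60e6, 1.94e7, 3.156e7, 5.94e7, 3.74e8,
            9.35e8, 2.64e9, 5.22e9, 7.82e9]

G = 6.67e-11  # N m^2 / kg^2, value quoted in the text

C, B = fit_power_law(radius_m, period_s)

# From C = 2*pi / sqrt(G * M_S):  M_S = 4*pi**2 / (C**2 * G)
mass_sun = 4 * math.pi ** 2 / (C ** 2 * G)

print(f"T = {C:.5e} * R^{B:.5f}")
print(f"experimental mass of the Sun = {mass_sun:.3e} kg")
```

Note that the data are typed with "e" for the power of 10, the same way they are entered into EXCEL.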
C. A Word of Caution: EXCEL will NOT allow you to do a simple power law fit if you have
either x=0 or y=0 in any of your data points. Also, it does not allow you to do a power law fit if
any of the x or y values are negative. For example, in the 2nd order polynomial fit, if we
simplified our model and only kept the $t^2$ term, it looks sort of like a power law: $y = (\frac{1}{2} a_y) t^2$.
However, the y values are negative. We can only do the power law fit if we redefine our y
values as positive numbers (like choosing our coordinates with down as the positive direction).
We would also need to throw away the data point (t=0, y=0). Once we do these things we
could do a power law fit. C (now a positive number) should give an experimental value for
$\frac{1}{2} a_y$ and the exponent B should be approximately equal to 2. Try this on your own and do a
power law fit to the free fall data. You will get a slightly different experimental value for the
acceleration, but that is expected because we changed the model slightly.
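Here is a rough Python sketch of the exercise just described, for comparison with what you get in EXCEL. It assumes the power law fit is a straight-line least-squares fit on the logarithms of the data (an assumption about the method), uses the Figure 1 data with down chosen as the positive direction, and drops the point (t=0, y=0). The variable names are my own.

```python
import math

# Figure 1 data with the (t=0, y=0) point removed and y made positive
# (down is taken as the positive direction).
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
distances = [4.5, 20, 43.5, 81, 120, 180, 240, 325, 385, 500]

# Straight-line least-squares fit of ln(y) vs. ln(t); assumed fitting method.
log_t = [math.log(t) for t in times]
log_y = [math.log(y) for y in distances]
n = len(times)
mean_t = sum(log_t) / n
mean_y = sum(log_y) / n
sxx = sum((u - mean_t) ** 2 for u in log_t)
sxy = sum((u - mean_t) * (v - mean_y) for u, v in zip(log_t, log_y))

B = sxy / sxx                        # exponent, expected to be near 2
C = math.exp(mean_y - B * mean_t)    # coefficient, plays the role of (1/2) a_y

acceleration = 2 * C                 # magnitude, since down is positive here

print(f"y = {C:.4f} * t^{B:.4f}")
print(f"experimental acceleration = {acceleration:.2f} m/s^2")
```

The exponent is not forced to be exactly 2 in this fit, which is one reason the acceleration differs somewhat from the polynomial fit result.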
III. THIRD EXAMPLE: - An Exponential Function.
A. Basic Form: An exponential function is a function of the form:
$$ y = Ce^{Ax} $$
Here "e" is the base of the natural logarithm (e=2.7182818.. to 7 decimal places). There are 2
parameters, C and A. The coefficient C is equal to the value of y at x=0, since $e^0 = 1$:
$$ y(\text{at } x = 0) = Ce^{A \cdot (0)} = Ce^{0} = C $$
The parameter A is the coefficient of x in the exponent on e. It can be either positive or
negative. When A is positive the function increases with increasing x, and is called an
"exponential growth function". When A is negative the function decreases with increasing x,
and is called an "exponential decay". The larger the magnitude of A, the more rapidly the function grows or
decays. This type of function often appears in physics, biology, chemistry, economics and other
areas, so it is extremely important.
B. Radioactive Decay Example: The decay of the nuclei of many radioactive atoms is an
exponential decay in the number of atoms remaining of the original radioactive substance. For
example, the nucleus of a Barium 137 atom can exist in an "excited" state that is radioactive and
decays rapidly to a more stable state by giving off a high energy photon:
$$ {}^{137}_{56}\mathrm{Ba}^{*} \rightarrow {}^{137}_{56}\mathrm{Ba} + \gamma $$
The star * on the Ba on the left hand side of the arrow represents the excited Barium. The arrow
means "goes to". The final products are represented on the right hand side by the Ba for Barium
and the Greek letter gamma ($\gamma$) for the photon. The number N(t) of excited Barium atoms
remaining in a sample as a function of time t is an exponential decay:
$$ N(t) = N_0 e^{-\lambda t} $$
Suppose we measure the number of photons per second emitted, as shown in this data:
Time (s)    Photon rate (number per sec)
60          1.62x10^13
120         1.14x10^13
180         8.58x10^12
240         7.03x10^12
300         4.71x10^12
360         4.01x10^12
420         3.05x10^12
480         2.20x10^12
540         1.91x10^12
600         1.30x10^12
Figure 6 - Photon emission rate vs. time data for excited Barium 137 decay.
The photon rate should equal the magnitude of the rate that N decreases since one photon is
emitted for each Barium that decays. The rate of decrease in N is just its time derivative:
$$ \text{Rate of photon emission} = \left| \frac{dN}{dt} \right| = (\lambda N_0) e^{-\lambda t} $$
We dropped the minus sign on the $(-\lambda)$ that was pulled out in front in the derivative, because we
took the absolute value. This is still an exponential decay, so we can fit measurements of the
photon rate vs. time to this function and get experimental values of $\lambda$ and $N_0$.
To fit this data, create the "XY (scatter)" plot in EXCEL of "Photon Rate vs. Time", then:
Click on the chart if it is not already selected.
Choose Add Trendline... from the Chart menu.
Click on the Type tab.
Click on the small window labeled Exponential.
Now click on the Options tab.
The only options we are concerned with are 3 at the bottom:
1. Set intercept should be OFF (if checkmark is shown, click box to remove).
2. Display equation on chart should be ON (checkmark is visible).
3. Display R-squared value on chart should be ON (checkmark is visible).
Click on OK or press Enter to finish.
The steps are the same as before, except for the Type of fit chosen, and here is the result:
Figure 7 - Exponential fit of photon rate vs. time.
The fit equation gives $\lambda = 4.5132 \times 10^{-3}$ per second, and the coefficient C equal to $2.0012 \times 10^{13}$ photons per second. Since $C = \lambda N_0$:
$$ N_0(\text{experimental}) = \frac{C}{\lambda} = \frac{2.0012 \times 10^{13}}{4.5132 \times 10^{-3}}\ \text{atoms} = 4.434 \times 10^{15}\ \text{atoms} $$
This determines experimentally how many excited Barium atoms were in the sample at t=0. If
you know your chemistry you can figure out that this is about 1 microgram of Barium (use the
atomic mass of Barium, 137.34 grams, and the fact that the atomic mass is the mass of 1 mole,
or $6.02 \times 10^{23}$ atoms).
The value of $\lambda$ describes how fast the Barium decays. It can be used to determine the "half-life"
of the excited Barium, which we will call $\tau_{1/2}$. The half-life is how long it takes for N to
decrease to half of the initial value $N_0$. The half-life is given by:
$$ \tau_{1/2} = \frac{\ln(2)}{\lambda} = \frac{0.693147}{4.5132 \times 10^{-3}}\ \text{sec} = 154\ \text{sec (or 2.6 minutes)}. $$
Even though our plot is not a plot of N vs. t (it is a plot of dN/dt vs. t), it has the same half-life
as N, which can be seen by examining the graph. Note that the coefficient $C = 2.0012 \times 10^{13}$ gives
the value of the graph at t=0. The best fit curve has a y value of half that at t=154 s.
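The whole radioactive decay analysis can be sketched in Python as a cross-check. As with the power law, this assumes the exponential fit is a straight-line least-squares fit, here of ln(rate) against time; that is an assumption about the method, and the variable names are my own. Under that assumption the results should be close to the numbers quoted above.

```python
import math

# Data from Figure 6.
times_s = [60, 120, 180, 240, 300, 360, 420, 480, 540, 600]
photon_rate = [1.62e13, 1.14e13, 8.58e12, 7.03e12, 4.71e12,
               4.01e12, 3.05e12, 2.20e12, 1.91e12, 1.30e12]

# Fit rate = C * exp(A * t) via a straight line through ln(rate) vs. t.
log_rate = [math.log(r) for r in photon_rate]
n = len(times_s)
mean_t = sum(times_s) / n
mean_l = sum(log_rate) / n
sxx = sum((t - mean_t) ** 2 for t in times_s)
sxy = sum((t - mean_t) * (l - mean_l) for t, l in zip(times_s, log_rate))

A = sxy / sxx                        # coefficient of t in the exponent (negative)
C = math.exp(mean_l - A * mean_t)    # value of the fitted curve at t = 0

decay_constant = -A                  # lambda, in 1/s
n0 = C / decay_constant              # from C = lambda * N0
half_life = math.log(2) / decay_constant

# Mass of N0 atoms, using the molar mass and mole size quoted in the text.
mass_grams = n0 / 6.02e23 * 137.34

print(f"lambda    = {decay_constant:.4e} 1/s")
print(f"C         = {C:.4e} photons/s")
print(f"N0        = {n0:.3e} atoms")
print(f"half-life = {half_life:.0f} s")
print(f"mass      = {mass_grams:.2e} g")
```

Because the fit works with ln(rate), it only makes sense for positive rates.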
C. A Word of Caution: As with the power law, EXCEL does not allow you to do an
exponential fit if you have any negative y values in your data, but you can have negative values
for x. Also, you cannot have y=0 in any data point, but you can have x=0.
IV. OTHER TYPES OF FITS.
EXCEL has 3 more types of fits shown in the choices: Linear, Logarithmic and Moving
Average. I will only mention them briefly here (without specific examples).
1) A linear function is an equation of a straight line: y = Cx + B. The parameter C is the slope
of the line, and B is the value of y at x=0 (the "y-intercept"). A linear function is the same thing
as a 1st order polynomial, but EXCEL has a separate choice for it on the menu. Data fits to a
straight line are common; however, you should not fit your data to a straight line if the theory
you are using has some other form.
2) A logarithmic function has the form: y = C ln(x) + B. The parameter B is the value of y at
x=1 (NOT at x=0), since ln(1)=0. EXCEL does not allow x=0 or negative values of x data for
this fit. Negative y values or y=0 are okay. This function won't show up as often as linear,
polynomial, or exponential in your physics courses, but it may show up in other places.
3) A moving average simply takes the average of every few data points. Sometimes this is used
to smooth out data, but it won't be used in your physics labs. It does not give you any specific
model to compare your data to.
V. SUMMARY OF DATA RESTRICTIONS.
The following table summarizes the restrictions for the types of data allowed for each fit:
                     Type of Function Allowed By EXCEL
DATA CONTAINS:       Linear   Polynomial   Power Law   Exponential   Logarithm   Moving Average
x=0                  YES      YES          NO          YES           NO          YES
y=0                  YES      YES          NO          NO            YES         YES
Negative x values    YES      YES          NO          YES           NO          YES
Negative y values    YES      YES          NO          NO            YES         YES
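The table can be restated as a short Python check that reports which fit types a data set is allowed to use. This is only an illustration of the rules in the table; the function name allowed_fits is hypothetical and is not something EXCEL provides.

```python
def allowed_fits(xs, ys):
    """Return {fit type: True/False} following the restriction table above."""
    # "No zeros and no negative values" is the same as "every value is positive".
    x_all_positive = all(x > 0 for x in xs)
    y_all_positive = all(y > 0 for y in ys)
    return {
        "Linear": True,
        "Polynomial": True,
        "Power Law": x_all_positive and y_all_positive,
        "Exponential": y_all_positive,
        "Logarithm": x_all_positive,
        "Moving Average": True,
    }

# Free fall data from Figure 1: contains x=0, y=0 and negative y values.
free_fall = allowed_fits(
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    [0, -4.5, -20, -43.5, -81, -120, -180, -240, -325, -385, -500],
)

for fit_type, ok in free_fall.items():
    print(f"{fit_type}: {'YES' if ok else 'NO'}")
```

For the free fall data only the Linear, Polynomial and Moving Average choices pass, which matches the caution given in the power law section.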
VI. BAD FITS.
As long as you do not violate any of the data restrictions for a certain type of fit, you can fit a
data set to any function you want to. HOWEVER, THIS DOES NOT MEAN YOU SHOULD. You
should base your choice on the physics or other model that applies to your data. You should learn to
recognize when your data does not match the model you are applying. Also, you should recognize
when your data does not fit well because the measurements are bad. We discuss these 2 cases here.
A. Fitting the Wrong Function: As an example, in the figure below I have fit some data to
an exponential function. This data should fit a straight line, and is NOT exponential. EXCEL
still calculates the "best fit" of an exponential function to the data I gave it. In fact, the $R^2$ value
(0.9) even seems okay. This result is completely unacceptable. It would be ridiculous to claim
this velocity is increasing exponentially just because I can do a curve fit and get a result.
Figure 8 - Fitting the wrong function; an exponential fit of linear data.
B. Bad Data: Below I have fit some data to a straight line, y = Cx + B.
Figure 9 - Linear fit of garbage data.
Notice that the data is very scattered and does not fit the straight line well (or any of the other
functions we have discussed, for that matter). If the linear model is good it could indicate that
the measurements were taken or evaluated incorrectly. [ Note: It could also indicate that the
model is wrong, but DO NOT ever try to make this claim in your 200 level physics lab. The lab
experiments are well designed to give reliable measurements based on well known laws of
physics. ]
In the next graph, only the last 2 data points are garbage.
Figure 10 - Straight line fit with 2 bad data points.
If you can fix this type of problem by retaking the measurements, you should. Otherwise, you
can sometimes throw away the bad data points. You should only do this if you still have
enough measurements left to get reliable results.
On both of these last 2 examples you should also notice that the values of $R^2$ are not even close
to 1. In the first case it is the worst, 0.0158. In the second case it is 0.4726. This is close to
1/2, which is still very bad because (remember) $R^2$ will always be a number between 0 and 1.
VII. CONCLUDING REMARKS.
We have not dealt with the actual mathematics used to calculate the fit parameters from the
data. You may learn some of these details in the future in other classes or even in your job, if you want
to do more detailed types of data analysis or modeling of some physical system. Some knowledge of
the mathematics is needed in order to do detailed error analysis of the result. For example, a "standard
deviation" can be calculated for each one of the parameters in a curve fit, which is a numerical estimate
of the uncertainty ("error") in that parameter's result.
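To give a flavor of what such an error estimate looks like, here is a minimal Python sketch for the decay constant from the Barium example. It uses the textbook standard-error formula for the slope of a straight line, applied to ln(photon rate) vs. time. Treating the exponential fit as a straight-line fit on the logarithms is an assumption of this sketch, and the result is an illustration of the idea rather than a full error analysis.

```python
import math

# Photon rate data from Figure 6.
times_s = [60, 120, 180, 240, 300, 360, 420, 480, 540, 600]
photon_rate = [1.62e13, 1.14e13, 8.58e12, 7.03e12, 4.71e12,
               4.01e12, 3.05e12, 2.20e12, 1.91e12, 1.30e12]

log_rate = [math.log(r) for r in photon_rate]
n = len(times_s)
mean_t = sum(times_s) / n
mean_l = sum(log_rate) / n
sxx = sum((t - mean_t) ** 2 for t in times_s)
sxy = sum((t - mean_t) * (l - mean_l) for t, l in zip(times_s, log_rate))

slope = sxy / sxx
intercept = mean_l - slope * mean_t

# Scatter of the points about the fitted line (n - 2 degrees of freedom,
# because two parameters were determined from the data).
residuals = [l - (intercept + slope * t) for t, l in zip(times_s, log_rate)]
variance = sum(r * r for r in residuals) / (n - 2)

# Standard error of the slope, which is also the standard error of lambda.
se_slope = math.sqrt(variance / sxx)
decay_constant = -slope

print(f"lambda = {decay_constant:.4e} +/- {se_slope:.1e} 1/s")
```

The more scattered the data are about the fitted line, the larger this uncertainty becomes.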
We have examined how to fit experimental results to various functions using EXCEL. Although
the EXCEL feature is fairly easy to use, curve fitting is a more general technique used to analyze data in
science and engineering. Therefore, these exercises should be beneficial to you even beyond your
physics labs. By working through the examples on your own, you should gain valuable
knowledge and experience in these scientific methods.
-------------------------------------------------------------------------------------------------------------------------
This is an original document written August 2002 by:
T. Horton, Graduate Student
Department of Physics
North Carolina State University
Campus Box 8202
Raleigh, NC 27695
-------------------------------------------------------------------------------------------------------------------------