USING THE EXCEL CHART WIZARD TO CREATE CURVE FITS (DATA ANALYSIS).

Note to physics students: Even if this tutorial is not given as an assignment, you are responsible for knowing the material contained here on how to use EXCEL. The examples contained here demonstrate how this feature works in EXCEL, and they demonstrate the general scientific technique of using curve fits to extract information from experimental data. Several types of curve fits are demonstrated in the examples. The simplest and most common type, a straight-line fit, is not included as a specific example, but its features are discussed. The author has done this to emphasize that the technique is more general, especially for students who have done only straight-line fits before. You should read and complete the "EXCEL SPREADSHEET TUTORIAL" before attempting this exercise.

TABLE OF CONTENTS:
I. INTRODUCTION: - Motivation and Our First Example.
II. OUR FIRST FIT: - Position vs. Time; A 2nd Order Polynomial Fit.
   A. Getting Started
   B. Adding the Best Fit Curve
II. OUR SECOND FIT: - A Simple Power Law Fit.
   A. Basic Form
   B. Kepler's Third Law Example
   C. A Word of Caution
III. THIRD EXAMPLE: - An Exponential Function.
   A. Basic Form
   B. Radioactive Decay Example
   C. A Word of Caution
IV. OTHER TYPES OF FITS.
V. SUMMARY OF DATA RESTRICTIONS.
VI. BAD FITS.
   A. The Wrong Function
   B. Bad Data
VII. CONCLUDING REMARKS.

I. INTRODUCTION: - Motivation and Our First Example.

In the "EXCEL SPREADSHEET TUTORIAL" we created a plot from numbers that were calculated from a set of formulas. We plotted the position vs. time and velocity vs. time of a falling body using the kinematics equations for constant acceleration (position = y, velocity = v):

$$y = y_0 + v_{0y}\,t + \tfrac{1}{2}a_y t^2, \qquad v_y = v_{0y} + a_y t.$$

The position function y(t) is "quadratic" in terms of the independent variable time.
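The spreadsheet calculation described above (plugging assumed parameter values into the two kinematics formulas and tabulating y and v) can be sketched in a few lines of Python. This is our own illustration, not part of the original tutorials. It uses the parameter values quoted from the "EXCEL SPREADSHEET TUTORIAL" ($a_y = -9.8$ m/s², $v_{0y} = 0$, $y_0 = 0$, with up as the positive direction); the whole-second time grid from 0 to 10 s is an arbitrary choice that happens to match the times in Figure 1.

```python
# Kinematics model for constant acceleration (the "theory" curves).
# Parameter values follow the tutorial; the time grid below is our own choice.

A_Y = -9.8   # acceleration a_y in m/s^2 (up is positive)
V0_Y = 0.0   # initial velocity v_0y in m/s
Y0 = 0.0     # initial position y_0 in m


def position(t, y0=Y0, v0y=V0_Y, ay=A_Y):
    """y(t) = y0 + v0y*t + (1/2)*ay*t^2, a 2nd order polynomial in t."""
    return y0 + v0y * t + 0.5 * ay * t ** 2


def velocity(t, v0y=V0_Y, ay=A_Y):
    """v_y(t) = v0y + ay*t, a 1st order polynomial in t."""
    return v0y + ay * t


if __name__ == "__main__":
    print("Time(s)  Position(m)  Velocity(m/s)")
    for t in range(11):  # t = 0, 1, ..., 10 s
        print(f"{t:7d}  {position(t):11.1f}  {velocity(t):13.1f}")
```

Comparing this model table with the measured positions in Figure 1 shows the central idea of the tutorial: measured data lie close to, but not exactly on, the model curve.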
The position equation is a simple polynomial with a constant term $y_0$, a term proportional to t ($v_{0y}t$), and a term proportional to $t^2$ ($\frac{1}{2}a_y t^2$). A quadratic function is also referred to as a "2nd degree polynomial" or "polynomial of order 2". The "degree" or "order" of a polynomial is the highest power to which the independent variable is raised (for example, a 3rd order polynomial would also include a $t^3$ term). The velocity $v_y(t)$ is "linear" in the variable t, or a "1st order polynomial".

These equations predict what the position and velocity of a falling body would be, based on the physics involved and the assumption that the acceleration of the falling object is constant. This gives us a prediction that the position is a quadratic function of time and that the velocity is a linear function of time. The values that we assumed we knew ($a_y = -9.8$ m/s², $v_{0y} = 0$, and $y_0 = 0$) are referred to as parameters. We assumed we knew the parameters, plugged them into the formulas, calculated v and y for various values of t, and plotted the result. This result is a mathematical model, or theoretical result, based on the assumption of constant acceleration.

In the laboratory, by contrast, you would measure the physical quantities involved and test how well they fit these models. If the measurements fit the models well enough, you should be able to determine the parameters experimentally from your measurements. In our example, we would drop some object and measure its position and the time at various intervals as it fell. The measurements would not necessarily match our predicted quadratic equation for y vs. t exactly, but they should be very close to it as long as our assumption of constant acceleration is valid and we take careful measurements. Even if the measurements deviate slightly from the model, that does not mean our assumption is wrong.
There is always a certain amount of uncertainty (sometimes referred to as "error") in a measurement, which comes from the limited precision of any measuring device. This could account for some of the deviation. We may also have done the measurements incorrectly, which could cause additional deviation. We don't intend to discuss measurement uncertainty ("error") in detail in this tutorial, but you should remember that deviation caused by taking the measurements incorrectly is unacceptable and should be corrected. This type of mistake is NOT what we mean by "error" when you discuss sources of "error" in your experiment.

Let's examine some hypothetical position vs. time measurements. We want to test whether they closely fit a function of the form:

$$y = A + Bt + Ct^2.$$

If so, then we are confident that we are measuring something that has constant acceleration. We can relax the assumption that we know what the acceleration is, and that we know what the initial position and velocity are. Instead, we can calculate numerical values for A, B, and C directly from our measurements, choosing them so that the quadratic function y(t) passes as close as possible to all of the data points. There is a rigorous mathematical way to do this (the method of least squares), and the curve that we get is called a "best fit curve". You may have done this before in a statistics course or another science course, fitting data to a straight line, but the method is more general and other types of curves can be fit as well. We will not concern ourselves with the math involved, since EXCEL will do it for us. In Section II we will show you how to use EXCEL to get the best fit curve, but first let us see what one looks like in general. Suppose the data table in Figure 1 below contains our measurements of position vs. time for a falling object. Next to the table is the graph of these measurements with the best fit curve from EXCEL.
The data points are the small square boxes on the graph. The equation displayed on the graph is a quadratic function that EXCEL calculated as a best fit to the data (x is Time, y is Position), and the curve you see on the graph is a plot of this function. Note that I have followed the proper conventions for titling and labeling the graph, and for labeling the data table, as explained in the SPREADSHEET TUTORIAL.

Time (s)    Position (m)
0           0
1           -4.5
2           -20
3           -43.5
4           -81
5           -120
6           -180
7           -240
8           -325
9           -385
10          -500

[Graph: "Position vs. Time Data With Best Fit Curve"; x-axis: Time (s); y-axis: Position (m); data points with the EXCEL best fit curve y = -4.9441x² + 0.086x - 0.0245, R² = 0.9987]
Figure 1 - Graph of Position vs. Time Data of Falling Body With Best Fit Curve From Excel.

There are several things to be aware of about this result:
1. The data points do not fall exactly on the curve, but are very close to it. I will not prove it, but even the points that seem to fall right on the curve do not match it exactly (if one does, it is a coincidence). For example, even though our first data point is y = 0 at t = 0, the best fit curve does NOT equal 0 at t = 0. The number called "R²" on the graph describes how closely the curve fits the data. It will always be a number between 0 and 1; in our case, R² = 0.9987. The closer R² is to 1, the closer the data is to the best fit curve in general. In fact, the curve we got is the "best fit" because it is the unique function of the type we chose (2nd order polynomial) that makes R² as close to 1 as possible; equivalently, it minimizes the sum of the squared vertical distances between the data points and the curve. No other values of the parameters A, B, and C can do better, based on the value of R². This number is called the "coefficient of determination". As before, we will not discuss the mathematics used to calculate it here.
2. It is important to know that the result you get is an EXPERIMENTAL result, NOT a theoretical one.
We chose to fit the data to a 2nd degree polynomial based on theory, but the values you get for the parameters A, B, and C are calculated directly from the data. Think of this as a fancy way to get "average" values for A, B, and C from the measurements. Never refer to the results of a best fit as "theory"; we only used the model as a starting point to know what general type of curve to fit our data to. Also, even though our data fits a 2nd degree polynomial very well, we would not say that it "proves" that the acceleration is constant. We would only say that our experimental result "verifies" our assumption of constant acceleration to a certain level of precision. Any result you get from analyzing a best fit curve in a physics lab is an experimental value, even if your lab manual does not explicitly call it that.

However, we need to compare the values we got for A, B, and C to the theory to see what they mean physically. In our case we compare

$$y = y_0 + v_{0y}\,t + \tfrac{1}{2}a_y t^2$$

to

$$y = A + Bt + Ct^2.$$

Here are the results (values for A, B, and C are read directly from the curve in Figure 1, and I have included the proper units):
1) Experimental value for $y_0$ = A = -0.0245 m.
2) Experimental value for $v_{0y}$ = B = 0.086 m/s.
3) Experimental value for $a_y$ = 2C = 2(-4.9441 m/s²) = -9.8882 m/s².

In the next section of the tutorial we will see how to do this example in EXCEL.

II. OUR FIRST FIT: - Position vs. Time; A 2nd Order Polynomial Fit.

A. Getting Started: In general, the following first few steps will be done for ANY curve fit (with different data, labels, and type of curve each time). Do the following:
1). Start the program EXCEL with a blank spreadsheet.
2). Give the spreadsheet a description of the problem. I have chosen to give mine the description "Problem: Create a 2nd Order polynomial fit to our Position vs. Time measurements".
3). Create a data table of the experimental values that you are going to fit. Remember which column the "x-axis" values go in for graphing purposes.
Use the table from Figure 1. DO NOT calculate these numbers from a formula; the numbers in the table are like the numbers you would get from measurements.
4). Use the "Chart Wizard" to create an "XY (scatter)" graph of your data. You should see only data points: if you chose the graph type "XY (scatter)", no lines should connect the data points on the graph.

Figure 2 depicts what my spreadsheet looks like now.

Figure 2 - EXCEL Data and Graph Before Adding the Best Fit Curve.

Yours should look something like this. If not, repeat these steps, and reread the "EXCEL SPREADSHEET TUTORIAL" if needed. Now we will add the best fit curve.

B. Adding the Best Fit Curve: We can now fit a "2nd Order Polynomial" to this data.
Click on the chart if it is not already selected.
Choose Add Trendline... from the Chart menu.
Click on the Type tab. You will see some small windows with pictures of various types of functions and their names below them. Click on the window labeled Polynomial, and type 2 in the textbox titled Order.
Now click on the Options tab. The only options we are concerned with are the 3 at the bottom:
1. Set intercept should be OFF (if a checkmark is shown, click the box to remove it).
2. Display equation on chart should be ON (checkmark is visible).
3. Display R-squared value on chart should be ON (checkmark is visible).
Click on OK or press Enter to finish.

The best fit equation should now appear on the chart. We can reformat it so it is more readable. Double click on the equation, and a window titled "Format Data Labels" appears, with 4 tabs at the top: "Patterns", "Font", "Number" and "Alignment". Under the "Number" tab there is a list of formats for displaying the numbers in the equation. I have chosen the one called "Number" with 4 decimal places shown. Under the "Font" tab you can adjust the font type, size, etc. I have chosen a larger font of about 14 pt. Click OK to close the window when you are done.
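The trendline EXCEL has just added is an ordinary least-squares fit, so its numbers can be cross-checked outside EXCEL. Below is a minimal sketch in plain Python (our own illustration; the helper names `polyfit` and `r_squared` are hypothetical and are not NumPy's functions of similar name). It fits the Figure 1 data to y = A + Bt + Ct² by solving the least-squares "normal equations", and computes R² as 1 minus the ratio of the residual sum of squares to the total sum of squares.

```python
# Least-squares polynomial fit of the Figure 1 data, written out with the
# normal equations so that no external libraries are needed.


def polyfit(xs, ys, degree):
    """Return [a0, a1, ..., a_degree] minimizing sum (y - sum a_k x^k)^2."""
    n = degree + 1
    # Normal equations M a = v, with M[i][j] = sum x^(i+j) and v[i] = sum y*x^i.
    M = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    v = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        v[col], v[pivot] = v[pivot], v[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= factor * M[col][c]
            v[r] -= factor * v[col]
    # Back substitution.
    a = [0.0] * n
    for r in range(n - 1, -1, -1):
        a[r] = (v[r] - sum(M[r][c] * a[c] for c in range(r + 1, n))) / M[r][r]
    return a


def r_squared(xs, ys, coeffs):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    def predict(x):
        return sum(a * x ** k for k, a in enumerate(coeffs))
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Figure 1 data: time (s) and measured position (m) of the falling body.
times = list(range(11))
positions = [0, -4.5, -20, -43.5, -81, -120, -180, -240, -325, -385, -500]

A, B, C = polyfit(times, positions, 2)   # y = A + B*t + C*t^2
R2 = r_squared(times, positions, [A, B, C])

print(f"y = {C:.4f}t^2 {B:+.4f}t {A:+.4f}")  # y = -4.9441t^2 +0.0860t -0.0245
print(f"R^2 = {R2:.4f}")                      # R^2 = 0.9987
print(f"a_y = 2C = {2 * C:.3f} m/s^2")        # a_y = 2C = -9.888 m/s^2
```

The printed coefficients and R² agree with the equation EXCEL displays in Figure 1. Solving the normal equations directly is adequate for a low-order polynomial and a small data set like this one; for high-order polynomials it becomes numerically ill-conditioned, and more careful methods are preferred.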
To move the equation, single click on it so that a box appears around it. By clicking and holding the mouse near the upper left corner of the box, you can drag the equation to a clear place on your graph. My result looks like this now:

Figure 3 - Data and graph after adding best fit curve.

II. OUR SECOND FIT: - A Simple Power Law Fit.

A. Basic Form: A power law is an equation of the form $y = Cx^B$. Another way to say this is that "y is proportional to x raised to the B power". There are 2 parameters, the coefficient C and the exponent B. For example, the period T (time for one complete oscillation) of a simple pendulum is a function of the length L of the pendulum given by:

$$T = 2\pi\sqrt{\frac{L}{g}}$$

Here g is the magnitude of the acceleration due to gravity. Remember that a square root is the same as the exponent 1/2, and that the square root of g in the denominator is the same as raising g to the exponent -1/2. This is a power law, since it can be written in the form:

$$T = \left(2\pi g^{-1/2}\right) L^{1/2}$$

If we measured the periods T of several pendulums of different lengths L and plotted T vs. L, we could then do a power law fit of our data. If the relation above is verified, we expect to get an exponent B approximately equal to 1/2 from the fit. We would also get an experimental value for g from the coefficient C, since:

$$C = 2\pi g^{-1/2} \quad\Longrightarrow\quad g = \left(\frac{2\pi}{C}\right)^2$$

The exponent on the independent variable of a power law can also be negative, as in y = C/x, which can be rewritten as $y = Cx^{-1}$.

B. Kepler's Third Law Example: Here is a data table containing experimental values of the orbital period T (time that it takes a planet to orbit the sun) and orbital radius R (average distance from the sun to the planet) for the 9 planets in our solar system:

Planet     Distance from Sun (m)    Orbital period (s)
Mercury    5.79x10^10               7.60x10^6
Venus      1.08x10^11               1.94x10^7
Earth      1.496x10^11              3.156x10^7
Mars       2.28x10^11               5.94x10^7
Jupiter    7.78x10^11               3.74x10^8
Saturn     1.43x10^12               9.35x10^8
Uranus     2.87x10^12               2.64x10^9
Neptune    4.50x10^12               5.22x10^9
Pluto      5.91x10^12               7.82x10^9

Figure 4 - Orbital radius and period data for the 9 planets.

Kepler's 3rd Law states that the square of T is proportional to the cube of R. This is written as:

$$T = \frac{2\pi}{\sqrt{G M_S}}\, R^{3/2}$$

Here $M_S$ is the mass of the Sun in kilograms, and G is the universal gravitational constant, with the known value $G = 6.67\times10^{-11}$ N·m²/kg². This is a power law for T vs. R, so we will fit the data to a simple power law. Do the basic steps in EXCEL of making an "XY (scatter)" plot of the T vs. R data. To enter numbers in scientific notation in EXCEL you use "e" for the power of 10, as in most computer programs. For example, the orbital distance of the Earth in meters is $1.496\times10^{11}$; this would be entered into EXCEL as 1.496e11 (notice there are no spaces). Once you have the scatter plot, then:
Click on the chart if it is not already selected.
Choose Add Trendline... from the Chart menu.
Click on the Type tab.
Click on the small window labeled Power.
Now click on the Options tab. The only options we are concerned with are the 3 at the bottom:
1. Set intercept should be OFF (if a checkmark is shown, click the box to remove it).
2. Display equation on chart should be ON (checkmark is visible).
3. Display R-squared value on chart should be ON (checkmark is visible).
Click on OK or press Enter to finish.

Finally, change the format of the numbers displayed in the equation to "Scientific" (double click on the equation and use the "Number" tab, as before).
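A power law becomes a straight line after taking logarithms: ln T = ln C + B ln R. So one way to reproduce a Power trendline is an ordinary least-squares straight-line fit of ln T against ln R, which gives B as the slope and ln C as the intercept. We assume this is essentially how EXCEL computes its Power trendline; for the Figure 4 data it does reproduce an exponent of about 1.4996. The Python sketch below is our own (the helpers `line_fit` and `power_fit` are hypothetical names), and it then computes $M_S = 4\pi^2/(C^2 G)$ from the fitted coefficient.

```python
import math

# Figure 4 data: orbital radius R (m) and orbital period T (s), Mercury..Pluto.
radius = [5.79e10, 1.08e11, 1.496e11, 2.28e11, 7.78e11,
          1.43e12, 2.87e12, 4.50e12, 5.91e12]
period = [7.60e6, 1.94e7, 3.156e7, 5.94e7, 3.74e8,
          9.35e8, 2.64e9, 5.22e9, 7.82e9]
G = 6.67e-11  # gravitational constant, N m^2 / kg^2


def line_fit(xs, ys):
    """Ordinary least-squares straight line y = slope*x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def power_fit(xs, ys):
    """Fit y = C * x**B with a straight line through (ln x, ln y).

    math.log raises ValueError for zero or negative values, which mirrors
    why a power law fit cannot be done on such data.
    """
    B, ln_C = line_fit([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(ln_C), B


C, B = power_fit(radius, period)
M_sun = 4 * math.pi ** 2 / (C ** 2 * G)   # from C = 2*pi / sqrt(G * M_S)

print(f"T = {C:.5e} R^{B:.5f}")
print(f"M_S = {M_sun:.3e} kg")
```

The fitted exponent should come out close to 1.4996, the coefficient close to 5.51x10^-10, and M_S close to 1.95x10^30 kg, in line with the EXCEL result.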
If you left the format as "Number", the coefficient C in the result would display as "0.0000" to 4 decimal places. Its value is $5.51\times10^{-10}$ to 3 significant figures in scientific notation; in "Number" format you would need 12 decimal places to see it, since it is 0.000000000551. The result is shown below, with the numbers in the equation showing 5 decimal places in scientific notation.

Figure 5 - Power law fit of orbital period vs. orbital distance for the 9 planets.

The exponent in the result is 1.49962 to 5 decimal places, or 1.50 when rounded to 3 significant figures. We expected 3/2, which equals 1.5, and the data used was given to 3 significant figures. R² is given as 9.99999x10^-1, which is the same as 0.999999, a value very close to 1.

Now let's use C to calculate the mass of the Sun. Comparing Kepler's 3rd Law with $T = CR^B$, we see that:

$$C = \frac{2\pi}{\sqrt{G M_S}}$$

Solving for $M_S$ and plugging in C from the fit:

$$M_S(\text{experimental}) = \frac{4\pi^2}{C^2 G} = \frac{4\cdot(3.14159)^2}{(5.51129\times10^{-10})^2\cdot(6.67\times10^{-11})}\ \text{kg} = 1.949\times10^{30}\ \text{kg}$$

C. A Word of Caution: EXCEL will NOT allow you to do a simple power law fit if you have either x = 0 or y = 0 in any of your data points. It also does not allow a power law fit if any of the x or y values are negative. For example, in the 2nd order polynomial fit, if we simplified our model and kept only the $t^2$ term, it would look like a power law: $y = (\frac{1}{2}a_y)t^2$. However, the y values are negative. We can only do the power law fit if we redefine our y values as positive numbers (like choosing our coordinates with down as the positive direction). We would also need to throw away the data point (t = 0, y = 0). Once we do these things, we could do a power law fit: C (now a positive number) should give an experimental value for $\frac{1}{2}a_y$ (with down now the positive direction), and the exponent B should be approximately equal to 2. Try this on your own and do a power law fit to the free fall data. You will get a slightly different experimental value for the acceleration, but that is expected because we changed the model slightly.

III. THIRD EXAMPLE: - An Exponential Function.

A. Basic Form: An exponential function is a function of the form:

$$y = Ce^{Ax}$$

Here e is the base of the natural logarithm (e = 2.7182818... to 7 decimal places). There are 2 parameters, C and A. The coefficient C is equal to the value of y at x = 0, since $e^0 = 1$:

$$y(x = 0) = Ce^{A\cdot 0} = Ce^0 = C$$

The parameter A is the coefficient of x in the exponent of e. It can be either positive or negative. When A is positive, the function increases with increasing x and is called an "exponential growth" function. When A is negative, the function decreases with increasing x and is called an "exponential decay". The larger the magnitude of A, the more rapidly the function grows or decays. This type of function often appears in physics, biology, chemistry, economics and other areas, so it is extremely important.

B. Radioactive Decay Example: The decay of the nuclei of many radioactive atoms is an exponential decay in the number of atoms remaining of the original radioactive substance. For example, the nucleus of a Barium 137 atom can exist in an "excited" state that is radioactive and decays rapidly to a more stable state by giving off a high energy photon:

$${}^{137}_{56}\mathrm{Ba}^{*} \rightarrow {}^{137}_{56}\mathrm{Ba} + \gamma$$

The star * on the Ba on the left hand side of the arrow represents the excited Barium. The arrow means "goes to". The final products are represented on the right hand side by Ba for Barium and the Greek letter gamma (γ) for the photon. The number N(t) of excited Barium atoms remaining in a sample as a function of time t is an exponential decay:

$$N(t) = N_0 e^{-\lambda t}$$

Suppose we measure the number of photons per second emitted, as shown in this data:

Time (s)    Photon rate (number per s)
60          1.62x10^13
120         1.14x10^13
180         8.58x10^12
240         7.03x10^12
300         4.71x10^12
360         4.01x10^12
420         3.05x10^12
480         2.20x10^12
540         1.91x10^12
600         1.30x10^12

Figure 6 - Photon emission rate vs. time data for excited Barium 137 decay.
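As with the power law, an exponential becomes a straight line after taking a logarithm: ln y = ln C + Ax. A least-squares straight-line fit of ln(rate) against t therefore gives A as the slope and ln C as the intercept. We assume this matches how EXCEL computes its Exponential trendline; for the Figure 6 data it gives λ = -A ≈ 4.513x10^-3 s^-1 and C ≈ 2.001x10^13 photons/s, in agreement with the EXCEL result. The Python sketch below is our own (the helper name `exponential_fit` is hypothetical); it also applies the relations $C = \lambda N_0$ and $\tau_{1/2} = \ln 2/\lambda$ used in the text.

```python
import math

# Figure 6 data: time (s) and measured photon emission rate (photons per s).
times = [60, 120, 180, 240, 300, 360, 420, 480, 540, 600]
rates = [1.62e13, 1.14e13, 8.58e12, 7.03e12, 4.71e12,
         4.01e12, 3.05e12, 2.20e12, 1.91e12, 1.30e12]


def exponential_fit(xs, ys):
    """Fit y = C * exp(A*x) with a least-squares straight line through (x, ln y).

    Every y must be positive (math.log raises ValueError otherwise);
    x values may be zero or negative.
    """
    ln_ys = [math.log(y) for y in ys]
    n = len(xs)
    mean_x, mean_ln_y = sum(xs) / n, sum(ln_ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (ly - mean_ln_y) for x, ly in zip(xs, ln_ys))
    A = sxy / sxx
    return math.exp(mean_ln_y - A * mean_x), A


C, A = exponential_fit(times, rates)
lam = -A                        # decay constant lambda, 1/s
N0 = C / lam                    # since C = lambda * N0 for the photon rate
half_life = math.log(2) / lam   # tau_1/2 = ln(2) / lambda, in s

print(f"rate = {C:.4e} * exp(-{lam:.4e} t)")
print(f"N0 = {N0:.3e} atoms, half-life = {half_life:.0f} s")
```

The requirement that every y be positive in `exponential_fit` is the same restriction on exponential fits discussed in the Word of Caution for this example.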
The photon rate should equal the magnitude of the rate at which N decreases, since one photon is emitted for each Barium nucleus that decays. The rate of decrease in N is given by the magnitude of its time derivative:

$$\text{Rate of photon emission} = \left|\frac{dN}{dt}\right| = (\lambda N_0)\,e^{-\lambda t}$$

We dropped the minus sign from the factor $(-\lambda)$ that the derivative brings out in front, because we took the absolute value. This is still an exponential decay, so we can fit measurements of the photon rate vs. time to this function and get experimental values of λ and $N_0$.

To fit this data, create the "XY (scatter)" plot in EXCEL of "Photon Rate vs. Time", then:
Click on the chart if it is not already selected.
Choose Add Trendline... from the Chart menu.
Click on the Type tab.
Click on the small window labeled Exponential.
Now click on the Options tab. The only options we are concerned with are the 3 at the bottom:
1. Set intercept should be OFF (if a checkmark is shown, click the box to remove it).
2. Display equation on chart should be ON (checkmark is visible).
3. Display R-squared value on chart should be ON (checkmark is visible).
Click on OK or press Enter to finish.
The steps are the same as before, except for the Type of fit chosen. Here is the result:

Figure 7 - Exponential fit of photon rate vs. time.

The fit equation gives $\lambda = 4.5132\times10^{-3}$ s⁻¹ (the coefficient of x in EXCEL's exponent is $-\lambda$) and $C = 2.0012\times10^{13}$ photons per second. Since $C = \lambda N_0$:

$$N_0(\text{experimental}) = \frac{C}{\lambda} = \frac{2.0012\times10^{13}}{4.5132\times10^{-3}}\ \text{atoms} = 4.434\times10^{15}\ \text{atoms}$$

This determines experimentally how many excited Barium atoms were in the sample at t = 0. If you know your chemistry, you can figure out that this is about 1 microgram of Barium: use the atomic mass of Barium, 137.34 grams per mole, and the fact that 1 mole contains 6.02x10^23 atoms, so the mass is (4.434x10^15 / 6.02x10^23) x 137.34 g ≈ 1.0x10^-6 g.

The value of λ describes how fast the Barium decays. It can be used to determine the "half-life" of the excited Barium, which we will call $\tau_{1/2}$. The half-life is how long it takes for N to decrease to half of its initial value $N_0$.
Setting $N(\tau_{1/2}) = N_0/2$ in $N(t) = N_0 e^{-\lambda t}$ gives $e^{-\lambda\tau_{1/2}} = 1/2$, so the half-life is given by:

$$\tau_{1/2} = \frac{\ln 2}{\lambda} = \frac{0.693147}{4.5132\times10^{-3}}\ \text{s} = 154\ \text{s}\quad(\text{about 2.6 minutes})$$

Even though our plot is not a plot of N vs. t (it is a plot of |dN/dt| vs. t), it has the same half-life as N, since |dN/dt| = λN is proportional to N. This can also be seen by examining the graph: the coefficient $C = 2.0012\times10^{13}$ gives the value of the best fit curve at t = 0, and the best fit curve has a y value of half that at t = 154 s.

C. A Word of Caution: As with the power law, EXCEL does not allow you to do an exponential fit if you have any negative y values in your data, but you can have negative values of x. Also, you cannot have y = 0 in any data point, but you can have x = 0.

IV. OTHER TYPES OF FITS.

EXCEL has 3 more types of fits among the choices: Linear, Logarithmic and Moving Average. I will only mention them briefly here (without specific examples).
1) A linear function is the equation of a straight line: y = Cx + B. The parameter C is the slope of the line, and B is the value of y at x = 0 (the "y-intercept"). A linear function is the same thing as a 1st order polynomial, but EXCEL has a separate choice for it on the menu. Data fits to a straight line are common; however, you should not fit your data to a straight line if the theory you are using has some other form.
2) A logarithmic function has the form y = C ln(x) + B. The parameter B is the value of y at x = 1 (NOT at x = 0), since ln(1) = 0. EXCEL does not allow x = 0 or negative values of x for this fit; negative y values or y = 0 are okay. This function won't show up as often as linear, polynomial, or exponential functions in your physics courses, but it may show up in other places.
3) A moving average simply takes the average of every few consecutive data points. It is sometimes used to smooth out data, but it won't be used in your physics labs. It does not give you any specific model to compare your data to.

V. SUMMARY OF DATA RESTRICTIONS.
The following table summarizes the restrictions on the data allowed for each type of fit (YES = allowed by EXCEL, NO = not allowed):

DATA CONTAINS:       Linear  Polynomial  Power Law  Exponential  Logarithm  Moving Average
x = 0                YES     YES         NO         YES          NO         YES
y = 0                YES     YES         NO         NO           YES        YES
Negative x values    YES     YES         NO         YES          NO         YES
Negative y values    YES     YES         NO         NO           YES        YES

VI. BAD FITS.

As long as you do not violate any of the data restrictions for a certain type of fit, you can fit a data set to any function you want to. HOWEVER, THIS DOES NOT MEAN YOU SHOULD. You should base your choice on the physics or other model that applies to your data. You should learn to recognize when your data does not match the model you are applying. You should also recognize when your data does not fit well because the measurements are bad. We discuss these 2 cases here.

A. Fitting the Wrong Function: As an example, in the figure below I have fit some data to an exponential function. This data should fit a straight line, and is NOT exponential. EXCEL still calculates the "best fit" of an exponential function to the data I gave it. In fact, the R² value (0.9) even seems okay. This result is completely unacceptable. It would be ridiculous to claim this velocity is increasing exponentially just because I can do a curve fit and get a result.

Figure 8 - Fitting the wrong function; an exponential fit of linear data.

B. Bad Data: Below I have fit some data to a straight line, y = Cx + B.

Figure 9 - Linear fit of garbage data.

Notice that the data is very scattered and does not fit the straight line well (or any of the other functions we have discussed, for that matter). If the linear model is good, this could indicate that the measurements were taken or evaluated incorrectly. [ Note: It could also indicate that the model is wrong, but DO NOT ever try to make this claim in your 200-level physics lab. The lab experiments are designed to give reliable measurements based on well-known laws of physics. ]

In the next graph, only the last 2 data points are garbage.

Figure 10 - Straight line fit with 2 bad data points.

If you can fix this type of problem by retaking the measurements, you should. Otherwise, you can sometimes throw away the bad data points, but you should only do this if you still have enough measurements left to get reliable results. In both of these last 2 examples you should also notice that the values of R² are not even close to 1. In the first case it is the worst, 0.0158. In the second case it is 0.4726. This is close to 1/2, which is still very bad because (remember) R² will always be a number between 0 and 1.

VII. CONCLUDING REMARKS.

We have not dealt with the actual mathematics used to calculate the fit parameters from the data. You may learn some of these details in the future in other classes or even in your job, if you want to do more detailed types of data analysis or modeling of some physical system. Some knowledge of the mathematics is needed in order to do a detailed error analysis of the result. For example, a "standard deviation" can be calculated for each one of the parameters in a curve fit, which is a numerical estimate of the uncertainty ("error") in that parameter's result.

We have examined how to fit experimental results to various functions using EXCEL. Although the EXCEL feature is fairly easy to use, curve fitting is a general technique used to analyze data in science and engineering. Therefore, these exercises should be beneficial to you even beyond your physics labs. By working through the examples on your own, you should gain valuable knowledge and experience in these scientific methods.

------------------------------------------------------------
This is an original document written August 2002 by:
T. Horton, Graduate Student
Department of Physics
North Carolina State University
Campus Box 8202
Raleigh, NC 27695
------------------------------------------------------------