ME 101: Computer Demonstration – CD 2 Curve Fitting Using Excel

Computer Demonstration CD 2 – Curve Fitting using Linear Regression

Purpose: To introduce curve fitting to experimental data.

Goals:
1. Graph given data and decide if a linear fit is appropriate.
2. Find the "Best Line" through given data using the least squares method.
3. Transform data to obtain a straight line.
4. Provide fits to experimental data in the measurements lab.

Assignment: See the Assignment section on page 16 of the handout for the homework assignment.

Introduction to Linear Equations and Linearization

Engineering often involves reducing physical behaviors to predictable relationships. This can be done using a theoretical derivation, which results in an equation, or by analyzing experimental data. In the latter case, it is more convenient to describe the behavior with a line than with a series of points. This requires determining the equation that best represents the data so that a curve can be drawn. For empirical relations, the result is an equation that can be used in engineering analyses. Other times, the equation determined from experimental data can be compared with the one given by theory, either to check the validity of the theoretical approach or to spotlight erroneous data. Determining such an equation from data is known as curve fitting.

The simplest type of curve fit is that of linear data, that is, points that follow a linear relation such as y = mx + b. A relation such as y = x² or y = 1/x is nonlinear. However, many nonlinear relations can be reduced or plotted in a linear fashion so that a linear curve fit can be performed. Every measurement technique has inherent errors, so experimental data that follow a linear relation will fall close to, but not exactly on, a line. If graphing shows that the points are close to linear, we can assume that measurement errors account for the scatter and proceed to make a linear "best fit."
The easiest way to determine a best-fit line is to plot the data and draw a straight line through it by eye using a straight edge. This approach can lead to considerable variation in where the line is drawn, however, especially if the data are not exactly linear. Do you connect the end points? Should the line pass through the middle of the data? How do you account for outliers? A more consistent and mathematically grounded approach is required for engineering purposes.

Method of Least Squares

There are several mathematical ways of determining the best-fit line that lend themselves to easy use with a spreadsheet program. The one we will use is called the Method of Least Squares. It begins with a definition of how close a line is to being a perfect fit. We measure the vertical distance from each point to the line we choose, square all of these distances, and add them together. The smaller this sum of squared errors is, the better our line. We then need a mathematical way of finding the line for which the sum of squared errors is smaller than for every other line. That line is our "best fit" line.

To put this in mathematical symbols, we begin by representing our N data points by their coordinates, (x_1, y_1), (x_2, y_2), ..., (x_N, y_N), and we describe our line by the formula for any line, y = mx + b, where m is the slope and b is the y-intercept. We can then graph the points and the line together as shown below. The vertical arrows show the errors for two points. The error for any point i at (x_i, y_i) is y_i - (m x_i + b), and the sum of these squared errors is

$$E = \sum_{i=1}^{N} \left[ y_i - (m x_i + b) \right]^2$$

What we want to do is find the m and b for which this error is the smallest value possible. First we determine the following four summations:

$$\sum_{i=1}^{N} x_i, \qquad \sum_{i=1}^{N} y_i, \qquad \sum_{i=1}^{N} x_i^2, \qquad \sum_{i=1}^{N} x_i y_i$$

We will call these sums sum of x, sum of y, sum of x squared, and sum of xy.
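The definition of the sum of squared errors and the four summations can also be sketched outside of Excel. The short Python example below is only an illustration: the function names and the four data points are made up for this sketch and do not come from the course spreadsheet.

```python
def sum_squared_error(points, m, b):
    """Sum of squared vertical distances from each point to the line y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in points)


def four_sums(points):
    """Return (sum of x, sum of y, sum of x squared, sum of xy) for (x, y) pairs."""
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    return sx, sy, sxx, sxy


# Made-up illustrative data (an assumption, not the handout's Ohm's law data).
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]

# Compare two candidate lines: the one with the smaller total is the better fit.
print("SSE for y = 2x:    ", sum_squared_error(points, 2.0, 0.0))
print("SSE for y = x + 1: ", sum_squared_error(points, 1.0, 1.0))
print("Four sums:", four_sums(points))
```

For these points the line y = 2x gives a much smaller sum of squared errors than y = x + 1, which is exactly the sense in which one line is "better" than another.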
We can then apply the least squares theorem to find m and b.

Method of Least Squares Theorem

The line y = mx + b through the N points (x_1, y_1), ..., (x_N, y_N) that minimizes the total squared error has the slope m and intercept b given by the equations below.

$$m = \frac{N \sum x_i y_i - \sum x_i \sum y_i}{N \sum x_i^2 - \left( \sum x_i \right)^2}$$

$$b = \frac{\sum y_i - m \sum x_i}{N}$$

Notice that you must compute the slope m first, since the equation for the intercept b uses this result. If we have our data points in columns in Excel, it is easy to compute the four summations and then to find m and b for a curve fit line.

Checking the Theorem

We can check the theorem by using the points (0, -1), (1, 1), and (2, 3). If you substitute them into the equation y = 2x - 1 you will see that they all fit, so they all lie exactly on the line with slope 2 and intercept -1. If we apply the least squares theorem, we should come up with m = 2 and b = -1.

First we calculate our four sums, with N = 3:

$$\sum x_i = 0 + 1 + 2 = 3, \qquad \sum y_i = -1 + 1 + 3 = 3$$

$$\sum x_i^2 = 0 + 1 + 4 = 5, \qquad \sum x_i y_i = 0 + 1 + 6 = 7$$

Then we apply the least squares theorem to find m and b:

$$m = \frac{3(7) - (3)(3)}{3(5) - 3^2} = \frac{12}{6} = 2, \qquad b = \frac{3 - 2(3)}{3} = -1$$

So we see that we have m = 2 and b = -1, as we expected.

Ohm's Law

Recall our example of Ohm's law from CD 1, where we compared our experimental data with Ohm's law theory. To obtain a better comparison, we will now fit a line to the same data. Since we know that Ohm's law is linear (V = IR, so voltage V is linear in current I with a constant resistance R), we should be able to apply the method of least squares directly to the data without any alteration.

Here are the steps that we need to complete:
1. Determine which variable is our x and which is our y.
2. Calculate the number of points, N.
3. Tabulate the values of x² and xy.
4. Calculate the sums of x, y, x², and xy.
5. Calculate the slope, m.
6. Calculate the y-intercept, b.
7. Use the equation y = mx + b to plot a line.

We will see that this is relatively simple. Our result should look similar to our original plot of experimental data and Ohm's law, below.
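Steps 2 through 6 above can be sketched as a short Python function. This is only a minimal illustration of the least squares formulas, not a replacement for the spreadsheet. The function name `least_squares_line` is our own choice, and the data used are the three check points from the "Checking the Theorem" section.

```python
def least_squares_line(xs, ys):
    """Return (m, b) for the least squares line y = m*x + b through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two points, with one y for every x")

    # The four summations: sum of x, sum of y, sum of x squared, sum of xy.
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))

    denom = n * sxx - sx ** 2
    if denom == 0:
        raise ValueError("all x values are equal, so the slope is undefined")

    # The slope must be computed first because the intercept depends on it.
    m = (n * sxy - sx * sy) / denom
    b = (sy - m * sx) / n
    return m, b


# The three points that lie exactly on y = 2x - 1.
m, b = least_squares_line([0, 1, 2], [-1, 1, 3])
print(m, b)  # prints 2.0 -1.0
```

The two formulas inside the function have the same form as the Excel formulas entered later for the slope and the intercept, with the cell references replaced by named variables.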
Ohm's Law Data

The data you need is located in the spreadsheet file on-line. Download it to begin graphing the data. You will need the first worksheet, titled Ohm's Law. The spreadsheet should look like the table shown to the right. Optionally, you can re-enter this data by hand, as you might if you had recorded the data during an experiment.

Notice that we have reserved columns C and D for later use. These will be our columns for x² and xy. First, enter the titles of the columns in cells C6 and D6. We will call them 'x2' and 'xy'.

Next, enter the formula in cell C7 to calculate the square of x. Click on cell C7 and enter =A7^2 (or alternatively, =A7*A7). After hitting enter, the value 48.5809 should appear. If it did not, make sure you have entered the formula correctly. Now use the Fill Down command under the Edit menu. To do this, highlight the cells C7 through C15 and select the Fill » Down command in the Edit submenu. This will copy the formula down through cell C15. Thus, in cell C14, you should have the formula =A14^2. Click on this cell to check that the procedure was executed properly.

Once you have done this, repeat the procedure for cells D7 through D15. In this case, your formula will be =A7*B7. Your values should appear similar to the ones to the right.

Formatting the Cells

Depending upon individual settings in Excel, the cells may display either too many or too few significant figures. This is certainly the case in columns C and D on the previous page. To change the formatting, highlight the cells you wish to format and select the Format » Cells... command (this may be in a different location in different versions of Excel, so just keep looking for it). The following window should appear. Select Number from the file tabs and Number in the category menu.
This will allow you to select the number of decimal places in the cells. For our spreadsheet, we selected 3 decimal places for cells C7 through C15 and 4 decimal places for cells D7 through D15. This allows us to see the numerical values without too much clutter. Note that formatting a cell doesn't actually change the value of the cell or how it appears on a plot. It only changes how Excel displays the value to the user.

Finding the Sums

The sum of x and the sum of y are easy enough to compute, but we also need the sum of x squared and the sum of xy, as well as m, b, and N. These can be placed below each column, but we will add them to the right so we can keep them well labeled (you can place them wherever you like, but placing them in the same location will help you keep track of the correct values). In each case, we sum the column of one of our four quantities using the SUM() command. Thus, the formula entries will be =SUM(A7:A15), =SUM(B7:B15), =SUM(C7:C15), and =SUM(D7:D15), respectively, as shown below.

As mentioned, we also have to count the number of points, N. This is easy enough to do by hand with the small number of points we have here, but we want to automate it so it will work for a large number of points. Here we use the COUNT() command. Since all of the columns have the same number of values, we could use this on any column. For column A, the command would simply be =COUNT(A7:A15). It should return a value of 9. (Note that sometimes we will discard end points in linear regression analyses. In that case, we do not include those points in our count since we did not include them in our sums.)

Now we need to compute the slope m and y-intercept b. If the sums and equations are in the cells shown (N in G5 and the sums of x, y, x², and xy in G7, G8, G9, and G10), then the equations will be =(G5*G10-G7*G8)/(G5*G9-G7^2) in cell G13 for the slope and =(G8-G13*G7)/G5 in cell G15 for the intercept. Make sure these formulae are correct.
Any error in them will keep the spreadsheet from returning the correct values for m and b.

Finishing Up

Our spreadsheet should now look like the one below. If your values do not match the ones shown, go back and check your formula entries very carefully.

In order to graph our line and points together, it is convenient to add a column after B. We add this column to hold the points that will lie on our best-fit line. The experimental points are labeled y, so we'll label these curve fit points y'. To add the column for y', simply select any cell in column C, then pull down the Insert menu and select Columns. You can add this column anywhere, but it is helpful to put it next to y, as it will make graphing easier later on. (Note that Excel automatically updated the formulae in column G when they were moved to column H, so you don't have to.)

Now we will fill the column in with points that follow the line y' = mx + b, using our new values of m and b from the least squares formulae. In the new column, which is now C, enter =$H$13*A7+$H$15 in cell C7 and then Fill » Down through C15. Pay special attention to the extra "$" symbols in the cell addresses. These dollar signs keep the cell addresses fixed (absolute rather than relative references), so that when you copy the cells, the references to m and b don't get moved to new cell locations. Check the various cells in column C to observe this behavior. Your spreadsheet should now look like this.

Plotting the Results

Now we need to plot both the data and the curve fit. As a rule of presentation, experimental data are always plotted using points without lines, while curve fits and theoretical curves are plotted using lines without points. This allows one to quickly determine what type of data is presented on a graph. To graph both the measured points and the best-fit line, highlight the three columns (x, y, and y'), select Insert » Chart, and make the graph as you have before.
Make sure you select an X-Y (Scatter) plot for your chart. Once it is finished, right click on one of the data points from the best-fit line (series two, which is y') and select Format » Data Series (or double click on one of the points). A dialog box will then appear. Select a custom line style with a single straight line and no marker for the curve fit. Repeat this procedure for the experimental data, except with the reverse options. (Alternatively, you can select these options while making the graph the first time.)

The Curve Fit Plot

Your final plot should now look something like the following. You now have your original data points displayed as points and your best-fit curve displayed as what it is: a line. Compare it with our original voltage plot, where Ohm's law is plotted directly, and notice how the two lines differ slightly. The differences are due to the errors introduced in the measurement process.

Automobile Drag

The linear regression analysis only works for linear relationships. As previously mentioned, however, it is often possible to reduce a nonlinear relation to a linear one. For example, the aerodynamic drag on a car is a function of the square of the velocity:

D = f(V²)

The force required to overcome aerodynamic drag thus quadruples if the speed of the car doubles. This can be seen in the relationship shown in the spreadsheet called Automobile Drag. Instead of plotting drag as a function of velocity, we can plot it as a function of velocity squared. While drag varies nonlinearly with velocity, it varies linearly with velocity squared.

The velocity here is in m/s (meters per second) while the drag force is in N (newtons). For quick reference, 1 m/s ≈ 2 mph while 4.5 N ≈ 1 pound (force). (These are approximate.)
Changing the Relation

To change the nonlinear relation into a linear one, we have to add a new x column, which we will call x'. Technically, we are plotting drag as a function of the square of velocity. So insert a new column between columns A and B called velocity squared. We enter the formula =A6^2 and fill down to get the following.

Note that when we plot drag versus velocity squared, we have a nice linear relation, where our curve fit is given by

y' = mx' + b = m(x²) + b

Now we have to complete the linear regression analysis using our new variable.

Changing the Relation

We have two options in preparing our sums. The first is to re-enter the formulae by hand. This is time consuming, but accurate. The second is to copy and paste our formula cells from the Ohm's Law spreadsheet into the Automobile Drag spreadsheet. If you copy and paste, make sure you extend all cell ranges to the last data row and change the sum relations to use the velocity squared column B instead of the velocity column A.

The sheet above shows that we have extended some of our columns, but not all of them yet. We may also need to reformat our columns if our numbers are too large to display. In that case, we would end up with ######### symbols, as shown in the rightmost columns. Begin changing the columns one by one (using the Fill » Down command, of course), or re-enter the formulae by hand if you choose to do so.

The Final Values

When completed, your spreadsheet should look like the following. Since the relationship looks very linear, your y' values should match your y values very closely. If they do not, you have done something wrong.

Note that the y-intercept b is zero. This agrees very well with the data previously presented.
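The same linearization can be sketched in Python. The velocity and drag numbers below are hypothetical values invented for this illustration (they follow D = 0.4 V² exactly) and are not the data in the Automobile Drag worksheet. The function names are likewise our own.

```python
def least_squares_line(xs, ys):
    """Return (m, b) for the least squares line y = m*x + b."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    m = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b = (sy - m * sx) / n
    return m, b


# Hypothetical data, assumed for this sketch only.
velocity = [0.0, 5.0, 10.0, 15.0, 20.0]   # m/s
drag = [0.0, 10.0, 40.0, 90.0, 160.0]     # N

# Transform x into x' = velocity squared, like the new spreadsheet column.
velocity_squared = [v ** 2 for v in velocity]

# Fit a straight line to drag versus velocity squared.
m, b = least_squares_line(velocity_squared, drag)
print("slope m =", m, " intercept b =", b)


def predicted_drag(v):
    """Return to the original relationship: drag as a function of velocity."""
    return m * v ** 2 + b


print("predicted drag at 30 m/s:", predicted_drag(30.0), "N")
```

Because the fit is made against x' rather than x, the fitted line becomes a curve again when it is evaluated as a function of velocity, which is what the final plot of drag versus velocity shows.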
Once our values are completed, we can plot the data and return it to the original relationship.

(As an aside, the power requirements for a car can be determined from the aerodynamic drag, neglecting rolling resistance, knowing that

required power = force x velocity = drag x velocity

Thus, power is a function of velocity cubed. Determining the drag will then allow you to calculate the power required of a motor to move at a given speed. In SI units, this power is given in watts (1 W = 1 N·m/s). This can be converted to the horsepower rating used on US cars using the conversion 1 hp ≈ 745.7 W.)

Changing the Relation

The final plots should look like the one shown here. Note that the curve fit and data fall along the same line. If we were to plot the drag data and curve fit versus velocity instead of velocity squared, we would have something similar to the plot below. Now we have an exact match to our original nonlinear curve!

The final curve fit with the data is shown here. And that is all there is to linear regression and curve fitting!

As a last note, some versions of Excel do come with a linear regression package. To use it properly, however, you have to know what you are doing, which is why you have gone through these exercises. When performing curve fits in the future, you can use a pre-packaged routine or do it yourself, now that you know how easy it is.

Assignment: Complete the following problems:

1. Complete your plots and spreadsheets for the Ohm's Law and Automobile Drag tutorials. (For the latter, you only need to turn in the linear drag versus velocity squared plot.)

2. In an industrial process, the depth of the fluid in a tank of cooling liquid is measured over time.
The following depth measurements are recorded:

Time [hours]    0      1      2      3
Depth [feet]    12.8   10.0   9.2    8.4

Plot this data and perform a best fit using the method of least squares. Using this fit, predict how long it will take before the cooling fluid supply is completely depleted.

3. Your spreadsheet should also contain a worksheet with the winning times for the Olympic 200 m dash. Plot the data and determine the best fit using the method of least squares. Predict the winning time for 2004 using your best fit. How close does this come to the actual winning time? Note: In some years (1916, 1940, and 1944), the Olympics were skipped due to world political problems (i.e., war). Because the years are therefore not equally spaced, you must use an XY (Scatter) graph instead of a Line graph. Since those years are still listed in the worksheet, however, you must delete their rows for your fit to be valid.

For each problem, turn in the required plots and the spreadsheets used to create them. The assignment is due at the start of class one week from the day it is assigned.