Chabot Engineering - Courtesy of Prof. E. Allen • SJSU CoE MATE-153 LabNotes Rev. Sp07

Chapter 5
Workshop on Fitting of Linear Data
(Contributed by E.L. Allen, SJSU)

5.0 Learning Objectives
After successfully completing this laboratory workshop, including the assigned reading, the lab report sheets, the lab quizzes, and any required reports, the student will be able to:
1. Distinguish between dependent and independent variables.
2. Properly display x-y data in a scatter plot with appropriate scaling and axes, both manually and using an Excel spreadsheet with its graphing functions.
3. Recognize from a scatter plot whether data is linear, has some other functional dependence, or has no functional dependence between x and y.
4. Perform a linear regression analysis of experimental data, both manually and using an Excel spreadsheet or the linear regression commands in MATLAB (e.g., the Basic Fitting Interface).
5. Correctly display a fitted line and its equation, and demonstrate understanding of the relationship between the fitting parameters and the experimental data.
6. Demonstrate understanding of the distinction between raw data and experimental results.

5.1 References
1. C. Chatfield, Statistics for Technology, Chapman and Hall, London, 1983.
2. W.J. Palm, Introduction to MATLAB 7 for Engineers, McGraw-Hill, 2005, pp. 312-315.

5.2 Linear Regression
Linear regression is a technique for finding the linear relationship between the x and y values of experimental data. It assumes that there is a linear relationship between x and y. This may not always be the case: there may be a linear relationship between x and a function of y, there may be a non-linear relationship, or there may be no relationship at all between the data. We only apply linear regression to data when we have reason to think that there is a linear relationship. It is necessary to first identify which is the independent variable and which is the dependent variable. Here is a method to follow.
Many of you have linear regression programs on your engineering calculators; linear regression is also built into Excel and MATLAB. However, to understand the meaning and reliability of fitted data, it is best to perform a manual linear regression at least once in your life. In the exercise that follows, you may use your calculator for adding columns of data only, not for fitting. When you have finished the linear regression exercise, you may use your calculator's program to check your result; then you can plot the data in Excel or MATLAB and compare your result with the one obtained by Excel's or MATLAB's algorithm. More on this technique, and other statistical methods, can be found in the references.

Before beginning any calculations for a linear regression analysis, consider the following points:
1. Identify x and y, that is, the independent and dependent variables in the experiment.
2. Make a scatter diagram, i.e. plot the data on an x-y coordinate system, and see if it is roughly linear.
3. If it isn't, think about whether you should plot some f(y) such as $\log y$ or $y^2$; this generally requires some notion of what the experiment is about and what relationship is expected in the data. If you need to transform the x or y data, do so, and create a new set of data (cf. ENGR25).
4. You now have n pairs of data, which we shall refer to as $(x_i, y_i)$. The variable $x_i$ is referred to as the control or independent variable; $y_i$ is referred to as the response or dependent variable.
5. Assume y is subject to scatter (i.e. random errors) and that x is not.
6. Assume a linear relationship exists, as $y = a_0 + a_1 x$ (equivalent to $y = mx + b$, with $m = a_1$ and $b = a_0$).
7. Our objective is to find estimates for $a_0$ and $a_1$ such that the line gives a "good fit". The method of least squares is one way to do this.
In general, the values of $a_0$ and $a_1$ are interpreted to have some fundamental physical meaning. This is one way of determining the value of physical constants or materials properties. The values so obtained are only as reliable as the measurements from which they are derived.
8. The assumption of a linear fit in the data precludes the possibility that there is a more complex relationship between them, or that the relationship is only linear over a limited range of the independent variable, or that there are other factors which influence the data which we have not explicitly considered. All of these possibilities must be considered in interpreting the data.

Now that your data is ready for a linear regression, you are ready to begin. The following steps provide a guide to performing a linear regression analysis:

1. Consider that at $x_i$, the predicted value of y is:
$$y_p = a_0 + a_1 x_i$$
2. The difference between the observed y-value ($y_i$) and the predicted value ($y_p$) is
$$e_i = y_i - (a_0 + a_1 x_i)$$
where $e_i$ is the deviation or residual. There is a value of $e_i$ for each data pair.
3. Choose values of $a_0$ and $a_1$ that minimize the sum of the squares of the $e_i$ over all data pairs. We will call this sum of squares S:
$$S = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left[ y_i - (a_0 + a_1 x_i) \right]^2, \qquad S = f(a_0, a_1)$$
4. To minimize S, we need to find the partial derivatives $\partial S/\partial a_0$ and $\partial S/\partial a_1$, set them equal to zero, and solve the resulting simultaneous equations for the least squares estimates $\hat{a}_0$ and $\hat{a}_1$. In what follows, each sum is assumed to run from i=1 to i=n, where n is the total number of data pairs.
$$\frac{\partial S}{\partial a_0} = \sum 2 \left[ y_i - (a_0 + a_1 x_i) \right](-1)$$
$$\frac{\partial S}{\partial a_1} = \sum 2 \left[ y_i - (a_0 + a_1 x_i) \right](-x_i)$$
5. Next set the two partial derivatives equal to zero, and solve for the values of $\hat{a}_0$ and $\hat{a}_1$:
$$\frac{\partial S}{\partial a_0} = 0 = \sum 2 \left[ y_i - \hat{a}_0 - \hat{a}_1 x_i \right](-1)$$
$$0 = -\sum 2 y_i + \sum 2 \hat{a}_0 + \sum 2 \hat{a}_1 x_i$$
$$\sum y_i = n \hat{a}_0 + \hat{a}_1 \sum x_i$$
and similarly, for the second partial:
$$\frac{\partial S}{\partial a_1} = 0 = \sum 2 \left[ y_i - \hat{a}_0 - \hat{a}_1 x_i \right](-x_i)$$
$$0 = -\sum 2 x_i y_i + \sum 2 \hat{a}_0 x_i + \sum 2 \hat{a}_1 x_i^2$$
$$\sum x_i y_i = \hat{a}_0 \sum x_i + \hat{a}_1 \sum x_i^2$$
6. We now have two normal equations in $\hat{a}_0$ and $\hat{a}_1$ which need to be solved simultaneously:
$$n \hat{a}_0 + \hat{a}_1 \sum x_i = \sum y_i$$
$$\hat{a}_0 \sum x_i + \hat{a}_1 \sum x_i^2 = \sum x_i y_i$$
To solve the normal equations, write them as a general pair of linear equations. Let $x = \hat{a}_0$ and $y = \hat{a}_1$ (here x and y stand for the two unknowns, not the data), and let a through f be the coefficients, so that $a = n$, $b = d = \sum x_i$, $c = \sum y_i$, $e = \sum x_i^2$ and $f = \sum x_i y_i$:
$$ax + by = c$$
$$dx + ey = f$$
Solving each equation for x, we get:
$$x = \frac{c - by}{a} \quad \text{and} \quad x = \frac{f - ey}{d}$$
Equating the two expressions,
$$\frac{c - by}{a} = \frac{f - ey}{d}$$
and by rearranging, we eventually get:
$$y = \hat{a}_1 = \frac{dc - af}{db - ae}, \quad \text{or} \quad \hat{a}_1 = \frac{\sum x_i \sum y_i - n \sum x_i y_i}{\left( \sum x_i \right)^2 - n \sum x_i^2}$$
where $\hat{a}_1$ is the slope of the least squares fit.
7. Doing a similar calculation for the intercept, we find:
$$\hat{a}_0 = \frac{\sum y_i - \hat{a}_1 \sum x_i}{n}$$
8. To find the linear fit, then, we simply calculate $\hat{a}_0$ and $\hat{a}_1$ from the sums over our data set. Then plot the line and confirm that there is a reasonable fit to the data.

This explanation does not include further statistical analysis, such as how good the linear fit is or what the confidence intervals are. These should be investigated further in other courses.
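The formulas in steps 6 and 7 can be sketched in a few lines of code. The block below is a minimal illustration in Python, not part of the original lab procedure (which uses a calculator, Excel, or MATLAB); the function name `least_squares_line` and the four data points are made up for the example, and the points were chosen to lie exactly on y = 1 + 2x so the result is easy to check by hand.

```python
def least_squares_line(xs, ys):
    """Return (a0, a1) for the least-squares line y = a0 + a1*x.

    Uses the closed-form solution of the two normal equations.
    The function name is hypothetical, chosen for this sketch.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    # The four sums that appear in the normal equations.
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # Denominator of the slope formula: (sum x)^2 - n * sum x^2.
    denom = sum_x ** 2 - n * sum_xx
    if denom == 0:
        raise ValueError("all x values are identical; slope is undefined")

    a1 = (sum_x * sum_y - n * sum_xy) / denom  # slope (step 6)
    a0 = (sum_y - a1 * sum_x) / n              # intercept (step 7)
    return a0, a1


# Made-up illustration data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a0, a1 = least_squares_line(xs, ys)
print(a0, a1)  # prints 1.0 2.0
```

With real, scattered data the same sums are the column totals you would tabulate by hand, so a sketch like this is best used only to check a manual calculation after it is finished.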

LAB REPORT SHEET 5.1
Determining a Linear Relationship

Key Member (Encourage all members of the team to participate, ensure that everyone understands the material, and organize the tasks and divide them between the team members): _______________
Graphics Analyst (Generate the needed plots and figures.): _______________
Other Group Members: _______________

Consider the following sets of experimental data. In each case, determine:
• Whether there appears to be any functional relationship at all among the data.
• If there is a relationship, which is the independent variable and which is the dependent variable.
• Whether there is a linear relationship between the data, and why you might expect one.
• If the relationship is not linear, whether there is a way to transform the data so that it is linear.

1. You have data from a meteorologist on temperature for each month of the year, measured in one location using the same thermometer, on three different days each month. Data is in degrees Fahrenheit.

Month   Day 1   Day 2   Day 3
Jan     23      34      13
Feb     12      45      32
Mar     14      32      33
Apr     38      50      51
May     50      62      65
June    70      75      79
July    80      82      95
Aug     88      83      75
Sep     84      75      90
Oct     78      65      50
Nov     50      45      32
Dec     32      30      28

2. You have data on the electrical resistance of an aluminum wire at various temperatures:

Temp (°C)   Resistance (ohms)
20          1.12E-07
30          1.16E-07
40          1.21E-07
50          1.25E-07
60          1.29E-07
70          1.34E-07
80          1.38E-07
90          1.43E-07
100         1.47E-07
110         1.51E-07
120         1.56E-07
130         1.60E-07
140         1.64E-07
150         1.69E-07

3. You have data on the impact energy of a bullet, measured at various take-off velocities. Answer the questions above and determine the mass of the bullet.

Velocity (m/s)   Energy (J)
50               1.25E+02
55               1.51E+02
60               1.80E+02
65               2.11E+02
70               2.45E+02
75               2.81E+02
80               3.20E+02
85               3.61E+02
90               4.05E+02
95               4.51E+02
100              5.00E+02

LAB REPORT SHEET 5.2
Performing a Linear Regression

Key Member (Encourage all members of the team to participate, ensure that everyone understands the material, and organize the tasks and divide them between the team members): _______________
Calculations Expert (Work with your calculator to help your team solve the problems.): _______________
Graphics Analyst (Generate the needed plots and figures.): _______________
Other Group Members: _______________

Exercise 1
The following data was recorded in an experiment which measured the variation of the specific heat of a chemical with temperature. In this temperature regime, it was expected that the specific heat (Cp) should depend linearly on the absolute temperature, T. Two measurements were made at each temperature.

Temperature (°C)   Specific Heat (J/mol·°C)
                   Measurement 1   Measurement 2
50                 1.6             1.64
60                 1.63            1.65
70                 1.67            1.68
80                 1.7             1.72
90                 1.71            1.72
100                1.71            1.74

1. Note that the two measurements made at the same temperature can be considered independent measurements, so you have 12 pieces of data.
2. Plot the data on a scatter diagram and determine whether a linear relationship exists.
3. Fit a straight line to the data by eye; find the slope and intercept of this line; write an equation for this line.
4. Estimate the specific heat of this chemical when the temperature is 75 °C.

Exercise 2
1. Perform a linear regression analysis on the data of Exercise 1. Using the regression values for the slope and the intercept, add the fitted line to your plot from Exercise 1, using a different color pencil from the one used for the line fit to the data by eye.
2. Estimate the specific heat of this chemical when the temperature is 75 °C.
3. What is the percentage difference between the estimated values for 75 °C found in Exercise 1 and Exercise 2? Which value is more accurate? Why?

ENGR45 Chapter 5: Linear Regression Ch5_LinearFit_RevS07_ENGR45.pdf