Least-Squares Curve Fits

Curve fitting plays a large part in experimental analysis, either in the interpretation of the data or in its empirical representation. There are many ways to fit curves of various sorts to experimental data; here we concentrate on the most widely used technique: the least-squares curve fit.

By far the most common of the least-squares fitting techniques is the linear least-squares fit, or linear regression, as it is sometimes called. To introduce this technique, suppose that we vary one experimental quantity x and measure the resulting variation in another quantity y. The quantity x could be, for example, the value of a mass which we attach to a spring, and y the resulting period of oscillation of the spring-mass system. Or x could be the amount of current we supply to an electrical resistance coil within an insulated box, and y the resulting temperature within the box. In any case, suppose that we have some reason to believe that there exists (at least approximately) a linear relationship between x and y of the form

$$y = ax + b,$$

where a and b are constants to be determined. Suppose, moreover, that for n values of x, namely $x_1, x_2, \ldots, x_n$, we have measured n corresponding values of y, namely $y_1, y_2, \ldots, y_n$. Then one very simple, and very frequently used, way of determining a and b is just to plot the points $(x_i, y_i)$ on a sheet of graph paper and, by eye, draw a straight line which seems to fit the data well. The quantities a and b can then be extracted as the slope and the y-intercept of this straight line. The figure at right shows this procedure.

Although this method can often produce quite reasonable fits, it is possible to do considerably better without too much extra effort. Let

$$\hat{y}_i = a x_i + b.$$
Note that the quantity $\hat{y}_i$ is not the same as the observed, or measured, value $y_i$; rather, it is the value we would measure if the linear relation we have postulated were completely valid (which it is not; it is only approximately valid). What we want to do is determine the two constants a and b by requiring the $\hat{y}_i$ values to be as close as possible, in some sense, to the observed values $y_i$. A very common method of doing this is to seek to minimize the sum of the squares of the differences between $\hat{y}_i$ and $y_i$. In other words, we want to minimize the quantity

$$D = \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2.$$

Now we have that $\hat{y}_i = a x_i + b$, so substitution into the above equation gives

$$D = \sum_{i=1}^{n} \left( a x_i + b - y_i \right)^2.$$

We now want to pick values of a and b so as to minimize D. Hence we will require that

$$\frac{\partial D}{\partial a} = \frac{\partial D}{\partial b} = 0.$$

Thus

$$\frac{\partial D}{\partial a} = 2 \sum_{i=1}^{n} x_i \left( a x_i + b - y_i \right) = 0
\quad\Longrightarrow\quad
a \sum_{i=1}^{n} x_i^2 + b \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i, \tag{1}$$

$$\frac{\partial D}{\partial b} = 2 \sum_{i=1}^{n} \left( a x_i + b - y_i \right) = 0
\quad\Longrightarrow\quad
a \sum_{i=1}^{n} x_i + n b = \sum_{i=1}^{n} y_i. \tag{2}$$

Equations (1) and (2) above give us two linear equations for a and b whose solution gives us the best linear fit through the data, at least in a least-squares sense. It should be noted that most modern scientific calculators include, as an intrinsic function, the ability to calculate a linear least-squares fit.

This procedure can be readily extended to higher-order polynomials. For example, suppose that we want to fit a quadratic of the form $y = a x^2 + b x + c$ to n pairs of data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. We again form the quantity D as before and minimize it with respect to a, b, and c. This yields the following three linear equations for a, b, and c:

$$a \sum_{i=1}^{n} x_i^4 + b \sum_{i=1}^{n} x_i^3 + c \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i^2 y_i,$$

$$a \sum_{i=1}^{n} x_i^3 + b \sum_{i=1}^{n} x_i^2 + c \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i,$$

$$a \sum_{i=1}^{n} x_i^2 + b \sum_{i=1}^{n} x_i + n c = \sum_{i=1}^{n} y_i.$$

In general, a polynomial fit of order m will yield m + 1 linear equations for the m + 1 constants in the polynomial. We can use basically the same procedure with certain other types of curve fits.
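The two normal equations (1) and (2) can be solved directly for a and b. As a minimal sketch in Python (the handout itself assumes a scientific calculator; the function name `linear_fit` is ours, not part of the handout):

```python
def linear_fit(xs, ys):
    """Least-squares fit of y = a*x + b, solving the normal equations:
       a*sum(x_i^2) + b*sum(x_i) = sum(x_i*y_i)   (1)
       a*sum(x_i)   + n*b        = sum(y_i)       (2)
    """
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Solve the 2x2 linear system by Cramer's rule.
    det = n * sxx - sx * sx
    a = (n * sxy - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return a, b
```

For data lying exactly on a line, e.g. the points (0, 1), (1, 3), (2, 5), (3, 7), the fit recovers the slope 2 and intercept 1 exactly; for scattered data it returns the line minimizing D.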
For example, suppose that we wish to fit a power-law relation of the form $y = c x^a$, where a and c are constants to be determined, to the experimental data. Taking logs of both sides of this equation, we have

$$\ln y = a \ln x + \ln c.$$

If we now let $b = \ln c$, we have

$$\ln y = a \ln x + b.$$

Notice now that this is exactly the same as our previous linear fit, except that now the equation involves $\ln y$ instead of y and $\ln x$ instead of x. Hence the linear least-squares fitting procedure goes through exactly as before, if we only substitute the log quantities for the actual quantities themselves.

A similar trick works for curve fits of the form $y = c e^{a x}$. Again taking logs of both sides, we have

$$\ln y = a x + \ln c.$$

Setting $b = \ln c$, we have $\ln y = a x + b$, so that once again we can use a linear least-squares fit, this time between $\ln y$ and x.

It is not always obvious which type of curve will best fit the data. One useful procedure is simply to plot the data graphically on various types of graph paper: linear, semi-log, and log-log. Obviously, if the data fall fairly close to a straight line on linear paper, then a linear fit of the form $y = ax + b$ is a good choice. On the other hand, if the data fall close to a straight line on semi-log paper, then an exponential fit of the form $y = c e^{a x}$ is indicated. Finally, if the data are nearly linear on log-log paper, then a power-law fit $y = c x^a$ may be used.

Having chosen a particular functional form for the curve fit, we may now ask how good the resulting fit really is. A quantitative answer to this question is given by the correlation coefficient. Let

$$y_m = \frac{1}{n} \sum_{i=1}^{n} y_i$$

be the mean of the experimentally observed y values, and likewise let

$$\sigma_y = \sqrt{\frac{\sum_{i=1}^{n} \left( y_i - y_m \right)^2}{n - 1}}$$

be the (sample) standard deviation. Suppose again that we are fitting a linear relation of the form $y = ax + b$ to the experimental data, and let $\hat{y}_i = a x_i + b$. Then we define

$$\sigma_{y,x} = \sqrt{\frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{n - 2}}$$

and note that, for a perfect fit, $\sigma_{y,x} = 0$.
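The log-transform trick can be sketched in code as well. The function below fits $y = c x^a$ by running the ordinary linear fit on $(\ln x, \ln y)$ and then recovering $c = e^b$; the helper `linear_fit` is re-stated here so the snippet stands alone (both names are our own, not the handout's):

```python
import math

def linear_fit(xs, ys):
    # Ordinary linear least squares via the normal equations.
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    a = (n * sxy - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return a, b

def power_law_fit(xs, ys):
    """Fit y = c*x^a: linear fit of ln y against ln x, then c = e^b.
    Requires all x and y values to be positive."""
    a, b = linear_fit([math.log(x) for x in xs],
                      [math.log(y) for y in ys])
    return a, math.exp(b)   # exponent a, prefactor c
```

An exponential fit $y = c e^{a x}$ works the same way, except that only y is transformed: fit $\ln y$ against x itself.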
We now define the correlation coefficient r of the fit as

$$r = \sqrt{1 - \frac{\sigma_{y,x}^2}{\sigma_y^2}}.$$

Note that, for a perfect fit ($\sigma_{y,x} = 0$), r = 1. In general, the closer r is to 1, the better the fit. Most modern scientific calculators, in carrying out linear, power-law, or exponential curve fits, will also provide the value of the correlation coefficient.

The Experiment

We want to derive an experimental relation between the period of oscillation of a spring-mass system (the length of time that the system takes to make one complete oscillation) and the mass of the system. We will be using the simple system sketched at right: a spring of fixed spring constant k, carrying a varying mass m.

Using the masses available in class, load the system by placing one or more masses on the spring platform and set the system into motion. Using a stopwatch, measure the period of oscillation T by measuring the time required to complete a fixed number of oscillations (the more, the better), and then dividing this time by the number of oscillations. It is recommended that this measurement be done simultaneously by as many team members as possible, with the recorded value being the average of the measurements. Do this for five different mass values and record the values in the table on the lab sheet. (Note that we are not really recording mass, but rather weight in pounds; however, the two differ only by the constant factor g.)

We will investigate three possible data fits: linear ($T = aW + b$), power-law ($T = aW^b$), and exponential ($T = a e^{bW}$). To decide which of these is the most appropriate, we will carry out fits for all three functional forms using a scientific calculator and compare the correlation coefficients. The fit yielding a correlation coefficient closest to one is the best fit. Report on the data sheet the functional form yielding the best fit and the associated values of a and b.
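The correlation coefficient defined above is easy to compute once a fit (a, b) is in hand. A minimal sketch, using the handout's definitions of $\sigma_y$ and $\sigma_{y,x}$ (the function name `correlation` is ours):

```python
import math

def correlation(xs, ys, a, b):
    """Correlation coefficient r = sqrt(1 - sigma_{y,x}^2 / sigma_y^2) for
    the linear fit y = a*x + b, where
      sigma_y^2     = sum((y_i - y_mean)^2) / (n - 1)
      sigma_{y,x}^2 = sum((y_i - yhat_i)^2) / (n - 2)
    """
    n = len(ys)
    ym = sum(ys) / n
    var_y = sum((y - ym) ** 2 for y in ys) / (n - 1)
    var_fit = sum((y - (a * x + b)) ** 2
                  for x, y in zip(xs, ys)) / (n - 2)
    return math.sqrt(1 - var_fit / var_y)
```

A perfect fit gives r = 1 exactly; scattered data gives r somewhat below 1. To compare the linear, power-law, and exponential fits for the spring-mass data, one would apply this same formula in the transformed variables (e.g. ln T vs. ln W for the power law) and pick the form whose r is closest to 1.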
Then plot the best fit and the data points on the linear graph paper on the back of the data sheet. Because we are using linear paper, if you decide that a power-law fit is best, you should plot ln T vs. ln W; if you decide that an exponential fit is best, you should plot ln T vs. W.