CURVE FITTING ENGR 351 Numerical Methods for Engineers Southern Illinois University Carbondale College of Engineering Dr. L.R. Chevalier Copyright © 2000 by Lizette R. Chevalier Permission is granted to students at Southern Illinois University at Carbondale to make one copy of this material for use in the class ENGR 351, Numerical Methods for Engineers. No other permission is granted. All other rights are reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the copyright owner. Specific Growth Rate, m Applications Need to determine parameters for saturation-growth rate model to characterize microbial kinetics Food Available, S m m max S Ks S Applications T ( o C) 0 10 20 30 0 Epilimnion 5 z (m ) 10 Thermocline 15 20 25 30 Hypolimnion Applications Interpolation of data What is kinematic viscosity at 7.5º C? v, 10-2 cm 2/s 2 1.5 1 0.5 0 0 10 20 T(oC) 30 T (oC) 0 4 8 12 16 20 24 v, 10-2 (cm2/s) 1.7923 1.5615 1.3874 1.2396 1.1168 1.0105 0.9186 f(x) x We want to find the best “fit” of a curve through the data. Mathematical Background • The prerequisite mathematical background for interpolation is found in the material on the Taylor series expansion and finite divided differences • Simple statistics average standard deviation normal distribution Normal Distribution A histogram used to depict the distributions of an exam grade. x x 2 95% x 68% Material to be Covered in Curve Fitting • Linear Regression Polynomial Regression Multiple Regression General linear least squares Nonlinear regression • Interpolation Newton’s Polynomial Lagrange polynomial Coefficients of polynomials Specific Study Objectives • Understand the fundamental difference between regression and interpolation and realize why confusing the two could lead to serious problems • Understand the derivation of linear least squares regression and be able to assess the reliability of the fit using graphical and quantitative assessments. Specific Study Objectives • Know how to linearize data by transformation • Understand situations where polynomial, multiple and nonlinear regression are appropriate • Understand the general matrix formulation of linear least squares • Understand that there is one and only one polynomial of degree n or less that passes exactly through n+1 points Specific Study Objectives • Realize that more accurate results are obtained if data used for interpolation is centered around and close to the unknown point • Recognize the liabilities and risks associated with extrapolation • Understand why spline functions have utility for data with local areas of abrupt change Least Squares Regression • Simplest is fitting a straight line to a set of paired observations (x1,y1), (x2, y2).....(xn, yn) • The resulting mathematical expression is y = ao + a1x + e • We will consider the error introduced at each data point to develop a strategy for determining the “best fit” equations n n i 1 i 1 Sr e2i yi a o a1xi 2 f(x) yi a o a1xi x n n i 1 i 1 Sr e2i yi a o a1xi 2 To determine the values for ao and a1, differentiate with respect to each coefficient Sr 2 yi ao a1xi ao Sr 2 yi ao a1xi xi a1 Note: we have simplified the summation symbols. What mathematics technique will minimize Sr? Sr 2 yi ao a1xi ao Sr 2 yi ao a1xi xi a1 Setting the derivative equal to zero will minimizing Sr. If this is done, the equations can be expressed as: 0 yi ao a1xi 0 yi xi ao xi a1xi2 0 yi ao a1xi 0 yi xi ao xi a1xi2 Note: a o nao We have two simultaneous equations, with two unknowns, ao and a1. What are these equations? (hint: only place terms with ao and a1 on the LHS of the equations) What are the final equations for ao and a1? nao xi a1 yi xa x a x y i a1 2 i 1 o i i n xi yi xi y i n x xi ao y a1x 2 i 2 These first two equations are called the normal equations Error Recall: n n Sr e i yi a o a1xi 2 i 1 2 f(x) i 1 x The most common measure of the “spread” of a sample is the standard deviation about the mean: St yi y 2 St sy n1 Introduce a term to measure the standard error of the estimate: sy x Sr n2 Coefficient of determination r2: St Sr r St 2 r is the correlation coefficient St Sr r St 2 The following signifies that the line explains 100 percent of the variability of the data: Sr = 0 r = r2 = 1 If r = r2 = 0, then Sr = St and the fit is invalid. Consider the following four sets of data Data 1 10 8.04 8 6.95 13 7.58 9 8.81 11 8.33 14 9.96 6 7.24 4 4.26 12 10.84 7 4.82 5 5.68 Data 2 10 9.14 8 8.14 13 8.74 9 8.77 11 9.26 14 8.10 6 6.13 4 3.10 12 9.13 7 7.26 5 4.74 Data 3 10 7.46 8 6.77 13 12.74 9 7.11 11 7.81 14 8.84 6 6.08 4 5.39 12 8.15 7 6.42 5 5.73 Data 4 8 6.58 8 5.76 8 7.71 8 8.84 8 8.47 8 7.04 8 5.25 19 12.50 8 5.56 8 7.91 8 6.89 12 12 10 10 8 8 y 14 y 14 6 6 4 4 y = 0.5001x + 3.0001 R2 = 0.6665 2 y = 0.5x + 3.0009 2 0 R2 = 0.6662 0 0 5 10 15 0 5 x 10 15 x 14 14 12 12 10 10 8 y y 8 6 6 4 4 y = 0.4997x + 3.0025 2 2 y = 0.4999x + 3.0017 R2 = 0.6667 2 R = 0.6663 0 0 0 5 10 x 15 0 5 10 x 15 20 Linearization of non-linear relationships Some data is simply illsuited for linear least squares regression.... or so it appears. f(x) x P EXPONENTIAL EQUATIONS Linearize P Po e rt t ln P intercept = ln P0 slope = r why? t P P0e rt ln P ln P0e rt ln P0 lne rt ln P0 rt lnP intercept = ln Po Can you see the similarity with the equation for a line: y = b + mx where b is the y-intercept and m is the slope? slope = r t P P0e rt ln P ln P0e rt ln P0 lne rt ln P0 rt ln P0 intercept = ln P0 After taking the natural log of the y-data, perform linear regression. From this regression: The value of b will give us ln (P0). Hence, P0 = eb The value of m will give us r directly. slope = r t Q POWER EQUATIONS Q cH a (Flow over a weir) log Q H log H Here we linearize the equation by taking the log of H and Q data. What is the resulting intercept and slope? Q cH a log Q logcH a log c log H a log c a log H log Q slope = a log H intercept = log c Q cH a log Q logcH a So how do we get c and a from performing regression on the log H vs log Q data? From : y = mx + b log c log H a log c a log H log Q b = log c c = 10b slope = a log H intercept = log c m=a SATURATION-GROWTH RATE EQUATION m m m max S 1/m slope = Ks/mmax intercept = 1/mmax 1/ S S Ks S Here, m is the growth rate of a microbial population, mmax is the maximum growth rate, S is the substrate or food concentration, Ks is the substrate concentration at a value of m = mmax/2 General Comments of Linear Regression • You should be cognizant of the fact that there are theoretical aspects of regression that are of practical importance but are beyond the scope of this book • Statistical assumptions are inherent in the linear least squares procedure General Comments of Linear Regression • x has a fixed value; it is not random and is measured without error • The y values are independent random variable and all have the same variance • The y values for a given x must be normally distributed General Comments of Linear Regression y-direction • The regression of y versus x is not the same as x versus y • The error of y versus x is not the same as x versus y f(x) x-direction x Polynomial Regression • One of the reasons you were presented with the theory behind linear regression was to allow you the insight behind similar procedures for higher order polynomials • y = a0 + a 1x • mth - degree polynomial y = a0 + a1x + a2x2 +....amxm + e Based on the sum of the squares of the residuals Sr yi ao a1xi a x ...... a x 2 2 i m 2 m i 1. Take the derivative of the above equation with respect to each of the unknown coefficients: i.e. the partial with respect to a2 Sr 2 2 xi yi ao a1xi a2 xi2 ..... amxim a2 2. These equations are set to zero to minimize Sr., i.e. minimize the error. 3. Set all unknowns values on the LHS of the equation. Again, using the partial of Sr. wrt a2 ao xi a1 xi3 a2 xi4 ..... am xi 2 m 2 xi yi 2 4. This set of normal equations result in m+1 simultaneous equations which can be solved using matrix methods to determine a0, a1, a2......am Multiple Linear Regression • A useful extension of linear regression is the case where y is a linear function of two or more variables y = ao + a1x1 + a2x2 • We follow the same procedure y = ao + a1x1 + a2x2 + e Multiple Linear Regression For two variables, we would solve a 3 x 3 matrix in the following form: n x1i x2 i x x x x 1i 2 1i 1i 2 i x x x x a 0 yi 1i 2 i a1 x1i yi 2 x y 2 i a 2 2 i i 2i [A] and {c}are clearly based on data given for x1, x2 and y to solve for the unknowns in {x}. Interpolation • General formula for an n-th order polynomial y = a0 + a1x + a2x2 +....amxm • For m+1 data points, there is one, and only one polynomial of order m or less that passes through all points • Example: y = a0 + a1x – fits between 2 points – 1st order Interpolation • We will explore two mathematical methods well suited for computer implementation • Newton’s Divided Difference Interpolating Polynomials • Lagrange Interpolating Polynomial Newton’s Divided Difference Interpolating Polynomials • • • • Linear Interpolation Quadratic Interpolation General Form Errors Linear Interpolation 3 Temperature, C 0 Density, kg/m 5 1000.0 10 999.7 15 999.1 20 998.2 999.9 How would you approach estimating the density at 17 C? Temperature, C Density, kg/m 0 999.9 5 1000.0 10 999.7 15 999.1 20 998.2 3 ??? 999.1 > r > 998.2 998.2 999.1 r T 15 20 998.2 999.1 r T 15 998 .2 999 .1 r 20 15 T 20 Assume a straight line between the known data. Then calculate the slope. 998.2 999.1 r 15 r 999.1 r 17 15 T T 17 20 Assuming this linear relationship is constant, the slope is the same between the unknown point and a known point. 998.2 999.1 r 15 T 17 20 998.2 999.1 r 999.1 r 20 15 17 15 T Therefore, the slope of one interval will equal the slope of the other interval. Solve for r 998.2 999 .1 998.2 r 20 15 20 17 f x1 f x0 f1 x f xo x x0 x1 x0 Note: The notation f1(x) designates that this is a first order interpolating polynomial true solution f(x) 1 2 smaller intervals provide a better estimate x true solution f(x) 1 2 Alternative approach would be to include a third point and estimate f(x) from a 2nd order polynomial. x true solution f(x) Alternative approach would be to include a third point and estimate f(x) from a 2nd order polynomial. x Quadratic Interpolation f 2 x b0 b1 x x0 b2 x x0 x x1 Prove that this a 2nd order polynomial of the form: f x a0 a1x a2 x 2 First, multiply the terms f 2 x b0 b1 x x0 b2 x x0 x x1 f 2 x b0 b1x b1x0 b2 x 2 b2 x0 x1 b2 xx0 b2 xx1 Collect terms and recognize that: a0 b0 b1 x0 b2 x0 x1 a1 b1 b2 x0 b2 x1 a2 b2 f x a0 a1x a2 x 2 x2, f(x2) x, f(x) Procedure for Quadratic Interpolation f(x) x1, f(x1) x0, f(x0) b0 f x0 x f x1 f x0 b1 x1 x0 f x2 f x1 f x1 f x0 x2 x1 x1 x0 b2 x2 x0 Procedure for Quadratic Interpolation b0 f x0 f x1 f x0 b1 x1 x0 f x2 f x1 f x1 f x0 x2 x1 x1 x0 b2 x2 x0 Procedure for Quadratic Interpolation b0 f x0 f x1 f x0 b1 x1 x0 f x2 f x1 f x1 f x0 x2 x1 x1 x0 b2 x2 x0 f 2 x b0 b1 x x0 b2 x x0 x x1 Example Include 10 degrees in your calculation of the density at 17 degrees. 1000 Density 999.5 999 998.5 998 0 5 10 Tem p 15 20 Temperature, C Density, kg/m 0 999.9 5 1000.0 10 999.7 15 999.1 20 998.2 3 General Form of Newton’s Interpolating Polynomials for the nth-order polynomial f n x b0 b1 x x0 ....bn x x0 x x1 x xn 1 To establish a methodical approach to a solution define the first finite divided difference as: f xi , x j f xi f x j xi x j f xi , x j f xi f x j xi x j if we let i=1 and j=0 then this is b1 f x1 f x0 b1 x1 x0 Similarly, we can define the second finite divided difference, which expresses both b2 and the difference of the first two divided difference Similarly, we can define the second finite divided difference, which expresses both b2 and the difference of the first two divided difference f x2 f x1 f x1 f x0 x2 x1 x1 x0 b2 x2 x0 f xi , x j , xk f xi , x j f x j , xk xi xk Following the same scheme, the third divided difference is the difference of two second finite divided difference. This leads to a scheme that can easily lead to the use of spreadsheets i xi f(xi) first second third 0 x0 f(x0) f[x1,x0] f[x2,x1,x0] f[x3,x2,x1,x0] 1 x1 f(x1) f[x2,x1] f[x3,x2,x0] 2 x2 f(x2) f[x2,x3] 3 x3 f(x3) f n x b0 b1 x x0 ....bn x x0 x x1 x xn 1 These difference can be used to evaluate the b-coefficients. The result is the following interpolation polynomial called the Newton’s Divided Difference Interpolating Polynomial f n x f x0 f x1 , x0 x x0 .... f xn , xn 1 , , x0 x x0 x x1 x xn 1 To determine the error we need an extra point. The error would follow a relationship analogous to the error in the Taylor Series. Lagrange Interpolating Polynomial n f n x Li x f xi i0 n Li x j0 ji x xj xi x j where P designates the “product of” The linear version of this expression is at n=1 Linear version: n=1 n f n x Li x f xi i 0 n Li x j 0 j i x xj xi x j x x1 x x0 f1 f x0 f x1 x0 x1 x1 x0 Your text shows you how to do n=2 (second order). What would third order be? n f n x Li x f xi i 0 n Li x j 0 j i x xj xi x j x x1 x x2 x x3 f3 f x0 x0 x1 x0 x2 x0 x3 ....... n f n x Li x f xi i 0 n Li x j 0 j i x xj xi x j x x1 x x2 x x3 f3 f x0 x0 x1 x0 x2 x0 x3 x x0 x x2 x x3 f x1 x1 x0 x1 x2 x1 x3 ....... Note: x1 is not being subtracted from the constant term x or xi = x1 in the numerator or the denominator j= 1 n f n x Li x f xi i 0 n Li x j 0 j i x xj xi x j x x1 x x2 x x3 f3 f x0 x0 x1 x0 x2 x0 x3 x x0 x x2 x x3 f x1 x1 x0 x1 x2 x1 x3 x x0 x x1 x x3 f x2 x2 x0 x2 x1 x2 x3 ...... Note: x2 is not being subtracted from the constant term x or xi = x2 in the numerator or the denominator j= 2 n f n x Li x f xi i 0 n Li x j 0 j i x xj xi x j x x1 x x2 x x3 f3 f x0 x0 x1 x0 x2 x0 x3 x x0 x x2 x x3 f x1 x1 x0 x1 x2 x1 x3 x x0 x x1 x x3 f x2 x2 x0 x2 x1 x2 x3 x x0 x x1 x x2 f x3 x3 x0 x3 x1 x3 x2 Note: x3 is not being subtracted from the constant term x or xi = x3 in the numerator or the denominator j= 3 Example Determine the density at 17 degrees. 1000 Density 999.5 999 998.5 998 0 5 10 Tem p 15 20 Temperature, C Density, kg/m 0 999.9 5 1000.0 10 999.7 15 999.1 20 998.2 3 f 2 17 119.964 839.244 279.496 998.776 f 2 17 998.776 f 1 17 998.74 Using Newton’s Interpolating Polynomial In fact, you can derive Lagrange directly from Newton’s Interpolating Polynomial Coefficients of an Interpolating Polynomial f n x b0 b1 x x0 ....bn x x0 x x1 x xn 1 y = a0 + a1x + a2x2 +....amxm HOW CAN WE BE MORE STRAIGHT FORWARD IN GETTING VALUES? f x0 a 0 a1x0 a 2 x02 f x1 a 0 a1x1 a 2 x12 f x2 a 0 a1x2 a 2 x32 This is a 2nd order polynomial. We need three data points. Plug the value of xi and f(xi) directly into equations. This gives three simultaneous equations to solve for a0 , a1 , and a2 Example Determine the density at 17 degrees. 1000 Density 999.5 999 998.5 998 0 5 10 Tem p 15 20 Temperature, C Density, kg/m 0 999.9 5 1000.0 10 999.7 15 999.1 20 998.2 3 Spline Interpolation • Our previous approach was to derive an nth order polynomial for n+1 data points. • An alternative approach is to apply lowerorder polynomials to subset of data points • Such connecting polynomials are called spline functions • Adaptation of drafting techniques Spline interpolation is an adaptation of the drafting technique of using a spline to draw smooth curves through a series of points Linear Splines f x f x0 m0 x x0 f x f x1 m1 x x1 x0 x x1 x1 x x2 f x f xn 1 mn 1 x xn 1 where f xi 1 f xi mi xi 1 xi xn 1 x xn Quadratic Spline a1 x 2 b1 x c1 a2 x 2 b2 x c2 a4 x 2 b4 x c4 a3 x 2 b3 x c3 Example A well pumping at 250 gallons per minute has observation wells located at 15, 42, 128, 317 and 433 ft away along a straight line from the well. After three hours of pumping, the following drawdowns in the five wells were observed: 14.6, 10.7, 4.8 1.7 and 0.3 ft respectively. Derive equations of each quadratic spline. 16 14 Drawdown 12 10 8 6 4 2 0 0 100 200 300 Distance from we ll 400 500 Splines • To ensure that the mth derivatives are continuous at the “knots”, a spline of at least m+1 order must be used • 3rd order polynomials or cubic splines that ensure continuous first and second derivatives are most frequently used in practice • Although third and higher derivatives may be discontinuous when using cubic splines, they usually cannot be detected visually and consequently are ignored. Splines • The derivation of cubic splines is somewhat involved • First illustrate the concepts of spline interpolation using second order polynomials. • These “quadratic splines” have continuous first derivatives at the “knots” • Note: This does not ensure equal second derivatives at the “knots” Quadratic Spline 1.The function must be equal at the interior knots. This condition can be represented as: ai 1xi21 bi 1xi 1 ci 1 f xi 1 note: we are referencing the same x and f(x) ai xi21 bi xi 1 ci f xi 1 ai 1xi2 1 bi 1xi 1 ci 1 f xi 1 ai xi2 1 bi xi 1 ci f xi 1 This occurs between i = 2, n Using the interior knots (n-1) this will provide 2n -2 equations. 2. The first and last functions must pass through the end points. This will add two more equations. a1x02 b1x0 c1 f x0 a n xn2 bn xn cn f xn We now have 2n - 2 +2 = 2n equations. How many do we need? 3. The first derivative at the interior knots must be equal. This provides another n-1 equations for 2n + n-1 =3n -1. We need 3n 2ai 1xi 1 bi 1 2ai xi 1 b 4. Unless we have some additional information regarding the functions or their derivatives, we must make an arbitrary choice in order to successfully compute the constants. 5. Assume the second derivative is zero at the first point. The visual interpretation of this condition is that the first two points will be connected by a straight line. a1 = 0 Cubic Splines • Third order polynomial • Need n+1 = 3+1 = 4 intervals • Consequently there are 4n unknown constants to evaluate • What are these equations? f i x ai x 3 bi x 2 ci x di Cubic Splines • The function values must be equal at the interior knots (2n -2) • The first and last functions must pass through the end points (2) • The first derivatives at the interior knots must be equal (n-1) • The second derivatives at the interior knots must be equal (n-1) • The second derivative at the end knots are zero (2) Cubic Splines • The function values must be equal at the interior knots (2n -2) • The first and last functions must pass through the end points (2) • The first derivatives at the interior knots must be equal (n-1) • The second derivatives at the interior knots must be equal (n-1) • The second derivative at the end knots are zero (2) SPECIAL NOTE On the surface it may appear that a third order approximation using splines would be inferior to higher order polynomials. Consider a situation where a spline may perform better: A generally smooth function undergoes an abrupt change in a region of interest. The abrupt change induces oscillations in interpolating polynomials. In contrast, the cubic spline provides a much more acceptable approximation Previous Exam Question Given the following data, develop the simultaneous equations for a quadratic spline. Express your final answers in matrix form. 7.00 f(x) 0.50 4.60 1.50 3.00 6.00 5.00 4.00 f(x) x 1 4 6 7 3.00 2.00 1.00 0.00 0 1 2 3 4 x 5 6 7 8 (4, 4.6) 7.00 Interior knots: 16a1 + 4b1 + c1 = 4.6 16a2 + 4b2 + c2 = 4.6 6.00 5.00 f(x) 4.00 3.00 2.00 1.00 0.00 0 1 2 3 4 5 6 7 8 36a2 + 6b2 + c2 = 1.5 36a3 + 6b3 + c3 = 1.5 x x 1 4 6 7 f(x) 0.50 4.60 1.50 3.00 End conditions a1 + b1 + c1 = 0.5 49a3 + 7b3 + c3 = 3.0 First derivative cot. at interior knots 8a1 + b1 = 8a2 + b2 12a2 + b2 = 12a3 + b3 Extra equation a1 =0 Interior knots: 16a1 + 4b1 + c1 = 4.6 36a2 + 6b2 + c2 = 1.5 4 0 0 0 1 0 1 0 1 End conditions a1 + b1 + c1 = 0.5 49a3 + 7b3 + c3 = 3.0 First derivative cont. at interior knots 8a1 + b1 = 8a2 + b2 12a2 + b2 = 12a3 + b3 Extra equation a1 =0 16a2 + 4b2 + c2 = 4.6 36a3 + 6b3 + c3 = 1.5 0 0 0 0 16 4 1 0 36 0 0 6 0 1 0 1 0 0 0 0 0 0 0 0 8 1 0 0 12 1 0 0 b1 4.6 0 0 0 c1 4.6 0 0 0 a2 15 . 36 6 1 b2 15 . 0 0 0 c2 0.5 49 7 1 a3 3.0 0 0 0 b3 0 12 1 0 c3 0 0 0