REGRESSION OF NON-LINEAR FUNCTIONS: Not all analytical methods yield straight-line plots. Regression analysis can and should still be performed on curvilinear functions as well as straight-line functions. There are basically two ways to do this.

1. Mathematically convert the data so that it plots as a straight line, then perform linear regression as we have learned. For example, plot [x^2 vs. y], [log x vs. y], [log x vs. log y], etc. instead of simply plotting [x vs. y]. Often a curved plot can be rendered linear by one of these operations.

2. Excel (and most scientific calculators) can perform non-linear regression directly on non-linear data. This route is usually easier.

LINEAR REGRESSION OF NON-LINEAR DATA (after converting the data): We will consider one example here, a multipoint calibration with specific ion electrodes. These instruments can accurately measure analyte concentrations over wide ranges. For example, a single calibration curve can range from 1 ppm to 10,000 ppm analyte (a factor of 10^4)!

The millivolt reading is seen to vary linearly with log10 conc., rising with concentration for cations and falling for anions. According to the Nernst equation, electrodes for univalent ions (e.g., F-, Na+) display a 59.2 mV change at 25°C for each decade (10-fold) change in analyte concentration. Divalent ion electrodes (e.g., Ca+2, SO4-2) display half this response (29.6 mV per decade). Traditionally, a linear graph is obtained by plotting concentration on the logarithmic axis of semi-log graph paper and mV on the linear axis. The plot is linear over most of the range cited above.

We can linearize the data for linear regression by using log10 conc. rather than conc. itself. Linear regression can then be used to solve for the concentrations of unknown samples from their measured mV readings. In industrial laboratories, graph paper plots, regression calculators, and computer programs are all used.
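As a quick check on the Nernst slopes quoted above, the following Python sketch computes the theoretical response per decade, 2.303RT/(zF). The function name nernst_slope_mv and the use of Python are illustrative assumptions and not part of the course materials; R and F are the standard physical constants.

```python
import math

R = 8.314462618      # gas constant, J/(mol K)
F = 96485.33212      # Faraday constant, C/mol

def nernst_slope_mv(charge, temp_c=25.0):
    """Theoretical electrode response in mV per 10-fold concentration change.

    The sign follows the ion charge: positive for cations, negative for anions.
    (Hypothetical helper, written for illustration only.)
    """
    temp_k = temp_c + 273.15
    return 1000.0 * math.log(10) * R * temp_k / (charge * F)

print(round(nernst_slope_mv(+1), 1))   # univalent cation, e.g. Na+
print(round(nernst_slope_mv(-1), 1))   # univalent anion, e.g. F-
print(round(nernst_slope_mv(+2), 1))   # divalent cation, e.g. Ca+2
```

Real electrodes rarely show exactly the theoretical slope, which is one reason the calibration slope is measured by regression instead of being assumed.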
Graphs are plotted at regular intervals and kept as proof of calibration. For everyday analysis, regression analysis is commonly used.

Exercise: Using an NH3 gas-sensing electrode, the following calibration data were obtained:

NH3 as ppm N    log10 N    reading (mV)
0.1                        +95
1.0                        +38
10.0                       -20
100                        -78

Complete the middle column and enter the data on an Excel spreadsheet. Note that the independent variable (x) is log10 (ppm N) and the dependent variable (y) is mV. Use the Trend function to determine the conc. (ppm N) of samples. You must enter the log conc. (log x) values, not the conc. (x) values, or the results will be incorrect. The Trend function returns predicted log conc. values. To get sample concentrations, take the antilog of these values, i.e., 10^(log x). Be sure to include at least 2 measured sets of (log x, y) data and treat them as if they were sample data. The estimated conc. values determined from these must be the same as, or very close to, the original data. If they are not, check your work for an error.

CH 503 NONLINEAR REGRESSION PAGE 1

NH3 Calib. Data

meas. x (ppm N)    log x (log ppm N)    meas. y (mV)
0.1                -1                   95
1                  0                    38
10                 1                    -20
100                2                    -78
1000               3                    -135

LINEST RESULTS

                      m           b
parameters            -57.60      37.6
std. dev's            0.115       0.20
R2, SE(y)             0.999988    0.37
F, df                 248832      3
ss(reg), ss(resid)    33178       0.40

Estimates via 'Trend'

sple y (sple mV)    sple log x (sple log N)    sple ppm N
90                  -0.91                      0.12
38                  -0.007                     0.98
0                   0.65                       4.50
-78                 2.01                       102
-135                3.00                       992

These are the correct conc. values, obtained by taking the antilog of the sple log x values returned by 'Trend'.

sple mV    sple x (calc. conc.)
90         -366
38         -154
0          0
-78        317
-135       549

These are the incorrect conc. values obtained when the Trend function is applied directly to the non-linear (x, y) data.

Now fill in the following table from your spreadsheet. Assume these are sample data.

mV     log10 (ppm N)    ppm N
+80
0
-25
-70

Apply LINEST to obtain regression results. Again, note that LINEST must be applied to linear data, i.e., (log x, y), not (x, y) data. Compare your results to the spreadsheet above. Obtain regression data via Data Analysis.
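The LINEST and Trend calculations above can be mirrored outside Excel. The sketch below is a plain-Python least-squares fit of mV against log10 (ppm N) for the five-point NH3 calibration in the spreadsheet, followed by the antilog step. The helper names fit_line and conc_from_mv are hypothetical. Note one assumed difference from the handout: this version inverts the fitted calibration line, whereas the Trend approach predicts log conc. from mV directly; for data this close to linear the two give essentially the same estimates.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = m*x + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, y_mean - m * x_mean

# Calibration data from the NH3 spreadsheet above.
ppm_n = [0.1, 1, 10, 100, 1000]
mv = [95, 38, -20, -78, -135]

# Linearize: regress mV on log10(conc.), not on conc. itself.
log_ppm = [math.log10(c) for c in ppm_n]
m, b = fit_line(log_ppm, mv)

def conc_from_mv(reading_mv):
    """Invert mV = m*log10(conc) + b, then take the antilog."""
    log_conc = (reading_mv - b) / m
    return 10 ** log_conc

print("slope:", round(m, 2), "intercept:", round(b, 2))
for sample_mv in (90, 38, 0, -78, -135):
    print(sample_mv, round(conc_from_mv(sample_mv), 3))
```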
Data Analysis is a linear regression tool. Before applying it, calculate the log10 of the conc. values and select these as the x-values in the Data Analysis dialog box. Eliminate the Analysis of Variance data. Compare the results to those obtained by LINEST.

PLOT THE GRAPH OF LOG Conc. vs. mV: Select the log conc. and mV data (log x and y data) for the NH3 calibration and use the Chart wizard to plot a graph. Select XY(Scatter) without any connecting lines. Use the Add Trendline tool with the Linear type and include the R2 and ‘display equation on chart’ options. Set the Trendline name to custom and label it Linear Reg. Line.

[Figure: NH3 Calib. Curve. mV (y-axis, -150 to 150) vs. log conc. NH3 (ppm N) (x-axis, -2 to 4), showing the data and the Linear Reg. Line: y = -57.60000x + 37.60000, R2 = 0.99999]

To move the X-axis scale from the middle of the graph to lie below the graph, click to select the Y-axis. Note: After clicking once, you must see the selection handles of the Y-axis and the pop-up label ‘Value (Y) axis’. Then choose Format, Selected Axis. In the Format Axis dialog box, under the Scale tab, change ‘Value (X) axis crosses at:’ to the value listed as ‘Minimum’, i.e., -150, then click OK.

To move the Y-axis scale from the middle of the graph to the left of the graph, click to select the X-axis. Note: After clicking once, you must see the selection handles of the X-axis and the pop-up label ‘Value (X) axis’. Then choose Format, Selected Axis. In the Format Axis dialog box, under the Scale tab, change ‘Value (Y) axis crosses at:’ to the value listed as ‘Minimum’, i.e., -2, then click OK.

PLOT A SEMI-LOG CURVE, ADD A NON-LINEAR REGRESSION TRENDLINE: You have just worked through the process of converting conc. values to log conc. for regression analysis, and ultimately you converted the estimated log conc. of samples back to conc.
This was necessary because we chose to use linear regression. Excel also performs non-linear regression. The types available in Excel are shown in the Add Trendline dialog box: Linear, Logarithmic, Polynomial, Power, Exponential, and Moving Average.

LOGARITHMIC REGRESSION: From the Nernst equation, we know that ion selective electrodes exhibit a semi-log relationship between concentration and mV. See the graph below. Select the conc. and mV data (x and y data) for the NH3 calibration and use the Chart wizard to plot a graph. Select XY(Scatter) without any connecting lines. Use the Add Trendline feature, selecting the Logarithmic type and including the R2 and ‘display equation on chart’ options. Set the Trendline name to custom and label it Log Reg. Line. Note that Excel gives the equation of the line as y = cLn(x) + b, where c is the coefficient of Ln(x) and b is the intercept, i.e., the value of y at x = 1, where Ln(x) = 0. The equation can be converted from ln to log. Recall that 2.3026 log x = ln x.

[Figure: NH3 Calib. Data with Log Reg. Line. mV (y-axis, -150 to 150) vs. conc. (ppm N) (linear x-axis, 0 to 1500); y = -25.01536Ln(x) + 37.60000, R2 = 0.99999]

Caution: Log (or Ln) can only be applied to numbers greater than 0. It is impossible to take the log of zero or of a negative number. Recall that the log of a number is the exponent to which 10 must be raised to give that number. Similarly, the ln of a number is the exponent to which e must be raised to give that number. For example, 10^3 = 1000. The exponent (the log) of 10 that gives 1000 is 3, i.e., log10 1000 = 3. Similarly, e^2 = 7.389. The exponent (the ln) of e that gives 7.389 is 2, i.e., ln 7.389 = 2. Recall that e is a constant with a value of approximately 2.7183. Any positive base (10, e, etc.) raised to any exponent (+ or -) cannot yield 0 or a negative number. When your x-axis data contain a zero or negative value, the Logarithmic option in Add Trendline is not available. Zero or negative values can only be y-values for logarithmic regression in Excel.
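To make the link between the Ln(x) coefficient and the log10 slope concrete, here is a small Python sketch that fits y = c*Ln(x) + b to the NH3 calibration data by least squares on ln(x). The function name fit_log_trend is hypothetical, and the code is only an assumed equivalent of a logarithmic trendline, not a description of Excel's internals. It also rejects x-values of zero or less, for the reason given in the caution above.

```python
import math

def fit_log_trend(xs, ys):
    """Least-squares fit of y = c*ln(x) + b; every x must be > 0."""
    if any(x <= 0 for x in xs):
        raise ValueError("logarithmic fit needs all x-values > 0")
    lx = [math.log(x) for x in xs]
    n = len(lx)
    lx_mean = sum(lx) / n
    y_mean = sum(ys) / n
    c = (sum((u - lx_mean) * (y - y_mean) for u, y in zip(lx, ys))
         / sum((u - lx_mean) ** 2 for u in lx))
    return c, y_mean - c * lx_mean

# NH3 calibration data: conc. (ppm N) and mV readings.
c, b = fit_log_trend([0.1, 1, 10, 100, 1000], [95, 38, -20, -78, -135])

print("coefficient of Ln(x):", round(c, 5))
print("intercept (y at x = 1):", round(b, 5))
# Multiplying by ln(10) = 2.3026 converts the Ln coefficient
# into the slope per decade used in the log10 form of the equation.
print("slope per decade:", round(c * math.log(10), 2))
```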
In the case of ISE analysis, plot mV values on the y-axis and do not enter a concentration value of zero.

LOGARITHMIC REGRESSION AND A SEMI-LOG PLOT: We are accustomed to plotting semi-log relationships on semi-log paper to produce straight-line graphs. This is easily done in Excel.

1. After creating the graph shown above, drag a copy of the graph and reformat the X-axis as follows.

2. Click to select the X-axis. Note: After clicking once, you must see the selection handles of the X-axis and the pop-up label ‘Value (X) axis’. Choose Format, Selected Axis. In the Format Axis dialog box, under the Scale tab, select Logarithmic scale, then click OK.

[Figure: NH3 Calib. Data with Log Reg. Line. mV (y-axis, -150 to 150) vs. conc. (ppm N) (logarithmic x-axis, 0.1 to 1000); y = -25.01536Ln(x) + 37.60000, R2 = 0.99999]

Note that the regression equation is the same in the last two plots. By changing the axis from linear to logarithmic, we have only changed the appearance of the graph.

Exercise: Plot the following calibration data for Na+ conc. analysed by ISE on an XY(Scatter) plot. Plot two graphs, one with a linear X-axis and one with a log scale for the X-axis. You can drag and drop (CTRL + drag) to copy a chart.

Na+ Calibration Data

meas. x (ppm Na+)    log ppm Na+    meas. y (mV)
0.1                                 20
1                                   77
10                                  134
100                                 191
1000                                248

Perform non-linear regression by adding a logarithmic Trendline, and display R2 and the regression equation. Your chart should look like the one below. Format all its components to look the same as this example.

LOGARITHMIC REGRESSION PLOTS FOR Na+

[Figure: Na+ Calib. Curve. mV (y-axis, 0 to 300) vs. conc. (ppm Na+) (linear x-axis, 0 to 1500), showing the data and the Log Reg. Line: y = 24.75479Ln(x) + 77.00000, R2 = 1.00000. Caption: Logarithmic Regression with a linear X-axis scale.]

[Figure: Na+ Calib. Curve. mV (y-axis, 0 to 300) vs. conc. (ppm Na+) (logarithmic x-axis, 0.1 to 1000), showing the data and the Log Reg. Line: y = 24.75479Ln(x) + 77.00000, R2 = 1.00000. Caption: Logarithmic Regression with a logarithmic X-axis scale.]
Obtain regression analysis via Data Analysis for the Na+ ion calibration data. Data Analysis is a linear regression tool only. Before applying it, calculate the log10 of the conc. values and select these as the x-values in the Data Analysis dialog box.

Na+ Calibration Data

meas. x (ppm Na+)    log ppm Na+    meas. y (mV)
0.1                  -1             20
1                    0              77
10                   1              134
100                  2              191
1000                 3              248

SUMMARY OUTPUT

Regression Statistics
Multiple R           1.00000
R Square             1.00000
Adjusted R Square    1.00000
Standard Error       0.00000
Observations         5.00000

               Coefficients    Standard Error    t Stat
Intercept      77              0                 65535
log ppm Na+    57              0                 65535

RESIDUAL OUTPUT

Observation    Predicted mV    Residuals    Standard Residuals
1              20              0            65535
2              77              0            65535
3              134             0            65535
4              191             0            65535
5              248             0            65535

The analysis of variance (ANOVA) cells have been deleted. Note that R2 is exactly 1.0 and the residuals are all zero, indicating that the data are a perfect fit. (The data are not real; they were simply calculated.) The equation of the line is y = 57x + 77, but recall that x is really log10 conc., so the true equation is:

mV = 57 log10 conc. + 77

The residuals plot (not shown here) shows that all data points have zero deviation from the predicted trend line.

PREDICTING CONC. VALUES WITH THE GROWTH FUNCTION: The ‘GROWTH’ function can be used to predict sample conc. (x) values from logarithmic data, much as the Trend function is used with linear data. It is actually designed only for exponential data but also works for log data provided you enter the data exactly as shown below (opposite to the prompts in the dialog box).

In the NH3 spreadsheet, use the GROWTH function to estimate ppm N concentrations (x-estimates). We will do this with the same sample data used to predict sample concentrations via the TREND function, but now we will work directly on (x, y) values, i.e., (conc., mV), rather than (log x, y) values, i.e., (log conc., mV).

1. Select an array of cells where the estimated conc. values will appear.
Click in the formula bar, then click Insert, Function, Statistical, Growth, OK.

2. In the Growth dialog box, enter the cell addresses in the following order: known_x’s, known_y’s, measured (sple)_y’s, const = 1, then press CTRL+SHIFT+ENTER.

3. This should give the same sample conc. values as obtained using the TREND function.

Caution: The GROWTH function will give incorrect predictions of mV (y-axis) values. If an estimated y-value (mV) is desired, it can be readily calculated from the regression equation.

Exercise: Use the GROWTH function to calculate Na+ ion sample concentration values using the ‘sample data’ given previously. Your results should be the same as those shown below.

Calibration Data        Sample Data
ppm Na+    mV           sple ppm Na+    sple mV
0.1        20           0.122           25
1          77           1.00            77
10         134          19.1            150
100        191          144             200
1000       248          1000            248

The GROWTH function only works for log functions (using reversed syntax, i.e., known x, known y, sple y, 1) and for exponential functions (using the syntax as prompted, i.e., known y, known x, new x). Thus there will be occasions when you must use basic algebra and rearrange an equation to solve for an unknown.

PREDICTING VALUES BY REARRANGING AND SOLVING EQUATIONS: Consider the NH3 calibration equation,

y = -25.02LN(x) + 37.60

It is a simple matter to solve this equation for y given x, since y is the isolated term. For example, calculate y when x = 1. Do this in your head, confirm it on your calculator, and finally do it on an Excel spreadsheet (using a cell reference in place of x). (Ans. = 37.60)

To solve this equation for x (given y), we must rearrange it to isolate x. The rearranged equation is:

LN(x) = (y - 37.60)/-25.02

which further rearranges to

x = e^((y - 37.60)/-25.02)

The rearranged equation can be entered into a cell, and Excel will calculate values of x from the y-values you give it. For example, calculate x when y = 37.60.
Do this in your head, confirm it on your calculator, and finally do it on an Excel spreadsheet (using a cell reference in place of y). (Ans. = 1)

Exercise: Enter the appropriate equation for conc. Na+ on your spreadsheet and calculate sample conc. Na+ from the sample mV readings shown above.

PREDICTING VALUES USING THE ‘GOAL SEEK’ TOOL: As an alternative to rearranging an equation to solve for a non-isolated term, e.g., solving for x (given y) in the equation y = -25.02LN(x) + 37.60, one can use ‘Goal Seek’ from the Tools menu.

1. In an Excel spreadsheet, select a cell, e.g., B1. Click in the formula bar and type an ‘=’ sign followed by the formula, e.g., =-25.02*LN(A1)+37.60. Note that a cell reference (A1) is used in place of x.

2. In the referenced cell (A1), enter any value, e.g., 1. Note that the value in cell B1 becomes 37.60, i.e., the solution of the formula (the value of y) using 1 as the x-value for A1.

3. In the Tools menu, click Goal Seek. In the Goal Seek dialog box, the address of the currently selected cell is displayed in the box labeled ‘Set cell’. Replace this address with the absolute cell reference of the formula, i.e., $B$1 in this case.

4. Click in the box labeled ‘To value’ and enter the desired y-value, e.g., -20 mV. Then click in the box labeled ‘By changing cell’ and enter the absolute cell reference for the current x-value ($A$1), e.g., by clicking in cell A1; then click OK.

5. The x-value that appears in cell A1 is your answer. This is the x-value that, when entered in the equation, yields the target y-value. Cell B1 shows the target y-value you entered. Note that you have solved the equation for an x-value without rearranging the equation to isolate x.

[Figure: Goal Seek dialog box entries]

[Figure: Results of Goal Seek function]

Goal Seek keeps changing the x-value (cell A1) until the equation yields the target y-value that you enter.
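The Goal Seek procedure above amounts to a numerical root search. The sketch below imitates it with simple bisection and compares the result with the closed-form rearrangement x = e^((y - 37.60)/-25.02). The function names, the bracketing interval, and the choice of bisection are illustrative assumptions; this handout does not describe Excel's own search algorithm, which may well differ.

```python
import math

def calib(x):
    """NH3 calibration equation: mV as a function of conc. (ppm N)."""
    return -25.02 * math.log(x) + 37.60

def goal_seek(func, target, lo, hi, tol=1e-12, max_iter=200):
    """Find x in [lo, hi] with func(x) == target by bisection.

    Assumes func is monotonic on the interval and that the target
    lies between func(lo) and func(hi). (Hypothetical helper.)
    """
    f_lo = func(lo) - target
    if f_lo * (func(hi) - target) > 0:
        raise ValueError("target is not bracketed by lo and hi")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid) - target
        if f_mid == 0 or (hi - lo) < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid                 # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid    # root lies in the upper half
    return 0.5 * (lo + hi)

target_mv = -20.0
x_iter = goal_seek(calib, target_mv, 1e-3, 1e4)
x_exact = math.exp((target_mv - 37.60) / -25.02)

print("iterative:  ", x_iter)
print("closed form:", x_exact)
```

Here tol plays a role similar to a convergence setting: the looser it is, the earlier the search stops and the less accurate the returned x-value.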
Excel performs repeated calculations (‘iterations’) until the change in the result is less than a preset default value. You must set this sensitivity to a very small (fine) value to ensure an accurate result. It is adjusted in the Calculation tab of the Options dialog box found in the Tools menu (shown below). As a matter of course, set the Maximum change setting to 1E-15 rather than the default value of 0.001.

[Figure: Calculation tab of the Options dialog box. Change the Maximum change setting from 0.001 to 1E-15.]

Exercise:

1. Use the Goal Seek tool to calculate the Na+ ion sample concentrations using the previously given sample data.

2. Use the Rule of Crafts and Goal Seek to calculate the normal bp (@760 mmHg) of water given that a student measured a bp of 98.9°C at 730 mmHg.

CHOOSING THE BEST TYPE OF REGRESSION FOR UNKNOWN RELATIONSHIPS: It is important to examine a scatterplot as an aid in selecting the appropriate non-linear form. The figure below shows four ‘single-bulge’ non-linear patterns that might be observed in a scatterplot. Each pattern has a label indicating possibly applicable regression types.

[Figure: Possible Best-Fit Regression Types for Non-Linear Data. Four single-bulge scatterplot patterns; two are labeled ‘Power, Log, 2nd order polynomial’ and two are labeled ‘Power, Log, Exp, 2nd order polynomial’.]

Equations of Functions:

Linear:                  y = mx + b
Logarithmic:             y = cLn(x) + b
Exponential:             y = ce^(bx)
Power:                   y = cx^b
2nd Order Polynomial:    y = ax^2 + bx + c           (a quadratic equation)
3rd Order Polynomial:    y = ax^3 + bx^2 + cx + d    (a cubic equation)

Excel can fit polynomials up to 6th order; however, it is unlikely that you will ever need anything beyond a 3rd order polynomial. Note that a 2nd order polynomial, i.e., a quadratic such as y = ax^2 + bx + c, may work for all single-bulge patterns. If the scatterplot shows two bulges (an S shape), a cubic function (polynomial of order 3) may be appropriate, e.g., y = ax^3 + bx^2 + cx + d.

To determine the best fit:

1.
Enter the data into a spreadsheet (x data in a column to the left of y data).

2. Plot an XY(Scatter) diagram without joining lines.

3. Select a data point on the chart, then choose Chart, Add Trendline. In the Add Trendline dialog box, choose a type and, under the Options tab, choose ‘display R2 on chart’ and ‘display equation on chart’.

4. Repeat this process for all possible types, compare the R2 values, and observe how well each regression equation fits the data points.

Note that it is not necessary to recreate the graph each time. The chart can be copied with a control-drag mouse action, and the current Trendline can be changed by clicking on it and repeating the command sequence: Chart, Add Trendline, etc.

Exercise: Using the following data set, create a series of charts of all the types available in the Add Trendline dialog box, including both 2nd and 3rd order polynomials. Identify the best-fit regression type.

Unknown Relationship

x     y
1     4.95
3     25.0
5     36.5
8     164
10    445
11    734

CALIBRATION CURVES FOR SPECTROPHOTOMETRY: Linear calibration curves can be handled easily using a linear Trendline. However, calibration curves derived from atomic absorption spectroscopy, flame emission spectroscopy, and turbidimetry often show some curvature (or are linear only over very short ranges).

Exercise: Graph the following x and y data obtained for a flame photometer calibration curve for Na. Add Trendlines to determine the best-fit equation. Try all available types. Compare your results with the graphs shown below. Use the Goal Seek tool to check each point’s fit to the regression equation.

Na by Flame Photometer

                                                  Check Cubic Fit/Goal Seek
x (ppm Na)    x2 (ppm^2)    x3 (ppm^3)    y (Rdg)    ppm    Rdg
0             0             0             0                 0
5             25            125           0.062             0.062
10            100           1000          0.115             0.115
15            225           3375          0.16              0.16
20            400           8000          0.2               0.2
25            625           15625         0.233             0.233

Note that Log, Exponential and Power Trendlines are not available because the data include the point (0,0).
This is a valid data point. It was obtained by zeroing the spectrophotometer with a blank solution. The blank contained all the same reagents as the samples, but no analyte was added. Explain why each of these regression types is not available.

[Figure: Flame Photometer Calib., linear Trendline. Absorbance vs. conc. Na (ppm); y = 0.009280x + 0.012333, R2 = 0.987821]

[Figure: Flame Photometer Calib., 2nd order polynomial Trendline. Absorbance vs. conc. Na (ppm); y = -1.4071E-04x^2 + 1.2798E-02x + 6.0714E-04, R2 = 9.9993E-01]

[Figure: Flame Photometer Calib., 3rd order polynomial Trendline. Absorbance vs. conc. Na (ppm); y = 1.40740741E-06x^3 - 1.93492063E-04x^2 + 1.32798942E-02x + 7.93650793E-05, R2 = 9.99986475E-01]

Note that although a quadratic gives an acceptable fit, a cubic equation is better and should be used. Recall that ‘four nines’ is the minimum R2 value for accurate analysis.

Now, as a further check, use LINEST to calculate the coefficients of the regression equation and compare them with those determined by the Trendline tool. As shown above, you will need to create columns of x2 and x3 values adjacent to the x-values for the LINEST calculation. For cubic equations, enter all cells containing x, x2, and x3 data in the LINEST dialog box. Your result should look like the one shown below.

LINEST SOLUTION

                      x^3         x^2           x           constant
coefficients          1.41E-06    -1.935E-04    0.01328     7.94E-05
std. dev.             5.05E-07    1.922E-05     0.000193    0.000498
R2, SE(y)             0.999986    0.0005079     #N/A        #N/A
F, df                 49289.67    2             #N/A        #N/A
ss(reg), ss(resid)    0.038141    5.159E-07     #N/A        #N/A

From LINEST/GOAL

Rdg.     ppm
0        -0.0060
0.062    5.01
0.115    9.97
0.16     14.85
0.2      19.87
0.233    24.63

Now use Goal Seek to again calculate ppm values from the emission readings. Your results should look like those shown above. Note that the calculated concentrations are not identical to the known concentrations, but they are very close.
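As an independent check on the Trendline and LINEST coefficients, the cubic fit can be reproduced by solving the least-squares normal equations directly. The plain-Python sketch below does this for the flame photometer data. The helper names polyfit_normal, polyval, and r_squared are hypothetical, and the normal-equations approach is a textbook method assumed here for brevity, not necessarily what Excel uses internally.

```python
def polyfit_normal(xs, ys, order):
    """Least-squares polynomial coefficients, highest power first."""
    k = order + 1
    # Augmented normal-equations matrix [A^T A | A^T y]
    # for the basis x^order, ..., x, 1.
    rows = []
    for i in range(k):
        row = [sum(x ** (2 * order - i - j) for x in xs) for j in range(k)]
        row.append(sum(y * x ** (order - i) for x, y in zip(xs, ys)))
        rows.append(row)
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, k):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, k + 1):
                rows[r][c] -= factor * rows[col][c]
    # Back substitution.
    coeffs = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(rows[i][j] * coeffs[j] for j in range(i + 1, k))
        coeffs[i] = (rows[i][k] - tail) / rows[i][i]
    return coeffs

def polyval(coeffs, x):
    """Evaluate a polynomial (highest power first) by Horner's rule."""
    result = 0.0
    for a in coeffs:
        result = result * x + a
    return result

def r_squared(coeffs, xs, ys):
    """Coefficient of determination, 1 - ss(resid)/ss(total)."""
    y_mean = sum(ys) / len(ys)
    ss_resid = sum((y - polyval(coeffs, x)) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - y_mean) ** 2 for y in ys)
    return 1.0 - ss_resid / ss_total

# Flame photometer calibration data from the exercise above.
ppm_na = [0, 5, 10, 15, 20, 25]
reading = [0, 0.062, 0.115, 0.16, 0.2, 0.233]

cubic = polyfit_normal(ppm_na, reading, 3)
r2 = r_squared(cubic, ppm_na, reading)
print("coefficients (x^3, x^2, x, constant):", cubic)
print("R2:", r2)
```

Only the x and y columns are needed here; the powers of x that LINEST requires as separate columns are generated inside polyfit_normal.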
Data Analysis gives the same regression statistics and three sets of line fit and residual plots.

Exercise: The following data were obtained for a turbidimetric sulfate calibration. Plot the data on an XY(Scatter) plot without connecting lines. Add Trendlines to find the best fit. Format these charts as shown below. Use the regression equation and Goal Seek to check the fit of all data points. Compare your results with those shown below.

                      Cubic Fit                  Quartic Fit                Quadratic Fit
mg SO4-2    Abs.      sple conc.    sple Abs.    sple conc.    sple Abs.    sple conc.    sple Abs.
0           0
1           0.0700    1.019         0.07         0.989         0.07         1.06          0.07
2           0.125     2.0           0.125        1.96          0.125        2.12          0.125
4           0.207     4.120         0.207        4.10          0.207        4.10          0.207
8           0.273     7.448         0.273        7.87          0.273        6.72          0.273
10          0.293     9.835         0.293        10.02         0.293        8.86          0.291
14          0.335     14.48         0.335        14.05         0.335        9.41          0.291
16          0.362     15.84         0.362        16.00         0.362        9.50          0.291

[Figure: Turbidimetric Sulfate Calib., linear Trendline. Absorbance vs. conc. (mg SO4-2/100 mL); y = 0.0205x + 0.0674, R2 = 0.8996]

[Figure: Turbidimetric Sulfate Calib., 2nd order polynomial Trendline. Absorbance vs. conc. (mg SO4-2/100 mL); y = -0.0015x^2 + 0.0432x + 0.0269, R2 = 0.9766]

[Figure: Turbidimetric Sulfate Calib., 3rd order polynomial Trendline. Absorbance vs. conc. (mg SO4-2/100 mL); y = 0.00020250x^3 - 0.00633726x^2 + 0.07225357x + 0.00273111, R2 = 0.99895087]

[Figure: Turbidimetric Sulfate Calib., 4th order polynomial Trendline. Absorbance vs. conc. (mg SO4-2/100 mL); y = -0.00000912x^4 + 0.00048522x^3 - 0.00902524x^2 + 0.08022994x - 0.00098487, R2 = 0.99987095]

It is apparent that the quartic equation gives the best fit, as it has the highest R2 and returns predicted concentration values closest to the original data.

Caution: By using high orders of polynomials, even poor (scattered) data can be made to ‘fit’ a regression line. There is no substitute for good experimental data. Check your results by running replicates when possible.
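The caution above can be illustrated numerically. Each higher-order polynomial contains the lower-order one as a special case, so R2 can only stay the same or rise as the order increases, whether or not the extra terms mean anything physically. The sketch below fits orders 1 through 4 to the sulfate calibration data. The helper names are hypothetical, and the normal-equations solver is a simple textbook method assumed here for brevity.

```python
def polyfit_normal(xs, ys, order):
    """Least-squares polynomial coefficients, highest power first."""
    k = order + 1
    # Augmented normal-equations matrix for the basis x^order, ..., x, 1.
    rows = []
    for i in range(k):
        row = [sum(x ** (2 * order - i - j) for x in xs) for j in range(k)]
        row.append(sum(y * x ** (order - i) for x, y in zip(xs, ys)))
        rows.append(row)
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, k):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, k + 1):
                rows[r][c] -= factor * rows[col][c]
    # Back substitution.
    coeffs = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(rows[i][j] * coeffs[j] for j in range(i + 1, k))
        coeffs[i] = (rows[i][k] - tail) / rows[i][i]
    return coeffs

def r_squared(coeffs, xs, ys):
    """Coefficient of determination, 1 - ss(resid)/ss(total)."""
    def predict(x):
        result = 0.0
        for a in coeffs:          # Horner's rule, highest power first
            result = result * x + a
        return result
    y_mean = sum(ys) / len(ys)
    ss_resid = sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - y_mean) ** 2 for y in ys)
    return 1.0 - ss_resid / ss_total

# Turbidimetric sulfate calibration data from the exercise above.
mg_so4 = [0, 1, 2, 4, 8, 10, 14, 16]
absorbance = [0, 0.0700, 0.125, 0.207, 0.273, 0.293, 0.335, 0.362]

r2_by_order = {}
for order in (1, 2, 3, 4):
    coeffs = polyfit_normal(mg_so4, absorbance, order)
    r2_by_order[order] = r_squared(coeffs, mg_so4, absorbance)
    print("order", order, "R2 =", round(r2_by_order[order], 4))
```

A rising R2 alone therefore does not prove that a higher-order model is the right one; the fit should also be checked against replicate measurements, as recommended above.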