Notes AGR 206. Revised: 2/5/2016.

Chapter 14. Nonlinear Regression

14:1 What is nonlinear regression?

14:1.1 The model is nonlinear in the parameters.

Nonlinear models are best understood by comparison with linear models. In statistics, "linear," in relation to a model, means that the model is linear in the parameters. A model is linear in the parameters if it can be written as a sum of parameters multiplied by coefficients, where the coefficients are functions of the predictor variables that do not involve any parameters or unknown quantities. Therefore, all polynomial models, including interactions among X variables, and other models that appear not to be linear but involve only sums of parameters multiplied by known functions of the explanatory variables, are actually linear.

14:1.1.1 Examples of linear models.

Typical linear models are those used in most statistical methods, such as ANOVA, simple linear regression, ANCOVA, and multiple linear regression.

Polynomial with a single X:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_1^2 + \beta_3 X_1^3 + \varepsilon$

Multiple linear regression (hyperplane):
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$

Linear function of a known function of X:
$Y = \beta_0 + \beta_1 e^{5X} + \varepsilon$

Response surface:
$Y = \beta_0 + \beta_1 X_1 + \beta_{11} X_1^2 + \beta_2 X_2 + \beta_{22} X_2^2 + \beta_{12} X_1 X_2 + \varepsilon$

14:1.1.2 Examples of nonlinear models.

In nonlinear models, parameters are designated with the symbol $\theta$. Nonlinear models cannot be expressed as linear functions of the parameters. The following are nonlinear models; some of them are discussed in detail below.

$Y = \dfrac{\theta_0 + \theta_1 X}{\theta_2 + X} + \varepsilon$

$Y = \dfrac{X}{\theta_0 + \theta_1 X} + \varepsilon$

$Y = \theta_0\, e^{\theta_1 X} + \varepsilon$

Some of these nonlinear models could be linearized if the errors were not additive. When a nonlinear model can be linearized by using transformations and changes of variables, it is called "intrinsically linear." Nonlinear models can therefore be subdivided into intrinsically linear and intrinsically nonlinear models.

14:1.1.3 Intrinsically linear models.

Intrinsically linear models can be linearized by transformations of the X variable, the Y variable, or both. Typically the transformations are the logarithm or the inverse. In these cases, the model is intrinsically linear only if the errors are multiplicative; otherwise, the transformations distort the estimation of the errors and may tend to overestimate them, and parameter estimates will likely be biased. More generally, intrinsically linear models have errors that become additive when transformed.

Examples of intrinsically linear models (note that the errors must be multiplicative):

Exponential growth or decay:
$Y = \beta_0\, e^{\beta_1 X}\,\varepsilon \quad\Rightarrow\quad \ln Y = \ln\beta_0 + \beta_1 X + \ln\varepsilon$

Michaelis-Menten:
$Y = \dfrac{V_{max}[S]}{k_m + [S]}\,\varepsilon \quad\Rightarrow\quad \dfrac{1}{Y} = \left(\dfrac{1}{V_{max}} + \dfrac{k_m}{V_{max}}\,[S]^{-1}\right)\varepsilon^{-1}$

14:1.1.4 Intrinsically nonlinear models.

Intrinsically nonlinear models cannot be linearized. None of the nonlinear models shown above can be linearized if the errors are additive in the untransformed model, which is a common assumption. In HW07 you are asked to explore what happens when a model whose errors are additive is linearized to estimate the parameters.
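To make the distinction concrete, here is a short sketch (not part of the original derivation) using the exponential model from above. With a multiplicative error, the log transformation yields a model that is linear in the parameters with an additive error; with an additive error, the transformed model cannot be separated into a linear predictor plus an error term:

$Y = \theta_0\, e^{\theta_1 X}\,\varepsilon \;\Rightarrow\; \ln Y = \ln\theta_0 + \theta_1 X + \ln\varepsilon \qquad \text{(intrinsically linear)}$

$Y = \theta_0\, e^{\theta_1 X} + \varepsilon \;\Rightarrow\; \ln Y = \ln\!\left(\theta_0\, e^{\theta_1 X} + \varepsilon\right) \qquad \text{(intrinsically nonlinear)}$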
Segmented models are very useful nonlinear models. They are composed of segments of different functions for different ranges of the X variables. Typically, the values of X at which the model switches from one functional form to another are parameters to be adjusted. For example, a type I functional response describing the rate of predation or prey consumption by a predator consists of a "ramp" function in which the rate of predation increases linearly for prey densities between 0 and Xc, and then remains constant at a maximum (Figure 14-1). An example illustrating how to fit segmented models is considered below.

Figure 14-1. Type I functional response as a segmented, nonlinear model. (Axes: prey consumption rate, prey/time, versus prey density, prey/area; the response rises linearly to a maximum Ymax, with break points labeled Xc1 and Xc2.)

14:1.2 Nonlinear regression minimizes the SSE numerically.

When a model is nonlinear in the parameters, the estimation of parameter values, probability levels, and confidence intervals is radically different from the procedures that apply to linear models. Even for very simple nonlinear models, it is not possible to obtain an algebraic or analytical solution for the minimization of the SSE: the normal equations become nonlinear and are typically intractable. Thus, the parameters are estimated by numerically finding the values that minimize the SSE. Numerical estimation simply means that the SSE is calculated for many (very many) combinations of parameter values, and the combination that yields the smallest sum of squares is selected as the solution. In practical terms, numerical minimization of functions with multiple parameters is laborious and requires sophisticated software. Even with the best software, the relationship between the SSE and the parameter values may be so irregular that it is impossible to find an optimum. One should also consider that, because the models are nonlinear, a single global minimum SSE might not exist; there may be multiple combinations of parameters with the same, or almost the same, SSE.

14:1.3 Properties of parameters and predictions cannot be derived with equations.

In addition to the problems that nonlinearity causes for the estimation of parameters, there is a problem with the distribution of the estimated parameters. The usual statistics (such as t, z, and F) that are invoked to make probabilistic statements about parameters in linear models may not apply to nonlinear ones. Therefore, techniques such as resampling can be advantageous in nonlinear fitting.

14:2 Why and when should one use nonlinear regression?

Nonlinear models should be used when the true relationship between the response and explanatory variables is not linear. Even when such a model can be linearized, the transformations applied to linearize it may disrupt the properties of the errors and will likely yield biased estimates of the parameters. Moreover, as illustrated in the homework assigned for nonlinear regression, some of the properties of the original model are not preserved in the linearized version. The linearized version of the type II functional response yields a negative value for the parameter Th (handling time); a negative handling time makes no biological sense and would never be obtained with the original nonlinear model.

The following are advantages of nonlinear models, and indicate when nonlinear fitting should be used:

1. The function relating Y to the X's is known on the basis of a mechanistic understanding of the process. For example, the logistic growth model is based on the fact that, in some populations, individuals compete for resources and reduce each other's growth rate linearly as density increases.

2. The parameters of the model have a direct biological meaning, and the model may offer the only way of empirically determining the value of a parameter. In the logistic growth example, the parameters are the intrinsic relative growth rate (r) and the carrying capacity (K) (a sketch of the equation is given after this list).

3. Nonlinear models may characterize responses better, and with fewer parameters, than linear ones, even when no a priori functional form is available.
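For reference, a sketch (not taken from the original notes) of the logistic growth model mentioned in points 1 and 2, as the differential equation and its solution:

$\dfrac{dN}{dt} = r\,N\left(1 - \dfrac{N}{K}\right) \quad\Rightarrow\quad N(t) = \dfrac{K}{1 + \left(\dfrac{K - N_0}{N_0}\right) e^{-rt}}$

where $N_0$ is the population size at $t = 0$. Both r and K appear directly as parameters when the solution is fitted as a nonlinear regression model.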
14:3 Model and assumptions.

The general form of a nonlinear model is:

$Y_i = f(X_i; \theta) + \varepsilon_i$

where the errors $\varepsilon_i$ are assumed to be independent and normally distributed with mean 0 and constant variance $\sigma^2$. The function $f$ represents the expected value of Y for a given value of the vector of explanatory variables X.

14:4 Useful nonlinear models.

14:4.1 Exponential growth/decay.

This model describes a process with a self-accelerating (or self-decelerating) feedback loop. The main assumption is that the rate of change in Y is a constant proportion of the current level of Y.

$Y = \theta_0\, e^{\theta_1 t} + \varepsilon$

where
t = time or explanatory variable
$\theta_0$ = initial size when t = 0
$\theta_1$ = per capita rate of change
$\mathrm{sgn}[\theta_0] = \mathrm{sgn}[\theta_1]$: growth; $\mathrm{sgn}[\theta_0] \neq \mathrm{sgn}[\theta_1]$: decay

Note that the model has an additive error term, which is different from the multiplicative term in the intrinsically linear exponential model shown previously. In dynamic simulation modeling, this equation represents the compartment shown in Figure 14-2.

Figure 14-2. Compartment model for exponential decay.

Figure 14-3. Shape of exponential decay and growth functions.

14:4.2 Two-term exponential model.

This model represents the concentration or amount of material in a second compartment that receives input from a first compartment and outputs to a sink. The first compartment receives a single instantaneous dose and passes the material to the second compartment in exponential-decay fashion (i.e., a constant proportion of the amount remaining in compartment 1 is passed to compartment 2 per unit time).

$Y = b\,\dfrac{\theta_1}{\theta_1 - \theta_2}\left(e^{-\theta_2 t} - e^{-\theta_1 t}\right) + \varepsilon$

This model can be used to describe the amount of a substance in the blood as a function of time after a single dose has been taken into a different compartment, such as the gut (meal) or muscle (intramuscular injection). The model is represented with two compartments, where each compartment follows an exponential behavior, but compartment 2 depends on the input from compartment 1. The parameters $\theta_1$ and $\theta_2$ represent the exponential transfer from gut to blood and out of the blood into storage or excretion.

Figure 14-4. Compartments (gut and blood) and flows that result in a two-term exponential model.

Figure 14-5. Examples of two-term exponential models.
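As an illustration, this two-term exponential model could be fitted with SAS PROC NLIN. This is a minimal sketch: the data set name blood, the variables t and y, and the starting values are hypothetical, not part of the original notes.

proc nlin data=blood;                 /* hypothetical data set with time t and concentration y */
   parms b=100 th1=0.5 th2=0.05;      /* hypothetical starting guesses; assumes th1 > th2 > 0  */
   model y = b*(th1/(th1 - th2))*(exp(-th2*t) - exp(-th1*t));
run;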
14:4.3 Mitscherlich's growth or yield response.

This model is commonly used to describe yield responses to fertilizer input. The main assumption, or theory, behind the model is that the rate of increase of expected yield (E{Y}) per unit of fertilizer (X) is proportional to the difference between expected yield and maximum yield ($\alpha$). Different symbols are used for the parameters of this model to follow the more typical nomenclature. The model has a Y-intercept equal to $\alpha\,[1 - e^{-\beta\gamma}]$, which is the yield without any fertilizer, and an X-intercept equal to $-\gamma$, reflecting the fact that the soil is supplying an amount of nutrient equivalent to $\gamma$. The parameter $\beta$ controls the rate at which the asymptote is approached.

$Y = \alpha\left(1 - e^{-\beta\,(X + \gamma)}\right) + \varepsilon$

where
Y = yield
X = input (e.g., fertilization rate)
$\alpha$ = maximum yield when X is not limiting
$\beta$ = increase in expected yield per unit X per unit of "opportunity" for yield to increase
$\gamma$ = nutrient value of the soil in units or equivalents of X

Figure 14-6. Example of a Mitscherlich response function with $\gamma$ = 0.

Figure 14-6 also shows an inverse polynomial model, which is intrinsically nonlinear. The Michaelis-Menten model discussed next is a special case of an inverse polynomial model.

14:4.4 Michaelis-Menten enzyme kinetics or Holling's disk equation.

This model represents the enzymatic conversion of a substrate into product Y, where the reaction depends only on the concentration of substrate [S] and the affinity between substrate and enzyme. The same equation also describes the rate of predation as a function of prey density when the predator is limited by non-overlapping searching and handling times (see HW07).

For enzyme reactions:

$Y = \dfrac{V_{max}\,[S]}{k_m + [S]} + \varepsilon$

where
Y = rate of production per unit time
$V_{max}$ = maximum rate possible, determined by the concentration of enzyme
$k_m$ = substrate concentration that produces half of $V_{max}$
[S] = substrate concentration

For predation (Holling's disk equation):

$Y = \dfrac{a\,X}{1 + a\,T_h\,X} + \varepsilon = \dfrac{(1/T_h)\,X}{1/(a\,T_h) + X} + \varepsilon$

so that $1/T_h$ plays the role of $V_{max}$ and $1/(a\,T_h)$ plays the role of $k_m$, where
Y = rate of prey consumption by the predator (prey per unit time)
X = prey density (prey per unit area)
a = rate of successful search (area per unit time)
$T_h$ = handling time

14:5 Obtaining parameter estimates in JMP and SAS NLIN.

14:5.1 Initial estimates of the parameters.

Because the parameters are estimated numerically, it is necessary to provide JMP and SAS with initial estimates of the parameter values. These estimates can be critical for achieving a good fit. It is good practice to obtain guesses for parameter values that have biological validity, and then request a grid of initial points around those guesses. It is also good to fit the model several times, each time starting from a different grid, to make sure that the parameter estimates are always the same, indicating that the results have converged not to a local but to a global minimum. Initial parameter values can be obtained by:

1. Using prior knowledge of the system, or published values for similar situations.

2. Plugging a few representative observations from the data into the model and solving for the parameters in a deterministic fashion.

3. Plotting the data and projecting parameter values that have a direct meaning on the Y or X axes. The following figure provides an example for the Mitscherlich model.

Figure 14-7. Initial parameter estimates for the Mitscherlich equation.

In the Mitscherlich model there are three parameters. The first parameter, $\alpha$, is the maximum yield when X is not limiting. In Figure 14-7 one can visually extrapolate to the right and get an initial guess of $\alpha$ = 1400 kg. The second parameter is $\gamma$, the fertilizer equivalent of the nutrient provided by the soil. Based on the figure, the model crosses the Y axis at about 400 kg, and with 60 units of fertilizer the yield is around 1000 kg. Based on the model we can thus write:

$400 = 1400\left(1 - e^{-\beta(0 + \gamma)}\right) \;\Rightarrow\; -\beta\gamma = \ln\!\left(1 - \dfrac{400}{1400}\right) \approx -0.34$

$1000 = 1400\left(1 - e^{-\beta(60 + \gamma)}\right) \;\Rightarrow\; -\beta\,(60 + \gamma) = \ln\!\left(1 - \dfrac{1000}{1400}\right) \approx -1.25$

Subtracting the second expression from the first, term by term, we obtain

$60\,\beta = -0.34 - (-1.25) = 0.91 \;\Rightarrow\; \beta \approx 0.015, \qquad \gamma = \dfrac{0.34}{0.015} \approx 22$

This yields a set of initial guesses for the parameter values to be entered in the program or equations. In JMP, initial parameter estimates are entered when the parameters are defined, and they can be changed at any time because JMP treats the predicted values as a regular formula in a new column. In SAS, the initial estimates are specified in the PARAMETERS (or PARMS) statement. If desired, a grid of initial values can be defined in the PARAMETERS statement by specifying extremes and a step for each parameter.
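For example, a grid of starting values around the guesses computed above could be requested in SAS as follows. This is a minimal sketch; the data set name yield, the variable names, and the exact grid ranges are hypothetical.

proc nlin data=yield;                       /* hypothetical data set with variables x and y */
   parms alpha = 1200 to 1600 by 100        /* grid around the visual guess of 1400 kg      */
         beta  = 0.005 to 0.025 by 0.005    /* grid around 0.015                            */
         gamma = 10 to 40 by 10;            /* grid around 22                               */
   model y = alpha*(1 - exp(-beta*(x + gamma)));
run;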
14:5.2 Fitting a segmented model with SAS.

Consider the functional response, the function that relates prey consumed per predator per unit time to prey density. Most functional response models assume that predators cannot search for and handle/consume prey at the same time. While this is a sound assumption in many cases, the "predator" view does not apply to large herbivores, which can take several bites and walk looking for more forage as they chew ("handle") what they already have in the mouth. Handling and searching can therefore overlap in time, and the typical functional responses may fail. In this case, the forage ingestion process requires a model that incorporates the overlap between chewing and searching. In simple terms, one can represent the ingestion process as limited by whichever process takes longer, chewing a bite or finding a bite. This can be expressed as:

Rate of bite consumption: $BR = \dfrac{1}{\max(T_c, T_s)}$

Time to handle or chew each bite: $T_c$

Time to search for and find each bite: $T_s = \dfrac{1}{V\sqrt{X}}$

where X is bite density and V is forager velocity. For a range of low densities, the rate of bites is determined by the encounter rate, and thus by bite density. At a density $X_c$, the time it takes the forager to find a bite equals the time it takes to chew it, and for higher densities the bite rate is constant and limited by the time necessary to chew the forage. Note that at $X = X_c$, $1/T_s = 1/T_c$:

$\dfrac{1}{T_s} = \dfrac{1}{T_c} \;\Rightarrow\; V\sqrt{X_c} = T_c^{-1} \;\Rightarrow\; X_c = (V\,T_c)^{-2}$

Thus, the model is expressed as:

$BR = \begin{cases} V\sqrt{X} & \text{if } X \le (V\,T_c)^{-2} \\[4pt] \dfrac{1}{T_c} & \text{if } X > (V\,T_c)^{-2} \end{cases}$

The segmented model requires a few extra lines of code in SAS and a slightly more complex formula in JMP.

proc nlin data=cow;
   parms v=20 to 40 by 5
         tc=0.005 to 0.02 by 0.005;
   xc=(1/(v*tc))**2;           /* the point where the shape of the model changes */
   if x>xc then do;
      model y=1/tc;
   end;
   else do;
      model y=v*sqrt(x);
   end;
run;

The following output is generated:

Non-Linear Least Squares Grid Search     Dependent Variable Y

         V          TC    Sum of Squares
 20.000000    0.005000       5481.231793
 25.000000    0.005000       4321.759189
 30.000000    0.005000      12207.286586
 35.000000    0.005000      29137.813982
 40.000000    0.005000      51847.061259
 20.000000    0.010000       5448.091733
 25.000000    0.010000       1352.507693
 30.000000    0.010000        271.826876
 35.000000    0.010000       1527.546387
 40.000000    0.010000       4465.910243
 20.000000    0.015000      10790.925320
 25.000000    0.015000       7730.202209
 30.000000    0.015000       7257.618327
 35.000000    0.015000       7714.325021
 40.000000    0.015000       8531.900948
 20.000000    0.020000      19757.516225
 25.000000    0.020000      18302.531789
 30.000000    0.020000      18104.539596
 35.000000    0.020000      18351.852161
 40.000000    0.020000      18632.762188

Non-Linear Least Squares Iterative Phase     Dependent Variable Y     Method: Gauss-Newton

Iter          V          TC    Sum of Squares
   0  30.000000    0.010000        271.826876
   1  29.708664    0.010165        251.506284
   2  29.708664    0.010167        251.502020
   3  29.708664    0.010167        251.502020
NOTE: Convergence criterion met.

Non-Linear Least Squares Summary Statistics     Dependent Variable Y

Source               DF    Sum of Squares     Mean Square
Regression            2      100317.62019     50158.81009
Residual             18         251.50202        13.97233
Uncorrected Total    20      100569.12221
(Corrected Total)    19       19360.83045

Parameter      Estimate    Asymptotic Std. Error    Asymptotic 95% Confidence Interval
                                                        Lower            Upper
V           29.70866365            0.54009073751    28.573982614     30.843344687
TC           0.01016737            0.00015775252     0.009835949      0.010498796
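As a quick check (this calculation is not part of the SAS output), the break point implied by the fitted estimates is

$\hat{X}_c = \left(\dfrac{1}{\hat{V}\,\hat{T}_c}\right)^{2} = \left(\dfrac{1}{29.709 \times 0.01017}\right)^{2} \approx 10.95 \text{ bites/m}^2$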
The fitted model is shown in Figure 14-8, where Xc = 10.95 bites/m².

Figure 14-8. Segmented model.

14:5.3 Fitting a nonlinear model with JMP.

I use the same example as above to show how the fit is done in JMP. The general idea for nonlinear fitting in JMP is to create a new column that contains a formula for the result of the model equation. In constructing this formula, one creates, names, and gives initial values to the parameters. Then, the new column is used as the X variable in the Nonlinear Fit platform. From that point on, the model is controlled from the nonlinear fit window.

14:5.3.1 Creating the "X" column for nonlinear fitting.

In this case the model is simple in the sense that each piece of the overall equation is a simple expression, but it is complicated by being a segmented model: the formula used to calculate the predicted value depends on the level of the X variable (bite density). Segmented models can be easily specified in JMP through the use of If statements and local variables. A local variable is a variable that exists and is calculated only within a column of JMP. Incidentally, although one may look upon JMP as the "weak" cousin of SAS, JMP is quite powerful and fully programmable; if one learns scripting, there is virtually no test that one cannot perform in JMP.

Before getting into the specification of the formula, consider that the model has an unknown value that seems to be a parameter: the level of bite density at which the model switches from one equation to the other. As may be gathered from the SAS solution, it turns out that this is not a third parameter but just a local variable, because once v and tc are set, the bite density at which the equations change is also set: it is the value of bite density (xc) at which the result of the first segment equals that of the second. This is the reason that xc is specified as a local variable rather than a third parameter. (If you want to experiment, specify xc as a parameter and explore the results.) Note that it is possible to represent this model without using the local variable; I use the local variable to illustrate its use and because it makes the formula more understandable.

Open xmpl_SegmentNlin.jmp. Create a new column called XforBR. Select the column and then, from the red triangle to the left of "Columns," select Formula.

On the menu above the names of the table columns (circled in red), select "Local Variables" to define a new one. Click on "New Local…" and type "xc" as the name of the variable. Leave the value blank. The name xc appears in the list of local variables. Click on it once to enter it in the formula, then select "Assignment" in the Functions list and select the "=" sign. To proceed with the formula for xc we need to create a couple of parameters.
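(For reference, the value assigned to xc will be the break point derived in the SAS example above, namely $x_c = \left(\dfrac{1}{v\,t_c}\right)^{2}$, written in terms of the two parameters created next.)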
On the menu circled above, select "Parameters" and click on "New Parameter…". Name the new parameters v and tc, and give them the initial values obtained earlier by visual and analytical inspection of the scatter plot.

The local variable assignment is ended with a semicolon, after which the formula for the model itself can be entered. Use the equation editor to build a formula that looks like the one in the figure. Once the equation for XforBR is completed, select the Nonlinear Fit platform and enter the observed bite rate as the Y variable and XforBR as the X variable. Then click OK. The nonlinear control panel opens. Click on Go and JMP performs the numerical minimization of the loss function, in this case the SSE, which is the default.

Several elements of the control panel and report are worth noting:
- One area of the control panel controls the numerical minimization of the loss function. Unless a model fails to converge and other solutions do not work, the default values are not changed.
- A button places the current values of the estimated parameters into the formula for XforBR.
- The Confidence Limits button calculates the CI for the specified Alpha and places the CI extremes in the Solution table. The Goal SSE is used to estimate nonlinear bivariate confidence regions.
- The graph shows the fitted line and the scatter of observations. The slider bar for each parameter allows one to explore changes in the curve by manually changing the parameter estimates.

The correlation of estimates gives an idea of how much one parameter can "compensate" for variation in the other. In this particular model the parameters are essentially independent because they apply to different parts of the curve. Often, however, parameter estimates have positive or negative correlations. Because of these correlations, and because of the nonlinearity of the model, the joint confidence regions for the parameters can have irregular shapes. The Nonlinear platform allows one to determine a joint CI for the estimated parameters numerically, as follows. Click on the Confidence Limits button and record the Goal SSE for CL; this is the SSE that corresponds to an isoline of constant probability equal to 1-Alpha. Then select SSE Grid in the menu obtained by clicking on the red triangle next to "Nonlinear Fit" at the top. The SSE Grid command creates a table where the ranges and number of points for each parameter can be specified. This generates a grid of values of both parameters and calculates the SSE for each point in the grid. Points where the SSE is smaller than the Goal SSE for CL lie within, and thereby define, the joint CI for both parameters. Clicking on the "Go" button creates the grid and calculates the SSE values. Everything is placed in a new untitled data table that can be used to create a contour or 3-D graph of the confidence region. Use the "Graph" and "Contour Plot" menus to obtain a contour plot of SSE vs. v and tc, as shown below.

In the contour plot window, use the menu to specify contour values. Make sure to include a contour equal to the Goal SSE; in this case it was 313.75. The final contour plot can be modified by removing or changing colors, etc.; to do so, right-click on the colors and other graph elements.
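For reference, a classical large-sample approximation conveys the same idea (this is a general sketch; JMP computes its own Goal SSE internally, so the exact value may differ). The approximate joint 100(1-Alpha)% confidence region for the p parameters is the set of parameter vectors $\theta$ satisfying

$SSE(\theta) \;\le\; SSE(\hat\theta)\left[1 + \dfrac{p}{n-p}\,F_{(1-\alpha;\;p,\;n-p)}\right]$

so the contour of the SSE surface at this level encloses the approximate joint confidence region for v and tc.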
In this example, it is clear that the parameter estimates have essentially no correlation, because the joint CI is practically circular. This model probably has well-behaved normal errors, an assumption that can be verified by the usual means (Distribution… Fit Distribution… Normal… Goodness of Fit).
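For users working in SAS rather than JMP, an analogous check of the errors can be sketched as follows; the output data set and residual variable names are illustrative, the model is the segmented model fitted above, and the starting values are taken near the estimates obtained earlier.

proc nlin data=cow;
   parms v=30 tc=0.01;                 /* starting values near the earlier estimates */
   xc=(1/(v*tc))**2;
   if x>xc then do;
      model y=1/tc;
   end;
   else do;
      model y=v*sqrt(x);
   end;
   output out=nlinout r=resid;         /* save the residuals from the fit */
run;

proc univariate data=nlinout normal;   /* Shapiro-Wilk and related normality tests */
   var resid;
run;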