Notes AGR 206
687292012
Revised: 2/5/2016
Chapter 14. Nonlinear Regression
14:1 What is nonlinear regression?
14:1.1
Model is non-linear in the parameters.
Nonlinear models can be understood by comparison with linear models. In statistics, "linear," in relation to a
model, means that the model is linear in the parameters. A model is linear in the parameters if it can be written
as an equation that only involves sums of parameters multiplied by coefficients, where the coefficients are
functions of the predictor variables that do not involve any parameters or unknown quantities. Therefore, all
polynomial models, including interactions among X variables, and other models that appear not to be linear but
include only sums of parameters multiplied by known functions of the explanatory variables, are actually linear.
14:1.1.1
Examples of linear models.
Typical linear models are those used in most statistical methods, such as ANOVA, simple linear regression,
ANCOVA, multiple linear regression, etc.
Polynomial with a single X:

Y = β0 + β1·X + β2·X² + β3·X³ + ε

Multiple linear regression (hyperplane):

Y = β0 + β1·X1 + β2·X2 + β3·X3 + ε

Linear function of a known function of X:

Y = β0 + β1·e^(5X) + ε

Response surface:

Y = β0 + β1·X1 + β11·X1² + β2·X2 + β22·X2² + β12·X1·X2 + ε
14:1.1.2
Examples of nonlinear models.
In nonlinear models, parameters are designated with the symbol θ.
Nonlinear models cannot be expressed as linear functions of the parameters. The following are nonlinear
models. Some of them are discussed in detail below.
Y = θ0 + θ1·X^θ2 + ε
Y = X/(θ0 + θ1·X) + ε
Y = θ0·e^(θ1·X) + ε
Some of these nonlinear models could be linearized if the errors were not additive. When a nonlinear model
can be linearized by using transformations and changes of variables, it is called "intrinsically linear."
Therefore, nonlinear models can be subdivided into intrinsically linear and intrinsically nonlinear models.
14:1.1.3
Intrinsically linear models
Intrinsically linear models can be linearized by transformations of the X variable, the Y variable, or both. Typically the
transformations are the log or the inverse. In these cases, the model is intrinsically linear only if the errors are
multiplicative; otherwise, the transformations will distort the estimation of the errors, and may tend to
overestimate them. Parameter estimates will likely be biased in those situations. More generally, intrinsically
linear models have errors that become additive when transformed.
Examples of intrinsically linear models (note that the errors must be multiplicative):
Exponential growth or decay:

Y = θ0·e^(θ1·X)·ε  →  ln Y = ln θ0 + θ1·X + ln ε
Michaelis-Menten:

Y = Vmax·[S]/(km + [S])  →  Y⁻¹ = 1/Vmax + (km/Vmax)·[S]⁻¹

14:1.1.4
Intrinsically nonlinear models
Intrinsically nonlinear models cannot be linearized. None of the nonlinear models shown above can be linearized if
the errors are additive in the untransformed model, which is a common assumption. In HW07 you are asked
to explore what happens when a model whose errors are additive is linearized to estimate the parameters.
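What goes wrong can be previewed with a small simulation (a hedged Python sketch with invented numbers; this is not the HW07 data set): when the error is truly additive, the log transform distorts it.

```python
# Simulate exponential decay with ADDITIVE errors, then estimate the rate
# constant from the linearized (log) regression. All values are illustrative.
import numpy as np

rng = np.random.default_rng(1)
theta0, theta1 = 10.0, -0.05                     # true parameters
x = np.linspace(0, 60, 200)
y = theta0 * np.exp(theta1 * x) + rng.normal(0, 0.4, x.size)

mask = y > 0                                     # the log requires Y > 0
slope, intercept = np.polyfit(x[mask], np.log(y[mask]), 1)
print("true theta1:", theta1, " linearized estimate:", round(slope, 4))
```

Because ln(θ0·e^(θ1·x) + ε) is not ln θ0 + θ1·x + ln ε, the log-scale residuals are neither additive nor homoscedastic, and the slope estimate is distorted; a direct nonlinear fit of the untransformed model does not have this problem.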
[Figure: two type I functional response curves rising linearly to a common maximum Ymax, with breakpoints Xc1 and Xc2; Y axis: prey consumption rate (prey/time), 0-4; X axis: prey density (prey/area), 0-300.]
Figure 14-1. Type I functional response as a segmented, nonlinear model.
Segmented models are very useful nonlinear models. These models are characterized by being composed
of segments of different functions for different ranges of the X variables. Typically, the values of X at which the
model switches from one functional form to another are parameters to be adjusted. For example, a type I
functional response describing the rate of predation or prey consumption by a predator consists of a "ramp"
function where rate of predation increases linearly for prey densities between 0 and Xc, and then remains
constant at a maximum (Figure 14-1). An example illustrating how to fit segmented models is considered below.
14:1.2
Non-linear regression minimizes SSE numerically.
When the model is nonlinear in the parameters, the estimation of parameter values, probability levels,
and confidence intervals is radically different from the procedures that can be applied to linear models. Even
for very simple nonlinear models, it is not possible to obtain an algebraic or analytical solution for the
minimization of SSE. The normal equations become nonlinear and typically are intractable. Thus, the
parameters are estimated by numerically finding those values that minimize the SSE. Numerical estimation
simply means that the SSE is calculated for many (very many) combinations of parameter values, and the
combination that yields the smallest sum of squares is selected as the solution. In practical terms, numerical
minimization of functions with multiple parameters is extremely laborious and requires very sophisticated
software. Even with the best software, the relationship between the SSE and the parameter values may be so
wild that it is impossible to find an optimum. One should also consider that, given that the models are nonlinear,
a single global minimum SSE might not exist. There may be multiple combinations of parameters that have the
same, or almost the same, SSE.
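As a toy illustration of brute-force numerical minimization (not part of the original notes; the model and numbers below are invented), the SSE can be evaluated over a grid of candidate parameter values and the smallest one selected:

```python
# Grid-search minimization of SSE for Y = theta0*exp(theta1*X) + error.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 5.0 * np.exp(0.3 * x) + rng.normal(0, 2.0, x.size)

t0_grid = np.linspace(1, 10, 91)                 # candidate theta0 values
t1_grid = np.linspace(0.1, 0.5, 81)              # candidate theta1 values
T0, T1 = np.meshgrid(t0_grid, t1_grid, indexing="ij")
# SSE for every combination of candidate parameter values (vectorized)
sse = ((y - T0[..., None] * np.exp(T1[..., None] * x)) ** 2).sum(axis=-1)
i, j = np.unravel_index(sse.argmin(), sse.shape)
t0_hat, t1_hat = t0_grid[i], t1_grid[j]
print("best grid point:", t0_hat, t1_hat)
```

In practice, software refines a grid result with an iterative algorithm such as Gauss-Newton, which is what SAS reports in the NLIN output shown later in these notes.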
14:1.3 Properties of parameters and predictions cannot be derived with
equations.
In addition to the problems that nonlinearity causes for the estimation of the parameters, there is a problem with
the distribution of the estimated parameters. The usual statistics (like t, z, and F) that are invoked to make
probabilistic statements about parameters in linear models may not apply to nonlinear ones. Therefore, the use of
techniques such as resampling can be advantageous in nonlinear fitting.
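For example, a case-resampling bootstrap can give a confidence interval without relying on asymptotic normality. The sketch below uses simulated Michaelis-Menten data and scipy's curve_fit; all names and values are illustrative, not from the notes.

```python
# Case-resampling bootstrap CI for a nonlinear parameter (illustrative sketch).
import numpy as np
from scipy.optimize import curve_fit

def mm(s, vmax, km):
    # Michaelis-Menten mean function
    return vmax * s / (km + s)

rng = np.random.default_rng(2)
s = np.repeat([1.0, 2.0, 5.0, 10.0, 20.0, 40.0], 3)   # substrate levels
y = mm(s, 8.0, 4.0) + rng.normal(0, 0.3, s.size)

boot = []
for _ in range(300):
    idx = rng.integers(0, s.size, s.size)             # resample cases
    try:
        p, _ = curve_fit(mm, s[idx], y[idx], p0=[8.0, 4.0],
                         bounds=(0, np.inf), maxfev=10000)
    except RuntimeError:
        continue                                      # skip non-converged resamples
    boot.append(p[1])                                 # keep the km estimates
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"bootstrap 95% CI for km: ({lo:.2f}, {hi:.2f})")
```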
14:2 Why and when should one use nonlinear regression?
Nonlinear models should be used when the true relationship between the response and explanatory
variables is not linear. Even when such a model can be linearized, the transformations applied to linearize the
model may disrupt properties of the errors, and will likely yield biased estimates of the parameters. Moreover, as
illustrated in the homework assigned for nonlinear regression, some of the properties of the original model are
not preserved in the linearized version. The linearized version of the type II functional response yields a
negative value for the parameter Th (handling time). A negative value for handling time makes no biological
sense, and would never be obtained with the original nonlinear model.
The following are advantages of nonlinear models, and indicate when nonlinear fitting should be used:
1. Function relating Y to X's is known on the basis of a mechanistic understanding of
the process. For example, the logistic growth model is based on the fact that for
some populations, individuals compete for resources and reduce each other's
growth rate linearly as density increases.
2. Parameters of the model have a direct biological meaning. The model may offer
the only way of empirically determining the value of the parameter. In the previous
example, the parameters of the logistic equation are the intrinsic relative growth
rate (r) and carrying capacity (K).
3. Nonlinear models may characterize responses better with fewer parameters than
linear ones, even when no a priori functional form is available.
14:3 Model and assumptions.
The general version of nonlinear models is:

Yi = f(Xi; θ) + εi

where the errors εi are assumed to have normal, independent distributions with mean 0 and constant
variance σ². The function part of the equation represents the expected value of Y for a given value of the vector
of independent variables X.
14:4 Useful nonlinear models.
14:4.1
Exponential growth/decay.
This model describes a process that has a self-accelerating or self-decelerating feedback loop. The main
assumption is that the rate of change in Y is a constant proportion of the level of Y.

Y = θ0·e^(θ1·t) + ε

where
t = time or explanatory variable
θ0 = initial size when t = 0
θ1 = per capita rate of change
sgn[θ0] = sgn[θ1] → growth;  sgn[θ0] ≠ sgn[θ1] → decay

Note that the model has an additive error term, which is different from the multiplicative term shown
previously with the intrinsically linear exponential model.
In dynamic simulation modeling, this equation represents the following compartment:
Figure 14-2. Compartment model for exponential decay.
Figure 14-3. Shape of exponential decay and growth functions.
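A direct fit of this model with additive errors can be sketched as follows (Python with simulated data; parameter names follow the text, and scipy's curve_fit stands in for the nonlinear least-squares minimizer):

```python
# Direct nonlinear fit of Y = theta0*exp(theta1*t) + eps (additive error).
import numpy as np
from scipy.optimize import curve_fit

def expo(t, theta0, theta1):
    return theta0 * np.exp(theta1 * t)

rng = np.random.default_rng(3)
t = np.linspace(0, 50, 60)
y = 100.0 * np.exp(-0.08 * t) + rng.normal(0, 2.0, t.size)

(theta0_hat, theta1_hat), cov = curve_fit(expo, t, y, p0=[80.0, -0.05])
se = np.sqrt(np.diag(cov))            # asymptotic standard errors
print(theta0_hat, theta1_hat, se)
```

No transformation is needed, so the additive error structure is preserved, unlike the linearized log-scale fit.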
14:4.2
Two-term exponential model.
This model represents the concentration or amount of material in a second compartment that receives input
from a first compartment and outputs to a sink. The first compartment receives a single instantaneous dose, and
passes the material to the second compartment in exponential decay fashion (i.e., a constant proportion of the
amount remaining in compartment 1 is passed to compartment 2 per unit time).
Y_blood = [θ1/(θ1 − θ2)]·(e^(−θ2·t) − e^(−θ1·t)) + ε
This model can be used to describe the amount of a substance in the blood as a function of time after a
single dose of substance has been taken into a different compartment such as the gut (meal) or muscle
(intramuscular injection). The model is represented with two compartments, where each compartment
follows exponential behavior, but compartment 2 depends on the input from compartment 1. The parameters
θ1 and θ2 represent the exponential transfer from gut to blood and out of blood into storage or excretion.
Figure 14-4. Compartments and flows that result in a two-term exponential model.
Figure 14-5. Examples of two-term exponential models.
14:4.3
Mitscherlich's growth or yield response.
This is a model commonly used to describe yield responses to fertilizer input. The main assumption or
theory that generates the model is that the rate of increase of expected yield (E{Y}) per unit fertilizer (X) is
proportional to the difference between expected yield and maximum yield (α). Different symbols are used for
the parameters of this model to follow more typical nomenclature.

Y = α·(1 − e^(−β·(X + γ))) + ε

where
Y = yield
X = input (e.g., fertilization rate)
α = maximum yield when X is not limiting
β = increase in expected yield per unit X per unit of "opportunity" for yield to increase
γ = nutrient value of soil in units or equivalents of X

This model has a Y-intercept equal to α·(1 − e^(−βγ)), which is the yield without any fertilizer, and it has an
X-intercept equal to −γ, reflecting the fact that the soil is supplying an amount of nutrient equivalent to γ. The
parameter β controls the rate at which the asymptote α is approached.
Figure 14-6. Example of Mitscherlich response function with γ = 0.
Figure 14-6 also shows an inverse polynomial model that is intrinsically nonlinear. The Michaelis-Menten model
that is discussed next is a special case of an inverse polynomial model.
14:4.4
Michaelis-Menten enzyme kinetics or Holling's disk equation.
This model represents the enzymatic conversion of a substrate into product Y, where the reaction depends
only on the concentration of substrate [S] and the affinity between substrate and enzyme. The equation also
depicts the rate of predation as a function of prey density, where the predator is limited by non-overlapping
searching and handling times (see HW07).
Y = Vmax·[S]/(km + [S]) + ε   for enzyme reactions

where
Y = rate of production per unit time
Vmax = maximum rate possible, determined by the concentration of enzyme
km = substrate concentration that produces half of Vmax
[S] = substrate concentration

Y = (1/Th)·X/(1/(a·Th) + X) + ε = a·X/(1 + a·Th·X) + ε   for predation

where
Y = rate of prey consumption by predator (prey per unit time)
X = prey density (prey per unit area)
a = rate of successful search (area per unit time)
Th = handling time
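A direct fit of the disk equation can be sketched as follows (Python, with simulated data; the parameter names a and Th follow the text, and scipy's curve_fit is used as the least-squares minimizer):

```python
# Fitting Holling's disk equation Y = a*X/(1 + a*Th*X) to simulated data.
import numpy as np
from scipy.optimize import curve_fit

def disk(x, a, th):
    return a * x / (1.0 + a * th * x)

rng = np.random.default_rng(4)
x = np.repeat([5.0, 10.0, 25.0, 50.0, 100.0, 200.0], 4)   # prey densities
y = disk(x, 0.2, 0.5) + rng.normal(0, 0.05, x.size)

(a_hat, th_hat), _ = curve_fit(disk, x, y, p0=[0.1, 1.0])
print(a_hat, th_hat)
```

Fitted directly in this form, the handling time estimate stays positive here, in contrast with the negative Th that the linearized version can produce (as noted in section 14:2).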
14:5 Obtaining parameter estimates in JMP and SAS NLIN.
14:5.1
Initial estimates of the parameters.
Because the parameters are estimated numerically, it is necessary to provide JMP and SAS with some initial
estimates for the parameter values. These estimates can be critical for achieving a good fit of the model. It is
good practice to obtain guesses for parameter values that have biological validity, and then request a grid
of initial points around those guesses. It is also good to fit the model several times, each time starting from a
different grid, to make sure that the parameter estimates are always the same, indicating that the fit
converged to a global rather than a merely local minimum.
Initial parameter values can be obtained by:
1. Using prior knowledge of the system, or published values for similar situations.
2. Plugging a few representative observations from the data into the model and
solving for the parameters in a deterministic fashion.
3. Plotting the data and projecting parameter values that have direct meaning on
the Y or X axes.
The following Figure provides an example for the Mitscherlich model.
Figure 14-7. Initial parameter estimates for the Mitscherlich equation.
In the Mitscherlich model there are 3 parameters. The first parameter, α, is the maximum yield when X is not
limiting. In the previous Figure one can visually extrapolate to the right and get an initial guess that α = 1400 kg.
The second parameter is γ, the fertilizer equivalent of the nutrient provided by the soil. Based on the Figure,
the model should cross the Y axis at about 400 kg, and with 60 units of fertilizer the yield is around 1000 kg.
Based on the model we can thus write:

400 = 1400·(1 − e^(−βγ))  →  e^(−βγ) = 1 − 400/1400  →  βγ = −ln(1 − 400/1400) = 0.34

1000 = 1400·(1 − e^(−β(60+γ)))  →  e^(−β(60+γ)) = 1 − 1000/1400  →  β(60+γ) = −ln(1 − 1000/1400) = 1.25

Subtracting the first expression from the second, term by term, we obtain:

60β = 1.25 − 0.34 = 0.91  →  β ≈ 0.015
γ = 0.34/0.015 ≈ 22
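The arithmetic above can be verified in a few lines (same numbers as in the text):

```python
# Numerical check of the initial-guess calculation for the Mitscherlich model.
import math

alpha = 1400.0
bg = -math.log(1 - 400 / alpha)       # beta*gamma, from Y(0) = 400
b60g = -math.log(1 - 1000 / alpha)    # beta*(60 + gamma), from Y(60) = 1000
beta = (b60g - bg) / 60               # subtract the two expressions
gamma = bg / beta
print(round(bg, 2), round(b60g, 2), round(beta, 3), round(gamma, 1))
# → 0.34 1.25 0.015 22.0
```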
This yields a set of initial guesses for the values of the parameters to be entered in the program or
equations. In JMP, initial parameter estimates are entered when defining the parameters. These values can be
changed at any time, because JMP treats the predicted values as a regular equation for a new column. In SAS,
the initial estimates are specified in the “PARAMETERS” or “PARMS” statement. If desired, a grid of initial
parameters can be defined in the PARAMETERS statement by specifying extremes for each parameter and grid
size.
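To illustrate the grid-of-starts idea outside SAS or JMP, here is a hedged Python sketch that refits the Mitscherlich model from several starting points (data simulated; scipy's curve_fit stands in for the NLIN minimizer):

```python
# Refit from a small grid of starting values and check that all starts reach
# the same solution (mimicking the PARMS grid idea); data simulated here.
import itertools
import numpy as np
from scipy.optimize import curve_fit

def mitscherlich(x, alpha, beta, gamma):
    return alpha * (1.0 - np.exp(-beta * (x + gamma)))

rng = np.random.default_rng(5)
x = np.linspace(0, 120, 30)
y = mitscherlich(x, 1400.0, 0.015, 22.0) + rng.normal(0, 30.0, x.size)

fits = []
for a0, b0, g0 in itertools.product([1000, 1600], [0.005, 0.03], [5, 40]):
    try:
        p, _ = curve_fit(mitscherlich, x, y, p0=[a0, b0, g0],
                         bounds=([0, 0, 0], [1e4, 1.0, 300.0]), maxfev=20000)
        fits.append(p)
    except RuntimeError:
        pass                                      # a start that failed to converge
fits = np.array(fits)
spread = fits.max(axis=0) - fits.min(axis=0)      # disagreement across starts
print("mean estimates:", fits.mean(axis=0), "spread:", spread)
```

If the spread across starting points is essentially zero, all starts reached the same minimum, which is the reassurance the text asks for.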
14:5.2
Fitting a segmented model with SAS.
Consider the functional response, the function that relates prey consumed per predator per unit time as a
function of prey density. Most functional response models assume that predators cannot search for and
handle/consume prey at the same time. While this is a sound assumption for many cases, the "predator" view
does not apply to large herbivores that can take several bites and walk looking for more forage as they chew
("handle") what they have in the mouth. Therefore, handling and searching can overlap in time, and the typical
functional responses may fail. In this case, the forage ingestion process requires a model that incorporates the
overlap between chewing and searching. In simplistic terms, one can represent the ingestion process as limited
by the process that takes the longest, either chewing a bite or finding a bite. This can be expressed in an
equation as:
Rate of bite consumption:

BR = 1/max(Tc, Ts)

Time to handle or chew each bite: Tc
Time to search for and find each bite: Ts = 1/(V·√X)

where X is bite density and V is forager velocity.
Therefore, for a range of low densities, the rate of bites will be determined by encounter rate and, thus, bite
density. At a density Xc, the time that it takes the forager to find a bite is the same as the time it takes to chew it;
for higher densities, the bite rate is constant and limited by the time necessary to chew the forage. Note that at
X = Xc, 1/Ts = 1/Tc:

when X = Xc:  BR = 1/Ts = 1/Tc  →  V·√Xc = Tc⁻¹  →  Xc = (V·Tc)⁻²
thus, the model is expressed as:

BR = V·√X   if X < (V·Tc)⁻²
BR = 1/Tc   if X ≥ (V·Tc)⁻²
The segmented model requires a few extra lines of code in SAS and a slightly more complex formula in
JMP.
proc nlin data=cow;
  parms v=20 to 40 by 5
        tc=0.005 to 0.02 by 0.005;
  xc=(1/(v*tc))**2 /*estimate the point where shape changes*/;
  if x>xc then do;
    model y=1/tc;
  end;
  else do;
    model y=v*sqrt(x);
  end;
run;
The following output is generated:

Non-Linear Least Squares Grid Search        Dependent Variable Y

         V          TC    Sum of Squares
 20.000000    0.005000       5481.231793
 25.000000    0.005000       4321.759189
 30.000000    0.005000      12207.286586
 35.000000    0.005000      29137.813982
 40.000000    0.005000      51847.061259
 20.000000    0.010000       5448.091733
 25.000000    0.010000       1352.507693
 30.000000    0.010000        271.826876
 35.000000    0.010000       1527.546387
 40.000000    0.010000       4465.910243
 20.000000    0.015000      10790.925320
 25.000000    0.015000       7730.202209
 30.000000    0.015000       7257.618327
 35.000000    0.015000       7714.325021
 40.000000    0.015000       8531.900948
 20.000000    0.020000      19757.516225
 25.000000    0.020000      18302.531789
 30.000000    0.020000      18104.539596
 35.000000    0.020000      18351.852161
 40.000000    0.020000      18632.762188

Non-Linear Least Squares Iterative Phase    Dependent Variable Y    Method: Gauss-Newton

 Iter          V          TC    Sum of Squares
    0  30.000000    0.010000        271.826876
    1  29.708664    0.010165        251.506284
    2  29.708664    0.010167        251.502020
    3  29.708664    0.010167        251.502020
NOTE: Convergence criterion met.

Non-Linear Least Squares Summary Statistics    Dependent Variable Y

 Source               DF    Sum of Squares     Mean Square
 Regression            2      100317.62019     50158.81009
 Residual             18         251.50202        13.97233
 Uncorrected Total    20      100569.12221
 (Corrected Total)    19       19360.83045

 Parameter      Estimate      Asymptotic       Asymptotic 95% Confidence Interval
                              Std. Error            Lower           Upper
 V           29.70866365   0.54009073751     28.573982614    30.843344687
 TC           0.01016737   0.00015775252      0.009835949     0.010498796
The model then looks as shown in Figure 14-8, where Xc = 10.95 bites/m².
Figure 14-8. Segmented model.
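For readers who prefer a scripting check, here is a hedged Python analogue of the segmented fit (the cow data set is not reproduced in the notes, so the data below are simulated to resemble this example; scipy's curve_fit replaces PROC NLIN):

```python
# Python analogue of the segmented SAS model: BR = v*sqrt(x) below xc, 1/tc above.
import numpy as np
from scipy.optimize import curve_fit

def bite_rate(x, v, tc):
    xc = (1.0 / (v * tc)) ** 2          # density where the shape changes
    return np.where(x > xc, 1.0 / tc, v * np.sqrt(x))

rng = np.random.default_rng(6)
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 25, 50,
              100, 150, 200, 250, 300, 350, 400.0])
y = bite_rate(x, 29.7, 0.0102) + rng.normal(0, 3.0, x.size)

(v_hat, tc_hat), _ = curve_fit(bite_rate, x, y, p0=[30.0, 0.01])
xc_hat = (1.0 / (v_hat * tc_hat)) ** 2
print(v_hat, tc_hat, xc_hat)
```

Note that the model function is continuous in (v, tc) because the two segments meet exactly at xc, which is why the breakpoint does not need to be a third parameter.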
14:5.3
Fitting a nonlinear model with JMP.
I use the same example as above to show how one does it in JMP. The general idea for nonlinear fitting in
JMP is to create a new column that contains a formula for the results of the model equation. In constructing this
formula, one creates, names, and gives initial values to parameters. Then, the new column is used as the X
variable for the nonlinear fit platform. From that point on, the model is controlled from the nonlinear fit window.
14:5.3.1
Creating the “X” column for nonlinear fitting.
In this case the model is simple in the sense that each piece of the overall equation is a simple equation, but
complex in that it is a segmented model: the formula used to calculate the predicted
value depends on the level of the X variable (bite density). Segmented models can be easily specified in JMP
through the use of IF statements and local variables. A local variable is a variable that exists and is calculated
only within a column of JMP. By the way, although one may look upon JMP as the "weak" cousin of SAS, JMP
is quite powerful and fully programmable. If one learns scripting, there is virtually no test that one cannot
perform in JMP.
Before getting into the specification of the formula, consider that the model has an unknown value that
seems to be a parameter: the level of bite density at which the model turns from one equation to the other. As
the reader may have gathered from the SAS solution, it turns out that this is not a third parameter, but just a local
variable, because once v and tc are set, the value of bite density at which the equations change is also set. It is
the value of bite density (xc) at which the result of the first segment equals that of the second. This is the
reason that xc is specified as a Local Variable as opposed to a third parameter. If you want to experiment,
specify xc as a parameter and explore the results. Note that it is possible to represent this model without using
the local variable; I use the local variable to illustrate its use and because it makes the formula more
understandable.
Open xmpl_SegmentNlin.jmp. Create a new column called XforBR. Select the column, and then from the red
triangle to the left of "Columns" select Formula.
On the menu above the names of table columns (circled in red) select "Local Variables" to define a new one.
Click on "New Local…" and type "xc" as the name of the variable. Leave the value blank.
The name xc appears in the list of local variables. Click on it once to enter it in the formula, then choose
"Assignment" in the Functions list and select the "=" sign. To proceed with the formula for xc we need to create
a couple of parameters. On the menu circled above select "Parameters" and click on "New Parameter…"
Name the new parameters v and tc, and give them the initial values obtained earlier by visual and analytical
inspection of the scatter plot.
The local variable assignment is ended with a semicolon, after which the formula for the model itself can be
entered. Use the equation editor to get a formula that looks like the one in the figure.
Once the equation for XforBR is completed, select the Nonlinear Fit platform and enter the observed bite rate
as the Y variable and XforBR as the X variable. Then, click OK. The nonlinear control panel opens up. Click on
Go and JMP will perform the numerical minimization of the loss function, in this case the SSE, which is the
default.
This area of the control panel allows you to control the numerical minimization of the loss function. Unless a
model fails to converge and other solutions do not work, the default values are not changed.

This button places the current values of the estimated parameters in the formula for XforBR.

The CI button calculates the CI for the specified Alpha, and places the CI extremes in the Solution table. The
Goal SSE is used to estimate nonlinear bivariate confidence regions.

The graph shows the fitted line and the scatter of observations. The slider bar for each parameter allows one
to explore the changes in the curve by manually changing the parameter estimates.
The correlation of estimates gives an idea of how much one parameter can "compensate" for variation in
the other. In this particular model, the parameters are completely independent because they apply to different
parts of the curve. Often, parameter estimates have positive or negative correlation. Because of the correlation
and because of the nonlinearity of the model, the joint confidence intervals for the parameters can have shapes
that are irregular. The Nonlinear platform allows one to numerically determine a joint CI for the estimated
parameters as follows. Click on the Confidence Limits button and record the Goal SSE for CL. This is the SSE
that corresponds to an isoline of constant probability equal to 1-Alpha. Then, select SSE grid in the pop-down
menu obtained by clicking on the red triangle to the left of “Nonlinear Fit” at the top.
The SSE grid command creates a table where the ranges and number of points for each parameter can be
specified. This generates a grid of values of both parameters and calculates the SSE for each point in the grid.
Points where the SSE is smaller than the Goal SSE for CL are within, and define, the joint CI for both
parameters.
By clicking on the “GO” button the grid is created and the values for the SSE are calculated. Everything is
put into a new untitled data table that can be used to create a contour or a 3-D graph of the confidence interval.
Use the “Graph” and “Contour Plot” menu to obtain a contour plot of SSE vs. v and tc, as shown below.
In the contour plot window, use the pop-down menu to specify contour values. Make sure that you include a
contour that is equal to the Goal SSE. In this case it was 313.75.
The final contour plot can be modified by removing or changing colors, etc. For this, right-click on the colors
and other graph elements. In this example, it is clear that the parameter estimates have no correlation, because
the CI is perfectly circular. This model probably has well-behaved normal errors, an assumption that can be
verified by the usual means (Distribution… Fit Distribution… Normal… Goodness of Fit).
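The SSE-grid idea can also be sketched outside JMP. The Python sketch below computes SSE over a (v, tc) grid and marks the points under a goal SSE; the F-based goal-SSE formula is a standard choice assumed here (the notes do not spell out how JMP computes it), and the data are simulated because the real data set is not included.

```python
# SSE grid over (v, tc) with an F-based goal SSE marking an approximate
# joint 95% confidence region for the segmented bite-rate model.
import numpy as np
from scipy.stats import f as f_dist

def bite_rate(x, v, tc):
    xc = (1.0 / (v * tc)) ** 2                    # segment join point
    return np.where(x > xc, 1.0 / tc, v * np.sqrt(x))

rng = np.random.default_rng(7)
x = np.array([1, 2, 3, 4, 5, 6, 8, 10, 15, 20, 40, 80, 120, 200, 300.0])
y = bite_rate(x, 29.7, 0.0102) + rng.normal(0, 3.0, x.size)

v_grid = np.linspace(25, 35, 60)
tc_grid = np.linspace(0.008, 0.013, 60)
sse = np.array([[np.sum((y - bite_rate(x, v, tc)) ** 2)
                 for tc in tc_grid] for v in v_grid])
p, n = 2, x.size
goal = sse.min() * (1.0 + p / (n - p) * f_dist.ppf(0.95, p, n - p))
inside = sse <= goal                              # points in the joint region
print(inside.sum(), "of", inside.size, "grid points fall inside the region")
```

A contour of the sse array at the goal value gives the same kind of joint confidence region that the JMP contour plot displays.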