Vegetation Science - UCL Department of Geography

Non-linear Canopy Reflectance Model Inversion
MSc. Remote Sensing
P. Lewis & M. Disney
Remote Sensing Unit,
Dept. Geography
University College London,
26 Bedford way,
London WC1H 0AP
+44-71-387-7050 x 5557
e-mail: plewis@geog.ucl.ac.uk
January 14th 2006
1. INTRODUCTION
So far, we have been considering descriptions of a relationship between canopy
reflectance (or a VI) and some parameterization of the surface (e.g. LAI, chlorophyll
concentration). Such a relationship is generally termed a forward model.
If we wish to attempt to obtain information on the surface parameterization from
observed RS data, we need to invert such a model. Model inversion is therefore of key
importance to the remote sensing problem. In this lecture, we will discuss the issues
involved in attempting an inversion, and have an introduction to the range of methods
available for the task. Many of the issues and options raised are generic in nature and
applicable to a wide range of scientific problems.
1.2 ISSUES IN NON-LINEAR INVERSION
1.2.1. Error Surfaces
Earlier in the course, we considered linear model inversion as part of our study of lumped
parameter modelling. We saw that by phrasing the problem as a minimization of
some error function (e.g. RMSE) we can solve for the inverse of a linear model by matrix
inversion. In this case, the RMSE ‘error surface’ (in N dimensions for N model
parameters) is a quadratic function. It has only a single (unconstrained) minimum. We
can apply (linear) constraints through simple methods such as Lagrange multipliers.
Figure 1. Quadratic error surface of RMSE of a linear model (axes: parameters P0 and P1, vertical axis RMSE)
Figure 2. Constrained quadratic error surface of RMSE of a linear model (axes: parameters P0 and P1, vertical axis RMSE)
1.2.2 Parameter transformations and bounding
More generally, especially when considering inverting physically-based canopy
reflectance models, we do not have an analytical solution of this type, often because there
is no analytical solution to the (non-linear) simultaneous equations obtained from the
partial differentiation. Also, more generally, the error surface is more complex than a
quadratic function – this is easy to visualize if one considers non-linear parameters which
are known to threshold (i.e. the remote sensing signal, and hence the RMSE, is not sensitive to
changes in the parameter beyond a certain point) such as LAI. For high LAI, the error
surface becomes ‘flat’.
One way to get around this problem is by approximately linearising the parameter space.
The following table, taken from Combal et al1 (2002), shows a set of transformations for
a range of optical canopy reflectance model variables. By transforming the physical
variables to these terms, and setting robust limits to the terms, we can attempt a more
reliable inversion, obtain a ‘simpler’ (closer to quadratic) error surface, and (partially)
avoid saturation problems.
Table 1. Biophysical parameter transformations
Figure 3 shows a typical error surface (the model parameters are transformed LAI and
soil brightness) for a non-linear model – the error surface for non-transformed LAI would
show saturation at high values – in the transformed space, the global minimum is more
clearly accentuated.
In fact, provided there is sufficient information in the remote sensing signal (i.e.
sensitivity to the model parameters), we find that in many cases for optical remote
sensing, the error surface is relatively ‘well-behaved’. We will demonstrate the
importance of this point later. Note that the same is often not the case for microwave
signals – this is partially due to sensitivity to a large number of model parameters, and partially due to a lack of information in the received satellite data.
With certain numerical (error) minimization routines, there may also be problems
associated with finding a global minimum, rather than a local one. Situations such as
inverting RS parameters can often be helped by assigning bounds to the domain in which
we search for a solution. When using the Powell algorithm, this is achieved by adding a
gradually-increasing penalty to the error function if we step outside defined bounds. This
does sometimes cause some problems with the inversion getting trapped on a boundary,
especially if the penalty increase is too sharp on the boundary.
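As a minimal sketch of the kind of gradually increasing penalty described above: the forward model `model(params)`, the observations `obs` and the bound arrays below are hypothetical placeholders, and the quadratic growth rate is an arbitrary illustrative choice.

```python
import numpy as np

def penalised_rmse(params, obs, model, lower, upper, penalty_scale=10.0):
    """RMSE between observations and a forward model, plus a gradually
    increasing penalty for any parameter outside [lower, upper].
    `model`, `obs` and the bound arrays are placeholders to be supplied."""
    params = np.asarray(params, dtype=float)
    residuals = np.asarray(obs) - model(params)
    rmse = np.sqrt(np.mean(residuals ** 2))
    # distance outside the bounds (zero for parameters inside them)
    overshoot = (np.maximum(0.0, np.asarray(lower) - params)
                 + np.maximum(0.0, params - np.asarray(upper)))
    # quadratic growth rather than a hard wall, to reduce the chance of the
    # search getting trapped exactly on a boundary
    return rmse + penalty_scale * np.sum(overshoot ** 2)
```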
1. B. Combal, F. Baret, M. Weiss, A. Trubuil, D. Macé, A. Pragnère, R. Myneni, Y. Knyazikhin and L. Wang (2002), Retrieval of canopy biophysical variables from bidirectional reflectance: using prior information to solve the ill-posed inverse problem, Remote Sensing of Environment, 84, 1-15.
Figure 3. Error surface from non-linear model
1.2.3 Weighting of the Error Function
Another issue to be considered at this point is how to phrase the error function. In the
Method of Least Squares, one is supposed to weight each residual by the uncertainty
associated with the observations. In this way, the RMSE is similar to a Z-score, being a normalized measure of the agreement between the observations and the model values.
In the previous lecture, we weighted all samples equally. This is equivalent to assuming
that the error (noise) associated with each measurement is equal. This means that, for
example, if we were trying to invert chlorophyll concentration from visible and near
infrared hyperspectral data, we would be weighting the residuals in the near infrared
(where there is no sensitivity to the required parameter) as strongly as those at visible
wavelengths.
If the aim of the inversion were a robust estimation of a single parameter, we could choose
instead to weight the residuals by the sensitivity of the signal to that parameter (this
applies to angular as well as spectral sampling). We might choose a partial derivative as
the weighting term – the higher the rate of change of reflectance with respect to the
parameter of interest, the higher the weighting, so, instead of using:
iN
RMSE 
 
i 1
measured
i    modelled i 2
(1)
iN
1
i 1
we could use:
RMSE = sqrt{ [ Σ_{i=1..N} ( (∂ρ/∂P) ( ρ_measured(i) - ρ_modelled(i) ) )^2 ] / [ Σ_{i=1..N} (∂ρ/∂P)^2 ] }        (2)
Such an approach is suggested by Privette (1994)2. There are clearly issues with the
implementation of such an approach – principally: (i) the derivative will vary as a
function of the model parameters for a non-linear model (so some ‘generally appropriate’
mean value must be chosen); and (ii) weighting for multiple parameters is somewhat
arbitrary (so one has to decide on a compromise weighting scheme, or invert multiple
times for each model parameter). As a partial compromise, one most typically sees a
‘relative RMSE’ used in remote sensing applications:
iN
RMSE 
 
i 1
measured
i    modelled i 2
iN
 
i 1
i 
(3)
2
measured
rather than the ‘absolute’ RMSE shown in equation 1.
1.2.4 Using Additional Information
Whatever method we use for inversion, and however we phrase the error function, the
potential for problems exists if any model parameters are coupled (i.e. a change in one
parameter can be compensated for by change in another parameter without affecting the
solution at all). One way to get around this is to use additional information in the
inversion problem. The most typical example of this, for vegetation science applications,
is to use some model of the expected temporal dynamics of a vegetation canopy in
inversion. Moulin et al. (1998)3 review methods for linking crop growth models to the
2. Privette, J. L. (1994), An efficient strategy for the inversion of bidirectional reflectance models with satellite remote sensing data, Ph.D. thesis, University of Colorado.
3. S. Moulin, A. Bondeau and R. Delécolle (1998), Combining agricultural crop models and satellite observations: from field to regional scales, International Journal of Remote Sensing, 19(6), 1021-1036.
remote sensing inversion problem. Methods generally rely on using a priori information
on the expected crop type (e.g. obtained from a land cover classification) and ‘growing’
the crop model to provide an expectation of canopy state variables. The inversion
problem is then much more limited in scope, and one can apply expectations e.g. of
coupling between clumping, LAI, and leaf angle distributions for a given crop type. Crop
or other vegetation growth models, however, tend to require a significant number of
‘internal’ parameters describing how a particular variety will grow. Most models are
based on considerations of ‘integrated thermal time’ (sum of degree days since planting),
so knowledge of the local daily mean temperature is a primary requirement for running
such models. In addition, they may have limits applied to the light environment (so that
the models can be used at different latitudes). Some models consider water and nutrient
limiting factors, although there is much disagreement between models as to the impacts
of these. A key use then, of a canopy growth model, is to provide expectation of growth
under non-limiting conditions – any large disagreement between model and measure may
then be used to identify the impact of some additional limit (water, nutrients etc.).
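As a simple illustration of the 'integrated thermal time' driver mentioned above, the sketch below sums degree-days from a hypothetical series of daily mean temperatures; the base temperature is a crop-specific assumption and the numbers are purely illustrative.

```python
import numpy as np

def thermal_time(daily_mean_temp, base_temp=0.0):
    """Sum of degree-days since planting: days with a mean temperature below
    the (crop-specific) base temperature contribute nothing."""
    t = np.asarray(daily_mean_temp, dtype=float)
    return float(np.sum(np.maximum(0.0, t - base_temp)))

# e.g. thermal_time([12.0, 15.5, 8.0, 3.0], base_temp=5.0) -> 20.5
```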
Figure 4. Crop mosaic within a single moderate resolution pixel (patches of LAI 0, LAI 1 and LAI 4)
1.2.5 Scaling
The models we build of canopy reflectance or scattering will typically be defined
assuming operation over a single cover type – indeed, many models assume that the
canopy is of infinite horizontal extent. In reality, this may not be the case, particularly for
moderate or coarse resolution sensors. The advantage of optical moderate resolution
sensors for monitoring crop dynamics is the high temporal coverage that can be achieved,
even taking into account data lost due to cloud cover (which may be typically as high as
75% loss in Northern Europe). The main disadvantage is that pixel resolutions of 100s of
metres to several km will tend to observe more than one field at a time, particularly given
North European field sizes of typically a few hectares. High spatial resolution optical data are
available relatively rarely for cloudy environments, and cannot be guaranteed to provide
suitable temporal monitoring for crops. Current microwave sensors tend to have a much
lower information content, but higher spatial and temporal resolution.
If we wish to invert canopy parameters from moderate resolution sensors then, we must
be aware of (or ideally take account of) scaling effects. To gain an insight into scaling
effects, consider a mosaic of crop types viewed within a single moderate resolution pixel
(figure 4). The area comprises 20% of LAI 0, 40% LAI 4 and 40% LAI 1. The 'real' (area-weighted mean) value of LAI is thus: 0.2x0 + 0.4x4 + 0.4x1 = 2.0. We assume a model of reflectance as:
ρ = ρ_v (1 - exp(-LAI/2)) + ρ_s exp(-LAI/2)        (4)
We now consider two typical cases for leaf and soil properties:
visible:  ρ_v = 0.2;  ρ_s = 0.1
NIR:      ρ_v = 0.9;  ρ_s = 0.3
[Chart: modelled visible and NIR canopy reflectance (equation 4) plotted as a function of LAI.]
The area-averaged canopy reflectance over the pixel is then 0.15 in the visible and 0.60 in the NIR. If we assume the model in equation 4 to hold, this equates to an LAI of 1.4. The
‘inversion’ underestimates the ‘correct’ (scaled) answer of 2.0. In addition, different
results may occur for different wavebands (they do not here as we have only a single
scattering model). It is worth noting that some parameters will scale more linearly than
others – examples of those that do are albedo and canopy cover. Effectively any
parameter we might model with a linear model scales approximately linearly (other than
through multiple scattering effects with the atmosphere and potentially between cover
types). Any parameter we see in a non-linear sense in a model (e.g. exp(-LAI)) is unlikely
to scale linearly.
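The scaling example above can be reproduced in a few lines; the sketch below uses the NIR leaf and soil values quoted in the text with the simple model of equation 4, and the function and variable names are illustrative only.

```python
import numpy as np

def canopy_reflectance(lai, rho_leaf, rho_soil):
    """Equation 4: simple mix of vegetation and soil reflectance."""
    return rho_leaf * (1.0 - np.exp(-lai / 2.0)) + rho_soil * np.exp(-lai / 2.0)

def invert_lai(rho, rho_leaf, rho_soil):
    """Analytical inverse of equation 4 for LAI."""
    return -2.0 * np.log((rho_leaf - rho) / (rho_leaf - rho_soil))

fractions = np.array([0.2, 0.4, 0.4])      # proportions of the pixel
lai = np.array([0.0, 4.0, 1.0])            # LAI of each patch
true_lai = np.sum(fractions * lai)         # 'real' area-weighted LAI = 2.0

# reflectance aggregates (roughly) linearly over the pixel, LAI does not:
rho_pixel = np.sum(fractions * canopy_reflectance(lai, 0.9, 0.3))   # NIR case
print(true_lai, rho_pixel, invert_lai(rho_pixel, 0.9, 0.3))   # 2.0, ~0.60, ~1.4
```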
Figure 5. Contour plot of relative difference in LAI derived from unadjusted LAI
retrieval algorithm as a function of spatial resolution and pixel heterogeneity
(dominant land cover purity): (a) Grasses and Cereal Crops, (b) Shrubs, (c) Broadleaf
Crops, (d) Savannas, (e) Broadleaf Forests, and (f) Needle Forests.
Tian et al. (2002)4 examined a theoretical basis for studying and understanding scaling of
LAI. They consider a ‘pure’ 1 km resolution dataset composed of 6 biomes (plus bare
4. Yuhong Tian, Yujie Wang, Yu Zhang, Yuri Knyazikhin, Jan Bogaert and Ranga B. Myneni (2002), Radiative transfer based scaling of LAI retrievals from reflectance data of different resolutions, Remote Sensing of Environment, 84, 143-159.
ground) and characterize the mixing of these biomes at different spatial resolutions
through a ‘purity index’ (mean of the cover type percentages). They then group pixels
into three classes: ‘pure’ (group 1) (purity index > 90%), ‘mixed’ (group 2) (50% to
90%), and ‘heterogeneous’ (group 3) (<50%). A measure of heterogeneity is considered
as the purity index of the dominant land cover type. They then define an ‘error’ in LAI
through a relative difference between the ‘real’ and ‘inferred’ LAI values. This is shown
in figure 5, for different dominant land cover types. We can see that, particularly for
needle-leaf forests, when the 'purity index' is low, we can incur errors of up to 40% in
LAI due simply to scaling. A dependency on spatial resolution is shown, but the
dominant effect can be attributed to ‘pixel purity’.
1.2.6 Options for Numerical Inversion
In the general case, we may have a complex, non-quadratic error surface in many
dimensions. We could potentially go through all combinations of parameters to find
which set gives a minimum in the error value, but this may take far too long. However, if the error surface is known to be well-behaved, we may choose to sample the complete
parameter space, using a so-called Look-up Table (LUT) inversion. This has been seen
to be an attractive option for the inversion of optical models in recent years.
Often, we tend to use iterative numerical techniques to attempt to obtain the minimum
of our error function. There are a variety of methods we can use, e.g. Quasi-Newton
methods (an algorithm found in NAG which is used by a number of authors), or the
Powell algorithm (an algorithm found in Numerical Recipes in C, also used by many
authors). Although such techniques are generally quite complex in nature, they are a
valuable tool for the relatively rapid inversion of non-analytical functions. Basic aspects
of these are discussed in greater detail below. Our personal experience is that the Powell
algorithm is relatively robust, and tends to perform better than the NAG routine, but this
may be case specific. Some improvements may be possible if we can find analytical
solutions to the 1st and 2nd order partial derivatives, but most authors seem not to bother
with this; in any case we often find that other difficulties also arise, especially in highly
non-linear systems of equations.
Other options for numerical inversion, which are reviewed below, include: Knowledge-based Systems (KBS); Artificial Neural Networks (ANNs); and Genetic Algorithms
(GAs). Selecting the right inversion algorithm for a particular problem is of critical
importance in parameter estimation – one must weigh up the advantages and
disadvantages of each method for a particular task.
1.3 How can we assess the inversion performance?
Generally, in order to assess the performance of a numerical inversion algorithm, we look
at the 'goodness-of-fit' measure associated with the inversion (e.g. root-mean-squared
error (RMSE)). However, we should understand that this is a global average of the error
function, and doesn't tell us anything about how well the model fits the data at any
particular sample location. This may well be critical if the model is particularly sensitive
to errors at certain sample locations (e.g. parameters describing the hot-spot will tend
only to be relevant in the region of the hot-spot). Obtaining an understanding of how
critical particular angular samples are to the inversion is obviously very important if we
wish to derive accurate estimates of the surface parameterization from RS data.
A relevant general comment at this point is that the more parameters we define as
unknowns in our model, the more difficult it will be to derive accurate estimates of each
parameter. This is because there will always be some degree of coupling between model
parameters, such that errors in the estimation of any particular parameter can often be
compensated for by errors introduced in other model parameters. Thus, there will always
be a trade-off between the accuracy with which we can hope to model canopy BRDF with
a limited number of parameters, and the maximum number of parameters we can expect
to derive from our RS observations.
There are several other ways in which we can attempt to assess the performance of a
model and the quality of parameters we invert from it. One way is to look at the
implications of using a particular angular sampling set in our inversion, for instance,
looking at how the model parameters vary as we have a greater or lesser number of
samples at different solar zenith angles, as well as particular viewing angles. For any
given sensor, at a given time of year, we will obtain a limited number of angular viewing
samples. For many sensors, the range of sun angles of these observations will also be
limited. We should therefore attempt to understand how well the different sampling
capabilities of various sensors affect the parameters we obtain. It is apparent from many
inversion studies, for example, that inverting the directional reflectance data for different
sun angles can lead to significant variations in the derived model parameters.
The 'quality' of any inversion depends of course on the purpose of the inversion. If it is
to derive biometric parameters from RS data, then we need to consider how accurately
such parameters are derived. If however, the purpose is to use BRDF models to obtain
estimates of the surface albedo, then we might indeed use a different set of criteria in
assessing model performance. Goel (1988) discusses why it is important to understand
the sensitivity of a model. This has many implications for model inversion, as, for
instance, if our model was particularly sensitive to errors in knowledge of the viewing or
sun angles, then any errors introduced in assigning these 'fixed' parameters would lead to
significant errors in model inversion. We must also consider how sensitive a model is to
any (residual) 'noise' in the system, i.e. any noise due to quantization effects or errors in
atmospheric correction, any errors introduced by not taking the sensor MTF into account,
etc.
1.4 How can we test the quality of the derived
parameters?
There are several ways we might consider to assess how well the model is performing in
inverting derived parameters.
One method is to use field radiometric data, where we can attempt to measure
biophysical parameters of the canopy directly (e.g. LAI, leaf angle distribution) and, at
the same time, obtain measurements of canopy reflectance. We can then invert the
canopy reflectance measurements against our model and compare the resulting
parameters with those we measured. There are several problems with such techniques,
such as the accuracy with which we can hope to measure the average leaf angle
distribution of a complex canopy, or determine the 'average leaf single-scattering albedo',
but such methods can at least give an indication of how well a model performs under
realistic conditions.
One can also try such methods of comparison with airborne or spaceborne remote
sensing instruments, although the problems then tend to be compounded by increased
uncertainties in atmospheric correction, pointing accuracy, etc., as well as being limited
by the sampling capabilities of the sensor, as well as the smaller scale over which one has
to obtain ground-based biometric measurements.
One can also test the performance of any particular model against other models, though
one is obviously limited by the assumptions made in the original model formulation. It is
perhaps more useful in this regard to make use of numerical solutions to the radiative
transfer equation, and to use complex 3-D models of the vegetation canopy, because of
the flexibility offered by such methods. When using numerical solutions to the radiative
transfer equation, one can also attempt to test how well the particular assumptions made
in any analytical approximation perform at a number of different levels, e.g. by
simulating a canopy with constant LAI, known leaf angle distribution, and known single-scattering albedo (i.e. assume a Lambertian reflectance model). This can then provide us
with information on how well the structural and radiometric assumptions made in the
analytical formulation perform.
We also have the potential with numerical solutions to derive information on the various
components of the radiation exitant from the canopy, e.g. that due to single-scattering
events and that due to multiple-scattering events. In this way, we can directly compare
such data with the analytical solutions to single- and multiple-scattering.
The main problem with comparing one model against another is that we are testing it
only for the range of assumptions made in the more complex of the two. Thus, we
need to make use of recent developments in modelling the complex geometry of plant
canopies to be sure that our model performs well under realistic conditions.
In practice, we might often be using simple models which e.g. assume that the canopy is
homogeneous, even if we know that this is unlikely to be strictly so. In this case, we need
to know the utility of any parameters we might derive by such methods.
1.5 Conclusions
You should have gained some insight from this section into the basic principles of
inversion techniques, and be able to appreciate the difference between analytical and
numerical solutions. You should also have gained some knowledge of the issues and
problems associated with inversion, which should allow you to assess the way in which
model performance tends to be tested in the literature, as well as enabling you to perform
useful tests of model performance and accuracy. You will no doubt have noted that there
are many such problems, and that we do not necessarily have answers to all of them as
yet.
You should back up what you have learnt in this lecture and the following practical
exercise with further reading. A useful starting point is Goel (1988), although you will
tend to find some discussion on model inversion in many of the papers you will read on
model development.
2. Numerical Iterative non-linear inversion techniques
2.1 Introduction
In most cases of non-linear inversion, we need to look for numerical solutions to the
problem of finding the minima of our defined error functions. Computationally, these
need to be as efficient and robust as possible i.e. using the optimum amount of
information and iterations. There is no 'best' solution - be aware that different techniques
can be used in an attempt to find the optimal solution.
In inverting canopy reflectance models against reflectance data, we are usually interested
in constrained optimisation - a special case of the general optimisation problem where
some variables may be fixed, or lie between fixed limits. Figure 6 shows some of the
difficulties this may pose in the 1-D case (from Numerical Recipes in C, Press et al., p
395).
Figure 6 Local and Global Minima
In figure 6 - points A, C and E are local, but not global maxima. Points B and F are local,
but not global minima. The global maximum occurs at G, which is on the boundary of the
interval so that the derivative of the function need not vanish there. At E, derivatives
higher than the first vanish, which can cause difficulties for some algorithms. The points
X, Y and Z are said to bracket the point F since f(Y) is less than both f(X) and f(Z), i.e. Y is a "first guess" at a minimum in this region. It is important to note that in the general case of
N-dimensional optimisation (particularly if not using derivatives), the solution is reduced
to a series of N (iteratively improving) minimisations in 1-D. As a result, nearly all the
algorithms we will see can be reduced to the problem of finding the extrema of a 1-D
function.
How do we choose which methods we use? We need to decide between methods that
need only to evaluate the function, against those which calculate (partial) derivatives.
Derivative methods may often be more powerful in terms of finding global minima
(assuming functions are well-behaved and differentiable). Derivative methods such as the
conjugate gradient algorithms of Fletcher-Reeves and Polak-Ribiere, and the well-known
quasi-Newton algorithms of Davidon-Fletcher-Powell (DFP) and Broyden-Fletcher-Goldfarb-Shanno (BFGS) are widely used in practice due to their robustness. Derivative
methods however require many more function evaluations than is desirable for rapid
inversion and also require a priori knowledge of the behaviour of the function. Because
of this, we will concentrate on methods not requiring calculation of derivatives. Another
computational issue is that of storage: we need to decide between methods requiring
storage of order N (N is the number of dimensions of the minimisation problem), and
order N^2.
2.2 Golden Section Search
This method is one of the simplest for determining extrema, and describes the successive
bracketing of a minimum of a function f to provide an improving 'guess' at the 'real' value
of the minimum.
Figure 7 Golden Section Search
Figure 7 (see Press et al., p. 398) shows successive bracketing of a minimum. The minimum is originally bracketed by points 1, 3 and 2. The function is evaluated at 4, which replaces 2; then at 5, which replaces 1; then at 6, which replaces 4. The rule at each stage is to keep a center point that is lower than the two outside points. After the steps shown, the minimum is bracketed by points 5, 3 and 6.
A minimum is bracketed when there is a triplet of points a < b < c (or c < b < a) such that f(b) is less than both f(a) and f(c). In figure 7, points 1, 3 and 2 equate to points a, b and c, i.e. f(3) is less than both f(1) and f(2). We can successively improve this bracketing by performing the following steps:
 Evaluate f at some new point x, either between 1 and 3, or between 3 and 2.
 Choosing a point between 3 and 2, say at 4, we now evaluate f(4).
 f(3) < f(4), so the new bracketing triplet of points is 1, 3 and 4.
 Choose another point, at 5 say (between 1 and 3), and evaluate f(5).
 f(3) < f(5), so the new bracketing triplet is 5, 3 and 4.
In each case, the middle point of the new triplet is the best minimum achieved so far.
This process is repeated until the distance between the two outer points becomes
sufficiently small (i.e. approaches some set limit) - where sufficiently small means
approximately the square root of machine floating point precision.
An important question you should be asking is "How do we choose the new point x each
time?".
Figure 8 Golden Ratio
If we say that b is some fraction w of the way between a and c, then w = (b - a)/(c - a) and, consequently, 1 - w = (c - b)/(c - a). If the new point, x, is some distance z beyond b, then z = (x - b)/(c - a). The next bracketing segment will therefore either be of length w + z relative to the current one, or of length 1 - w. To minimise the worst case, we can choose z to make these equal, i.e. z = 1 - 2w. This gives us the symmetric point to b in the interval a to c, i.e. |b - a| = |x - c|. This implies that x lies in the larger of the two segments (z > 0 only if w < 1/2).
How does the value of w get chosen? It must come presumably from the previous stage of applying the same strategy, so if z is chosen to be optimal, then so was w. This scale similarity implies that z should be the same fraction of the way from b to c (if that is the bigger segment) as was b from a to c, i.e. w = z/(1 - w). We can use these two equations in w and z to form the quadratic equation w^2 - 3w + 1 = 0, yielding w ~ 0.38197 and 1 - w ~ 0.61803. Therefore the optimal bracketing interval (a, b, c) has its middle point b a fractional distance 0.38197 from one end, and 0.61803 from the other. These are the so-called golden mean fractions.
In each step of our golden section search therefore, we select a point a fraction 0.38197
into the larger of the two intervals (from the central point of the triplet). An additional
feature of this method is that if our initial triplet has segments not in the golden ratios, the
procedure outlined above will quickly converge to the proper self-replicating ratios.
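A minimal sketch of the golden section search described above is given below; it is illustrative only (a production implementation such as that in Numerical Recipes adds more careful convergence and bracketing checks), and the tolerance is an arbitrary absolute value rather than the machine-precision-based limit discussed earlier.

```python
import math

GOLD = (3.0 - math.sqrt(5.0)) / 2.0   # ~0.38197, from w^2 - 3w + 1 = 0

def golden_section(f, a, c, tol=1e-6):
    """Return an x that approximately minimises f on the interval [a, c]."""
    b = a + GOLD * (c - a)            # initial interior point
    fb = f(b)
    while abs(c - a) > tol:
        # place the new point a fraction GOLD into the larger segment
        if abs(c - b) > abs(b - a):
            x = b + GOLD * (c - b)
        else:
            x = b - GOLD * (b - a)
        fx = f(x)
        if fx < fb:
            # x becomes the new middle point; the old middle bounds one side
            if x > b:
                a = b
            else:
                c = b
            b, fb = x, fx
        else:
            # keep b as the middle point; x becomes a new outer point
            if x > b:
                c = x
            else:
                a = x
    return b

# e.g. golden_section(lambda x: (x - 1.3) ** 2, 0.0, 3.0) returns ~1.3
```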
2.3 Parabolic Interpolation and Brent's Method in One
Dimension
The golden section search is a method designed to handle the least favourable of function
minimisations: no assumptions are made about the shape of the function in the region of
the minimum, and the minimum is bracketed with greater and greater precision with each
iteration. However, if the function is smooth near the minimum, which is likely for most
'well-behaved' functions, then a parabola fitted through any three points near to the
minimum should take us very close to the actual minimum in one step. Figure 9
illustrates this case.
Figure 9 Convergence by inverse parabolic interpolation (see Press et al., p 403)
This method is known as inverse parabolic interpolation, as we want to find a point x at
which our function y=f(x) is a minimum (not just a value of y). The formula for the
abscissa x that is the minimum of a parabola through three points f(a), f(b) and f(c) is:
x = b - (1/2) * [ (b - a)^2 (f(b) - f(c)) - (b - c)^2 (f(b) - f(a)) ] / [ (b - a) (f(b) - f(c)) - (b - c) (f(b) - f(a)) ]
This equation fails only if the three points are collinear. However, it is just as likely to
jump to a maximum as a minimum, and somehow this tendency has to be overcome.
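A sketch of a single inverse parabolic interpolation step using the formula above is given below; note that it has no protection against (near-)collinear points or against landing on a maximum, which is exactly why it must be embedded in a safeguarded scheme such as Brent's method.

```python
def parabolic_step(a, b, c, fa, fb, fc):
    """Abscissa of the vertex of the parabola through (a, fa), (b, fb), (c, fc)."""
    num = (b - a) ** 2 * (fb - fc) - (b - c) ** 2 * (fb - fa)
    den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
    return b - 0.5 * num / den

# e.g. for f(x) = (x - 2)**2, any three distinct points return x = 2 exactly:
# parabolic_step(0.0, 1.0, 3.0, 4.0, 1.0, 1.0) == 2.0
```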
A technique likely to be both stable and efficient is one which will use a slow-but-sure
method such as the golden section search in unfavourable areas, but can also switch to
parabolic interpolation when it comes sufficiently close to a minimum. In this way, we
can be sure we are approaching a minimum rather than a maximum (or vice versa) before
switching to the more efficient parabolic method. The main difficulties of this approach
are in finding a quick, robust scheme for deciding which of the two techniques is likely to
be favourable, in addition to being careful not to get into problems when we are very near
to a solution and the function is being evaluated close to the machine precision mentioned
earlier.
Brent's method is such a scheme. At each stage, it keeps track of six function points, a, b,
u, v, w, and x. The minimum is bracketed between a and b; x is the point with the least
function value found so far (or the most recent in the case of a tie); w is the point with the
second least function value; v is the previous value of w; u is the point at which the
function was evaluated most recently.
Parabolic interpolation is tried through the points x, v, and w. To be accepted as a
successful move, the parabolic step must fall between a and b and imply a movement
from the best current value, x, that is less than half the movement of the step before last.
This second criterion ensures that the parabolic steps are actually converging to
something. In the worst possible case, where the parabolic steps are acceptable but
useless, the method will approximately alternate between parabolic steps and golden
sections, converging in due course by virtue of the latter. Press et al. explain the reason
for choosing the step before last (as opposed to the previous step) as purely heuristic i.e.
through experience, it seems better not to 'punish' the algorithm for taking one bad step if
it can make it up on the next one.
2.4 Multidimensional minimisation methods
The minimisation problem becomes far more complex in the multidimensional case,
which (not surprisingly!) is of most interest to us. Non-linear canopy reflectance models
are likely to be functions of many parameters (the Kuusk model, for example, has 15
separate parameters). The next section describes some methods that we can use in these
cases. Excellent starting points for material regarding optimisation problems, covering a huge range of optimisation techniques and the associated literature, are
http://solon.cma.univie.ac.at/%7Eneum/glopt.html
and
http://www.csc.fi/math_topics/Movies/Opt.html. They include some excellent visualisations and animations of optimisation problems.
2.4.1 Downhill Simplex Method (aka amoeba or Nelder-Mead)
This is the only multidimensional minimisation method we will come across here which
does not make use of a 1-D minimisation algorithm like the ones described above. The
downhill simplex method (due to Nelder and Mead), is a method requiring only function
evaluations, not derivatives. It is a relatively simple method, and although not particularly
efficient (in comparison to Powell, which we will come across later), it can often be the
most robust.
A simplex is the geometrical figure consisting, in N dimensions, of N+1 vertices and all
their interconnecting line segments, polygonal faces etc. In 2-D this is a triangle, in 3-D a
tetrahedron. If any point of the simplex is taken as the origin, then the other N points
define vector directions that span the N-dimensional vector space. There is no analogous
bracketing method (as we saw in 1-D) in multidimensions. All we can do is give the
algorithm a starting guess i.e. an N-vector of independent variables to try as the first
point. The algorithm is then supposed to make its own way "downhill" (in function
space) until it finds a minimum. Figure 10 illustrates this idea.
Figure 10 - possible outcomes for a step in the downhill simplex method
(Press et al., p 409).
If we consider the starting point, in N dimensions we must define N+1 points, which
define the initial simplex. If we define one of these points as P_0, then the other N points are simply:
P_i = P_0 + k e_i
where the e_i are N unit vectors, and where k is a constant which is our guess of the
problem's characteristic length scale. We proceed by taking a series of steps away from
the starting point, and seeing if these steps take us to a more favourable position (closer
to a minimum). The steps available are (a) a reflection away from the high point (b) a
reflection and expansion away from the high point (c) a contraction along one dimension
from the high point or (d) a contraction along all dimensions towards the low point.
When it can, the algorithm expands the simplex in one or other of the directions, to take
larger steps. When it reaches a "valley floor" (think of a long, steep-sided valley sloping
away from us), it contracts itself in the transverse (cross-valley) direction in order to ooze
down the valley. If these steps are not available, the simplex can contract around its
lowest (best) point.
How do we know when we have reached a minimum? This can be a difficult question in
any multidimensional minimisation algorithm. The sensible thing to do is to see whether
one of two criteria is satisfied: (i) stop when the vector distance moved in one step is less than some tolerance, or (ii) stop when the decrease in the function value in one step is fractionally smaller than some defined tolerance.
A final warning is that both these termination criteria can be fooled by a single,
anomalous, step which doesn't go anywhere. To avoid this, it is a good idea to restart any multidimensional minimisation algorithm from the point where it claims to have found a minimum. If it "really" has, this will not be a costly move; if it hasn't, the restart will carry the search on towards a better minimum.
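As an illustration, the sketch below runs a downhill simplex (Nelder-Mead) inversion using the SciPy implementation on a toy two-band version of equation 4; the 'observations' are synthetic and the model is a stand-in, not a real canopy reflectance model.

```python
import numpy as np
from scipy.optimize import minimize

def forward_model(params):
    """Toy two-band version of equation 4: params are (LAI, soil brightness)."""
    lai, soil = params
    leaf = np.array([0.2, 0.9])                 # visible, NIR leaf reflectance
    return leaf * (1 - np.exp(-lai / 2)) + soil * np.exp(-lai / 2)

observed = forward_model([2.0, 0.15])           # synthetic 'observations'

def error_function(params):
    return np.sqrt(np.mean((observed - forward_model(params)) ** 2))

result = minimize(error_function, x0=[1.0, 0.3], method='Nelder-Mead')
print(result.x)   # should recover approximately LAI = 2.0, soil = 0.15
# It is worth restarting from result.x, as suggested above, to check that the
# same minimum is found again.
```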
2.4.2 Direction Set (Powell's) Method
Another technique for multidimensional minimisation is Powell's conjugate set method.
This is another method not requiring the calculation of gradients; however, this time it is a
method which utilises the 1-D line minimisation techniques we saw earlier. If we start at
a point P in N-dimensional space, and proceed from there in some vector direction n,
then any function of N variables f(P) can be minimised along the line n by using the 1-D
methods. Different methods differ only in the way they choose the next direction, n, to
try.
Let us look at the simplest direction set method. If we take the unit vectors e1, e2,....eN as
a set of directions, we should be able to cycle through the directions, using our 1-D line
minimisation methods to minimise along each direction in turn, before going on to the
next. This is a good method in some cases, but figure 11 shows where this method
struggles.
Figure 11 (see Press et al. p. 414)
In the case of a long, narrow valley, not aligned with our chosen axes (X and Y in this
case), this method has to follow our base axes, and so must take many small steps to
crawl towards the minimum. In this case, a more sensible choice of directions is needed.
All direction set methods consist of recipes for updating the set of directions as the
method progresses, attempting to come up with a set which either (i) includes some very
good directions that will take us fast along narrow valleys (e.g. at 45° to X and Y in figure 11), or else (ii) includes some number of "non-interfering" directions with the
special property that minimisation along one is not "spoiled" by subsequent minimisation
along another, so that we do not have to continually cycle through all directions.
Powell's method does just this. These "non-interfering" directions are more formally
defined as conjugate directions, and Powell's method produces N mutually conjugate
directions along which to minimise. First, we initialise a set of directions u_i = e_i, where i = 1,...,N. Next we repeat the following steps:
 Save the starting position as P_0.
 For i = 1,...,N, move P_{i-1} to the minimum along direction u_i and call this point P_i.
 For i = 1,...,N-1, set u_i = u_{i+1}.
 Set u_N = P_N - P_0.
 Move P_N to the minimum along direction u_N and call this point P_0.
This method is illustrated in figure 12.
Figure 12 - Powell's conjugate direction set method.
The first steps move to a new point, P, and line minimisation continues from P along a new set of directions x' and y, where x' is the resultant direction after our first step. In this case, it is clear that ideally we should be minimising along the axes x_ideal and y_ideal, but even after one step our new set of directions is far closer to these than the original set, which is promising.
It can be demonstrated that N iterations of the basic procedure i.e. N(N+1) line
minimisations will exactly minimise a quadratic form. There is, however, one major
drawback to Powell as stated - the procedure of throwing away, at each stage, u_1 in favour of P_N - P_0 tends to produce sets of directions that "fold up on each other", i.e. they all start to point in the same way! If this happens, then the algorithm converges only to a minimum of the function over some subspace of the full N-dimensional case, giving the wrong answer. To avoid this, we can try a number of things:
 Reinitialise the set of directions u_i to the basis vectors e_i after every N or N+1 iterations. This is OK if it is important to retain the quadratic convergence property, which may be the case if our function is close to quadratic.
 Brent states that the direction set can just as easily be reset to the columns of any orthogonal matrix. Rather than throw away the information on conjugate directions already built up, he suggests reconfiguring the direction set to the calculated principal directions of this matrix.
 Powell himself suggested that we might sacrifice the quadratic convergence property in favour of a more heuristic scheme which attempts to find a few "good" directions along narrow valleys instead of N necessarily conjugate directions.
In practice, any of these adjustments to the basic Powell method may be used, with each
having slight differences in performance under different situations.
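For comparison, SciPy also provides an implementation of Powell's direction set method; the sketch below applies it to a standard narrow-valley test function (a Rosenbrock-style surface) of the kind illustrated in figure 11, and is illustrative only.

```python
from scipy.optimize import minimize

def valley(p):
    """A long, narrow, curved valley not aligned with the coordinate axes."""
    x, y = p
    return 100.0 * (y - x ** 2) ** 2 + (1.0 - x) ** 2

result = minimize(valley, x0=[-1.2, 1.0], method='Powell')
print(result.x)   # should end up near the minimum at [1.0, 1.0]
```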
2.4.3 Simulated Annealing
We will now briefly look at a few completely different and novel minimisation
algorithms that move away from the idea of N-dimensional minimisation as a
combination of a few relatively simple techniques such as line minimisation in 1-D and
derivative methods. The first of these is simulated annealing. The minimisation
algorithms we have looked at up to now all roughly follow the same principles - have a
first guess at a solution, minimise in some linear direction, and keep going downhill (in
function space) until we reach a minimum. These methods are all very well, but their
common flaw is that they immediately head for the fastest downhill route, and will often
find a local minimum, as opposed to a global one - if we want to avoid this, we generally
have to restart the whole thing from another location and see if it converges to the same
solution. In a situation where the function has only a few distinct minima this may be
sufficient, but in the case where the function has many small, poorly defined minima, this
may be of little use. Figure 13 illustrates this problem.
Figure 13 - schematic representation of a function with several local minima, A, B, C,
and D, and a global minimum at E.
Simulated annealing is so named because it has a direct thermodynamic analogy with the
annealing, or slow cooling, of metals, or the crystallisation of liquids. At high
temperatures, the molecules of a liquid move freely with respect to each other. If the
liquid is cooled slowly, thermal mobility is lost, and the atoms are often able to line
themselves up and form a pure crystal that is completely ordered over a distance of up to
billions of times the size of an individual atom in all directions. This crystalline state is
the state of minimum energy of the system. For slowly cooled systems, Nature is able to
find this state every time. If a liquid metal is quenched, or cooled quickly, it ends up in
some polycrystalline or amorphous state having much higher energy (this situation is
partly analogous to the other minimisation processes we have considered). How is Nature
able to solve such an incredibly complex optimisation problem, seemingly with no effort,
when we are struggling to solve problems requiring minimsation of only a few
parameters?
The basic process is one of slow cooling, allowing time for the atoms to redistribute
themselves as they lose mobility (energy). The annealing process is based on the very
important Boltzmann probability distribution
Prob(E) ~ exp(-E/kT)
where k is a constant relating temperature to energy. This equation simply states that a
system in thermal equilibrium at temperature T has its energy probabilistically distributed
among all different energy states E. That is, all energy states are possible, but some are
far more likely than others. Even at low T there is a small probability that the system may
be in a high energy state. There is, therefore, a corresponding chance for the system to get
out of a local energy minimum in favour of finding a better, more global one. This means
our system can go uphill occasionally, as well as downhill, although as the temperature
falls, it is less and less likely that it will go uphill.
The algorithm for simulated annealing is implemented in the following way. Offered a
succession of options, a simulated thermodynamic system is assumed to change its
configuration from energy E1 to E2 with probability P = exp[-(E2 - E1)/kT]. If E2 < E1, P > 1; in this case, P is arbitrarily assigned the value 1, i.e. the system always takes such an option. However, if E2 > E1, i.e. the option is to go to a higher energy state, P is positive,
but less than 1 i.e. there is a small but finite probability that the system may take this
option. Hence our system will always take a downhill step that is offered to it, but will
sometimes take an uphill one if offered that.
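A minimal sketch of this acceptance rule applied to a 1-D error function is given below; the step size, cooling rate and iteration counts are arbitrary illustrative choices rather than a recommended annealing schedule.

```python
import math
import random

def simulated_annealing(f, x0, t0=1.0, cooling=0.95, steps_per_t=50, n_temps=100):
    """Minimise a 1-D function f using the acceptance rule described above."""
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    t = t0
    for _ in range(n_temps):
        for _ in range(steps_per_t):
            x_new = x + random.uniform(-0.5, 0.5)          # a random 'option'
            f_new = f(x_new)
            # always accept a downhill step; accept an uphill step with
            # probability exp(-(E2 - E1)/T)
            if f_new < fx or random.random() < math.exp(-(f_new - fx) / t):
                x, fx = x_new, f_new
                if fx < best_f:
                    best_x, best_f = x, fx
        t *= cooling                                        # annealing schedule
    return best_x, best_f

# e.g. a 1-D error surface with several local minima:
print(simulated_annealing(lambda x: x ** 2 + 3.0 * math.sin(5.0 * x), 4.0))
```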
In practice, we need to provide the following elements if we wish to use this algorithm:
 A description of possible system configurations i.e. available states that the system
"could" be in.
 A generator of random changes in the configurations - these changes are the "options"
presented to the system.
 An objective function E (analog of energy) whose minimisation is the goal of the
procedure.
 A control parameter T (analog of temperature) and an annealing schedule which tells
how it is lowered from high to low values e.g. after how many random changes in
configuration is each downward step in T taken, and how large is that step.
Figure 14 illustrates this procedure.
Figure 14 - a 1-D function with 6 separate energy maxima and minima.
In the annealing process, going from E1 to E2, E3 to E4 and E5 to E6 are all downhill
steps, and so have a probability of 1. Going from E2 to E3, and from E4 to E5, are uphill steps, and consequently have a small but finite probability of occurring. This small probability is what controls the possibility of going from the local minima at E2 and E4 to the global minimum at E6. Other minimisation algorithms that we have looked at do not
possess this ability.
The simulated annealing idea is applicable to the optimisation of continuous N-dimensional functions, f(x), where x is an N-dimensional vector. The four elements
required by the above procedure are now as follows: the value of f is the objective
function. The system state is the point x. The control parameter T is, as before, something
like a temperature, with an annealing schedule by which it is gradually reduced. There
must also be a generator of random changes in the configuration i.e. a procedure for
taking a random step from x to x + dx, which are the new "options" available to the
system. In practice, this may be the most difficult of the steps to implement, and a
number of solutions have been proposed for this (see Press et al., p 451).
All applications of simulated annealing suffer from the difficulty of defining the rate of
cooling. How slow is "sufficiently slow"? The success of the approach may be controlled
by this factor. There are a number of possibilities worth trying:
 Reduce T to (1-e)T after every m moves, where e/m is determined by experiment.
 Budget a total of K moves, and reduce T after every m moves to a value T = T0(1 - k/K)^a, where k is the cumulative number of moves thus far, and a is a constant, say 1, 2 or 4. The optimal value of a depends on the statistical distribution of relative minima of various depths. Larger values of a spend more iterations at lower temperature.
 After every m moves, set T to b(f1 - fb), where b is an experimentally determined constant of order 1, f1 is the current lowest function value available, and fb is the "best-ever" function value encountered.
Simulated annealing has been applied in practice to solve a classic minimisation problem
known as the "Travelling Salesman" problem (see Press et al., p 445-450). The aim is to
find the shortest possible route for a travelling salesman visiting N arbitrarily located
cities in turn before returning to the starting point (visiting each city only once). This kind
of problem is known as an NP-complete problem, and computation time for an exact
solution rises proportionally to exp(kN), where k is some constant. As N increases, the time
taken to try all solutions rapidly becomes prohibitive. Simulated annealing solves this
with the number of calculations only scaling as a small power of N. This has made
simulated annealing particularly useful in specialised problems such as that of locating
components during chip design in order to minimise heat generation and interference.
3 Other Methods
There are a variety of other intriguing methods of multidimensional function optimisation
that have arisen from: (i) the need to find sophisticated methods to solve a huge range of optimisation problems across mathematics, physics, engineering and modelling applications, many of which do not surrender easily to the "standard" methods mentioned above; (ii) increases in computing power that have pushed the development of more sophisticated algorithms; and (iii) the observation that Nature has created very elegant ways of solving seemingly impossible optimisation tasks.
3.1 (Artificial) Neural networks (ANN)
These methods are based on the observation that biological neural networks (i.e. brains) are exceptionally good at solving incredibly complex non-linear optimisation problems. They do this by "training" system responses to behave according to previously experienced situations, and adapting these responses to new situations.
There is no universally accepted definition of an ANN, but a general definition is that an
ANN is a network of many simple processors ("units"), each possibly having a small
amount of local memory. The units are connected by communication channels
("connections") which usually carry numeric (as opposed to symbolic) data, encoded by
any of various means. The units operate only on their local data and on the inputs they
receive via the connections. The restriction to local operations is often relaxed during
training.
This figure schematically illustrates the structure of a simple feed-forward neural net with
only one hidden layer, and one output (the figure comes from
http://www.cbs.dtu.dk/services/SignalP/methods_nn.html).
The neurons have several inputs but only one output. The output from a neuron is
calculated as a function of a weighted sum of the inputs. A threshold can be set so that
the neuron can have a non-linear output response: if the weighted sum is less than the threshold, the output is zero; if it is greater than the threshold, the output is 1, or the weighted sum of
the inputs. In an ANN, neurons are connected (with the output of one neuron being the
input of another neuron). When the ANN is presented with an input (e.g. a set of
reflectance data) it will calculate an output that can be interpreted as a classification of
the input (e.g. the model parameter values that result in this reflectance). It is possible to
``teach'' a neural network how to make a classification by repeatedly presenting it with a
set of known inputs (the training set - e.g. directional reflectance data sets) and
simultaneously modifying the weights associated with each input in such a way that the
difference between the desired output (e.g. measured reflectance) and the actual output
(e.g. modelled reflectance) is minimised according to some measure such as RMSE. The
ability of the ANN to change the output to reflect the input is important: it is this that gives the network the capability to learn. The more (distinct) training data we can throw at the
ANN, the more refined the behaviour of the ANN should become i.e. better able to
distinguish between finer variations in the input data. This is achieved via back
propagation training.
A general Neural Network algorithm is as follows:
 Set the initial weights of each node
 Repeat
 Show input example and desired output
 Work out actual output
 Update weights
 Until no more examples
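The sketch below implements this loop for a tiny single-hidden-layer network trained by back propagation; the 'training set' is synthetic reflectance generated from the toy model of equation 4, and the network size, learning rate and number of epochs are arbitrary illustrative choices rather than a recommended configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training set: two-band reflectances generated from the toy model of
# equation 4 (leaf reflectances 0.2 and 0.9, soil 0.15), with LAI as the target.
lai = rng.uniform(0.0, 6.0, size=(500, 1))
leaf, soil = np.array([0.2, 0.9]), 0.15
X = leaf * (1 - np.exp(-lai / 2)) + soil * np.exp(-lai / 2)
y = lai / 6.0                                   # target scaled to [0, 1]
mu, sd = X.mean(axis=0), X.std(axis=0)
Xn = (X - mu) / sd                              # standardise the inputs

# set the initial weights of each node (one hidden layer of 8 tanh units)
W1, b1 = rng.normal(0.0, 0.5, (2, 8)), np.zeros(8)
W2, b2 = rng.normal(0.0, 0.5, (8, 1)), np.zeros(1)
lr = 0.2

for epoch in range(5000):
    # show the inputs and work out the actual output (forward pass)
    h = np.tanh(Xn @ W1 + b1)
    out = h @ W2 + b2
    err = out - y                               # actual minus desired output
    # update the weights (back propagation of the mean squared error)
    dW2, db2 = h.T @ err / len(Xn), err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h ** 2)
    dW1, db1 = Xn.T @ dh / len(Xn), dh.mean(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

# estimate LAI for a canopy with true LAI = 2.0; the prediction should move
# towards 2 as training proceeds
test = (leaf * (1 - np.exp(-1.0)) + soil * np.exp(-1.0) - mu) / sd
print(6.0 * (np.tanh(test @ W1 + b1) @ W2 + b2))
```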
ANNs have a number of advantages for solving the types of non-linear optimisation
problems we have been considering: they can be used where we can't formulate an
algorithmic solution, where we can get lots of examples of the behaviour we require or
where we need to pick out the structure from existing data. A major drawback is that
when we change the nature of the training data e.g. different angular or spectral sampling,
then the ANN must be retrained. This suggests that ANN inversion of multispectral data
might be a possibility for fixed geometry, multiple waveband sensors (e.g. TM) but is not
so useful in the general case (arbitrary viewing and illumination geometry). There are a
number of examples of the application of ANNs to the problem of inverting biophysical
parameters from optical and microwave data of vegetation canopies e.g. Chuah, 1993;
Pierce et al., 1994; Gopal and Woodcock, 1996; Jin and Liu, 1997; Abuelgasim et al.,
1998. In addition, ANNs have proved very promising for the classification of EO data,
where supervised methods (requiring training data) are already the norm e.g. Foody,
1997a and 1997b.
An excellent introduction to ANNs is at Stirling University with many external links. For
more information on neural nets see the page mentioned above, or the Neural Net FAQ.
3.2 Genetic (or evolutionary) algorithms (GAs)
It has long been noted that Nature manages to solve THE optimisation problem, i.e.
maximisation of species survival and propagation, through genetic mutation and the
subsequent favouring of a few of these mutations in the long term. If we can phrase our
optimisation problems in terms of some non-linear "fitness-for-survival" function, genetic
mutation and propagation models can be applied to these problems in order to find
optimal solutions. Effectively a GA is a method by which some description of the
function state (known as a "string" - equivalent to a gene pattern) is changed in some way
according to a probabilistic model (typically crossover, mutation or inversion). In the
case of a BRDF model, this would be the N-D vector representing the current state of the
model parameters. This mutation is then propagated to the subsequent generation in some
predefined way, such as "mating" - the recombination of two parent strings to create an
offspring string. Finally, the new generation "mutation" is tested to see if it better fulfils
our survival criteria (or has a lower RMSE in terms of model fit). In this way we can see
that the GA progresses (as in Nature) through mutation and selection via sexual
reproduction, characterised by recombining two parent strings into an offspring.
A general Genetic Algorithm is as follows (see http://osiris.sunderland.ac.uk/cbowww/AI/TEXTS/ML2/sect4.htm for an illustrated example of this):
 Populate a set of chromosomes
 Repeat
 Determine fitness of chromosomes
 Choose best chromosomes
 Evolve chosen chromosomes using crossover, mutation or inversion
 Until a chromosome is found of a suitable fitness
GAs differ from normal optimisation methods in four ways:
 They work with a coding of the parameter set, not the parameters themselves.
 They search from a population of points, not a single one.
 They use 'payoff' information (some objective function), not derivatives or other auxiliary knowledge.
 They use probabilistic transition rules, not deterministic ones (c.f. simulated annealing).
The keys to GAs are the initial coding of the description of the function state (into strings),
and the way in which these strings are changed at each step according to some rule (e.g.
crossover or mutation, mentioned above).
The most basic GA is as follows:
 STEP 0: Define a genetic representation of the problem.
 STEP 1: Create an initial population P(0) = x_1^0, ..., x_N^0. Set t = 0.
 STEP 2: Compute the average fitness f(t) = SUM_{i=1..N} f(x_i)/N. Assign each individual the normalised fitness value f(x_i)/f(t).
 STEP 3: Assign each x_i a probability p(x_i, t) proportional to its normalised fitness. Using this distribution, select N vectors from P(t). This gives the selected set of parents.
 STEP 4: Pair all parents at random, forming N/2 pairs. Apply crossover with a certain probability to each pair, and other genetic operators such as mutation, forming a new population P(t+1).
 STEP 5: Set t = t + 1, return to STEP 2.
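A minimal sketch of this basic GA for a real-valued parameter vector is given below (each 'chromosome' is simply an N-D vector rather than a bit string); the fitness shift used for the roulette selection, and all the rates, ranges and population sizes, are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_ga(fitness, n_params, pop_size=50, n_generations=100,
           crossover_rate=0.7, mutation_rate=0.1, bounds=(0.0, 1.0)):
    lo, hi = bounds
    pop = rng.uniform(lo, hi, size=(pop_size, n_params))           # STEP 1
    best_p, best_fit = None, -np.inf
    for _ in range(n_generations):
        fit = np.array([fitness(p) for p in pop])                  # STEP 2
        if fit.max() > best_fit:
            best_fit, best_p = fit.max(), pop[int(np.argmax(fit))].copy()
        # STEP 3: roulette selection (fitness shifted to be non-negative,
        # since the example fitness below is negative)
        prob = fit - fit.min() + 1e-9
        prob /= prob.sum()
        parents = pop[rng.choice(pop_size, size=pop_size, p=prob)]
        # STEP 4: single-point crossover on random pairs, then mutation
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):
            if rng.random() < crossover_rate and n_params > 1:
                cut = rng.integers(1, n_params)
                children[i, cut:] = parents[i + 1, cut:]
                children[i + 1, cut:] = parents[i, cut:]
        mutate = rng.random(children.shape) < mutation_rate
        children[mutate] = rng.uniform(lo, hi, size=int(mutate.sum()))
        pop = children                                             # STEP 5
    return best_p

# e.g. maximise a fitness whose optimum is at 0.3 in every parameter:
best = run_ga(lambda p: -np.sum((p - 0.3) ** 2), n_params=3)
print(best)   # should lie roughly around [0.3, 0.3, 0.3]
```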
Some advantages of GAs are their flexibility and power - they may be able to solve
problems characterised by very many small, ill-defined minima, the sort of problem that
is typically almost impossible to solve by standard methods. Another major advantage is
that they are intuitively understandable. The difficulty of using GAs practically is that
they combine two different search strategies: random search by mutation and a biased
search by recombination of the strings contained in a population, and defining the balance
between them is by no means straightforward. More relevant to problems in CR model
inversion is the fact that it may take a huge number of generations (iterations) for such a
system to converge to a solution.
I have downloaded several useful references to GAs, of which there are copies in the
RSU. Also, see Goldberg (1988), MATHS NA20 GOL in the DMS Watson library.
3.3 Knowledge-based systems (KBS)
KBS comprise a variety of methods seeking to solve complex computational problems
using sources of information external to the problem itself. In CR model inversion, work
done on KBS attempts to use an understanding of the inversion problem to drive the
optimisation process. An example in vegetation modelling is the work of Kimes and
Harrison (1990) and Kimes et al. (1991) who developed VEG, a system for inferring
physical and biological surface properties from reflectance data sets. VEG integrates the
traditional spectral (directional) data with diverse knowledge bases derived from
literature, field-measured directional reflectance data sets and information from human
experts into an efficient system for inferring biophysical parameters from reflectance
data. Significant developmental work was done on KBS in the late 80s to early 90s, but the
major problem of such systems, namely how the knowledge is defined and input into the
system, has never been overcome satisfactorily. This approach has largely been taken
over by ANNs, where the decision as to how a particular input is mapped to a response is
concealed in the 'black box' hidden layer(s) of neurons. The drawback of this method, of
course, is that no knowledge of how the problem is being solved within the black box is
available - something that is made explicit in the KBS approach. A more promising
approach for CR model inversion than either ANNs (limited by training data) or KBS
(how do we define the knowledge data?) is that of look-up tables (LUTs).
3.4 LUT Inversion
We noted LUT inversion earlier in these notes as a potentially useful method, particularly
for optical remote sensing with a high information content. The principles of the method
are simple: define a set of points in parameter space and find some subset of these points
which give the best match (minimum RMSE) to the observations. One major advantage
of LUTs (and ANNs) is that the canopy reflectance modeling does not need to be
performed ‘on the fly’ in the inversion: model simulations are stored in the LUT entries.
This means that one is not limited to a simple analytical model for forward modeling –
any (complex) numerical code can be used – the forward modeling task is divorced from
the inversion procedure.
The key to applying the technique is in defining an original set that samples the parameter
space properly while keeping the number of RMSE calculations to a minimum. Knowledge
of external parameters (e.g. VIs, cover type, viewing/illumination geometry etc.) is then
used to limit the search space in as many ways as possible.
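A rough sketch of the principle in Python is given below; the two-band 'forward model', grid spacing and parameter ranges are illustrative assumptions, not the MODIS algorithm or any published LUT:

# Minimal LUT inversion sketch: precompute model reflectances over a coarse
# parameter grid off-line, then pick the entry with minimum RMSE at run-time.
# The two-band 'forward model' and grids are illustrative assumptions.
import itertools
import math

def forward_model(lai, soil):
    """Stand-in for any (possibly complex, numerical) CR model: returns
    reflectance in two bands (red, NIR)."""
    red = 0.04 + 0.25 * soil * math.exp(-0.6 * lai)
    nir = 0.45 - 0.30 * math.exp(-0.5 * lai) + 0.10 * soil
    return (red, nir)

# Build the LUT once: each entry stores (parameters, simulated reflectance).
lai_grid = [0.25 * i for i in range(33)]        # LAI 0 .. 8
soil_grid = [0.1 * i for i in range(11)]        # soil brightness 0 .. 1
lut = [((lai, soil), forward_model(lai, soil))
       for lai, soil in itertools.product(lai_grid, soil_grid)]

def invert(observed, table):
    """Return the parameter set whose stored simulation gives minimum RMSE."""
    def rmse(sim):
        return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, observed))
                         / len(observed))
    return min(table, key=lambda entry: rmse(entry[1]))[0]

# At run-time, external knowledge (e.g. biome type, geometry) would be used to
# subset 'lut' before calling invert(), shrinking the search space.
print(invert(forward_model(3.0, 0.5), lut))     # -> (3.0, 0.5)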
The use of external information (knowledge) to limit the search space implies that LUT
methods are closely related to KBS methods. LUT methods have a simpler requirement
for the use of external data, however, and can be applied quite simply in practice, e.g.
the MODIS LAI/FPAR algorithm (see the MOD15 LAI/FPAR ATBD).
A good example of a LUT inversion is provided by Combal et al. (2002), who compare
LUT inversion with ANN and Quasi-Newton methods. They found that the LUT and
Quasi-Newton methods were more sensitive to model inaccuracies, but that ANN
inversion produced less accurate results overall.
4. Summary
A number of minimisation (more generally, optimisation) techniques that can be used to
derive biophysical parameter information via the inversion of CR models against
reflectance data have been introduced. A brief summary of these, including some of the
pros and cons in each case, is given below.
- The most commonly used non-linear inversion methods (ignoring derivative information) are Powell (conjugate direction set) and Nelder and Mead (amoeba). Relatively fast and robust, but tend to converge to local minima -> need to run several times from different starting positions.
- Simulated annealing - probabilistic partition of energy states of a system - allows a finite probability of the system going from a low -> higher energy state, so the big advantage is the ability to get out of local minima. Drawbacks are that it may be slower, and the difficulty of defining the annealing schedule.
- ANNs - network trained using "known" reflectance data to derive model parameters from measured data. Very fast once trained. The limitation is the need for training data sets to be of the same nature (i.e. viewing and illumination geometry, spectral bands) as the measured data. Has been applied to EO data. Also good for classification.
- GAs - very novel approach to the problem of N-D optimisation - mutation and reproduction, survival-of-the-fittest. The advantage is that long, gradual change may allow for very subtle solutions, but likely to be very slow - few examples of applications in EO (e.g. Clark and Canas, 1993), but many in engineering.
- KBS - use additional information to refine the process of finding a solution, e.g. the Kimes et al. VEG model. Advantages: maximise the use of data, potentially more robust inversion. Disadvantage: how the additional information is defined and used.
- LUT - a subset of KBS (?) - precalculate many CR scenarios (e.g. many viewing/illumination geometries, spectral bands, biome types etc.) prior to run-time. At run-time, follow a search path (tree) using additional data (e.g. biome type) to drastically reduce the search space before searching for the best-fit solution. Fast once the LUT is computed, and the LUT can simply be recomputed if the model is altered. Disadvantage is defining the search path and the additional data to be used.
5. Additional References
- Abuelgasim, A. A., Gopal, S. and Strahler, A. H. (1998) Forward and inverse modelling of canopy directional reflectance using a neural network, International Journal of Remote Sensing, Vol. 19, No. 3, pp. 453-471.
- Brent, R. P. (1973) Algorithms for Minimisation without Derivatives (Englewood Cliffs, NJ: Prentice-Hall).
- Bunday, B. D. (1984) Basic Optimisation Methods (Edward Arnold, London).
- Chuah, H. T. (1993) An artificial neural-network for inversion of vegetation parameters from radar backscatter coefficients, Journal of Electromagnetic Waves and Applications, Vol. 7, No. 8, pp. 1075-1092.
- Clark, C. and Canas, T. (1993) A genetic algorithm approach to spectral identification, Proc. RSS '93, Chester.
- Foody, G. M. (1997a) Fully fuzzy supervised classification of land cover from remotely sensed imagery with an artificial neural network, Neural Computing & Applications, Vol. 5, No. 4, pp. 238-247.
- Foody, G. M. (1997b) An evaluation of some factors affecting the accuracy of classification by an artificial neural network, International Journal of Remote Sensing, Vol. 18, No. 4, pp. 799-810.
- Goldberg, D. E. (1988) Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley. ISBN 0-201-15767-5.
- Gopal, S. and Woodcock, C. (1996) Remote sensing of forest change using artificial neural networks, IEEE Transactions on Geoscience and Remote Sensing, Vol. 34, No. 2, pp. 398-404.
- Jin, Y. Q. and Liu, C. (1997) Biomass retrieval from high-dimensional active/passive remote sensing data by using artificial neural networks, International Journal of Remote Sensing, Vol. 18, No. 4, pp. 971-979.
- Kimes, D. S. and Harrison, P. R. (1990) Inferring hemispherical directional reflectance using a knowledge based system, Proc. IGARSS '90, Washington, pp. 1759-1764.
- Kimes, D. S., Harrison, P. R. and Ratcliffe, P. A. (1991) A knowledge-based expert system for inferring vegetation characteristics, International Journal of Remote Sensing, Vol. 12, No. 10, pp. 1987-2020.
- Kimes, D. S. and Nelson, R. F. (1998) Attributes of neural networks for extracting continuous vegetation variables, International Journal of Remote Sensing, Vol. 19, No. 14, pp. 2639-2663.
- Pierce, L. E., Sarabandi, K. and Ulaby, F. T. (1994) Application of an artificial neural-network in canopy scattering inversion, International Journal of Remote Sensing, Vol. 15, No. 16, pp. 3263-3270.
- Press, W. H., Teukolsky, S. A., Vetterling, W. T. and Flannery, B. P. (1994) Numerical Recipes in C (2nd ed., Cambridge University Press, 994 pp.).
- Privette, J. L. (1994) An efficient strategy for the inversion of bidirectional reflectance models with satellite remote sensing data, Ph.D. thesis, University of Colorado (postscript version, 3.12 MB).