Vegetation Science - UCL Department of Geography

Non-linear Canopy Reflectance Model Inversion
MSc. Remote Sensing
P. Lewis & M. Disney
Remote Sensing Unit,
Dept. Geography
University College London,
26 Bedford way,
London WC1H 0AP
+44-71-387-7050 x 5557
e-mail: plewis@geog.ucl.ac.uk
January 14th 2006
1. INTRODUCTION
So far, we have been considering descriptions of a relationship between canopy
reflectance (or a VI) and some parameterization of the surface (e.g. LAI, chlorophyll
concentration). Such a relationship is generally termed a forward model.
If we wish to attempt to obtain information on the surface parameterization from
observed RS data, we need to invert such a model. Model inversion is therefore of key
importance to the remote sensing problem. In this lecture, we will discuss the issues
involved in attempting an inversion, and have an introduction to the range of methods
available for the task. Many of the issues and options raised are generic in nature and
applicable to a wide range of scientific problems.
1.2 ISSUES IN NON-LINEAR INVERSION
1.2.1. Error Surfaces
Earlier in the course, we considered linear model inversion as part of our study of lumped
parameter modelling. We saw that by phrasing the problem as a minimization of
some error function (e.g. RMSE) we can solve for the inverse of a linear model by matrix
inversion. In this case, the RMSE ‘error surface’ (in N dimensions for N model
parameters) is a quadratic function. It has only a single (unconstrained) minimum. We
can apply (linear) constraints through simple methods such as Lagrange multipliers.
Figure 1. Quadratic error surface of RMSE of a linear model (axes: parameters P0 and P1, vertical axis RMSE)
Figure 2. Constrained quadratic error surface of RMSE of a linear model (axes: parameters P0 and P1, vertical axis RMSE)
1.2.2 Parameter transformations and bounding
More generally, especially when considering inverting physically-based canopy
reflectance models, we do not have an analytical solution of this type, often because there
is no analytical solution to the (non-linear) simultaneous equations obtained from the
partial differentiation. Also, more generally, the error surface is more complex than a
quadratic function – this is easy to visualize if one considers non-linear parameters which
are known to threshold (i.e. the remote sensing signal, and hence the RMSE, is not sensitive to
changes in the parameter beyond a certain point) such as LAI. For high LAI, the error
surface becomes ‘flat’.
One way to get around this problem is by approximately linearising the parameter space.
The following table, taken from Combal et al1 (2002), shows a set of transformations for
a range of optical canopy reflectance model variables. By transforming the physical
variables to these terms, and setting robust limits to the terms, we can attempt a more
reliable inversion, obtain a ‘simpler’ (closer to quadratic) error surface, and (partially)
avoid saturation problems.
Table 1. Biophysical parameter transformations
Figure 3 shows a typical error surface (the model parameters are transformed LAI and
soil brightness) for a non-linear model – the error surface for non-transformed LAI would
show saturation at high values – in the transformed space, the global minimum is more
clearly accentuated.
In fact, provided there is sufficient information in the remote sensing signal (i.e.
sensitivity to the model parameters), we find that in many cases for optical remote
sensing, the error surface is relatively ‘well-behaved’. We will demonstrate the
importance of this point later. Note that the same is often not the case for microwave
signals – this is partially due to sensitivity to a large number of model parameters, and partially due to a lack of information in the received satellite data.
With certain numerical (error) minimization routines, there may also be problems
associated with finding a global minimum, rather than a local one. Situations such as
inverting RS parameters can often be helped by assigning bounds to the domain in which
we search for a solution. When using the Powell algorithm, this is achieved by adding a
gradually-increasing penalty to the error function if we step outside defined bounds. This
does sometimes cause some problems with the inversion getting trapped on a boundary,
especially if the penalty increase is too sharp on the boundary.
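As a minimal sketch of the kind of gradually increasing penalty described above: the forward model `model(params)`, the observations `obs` and the bound arrays below are hypothetical placeholders, and the quadratic growth rate is an arbitrary illustrative choice.

```python
import numpy as np

def penalised_rmse(params, obs, model, lower, upper, penalty_scale=10.0):
    """RMSE between observations and a forward model, plus a gradually
    increasing penalty for any parameter outside [lower, upper].
    `model`, `obs` and the bound arrays are placeholders to be supplied."""
    params = np.asarray(params, dtype=float)
    residuals = np.asarray(obs) - model(params)
    rmse = np.sqrt(np.mean(residuals ** 2))
    # distance outside the bounds (zero for parameters inside them)
    overshoot = (np.maximum(0.0, np.asarray(lower) - params)
                 + np.maximum(0.0, params - np.asarray(upper)))
    # quadratic growth rather than a hard wall, to reduce the chance of the
    # search getting trapped exactly on a boundary
    return rmse + penalty_scale * np.sum(overshoot ** 2)
```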
1. B. Combal, F. Baret, M. Weiss, A. Trubuil, D. Macé, A. Pragnère, R. Myneni, Y. Knyazikhin and L. Wang (2002), Retrieval of canopy biophysical variables from bidirectional reflectance: using prior information to solve the ill-posed inverse problem, Remote Sensing of Environment, 84, 1-15.
Figure 3. Error surface from non-linear model
1.2.3 Weighting of the Error Function
Another issue to be considered at this point is how to phrase the error function. In the
Method of Least Squares, one is supposed to weight each residual by the uncertainty
associated with the observations. In this way, the RMSE is similar to a Z-score, being a normalized measure of the agreement between the observations and the model values.
In the previous lecture, we weighted all samples equally. This is equivalent to assuming
that the error (noise) associated with each measurement is equal. This means that, for
example, if we were trying to invert chlorophyll concentration from visible and near
infrared hyperspectral data, we would be weighting the residuals in the near infrared
(where there is no sensitivity to the required parameter) as strongly as those at visible
wavelengths.
If the aim of the inversion were a robust estimation of a single parameter, we could choose
instead to weight the residuals by the sensitivity of the signal to that parameter (this
applies to angular as well as spectral sampling). We might choose a partial derivative as
the weighting term – the higher the rate of change of reflectance with respect to the
parameter of interest, the higher the weighting, so, instead of using:
iN
RMSE 
 
i 1
measured
i    modelled i 2
(1)
iN
1
i 1
we could use:
RMSE = sqrt{ [ Σ_{i=1..N} ( (∂ρ/∂P) ( ρ_measured(i) - ρ_modelled(i) ) )^2 ] / [ Σ_{i=1..N} (∂ρ/∂P)^2 ] }        (2)
Such an approach is suggested by Privette (1994)2. There are clearly issues with the
implementation of such an approach – principally: (i) the derivative will vary as a
function of the model parameters for a non-linear model (so some ‘generally appropriate’
mean value must be chosen); and (ii) weighting for multiple parameters is somewhat
arbitrary (so one has to decide on a compromise weighting scheme, or invert multiple
times for each model parameter). As a partial compromise, one most typically sees a
‘relative RMSE’ used in remote sensing applications:
iN
RMSE 
 
i 1
measured
i    modelled i 2
iN
 
i 1
i 
(3)
2
measured
rather than the ‘absolute’ RMSE shown in equation 1.
1.2.4 Using Additional Information
Whatever method we use for inversion, and however we phrase the error function, the
potential for problems exists if any model parameters are coupled (i.e. a change in one
parameter can be compensated for by change in another parameter without affecting the
solution at all). One way to get around this is to use additional information in the
inversion problem. The most typical example of this, for vegetation science applications,
is to use some model of the expected temporal dynamics of a vegetation canopy in
inversion. Moulin et al. (1998)3 review methods for linking crop growth models to the
2. Privette, J. L. (1994), An efficient strategy for the inversion of bidirectional reflectance models with satellite remote sensing data, Ph.D. thesis, University of Colorado.
3. S. Moulin, A. Bondeau and R. Delécolle (1998), Combining agricultural crop models and satellite observations: from field to regional scales, International Journal of Remote Sensing, 19(6), 1021-1036.
remote sensing inversion problem. Methods generally rely on using a priori information
on the expected crop type (e.g. obtained from a land cover classification) and ‘growing’
the crop model to provide an expectation of canopy state variables. The inversion
problem is then much more limited in scope, and one can apply expectations e.g. of
coupling between clumping, LAI, and leaf angle distributions for a given crop type. Crop
or other vegetation growth models, however, tend to require a significant number of
‘internal’ parameters describing how a particular variety will grow. Most models are
based on considerations of ‘integrated thermal time’ (sum of degree days since planting),
so knowledge of the local daily mean temperature is a primary requirement for running
such models. In addition, they may have limits applied to the light environment (so that
the models can be used at different latitudes). Some models consider water and nutrient
limiting factors, although there is much disagreement between models as to the impacts
of these. A key use then, of a canopy growth model, is to provide expectation of growth
under non-limiting conditions – any large disagreement between model and measure may
then be used to identify the impact of some additional limit (water, nutrients etc.).
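As a simple illustration of the 'integrated thermal time' driver mentioned above, the sketch below sums degree-days from a hypothetical series of daily mean temperatures; the base temperature is a crop-specific assumption and the numbers are purely illustrative.

```python
import numpy as np

def thermal_time(daily_mean_temp, base_temp=0.0):
    """Sum of degree-days since planting: days with a mean temperature below
    the (crop-specific) base temperature contribute nothing."""
    t = np.asarray(daily_mean_temp, dtype=float)
    return float(np.sum(np.maximum(0.0, t - base_temp)))

# e.g. thermal_time([12.0, 15.5, 8.0, 3.0], base_temp=5.0) -> 20.5
```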
Figure 4. Crop mosaic within a single moderate resolution pixel (patches of LAI 0, LAI 1 and LAI 4)
1.2.5 Scaling
The models we build of canopy reflectance or scattering will typically be defined
assuming operation over a single cover type – indeed, many models assume that the
canopy is of infinite horizontal extent. In reality, this may not be the case, particularly for
moderate or coarse resolution sensors. The advantage of optical moderate resolution
sensors for monitoring crop dynamics is the high temporal coverage that can be achieved,
even taking into account data lost due to cloud cover (which may be typically as high as
75% loss in Northern Europe). The main disadvantage is that pixel resolutions of 100s of
metres to several km will tend to observe more than one field at a time, particularly given
North European field sizes of typically a few hectares. High spatial resolution optical data are
available relatively rarely for cloudy environments, and cannot be guaranteed to provide
suitable temporal monitoring for crops. Current microwave sensors tend to have a much
lower information content, but higher spatial and temporal resolution.
If we wish to invert canopy parameters from moderate resolution sensors then, we must
be aware of (or ideally take account of) scaling effects. To gain an insight into scaling
effects, consider a mosaic of crop types viewed within a single moderate resolution pixel
(figure 4). The area comprises 20% of LAI 0, 40% LAI 4 and 40% LAI 1. The 'real' (area-weighted mean) value of LAI is thus: 0.2x0 + 0.4x4 + 0.4x1 = 2.0. We assume a model of reflectance as:
ρ = ρ_v (1 - exp(-LAI/2)) + ρ_s exp(-LAI/2)        (4)
We now consider two typical cases for leaf and soil properties:
visible:  ρ_v = 0.2;  ρ_s = 0.1
NIR:      ρ_v = 0.9;  ρ_s = 0.3
[Chart: modelled visible and NIR canopy reflectance (equation 4) plotted as a function of LAI.]
The area-averaged canopy reflectance over the pixel is then 0.15 in the visible and 0.60 in the NIR. If we assume the model in equation 4 to hold, this equates to an LAI of 1.4. The
‘inversion’ underestimates the ‘correct’ (scaled) answer of 2.0. In addition, different
results may occur for different wavebands (they do not here as we have only a single
scattering model). It is worth noting that some parameters will scale more linearly than
others – examples of those that do are albedo and canopy cover. Effectively any
parameter we might model with a linear model scales approximately linearly (other than
through multiple scattering effects with the atmosphere and potentially between cover
types). Any parameter we see in a non-linear sense in a model (e.g. exp(-LAI)) is unlikely
to scale linearly.
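The scaling example above can be reproduced in a few lines; the sketch below uses the NIR leaf and soil values quoted in the text with the simple model of equation 4, and the function and variable names are illustrative only.

```python
import numpy as np

def canopy_reflectance(lai, rho_leaf, rho_soil):
    """Equation 4: simple mix of vegetation and soil reflectance."""
    return rho_leaf * (1.0 - np.exp(-lai / 2.0)) + rho_soil * np.exp(-lai / 2.0)

def invert_lai(rho, rho_leaf, rho_soil):
    """Analytical inverse of equation 4 for LAI."""
    return -2.0 * np.log((rho_leaf - rho) / (rho_leaf - rho_soil))

fractions = np.array([0.2, 0.4, 0.4])      # proportions of the pixel
lai = np.array([0.0, 4.0, 1.0])            # LAI of each patch
true_lai = np.sum(fractions * lai)         # 'real' area-weighted LAI = 2.0

# reflectance aggregates (roughly) linearly over the pixel, LAI does not:
rho_pixel = np.sum(fractions * canopy_reflectance(lai, 0.9, 0.3))   # NIR case
print(true_lai, rho_pixel, invert_lai(rho_pixel, 0.9, 0.3))   # 2.0, ~0.60, ~1.4
```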
Figure 5. Contour plot of relative difference in LAI derived from unadjusted LAI
retrieval algorithm as a function of spatial resolution and pixel heterogeneity
(dominant land cover purity): (a) Grasses and Cereal Crops, (b) Shrubs, (c) Broadleaf
Crops, (d) Savannas, (e) Broadleaf Forests, and (f) Needle Forests.
Tian et al. (2002)4 examined a theoretical basis for studying and understanding scaling of
LAI. They consider a ‘pure’ 1 km resolution dataset composed of 6 biomes (plus bare
4. Yuhong Tian, Yujie Wang, Yu Zhang, Yuri Knyazikhin, Jan Bogaert and Ranga B. Myneni (2002), Radiative transfer based scaling of LAI retrievals from reflectance data of different resolutions, Remote Sensing of Environment, 84, 143-159.
ground) and characterize the mixing of these biomes at different spatial resolutions
through a ‘purity index’ (mean of the cover type percentages). They then group pixels
into three classes: ‘pure’ (group 1) (purity index > 90%), ‘mixed’ (group 2) (50% to
90%), and ‘heterogeneous’ (group 3) (<50%). A measure of heterogeneity is considered
as the purity index of the dominant land cover type. They then define an ‘error’ in LAI
through a relative difference between the ‘real’ and ‘inferred’ LAI values. This is shown
in figure 5, for different dominant land cover types. We can see that, particularly for
needle-leaf forests, when the 'purity index' is low, we can incur errors of up to 40% in
LAI due simply to scaling. A dependency on spatial resolution is shown, but the
dominant effect can be attributed to ‘pixel purity’.
1.2.6 Options for Numerical Inversion
In the general case, we may have a complex, non-quadratic error surface in many
dimensions. We could potentially go through all combinations of parameters to find
which set gives a minimum in the error value, but this may take far too long. However, if the error surface is known to be well-behaved, we may choose to sample the complete
parameter space, using a so-called Look-up Table (LUT) inversion. This has been seen
to be an attractive option for the inversion of optical models in recent years.
Often, we tend to use iterative numerical techniques to attempt to obtain the minimum
of our error function. There are a variety of methods we can use, e.g. Quasi-Newton
methods (an algorithm found in NAG which is used by a number of authors), or the
Powell algorithm (an algorithm found in Numerical Recipes in C, also used by many
authors). Although such techniques are generally quite complex in nature, they are a
valuable tool for the relatively rapid inversion of non-analytical functions. Basic aspects
of these are discussed in greater detail below. Our personal experience is that the Powell
algorithm is relatively robust, and tends to perform better than the NAG routine, but this
may be case specific. Some improvements may be possible if we can find analytical
solutions to the 1st and 2nd order partial derivatives, but most authors seem not to bother
with this; in any case we often find that other difficulties also arise, especially in highly
non-linear systems of equations.
Other options for numerical inversion, which are reviewed below, include: Knowledge-based Systems (KBS); Artificial Neural Networks (ANNs); and Genetic Algorithms
(GAs). Selecting the right inversion algorithm for a particular problem is of critical
importance in parameter estimation – one must weigh up the advantages and
disadvantages of each method for a particular task.
1.3 How can we assess the inversion performance?
Generally, in order to assess the performance of a numerical inversion algorithm, we look
at the 'goodness-of-fit' measure associated with the inversion (e.g. root-mean-squared
error (RMSE)). However, we should understand that this is a global average of the error
function, and doesn't tell us anything about how well the model fits the data at any
particular sample location. This may well be critical if the model is particularly sensitive
to errors at certain sample locations (e.g. parameters describing the hot-spot will tend
only to be relevant in the region of the hot-spot). Obtaining an understanding of how
critical particular angular samples are to the inversion is obviously very important if we
wish to derive accurate estimates of the surface parameterization from RS data.
A relevant general comment at this point is that the more parameters we define as
unknowns in our model, the more difficult it will be to derive accurate estimates of each
parameter. This is because there will always be some degree of coupling between model
parameters, such that errors in the estimation of any particular parameter can often be
compensated for by errors introduced in other model parameters. Thus, there will always
be a trade-off between the accuracy with which we can hope to model canopy BRDF with
a limited number of parameters, and the maximum number of parameters we can expect
to derive from our RS observations.
There are several other ways in which we can attempt to assess the performance of a
model and the quality of parameters we invert from it. One way is to look at the
implications of using a particular angular sampling set in our inversion, for instance,
looking at how the model parameters vary as we have a greater or lesser number of
samples at different solar zenith angles, as well as particular viewing angles. For any
given sensor, at a given time of year, we will obtain a limited number of angular viewing
samples. For many sensors, the range of sun angles of these observations will also be
limited. We should therefore attempt to understand how well the different sampling
capabilities of various sensors affect the parameters we obtain. It is apparent from many
inversion studies, for example, that inverting the directional reflectance data for different
sun angles can lead to significant variations in the derived model parameters.
The 'quality' of any inversion depends of course on the purpose of the inversion. If it is
to derive biometric parameters from RS data, then we need to consider how accurately
such parameters are derived. If however, the purpose is to use BRDF models to obtain
estimates of the surface albedo, then we might indeed use a different set of criteria in
assessing model performance. Goel (1988) discusses why it is important to understand
the sensitivity of a model. This has many implications for model inversion, as, for
instance, if our model was particularly sensitive to errors in knowledge of the viewing or
sun angles, then any errors introduced in assigning these 'fixed' parameters would lead to
significant errors in model inversion. We must also consider how sensitive a model is to
any (residual) 'noise' in the system, i.e. any noise due to quantization effects or errors in
atmospheric correction, any errors introduced by not taking the sensor MTF into account,
etc.
1.4 How can we test the quality of the derived
parameters?
There are several ways we might consider to assess how well the model is performing in
inverting derived parameters.
One method is to use field radiometric data, where we can attempt to measure
biophysical parameters of the canopy directly (e.g. LAI, leaf angle distribution) and, at
the same time, obtain measurements of canopy reflectance. We can then invert the
canopy reflectance measurements against our model and compare the resulting
parameters with those we measured. There are several problems with such techniques,
such as the accuracy with which we can hope to measure the average leaf angle
distribution of a complex canopy, or determine the 'average leaf single-scattering albedo',
but such methods can at least give an indication of how well a model performs under
realistic conditions.
One can also try such methods of comparison with airborne or spaceborne remote
sensing instruments, although the problems then tend to be compounded by increased
uncertainties in atmospheric correction, pointing accuracy, etc., as well as being limited
by the sampling capabilities of the sensor, as well as the smaller scale over which one has
to obtain ground-based biometric measurements.
One can also test the performance of any particular model against other models, though
one is obviously limited by the assumptions made in the original model formulation. It is
perhaps more useful in this regard to make use of numerical solutions to the radiative
transfer equation, and to use complex 3-D models of the vegetation canopy, because of
the flexibility offered by such methods. When using numerical solutions to the radiative
transfer equation, one can also attempt to test how well the particular assumptions made
in any analytical approximation perform at a number of different levels, e.g. by
simulating a canopy with constant LAI, known leaf angle distribution, and known single-scattering albedo (i.e. assume a Lambertian reflectance model). This can then provide us
with information on how well the structural and radiometric assumptions made in the
analytical formulation perform.
We also have the potential with numerical solutions to derive information on the various
components of the radiation exitant from the canopy, e.g. that due to single-scattering
events and that due to multiple-scattering events. In this way, we can directly compare
such data with the analytical solutions to single- and multiple-scattering.
The main problem with comparing one model against another is that we are testing it
only for the range of assumptions made in the more complex of the two. Thus, we
need to make use of recent developments in modelling the complex geometry of plant
canopies to be sure that our model performs well under realistic conditions.
In practice, we might often be using simple models which e.g. assume that the canopy is
homogeneous, even if we know that this is unlikely to be strictly so. In this case, we need
to know the utility of any parameters we might derive by such methods.
1.5 Conclusions
You should have gained some insight from this section into the basic principles of
inversion techniques, and be able to appreciate the difference between analytical and
numerical solutions. You should also have gained some knowledge of the issues and
problems associated with inversion, which should allow you to assess the way in which
model performance tends to be tested in the literature, as well as enabling you to perform
useful tests of model performance and accuracy. You will no doubt have noted that there
are many such problems, and that we do not necessarily have answers to all of them as
yet.
You should back up what you have learnt in this lecture and the following practical
exercise with further reading. A useful starting point is Goel (1988), although you will
tend to find some discussion on model inversion in many of the papers you will read on
model development.
2. Numerical Iterative non-linear inversion techniques
2.1 Introduction
In most cases of non-linear inversion, we need to look for numerical solutions to the
problem of finding the minima of our defined error functions. Computationally, these
need to be as efficient and robust as possible i.e. using the optimum amount of
information and iterations. There is no 'best' solution - be aware that different techniques
can be used in an attempt to find the optimal solution.
In inverting canopy reflectance models against reflectance data, we are usually interested
in constrained optimisation - a special case of the general optimisation problem where
some variables may be fixed, or lie between fixed limits. Figure 6 shows some of the
difficulties this may pose in the 1-D case (from Numerical Recipes in C, Press et al., p
395).
Figure 6 Local and Global Minima
In figure 6 - points A, C and E are local, but not global maxima. Points B and F are local,
but not global minima. The global maximum occurs at G, which is on the boundary of the
interval so that the derivative of the function need not vanish there. At E, derivatives
higher than the first vanish, which can cause difficulties for some algorithms. The points
X, Y and Z are said to bracket the point F since f(Y) is less than both f(X) and f(Z), i.e. Y is a "first guess" at a minimum in this region. It is important to note that in the general case of
N-dimensional optimisation (particularly if not using derivatives), the solution is reduced
to a series of N (iteratively improving) minimisations in 1-D. As a result, nearly all the
algorithms we will see can be reduced to the problem of finding the extrema of a 1-D
function.
How do we choose which methods we use? We need to decide between methods that
need only to evaluate the function, against those which calculate (partial) derivatives.
Derivative methods may often be more powerful in terms of finding global minima
(assuming functions are well-behaved and differentiable). Derivative methods such as the
conjugate gradient algorithms of Fletcher-Reeves and Polak-Ribiere, and the well-known
quasi-Newton algorithms of Davidon-Fletcher-Powell (DFP) and Broyden-Fletcher-Goldfarb-Shanno (BFGS) are widely used in practice due to their robustness. Derivative
methods however require many more function evaluations than is desirable for rapid
inversion and also require a priori knowledge of the behaviour of the function. Because
of this, we will concentrate on methods not requiring calculation of derivatives. Another
computational issue is that of storage: we need to decide between methods requiring
storage of order N (N is the number of dimensions of the minimisation problem), and
order N^2.
2.2 Golden Section Search
This method is one of the simplest for determining extrema, and describes the successive
bracketing of a minimum of a function f to provide an improving 'guess' at the 'real' value
of the minimum.
Figure 7 Golden Section Search
Figure 7 (see Press et al., p. 398) shows successive bracketing of a minimum. The minimum is originally bracketed by points 1, 3 and 2. The function is evaluated at 4, which replaces 2; then at 5, which replaces 1; then at 6, which replaces 4. The rule at each stage is to keep a center point that is lower than the two outside points. After the steps shown, the minimum is bracketed by points 5, 3 and 6.
A minimum is bracketed when there is a triplet of points a < b < c (or c < b < a) such that f(b) is less than both f(a) and f(c). In figure 7, points 1, 3 and 2 equate to points a, b and c, i.e. f(3) is less than both f(1) and f(2). We can successively improve this bracketing by performing the following steps:
 Evaluate f at some new point x, either between 1 and 3, or between 3 and 2.
 Choosing a point between 3 and 2, say at 4, we now evaluate f(4).
 f(3) < f(4), so the new bracketing triplet of points is 1, 3 and 4.
 Choose another point, at 5 say (between 1 and 3), and evaluate f(5).
 f(3) < f(5), so the new bracketing triplet is 5, 3 and 4.
In each case, the middle point of the new triplet is the best minimum achieved so far.
This process is repeated until the distance between the two outer points becomes
sufficiently small (i.e. approaches some set limit) - where sufficiently small means
approximately the square root of machine floating point precision.
An important question you should be asking is "How do we choose the new point x each
time?".
Figure 8 Golden Ratio
If we say that b is some fraction w of the way between a and c, then w = (b - a)/(c - a) and, consequently, 1 - w = (c - b)/(c - a). If the new point, x, is some distance z beyond b, then z = (x - b)/(c - a). The next bracketing segment will therefore either be of length w + z relative to the current one, or of length 1 - w. To minimise the worst case, we can choose z to make these equal, i.e. z = 1 - 2w. This gives us the symmetric point to b in the interval a to c, i.e. |b - a| = |x - c|. This implies that x lies in the larger of the two segments (z > 0 only if w < 1/2).
How does the value of w get chosen? It must come presumably from the previous stage of applying the same strategy, so if z is chosen to be optimal, then so was w. This scale similarity implies that z should be the same fraction of the way from b to c (if that is the bigger segment) as was b from a to c, i.e. w = z/(1 - w). We can use these two equations in w and z to form the quadratic equation w^2 - 3w + 1 = 0, yielding w ~ 0.38197 and 1 - w ~ 0.61803. Therefore the optimal bracketing interval (a, b, c) has its middle point b a fractional distance 0.38197 from one end, and 0.61803 from the other. These are the so-called golden mean fractions.
In each step of our golden section search therefore, we select a point a fraction 0.38197
into the larger of the two intervals (from the central point of the triplet). An additional
feature of this method is that if our initial triplet has segments not in the golden ratios, the
procedure outlined above will quickly converge to the proper self-replicating ratios.
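A minimal sketch of the golden section search described above is given below; it is illustrative only (a production implementation such as that in Numerical Recipes adds more careful convergence and bracketing checks), and the tolerance is an arbitrary absolute value rather than the machine-precision-based limit discussed earlier.

```python
import math

GOLD = (3.0 - math.sqrt(5.0)) / 2.0   # ~0.38197, from w^2 - 3w + 1 = 0

def golden_section(f, a, c, tol=1e-6):
    """Return an x that approximately minimises f on the interval [a, c]."""
    b = a + GOLD * (c - a)            # initial interior point
    fb = f(b)
    while abs(c - a) > tol:
        # place the new point a fraction GOLD into the larger segment
        if abs(c - b) > abs(b - a):
            x = b + GOLD * (c - b)
        else:
            x = b - GOLD * (b - a)
        fx = f(x)
        if fx < fb:
            # x becomes the new middle point; the old middle bounds one side
            if x > b:
                a = b
            else:
                c = b
            b, fb = x, fx
        else:
            # keep b as the middle point; x becomes a new outer point
            if x > b:
                c = x
            else:
                a = x
    return b

# e.g. golden_section(lambda x: (x - 1.3) ** 2, 0.0, 3.0) returns ~1.3
```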
2.3 Parabolic Interpolation and Brent's Method in One
Dimension
The golden section search is a method designed to handle the least favourable of function
minimisations: no assumptions are made about the shape of the function in the region of
the minimum, and the minimum is bracketed with greater and greater precision with each
iteration. However, if the function is smooth near the minimum, which is likely for most
'well-behaved' functions, then a parabola fitted through any three points near to the
minimum should take us very close to the actual minimum in one step. Figure 9
illustrates this case.
Figure 9 Convergence by inverse parabolic interpolation (see Press et al., p 403)
This method is known as inverse parabolic interpolation, as we want to find a point x at
which our function y=f(x) is a minimum (not just a value of y). The formula for the
abscissa x that is the minimum of a parabola through three points f(a), f(b) and f(c) is:
x = b - (1/2) * [ (b - a)^2 (f(b) - f(c)) - (b - c)^2 (f(b) - f(a)) ] / [ (b - a) (f(b) - f(c)) - (b - c) (f(b) - f(a)) ]
This equation fails only if the three points are collinear. However, it is just as likely to
jump to a maximum as a minimum, and somehow this tendency has to be overcome.
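A sketch of a single inverse parabolic interpolation step using the formula above is given below; note that it has no protection against (near-)collinear points or against landing on a maximum, which is exactly why it must be embedded in a safeguarded scheme such as Brent's method.

```python
def parabolic_step(a, b, c, fa, fb, fc):
    """Abscissa of the vertex of the parabola through (a, fa), (b, fb), (c, fc)."""
    num = (b - a) ** 2 * (fb - fc) - (b - c) ** 2 * (fb - fa)
    den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
    return b - 0.5 * num / den

# e.g. for f(x) = (x - 2)**2, any three distinct points return x = 2 exactly:
# parabolic_step(0.0, 1.0, 3.0, 4.0, 1.0, 1.0) == 2.0
```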
A technique likely to be both stable and efficient is one which will use a slow-but-sure
method such as the golden section search in unfavourable areas, but can also switch to
parabolic interpolation when it comes sufficiently close to a minimum. In this way, we
can be sure we are approaching a minimum rather than a maximum (or vice versa) before
switching to the more efficient parabolic method. The main difficulties of this approach
are in finding a quick, robust scheme for deciding which of the two techniques is likely to
be favourable, in addition to being careful not to get into problems when we are very near
to a solution and the function is being evaluated close to the machine precision mentioned
earlier.
Brent's method is such a scheme. At each stage, it keeps track of six function points, a, b,
u, v, w, and x. The minimum is bracketed between a and b; x is the point with the least
function value found so far (or the most recent in the case of a tie); w is the point with the
second least function value; v is the previous value of w; u is the point at which the
function was evaluated most recently.
Parabolic interpolation is tried through the points x, v, and w. To be accepted as a
successful move, the parabolic step must fall between a and b and imply a movement
from the best current value, x, that is less than half the movement of the step before last.
This second criterion ensures that the parabolic steps are actually converging to
something. In the worst possible case, where the parabolic steps are acceptable but
useless, the method will approximately alternate between parabolic steps and golden
sections, converging in due course by virtue of the latter. Press et al. explain the reason
for choosing the step before last (as opposed to the previous step) as purely heuristic i.e.
through experience, it seems better not to 'punish' the algorithm for taking one bad step if
it can make it up on the next one.
2.4 Multidimensional minimisation methods
The minimisation problem becomes far more complex in the multidimensional case,
which (not surprisingly!) is of most interest to us. Non-linear canopy reflectance models
are likely to be functions of many parameters (the Kuusk model, for example, has 15
separate parameters). The next section describes some methods that we can use in these
cases. Excellent starting points for material regarding optimisation problems, covering a huge range of optimisation techniques and the associated literature, are
http://solon.cma.univie.ac.at/%7Eneum/glopt.html
and
http://www.csc.fi/math_topics/Movies/Opt.html. They include some excellent visualisations and animations of optimisation problems.
2.4.1 Downhill Simplex Method (aka amoeba or Nelder-Mead)
This is the only multidimensional minimisation method we will come across here which
does not make use of a 1-D minimisation algorithm like the ones described above. The
downhill simplex method (due to Nelder and Mead), is a method requiring only function
evaluations, not derivatives. It is a relatively simple method, and although not particularly
efficient (in comparison to Powell, which we will come across later), it can often be the
most robust.
A simplex is the geometrical figure consisting, in N dimensions, of N+1 vertices and all
their interconnecting line segments, polygonal faces etc. In 2-D this is a triangle, in 3-D a
tetrahedron. If any point of the simplex is taken as the origin, then the other N points
define vector directions that span the N-dimensional vector space. There is no analogous
bracketing method (as we saw in 1-D) in multidimensions. All we can do is give the
algorithm a starting guess i.e. an N-vector of independent variables to try as the first
point. The algorithm is then supposed to make its own way "downhill" (in function
space) until it finds a minimum. Figure 10 illustrates this idea.
Figure 10 - possible outcomes for a step in the downhill simplex method
(Press et al., p 409).
If we consider the starting point, in N dimensions we must define N+1 points, which
define the initial simplex. If we define one of these points as P_0, then the other N points are simply:
P_i = P_0 + k e_i
where the e_i are N unit vectors, and where k is a constant which is our guess of the
problem's characteristic length scale. We proceed by taking a series of steps away from
the starting point, and seeing if these steps take us to a more favourable position (closer
to a minimum). The steps available are (a) a reflection away from the high point (b) a
reflection and expansion away from the high point (c) a contraction along one dimension
from the high point or (d) a contraction along all dimensions towards the low point.
When it can, the algorithm expands the simplex in one or other of the directions, to take
larger steps. When it reaches a "valley floor" (think of a long, steep-sided valley sloping
away from us), it contracts itself in the transverse (cross-valley) direction in order to ooze
down the valley. If these steps are not available, the simplex can contract around its
lowest (best) point.
How do we know when we have reached a minimum? This can be a difficult question in
any multidimensional minimisation algorithm. The sensible thing to do is to see whether
one of two criteria is satisfied: (i) stop when the vector distance moved in one step is less than some tolerance, or (ii) stop when the decrease in the function value in one step is fractionally smaller than some defined tolerance.
A final warning is that both these termination criteria can be fooled by a single,
anomalous, step which doesn't go anywhere. To avoid this, it is a good idea to restart any multidimensional minimisation algorithm from the point where it claims to have found a minimum. If it "really" has, this will not be a costly move; if it hasn't, the restart will carry the search on towards a better minimum.
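As an illustration, the sketch below runs a downhill simplex (Nelder-Mead) inversion using the SciPy implementation on a toy two-band version of equation 4; the 'observations' are synthetic and the model is a stand-in, not a real canopy reflectance model.

```python
import numpy as np
from scipy.optimize import minimize

def forward_model(params):
    """Toy two-band version of equation 4: params are (LAI, soil brightness)."""
    lai, soil = params
    leaf = np.array([0.2, 0.9])                 # visible, NIR leaf reflectance
    return leaf * (1 - np.exp(-lai / 2)) + soil * np.exp(-lai / 2)

observed = forward_model([2.0, 0.15])           # synthetic 'observations'

def error_function(params):
    return np.sqrt(np.mean((observed - forward_model(params)) ** 2))

result = minimize(error_function, x0=[1.0, 0.3], method='Nelder-Mead')
print(result.x)   # should recover approximately LAI = 2.0, soil = 0.15
# It is worth restarting from result.x, as suggested above, to check that the
# same minimum is found again.
```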
2.4.2 Direction Set (Powell's) Method
Another technique for multidimensional minimisation is Powell's conjugate set method.
This is another method not requiring the calculation of gradients; however, this time it is a
method which utilises the 1-D line minimisation techniques we saw earlier. If we start at
a point P in N-dimensional space, and proceed from there in some vector direction n,
then any function of N variables f(P) can be minimised along the line n by using the 1-D
methods. Different methods differ only in the way they choose the next direction, n, to
try.
Let us look at the simplest direction set method. If we take the unit vectors e1, e2,....eN as
a set of directions, we should be able to cycle through the directions, using our 1-D line
minimisation methods to minimise along each direction in turn, before going on to the
next. This is a good method in some cases, but figure 11 shows where this method
struggles.
Figure 11 (see Press et al. p. 414)
In the case of a long, narrow valley, not aligned with our chosen axes (X and Y in this
case), this method has to follow our base axes, and so must take many small steps to
crawl towards the minimum. In this case, a more sensible choice of directions is needed.
All direction set methods consist of recipes for updating the set of directions as the
method progresses, attempting to come up with a set which either (i) includes some very
good directions that will take us fast along narrow valleys (e.g. at 45° to X and Y in figure 11), or else (ii) includes some number of "non-interfering" directions with the
special property that minimisation along one is not "spoiled" by subsequent minimisation
along another, so that we do not have to continually cycle through all directions.
Powell's method does just this. These "non-interfering" directions are more formally
defined as conjugate directions, and Powell's method produces N mutually conjugate
directions along which to minimise. First, we initialise a set of directions u_i = e_i, where i = 1,...,N. Next we repeat the following steps:
 Save the starting position as P_0.
 For i = 1,...,N, move P_{i-1} to the minimum along direction u_i and call this point P_i.
 For i = 1,...,N-1, set u_i = u_{i+1}.
 Set u_N = P_N - P_0.
 Move P_N to the minimum along direction u_N and call this point P_0.
This method is illustrated in figure 12.
Figure 12 - Powell's conjugate direction set method.
The first steps move to a new point, P, and line minimisation continues from P along a new set of directions x' and y, where x' is the resultant direction after our first step. In this case, it is clear that ideally we should be minimising along the axes x_ideal and y_ideal, but even after one step our new set of directions is far closer to these than the original set, which is promising.
It can be demonstrated that N iterations of the basic procedure i.e. N(N+1) line
minimisations will exactly minimise a quadratic form. There is, however, one major
drawback to Powell as stated - the procedure of throwing away, at each stage, u_1 in favour of P_N - P_0 tends to produce sets of directions that "fold up on each other", i.e. they all start to point in the same way! If this happens, then the algorithm converges only to a minimum of the function over some subspace of the full N-dimensional case, giving the wrong answer. To avoid this, we can try a number of things:
 Reinitialise the set of directions u_i to the basis vectors e_i after every N or N+1 iterations. This is OK if it is important to retain the quadratic convergence property, which may be the case if our function is close to quadratic.
 Brent states that the direction set can just as easily be reset to the columns of any orthogonal matrix. Rather than throw away the information on conjugate directions already built up, he suggests reconfiguring the direction set to the calculated principal directions of this matrix.
 Powell himself suggested that we might sacrifice the quadratic convergence property in favour of a more heuristic scheme which attempts to find a few "good" directions along narrow valleys instead of N necessarily conjugate directions.
In practice, any of these adjustments to the basic Powell method may be used, with each
having slight differences in performance under different situations.
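For comparison, SciPy also provides an implementation of Powell's direction set method; the sketch below applies it to a standard narrow-valley test function (a Rosenbrock-style surface) of the kind illustrated in figure 11, and is illustrative only.

```python
from scipy.optimize import minimize

def valley(p):
    """A long, narrow, curved valley not aligned with the coordinate axes."""
    x, y = p
    return 100.0 * (y - x ** 2) ** 2 + (1.0 - x) ** 2

result = minimize(valley, x0=[-1.2, 1.0], method='Powell')
print(result.x)   # should end up near the minimum at [1.0, 1.0]
```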
2.4.3 Simulated Annealing
We will now briefly look at a few completely different and novel minimisation
algorithms that move away from the idea of N-dimensional minimisation as a
combination of a few relatively simple techniques such as line minimisation in 1-D and
derivative methods. The first of these is simulated annealing. The minimisation
algorithms we have looked at up to now all roughly follow the same principles - have a
first guess at a solution, minimise in some linear direction, and keep going downhill (in
function space) until we reach a minimum. These methods are all very well, but their
common flaw is that they immediately head for the fastest downhill route, and will often
find a local minimum, as opposed to a global one - if we want to avoid this, we generally
have to restart the whole thing from another location and see if it converges to the same
solution. In a situation where the function has only a few distinct minima this may be
sufficient, but in the case where the function has many small, poorly defined minima, this
may be of little use. Figure 13 illustrates this problem.
Figure 13 - schematic representation of a function with several local minima, A, B, C,
and D, and a global minimum at E.
Simulated annealing is so named because it has a direct thermodynamic analogy with the
annealing, or slow cooling, of metals, or the crystallisation of liquids. At high
temperatures, the molecules of a liquid move freely with respect to each other. If the
liquid is cooled slowly, thermal mobility is lost, and the atoms are often able to line
themselves up and form a pure crystal that is completely ordered over a distance of up to
billions of times the size of an individual atom in all directions. This crystalline state is
the state of minimum energy of the system. For slowly cooled systems, Nature is able to
find this state every time. If a liquid metal is quenched, or cooled quickly, it ends up in
some polycrystalline or amorphous state having much higher energy (this situation is
partly analogous to the other minimisation processes we have considered). How is Nature
able to solve such an incredibly complex optimisation problem, seemingly with no effort,
when we are struggling to solve problems requiring minimsation of only a few
parameters?
The basic process is one of slow cooling, allowing time for the atoms to redistribute
themselves as they lose mobility (energy). The annealing process is based on the very
important Boltzmann probability distribution
Prob(E) ~ exp(-E/kT)
where k is a constant relating temperature to energy. This equation simply states that a
system in thermal equilibrium at temperature T has its energy probabilistically distributed
among all different energy states E. That is, all energy states are possible, but some are
far more likely than others. Even at low T there is a small probability that the system may
be in a high energy state. There is, therefore, a corresponding chance for the system to get
out of a local energy minimum in favour of finding a better, more global one. This means
our system can go uphill occasionally, as well as downhill, although as the temperature
falls, it is less and less likely that it will go uphill.
The algorithm for simulated annealing is implemented in the following way. Offered a
succession of options, a simulated thermodynamic system is assumed to change its
configuration from energy E1 to E2 with probability P = exp[-(E2 - E1)/kT]. If E2 < E1, P > 1; in this case, P is arbitrarily assigned the value 1, i.e. the system always takes such an option. However, if E2 > E1, i.e. the option is to go to a higher energy state, P is positive,
but less than 1 i.e. there is a small but finite probability that the system may take this
option. Hence our system will always take a downhill step that is offered to it, but will
sometimes take an uphill one if offered that.
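A minimal sketch of this acceptance rule applied to a 1-D error function is given below; the step size, cooling rate and iteration counts are arbitrary illustrative choices rather than a recommended annealing schedule.

```python
import math
import random

def simulated_annealing(f, x0, t0=1.0, cooling=0.95, steps_per_t=50, n_temps=100):
    """Minimise a 1-D function f using the acceptance rule described above."""
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    t = t0
    for _ in range(n_temps):
        for _ in range(steps_per_t):
            x_new = x + random.uniform(-0.5, 0.5)          # a random 'option'
            f_new = f(x_new)
            # always accept a downhill step; accept an uphill step with
            # probability exp(-(E2 - E1)/T)
            if f_new < fx or random.random() < math.exp(-(f_new - fx) / t):
                x, fx = x_new, f_new
                if fx < best_f:
                    best_x, best_f = x, fx
        t *= cooling                                        # annealing schedule
    return best_x, best_f

# e.g. a 1-D error surface with several local minima:
print(simulated_annealing(lambda x: x ** 2 + 3.0 * math.sin(5.0 * x), 4.0))
```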
In practice, we need to provide the following elements if we wish to use this algorithm:
 A description of possible system configurations i.e. available states that the system
"could" be in.
 A generator of random changes in the configurations - these changes are the "options"
presented to the system.
 An objective function E (analog of energy) whose minimisation is the goal of the
procedure.
 A control parameter T (analog of temperature) and an annealing schedule which tells
how it is lowered from high to low values e.g. after how many random changes in
configuration is each downward step in T taken, and how large is that step.
Figure 14 illustrates this procedure.
Figure 14 - a 1-D function with 6 separate energy maxima and minima.
In the annealing process, going from E1 to E2, E3 to E4 and E5 to E6 are all downhill
steps, and so have a probability of 1. Going from E2 to E3, and from E4 to E5, are uphill steps, and consequently have a small but finite probability of occurring. This small probability is what controls the possibility of going from the local minima at E2 and E4 to the global minimum at E6. Other minimisation algorithms that we have looked at do not
possess this ability.
The simulated annealing idea is applicable to the optimisation of continuous N-dimensional functions, f(x), where x is an N-dimensional vector. The four elements
required by the above procedure are now as follows: the value of f is the objective
function. The system state is the point x. The control parameter T is, as before, something
like a temperature, with an annealing schedule by which it is gradually reduced. There
must also be a generator of random changes in the configuration i.e. a procedure for
taking a random step from x to x + dx, which are the new "options" available to the
system. In practice, this may be the most difficult of the steps to implement, and a
number of solutions have been proposed for this (see Press et al., p 451).
All applications of simulated annealing suffer from the difficulty of defining the rate of
cooling. How slow is "sufficiently slow"? The success of the approach may be controlled
by this factor. There are a number of possibilities worth trying:
 Reduce T to (1-e)T after every m moves, where e/m is determined by experiment.
 Budget a total of K moves, and reduce T after every m moves to a value T = T0(1 - k/K)^a, where k is the cumulative number of moves thus far, and a is a constant, say 1, 2 or 4. The optimal value of a depends on the statistical distribution of relative minima of various depths. Larger values of a spend more iterations at lower temperature.
 After every m moves, set T to b(f1 - fb), where b is an experimentally determined constant of order 1, f1 is the current lowest function value available, and fb is the "best-ever" function value encountered.
Simulated annealing has been applied in practice to solve a classic minimisation problem
known as the "Travelling Salesman" problem (see Press et al., p 445-450). The aim is to
find the shortest possible route for a travelling salesman visiting N arbitrarily located
cities in turn before returning to the starting point (visiting each city only once). This kind
of problem is known as an NP-complete problem, and computation time for an exact
solution rises proportionally to exp(kN), where k is some constant. As N increases, the time
taken to try all solutions rapidly becomes prohibitive. Simulated annealing solves this
with the number of calculations only scaling as a small power of N. This has made
simulated annealing particularly useful in specialised problems such as that of locating
components during chip design in order to minimise heat generation and interference.
3 Other Methods
There are a variety of other intriguing methods of multidimensional function optimisation
that have arisen from: (i) the need to find sophisticated methods to solve a huge range of optimisation problems across mathematics, physics, engineering and modelling applications, many of which do not surrender easily to the "standard" methods mentioned above; (ii) increases in computing power that have pushed the development of more sophisticated algorithms; and (iii) the observation that Nature has created very elegant ways of solving seemingly impossible optimisation tasks.
3.1 (Artificial) Neural networks (ANN)
These methods are based on the observation that biological neural networks (i.e. brains) are exceptionally good at solving incredibly complex non-linear optimisation problems. They do this by "training" system responses to behave according to previously experienced situations, and adapting these responses to new situations.
There is no universally accepted definition of an ANN, but a general definition is that an
ANN is a network of many simple processors ("units"), each possibly having a small
amount of local memory. The units are connected by communication channels
("connections") which usually carry numeric (as opposed to symbolic) data, encoded by
any of various means. The units operate only on their local data and on the inputs they
receive via the connections. The restriction to local operations is often relaxed during
training.
This figure schematically illustrates the structure of a simple feed-forward neural net with
only one hidden layer, and one output (the figure comes from
http://www.cbs.dtu.dk/services/SignalP/methods_nn.html).
The neurons have several inputs but only one output. The output from a neuron is
calculated as a function of a weighted sum of the inputs. A threshold can be set so that
the neuron can have a non-linear output response: if the weighted sum is less than the threshold, the output is zero; if it is greater than the threshold, the output is 1, or the weighted sum of
the inputs. In an ANN, neurons are connected (with the output of one neuron being the
input of another neuron). When the ANN is presented with an input (e.g. a set of
reflectance data) it will calculate an output that can be interpreted as a classification of
the input (e.g. the model parameter values that result in this reflectance). It is possible to
``teach'' a neural network how to make a classification by repeatedly presenting it with a
set of known inputs (the training set - e.g. directional reflectance data sets) and
simultaneously modifying the weights associated with each input in such a way that the
difference between the desired output (e.g. measured reflectance) and the actual output
(e.g. modelled reflectance) is minimised according to some measure such as RMSE. The
ability of the ANN to change the output to reflect the input is important: it is this that gives the network the capability to learn. The more (distinct) training data we can throw at the
ANN, the more refined the behaviour of the ANN should become i.e. better able to
distinguish between finer variations in the input data. This is achieved via back
propagation training.
A general Neural Network algorithm is as follows:
 Set the initial weights of each node
 Repeat
 Show input example and desired output
 Work out actual output
 Update weights
 Until no more examples
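The sketch below implements this loop for a tiny single-hidden-layer network trained by back propagation; the 'training set' is synthetic reflectance generated from the toy model of equation 4, and the network size, learning rate and number of epochs are arbitrary illustrative choices rather than a recommended configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training set: two-band reflectances generated from the toy model of
# equation 4 (leaf reflectances 0.2 and 0.9, soil 0.15), with LAI as the target.
lai = rng.uniform(0.0, 6.0, size=(500, 1))
leaf, soil = np.array([0.2, 0.9]), 0.15
X = leaf * (1 - np.exp(-lai / 2)) + soil * np.exp(-lai / 2)
y = lai / 6.0                                   # target scaled to [0, 1]
mu, sd = X.mean(axis=0), X.std(axis=0)
Xn = (X - mu) / sd                              # standardise the inputs

# set the initial weights of each node (one hidden layer of 8 tanh units)
W1, b1 = rng.normal(0.0, 0.5, (2, 8)), np.zeros(8)
W2, b2 = rng.normal(0.0, 0.5, (8, 1)), np.zeros(1)
lr = 0.2

for epoch in range(5000):
    # show the inputs and work out the actual output (forward pass)
    h = np.tanh(Xn @ W1 + b1)
    out = h @ W2 + b2
    err = out - y                               # actual minus desired output
    # update the weights (back propagation of the mean squared error)
    dW2, db2 = h.T @ err / len(Xn), err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h ** 2)
    dW1, db1 = Xn.T @ dh / len(Xn), dh.mean(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

# estimate LAI for a canopy with true LAI = 2.0; the prediction should move
# towards 2 as training proceeds
test = (leaf * (1 - np.exp(-1.0)) + soil * np.exp(-1.0) - mu) / sd
print(6.0 * (np.tanh(test @ W1 + b1) @ W2 + b2))
```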
ANNs have a number of advantages for solving the types of non-linear optimisation
problems we have been considering: they can be used where we can't formulate an
algorithmic solution, where we can get lots of examples of the behaviour we require or
where we need to pick out the structure from existing data. A major drawback is that
when we change the nature of the training data e.g. different angular or spectral sampling,
then the ANN must be retrained. This suggests that ANN inversion of multispectral data
might be a possibility for fixed geometry, multiple waveband sensors (e.g. TM) but is not
so useful in the general case (arbitrary viewing and illumination geometry). There are a
number of examples of the application of ANNs to the problem of inverting biophysical
parameters from optical and microwave data of vegetation canopies e.g. Chuah, 1993;
Pierce et al., 1994; Gopal and Woodcock, 1996; Jin and Liu, 1997; Abuelgasim et al.,
1998. In addition, ANNs have proved very promising for the classification of EO data,
where supervised methods (requiring training data) are already the norm e.g. Foody,
1997a and 1997b.
An excellent introduction to ANNs is at Stirling University with many external links. For
more information on neural nets see the page mentioned above, or the Neural Net FAQ.
3.2 Genetic (or evolutionary) algorithms (GAs)
It has long been noted that Nature manages to solve THE optimisation problem, i.e.
maximisation of species survival and propagation, through genetic mutation and the
subsequent favouring of a few of these mutations in the long term. If we can phrase our
optimisation problems in terms of some non-linear "fitness-for-survival" function, genetic
mutation and propagation models can be applied to these problems in order to find
optimal solutions. Effectively a GA is a method by which some description of the
function state (known as a "string" - equivalent to a gene pattern) is changed in some way
according to a probabilistic model (typically crossover, mutation or inversion). In the
case of a BRDF model, this would be the N-D vector representing the current state of the
model parameters. This mutation is then propagated to the subsequent generation in some
predefined way, such as "mating" - the recombination of two parent strings to create an
offspring string. Finally, the new generation "mutation" is tested to see if it better fulfils
our survival criteria (or has a lower RMSE in terms of model fit). In this way we can see
that the GA progresses (as in Nature) through mutation and selection via sexual
reproduction, characterised by recombining two parent strings into an offspring.
A general Genetic Algorithm is as follows (see http://osiris.sunderland.ac.uk/cbowww/AI/TEXTS/ML2/sect4.htm for an illustrated example of this):
 Populate a set of chromosomes
 Repeat
 Determine fitness of chromosomes
 Choose best chromosomes
 Evolve chosen chromosomes using crossover, mutation or inversion
 Until a chromosome is found of a suitable fitness
GAs differ from normal optimisation methods in four ways:
 They work with a coding of the parameter set, not the parameters themselves.
 They search from a population of points, not a single one.
 They use 'payoff' information (some objective function), not derivatives or other auxiliary knowledge.
 They use probabilistic transition rules, not deterministic ones (c.f. simulated annealing).
The keys to GAs are the initial coding of the description of the function state (into strings),
and the way in which these strings are changed at each step according to some rule (e.g.
crossover or mutation, mentioned above).
The most basic GA is as follows:
 STEP 0: Define a genetic representation of the problem.
 STEP 1: Create an initial population P(0) = x_1^0, ..., x_N^0. Set t = 0.
 STEP 2: Compute the average fitness f(t) = SUM_{i=1..N} f(x_i)/N. Assign each individual the normalised fitness value f(x_i)/f(t).
 STEP 3: Assign each x_i a probability p(x_i, t) proportional to its normalised fitness. Using this distribution, select N vectors from P(t). This gives the selected set of parents.
 STEP 4: Pair all parents at random, forming N/2 pairs. Apply crossover with a certain probability to each pair, and other genetic operators such as mutation, forming a new population P(t+1).
 STEP 5: Set t = t + 1, return to STEP 2.
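A minimal sketch of this basic GA for a real-valued parameter vector is given below (each 'chromosome' is simply an N-D vector rather than a bit string); the fitness shift used for the roulette selection, and all the rates, ranges and population sizes, are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_ga(fitness, n_params, pop_size=50, n_generations=100,
           crossover_rate=0.7, mutation_rate=0.1, bounds=(0.0, 1.0)):
    lo, hi = bounds
    pop = rng.uniform(lo, hi, size=(pop_size, n_params))           # STEP 1
    best_p, best_fit = None, -np.inf
    for _ in range(n_generations):
        fit = np.array([fitness(p) for p in pop])                  # STEP 2
        if fit.max() > best_fit:
            best_fit, best_p = fit.max(), pop[int(np.argmax(fit))].copy()
        # STEP 3: roulette selection (fitness shifted to be non-negative,
        # since the example fitness below is negative)
        prob = fit - fit.min() + 1e-9
        prob /= prob.sum()
        parents = pop[rng.choice(pop_size, size=pop_size, p=prob)]
        # STEP 4: single-point crossover on random pairs, then mutation
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):
            if rng.random() < crossover_rate and n_params > 1:
                cut = rng.integers(1, n_params)
                children[i, cut:] = parents[i + 1, cut:]
                children[i + 1, cut:] = parents[i, cut:]
        mutate = rng.random(children.shape) < mutation_rate
        children[mutate] = rng.uniform(lo, hi, size=int(mutate.sum()))
        pop = children                                             # STEP 5
    return best_p

# e.g. maximise a fitness whose optimum is at 0.3 in every parameter:
best = run_ga(lambda p: -np.sum((p - 0.3) ** 2), n_params=3)
print(best)   # should lie roughly around [0.3, 0.3, 0.3]
```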
Some advantages of GAs are their flexibility and power - they may be able to solve
problems characterised by very many small, ill-defined minima, the sort of problem that
is typically almost impossible to solve by standard methods. Another major advantage is
that they are intuitively understandable. The difficulty of using GAs practically is that
they combine two different search strategies: random search by mutation and a biased
search by recombination of the strings contained in a population, and defining the balance
between them is by no means straightforward. More relevant to problems in CR model
inversion is the fact that it may take a huge number of generations (iterations) for such a
system to converge to a solution.
I have downloaded several useful references to GAs, of which there are copies in the
RSU. Also, see Goldberg (1988), MATHS NA20 GOL in the DMS Watson library.
3.3 Knowledge-based systems (KBS)
KBS comprise a variety of methods seeking to solve complex computational problems
using sources of information external to the problem itself. In CR model inversion, work
done on KBS attempts to use an understanding of the inversion problem to drive the
optimisation process. An example in vegetation modelling is the work of Kimes and
Harrison (1990) and Kimes et al. (1991) who developed VEG, a system for inferring
physical and biological surface properties from reflectance data sets. VEG integrates the
traditional spectral (directional) data with diverse knowledge bases derived from
literature, field-measured directional reflectance data sets and information from human
experts into an efficient system for inferring biophysical parameters from reflectance
data. Significant developmental work was done on KBS in the late 80s to early 90s, but the
major problem of such systems, namely how the knowledge is defined and input into the
system, has never been overcome satisfactorily. This approach has largely been taken
over by ANNs, where the decision as to how a particular input is mapped to a response is
concealed in the 'black box' hidden layer(s) of neurons. The drawback of this method, of
course, is that no knowledge of how the problem is being solved within the black box is
available - something that is made explicit in the KBS approach. A more promising
approach for CR model inversion than either ANNs (limited by training data) or KBS
(how do we define the knowledge data?) is that of look-up tables (LUTs).
3.4 LUT Inversion
We noted LUT inversion earlier in these notes as a potentially useful method, particularly
for optical remote sensing with a high information content. The principles of the method
are simple: define a set of points in parameter space and find some subset of these points
which give the best match (minimum RMSE) to the observations. One major advantage
of LUTs (and ANNs) is that the canopy reflectance modeling does not need to be
performed ‘on the fly’ in the inversion: model simulations are stored in the LUT entries.
This means that one is not limited to a simple analytical model for forward modeling –
any (complex) numerical code can be used – the forward modeling task is divorced from
the inversion procedure.
The key to applying the technique is in defining an original set that samples the parameter
space properly while keeping the number of RMSE calculations to a minimum. Knowledge
of external parameters (e.g. VIs, cover type, viewing/illumination geometry etc.) is then
used to limit the search space in as many ways as possible.
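A rough sketch of the principle in Python is given below; the two-band 'forward model', grid spacing and parameter ranges are illustrative assumptions, not the MODIS algorithm or any published LUT:

# Minimal LUT inversion sketch: precompute model reflectances over a coarse
# parameter grid off-line, then pick the entry with minimum RMSE at run-time.
# The two-band 'forward model' and grids are illustrative assumptions.
import itertools
import math

def forward_model(lai, soil):
    """Stand-in for any (possibly complex, numerical) CR model: returns
    reflectance in two bands (red, NIR)."""
    red = 0.04 + 0.25 * soil * math.exp(-0.6 * lai)
    nir = 0.45 - 0.30 * math.exp(-0.5 * lai) + 0.10 * soil
    return (red, nir)

# Build the LUT once: each entry stores (parameters, simulated reflectance).
lai_grid = [0.25 * i for i in range(33)]        # LAI 0 .. 8
soil_grid = [0.1 * i for i in range(11)]        # soil brightness 0 .. 1
lut = [((lai, soil), forward_model(lai, soil))
       for lai, soil in itertools.product(lai_grid, soil_grid)]

def invert(observed, table):
    """Return the parameter set whose stored simulation gives minimum RMSE."""
    def rmse(sim):
        return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, observed))
                         / len(observed))
    return min(table, key=lambda entry: rmse(entry[1]))[0]

# At run-time, external knowledge (e.g. biome type, geometry) would be used to
# subset 'lut' before calling invert(), shrinking the search space.
print(invert(forward_model(3.0, 0.5), lut))     # -> (3.0, 0.5)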
The use of external information (knowledge) to limit the search space implies that LUT
methods are closely related to KBS methods. LUT methods have a simpler requirement
for the use of external data, however, and can be applied quite simply in practice, e.g.
the MODIS LAI/FPAR algorithm (see the MOD15 LAI/FPAR ATBD).
A good example of a LUT inversion is provided by Combal et al. (2002), who compare
LUT inversion with ANN and Quasi-Newton methods. They found that the LUT and
Quasi-Newton methods were more sensitive to model inaccuracies, but that ANN
inversion produced less accurate results overall.
4. Summary
A number of minimisation (more generally, optimisation) techniques that can be used to
derive biophysical parameter information via the inversion of CR models against
reflectance data have been introduced. A brief summary of these, including some of the
pros and cons in each case, is given below.
- The most commonly used non-linear inversion methods (ignoring derivative information) are Powell (conjugate direction set) and Nelder and Mead (amoeba). Relatively fast and robust, but tend to converge to local minima -> need to run several times from different starting positions.
- Simulated annealing - probabilistic partition of energy states of a system - allows a finite probability of the system going from a low -> higher energy state, so the big advantage is the ability to get out of local minima. Drawbacks are that it may be slower, and the difficulty of defining the annealing schedule.
- ANNs - network trained using "known" reflectance data to derive model parameters from measured data. Very fast once trained. The limitation is the need for training data sets to be of the same nature (i.e. viewing and illumination geometry, spectral bands) as the measured data. Has been applied to EO data. Also good for classification.
- GAs - very novel approach to the problem of N-D optimisation - mutation and reproduction, survival-of-the-fittest. The advantage is that long, gradual change may allow for very subtle solutions, but likely to be very slow - few examples of applications in EO (e.g. Clark and Canas, 1993), but many in engineering.
- KBS - use additional information to refine the process of finding a solution, e.g. the Kimes et al. VEG model. Advantages: maximise the use of data, potentially more robust inversion. Disadvantage: how the additional information is defined and used.
- LUT - a subset of KBS (?) - precalculate many CR scenarios (e.g. many viewing/illumination geometries, spectral bands, biome types etc.) prior to run-time. At run-time, follow a search path (tree) using additional data (e.g. biome type) to drastically reduce the search space before searching for the best-fit solution. Fast once the LUT is computed, and the LUT can simply be recomputed if the model is altered. Disadvantage is defining the search path and the additional data to be used.
5. Additional References
- Abuelgasim, A. A., Gopal, S. and Strahler, A. H. (1998) Forward and inverse modelling of canopy directional reflectance using a neural network, International Journal of Remote Sensing, Vol. 19, No. 3, pp. 453-471.
- Brent, R. P. (1973) Algorithms for Minimisation without Derivatives (Englewood Cliffs, NJ: Prentice-Hall).
- Bunday, B. D. (1984) Basic Optimisation Methods (Edward Arnold, London).
- Chuah, H. T. (1993) An artificial neural-network for inversion of vegetation parameters from radar backscatter coefficients, Journal of Electromagnetic Waves and Applications, Vol. 7, No. 8, pp. 1075-1092.
- Clark, C. and Canas, T. (1993) A genetic algorithm approach to spectral identification, Proc. RSS '93, Chester.
- Foody, G. M. (1997a) Fully fuzzy supervised classification of land cover from remotely sensed imagery with an artificial neural network, Neural Computing & Applications, Vol. 5, No. 4, pp. 238-247.
- Foody, G. M. (1997b) An evaluation of some factors affecting the accuracy of classification by an artificial neural network, International Journal of Remote Sensing, Vol. 18, No. 4, pp. 799-810.
- Goldberg, D. E. (1988) Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley. ISBN 0-201-15767-5.
- Gopal, S. and Woodcock, C. (1996) Remote sensing of forest change using artificial neural networks, IEEE Transactions on Geoscience and Remote Sensing, Vol. 34, No. 2, pp. 398-404.
- Jin, Y. Q. and Liu, C. (1997) Biomass retrieval from high-dimensional active/passive remote sensing data by using artificial neural networks, International Journal of Remote Sensing, Vol. 18, No. 4, pp. 971-979.
- Kimes, D. S. and Harrison, P. R. (1990) Inferring hemispherical directional reflectance using a knowledge based system, Proc. IGARSS '90, Washington, pp. 1759-1764.
- Kimes, D. S., Harrison, P. R. and Ratcliffe, P. A. (1991) A knowledge-based expert system for inferring vegetation characteristics, International Journal of Remote Sensing, Vol. 12, No. 10, pp. 1987-2020.
- Kimes, D. S. and Nelson, R. F. (1998) Attributes of neural networks for extracting continuous vegetation variables, International Journal of Remote Sensing, Vol. 19, No. 14, pp. 2639-2663.
- Pierce, L. E., Sarabandi, K. and Ulaby, F. T. (1994) Application of an artificial neural-network in canopy scattering inversion, International Journal of Remote Sensing, Vol. 15, No. 16, pp. 3263-3270.
- Press, W. H., Teukolsky, S. A., Vetterling, W. T. and Flannery, B. P. (1994) Numerical Recipes in C (2nd ed., Cambridge University Press, 994 pp.).
- Privette, J. L. (1994) An efficient strategy for the inversion of bidirectional reflectance models with satellite remote sensing data, Ph.D. thesis, University of Colorado (postscript version, 3.12 MB).