GEOGG121: Methods
Linear & non-linear models
Dr. Mathias (Mat) Disney
UCL Geography
Office: 113, Pearson Building
Tel: 7670 0592
Email: mdisney@ucl.geog.ac.uk
www.geog.ucl.ac.uk/~mdisney

Lecture outline
• Linear models and inversion
– Least squares revisited, examples
– Parameter estimation, uncertainty
• Non-linear models and inversion
– The non-linear problem
– Parameter estimation, uncertainty
– Approaches

Reading
• Linear models and inversion
– Linear modelling notes: Lewis (2010); ch. 2 of Press et al. (1992) Numerical Recipes in C (http://apps.nrbook.com/c/index.html)
• Non-linear models and inversion
– Gradient descent: NR in C, section 10.7, e.g. BFGS; XXX
– Conjugate gradient etc.: NR in C, ch. 10; XXX
– Liang, S. (2004) Quantitative Remote Sensing of the Land Surface, ch. 8 (section 8.2)

Linear Models
• For some set of independent variables x = {x0, x1, x2, …, xn} we have a model of a dependent variable y which can be expressed as a linear combination of the independent variables:
y = a0 + a1x1
y = a0 + a1x1 + a2x2
y = Σ(i=0..n) ai xi
or, in vector form, y = a·x

Linear Models?
• 'Linear' here means linear in the parameters, not in x:
y = a0 + Σ(i=1..n) ai sin(xi + bi) — non-linear (in the bi)
y = a0 + Σ(i=1..n) [ai sin(xi) + bi cos(xi)] — linear
y = Σ(i=0..n) ai x^i = a0 + a1x + a2x² + … + anx^n — linear (a polynomial in x, linear in the ai)
y = a0 e^(a1 x) — non-linear (in a1)

Linear Mixture Modelling
• Spectral mixture modelling:
– Proportionate mixture of n end-member spectra:
ρ = Σ(i=0..n−1) ρi Fi, with Σ(i=0..n−1) Fi = 1
where Fi is the proportion of end-member i and ρi its reflectance
– First-order model: no interactions between components

Linear Mixture Modelling
• r = {rλ0, rλ1, …, rλ(m−1), 1.0} — the measured reflectance spectrum (m wavelengths), augmented with 1.0 to carry the constraint Σ Fi = 1
• In matrix form: r = A F, where F = (F0, F1, …, Fn−1)ᵀ is the vector of proportions and A is the (m+1) × n matrix whose i-th column is the spectrum of end-member i, (ρλ0,i, ρλ1,i, …, ρλ(m−1),i, 1.0)ᵀ — the final row of 1s expresses the constraint Σ Fi = 1

Linear Mixture Modelling
• If n = m+1, A is a square matrix and can be inverted directly: F = A⁻¹ r
• E.g. two wavebands plus the constraint row allows three end-members
[Figure: measured spectrum r plotted in band 1 / band 2 reflectance space, lying inside the triangle formed by end-members 1, 2 and 3.]

Linear Mixture Modelling
• As described, is not robust to error in measurement or end-member spectra
• Proportions must be constrained to lie in the interval (0,1) — effectively a convex hull constraint
• At most m+1 end-member spectra can be considered
• Needs prior definition of end-member spectra
• Cannot directly take into account any variation in component reflectances, e.g. due to topographic effects

Linear Mixture Modelling in the presence of Noise
• r = A F + e
• Define the residual vector e
• Minimise the sum of the squares of the error e — the Method of Least Squares (MLS):
eᵀe = (r − AF)ᵀ(r − AF) = Σλ (rλ − Σi ρλ,i Fi)²

Error Minimisation
• Set the (partial) derivatives with respect to each Fi to zero:
∂(eᵀe)/∂Fi = −2 Σλ ρλ,i (rλ − Σj ρλ,j Fj) = 0
which gives, for each i:
Σλ rλ ρλ,i = Σλ ρλ,i Σj ρλ,j Fj
• Can write these normal equations as O = M P, with
Oi = Σλ rλ ρλ,i, Mij = Σλ ρλ,i ρλ,j, P = F
• Solve for P by matrix inversion: P = M⁻¹ O

e.g. Linear Regression
• y = c + mx, so P = (c, m)ᵀ. Dividing the normal equations through by the number of samples n, and writing ⟨·⟩ for the sample mean:
O = (⟨y⟩, ⟨xy⟩)ᵀ, M = [1, ⟨x⟩; ⟨x⟩, ⟨x²⟩]
M⁻¹ = 1/(⟨x²⟩ − ⟨x⟩²) × [⟨x²⟩, −⟨x⟩; −⟨x⟩, 1]
so that
m = (⟨xy⟩ − ⟨x⟩⟨y⟩) / (⟨x²⟩ − ⟨x⟩²)
c = (⟨x²⟩⟨y⟩ − ⟨x⟩⟨xy⟩) / (⟨x²⟩ − ⟨x⟩²) = ⟨y⟩ − m⟨x⟩

RMSE
RMSE = sqrt[ Σ(i=0..n−1) (yi − (c + m xi))² / (n − 2) ]
(n samples, 2 model parameters, hence n − 2 degrees of freedom)
[Figure: fitted regression line y = c + mx through the data points.]

Weight of Determination (1/w)
• Calculate the uncertainty of the prediction y(x) at some x:
y(x) = Qᵀ P, with Q = (1, x)ᵀ and P = (c, m)ᵀ
1/w = Qᵀ M⁻¹ Q
so the uncertainty of y(x) is RMSE × sqrt(1/w)
[Figure: fitted line P0 + P1 x with ± RMSE × sqrt(1/w) uncertainty bounds, narrowest near the mean of x.]
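A minimal numerical sketch of the normal-equations machinery above, for the straight-line case (the data values are invented for illustration, and numpy's linear solver stands in for explicit matrix inversion — numerically preferable to forming M⁻¹):

```python
import numpy as np

# Least squares via the normal equations O = M P for y = c + m x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.9, 3.1, 5.0, 7.2, 8.9, 11.1])   # noisy samples of a line

A = np.vstack([np.ones_like(x), x]).T            # rows are (1, x_i)
M = A.T @ A                                      # [[n, sum x], [sum x, sum x^2]]
O = A.T @ y                                      # [sum y, sum xy]
c, m = np.linalg.solve(M, O)                     # P = M^-1 O

e = y - (c + m * x)                              # residual vector
rmse = np.sqrt(e @ e / (len(x) - 2))             # n - 2 degrees of freedom

# Weight of determination 1/w and uncertainty of the prediction at x = 2.5
Q = np.array([1.0, 2.5])
inv_w = Q @ np.linalg.solve(M, Q)                # Q^T M^-1 Q
print(c, m, rmse, rmse * np.sqrt(inv_w))
```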
Issues
• Parameter transformation and bounding
• Weighting of the error function
• Using additional information
• Scaling

Parameter transformation and bounding
• Issue of variable sensitivity — e.g. saturation of LAI effects
• Reduce by transformation:
– Approximately linearise the parameters
– Need to consider 'average' effects

Weighting of the error function
• Different wavelengths/angles have different sensitivity to the parameters
• Previously, all observations were weighted equally — equivalent to assuming the 'noise' is equal for all observations:
RMSE = sqrt[ Σ(i=1..N) (ρi,measured − ρi,modelled)² / N ]
• Can 'target' sensitivity — e.g. to chlorophyll concentration — using derivative weighting (Privette 1994), weighting each observation by the sensitivity of the modelled reflectance to the parameter P of interest:
RMSE = sqrt[ Σ(i=1..N) (∂ρi,modelled/∂P)(ρi,measured − ρi,modelled)² / Σ(i=1..N) (∂ρi,modelled/∂P) ]

Using additional information
• Typically, for vegetation, use a canopy growth model — see Moulin et al. (1998)
• Provides an expectation of (e.g.) LAI — needs:
– Planting date
– Daily mean temperature
– Varietal information (?)
• Use in various ways:
– Reduce the parameter search space
– Expectations of coupling between parameters

Scaling
• Many parameters scale approximately linearly — e.g. cover, albedo, fAPAR
• Many do not — e.g. LAI
• Need to (at least) understand the impact of scaling

Crop Mosaic
[Figure: a pixel covering a mosaic of fields with LAI 0, LAI 1 and LAI 4.]
• 20% of the pixel is LAI 0, 40% LAI 4, 40% LAI 1
• 'Real' (area-weighted) total LAI: 0.2×0 + 0.4×4 + 0.4×1 = 2.0
• Simple canopy reflectance model:
ρ = ρ∞ (1 − exp(−LAI/2)) + ρs exp(−LAI/2)
with ρ∞ = 0.2, ρs = 0.1 in the visible, and ρ∞ = 0.9, ρs = 0.3 in the NIR
[Figure: modelled canopy reflectance vs LAI in the visible and NIR, saturating as LAI increases.]
• Canopy reflectance over the pixel is 0.15 in the visible and 0.60 in the NIR
• If we assume the model above, this equates to an LAI of 1.4
• The 'real' answer is LAI 2.0 — the non-linearity of ρ(LAI) means the average of the reflectances is not the reflectance of the average LAI

Linear Kernel-driven Modelling of Canopy Reflectance
• Semi-empirical models to deal with BRDF effects
– Originally due to Roujean et al. (1992)
– Also Wanner et al. (1995)
– Practical use in MODIS products
• BRDF effects arise from wide-FOV sensors — MODIS, AVHRR, VEGETATION, MERIS
[Figure: the same target viewed from different directions by a wide-FOV sensor on day 1 and day 2.]
[Figure: AVHRR NDVI over Hapex-Sahel, 1992, against Julian day — original NDVI, maximum-value composite (MVC), and BRDF-normalised NDVI.]

Linear BRDF Model
• Of the form:
ρ(λ, Ω, Ω′) = f_iso(λ) + f_vol(λ) k_vol(Ω, Ω′) + f_geo(λ) k_geo(Ω, Ω′)
• Model parameters: f_iso (isotropic), f_vol (volumetric), f_geo (geometric-optic)
• Model kernels: k_vol (volumetric) and k_geo (geometric-optic), functions of the viewing and illumination geometry only

Volumetric Scattering
• Develop from RT theory:
– Spherical leaf angle distribution
– Lambertian soil
– Leaf reflectance = transmittance
– First-order scattering; multiple scattering assumed isotropic
• First-order canopy term plus attenuated soil term:
ρ(Ω, Ω′) = (ρl/3) [(π/2 − ξ)cos ξ + sin ξ]/(cos θs + cos θv) (1 − e^(−LX)) + s e^(−LX)
where ξ is the phase angle, θs and θv the solar and view zenith angles, L the LAI, s the soil reflectance, and X = 1/cos θs + 1/cos θv
• If LAI is small, e^(−LX) ≈ 1 − LX:
ρ(Ω, Ω′) ≈ (ρl L/3) [(π/2 − ξ)cos ξ + sin ξ]/(cos θs cos θv) + s (1 − LX)
• Write as:
ρ_thin(Ω, Ω′) = a0(λ) + a1(λ) k_vol(Ω, Ω′)
with the RossThin kernel
k_vol = [(π/2 − ξ)cos ξ + sin ξ]/(cos θs cos θv) − π/2
a1(λ) = ρl(λ) L/3, and the remaining terms collected into a0(λ) ≈ s(λ) + π ρl(λ) L/6
• A similar approach, assuming instead a large LAI (an optically thick canopy, so 1 − e^(−LX) → 1), gives the RossThick kernel

Geometric Optics
• Consider shadowing/protrusion of a 'spheroid on a stick' (Li-Strahler 1985)
[Figure: spheroidal crown (radius r, dimensions b, h) on a stick — sunlit crown, shadowed crown, shadowed ground, and projected shadow area A(Ω).]
• Assume ground and crown brightness are equal
• Fix the 'shape' parameters
• Linearised model — LiSparse, LiDense

Kernels
[Figure: volumetric (RossThick) and geometric (LiSparse) kernel values against view angle (−75° to 75°) for a viewing angle of 45°; note the retro-reflection peak (the 'hot spot').]

Kernel Models
• Consider a proportionate (a) mixture of the two scattering effects:
ρ(λ, Ω, Ω′) = [(1−a) a0_vol(λ) + a a0_geo(λ) + ρ_mult(λ)] + (1−a) a1_vol(λ) k_vol(Ω, Ω′) + a a1_geo(λ) k_geo(Ω, Ω′)
i.e. f_iso(λ) = (1−a) a0_vol(λ) + a a0_geo(λ) + ρ_mult(λ), f_vol(λ) = (1−a) a1_vol(λ), f_geo(λ) = a a1_geo(λ)
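A hedged sketch of fitting the kernel-driven model by ordinary least squares. The RossThick kernel follows its standard published form; the LiSparse geometric kernel is omitted for brevity but would simply add a third column to the design matrix. The sampling geometry and observations are invented for illustration:

```python
import numpy as np

def ross_thick(sza, vza, raa):
    """RossThick volumetric kernel; all angles in radians."""
    cos_xi = (np.cos(sza) * np.cos(vza)
              + np.sin(sza) * np.sin(vza) * np.cos(raa))
    xi = np.arccos(np.clip(cos_xi, -1.0, 1.0))       # phase angle
    return (((np.pi / 2 - xi) * np.cos(xi) + np.sin(xi))
            / (np.cos(sza) + np.cos(vza)) - np.pi / 4)

rng = np.random.default_rng(0)
angles = np.linspace(-60, 60, 9)                      # signed view zenith sweep
sza = np.full(9, np.deg2rad(45.0))                    # fixed sun position
vza = np.deg2rad(np.abs(angles))
raa = np.where(angles < 0, np.pi, 0.0)                # back/forward scatter plane

k_vol = ross_thick(sza, vza, raa)
rho = 0.25 + 0.12 * k_vol + rng.normal(0, 0.005, 9)   # synthetic observations

A = np.column_stack([np.ones_like(k_vol), k_vol])     # design matrix (1, k_vol)
(f_iso, f_vol), *_ = np.linalg.lstsq(A, rho, rcond=None)

# NBAR-style normalisation: predict reflectance at nadir view
k_nadir = ross_thick(np.deg2rad(45.0), 0.0, 0.0)
print(f_iso, f_vol, f_iso + f_vol * k_nadir)
```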
Using Linear BRDF Models for angular normalisation
• Account for BRDF variation
• Absolutely vital for compositing samples over time (with different view/sun angles)
• BUT the BRDF is a source of information too!

MODIS NBAR (Nadir BRDF-Adjusted Reflectance, MOD43/MCD43)
http://www-modis.bu.edu/brdf/userguide/intro.html
[Figures: example MODIS NBAR products.]

BRDF Normalisation
• Fit the observations to the model
• Output predicted reflectance at standardised angles
– E.g. nadir reflectance, nadir illumination — typically not stable
– E.g. nadir reflectance, SZA at the local mean
ρ(λ, Ω, Ω′) = Pᵀ K, where P = (f_iso(λ), f_vol(λ), f_geo(λ))ᵀ and K = (1, k_vol(Ω, Ω′), k_geo(Ω, Ω′))ᵀ
• And uncertainty via 1/w = Kᵀ M⁻¹ K, as in the linear regression case

Linear BRDF Models to track change
• Examine change due to burn (MODIS)
[Figure: 220 days of Terra (blue) and Aqua (red) sampling over a point in Australia, with solar zenith angle (Terra: orange; Aqua: cyan), and the time series of NIR samples from this sampling. From: http://modis-fire.umd.edu/Documents/atbd_mod14.pdf]

Detect Change
• Need to model BRDF effects
• Define a measure of dis-association between observation and prediction, the squared discrepancy normalised by the prediction uncertainty:
e² = (ρ_observed − ρ_predicted)² / (1/w)
[Figures: MODIS channel 5 — observation DOY 275, observation DOY 277, model prediction for DOY 277, and the discrepancy between prediction and observation on DOY 277.]

So the BRDF is a source of information, not JUST noise!
• Use the model expectation of angular reflectance behaviour to identify subtle changes
(Dr. Lisa Maria Rebelo, IWMI, CGIAR)

Detect Change
• Burns are:
– A negative change in channel 5
– Of 'long' (~ a week) duration
• Other changes are also picked up — e.g. clouds, cloud shadow — but these are:
– Of shorter duration, or
– A positive change (in all channels), or
– A negative change in all channels
[Figure: day of burn. http://modis-fire.umd.edu/Burned_Area_Products.html]
Roy et al. (2005) Prototyping a global algorithm for systematic fire-affected area mapping using MODIS time series data, RSE 97, 137-162.

Non-linear inversion
• If we cannot phrase our problem in linear form, then we have a non-linear inversion problem
• Key tool for many applications:
– Resource allocation
– Systems control
– (Non-linear) model parameter estimation
• AKA curve or function fitting: i.e. obtain the parameter values that provide the "best fit" (in some sense) to a set of observations
• And an estimate of parameter uncertainty and model fit

Options for Numerical Inversion
• Same principle as for the linear case: find the minimum of some cost function f(x) expressing the difference between model and observations
• We want the parameter set x* that minimises f(x): for some (small) tolerance δ > 0, f(x*) ≤ f(x) for all x with |x − x*| ≤ δ
• So for some region around x*, all values of f(x) are higher (a local minimum — there may be many)
• If the region encompasses the full range of x, we have the global minimum

Numerical Inversion
• Iterative numerical techniques — can we differentiate the model (e.g. via an adjoint)?
– YES: quasi-Newton methods (e.g. BFGS)
– NO: Powell, simplex, simulated annealing etc.
• Knowledge-based systems (KBS)
• Artificial Neural Networks (ANNs)
• Genetic Algorithms (GAs)
• Look-up Tables (LUTs)
Errico (1997) What is an adjoint model?, BAMS, 78, 2577-2591; http://paoc2001.mit.edu/cmi/development/adjoint.htm

Local and Global Minima (general: optima)
• Need a starting point
• How to go 'downhill'?

Bracketing of a minimum
• Slow, but sure
• How far to go 'downhill'? For a bracket (a, b, c) with the minimum between a and c, place the new trial point x a fraction of the way along:
w = (b − a)/(c − a), z = (x − b)/(c − a)
Choose z + w = 1 − w for symmetry, and w = z/(1 − w) to keep the proportions the same at every scale. Then
z = 1 − 2w and z = w − w², so w² − 3w + 1 = 0
giving the golden mean fraction w = 0.38197

Parabolic Interpolation
• More rapid: inverse parabolic interpolation fits a parabola through three points and jumps to its minimum

Brent's method
• Require 'fast' but robust inversion:
– Golden mean search — slow but sure; use in unfavourable regions
– Parabolic method — use when close to the minimum
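A minimal golden-section minimiser implementing the bracketing scheme just derived (illustrative only — a production routine would switch to parabolic steps near the minimum, as in Brent's method):

```python
import math

def golden_section(f, a, c, tol=1e-8):
    """Golden-section search: assumes (a, c) brackets a single minimum of f."""
    w = (3.0 - math.sqrt(5.0)) / 2.0       # golden mean fraction, 0.38197
    b = a + w * (c - a)                     # interior point at the golden fraction
    x = a + (1.0 - w) * (c - a)             # symmetric trial point
    fb, fx = f(b), f(x)
    while abs(c - a) > tol:
        if fx < fb:                         # minimum lies in (b, c): drop a
            a, b, fb = b, x, fx
            x = a + (1.0 - w) * (c - a)
            fx = f(x)
        else:                               # minimum lies in (a, x): drop c
            c, x, fx = x, b, fb
            b = a + w * (c - a)
            fb = f(b)
    return (b + x) / 2.0

# e.g. minimum of a 1-D cost function
print(golden_section(lambda p: (p - 1.3) ** 2 + 0.5, 0.0, 4.0))  # ~1.3
```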
Multi-dimensional minimisation
• Use 1-D methods multiple times — but in which directions?
• Some methods work directly on N-D problems — e.g. the simplex (amoeba)

Downhill Simplex
• Simplex: the simplest N-D volume — N+1 vertices
• Simplex operations:
– A reflection away from the high point
– A reflection and expansion away from the high point
– A contraction along one dimension from the high point
– A contraction along all dimensions towards the low point
• Combine these operations to find a way to the minimum
[Figure: the downhill simplex operations.]

Direction Set (Powell's) Method
• Multiple 1-D minimisations — inefficient along the coordinate axes
• Use conjugate directions instead — update primary & secondary directions
• Issues — axis covariance
[Figure: successive 1-D minimisations along coordinate axes vs along conjugate directions.]

Simulated Annealing
• Previous methods:
– Define a start point
– Minimise in some direction(s)
– Test & proceed
• Issue: can get trapped in local minima
• Solution (?): need to restart from different points

Simulated Annealing
• Annealing — a thermodynamic phenomenon:
– 'Slow cooling' of metals or crystallisation of liquids
– Atoms 'line up' and form a 'pure crystal', which is stronger (metals)
– Slow cooling allows time for atoms to redistribute as they lose energy (cool) — a low-energy state
• Quenching — 'fast cooling': a polycrystalline state

Simulated Annealing
• Simulate 'slow cooling'
• Based on the Boltzmann probability distribution:
Pr(E) ∝ e^(−E/kT)
• k — a constant relating energy to temperature
• A system in thermal equilibrium at temperature T has a distribution of energy states E
• All (E) states are possible, but some are more likely than others
• Even at low T, there is a small probability that the system is in a higher-energy state

Simulated Annealing
• Use the analogy of energy to RMSE
• As we decrease the 'temperature', the system moves to a generally lower-energy state
• Boltzmann gives the distribution of E states — so there is some probability of a higher-energy state, i.e. of 'going uphill'
• The probability of going 'uphill' decreases as T decreases

Implementation
• The system changes from energy E1 to E2 with probability P = exp[−(E2 − E1)/kT]:
– If E2 < E1, P > 1 (threshold at 1): the system will always take this option
– If E2 > E1, P < 1: generate a random number; the system takes the option only if exp[−(E2 − E1)/kT] ≥ rand(), and the probability of doing so decreases with T

Simulated Annealing
• The rate of cooling is very important
• Coupled with the effect of k: only the product kT matters in exp[−(E2 − E1)/kT], so doubling k while halving T leaves the acceptance probabilities unchanged
• Used in a range of optimisation problems
• Not much used in remote sensing
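A minimal sketch of the acceptance rule above for a one-parameter problem (the cost function, step size and cooling schedule are arbitrary illustrative choices, not tuned):

```python
import math
import random

def anneal(cost, x0, step=0.5, t0=1.0, cooling=0.95, n_iter=2000):
    """Simulated annealing with the Metropolis acceptance rule.

    cost plays the role of 'energy' (analogy: RMSE); t is the temperature.
    """
    x, e = x0, cost(x0)
    best_x, best_e = x, e
    t = t0
    for _ in range(n_iter):
        x_new = x + random.uniform(-step, step)      # random perturbation
        e_new = cost(x_new)
        # Always accept downhill; accept uphill with Boltzmann probability
        if e_new < e or math.exp(-(e_new - e) / t) > random.random():
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
        t *= cooling                                  # 'slow cooling' schedule
    return best_x, best_e

# e.g. a cost surface with two minima: SA can hop out of the local one
f = lambda p: (p ** 2 - 1.0) ** 2 + 0.3 * p           # minima near p = -1, p = 1
print(anneal(f, x0=2.0))
```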
(Artificial) Neural Networks (ANNs)
• Another 'natural' analogy:
– Biological NNs are good at solving complex problems
– They do so by 'training' the system with 'experience'
• ANN architecture
[Figure: ANN architecture — input layer, hidden layer(s), output layer.]
• 'Neurons':
– Have one output but many inputs
– The output is a weighted sum of the inputs
– A threshold can be set — gives a non-linear response
• Training:
– Initialise the weights for all neurons
– Present the input layer with e.g. spectral reflectance
– Calculate the outputs
– Compare the outputs with e.g. the corresponding biophysical parameters
– Update the weights to attempt a match
– Repeat until all examples have been presented
• Use in this way for canopy model inversion; train the other way around for a forward model
• Also used for classification and spectral unmixing — again, trained with examples
• An ANN has the ability to generalise from its input examples
• Definition of the architecture and training phases is critical:
– Can 'over-train' — too specific; similar to fitting a polynomial with too high an order
– Many 'types' of ANN — e.g. feedback/feedforward
• In essence, a trained ANN is just a (highly) non-linear response function
• Training (i.e. definition of e.g. the inverse model) is performed as a separate stage from the application of the inversion — so complex models can be used for training
• Many examples in remote sensing
• Issue: how to train for an arbitrary set of viewing/illumination angles? — not a solved problem

Genetic (or evolutionary) algorithms (GAs)
• Another 'natural' analogy
• Phrase optimisation as 'fitness for survival'
• Description of the state is encoded through a 'string' (equivalent to a genetic pattern)
• Apply operations to the 'genes': cross-over, mutation, inversion

Genetic (or evolutionary) algorithms (GAs)
• E.g. BRDF model inversion:
– Encode the N-D vector representing the current state of the biophysical parameters as a string
– Apply operations — e.g. mutation, or mating with another string
– See if the mutant is 'fitter to survive' (lower RMSE); if not, it can be discarded (dies)
• General operation (see the sketch below):
1. Define a genetic representation of the state
2. Create an initial population, set t = 0
3. Compute the average fitness of the set
– Assign each individual a normalised fitness value
– Assign a probability based on this
4. Using this distribution, select N parents
5. Pair the parents at random
6. Apply the genetic operations to the parent sets
– Generate offspring
– These become the population at t + 1
7. Repeat until the termination criterion is satisfied

Genetic (or evolutionary) algorithms (GAs)
• Differ from other optimisation methods:
– Work on a coding of the parameters, not the parameters themselves
– Search from a population set, not single members (points)
– Use 'payoff' information (some objective function for selection), not derivatives or other auxiliary information
– Use probabilistic transition rules (as with simulated annealing), not deterministic rules
• Flexible and powerful method; can solve problems with many small, ill-defined minima
• May take a huge number of iterations to solve
• Not applied to remote sensing model inversions
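The sketch referred to above: a toy real-coded GA (floating-point 'chromosomes' rather than bit strings; the test problem, population size and rates are all illustrative choices):

```python
import math
import random

random.seed(1)
# Toy problem: recover (a0, a1) in y = a0 * exp(a1 * x) from 'observations'
obs = [(x, 2.0 * math.exp(0.5 * x)) for x in range(6)]   # truth: a0=2, a1=0.5

def fitness(ch):
    """Negative sum of squared errors: higher (less negative) = fitter."""
    a0, a1 = ch
    return -sum((a0 * math.exp(a1 * x) - y) ** 2 for x, y in obs)

def crossover(p, q):
    cut = random.randint(0, 1)              # swap genes after the cut point
    return p[:cut + 1] + q[cut + 1:]

def mutate(ch, rate=0.3, scale=0.2):
    return [g + random.gauss(0.0, scale) if random.random() < rate else g
            for g in ch]

pop = [[random.uniform(0, 4), random.uniform(-1, 1)] for _ in range(40)]
for t in range(100):
    pop.sort(key=fitness, reverse=True)     # rank by fitness
    parents = pop[:20]                      # the fittest survive
    children = [mutate(crossover(random.choice(parents),
                                 random.choice(parents)))
                for _ in range(20)]
    pop = parents + children                # population at t + 1
print(max(pop, key=fitness))                # should approach [2.0, 0.5]
```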
Knowledge-based systems (KBS)
• Seek to solve the problem by incorporating information external to the problem
• The only RS inversion example is Kimes et al. (1990; 1991) — the VEG model
• Integrates spectral libraries, information from the literature, information from human experts etc.
• Major problems:
– Encoding and using the information
– A 1980s/90s 'fad'?

LUT Inversion
• Sample the parameter space
• Calculate the RMSE for each sample point
• Define the best fit as the minimum-RMSE parameters
– Or as some function of the set of points fitting to within a certain tolerance
• Essentially a sampled 'exhaustive search'

LUT Inversion
• Issues:
– May require a large sample set — though not if the function is well-behaved, as it is for many optical EO inversions
– In some cases, may assume the function is locally linear over a large (linearised) parameter range and use linear interpolation (being developed at UCL/Swansea for CHRIS-PROBA)
– May limit the search space based on some expectation, e.g. a loose VI relationship, a canopy growth model or a land cover map — the approach used for the operational MODIS LAI/fAPAR algorithm (Myneni et al.)

LUT Inversion
• Issues:
– As we operate on a stored LUT, model outputs can be pre-calculated: no need to run the model 'on the fly' as in e.g. simplex methods
– So complex models can be used to populate the LUT — e.g. Lewis, Saich & Disney, using 3D scattering models (optical and microwave) of forest and crop
– The inversion error may be slightly higher for a (non-interpolated) sparse LUT, but may still lie within desirable limits
– The method is simple to code and easy to understand — essentially a sort operation on a table

Summary
• Range of options for non-linear inversion
• 'Traditional' NL methods (Powell, AMOEBA/simplex):
– Complex to code, though library functions are available
– Can easily converge to local minima — need to start from several points
– Calculate canopy reflectance 'on the fly' — need fast models, involving simplifications
– Not felt to be suitable for operationalisation

Summary
• Simulated annealing:
– Can deal with local minima
– Slow
– Need to define an annealing schedule
• ANNs:
– Train the ANN from a model (or measurements); the ANN generalises as a non-linear model
– Issues of variable input conditions (e.g. VZA)
– Can train with complex models
– Applied to a variety of EO problems

Summary
• GAs:
– Novel approach, suitable for highly complex inversion problems
– Can be very slow; not suitable for operationalisation
• KBS:
– Use a range of information in the inversion (Kimes' VEG model)
– Maximises use of data
– Need to decide how to encode and use the information

Summary
• LUT:
– Simple method — a sort (see the closing sketch below)
– Used more and more widely for optical model inversion
– Suitable for 'well-behaved' non-linear problems; can be operationalised
– Can use arbitrarily complex models to populate the LUT
– Issue of LUT size: can use additional information to limit the search space, and interpolation of a sparse LUT for 'high information content' inversion
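To close, a minimal sketch of LUT inversion, reusing the toy two-band reflectance model from the crop-mosaic example as a stand-in for an arbitrarily complex model:

```python
import numpy as np

def model(lai):
    """Visible and NIR reflectance for a given LAI (toy model from above)."""
    vis = 0.2 * (1 - np.exp(-lai / 2)) + 0.1 * np.exp(-lai / 2)
    nir = 0.9 * (1 - np.exp(-lai / 2)) + 0.3 * np.exp(-lai / 2)
    return np.stack([vis, nir], axis=-1)

# 1. Populate the LUT — done once, offline, with any model however complex
lai_grid = np.linspace(0.0, 8.0, 801)            # sampled parameter space
lut = model(lai_grid)                            # (801, 2) table of reflectances

# 2. Invert an observation: the best fit is an argmin over the table
obs = np.array([0.15, 0.60])                     # visible, NIR
rmse = np.sqrt(((lut - obs) ** 2).mean(axis=1))
best = np.argmin(rmse)
print(lai_grid[best], rmse[best])                # retrieved LAI (~1.4), fit error
```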