GEOGG121: Methods
Linear & non-linear models
Dr. Mathias (Mat) Disney
UCL Geography
Office: 113, Pearson Building
Tel: 7670 0592
Email: mdisney@ucl.geog.ac.uk
www.geog.ucl.ac.uk/~mdisney

Lecture outline
• Linear models and inversion
– Least squares revisited, examples
– Parameter estimation, uncertainty
• Non-linear models and inversion
– The non-linear problem
– Parameter estimation, uncertainty
– Approaches

Reading
• Linear models and inversion
– Linear modelling notes: Lewis (2010); ch. 2 of Press et al. (1992) Numerical Recipes in C (http://apps.nrbook.com/c/index.html)
• Non-linear models and inversion
– Gradient descent: NR in C, section 10.7, e.g. BFGS; XXX
– Conjugate gradient etc.: NR in C, ch. 10; XXX
– Liang, S. (2004) Quantitative Remote Sensing of the Land Surface, ch. 8 (section 8.2)

Linear Models
• For some set of independent variables x = {x0, x1, x2, …, xn} we have a model of a dependent variable y which can be expressed as a linear combination of the independent variables:
y = a0 + a1x1
y = a0 + a1x1 + a2x2
y = Σ(i=0..n) ai xi
or, in vector form, y = a·x

Linear Models?
• 'Linear' here means linear in the parameters, not in x:
y = a0 + Σ(i=1..n) ai sin(xi + bi) — non-linear (in the bi)
y = a0 + Σ(i=1..n) [ai sin(xi) + bi cos(xi)] — linear
y = Σ(i=0..n) ai x^i = a0 + a1x + a2x² + … + anx^n — linear (a polynomial in x, linear in the ai)
y = a0 e^(a1 x) — non-linear (in a1)

Linear Mixture Modelling
• Spectral mixture modelling:
– Proportionate mixture of n end-member spectra:
ρ = Σ(i=0..n−1) ρi Fi, with Σ(i=0..n−1) Fi = 1
where Fi is the proportion of end-member i and ρi its reflectance
– First-order model: no interactions between components

Linear Mixture Modelling
• r = {rλ0, rλ1, …, rλ(m−1), 1.0} — the measured reflectance spectrum (m wavelengths), augmented with 1.0 to carry the constraint Σ Fi = 1
• In matrix form: r = A F, where F = (F0, F1, …, Fn−1)ᵀ is the vector of proportions and A is the (m+1) × n matrix whose i-th column is the spectrum of end-member i, (ρλ0,i, ρλ1,i, …, ρλ(m−1),i, 1.0)ᵀ — the final row of 1s expresses the constraint Σ Fi = 1

Linear Mixture Modelling
• If n = m+1, A is a square matrix and can be inverted directly: F = A⁻¹ r
• E.g. two wavebands plus the constraint row allows three end-members
[Figure: measured spectrum r plotted in band 1 / band 2 reflectance space, lying inside the triangle formed by end-members 1, 2 and 3.]

Linear Mixture Modelling
• As described, is not robust to error in measurement or end-member spectra
• Proportions must be constrained to lie in the interval (0,1) — effectively a convex hull constraint
• At most m+1 end-member spectra can be considered
• Needs prior definition of end-member spectra
• Cannot directly take into account any variation in component reflectances, e.g. due to topographic effects

Linear Mixture Modelling in the presence of Noise
• r = A F + e
• Define the residual vector e
• Minimise the sum of the squares of the error e — the Method of Least Squares (MLS):
eᵀe = (r − AF)ᵀ(r − AF) = Σλ (rλ − Σi ρλ,i Fi)²

Error Minimisation
• Set the (partial) derivatives with respect to each Fi to zero:
∂(eᵀe)/∂Fi = −2 Σλ ρλ,i (rλ − Σj ρλ,j Fj) = 0
which gives, for each i:
Σλ rλ ρλ,i = Σλ ρλ,i Σj ρλ,j Fj
• Can write these normal equations as O = M P, with
Oi = Σλ rλ ρλ,i, Mij = Σλ ρλ,i ρλ,j, P = F
• Solve for P by matrix inversion: P = M⁻¹ O

e.g. Linear Regression
• y = c + mx, so P = (c, m)ᵀ. Dividing the normal equations through by the number of samples n, and writing ⟨·⟩ for the sample mean:
O = (⟨y⟩, ⟨xy⟩)ᵀ, M = [1, ⟨x⟩; ⟨x⟩, ⟨x²⟩]
M⁻¹ = 1/(⟨x²⟩ − ⟨x⟩²) × [⟨x²⟩, −⟨x⟩; −⟨x⟩, 1]
so that
m = (⟨xy⟩ − ⟨x⟩⟨y⟩) / (⟨x²⟩ − ⟨x⟩²)
c = (⟨x²⟩⟨y⟩ − ⟨x⟩⟨xy⟩) / (⟨x²⟩ − ⟨x⟩²) = ⟨y⟩ − m⟨x⟩

RMSE
RMSE = sqrt[ Σ(i=0..n−1) (yi − (c + m xi))² / (n − 2) ]
(n samples, 2 model parameters, hence n − 2 degrees of freedom)
[Figure: fitted regression line y = c + mx through the data points.]

Weight of Determination (1/w)
• Calculate the uncertainty of the prediction y(x) at some x:
y(x) = Qᵀ P, with Q = (1, x)ᵀ and P = (c, m)ᵀ
1/w = Qᵀ M⁻¹ Q
so the uncertainty of y(x) is RMSE × sqrt(1/w)
[Figure: fitted line P0 + P1 x with ± RMSE × sqrt(1/w) uncertainty bounds, narrowest near the mean of x.]
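A minimal numerical sketch of the normal-equations machinery above, for the straight-line case (the data values are invented for illustration, and numpy's linear solver stands in for explicit matrix inversion — numerically preferable to forming M⁻¹):

```python
import numpy as np

# Least squares via the normal equations O = M P for y = c + m x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.9, 3.1, 5.0, 7.2, 8.9, 11.1])   # noisy samples of a line

A = np.vstack([np.ones_like(x), x]).T            # rows are (1, x_i)
M = A.T @ A                                      # [[n, sum x], [sum x, sum x^2]]
O = A.T @ y                                      # [sum y, sum xy]
c, m = np.linalg.solve(M, O)                     # P = M^-1 O

e = y - (c + m * x)                              # residual vector
rmse = np.sqrt(e @ e / (len(x) - 2))             # n - 2 degrees of freedom

# Weight of determination 1/w and uncertainty of the prediction at x = 2.5
Q = np.array([1.0, 2.5])
inv_w = Q @ np.linalg.solve(M, Q)                # Q^T M^-1 Q
print(c, m, rmse, rmse * np.sqrt(inv_w))
```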
Issues
• Parameter transformation and bounding
• Weighting of the error function
• Using additional information
• Scaling

Parameter transformation and bounding
• Issue of variable sensitivity — e.g. saturation of LAI effects
• Reduce by transformation:
– Approximately linearise the parameters
– Need to consider 'average' effects

Weighting of the error function
• Different wavelengths/angles have different sensitivity to the parameters
• Previously, all observations were weighted equally — equivalent to assuming the 'noise' is equal for all observations:
RMSE = sqrt[ Σ(i=1..N) (ρi,measured − ρi,modelled)² / N ]
• Can 'target' sensitivity — e.g. to chlorophyll concentration — using derivative weighting (Privette 1994), weighting each observation by the sensitivity of the modelled reflectance to the parameter P of interest:
RMSE = sqrt[ Σ(i=1..N) (∂ρi,modelled/∂P)(ρi,measured − ρi,modelled)² / Σ(i=1..N) (∂ρi,modelled/∂P) ]

Using additional information
• Typically, for vegetation, use a canopy growth model — see Moulin et al. (1998)
• Provides an expectation of (e.g.) LAI — needs:
– Planting date
– Daily mean temperature
– Varietal information (?)
• Use in various ways:
– Reduce the parameter search space
– Expectations of coupling between parameters

Scaling
• Many parameters scale approximately linearly — e.g. cover, albedo, fAPAR
• Many do not — e.g. LAI
• Need to (at least) understand the impact of scaling

Crop Mosaic
[Figure: a pixel covering a mosaic of fields with LAI 0, LAI 1 and LAI 4.]
• 20% of the pixel is LAI 0, 40% LAI 4, 40% LAI 1
• 'Real' (area-weighted) total LAI: 0.2×0 + 0.4×4 + 0.4×1 = 2.0
• Simple canopy reflectance model:
ρ = ρ∞ (1 − exp(−LAI/2)) + ρs exp(−LAI/2)
with ρ∞ = 0.2, ρs = 0.1 in the visible, and ρ∞ = 0.9, ρs = 0.3 in the NIR
[Figure: modelled canopy reflectance vs LAI in the visible and NIR, saturating as LAI increases.]
• Canopy reflectance over the pixel is 0.15 in the visible and 0.60 in the NIR
• If we assume the model above, this equates to an LAI of 1.4
• The 'real' answer is LAI 2.0 — the non-linearity of ρ(LAI) means the average of the reflectances is not the reflectance of the average LAI

Linear Kernel-driven Modelling of Canopy Reflectance
• Semi-empirical models to deal with BRDF effects
– Originally due to Roujean et al. (1992)
– Also Wanner et al. (1995)
– Practical use in MODIS products
• BRDF effects arise from wide-FOV sensors — MODIS, AVHRR, VEGETATION, MERIS
[Figure: the same target viewed from different directions by a wide-FOV sensor on day 1 and day 2.]
[Figure: AVHRR NDVI over Hapex-Sahel, 1992, against Julian day — original NDVI, maximum-value composite (MVC), and BRDF-normalised NDVI.]

Linear BRDF Model
• Of the form:
ρ(λ, Ω, Ω′) = f_iso(λ) + f_vol(λ) k_vol(Ω, Ω′) + f_geo(λ) k_geo(Ω, Ω′)
• Model parameters: f_iso (isotropic), f_vol (volumetric), f_geo (geometric-optic)
• Model kernels: k_vol (volumetric) and k_geo (geometric-optic), functions of the viewing and illumination geometry only

Volumetric Scattering
• Develop from RT theory:
– Spherical leaf angle distribution
– Lambertian soil
– Leaf reflectance = transmittance
– First-order scattering; multiple scattering assumed isotropic
• First-order canopy term plus attenuated soil term:
ρ(Ω, Ω′) = (ρl/3) [(π/2 − ξ)cos ξ + sin ξ]/(cos θs + cos θv) (1 − e^(−LX)) + s e^(−LX)
where ξ is the phase angle, θs and θv the solar and view zenith angles, L the LAI, s the soil reflectance, and X = 1/cos θs + 1/cos θv
• If LAI is small, e^(−LX) ≈ 1 − LX:
ρ(Ω, Ω′) ≈ (ρl L/3) [(π/2 − ξ)cos ξ + sin ξ]/(cos θs cos θv) + s (1 − LX)
• Write as:
ρ_thin(Ω, Ω′) = a0(λ) + a1(λ) k_vol(Ω, Ω′)
with the RossThin kernel
k_vol = [(π/2 − ξ)cos ξ + sin ξ]/(cos θs cos θv) − π/2
a1(λ) = ρl(λ) L/3, and the remaining terms collected into a0(λ) ≈ s(λ) + π ρl(λ) L/6
• A similar approach, assuming instead a large LAI (an optically thick canopy, so 1 − e^(−LX) → 1), gives the RossThick kernel

Geometric Optics
• Consider shadowing/protrusion of a 'spheroid on a stick' (Li-Strahler 1985)
[Figure: spheroidal crown (radius r, dimensions b, h) on a stick — sunlit crown, shadowed crown, shadowed ground, and projected shadow area A(Ω).]
• Assume ground and crown brightness are equal
• Fix the 'shape' parameters
• Linearised model — LiSparse, LiDense

Kernels
[Figure: volumetric (RossThick) and geometric (LiSparse) kernel values against view angle (−75° to 75°) for a viewing angle of 45°; note the retro-reflection peak (the 'hot spot').]

Kernel Models
• Consider a proportionate (a) mixture of the two scattering effects:
ρ(λ, Ω, Ω′) = [(1−a) a0_vol(λ) + a a0_geo(λ) + ρ_mult(λ)] + (1−a) a1_vol(λ) k_vol(Ω, Ω′) + a a1_geo(λ) k_geo(Ω, Ω′)
i.e. f_iso(λ) = (1−a) a0_vol(λ) + a a0_geo(λ) + ρ_mult(λ), f_vol(λ) = (1−a) a1_vol(λ), f_geo(λ) = a a1_geo(λ)
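A hedged sketch of fitting the kernel-driven model by ordinary least squares. The RossThick kernel follows its standard published form; the LiSparse geometric kernel is omitted for brevity but would simply add a third column to the design matrix. The sampling geometry and observations are invented for illustration:

```python
import numpy as np

def ross_thick(sza, vza, raa):
    """RossThick volumetric kernel; all angles in radians."""
    cos_xi = (np.cos(sza) * np.cos(vza)
              + np.sin(sza) * np.sin(vza) * np.cos(raa))
    xi = np.arccos(np.clip(cos_xi, -1.0, 1.0))       # phase angle
    return (((np.pi / 2 - xi) * np.cos(xi) + np.sin(xi))
            / (np.cos(sza) + np.cos(vza)) - np.pi / 4)

rng = np.random.default_rng(0)
angles = np.linspace(-60, 60, 9)                      # signed view zenith sweep
sza = np.full(9, np.deg2rad(45.0))                    # fixed sun position
vza = np.deg2rad(np.abs(angles))
raa = np.where(angles < 0, np.pi, 0.0)                # back/forward scatter plane

k_vol = ross_thick(sza, vza, raa)
rho = 0.25 + 0.12 * k_vol + rng.normal(0, 0.005, 9)   # synthetic observations

A = np.column_stack([np.ones_like(k_vol), k_vol])     # design matrix (1, k_vol)
(f_iso, f_vol), *_ = np.linalg.lstsq(A, rho, rcond=None)

# NBAR-style normalisation: predict reflectance at nadir view
k_nadir = ross_thick(np.deg2rad(45.0), 0.0, 0.0)
print(f_iso, f_vol, f_iso + f_vol * k_nadir)
```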
Using Linear BRDF Models for angular normalisation
• Account for BRDF variation
• Absolutely vital for compositing samples over time (with different view/sun angles)
• BUT the BRDF is a source of information too!

MODIS NBAR (Nadir BRDF-Adjusted Reflectance, MOD43/MCD43)
http://www-modis.bu.edu/brdf/userguide/intro.html
[Figures: example MODIS NBAR products.]

BRDF Normalisation
• Fit the observations to the model
• Output predicted reflectance at standardised angles
– E.g. nadir reflectance, nadir illumination — typically not stable
– E.g. nadir reflectance, SZA at the local mean
ρ(λ, Ω, Ω′) = Pᵀ K, where P = (f_iso(λ), f_vol(λ), f_geo(λ))ᵀ and K = (1, k_vol(Ω, Ω′), k_geo(Ω, Ω′))ᵀ
• And uncertainty via 1/w = Kᵀ M⁻¹ K, as in the linear regression case

Linear BRDF Models to track change
• Examine change due to burn (MODIS)
[Figure: 220 days of Terra (blue) and Aqua (red) sampling over a point in Australia, with solar zenith angle (Terra: orange; Aqua: cyan), and the time series of NIR samples from this sampling. From: http://modis-fire.umd.edu/Documents/atbd_mod14.pdf]

Detect Change
• Need to model BRDF effects
• Define a measure of dis-association between observation and prediction, the squared discrepancy normalised by the prediction uncertainty:
e² = (ρ_observed − ρ_predicted)² / (1/w)
[Figures: MODIS channel 5 — observation DOY 275, observation DOY 277, model prediction for DOY 277, and the discrepancy between prediction and observation on DOY 277.]

So the BRDF is a source of information, not JUST noise!
• Use the model expectation of angular reflectance behaviour to identify subtle changes
(Dr. Lisa Maria Rebelo, IWMI, CGIAR)

Detect Change
• Burns are:
– A negative change in channel 5
– Of 'long' (~ a week) duration
• Other changes are also picked up — e.g. clouds, cloud shadow — but these are:
– Of shorter duration, or
– A positive change (in all channels), or
– A negative change in all channels
[Figure: day of burn. http://modis-fire.umd.edu/Burned_Area_Products.html]
Roy et al. (2005) Prototyping a global algorithm for systematic fire-affected area mapping using MODIS time series data, RSE 97, 137-162.

Non-linear inversion
• If we cannot phrase our problem in linear form, then we have a non-linear inversion problem
• Key tool for many applications:
– Resource allocation
– Systems control
– (Non-linear) model parameter estimation
• AKA curve or function fitting: i.e. obtain the parameter values that provide the "best fit" (in some sense) to a set of observations
• And an estimate of parameter uncertainty and model fit

Options for Numerical Inversion
• Same principle as for the linear case: find the minimum of some cost function f(x) expressing the difference between model and observations
• We want the parameter set x* that minimises f(x): for some (small) tolerance δ > 0, f(x*) ≤ f(x) for all x with |x − x*| ≤ δ
• So for some region around x*, all values of f(x) are higher (a local minimum — there may be many)
• If the region encompasses the full range of x, we have the global minimum

Numerical Inversion
• Iterative numerical techniques — can we differentiate the model (e.g. via an adjoint)?
– YES: quasi-Newton methods (e.g. BFGS)
– NO: Powell, simplex, simulated annealing etc.
• Knowledge-based systems (KBS)
• Artificial Neural Networks (ANNs)
• Genetic Algorithms (GAs)
• Look-up Tables (LUTs)
Errico (1997) What is an adjoint model?, BAMS, 78, 2577-2591; http://paoc2001.mit.edu/cmi/development/adjoint.htm

Local and Global Minima (general: optima)
• Need a starting point
• How to go 'downhill'?

Bracketing of a minimum
• Slow, but sure
• How far to go 'downhill'? For a bracket (a, b, c) with the minimum between a and c, place the new trial point x a fraction of the way along:
w = (b − a)/(c − a), z = (x − b)/(c − a)
Choose z + w = 1 − w for symmetry, and w = z/(1 − w) to keep the proportions the same at every scale. Then
z = 1 − 2w and z = w − w², so w² − 3w + 1 = 0
giving the golden mean fraction w = 0.38197

Parabolic Interpolation
• More rapid: inverse parabolic interpolation fits a parabola through three points and jumps to its minimum

Brent's method
• Require 'fast' but robust inversion:
– Golden mean search — slow but sure; use in unfavourable regions
– Parabolic method — use when close to the minimum
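A minimal golden-section minimiser implementing the bracketing scheme just derived (illustrative only — a production routine would switch to parabolic steps near the minimum, as in Brent's method):

```python
import math

def golden_section(f, a, c, tol=1e-8):
    """Golden-section search: assumes (a, c) brackets a single minimum of f."""
    w = (3.0 - math.sqrt(5.0)) / 2.0       # golden mean fraction, 0.38197
    b = a + w * (c - a)                     # interior point at the golden fraction
    x = a + (1.0 - w) * (c - a)             # symmetric trial point
    fb, fx = f(b), f(x)
    while abs(c - a) > tol:
        if fx < fb:                         # minimum lies in (b, c): drop a
            a, b, fb = b, x, fx
            x = a + (1.0 - w) * (c - a)
            fx = f(x)
        else:                               # minimum lies in (a, x): drop c
            c, x, fx = x, b, fb
            b = a + w * (c - a)
            fb = f(b)
    return (b + x) / 2.0

# e.g. minimum of a 1-D cost function
print(golden_section(lambda p: (p - 1.3) ** 2 + 0.5, 0.0, 4.0))  # ~1.3
```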
Multi-dimensional minimisation
• Use 1-D methods multiple times — but in which directions?
• Some methods work directly on N-D problems — e.g. the simplex (amoeba)

Downhill Simplex
• Simplex: the simplest N-D volume — N+1 vertices
• Simplex operations:
– A reflection away from the high point
– A reflection and expansion away from the high point
– A contraction along one dimension from the high point
– A contraction along all dimensions towards the low point
• Combine these operations to find a way to the minimum
[Figure: the downhill simplex operations.]

Direction Set (Powell's) Method
• Multiple 1-D minimisations — inefficient along the coordinate axes
• Use conjugate directions instead — update primary & secondary directions
• Issues — axis covariance
[Figure: successive 1-D minimisations along coordinate axes vs along conjugate directions.]

Simulated Annealing
• Previous methods:
– Define a start point
– Minimise in some direction(s)
– Test & proceed
• Issue: can get trapped in local minima
• Solution (?): need to restart from different points

Simulated Annealing
• Annealing — a thermodynamic phenomenon:
– 'Slow cooling' of metals or crystallisation of liquids
– Atoms 'line up' and form a 'pure crystal', which is stronger (metals)
– Slow cooling allows time for atoms to redistribute as they lose energy (cool) — a low-energy state
• Quenching — 'fast cooling': a polycrystalline state

Simulated Annealing
• Simulate 'slow cooling'
• Based on the Boltzmann probability distribution:
Pr(E) ∝ e^(−E/kT)
• k — a constant relating energy to temperature
• A system in thermal equilibrium at temperature T has a distribution of energy states E
• All (E) states are possible, but some are more likely than others
• Even at low T, there is a small probability that the system is in a higher-energy state

Simulated Annealing
• Use the analogy of energy to RMSE
• As we decrease the 'temperature', the system moves to a generally lower-energy state
• Boltzmann gives the distribution of E states — so there is some probability of a higher-energy state, i.e. of 'going uphill'
• The probability of going 'uphill' decreases as T decreases

Implementation
• The system changes from energy E1 to E2 with probability P = exp[−(E2 − E1)/kT]:
– If E2 < E1, P > 1 (threshold at 1): the system will always take this option
– If E2 > E1, P < 1: generate a random number; the system takes the option only if exp[−(E2 − E1)/kT] ≥ rand(), and the probability of doing so decreases with T

Simulated Annealing
• The rate of cooling is very important
• Coupled with the effect of k: only the product kT matters in exp[−(E2 − E1)/kT], so doubling k while halving T leaves the acceptance probabilities unchanged
• Used in a range of optimisation problems
• Not much used in remote sensing
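A minimal sketch of the acceptance rule above for a one-parameter problem (the cost function, step size and cooling schedule are arbitrary illustrative choices, not tuned):

```python
import math
import random

def anneal(cost, x0, step=0.5, t0=1.0, cooling=0.95, n_iter=2000):
    """Simulated annealing with the Metropolis acceptance rule.

    cost plays the role of 'energy' (analogy: RMSE); t is the temperature.
    """
    x, e = x0, cost(x0)
    best_x, best_e = x, e
    t = t0
    for _ in range(n_iter):
        x_new = x + random.uniform(-step, step)      # random perturbation
        e_new = cost(x_new)
        # Always accept downhill; accept uphill with Boltzmann probability
        if e_new < e or math.exp(-(e_new - e) / t) > random.random():
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
        t *= cooling                                  # 'slow cooling' schedule
    return best_x, best_e

# e.g. a cost surface with two minima: SA can hop out of the local one
f = lambda p: (p ** 2 - 1.0) ** 2 + 0.3 * p           # minima near p = -1, p = 1
print(anneal(f, x0=2.0))
```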
(Artificial) Neural Networks (ANNs)
• Another 'natural' analogy:
– Biological NNs are good at solving complex problems
– They do so by 'training' the system with 'experience'
• ANN architecture
[Figure: ANN architecture — input layer, hidden layer(s), output layer.]
• 'Neurons':
– Have one output but many inputs
– The output is a weighted sum of the inputs
– A threshold can be set — gives a non-linear response
• Training:
– Initialise the weights for all neurons
– Present the input layer with e.g. spectral reflectance
– Calculate the outputs
– Compare the outputs with e.g. the corresponding biophysical parameters
– Update the weights to attempt a match
– Repeat until all examples have been presented
• Use in this way for canopy model inversion; train the other way around for a forward model
• Also used for classification and spectral unmixing — again, trained with examples
• An ANN has the ability to generalise from its input examples
• Definition of the architecture and training phases is critical:
– Can 'over-train' — too specific; similar to fitting a polynomial with too high an order
– Many 'types' of ANN — e.g. feedback/feedforward
• In essence, a trained ANN is just a (highly) non-linear response function
• Training (i.e. definition of e.g. the inverse model) is performed as a separate stage from the application of the inversion — so complex models can be used for training
• Many examples in remote sensing
• Issue: how to train for an arbitrary set of viewing/illumination angles? — not a solved problem

Genetic (or evolutionary) algorithms (GAs)
• Another 'natural' analogy
• Phrase optimisation as 'fitness for survival'
• Description of the state is encoded through a 'string' (equivalent to a genetic pattern)
• Apply operations to the 'genes': cross-over, mutation, inversion

Genetic (or evolutionary) algorithms (GAs)
• E.g. BRDF model inversion:
– Encode the N-D vector representing the current state of the biophysical parameters as a string
– Apply operations — e.g. mutation, or mating with another string
– See if the mutant is 'fitter to survive' (lower RMSE); if not, it can be discarded (dies)
• General operation (see the sketch below):
1. Define a genetic representation of the state
2. Create an initial population, set t = 0
3. Compute the average fitness of the set
– Assign each individual a normalised fitness value
– Assign a probability based on this
4. Using this distribution, select N parents
5. Pair the parents at random
6. Apply the genetic operations to the parent sets
– Generate offspring
– These become the population at t + 1
7. Repeat until the termination criterion is satisfied

Genetic (or evolutionary) algorithms (GAs)
• Differ from other optimisation methods:
– Work on a coding of the parameters, not the parameters themselves
– Search from a population set, not single members (points)
– Use 'payoff' information (some objective function for selection), not derivatives or other auxiliary information
– Use probabilistic transition rules (as with simulated annealing), not deterministic rules
• Flexible and powerful method; can solve problems with many small, ill-defined minima
• May take a huge number of iterations to solve
• Not applied to remote sensing model inversions
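The sketch referred to above: a toy real-coded GA (floating-point 'chromosomes' rather than bit strings; the test problem, population size and rates are all illustrative choices):

```python
import math
import random

random.seed(1)
# Toy problem: recover (a0, a1) in y = a0 * exp(a1 * x) from 'observations'
obs = [(x, 2.0 * math.exp(0.5 * x)) for x in range(6)]   # truth: a0=2, a1=0.5

def fitness(ch):
    """Negative sum of squared errors: higher (less negative) = fitter."""
    a0, a1 = ch
    return -sum((a0 * math.exp(a1 * x) - y) ** 2 for x, y in obs)

def crossover(p, q):
    cut = random.randint(0, 1)              # swap genes after the cut point
    return p[:cut + 1] + q[cut + 1:]

def mutate(ch, rate=0.3, scale=0.2):
    return [g + random.gauss(0.0, scale) if random.random() < rate else g
            for g in ch]

pop = [[random.uniform(0, 4), random.uniform(-1, 1)] for _ in range(40)]
for t in range(100):
    pop.sort(key=fitness, reverse=True)     # rank by fitness
    parents = pop[:20]                      # the fittest survive
    children = [mutate(crossover(random.choice(parents),
                                 random.choice(parents)))
                for _ in range(20)]
    pop = parents + children                # population at t + 1
print(max(pop, key=fitness))                # should approach [2.0, 0.5]
```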
Knowledge-based systems (KBS)
• Seek to solve the problem by incorporating information external to the problem
• The only RS inversion example is Kimes et al. (1990; 1991) — the VEG model
• Integrates spectral libraries, information from the literature, information from human experts etc.
• Major problems:
– Encoding and using the information
– A 1980s/90s 'fad'?

LUT Inversion
• Sample the parameter space
• Calculate the RMSE for each sample point
• Define the best fit as the minimum-RMSE parameters
– Or as some function of the set of points fitting to within a certain tolerance
• Essentially a sampled 'exhaustive search'

LUT Inversion
• Issues:
– May require a large sample set — though not if the function is well-behaved, as it is for many optical EO inversions
– In some cases, may assume the function is locally linear over a large (linearised) parameter range and use linear interpolation (being developed at UCL/Swansea for CHRIS-PROBA)
– May limit the search space based on some expectation, e.g. a loose VI relationship, a canopy growth model or a land cover map — the approach used for the operational MODIS LAI/fAPAR algorithm (Myneni et al.)

LUT Inversion
• Issues:
– As we operate on a stored LUT, model outputs can be pre-calculated: no need to run the model 'on the fly' as in e.g. simplex methods
– So complex models can be used to populate the LUT — e.g. Lewis, Saich & Disney, using 3D scattering models (optical and microwave) of forest and crop
– The inversion error may be slightly higher for a (non-interpolated) sparse LUT, but may still lie within desirable limits
– The method is simple to code and easy to understand — essentially a sort operation on a table

Summary
• Range of options for non-linear inversion
• 'Traditional' NL methods (Powell, AMOEBA/simplex):
– Complex to code, though library functions are available
– Can easily converge to local minima — need to start from several points
– Calculate canopy reflectance 'on the fly' — need fast models, involving simplifications
– Not felt to be suitable for operationalisation

Summary
• Simulated annealing:
– Can deal with local minima
– Slow
– Need to define an annealing schedule
• ANNs:
– Train the ANN from a model (or measurements); the ANN generalises as a non-linear model
– Issues of variable input conditions (e.g. VZA)
– Can train with complex models
– Applied to a variety of EO problems

Summary
• GAs:
– Novel approach, suitable for highly complex inversion problems
– Can be very slow; not suitable for operationalisation
• KBS:
– Use a range of information in the inversion (Kimes' VEG model)
– Maximises use of data
– Need to decide how to encode and use the information

Summary
• LUT:
– Simple method — a sort (see the closing sketch below)
– Used more and more widely for optical model inversion
– Suitable for 'well-behaved' non-linear problems; can be operationalised
– Can use arbitrarily complex models to populate the LUT
– Issue of LUT size: can use additional information to limit the search space, and interpolation of a sparse LUT for 'high information content' inversion
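To close, a minimal sketch of LUT inversion, reusing the toy two-band reflectance model from the crop-mosaic example as a stand-in for an arbitrarily complex model:

```python
import numpy as np

def model(lai):
    """Visible and NIR reflectance for a given LAI (toy model from above)."""
    vis = 0.2 * (1 - np.exp(-lai / 2)) + 0.1 * np.exp(-lai / 2)
    nir = 0.9 * (1 - np.exp(-lai / 2)) + 0.3 * np.exp(-lai / 2)
    return np.stack([vis, nir], axis=-1)

# 1. Populate the LUT — done once, offline, with any model however complex
lai_grid = np.linspace(0.0, 8.0, 801)            # sampled parameter space
lut = model(lai_grid)                            # (801, 2) table of reflectances

# 2. Invert an observation: the best fit is an argmin over the table
obs = np.array([0.15, 0.60])                     # visible, NIR
rmse = np.sqrt(((lut - obs) ** 2).mean(axis=1))
best = np.argmin(rmse)
print(lai_grid[best], rmse[best])                # retrieved LAI (~1.4), fit error
```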