nm_inversionv1 - UCL Department of Geography

GEOGG121: Methods
Linear & non-linear models
Dr. Mathias (Mat) Disney
UCL Geography
Office: 113, Pearson Building
Tel: 7670 0592
Email: mdisney@ucl.geog.ac.uk
www.geog.ucl.ac.uk/~mdisney
Lecture outline
• Linear models and inversion
– Least squares revisited, examples
– Parameter estimation, uncertainty
• Non-linear models, inversion
– The non-linear problem
– Parameter estimation, uncertainty
– Approaches
Reading
• Linear models and inversion
– Linear modelling notes: Lewis, 2010; ch 2 of Press et al. (1992)
Numerical Recipes in C (http://apps.nrbook.com/c/index.html)
• Non-linear models, inversion
– Gradient descent: NR in C, ch. 10.7, e.g. BFGS; XXX
– Conjugate gradient etc.: NR in C, ch. 10; XXX
– Liang, S. (2004) Quantitative Remote Sensing of the Land
Surface, ch. 8 (section 8.2).
Linear Models
• For some set of independent variables x = {x0, x1, x2, …, xn}, we have a model of a dependent variable y which can be expressed as a linear combination of the independent variables:
y  a0  a1 x1
y  a 0  a1 x1  a 2 x 2
y 
in
a x
i 0
i
i
y  ax
y  ax
Linear Models?

$y = a_0 + \sum_{i=1}^{i=n} a_i \sin\left(x_i + b_i\right)$ — non-linear in the $b_i$

$y = a_0 + \sum_{i=1}^{i=n} \left[ a_i \sin\left(x_i\right) + b_i \cos\left(x_i\right) \right]$ — linear in all parameters

$y = \sum_{i=0}^{i=n} a_i x_0^i = a_0 + a_1 x_0 + a_2 x_0^2 + \dots + a_n x_0^n$ — a polynomial, but still linear in the $a_i$

$y = a_0 e^{-a_1 x_0}$ — non-linear in $a_1$
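The distinction matters in practice: any model that is linear in its *parameters*, even with non-linear basis functions of x, can be fitted directly by least squares. A minimal numpy sketch, using the sin/cos example above with made-up coefficients:

```python
import numpy as np

# Hypothetical data generated from y = 0.5 + 2 sin(x) + 1 cos(x):
# non-linear in x, but linear in the parameters, so ordinary
# least squares applies directly.
x = np.linspace(0, 2 * np.pi, 50)
y = 0.5 + 2.0 * np.sin(x) + 1.0 * np.cos(x)

# Design matrix: one column per basis function (1, sin x, cos x)
A = np.column_stack([np.ones_like(x), np.sin(x), np.cos(x)])
params, *_ = np.linalg.lstsq(A, y, rcond=None)
print(params)  # ≈ [0.5, 2.0, 1.0]
```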
Linear Mixture Modelling
• Spectral mixture modelling:
– Proportionate mixture of (n) end-member spectra

$$r = \sum_{i=0}^{i=n-1} \rho_i F_i, \qquad \sum_{i=0}^{i=n-1} F_i = 1$$

– In matrix form: $\mathbf{r} = \boldsymbol{\rho}\,\mathbf{F}$
– First-order model: no interactions between components
Linear Mixture Modelling
• r = {r_{λ0}, r_{λ1}, …, r_{λ(m−1)}, 1.0}
– Measured reflectance spectrum (m wavelengths), augmented with the sum-to-one constraint
• n×(m+1) matrix equation $\mathbf{r} = \boldsymbol{\rho}\,\mathbf{F}$:

$$\begin{pmatrix} r_{\lambda_0} \\ r_{\lambda_1} \\ \vdots \\ r_{\lambda_{m-1}} \\ 1.0 \end{pmatrix} = \begin{pmatrix} \rho_{\lambda_0 0} & \rho_{\lambda_0 1} & \cdots & \rho_{\lambda_0 n-1} \\ \rho_{\lambda_1 0} & \rho_{\lambda_1 1} & \cdots & \rho_{\lambda_1 n-1} \\ \vdots & \vdots & & \vdots \\ \rho_{\lambda_{m-1} 0} & \rho_{\lambda_{m-1} 1} & \cdots & \rho_{\lambda_{m-1} n-1} \\ 1.0 & 1.0 & \cdots & 1.0 \end{pmatrix} \begin{pmatrix} F_0 \\ F_1 \\ \vdots \\ F_{n-1} \end{pmatrix}$$
Linear Mixture Modelling
• n = (m+1) – square matrix:

$$\mathbf{r} = \boldsymbol{\rho}\,\mathbf{F} \;\Rightarrow\; \mathbf{F} = \boldsymbol{\rho}^{-1}\,\mathbf{r}$$

• E.g. n = 3 end-members, m = 2 wavebands:
[Figure: three end-member spectra (1, 2, 3) plotted in reflectance Band 1 vs reflectance Band 2 space; a measured spectrum r lies inside the triangle they define]
Linear Mixture Modelling
• As described, not robust to error in measurement or end-member spectra
• Proportions must be constrained to lie in the interval (0,1)
– Effectively a convex hull constraint
• At most m+1 end-member spectra can be considered
• Needs prior definition of end-member spectra; cannot directly take into account any variation in component reflectances
– E.g. due to topographic effects
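The unmixing step can be sketched in numpy with hypothetical end-member spectra; the sum-to-one constraint is appended as an extra row of ones, as in the augmented matrix above (note this does not enforce the (0,1) bound on proportions):

```python
import numpy as np

# Two hypothetical end-member spectra (columns), three wavebands
rho = np.array([[0.05, 0.40],
                [0.08, 0.45],
                [0.30, 0.50]])
F_true = np.array([0.3, 0.7])        # true proportions
r = rho @ F_true                     # noise-free mixed spectrum

# Append the sum-to-one constraint as an extra row of 1s
A = np.vstack([rho, np.ones(2)])
b = np.append(r, 1.0)
F, *_ = np.linalg.lstsq(A, b, rcond=None)
print(F)  # ≈ [0.3, 0.7]
```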
Linear Mixture Modelling in the Presence of Noise

$$\mathbf{r} = \boldsymbol{\rho}\,\mathbf{F} + \mathbf{e}$$

• Define residual vector $\mathbf{e} = \mathbf{r} - \boldsymbol{\rho}\,\mathbf{F}$
• Minimise the sum of the squares of the error $\mathbf{e}$: the Method of Least Squares (MLS), i.e.

$$\mathbf{e} \cdot \mathbf{e} = \sum_{\lambda=0}^{\lambda=m} \left( r_\lambda - \boldsymbol{\rho}_\lambda \cdot \mathbf{F} \right)^2$$
Error Minimisation
• Set (partial) derivatives to zero:

$$\frac{\partial\,(\mathbf{e}\cdot\mathbf{e})}{\partial F_i} = \frac{\partial}{\partial F_i}\sum_{\lambda=0}^{m}\left(r_\lambda - \boldsymbol{\rho}_\lambda\cdot\mathbf{F}\right)^2 = -2\sum_{\lambda=0}^{m}\left(r_\lambda - \boldsymbol{\rho}_\lambda\cdot\mathbf{F}\right)\rho_{\lambda i} = 0$$

• Giving, for each parameter $F_i$:

$$\sum_{\lambda=0}^{m} r_\lambda\,\rho_{\lambda i} = \sum_{\lambda=0}^{m}\left(\boldsymbol{\rho}_\lambda\cdot\mathbf{F}\right)\rho_{\lambda i}$$

• In matrix form: $\mathbf{O} = \mathbf{M}\,\mathbf{F}$
• Can write as:

$$\begin{pmatrix} \sum_\lambda r_\lambda \rho_{\lambda 0} \\ \sum_\lambda r_\lambda \rho_{\lambda 1} \\ \vdots \\ \sum_\lambda r_\lambda \rho_{\lambda\, n-1} \end{pmatrix} = \begin{pmatrix} \sum_\lambda \rho_{\lambda 0}\rho_{\lambda 0} & \sum_\lambda \rho_{\lambda 0}\rho_{\lambda 1} & \cdots & \sum_\lambda \rho_{\lambda 0}\rho_{\lambda\, n-1} \\ \sum_\lambda \rho_{\lambda 1}\rho_{\lambda 0} & \sum_\lambda \rho_{\lambda 1}\rho_{\lambda 1} & \cdots & \sum_\lambda \rho_{\lambda 1}\rho_{\lambda\, n-1} \\ \vdots & \vdots & & \vdots \\ \sum_\lambda \rho_{\lambda\, n-1}\rho_{\lambda 0} & \sum_\lambda \rho_{\lambda\, n-1}\rho_{\lambda 1} & \cdots & \sum_\lambda \rho_{\lambda\, n-1}\rho_{\lambda\, n-1} \end{pmatrix} \begin{pmatrix} F_0 \\ F_1 \\ \vdots \\ F_{n-1} \end{pmatrix}$$

• Solve for $\mathbf{F}$ by matrix inversion

e.g. Linear Regression

$$y = c + mx$$

$$\begin{pmatrix} \sum_{\lambda=0}^{n-1} y_\lambda \\ \sum_{\lambda=0}^{n-1} y_\lambda x_\lambda \end{pmatrix} = \begin{pmatrix} \sum 1 & \sum x_\lambda \\ \sum x_\lambda & \sum x_\lambda^2 \end{pmatrix} \begin{pmatrix} c \\ m \end{pmatrix}, \qquad \mathbf{O} = \mathbf{M}\,\mathbf{P}$$

$$\mathbf{M}^{-1} = \frac{1}{\sum 1 \sum x^2 - \left(\sum x\right)^2} \begin{pmatrix} \sum x^2 & -\sum x \\ -\sum x & \sum 1 \end{pmatrix}$$

$$c = \frac{\sum x^2 \sum y - \sum x \sum xy}{\sum 1 \sum x^2 - \left(\sum x\right)^2}, \qquad m = \frac{\sum 1 \sum xy - \sum x \sum y}{\sum 1 \sum x^2 - \left(\sum x\right)^2}$$
RMSE

$$e^2 = \sum_{\lambda=0}^{\lambda=n-1} \left( y_\lambda - c - m x_\lambda \right)^2$$

$$\mathrm{RMSE} = \sqrt{\frac{e^2}{n-2}}$$

($n$ observations, 2 free parameters)

[Figure: straight-line fit y = c + mx through scattered (x, y) samples between x₁ and x₂]
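A worked sketch of the regression and RMSE formulas above, with made-up data roughly on y = 1 + 2x:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# O = M P with P = (c, m)^T, as in the slides
M = np.array([[len(x),      x.sum()],
              [x.sum(), (x * x).sum()]])
O = np.array([y.sum(), (x * y).sum()])
c, m = np.linalg.solve(M, O)        # c ≈ 1.1, m ≈ 1.96

resid = y - (c + m * x)
rmse = np.sqrt((resid ** 2).sum() / (len(x) - 2))   # n minus 2 parameters
print(c, m, rmse)
```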
Weight of Determination (1/w)
• Calculate uncertainty at y(x):

$$y(x) = \mathbf{Q} \cdot \mathbf{P} = \begin{pmatrix} 1 & x \end{pmatrix} \begin{pmatrix} c \\ m \end{pmatrix}$$

$$\frac{1}{w} = \mathbf{Q}^{T} \mathbf{M}^{-1} \mathbf{Q} = \frac{\sum x^2 - 2x\sum x + x^2 \sum 1}{\sum 1 \sum x^2 - \left(\sum x\right)^2}$$

• Uncertainty in the prediction at $y(x)$ scales as $\mathrm{RMSE}^2 / w$

[Figure: fitted line with uncertainty bounds; retrieved parameters P₀, P₁ and RMSE shown]
Issues
• Parameter transformation and bounding
• Weighting of the error function
• Using additional information
• Scaling
Parameter transformation and bounding
• Issue of variable sensitivity
– E.g. saturation of LAI effects
– Reduce by transformation
• Approximately linearise parameters
• Need to consider ‘average’ effects
Weighting of the error function
• Different wavelengths/angles have different sensitivity to parameters
• Previously, weighted all equally
– Equivalent to assuming 'noise' equal for all observations

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{i=N} \left( \rho_i^{measured} - \rho_i^{modelled} \right)^2}{\sum_{i=1}^{i=N} 1}}$$
Weighting of the error function
• Can 'target' sensitivity
– E.g. to chlorophyll concentration
– Use derivative weighting (Privette 1994)

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{i=N} \left( \frac{\partial \rho_i^{modelled}}{\partial P} \left( \rho_i^{measured} - \rho_i^{modelled} \right) \right)^2}{\sum_{i=1}^{i=N} \left( \frac{\partial \rho_i^{modelled}}{\partial P} \right)^2}}$$
Using additional information
• Typically, for vegetation, use a canopy growth model
– See Moulin et al. (1998)
• Provides expectation of (e.g.) LAI
– Need:
• Planting date
• Daily mean temperature
• Varietal information (?)
• Use in various ways
– Reduce parameter search space
– Expectations of coupling between parameters
Scaling
• Many parameters scale approximately linearly
– E.g. cover, albedo, fAPAR
• Many do not
– E.g. LAI
• Need to (at least) understand impact of scaling
Crop Mosaic
[Figure: mosaic of three crop patches: LAI 1, LAI 4 and LAI 0]
Crop Mosaic
• 20% of LAI 0, 40% LAI 4, 40% LAI 1
• 'Real' total value of LAI:
– 0.2×0 + 0.4×4 + 0.4×1 = 2.0
• Simple canopy reflectance model:

$$\rho = \rho_\infty \left( 1 - e^{-LAI/2} \right) + \rho_s\, e^{-LAI/2}$$

– NIR: $\rho_\infty = 0.9$, $\rho_s = 0.3$; visible: $\rho_\infty = 0.2$, $\rho_s = 0.1$
[Figure: modelled canopy reflectance vs LAI for the visible and NIR coefficients]
• Canopy reflectance over the pixel is 0.15 in the visible and 0.60 in the NIR
• If we assume the model above, this equates to an LAI of 1.4
• 'Real' answer: LAI 2.0
Linear Kernel-driven Modelling of Canopy
Reflectance
• Semi-empirical models to deal with BRDF effects
– Originally due to Roujean et al (1992)
– Also Wanner et al (1995)
– Practical use in MODIS products
• BRDF effects from wide FOV sensors
– MODIS, AVHRR, VEGETATION, MERIS
[Figure: the same ground point X viewed by the satellite on Day 1 and Day 2 under different view/sun geometries]
[Figure: AVHRR NDVI over Hapex-Sahel, 1992 (Julian days ~136-282): original NDVI, maximum-value composite (MVC) and BRDF-normalised NDVI]
Linear BRDF Model
• Of the form:

$$\rho(\lambda, \Omega, \Omega') = f_{iso}(\lambda) + f_{vol}(\lambda)\, k_{vol}(\Omega, \Omega') + f_{geo}(\lambda)\, k_{geo}(\Omega, \Omega')$$

• Model parameters: $f_{iso}$ (isotropic), $f_{vol}$ (volumetric), $f_{geo}$ (geometric-optics)
• Model kernels: $k_{vol}$ (volumetric), $k_{geo}$ (geometric-optics)
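Evaluating the model for given kernel values is just a dot product; the kernel values below are illustrative numbers only, not computed Ross/Li kernels:

```python
import numpy as np

# rho = f_iso + f_vol * k_vol + f_geo * k_geo, per observation geometry
f_iso, f_vol, f_geo = 0.25, 0.10, 0.05
k_vol = np.array([-0.1, 0.0, 0.3])      # illustrative kernel values
k_geo = np.array([-1.2, -1.0, -0.6])

rho = f_iso + f_vol * k_vol + f_geo * k_geo
print(rho)
```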
Volumetric Scattering
• Develop from RT theory
– Spherical LAD
– Lambertian soil
– Leaf reflectance = transmittance
– First-order scattering
• Multiple scattering assumed isotropic

$$\rho_1(\Omega,\Omega') = \frac{\omega_\lambda}{3}\,\frac{\left(\frac{\pi}{2}-\xi\right)\cos\xi + \sin\xi}{\cos\theta_i + \cos\theta_v}\left(1 - e^{-X}\right) + \rho_s\,e^{-X}$$

$$X = \frac{L}{2}\left(\frac{1}{\cos\theta_i} + \frac{1}{\cos\theta_v}\right)$$

where $\xi$ is the phase angle, $\theta_i$ and $\theta_v$ the illumination and view zenith angles, $\omega_\lambda$ the leaf single-scattering albedo, $\rho_s$ the soil reflectance and $L$ the LAI.
Volumetric Scattering
• If LAI small, $e^{-X} \approx 1 - X$ and $1 - e^{-X} \approx X$:

$$\rho_1(\Omega,\Omega') \approx \frac{\omega_\lambda}{3}\,\frac{\left(\frac{\pi}{2}-\xi\right)\cos\xi + \sin\xi}{\cos\theta_i + \cos\theta_v}\,X + \rho_s\left(1 - X\right)$$

• Substituting for $X$:

$$\rho_1(\Omega,\Omega') \approx \frac{\omega_\lambda L}{6}\,\frac{\left(\frac{\pi}{2}-\xi\right)\cos\xi + \sin\xi}{\cos\theta_i\cos\theta_v} + \rho_s\left(1 - X\right)$$
Volumetric Scattering
• Write as:

$$\rho_{thin}(\lambda,\Omega,\Omega') = a_0(\lambda) + a_1(\lambda)\,k_{vol}(\Omega,\Omega')$$

$$k_{vol}(\Omega,\Omega') = \frac{\left(\frac{\pi}{2}-\xi\right)\cos\xi + \sin\xi}{2\cos\theta_i\cos\theta_v}$$

$$a_0(\lambda) = \rho_s, \qquad a_1(\lambda) = \frac{\omega_\lambda L}{3}$$

(neglecting the soil attenuation term $\rho_s X$)
• RossThin kernel; similar approach for RossThick (optically thick canopy)
Geometric Optics
• Consider shadowing/protrusion from a spheroid on a stick (Li-Strahler 1985)
[Figure: spheroidal crown (radius r, vertical half-axis b) at height h on a stick: sunlit crown, shadowed crown, shadowed ground, and the projected shadow area A(Ω)]
Geometric Optics
• Assume ground and crown brightness equal
• Fix 'shape' parameters
• Linearised model
– LiSparse
– LiDense
Kernels
[Figure: volumetric (RossThick) and geometric (LiSparse) kernel values vs view angle (−75° to +75°) for a viewing/solar zenith angle of 45°; note the peak at the retro-reflection direction (the 'hot spot')]
Kernel Models
• Consider proportionate (a) mixture of two scattering effects:

$$\rho(\lambda,\Omega,\Omega') = \left[(1-a)\,a_{0vol}(\lambda) + a\,a_{0geo}(\lambda) + \rho_{mult}(\lambda)\right] + (1-a)\,a_{1vol}(\lambda)\,k_{vol}(\Omega,\Omega') + a\,a_{1geo}(\lambda)\,k_{geo}(\Omega,\Omega')$$
Using Linear BRDF Models for angular
normalisation
• Account for BRDF variation
• Absolutely vital for compositing samples
over time (w. different view/sun angles)
• BUT BRDF is source of info. too!
MODIS NBAR (Nadir-BRDF Adjusted Reflectance MOD43, MCD43)
http://www-modis.bu.edu/brdf/userguide/intro.html
BRDF Normalisation
• Fit observations to model
• Output predicted reflectance at standardised angles
– E.g. nadir reflectance, nadir illumination
• Typically not stable
– E.g. nadir reflectance, SZA at local mean

$$\rho(\lambda, \Omega, \Omega') = \mathbf{P} \cdot \mathbf{K}$$

$$\mathbf{P} = \begin{pmatrix} f_{iso}(\lambda) \\ f_{vol}(\lambda) \\ f_{geo}(\lambda) \end{pmatrix}, \qquad \mathbf{K} = \begin{pmatrix} 1 \\ k_{vol}(\Omega, \Omega') \\ k_{geo}(\Omega, \Omega') \end{pmatrix}$$

• And uncertainty via

$$\frac{1}{w} = \mathbf{K}^{T} \mathbf{M}^{-1} \mathbf{K}$$
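Fitting and normalisation can be sketched as a least-squares problem; the kernel values and coefficients below are synthetic, and the 'nadir' kernel values are hypothetical placeholders, not values from the real Ross/Li kernels:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic kernel values for 8 observation geometries
k_vol = rng.uniform(-0.2, 0.5, 8)
k_geo = rng.uniform(-1.5, -0.5, 8)
K = np.column_stack([np.ones(8), k_vol, k_geo])

P_true = np.array([0.3, 0.08, 0.05])     # f_iso, f_vol, f_geo
rho_obs = K @ P_true                     # noise-free for clarity

# Fit P by least squares, then predict at standardised angles
P, *_ = np.linalg.lstsq(K, rho_obs, rcond=None)
k_vol0, k_geo0 = 0.0, -1.0               # hypothetical 'nadir' kernel values
nbar = P[0] + P[1] * k_vol0 + P[2] * k_geo0
print(nbar)
```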
Linear BRDF Models to track change
• Examine change due to burn (MODIS)
[Figure: 220 days of Terra (blue) and Aqua (red) sampling over a point in Australia, with solar zenith angle (Terra: orange; Aqua: cyan); time series of NIR samples from this sampling]
FROM: http://modis-fire.umd.edu/Documents/atbd_mod14.pdf
MODIS Channel 5 Observation
DOY 275
MODIS Channel 5 Observation
DOY 277
Detect Change
• Need to model BRDF effects
• Define measure of dis-association:

$$e^2 = \sum \left( \rho_{observed} - \rho_{predicted} \right)^2$$

weighted by the prediction uncertainty (weight of determination):

$$e^2 = \sum \frac{\left( \rho_{observed} - \rho_{predicted} \right)^2}{1/w}$$
MODIS Channel 5 Prediction
DOY 277
MODIS Channel 5 Discrepancy
DOY 277
MODIS Channel 5 Observation
DOY 275
MODIS Channel 5 Prediction
DOY 277
MODIS Channel 5 Observation
DOY 277
So BRDF source of info, not JUST noise!
• Use model expectation of angular reflectance
behaviour to identify subtle changes
Dr. Lisa Maria Rebelo, IWMI, CGIAR.
Detect Change
• Burns are:
– Negative change in Channel 5
– Of 'long' (~week) duration
• Other changes picked up
– E.g. clouds, cloud shadow
– Shorter duration
– Or positive change (in all channels)
– Or negative change in all channels
[Figure: day of burn]
http://modis-fire.umd.edu/Burned_Area_Products.html
Roy et al. (2005) Prototyping a global algorithm for systematic fire-affected area mapping using MODIS time series data, RSE 97, 137-162.
Non-linear inversion
• If we cannot phrase our problem in linear form, then we have a non-linear inversion problem
• Key tool for many applications
– Resource allocation
– Systems control
– (Non-linear) model parameter estimation
• AKA curve or function fitting: i.e. obtain parameter values that provide "best fit" (in some sense) to a set of observations
• And estimate of parameter uncertainty, model fit
Options for Numerical Inversion
• Same principle as for linear
– i.e. find minimum of some cost function expressing difference between model and observations
– We want some set of parameters (x*) which minimise the cost function f(x), i.e. f(x*) ≤ f(x) for all x with |x − x*| ≤ δ, for some (small) tolerance δ > 0
– So, for some region around x*, all values of f(x) are higher (a local minimum – there may be many)
– If the region encompasses the full range of x, then we have the global minimum
Numerical Inversion
• Iterative numerical techniques
– Can we differentiate the model (e.g. adjoint)?
• YES: quasi-Newton (e.g. BFGS etc.)
• NO: Powell, simplex, simulated annealing etc.
• Knowledge-based systems (KBS)
• Artificial Neural Networks (ANNs)
• Genetic Algorithms (GAs)
• Look-up Tables (LUTs)
Errico (1997) What is an adjoint model?, BAMS, 78, 2577-2591
http://paoc2001.mit.edu/cmi/development/adjoint.htm
Local and Global Minima (general: optima)
• Need starting point
• How to go 'downhill'?
Bracketing of a minimum
• Slow, but sure
• How far to go 'downhill'?
• Define fractions of the bracket (a, b, c), with trial point x:

$$w = \frac{b-a}{c-a}, \qquad z = \frac{x-b}{c-a}$$

• Choose $z + w = 1 - w$ for symmetry, and $\frac{z}{1-w} = w$ to keep the proportions the same at each step. Then:

$$z = w - w^2 = 1 - 2w \;\Rightarrow\; w^2 - 3w + 1 = 0$$

• Golden mean fraction: $w = 0.38197$
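The derivation above leads directly to an implementation; a minimal sketch of golden-section search:

```python
import math

def golden_section_min(f, a, c, tol=1e-8):
    """Golden-section line search: the bracket proportions stay
    fixed at w = (3 - sqrt(5))/2 ≈ 0.38197 on every iteration."""
    w = (3.0 - math.sqrt(5.0)) / 2.0
    b = a + w * (c - a)                # lower interior point
    x = c - w * (c - a)                # upper interior point
    fb, fx = f(b), f(x)
    while c - a > tol:
        if fb < fx:                    # minimum bracketed in (a, x)
            c, x, fx = x, b, fb
            b = a + w * (c - a)
            fb = f(b)
        else:                          # minimum bracketed in (b, c)
            a, b, fb = b, x, fx
            x = c - w * (c - a)
            fx = f(x)
    return 0.5 * (a + c)

print(golden_section_min(lambda t: (t - 1.3) ** 2, 0.0, 3.0))  # ≈ 1.3
```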
Parabolic Interpolation
• More rapid: inverse parabolic interpolation
Brent's method
• Require 'fast' but robust inversion
• Golden mean search
– Slow but sure
– Use in unfavourable areas
• Use parabolic method
– When close to the minimum
Multi-dimensional minimisation
• Use 1-D methods multiple times
– In which directions?
• Some methods for N-D problems
– Simplex (amoeba)
Downhill Simplex
• Simplex:
– Simplest N-D shape: N+1 vertices
• Simplex operations:
– A reflection away from the high point
– A reflection and expansion away from the high point
– A contraction along one dimension from the high point
– A contraction along all dimensions towards the low point
• Find way to minimum
[Figure: simplex operations]
Direction Set (Powell's) Method
• Multiple 1-D minimisations
– Along axes: inefficient
• Use conjugate directions
– Update primary & secondary directions
• Issues
– Axis covariance
[Figure: Powell's method search paths]
Simulated Annealing
• Previous methods:
– Define start point
– Minimise in some direction(s)
– Test & proceed
• Issue:
– Can get trapped in local minima
• Solution (?)
– Need to restart from different point
• Annealing
–
–
–
–
Thermodynamic phenomenon
‘slow cooling’ of metals or crystalisation of liquids
Atoms ‘line up’ & form ‘pure cystal’ / Stronger (metals)
Slow cooling allows time for atoms to redistribute as they lose
energy (cool)
– Low energy state
• Quenching
– ‘fast cooling’
– Polycrystaline state
Simulated Annealing
• Simulate 'slow cooling'
• Based on the Boltzmann probability distribution:

$$\Pr(E) \sim e^{-\frac{E}{kT}}$$

• k – constant relating energy to temperature (Boltzmann's constant)
• System in thermal equilibrium at temperature T has a distribution of energy states E
• All (E) states possible, but some more likely than others
• Even at low T, small probability that system may be in a higher energy state
Simulated Annealing
• Use analogy of energy to RMSE
• As decrease ‘temperature’, move to generally
lower energy state
• Boltzmann gives distribution of E states
– So some probability of higher energy state
• i.e. ‘going uphill’
– Probability of ‘uphill’ decreases as T decreases
Implementation
• System changes from E1 to E2 with probability exp[−(E2 − E1)/kT]
– If E2 < E1, P > 1 (threshold at 1)
• System will take this option
– If E2 > E1, P < 1
• Generate random number
• System may take this option
• Probability of doing so decreases with T
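The acceptance rule above can be wrapped into a toy annealer; the test function, step size and cooling schedule below are arbitrary choices for illustration:

```python
import math
import random

random.seed(42)

def anneal(f, x, T=2.0, cooling=0.99, steps=2000):
    """Toy annealer: random perturbations; downhill moves always
    accepted, uphill moves accepted with probability
    exp(-(E2 - E1)/T), which shrinks as T cools."""
    fx = f(x)
    best, fbest = x, fx
    for _ in range(steps):
        x2 = x + random.uniform(-0.5, 0.5)
        fx2 = f(x2)
        if fx2 < fx or random.random() < math.exp(-(fx2 - fx) / T):
            x, fx = x2, fx2
            if fx < fbest:
                best, fbest = x, fx
        T *= cooling
    return best

# Test function with local minima near multiples of pi; global minimum at 0
def f(t):
    return 0.1 * t * t + math.sin(t) ** 2

best = anneal(f, 4.0)
print(best)
```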
Simulated Annealing
P= exp[-(E2-E1)/kT] – rand() - OK
P= exp[-(E2-E1)/kT] – rand() - X
T
Simulated Annealing
• Rate of cooling very important
• Coupled with effects of k
– exp[−(E2 − E1)/kT]
– Only the product kT matters: 2k at T/2 is equivalent to k at T
• Used in a range of optimisation problems
• Not much used in remote sensing
(Artificial) Neural networks (ANN)
• Another ‘Natural’ analogy
– Biological NNs good at solving complex problems
– Do so by ‘training’ system with ‘experience’
(Artificial) Neural networks (ANN)
• ANN architecture
(Artificial) Neural networks (ANN)
• ‘Neurons’
– have 1 output but many inputs
– Output is weighted sum of inputs
– Threshold can be set
• Gives non-linear response
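A single neuron as described, a weighted sum of inputs passed through a 'soft threshold' (the sigmoid choice here is mine; the slide only requires some non-linear response):

```python
import math

def neuron(inputs, weights, bias):
    # Weighted sum of inputs plus bias, squashed by a sigmoid:
    # the source of the non-linear response
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-s))

print(neuron([0.2, 0.7, 0.1], [1.5, -0.8, 2.0], 0.0))
```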
(Artificial) Neural networks (ANN)
• Training
– Initialise weights for all neurons
– Present input layer with e.g. spectral reflectance
– Calculate outputs
– Compare outputs with e.g. biophysical parameters
– Update weights to attempt a match
– Repeat until all examples presented
(Artificial) Neural networks (ANN)
• Use in this way for canopy model inversion
• Train other way around for forward model
• Also used for classification and spectral unmixing
– Again – train with examples
• ANN has ability to generalise from input examples
• Definition of architecture and training phases critical
– Can ‘over-train’ – too specific
– Similar to fitting polynomial with too high an order
• Many ‘types’ of ANN – feedback/forward
(Artificial) Neural networks (ANN)
• In essence, a trained ANN is just a (highly) non-linear response function
• Training (definition of e.g. the inverse model) is performed as a separate stage to application of the inversion
– Can use complex models for training
• Many examples in remote sensing
• Issue:
– How to train for an arbitrary set of viewing/illumination angles? – not a solved problem
Genetic (or evolutionary) algorithms (GAs)
• Another ‘Natural’ analogy
• Phrase optimisation as ‘fitness for survival’
• Description of state encoded through ‘string’
(equivalent to genetic pattern)
• Apply operations to ‘genes’
– Cross-over, mutation, inversion
Genetic (or evolutionary) algorithms (GAs)
• E.g. of BRDF model inversion:
• Encode N-D vector representing current state of
biophysical parameters as string
• Apply operations:
– E.g. mutation/mating with another string
– See if mutant is ‘fitter to survive’ (lower RMSE)
– If not, can discard (die)
Genetic (or evolutionary) algorithms (GAs)
• General operation:
– Populate set of chromosomes (strings)
– Repeat:
• Determine fitness of each
• Choose best set
• Evolve chosen set
– Using crossover, mutation or inversion
– Until a chromosome found of suitable fitness
Genetic (or evolutionary) algorithms (GAs)
• Differ from other optimisation methods
– Work on coding of parameters, not parameters themselves
– Search from population set, not single members (points)
– Use ‘payoff’ information (some objective function for selection) not
derivatives or other auxilliary information
– Use probabilistic transition rules (as with simulated annealing) not
deterministic rules
Genetic (or evolutionary) algorithms (GAs)
• Example operation:
1. Define genetic representation of state
2. Create initial population, set t=0
3. Compute average fitness of the set
– Assign each individual a normalised fitness value
– Assign probability based on this
4. Using this distribution, select N parents
5. Pair parents at random
6. Apply genetic operations to parent sets
– Generate offspring
– Becomes population at t+1
7. Repeat until termination criterion satisfied
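A toy sketch of the loop above for a single real-valued parameter; the crossover (averaging) and mutation (Gaussian perturbation) operators are simple illustrative choices, not from the slides:

```python
import random

random.seed(0)

def ga_minimise(f, lo, hi, pop_size=30, gens=60):
    """Toy real-valued GA: keep the fitter half as parents,
    cross over by averaging parent pairs, mutate with Gaussian noise."""
    pop = [random.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=f)                        # fittest (lowest cost) first
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            p1, p2 = random.sample(parents, 2)
            child = 0.5 * (p1 + p2)            # crossover
            child += random.gauss(0.0, 0.1)    # mutation
            children.append(child)
        pop = parents + children               # population at t+1
    return min(pop, key=f)

best = ga_minimise(lambda x: (x - 2.5) ** 2, 0.0, 10.0)
print(best)  # close to 2.5
```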
Genetic (or evolutionary) algorithms (GAs)
– Flexible and powerful method
– Can solve problems with many small, ill-defined minima
– May take huge number of iterations to solve
– Not applied to remote sensing model inversions
Knowledge-based systems (KBS)
• Seek to solve problem by incorporation of information
external to the problem
• Only RS inversion e.g. Kimes et al (1990;1991)
– VEG model
• Integrates spectral libraries, information from literature, information
from human experts etc
• Major problems:
– Encoding and using the information
– 1980s/90s ‘fad’?
LUT Inversion
• Sample parameter space
• Calculate RMSE for each sample point
• Define best fit as minimum RMSE parameters
– Or function of set of points fitting to a certain tolerance
• Essentially a sampled ‘exhaustive search’
LUT Inversion
• Issues:
– May require large sample set
– Not so if the function is well-behaved
• As is the case for many optical EO inversions
• In some cases, may assume function is locally linear over large
(linearised) parameter range
• Use linear interpolation
– Being developed UCL/Swansea for CHRIS-PROBA
– May limit search space based on some expectation
• E.g. some loose VI relationship or canopy growth model or land cover
map
• Approach used for operational MODIS LAI/fAPAR algorithm (Myneni et
al)
LUT Inversion
• Issues:
– As operating on stored LUT, can pre-calculate model outputs
• Don’t need to calculate model ‘on the fly’ as in e.g. simplex methods
• Can use complex models to populate LUT
– E.g. of Lewis, Saich & Disney using 3D scattering models (optical
and microwave) of forest and crop
– Error in inversion may be slightly higher if (non-interpolated) sparse
LUT
• But may still lie within desirable limits
– Method is simple to code and easy to understand
• essentially a sort operation on a table
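A minimal LUT inversion sketch, reusing an illustrative reflectance-vs-LAI model (the model form is the one used in the crop mosaic example, not an operational algorithm):

```python
import numpy as np

# Illustrative forward model: reflectance as a function of LAI
def model(lai):
    return 0.9 * (1.0 - np.exp(-lai / 2.0)) + 0.3 * np.exp(-lai / 2.0)

# Pre-compute the LUT once; no model runs needed at inversion time
lai_grid = np.linspace(0.0, 8.0, 801)
lut = model(lai_grid)

obs = 0.6                                  # observed reflectance
best = lai_grid[np.argmin((lut - obs) ** 2)]
print(best)                                # ≈ 1.39
```

Inversion is then essentially a sort/search over the stored table, so arbitrarily complex models can be used to populate it offline.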
Summary
• Range of options for non-linear inversion
• ‘traditional’ NL methods:
– Powell, AMOEBA
• Complex to code
– though library functions available
• Can easily converge to local minima
– Need to start at several points
• Calculate canopy reflectance ‘on the fly’
– Need fast models, involving simplifications
• Not felt to be suitable for operationalisation
Summary
• Simulated Annealing
– Can deal with local minima
– Slow
– Need to define annealing schedule
• ANNs
–
–
–
–
–
Train ANN from model (or measurements)
ANN generalises as non-linear model
Issues of variable input conditions (e.g. VZA)
Can train with complex models
Applied to a variety of EO problems
Summary
• GAs
– Novel approach, suitable for highly complex inversion problems
– Can be very slow
– Not suitable for operationalisation
• KBS
–
–
–
–
Use range of information in inversion
Kimes VEG model
Maximises use of data
Need to decide how to encode and use information
Summary
• LUT
– Simple method
• Sort
– Used more and more widely for optical model inversion
• Suitable for ‘well-behaved’ non-linear problems
– Can operationalise
– Can use arbitrarily complex models to populate LUT
– Issue of LUT size
• Can use additional information to limit search space
• Can use interpolation for sparse LUT for ‘high information content’
inversion