Kriging

advertisement
Kriging - Introduction
• Method invented in the 1950s by South African
geologist Daniel Krige (1919-) for predicting
distribution of minerals.
• Became very popular for fitting surrogates to
expensive computer simulations in the 21st
century.
• It is one of the best surrogates available.
• It probably became popular late mostly because
of the high computer cost of fitting it to data.
Kriging philosophy
• We assume that the data is sampled from an unknown
function that obeys simple correlation rules.
• The value of the function at a point is correlated to the
values at neighboring points based on their separation in
different directions.
• The correlation is strong to nearby points and weak with far
away points, but strength does not change based on
location.
• Normally Kriging is used with the assumption that there is
no noise so that it interpolates exactly the function values.
• It works out to be a local surrogate, and it uses functions
that are very similar to radial basis functions.
Reminder: Covariance and Correlation
• Covariance of two random variables X and Y
cov( X , Y )  E[( X   X )(Y  Y )]  E[ XY ]   X Y
• The covariance of a random variable with itself is
the square of the standard deviation
• Covariance matrix for a vector contains the
covariances of the components
• Correlation
cor ( X , Y ) 
cov( X , Y )
 X Y
1  cor ( X , Y )  1
• The correlation matrix has 1 on the diagonal.
Correlation between function values at
nearby points for sine(x)
• Generate 10 random numbers, translate them by a bit (0.1), and by more
(1.0)
x=10*rand(1,10)
8.147 9.058 1.267 9.134 6.324 0.975 2.785 5.469 9.575 9.649
xnear=x+0.1; xfar=x+1;
• Calculate the sine function at the three sets.
ynear=sin(xnear)
0.9237 0.2637 0.9799 0.1899 0.1399 0.8798 0.2538 -0.6551 -0.2477 -0.3185
y=sin(x)
0.9573 0.3587 0.9551 0.2869 0.0404 0.8279 0.3491 -0.7273 -0.1497 -0.2222
yfar=sin(xfar)
0.2740 -0.5917 0.7654 -0.6511 0.8626 0.9193 -0.5999 0.1846 -0.9129 -0.9405
• Compare corelations.
r=corrcoef(y,ynear) 0.9894; rfar=corrcoef(y,yfar) 0.4229
• Decay to about 0.4 over one sixth of the wavelength.
Gaussian correlation function
• Correlation between point x and point s
Nv
C  Z (x), Z (s), θ    exp  i ( xi  si )2 
i 1
• We would like the correlation to decay to
about 0.4 at one sixth of the wavelength 𝑙𝑖 .
2
2
• Approximately 𝜃𝑖 𝑙𝑖 6 = 1 or 𝜃𝑖 = 36 𝑙𝑖
• For the function 𝑦 = sin(𝑥1 ∗ sin(5𝑥2 we
would like to estimate 𝜃1 ≈ 1, 𝜃2 ≈ 25
Universal Kriging
Nv
Linear trend model
i 1
Systematic departure
yˆ (x)   ii (x)  Z (x)
• Linear trend function is most often a low
order polynomial
• We will cover ordinary kriging, where
y
linear trend is just a constant to be
estimated by data.
• There is also simple kriging, where
constant is assumed to be known.
• Assumption: Systematic departures Z(x)
are correlated.
• Kriging prediction comes with a normal
distribution of the uncertainty in the
prediction.
Sampling
data points
Systematic
Departure
Linear Trend
Model
Kriging
x
Notation
• The function values are given at 𝑛𝑦 points 𝐱 𝑖 , 𝑖 = 1, . . . , 𝑛𝑦 ,
with the point 𝐱
𝑖
𝑖
having components 𝑥𝑘 , k = 1, … , n.
• The function value at the ith point is 𝑦𝑖 =y(𝐱 𝑖 ), and the vector
of 𝑛𝑦 function values is denoted y.
• Given decay rates 𝜃𝑘 , we form the covariance matrix of the data
 n

Cov( yi , y j )   exp    k ( xk(i )  xk( j ) ) 2    2 Rij
 k 1

2
i, j  1,..., n y
• The correlation matrix R above is formed from the covariance
matrix, assuming a constant standard deviation 𝜎, which
measures the uncertainty in function values.
• For dense data, 𝜎 will be small, for sparse data 𝜎 will be large.
• How do you decide whether the data is sparse or dense?
Prediction and shape functions
• Ordinary Kriging prediction formula
yˆ (x)  ˆ  r R y  ˆ  b r
T
1
T
 n
(i )
2
ri  exp    k ( xk  xk ) 
 k 1

• The equation is linear in r, which means that the
exponentials 𝑟𝑖 may be viewed as basis functions.
• The equation is linear in the data y, in common
with linear regression, but b is not calculated by
minimizing rms.
• Note that far away from data 𝑦(𝐱 ~𝜇.
Fitting the data
• Fitting means finding the parameters 𝜃𝑘 .
• We fit by maximizing the likelihood that the data
comes from a Gaussian process defined by 𝜃𝑘 .
• Once they are found, the estimate of the mean
and standard deviation is obtained as
1T R 1y
ˆ  T 1
1 R 1
 y  1 
ˆ 2 
T
R 1  y  1 
n
• Maximum likelihood is a tough optimization
problem.
• Some kriging codes minimize the cross validation
error.
Prediction variance

(1  1T R 1r) 
T 1
V  yˆ (x)   1  r R r 

T
1
1
R
r


2
Square root of variance is
called standard error
The uncertainty at any x is
normally distributed.
Kriging fitting problems
• The maximum likelihood or cross-validation
optimization problem solved to obtain the kriging fit is
often ill-conditioned leading to poor fit, or poor
estimate of the prediction variance.
• Poor estimate of the prediction variance can be
checked by comparing it to the cross validation error.
• Poor fits are often characterized by the kriging
surrogate having large curvature near data points (see
example on next slide).
• It is recommended to visualize by plotting the kriging
fit and its standard error.
Example of poor fits.
True function
𝑦 = 𝑥 2 + 5𝑥 − 10
True function
Data points
100
80
y
60
40
20
0
-20
-8
-6
-4
-2
0
2
4
6
x
Kriging Model
Trend model
: 𝜉(𝑥 = 0
Covariance function : c𝑜𝑣 𝑥𝑖 , 𝑥𝑗 = exp(−𝜃(𝑥𝑖 − 𝑥𝑗
8
2
𝜃 selection
𝑃𝑜𝑜𝑟 𝑓𝑖𝑡 𝑎𝑛𝑑 𝑝𝑜𝑜𝑟
𝑠𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝑒𝑟𝑟𝑜𝑟
𝐺𝑜𝑜𝑑 𝑓𝑖𝑡 𝑎𝑛𝑑 𝑝𝑜𝑜𝑟
standard error
100
80
True function
Test points
Krigin interpolation
2SE bounds
100
True function
Test points
Kriging interpolation
2SE bounds
80
60
y
y
60
40
40
20
20
0
0
-20
-20
-8
-6
-4
-2
0
x
2
4
6
8
-8
-6
-4
-2
0
2
4
6
x
SE: standard error
8
Problems
• Fit the quadratic function of Slide 13 with
kriging using different options, like different
covariance and trend function and compare
the accuracy of the fit.
• For this problems compare the standard error
with the actual error.
Download