Efficient emulation of high-dimensional outputs using manifold learning: Theory and applications

Akeel Shah, Wei Xing, Vasilis Triantafyllidis (WCPM)
Prasanth Nair (University of Toronto)
•  Problems and definition of emulator
•  Types and basic construction
•  ‘Data’ from PDE models
•  Gaussian process emulation (scalar case)
•  Gaussian process emulation for PDE outputs
•  Gaussian process emulation for PDE outputs using
dimensionality reduction
•  Limitations of linear dimensionality reduction
•  Nonlinear dimensionality reduction
•  Application to PDE output data
•  Emulation of multiple fields
•  High fidelity simulation of fields, e.g., velocity or temperature
field, in many computational physics and engineering problems
–  Design optimization (4 inputs, 10 values = 10^4 cases!)
–  Model calibration, validation, sensitivity analysis, uncertainty analysis (parameter/structural, numerical error)
•  Computational cost of Monte Carlo based methods for
sensitivity/uncertainty analysis can be prohibitive
•  For field problems, the dimensionality of the output space is typically high, e.g., a numerical grid of 100 × 100 × 100 = 10^6 points
•  Use emulators (meta-model, surrogate) to replace calls to full
simulation model (simulator)
•  Rapidly predict and visualize behavior of complex systems as
functions of design parameters
•  How to construct emulators?
•  Two basic approaches:
•  Data driven (statistical)
•  Physics based
•  Data-driven emulators constructed by applying function
approximation/machine learning (e.g., neural network,
Gaussian process modeling) to input-output data
•  Physics-based emulation works directly with the PDE model to achieve model order reduction, e.g., reduced basis expansion, but there is no natural way of dealing with nonlinearities
•  Both require careful design of experiment
To illustrate the types of problems under consideration, consider a parameterized, nonlinear, time-dependent system of order-n PDEs for dependent variables u_i(q, t; x).
•  Nonlinear PDE model
∂_t u_i + F_i(q, t, u, Du, D²u, ..., D^n u; x) = 0  in Ω × (0, T],  i = 1, ..., J   (1)

in which u = (u_1, ..., u_J)^T, x ∈ X ⊂ R^l is a vector of l parameters, t denotes time, and Ω ⊂ R² is the physical domain over which the problem is defined. The spatial variable is denoted q = (ξ, η)
•  F_i, i = 1, ..., J, are nonlinear, parameterized functions of their arguments. For a given multi-index α = (α_ξ, α_η), D^α u_i = ∂_ξ^{α_ξ} ∂_η^{α_η} u_i; for a non-negative integer k, let D^k u_i = {D^α u_i : |α| = k}, where |α| := α_ξ + α_η is the order of the multi-index, and let D^k u denote the set of all derivatives of u of order k
•  It is assumed that a solution to (1), together with consistent initial/boundary conditions, can be computed for any value of x; the model is referred to as a simulator
•  Consider a scalar quantity of interest (steady state), u(q; x), simulated or derived from the simulation at specified points on a spatial grid (ξ_i, η_j), i = 1, ..., N_ξ, j = 1, ..., N_η. The grid points need not be uniformly spaced, provided that the same grid is used for each simulation
•  Choose design inputs x^(k) ∈ X ⊂ R^l, k = 1, ..., m
•  The simulator gives outputs u^(k)(ξ_i, η_j), i = 1, ..., N_ξ, j = 1, ..., N_η. We will use the compact notation u^(k)_{i,j} := u^(k)(ξ_i, η_j)
•  Each of the outputs can be vectorized as follows:

y^(k) = (u^(k)_{1,1}, ..., u^(k)_{1,N_η}, u^(k)_{2,1}, ..., u^(k)_{2,N_η}, ..., u^(k)_{N_ξ,1}, ..., u^(k)_{N_ξ,N_η})^T ∈ R^d   (2)

where d = N_ξ × N_η
•  In the case of time-dependent simulations with time steps t_k in the temporal discretization, t is treated as an additional input, x ↦ (x, t); then y^(k) ∈ R^d with d = N_ξ × N_η, and the number of samples (training points) is m N_t. This is a multivariate equivalent of the time input (TI) case defined by Conti and O'Hagan [13]
•  For spatially-uniform, time-dependent simulations, y^(k) ∈ R^d where d = N_t
•  The same procedure can be extended to other quantities of interest, which could include one or more scalar/vector fields (e.g., temperature) that are functions of multiple inputs
•  Simultaneous emulation of multiple quantities is discussed later; for now it is assumed that a single scalar quantity is the target for emulation
•  The simulator can be considered as a mapping from the design space to the output space, i.e., a function η : X ⊂ R^l → R^d taking inputs x ∈ X; the outputs y ∈ R^d are the vectorized values of the PDE solution u(q; x) at the grid points
•  The goal of emulation is to approximate this vectorized mapping, i.e., to learn η, given training data y^(i) = η(x^(i)) at design points x^(i), i = 1, ..., m
•  Machine learning strategies exist for such general (input-output) problems, e.g., neural networks, linear regression, Gaussian process emulation, but the high dimension d of the output space rules most of them out on grounds of computational efficiency: for even moderately complex problems, e.g., a 100 × 100 × 100 grid in R³, the value of d is 10^6, and finer grids may be required to adequately resolve small-scale characteristics in problems involving complex geometries or multiple spatial scales
•  GP emulation is attractive because it is Bayesian and non-parametric (no basis functions to choose)
Single output Gaussian process emulation

•  We summarize the standard (scalar) formulation of Oakley and O'Hagan
•  A GP is a stochastic process S_x, x ∈ X, where the index set X can be discrete (X ⊆ N) or continuous (X ⊆ R^l): the joint probability density function restricted to any finite subset of the index set is multivariate Gaussian. A GP is specified by its mean and covariance functions
•  Consider a scalar-valued simulator, regarded as an unknown function η : R^l → R of inputs x, run as before at design points x^(i) ∈ X ⊂ R^l
•  Outputs are y^(i) = η(x^(i)), i = 1, ..., m
•  GP prior assumption: η(·) is regarded as a random function that has a GP distribution indexed by the inputs x:

η(x) = β^T h(x) + W(x)

where h : R^l → R^p is a vector of regression functions, β is a vector of regression coefficients, and W(x) is a zero-mean GP (an additional term ε(x) can be included to model measurement error)
•  An emulator provides a probability distribution for η(·) given data; it is trained using the m simulator outputs
•  In Bayesian linear regression (without noise), η(x) = β^T h(x)
•  Place a prior distribution on the weight vector, e.g., Gaussian with i.i.d. coordinates, β ∼ N(0, b²I)
•  The underlying function is then distributed according to a Gaussian process indexed by the inputs x, namely η(x) ∼ GP(0, b² h(x)^T h(x′))
•  GPE generalizes this concept by directly placing a
covariance structure on the GP – allows for a much
broader class of functions
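To see this equivalence concretely, the following minimal numpy sketch (the quadratic basis h and prior scale b are arbitrary illustrative choices, not from the talk) samples weight vectors β ∼ N(0, b²I) and checks that the empirical covariance of η(x) = β^T h(x) matches b² h(x)^T h(x′):

```python
import numpy as np

rng = np.random.default_rng(0)
b = 2.0                                    # prior scale (illustrative)
h = lambda x: np.array([1.0, x, x ** 2])   # regression functions h(x)

xs = np.linspace(-1.0, 1.0, 5)
H = np.stack([h(x) for x in xs])           # rows are h(x) at 5 test inputs

# Sample many functions eta(x) = beta^T h(x) with beta ~ N(0, b^2 I)
betas = b * rng.standard_normal((200000, H.shape[1]))
etas = betas @ H.T                         # each row: one sampled function

emp_cov = np.cov(etas, rowvar=False)       # empirical Cov(eta(x_i), eta(x_j))
gp_cov = b ** 2 * H @ H.T                  # b^2 h(x)^T h(x') from the GP view
print(np.max(np.abs(emp_cov - gp_cov)))    # small -> the two views agree
```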
•  Covariance function: k(x, x′) = σ² c(x, x′; θ)
•  Stationary correlation function: c(x, x′; θ) = exp(−(x − x′)^T diag(θ_1, ..., θ_l)(x − x′))
•  Expresses the smoothness of the GP (mean-square derivatives of all orders exist)
•  Conditional distribution of the data t = (y^(1), ..., y^(m))^T:

t | β, σ², θ ∼ N(Hβ, σ²C),  H = [h(x^(1)) ... h(x^(m))]^T,  [C]_ij = c(x^(i), x^(j); θ)
•  Predictive distribution given data and hyperparameters, for a test point x:

η(·) | β, σ², θ, t ∼ GP(μ′(·), σ² ν′(·, ·))

μ′(x) = β^T h(x) + a(x)^T C^{−1} (t − Hβ)
ν′(x, x′) = c(x, x′; θ) − a(x)^T C^{−1} a(x′)
a(x) = (c(x^(1), x; θ), ..., c(x^(m), x; θ))^T

•  Once hyperparameters are known, predictions can be made easily and quickly with simple formulae
•  Formulae can be extended to predict several points simultaneously
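These formulae translate directly into code. The sketch below (numpy, with toy data, a constant regression function h(x) = 1, and β estimated by generalized least squares — all illustrative assumptions) computes μ′(x) and ν′(x, x) at test points:

```python
import numpy as np

def corr(X1, X2, theta):
    """Stationary correlation c(x, x') = exp(-(x - x')^T diag(theta) (x - x'))."""
    d = X1[:, None, :] - X2[None, :, :]
    return np.exp(-np.einsum('ijk,k,ijk->ij', d, theta, d))

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, (20, 2))                  # design points x^(i)
t = np.sin(3 * X[:, 0]) + X[:, 1] ** 2          # outputs t = (y^(1), ..., y^(m))
theta = np.array([5.0, 5.0])                    # correlation hyperparameters

H = np.ones((len(X), 1))                        # h(x) = 1 (constant regression)
C = corr(X, X, theta) + 1e-10 * np.eye(len(X))  # [C]_ij = c(x^(i), x^(j); theta)
Ci = np.linalg.inv(C)
beta = np.linalg.solve(H.T @ Ci @ H, H.T @ Ci @ t)  # GLS plug-in for beta

def predict(Xs):
    """Predictive mean mu'(x) and variance nu'(x, x) at test points Xs."""
    a = corr(X, Xs, theta)                      # columns are a(x)
    mu = beta[0] + a.T @ Ci @ (t - H @ beta)
    nu = 1.0 - np.einsum('ij,ik,kj->j', a, Ci, a)  # c(x, x; theta) = 1 here
    return mu, nu

mu, nu = predict(rng.uniform(0, 1, (5, 2)))
print(mu, nu)
```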
•  A fully Bayesian approach places a prior f(θ) on these hyperparameters and uses MCMC to present predictions as a sample
•  Time consuming process so often ‘plug in’ (point)
estimates are used
•  Maximum a-posteriori (MAP) using a conjugate prior
•  Maximum likelihood estimate (MLE)
θ_MLE = arg max_θ ( −(1/2) ln|C| − (1/2) t^T C^{−1} t − (m/2) ln(2π) )

•  Depends on the number of samples m
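A sketch of this plug-in route (a zero-mean GP with unit variance is assumed for brevity, with toy data and scipy's general-purpose L-BFGS-B optimizer; none of these choices come from the talk itself):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, (30, 2))
t = np.sin(3 * X[:, 0]) * np.cos(2 * X[:, 1])

def neg_log_lik(log_theta):
    """-log p(t | theta) for a zero-mean GP with the stationary correlation."""
    theta = np.exp(log_theta)                      # optimize in log space
    d = X[:, None, :] - X[None, :, :]
    C = np.exp(-np.einsum('ijk,k,ijk->ij', d, theta, d))
    C += 1e-8 * np.eye(len(X))                     # jitter for conditioning
    L = np.linalg.cholesky(C)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, t))
    return (np.sum(np.log(np.diag(L)))             # (1/2) ln|C|
            + 0.5 * t @ alpha                      # (1/2) t^T C^{-1} t
            + 0.5 * len(t) * np.log(2 * np.pi))

res = minimize(neg_log_lik, x0=np.zeros(2), method='L-BFGS-B')
print('theta_MLE =', np.exp(res.x))
```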
•  How do we extend the scalar case to outputs in a high
dimensional space?
•  A naive approach (Kennedy and O’Hagan (2001)) is to
treat the output index as an additional parameter and
perform scalar GPE – for emulating a whole field this
requires d computations
•  How do we overcome this problem?
•  Consider (Paulo et al. (2012)) the linear model of coregionalization (Wackernagel (1995)):

η(x) = A w(x),  w(x) = (w_1(x), ..., w_J(x))^T

•  The w_i are independent zero-mean GPs; A is independent of x!
•  Correlation functions for the GPs are c_i(·, ·, θ_i)
•  The covariance function for the multivariate GP is:

Cov(η(x), η(x′)) = Σ_{i=1}^J a_i a_i^T c_i(x, x′, θ_i)
•  Fairly general assumption – allows different scales to be incorporated by using a linear combination of correlation functions with different scale parameters θ_i
•  We can assume a separable structure for the covariance by
taking all wi to be i.i.d., i.e. a single correlation function
•  Leads to a tractable problem (Conti & O’Hagan (2009);
Rougier (2008)) – equivalent to a multivariate GP prior!
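The following numpy sketch builds this covariance for a toy configuration (the mixing matrix A and the scale parameters θ_i are arbitrary illustrative values), showing how different length scales enter through the c_i:

```python
import numpy as np

def corr(X1, X2, theta):
    d = X1[:, None, :] - X2[None, :, :]
    return np.exp(-np.einsum('ijk,k,ijk->ij', d, theta, d))

rng = np.random.default_rng(3)
J, l, m = 3, 2, 4
A = rng.standard_normal((J, J))                      # columns a_i
thetas = [np.full(l, s) for s in (0.5, 5.0, 50.0)]   # one scale per latent GP
X = rng.uniform(0, 1, (m, l))

# Cov(eta(x_p), eta(x_q)) = sum_i a_i a_i^T c_i(x_p, x_q), stacked over points
cov = sum(np.kron(corr(X, X, th), np.outer(A[:, i], A[:, i]))
          for i, th in enumerate(thetas))
print(cov.shape)                                     # (m*J, m*J), symmetric p.s.d.
```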
•  An alternative method is to use the data to find a ‘suitable’
A and at the same time restrict the number of univariate
GPs to those that contribute the ‘most’
•  Leads to a reduction in dimensionality of the output space
(restrict to a linear subspace) – Higdon et al. (2008)
•  The method relies only on principal component analysis
•  Singular value decomposition of the data matrix (or eigen-decomposition of the sample covariance matrix) reorders the data according to variance in the d dimensions, with uncorrelated coefficients
•  Natural basis (columns of A) for output space and
expansion in terms of coefficients (assumed to be GPs)
•  Orthonormal basis p_j, j = 1, ..., m, for R^d. Select a linear subspace: span{p_1, ..., p_r}, r ≪ d
•  Coefficients in this basis: c_j^(i), j = 1, ..., d, i = 1, ..., m
•  Expansion (restricted to the subspace):

y = η(x) ≈ Σ_{j=1}^r c_j p_j

Select a test point x for prediction
for j = 1 : r
    scalar GPE on training set {(x^(i), c_j^(i))}, i = 1, ..., m → prediction c_j
end
Approximate y = η(x) ≈ Σ_{j=1}^r c_j p_j
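A compact sketch of this loop (assuming scikit-learn is available; the "simulator" f below is a stand-in function, and the basis comes from an SVD of the centred data matrix):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(4)
f = lambda x: np.sin(5 * x[0] * np.linspace(0, 1, 200)) * x[1]  # stand-in simulator

X = rng.uniform(0, 1, (40, 2))                 # design points x^(i)
Y = np.stack([f(x) for x in X])                # m x d output matrix

ybar = Y.mean(axis=0)
U, S, Vt = np.linalg.svd(Y - ybar, full_matrices=False)
r = 5
P = Vt[:r].T                                   # orthonormal basis p_1..p_r (d x r)
Cc = (Y - ybar) @ P                            # training coefficients c_j^(i)

# One scalar GPE per retained coefficient
gps = [GaussianProcessRegressor(kernel=RBF()).fit(X, Cc[:, j]) for j in range(r)]

x_test = np.array([[0.3, 0.7]])
c_pred = np.array([gp.predict(x_test)[0] for gp in gps])
y_pred = ybar + P @ c_pred                     # y = eta(x) ~ sum_j c_j p_j
print(np.linalg.norm(y_pred - f(x_test[0])) / np.linalg.norm(f(x_test[0])))
```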
[Figure: panels (a)–(c), toy response surfaces illustrating linear dimensionality reduction]

•  Fails in many cases to provide an accurate representation of a response surface
•  Informally, works with a relatively ‘flat’ surface
•  For highly nonlinear response surfaces with abrupt changes it could fail completely

A nonlinear dimensionality reduction method is, therefore, a natural extension, which forms the motivation for the method developed below.
•  There are other methods for dimensionality reduction
•  Linear methods
–  PCA
–  Multi-dimensional scaling
–  Independent component analysis
•  Nonlinear methods
–  Kernel PCA
–  Isomap/kernel Isomap
–  Diffusion maps
–  Laplacian eigenmaps
–  Local linear embedding

[Figure 1: dimensionality reduction examples, e.g., via multidimensional scaling (MDS) and Isomap]
•  Manifold assumption: the input data resides on or close
to a ‘low-dimensional’ manifold embedded in the
ambient space – informally the dimension is the number
of parameters needed to specify a point, e.g., a surface of
a sphere has dimension 2
•  Learning/characterizing such manifolds from given data
is called manifold learning
•  Approaches are characterized in a number of ways, e.g.,
spectral, kernel-based, embeddings
•  Each has its own advantages and disadvantages – there is no universal technique
•  Performance on toy data sets can be misleading
•  Kernel PCA: the data is mapped to a higher dimensional (possibly infinite-dimensional) feature space F and LPCA is applied to the mapped data (Scholkopf et al. (1998))
•  The aim is to transform the data in such a way that it lies in (or near) a linear subspace of the feature space
•  The feature map φ : R^d → F is implicitly specified via a kernel function k(·, ·) (examples of kernels are given in Table 1)
•  LPCA in feature space requires the eigenvectors and eigenvalues of the sample covariance matrix:

S̃ = (1/m) Σ_{i=1}^m φ̃(y^(i)) φ̃(y^(i))^T

in which φ̃(y^(i)) = φ(y^(i)) − (1/m) Σ_{j=1}^m φ(y^(j)) is the ith centred point in feature space
•  This problem is recast by introducing the centred kernel function:

k̃(y^(i), y^(j)) = φ̃(y^(i))^T φ̃(y^(j)) = [K̃]_ij

which is a dot product between two centred points; the matrix [K̃]_ij, i, j = 1, ..., m, is called the centred kernel matrix
•  In general the map φ is not known explicitly
•  Recast the eigenproblem for the sample covariance matrix (in feature space) as an eigenproblem for the kernel matrix: using the positive definiteness of K̃, the problem becomes one of finding the eigenvectors of K̃,

K̃ α_i = m λ_i α_i

•  The eigenvectors of the sample covariance matrix are not known explicitly, but the coefficients in an expansion are known: if the eigenvectors α_i are replaced by α̃_i = α_i / √λ_i, normalized eigenvectors ṽ_j, j = 1, ..., dim(F), can be defined as

ṽ_i = Σ_{k=1}^m α̃_ki φ̃(y^(k))

in which α̃_ki = α_ki / √λ_i and α_ki denotes the kth component of α_i
•  The projection of φ̃(y^(j)) onto span{ṽ_1, ..., ṽ_r} is given by:

z_i^(j) = ṽ_i^T φ̃(y^(j)) = Σ_{k=1}^m α̃_ki φ̃(y^(k))^T φ̃(y^(j)) = Σ_{k=1}^m α̃_ki K̃_kj,  i = 1, ..., m

•  Do GPE on the first r coefficients (test input x) to approximate

φ̂(y(x)) = Σ_{i=1}^r z_i(x) ṽ_i + φ̄

where φ̄ denotes the mean of the mapped data in feature space
•  The centred kernel values k̃(i, j) can be calculated from the kernel values k(i, j) := k(y^(i), y^(j)) by:

k̃(i, j) = k(i, j) − 1̄^T k_i − 1̄^T k_j + 1̄^T K 1̄

where 1̄ := (1/m)(1, ..., 1)^T and k_i denotes the ith column of K
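In code, these steps amount to centring the kernel matrix, solving its eigenproblem, and forming the projections. A numpy sketch (Gaussian kernel with an arbitrary shape parameter s; the data are random stand-ins):

```python
import numpy as np

rng = np.random.default_rng(5)
Y = rng.standard_normal((50, 200))            # m outputs y^(i) in R^d
s = 10.0                                      # Gaussian kernel shape parameter

sq = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq / (2 * s ** 2))                # kernel matrix k(y^(i), y^(j))

m = len(Y)
Hc = np.eye(m) - np.ones((m, m)) / m
Kt = Hc @ K @ Hc                              # centred kernel matrix K~

lam, alpha = np.linalg.eigh(Kt)               # Kt alpha_i = (m lambda_i) alpha_i
lam, alpha = lam[::-1] / m, alpha[:, ::-1]    # descending; lam ~ eigenvalues of S~

r = 5                                         # normalize so that v~_i^T v~_i = 1
alpha_t = alpha[:, :r] / np.sqrt(np.maximum(m * lam[:r], 1e-12))
Z = Kt @ alpha_t                              # Z[j, i] = z_i^(j)
print(Z.shape)                                # (m, r) coefficients for GPE
```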
•  Examples of kernel functions:

Gaussian: k(y^(i), y^(j)) = exp(−||y^(i) − y^(j)||² / (2s²))
Polynomial (order n): k(y^(i), y^(j)) = ((y^(i))^T y^(j) + s)^n
Multiquadric: k(y^(i), y^(j)) = √(1 + s||y^(i) − y^(j)||²)
Sigmoid: k(y^(i), y^(j)) = tanh(s (y^(i))^T y^(j) + s′)

(Table 1: s and s′ are user-defined shape parameters; || · || denotes the Euclidean norm)

•  Since the map is not known (nor the basis vectors in feature space), a pre-image problem has to be solved – approximate the inverse map
•  It turns out this is possible to do in 3 main ways for most standard kernels
•  Least squares approximation is possible by relating distances between points in physical space to distances between points in feature space via the kernel function. The method can suffer from numerical instabilities if m < d
•  A fixed-point iterative algorithm (Mika et al. (1999)) can
be used but is again prone to instability
•  Local linear interpolation (Ma & Zabaras (2011)) is the
third method (again based on distance information) –
use a weighted sum of known data point values. This
gives stable results
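A sketch of the local-interpolation idea (in the spirit of Ma & Zabaras (2011), although the inverse-distance weighting below is a simple illustrative choice rather than their exact scheme):

```python
import numpy as np

def preimage_local(z_pred, Z_train, Y_train, n_neighbors=5):
    """Approximate inverse map: weighted sum of the training outputs whose
    latent coordinates are closest to the predicted latent point z_pred."""
    dists = np.linalg.norm(Z_train - z_pred, axis=1)
    idx = np.argsort(dists)[:n_neighbors]
    w = 1.0 / np.maximum(dists[idx], 1e-12)      # inverse-distance weights
    w /= w.sum()
    return w @ Y_train[idx]                      # stable: a convex combination

# Toy usage: latent coordinates Z_train (m x r), outputs Y_train (m x d)
rng = np.random.default_rng(6)
Z_train = rng.standard_normal((50, 3))
Y_train = rng.standard_normal((50, 100))
y_hat = preimage_local(Z_train[0] + 0.01, Z_train, Y_train)
print(y_hat.shape)
```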
1. Select design points x^(j) ∈ X ⊂ R^l, j = 1, ..., m, using DOE
2. Collect outputs y^(j) = η(x^(j)) ∈ R^d from the simulator
3. Perform kPCA on the y^(j) → z_i^(j), j = 1, ..., m, i = 1, ..., d
4. Select a test point x for prediction
   for i = 1 : r
       perform scalar GPE on D_i = {x^(j), z_i^(j)}_{j=1}^m → prediction z_i
   end
5. Reconstruct → ŷ = φ^{−1}(φ̂) ≈ η(x)
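An end-to-end sketch of steps 1–5 using off-the-shelf components (assuming scikit-learn; note that KernelPCA with fit_inverse_transform=True solves the pre-image problem via kernel ridge regression, a convenient stand-in for the local interpolation described above, and the "simulator" f is again a toy function):

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(7)
f = lambda x: np.tanh(5 * (np.linspace(0, 1, 200) - x[0])) * x[1]  # stand-in

X = rng.uniform(0, 1, (60, 2))                  # 1. design points via DOE
Y = np.stack([f(x) for x in X])                 # 2. simulator outputs

r = 6
kpca = KernelPCA(n_components=r, kernel='rbf', gamma=0.1,
                 fit_inverse_transform=True)    # 3. kPCA (+ pre-image model)
Z = kpca.fit_transform(Y)                       #    coefficients z_i^(j)

gps = [GaussianProcessRegressor(kernel=RBF()).fit(X, Z[:, i]) for i in range(r)]

x_test = np.array([[0.4, 0.8]])                 # 4. test point
z_pred = np.array([[gp.predict(x_test)[0] for gp in gps]])
y_hat = kpca.inverse_transform(z_pred)          # 5. reconstruct (pre-image)
print(np.linalg.norm(y_hat - f(x_test[0])) / np.linalg.norm(f(x_test[0])))
```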
•  Subsurface flow in porous media driven by density variations: initially stagnant water in which temperature gradients alter the fluid density, generating buoyant flow
•  Use Brinkman's equations with a Boussinesq buoyancy term (accounting for the lifting of the fluid by thermal expansion):

(μ/ε) u + ∇p − ∇·[μ(∇u + ∇u^T)] = ρgβ(T − T_c)   (55)
∇·u = 0

•  Heat balance:

ρC_p u·∇T − ∇·(k_m ∇T) = 0   (56)

Here g denotes gravitational acceleration, ρ is the density at the reference temperature T_c, ε is the porosity of the medium, β is the coefficient of volumetric thermal expansion of the fluid, k_m denotes the effective (volume averaged) thermal conductivity of the fluid-solid mixture, and C_p is the specific heat capacity of the fluid at constant pressure (with a reference pressure specified, but arbitrarily)
•  The Brinkman equations are subject to no-slip conditions, and the temperature varies from T_h to T_c along the outer edges
•  The model was solved in COMSOL Multiphysics 4.3b (‘Free convection in porous media’), with β ∈ [10^−11, 10^−8] (K^−1) and T_h ∈ [40, 60] (°C)

[Figure 2: Temperature boundary conditions for the ‘Free convection in porous media’ example]
•  Two input parameters varied: coefficient of volumetric
thermal expansion β and the high temperature Th
•  A total of 500 numerical experiments were performed, with inputs selected using a Sobol sequence (this generally provides a more uniform coverage of the input space than a Latin hypercube)
•  For each simulation, the magnitude of the velocity |u| was recorded on a regular 100 × 100 square spatial grid
•  The 10000 points in the 2D spatial domain were re-ordered into vector form in R^d
•  400 samples reserved for testing
kGPE, on the other hand, captures the test case extremely accurately.

[Figures 2–3: predictions of the velocity magnitude (ξ, η in cm) and relative square errors for the ‘Free convection in porous media’ example]
•  A continuously stirred tank reactor (CSTR) is used to produce propylene glycol (PrOH) from the reaction of propylene oxide (PrO) with water in the presence of an acid catalyst (H2SO4):

PrO + H2O → PrOH   (57)

•  The liquid phase reaction takes place in a CSTR equipped with a heat exchanger. Methanol (MeOH) is added to the mixture but does not react
•  Mass balances for the species (i = PrO, MeOH, PrOH, H2O):

V_r dc_i/dt = v_f (c_{f,i} − c_i) + ν_i V_r r   (58)

in which c_i is the concentration and ν_i the stoichiometric coefficient of species i, c_{f,i} is the feed stream concentration, v_f is the volumetric flow rate into (and out of) the tank, and r = A c_{PrO} exp(−E/(RT)) is the rate of reaction (57), assumed to be first order (mass action) with respect to PrO and to have an Arrhenius temperature dependence, with activation energy E and pre-exponential factor A (R is the universal gas constant and T is the temperature of the reactor)
•  Heat balance (energy balance of the reactor):

Σ_i c_i C_{p,i} dT/dt = −H r + (1/V_r) [ F_x C_{p,x} (T_x − T)(1 − e^{−UA/(F_x C_{p,x})}) + Σ_i v_f c_{f,i} (h_{f,i} − h_i) ]   (59)

The first term on the right hand side represents the heat of reaction (H is the enthalpy of reaction (57)); the second term represents the heat loss to the heat exchanger, where F_x is the flow rate, C_{p,x} is the specific heat capacity and T_x is the inlet temperature of the heat exchanger medium, and U and A are the heat transfer coefficient and heat exchange area of the heat exchanger, respectively; the third term is the enthalpy change due to the flow through the reactor. The specific heat capacity is assumed to be a molar averaged value of the specific heat capacities of the individual species in the liquid, C_{p,i}
•  The molar enthalpy of species i is given by h_i = C_{p,i}(T − T_ref) + h_{i,ref}, in which h_{i,ref} is the standard heat of formation of i at the reference temperature T_ref; the feed stream molar enthalpies h_{f,i} are calculated similarly. The model is completed by initial values
•  The model was solved in COMSOL Multiphysics 4.3b
•  Three input parameters were varied: the initial temperature, the initial concentration of PrO, and the heat exchange parameter UA
•  A total of 500 numerical experiments were performed, with inputs selected using a Sobol sequence
•  Temperature recorded every 14 s up to 7000 s
•  The 501 time points were re-ordered into vector form
•  400 samples reserved for testing
kGPE showed minor improvements in all test cases.

[Figures 5–6: predicted temperature (°C) against time (min) and relative square errors for the CSTR example]
•  Classical multidimensional scaling provides a low-dimensional Euclidean space representation of data that lies on a manifold in a high dimensional ambient space
•  It relates ‘dissimilarities’ dij between points i and j to
Euclidean distances δij in the low-dimensional space
•  Classical scaling is an isometric embedding:

δ_ij = ||z^(i) − z^(j)|| = d_ij

where D = [d_ij] is the dissimilarity matrix for the points y^(i), y^(j), and δ_ij is the Euclidean distance between the corresponding low-dimensional points z^(i), z^(j)
•  When dissimilarities are defined by Euclidean distance
MDS is equivalent to PCA (easily seen from least-squares
optimality of PCA)
•  Method is also spectral: eigenvectors of a centred kernel matrix K = −(1/2) H (D ∘ D) H, where H is the centering matrix and ∘ denotes the elementwise product
•  Idea generalised by Tenenbaum et al. using geodesic
distances for dij (e.g. shortest path distance) – Isomap
•  Can be considered equivalent to kPCA: Dissimilarity
matrix defines distances between points in feature space
and leads to a centred kernel matrix
•  Coordinates obtained from a spectral decomposition same
as before, provided the kernel matrix is p.s.d.
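A minimal classical-scaling sketch in numpy: double-centre the squared dissimilarities to obtain K, then embed using the leading eigenpairs. With Euclidean d_ij this reproduces the PCA scores, as noted above:

```python
import numpy as np

rng = np.random.default_rng(8)
Y = rng.standard_normal((40, 10))
D = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)   # dissimilarities

m = len(D)
H = np.eye(m) - np.ones((m, m)) / m
K = -0.5 * H @ (D ** 2) @ H              # centred kernel matrix

lam, V = np.linalg.eigh(K)
lam, V = lam[::-1], V[:, ::-1]           # descending order
r = 2
Z = V[:, :r] * np.sqrt(np.maximum(lam[:r], 0))   # low-dimensional coordinates
print(Z.shape)                           # (m, r); ||z_i - z_j|| ~ d_ij
```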
•  Kernel ISOMAP guarantees p.s.d. kernel matrices and
therefore the existence of a feature space
•  Can use same procedure as before on coefficients learned
by ISOMAP
•  Pre-image problem can be solved similarly by relating
distances between points in the low-dimensional space to
dissimilarities (= Euclidean distances for ‘neighbours’)
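A sketch of the Isomap construction (numpy/scipy; the neighbourhood size and the toy spiral are illustrative, and the graph is assumed connected; the plain shortest-path kernel below may be indefinite — kernel Isomap's constant-shift correction, which guarantees p.s.d., is omitted here):

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(9)
t = np.sort(rng.uniform(0, 3 * np.pi, 80))
Y = np.column_stack([t * np.cos(t), t * np.sin(t)])  # spiral-like toy data

D = squareform(pdist(Y))                  # Euclidean distances
k = 6
G = np.where(D <= np.sort(D, axis=1)[:, [k]], D, np.inf)  # k-NN graph
Dg = shortest_path(G, method='D', directed=False)         # geodesic estimates

m = len(Dg)
H = np.eye(m) - np.ones((m, m)) / m
K = -0.5 * H @ (Dg ** 2) @ H              # Isomap kernel (may be indefinite)

lam, V = np.linalg.eigh(K)
Z = V[:, [-1]] * np.sqrt(max(lam[-1], 0)) # 1-D embedding ~ arc length
print(np.corrcoef(Z.ravel(), t)[0, 1])    # |corr| should be large here
```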
•  Metal melting front: a square cavity containing solid and liquid metal submitted to a temperature difference between the left and right boundaries. The liquid and solid phases are treated as separate domains sharing a moving melting front (see Figure 8), whose position is calculated according to a Stefan condition. A full description of the model can be found in Wolff and Viskanta [29]
•  The liquid part is governed by the Navier-Stokes equations using a Boussinesq approximation:

ρ_0 ∂u/∂t + ρ_0 (u·∇)u = −∇p + ∇·[μ(∇u + ∇u^T)] + ρg   (60)
∇·u = 0
ρC_p ∂T_l/∂t + ρC_p u·∇T_l − ∇·(k∇T_l) = 0
ρ = ρ_0 − β(T_l − T_f)

in which u is the liquid velocity, T_l represents the liquid temperature, k denotes the thermal conductivity of the liquid, C_p is the specific heat capacity of the fluid at constant pressure, g denotes gravitational acceleration, ρ_0 is the reference density of the liquid, ρ is the linearized density, β is the metal coefficient of thermal expansion, and T_f denotes the fusion temperature of the solid
•  In the solid phase, the heat balance takes the form:

ρC_p ∂T_s/∂t − ∇·(k∇T_s) = 0   (61)

in which T_s represents the solid temperature and k denotes the thermal conductivity of the solid (assumed to be the same as that of the liquid)
•  The liquid-solid front is defined by ξ = s(η, t), along which T = T_f. An energy balance at the melting front is given by [29]:

ρ_0 Δh_f ∂s/∂t = [1 + (∂s/∂η)²] (k ∂T_s/∂ξ − k ∂T_l/∂ξ)   (62)

where Δh_f is the latent heat of fusion
•  A no-slip condition is applied at the other boundaries for the liquid, and n·(−k∇T) = 0 is applied on the insulated edges

[Figure 8: Temperature boundary conditions for the ‘Metal melting front’ example]
•  The model was solved in COMSOL Multiphysics 4.3b (‘Tin Melting Front’ under the Heat Transfer module)
•  Two input parameters varied: latent heat of fusion ∆hf
and thermal conductivity k
•  50 numerical experiments were performed, with
inputs selected using a Sobol sequence.
•  For each set of parameters, 10 snapshots of the
velocity field were recorded for t = 50, 100, . . . , 500 s.
•  For each simulation, the magnitude of the velocity |u| was recorded on a regular 100 × 100 square spatial grid
•  400 samples reserved for testing
[Figure 10: Predictions of the velocity magnitude using 40 training points and 9 principal components in the ‘Metal melting front’ example: the test cases for (a) (Δh_f, k, t) = (1031.25, 76.25, 150) and (b) (Δh_f, k, t) = (246.88, 282.03, 100). The corresponding predictions using kPCA-GPR are shown in (c) and (d), respectively, while the corresponding predictions using LPCA-GPR are shown in (e) and (f), respectively]
•  We can emulate multiple fields or vector fields by
combining data sets or separate emulation assuming
independence
•  This could lead to problems with scaling (e.g., temperature variations vs. velocity variations) or ignore correlations between outputs (e.g., electric potential and current)
•  Multiple output types (fields): y_j^(i), j = 1, ..., J
•  Perform NDR for each output type and extract coefficients z_{k,j}^(i), where k indexes the coefficient number, j indexes the output type and i indexes the inputs
•  For a fixed k, define Z_k^(i) = (z_{k,1}^(i), ..., z_{k,J}^(i))^T
•  Use LMC to infer the coefficients simultaneously for test inputs: a J-variate GP

Z_k(·) = F_k W_k(·)

•  W_k = (W_{1,k}, ..., W_{J,k})^T, where the coordinates are independent GPs with zero mean and correlation functions c_{i,k}(·, ·, θ_{i,k})

Z_k(·) | parameters, data ∼ GP(M_k(·), K_k(·, ·))

•  Explicit formulae for the mean function M_k(x) and variance-covariance matrix K_k(x, x) are given
•  Parameters determined via likelihood or MCMC
•  Repeat for each k and solve the pre-image problem
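A sketch of one such J-variate GP step (numpy; prediction by standard GP conditioning on the stacked coefficients using the LMC covariance from before, with illustrative A and scale values; likelihood/MCMC estimation of the parameters is omitted):

```python
import numpy as np

def corr(X1, X2, theta):
    d = X1[:, None, :] - X2[None, :, :]
    return np.exp(-np.einsum('ijk,k,ijk->ij', d, theta, d))

def lmc_cov(X1, X2, A, thetas):
    """Covariance of the stacked J-variate GP: sum_i kron(c_i(X1, X2), a_i a_i^T)."""
    return sum(np.kron(corr(X1, X2, th), np.outer(A[:, i], A[:, i]))
               for i, th in enumerate(thetas))

rng = np.random.default_rng(10)
J, l, m = 2, 2, 30
A = rng.standard_normal((J, J))              # illustrative mixing matrix
thetas = [np.full(l, 2.0), np.full(l, 20.0)] # one scale per latent GP

X = rng.uniform(0, 1, (m, l))
Zk = rng.standard_normal(m * J)              # stacked coefficients Z_k (toy values)

Xs = rng.uniform(0, 1, (3, l))               # test inputs
K = lmc_cov(X, X, A, thetas) + 1e-8 * np.eye(m * J)
Ks = lmc_cov(X, Xs, A, thetas)               # cross-covariance (mJ x 3J)
mean = Ks.T @ np.linalg.solve(K, Zk)         # M_k at the test inputs
print(mean.reshape(3, J))                    # one J-vector per test input
```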
•  Diffusion maps, with a new method for solving the pre-image problem based on an extended diffusion matrix and local interpolation
•  Physics based approaches using NDR (direct
approach?)
•  Large scale problems (more complex data sets)
including issues with DOE, number of samples and
dimensionality of input space
1)  M. Kennedy, A. O’Hagan, J. R. Stat. Soc. Ser. B Stat. Methodol. 63 (2001)
425–464
2)  R. Paulo, G. Garcia-Donato, J. Palomo, Comput. Stat. Data Analysis 56
(2012) 3959–3974
3)  H. Wackernagel, Multivariate Geostatistics, Springer, Berlin, 1995.
4)  S. Conti, A. O’Hagan, J. Statistical Planning and Inference 140 (2010)
640–651
5)  J. Rougier, J. Computational and Graphical Statistics 17 (2008) 827–843
6)  D. Higdon et al., J. Am. Stat. Association 103 (482) (2008) 570–583
7)  B. Scholkopf, A. Smola, K.-R. Muller, Neural Computation 10 (1998)
1299–1319
8)  S. Mika et al., Advances in neural information processing systems II 11
(1999) 536–542
9)  B. Ganapathysubramanian, N. Zabaras, J. Comput. Phys. 227 (2008)
6612–6637
10) X. Ma, N. Zabaras, J. Comput. Phys. 230 (2011) 7311-7331
11)  J.B. Tenenbaum, V. De Silva, J.C. Langford, Science 290 (2000) 2319–2323