Uncertainty Quantification & the
PSUADE Software
Mahmoud Khademi
Supervisor: Prof. Ned Nedialkov
Department of Computing and Software
McMaster University, Hamilton, Ontario
Canada
2012
Outline
Introduction to Uncertainty Quantification (UQ)
Identification
Characterization
Propagation
Analysis
Common algorithms and methods
PSUADE: UQ software library and environment
https://computation.llnl.gov/casc/uncertainty_quantification/
Conclusions & future research directions
Introduction to UQ
Quantitative characterization and reduction of
uncertainty
Estimating the probability of certain outcomes when
some aspects of the system are unknown
Advances in simulation-based scientific discovery
have led to the emergence of verification and validation
(V&V) and UQ
Many problems in the natural sciences and
engineering involve uncertainty
Identification
Model structure: models are only approximations of
reality
Numerical approximation: methods are not exact
Input and model parameters may only be known
approximately
Variations in inputs and model parameters due to
differences between instances of the same object
Noise, measurement errors and lack of data
Characterization
Aleatoric (statistical) uncertainties: results differ each
time we run the same experiment
Monte Carlo methods are used; a probability
density function (PDF) can be represented by its
moments
Epistemic (systematic) uncertainties: due to things
we could in principle know but do not in practice
Fuzzy logic or generalizations of Bayesian theory are
used
Propagation
How does uncertainty evolve?
Analyzing the impact that parameter uncertainties have on
outputs
Finding major sources of uncertainties (sensitivity
analysis)
Exploring “interesting” regions in parameter space
(model exploration)
Analysis
Assessing "anomalous" regions in parameter space
(risk analysis)
Establishing the integrity of a simulation model (validation)
Providing information on which additional physical
experiments are needed to improve understanding
of the system (experimental guidance)
Selecting Proper Methods
Is there a nonlinear relationship between the uncertain
inputs and the output variables?
Is the uncertain parameter space high-dimensional?
Are there model-form uncertainties?
What is the computational cost per simulation?
Which experimental data are available?
Monte Carlo Algorithms
Based on repeated random sampling to compute
their results
Used when it is not feasible to compute an exact
result with a deterministic algorithm
Useful for simulating systems with many degrees of
freedom, e.g. cellular structures
Monte Carlo Method: Outline
Define a domain of possible inputs
Generate inputs randomly from a probability density
function over the domain
Perform a deterministic computation on inputs
Aggregate results
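The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not PSUADE code: the names `monte_carlo_propagate`, `uniform_box` and `toy_model`, the uniform input distribution on [-1, 1]^2, and the toy model itself are all assumptions made for the example.

```python
import numpy as np

def monte_carlo_propagate(model, sampler, n_samples, seed=0):
    """Propagate input uncertainty through a deterministic model."""
    rng = np.random.default_rng(seed)
    x = sampler(rng, n_samples)        # step 2: draw random inputs
    y = model(x)                       # step 3: deterministic computation
    return y.mean(), y.std(ddof=1)     # step 4: aggregate results

def uniform_box(rng, n):
    # Step 1: the input domain, here the square [-1, 1]^2 (hypothetical).
    return rng.uniform(-1.0, 1.0, size=(n, 2))

def toy_model(x):
    # Hypothetical deterministic model used only for illustration.
    return x[:, 0] ** 2 + x[:, 1]

mean, std = monte_carlo_propagate(toy_model, uniform_box, 100_000)
print(f"estimated mean = {mean:.3f}, estimated std = {std:.3f}")
```

For this toy model the exact mean is 1/3, so the printed estimate gives a quick check that the sampling loop behaves as intended.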
Polynomial Regression
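Input data: $(x_i, y_i),\ i = 1, \dots, n$
Unknown parameters: $a_i\ (i = 0, \dots, m)$
$\varepsilon$: random error with mean zero conditioned on $x$
\[ y_i = a_0 + a_1 x_i + a_2 x_i^2 + \dots + a_m x_i^m + \varepsilon_i \quad (i = 1, \dots, n) \]
\[
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
=
\begin{bmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^m \\
1 & x_2 & x_2^2 & \cdots & x_2^m \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^m
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}
\]
\[ \vec{Y} = X\vec{a} + \vec{\varepsilon} \;\Rightarrow\; \hat{\vec{a}} = (X^T X)^{-1} X^T \vec{Y} \]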
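A short NumPy sketch of the least-squares estimate may help make the matrix form concrete. It builds the design matrix X with columns 1, x, ..., x^m and solves the same least-squares problem as the normal equations, using `np.linalg.lstsq` for numerical stability instead of forming the inverse explicitly. The function name `fit_polynomial` and the synthetic quadratic data are assumptions for illustration only.

```python
import numpy as np

def fit_polynomial(x, y, m):
    """Least-squares estimate of a_0..a_m for y = sum_j a_j x^j + eps."""
    # Design matrix X: one row per data point, columns 1, x, ..., x^m.
    X = np.vander(x, m + 1, increasing=True)
    # Minimizes ||X a - y||^2, the problem solved by (X^T X)^{-1} X^T y.
    a_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a_hat

# Synthetic data (hypothetical): y = 1 + 2x - 3x^2 plus small noise.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
y = 1.0 + 2.0 * x - 3.0 * x ** 2 + rng.normal(0.0, 0.01, size=x.size)

a_hat = fit_polynomial(x, y, m=2)
print(a_hat)  # estimates of a_0, a_1, a_2
```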
MARS
MARS (multivariate adaptive regression splines) is a
weighted sum of basis functions:
\[ \hat{f}(x) = \sum_i c_i B_i(x) \]
Each basis function is the constant 1, a hinge function
\[ \max(0, x - c) \quad \text{or} \quad \max(0, c - x), \]
or a product of hinge functions
Each step of the forward pass finds the pair of basis
functions that gives the maximum reduction in error
The backward pass prunes the model
MARS Versus Linear Regression
MARS model:
\[ \hat{y} = 25 + 6.1 \max(0, x - 13) - 3.1 \max(0, 13 - x) \]
Linear model:
\[ \hat{y} = -37 + 5.1 x \]
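To see how the hinge terms change the fit, the two models on this slide can be evaluated side by side. The sketch below only evaluates the fitted formulas; it does not perform the MARS forward or backward pass. The helper names `hinge`, `mars_model` and `linear_model` and the evaluation points are assumptions for illustration.

```python
import numpy as np

def hinge(z):
    """Hinge function max(0, z), applied elementwise."""
    return np.maximum(0.0, z)

def mars_model(x):
    # Piecewise-linear model with a knot at x = 13.
    return 25.0 + 6.1 * hinge(x - 13.0) - 3.1 * hinge(13.0 - x)

def linear_model(x):
    # Single straight line over the whole range.
    return -37.0 + 5.1 * x

x = np.array([10.0, 13.0, 15.0])
mars_y = mars_model(x)
lin_y = linear_model(x)
for xi, my, ly in zip(x, mars_y, lin_y):
    print(f"x = {xi:5.1f}  MARS = {my:6.2f}  linear = {ly:6.2f}")
```

The MARS model has slope 3.1 to the left of the knot and 6.1 to the right, which a single linear term cannot represent.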
Principal Component Analysis
Consider a set of N points in n-dimensional space:
\[ \{x_1, x_2, \dots, x_N\} \]
Principal Component Analysis (PCA) looks for an n-by-m
linear transformation matrix W mapping the original
n-dimensional space into an m-dimensional feature
space, where m < n:
\[ y_k = W^T x_k \quad (k = 1, \dots, N) \]
High variance is associated with more information
Principal Component Analysis
Scatter matrix of the transformed feature vectors:
\[ S_y = \sum_{k=1}^{N} (y_k - m_y)(y_k - m_y)^T = W^T S_x W \]
$S_x$ is the scatter matrix of the input vectors and $m_y$ is the mean of the $y_k$
The projection is chosen to maximize the determinant of
the total scatter matrix of the projected samples:
\[ W_{opt} = \arg\max_W \det(W^T S_x W) = [w_1 \; w_2 \; \dots \; w_m] \]
$\{w_i : i = 1, \dots, m\}$ is the set of eigenvectors
corresponding to the m largest eigenvalues of the scatter
matrix of the input vectors
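The eigenvector construction can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions: the function name `pca`, the row-per-point data layout, and the synthetic three-dimensional data are all hypothetical, and `np.linalg.eigh` is used because the scatter matrix is symmetric.

```python
import numpy as np

def pca(X, m):
    """PCA of the rows of X (shape N x n); returns W (n x m), Y (N x m), S_x."""
    Xc = X - X.mean(axis=0)                  # subtract the mean input vector
    S_x = Xc.T @ Xc                          # scatter matrix of the inputs
    eigvals, eigvecs = np.linalg.eigh(S_x)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:m]    # indices of the m largest
    W = eigvecs[:, order]                    # columns w_1, ..., w_m
    Y = X @ W                                # row k holds y_k = W^T x_k
    return W, Y, S_x

# Synthetic correlated data (hypothetical): N = 200 points, n = 3.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.5, 0.2, 0.1]])
W, Y, S_x = pca(X, m=2)

# Scatter of the projected points, to compare with W^T S_x W.
Yc = Y - Y.mean(axis=0)
S_y = Yc.T @ Yc
print(np.allclose(S_y, W.T @ S_x @ W))
```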
PSUADE: How Does It Work?
The input section lets users specify the number of
inputs and their names, ranges, distributions,
etc.
The driver program can be written in any language, provided
that it is executable
Run PSUADE with: [Linux] psuade psuade.in
When the runs complete, summary information is displayed
and a data file is created for further analysis
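As an illustration of what a driver program might look like, here is a hedged Python sketch. It assumes the driver is called with two file paths, that the first file holds the number of inputs followed by one value per entry, and that the driver writes the output value to the second file. That file layout is an assumption to be checked against the PSUADE manual, and the names `read_inputs`, `model` and `run` are hypothetical. The model is the three-input test function used in the examples later in these slides.

```python
import math

def read_inputs(path):
    """Read 'n' followed by n values (assumed parameter-file layout)."""
    with open(path) as f:
        tokens = f.read().split()
    n = int(tokens[0])
    return [float(t) for t in tokens[1:1 + n]]

def model(x):
    # y = sin(x1) + 7 sin^2(x2) + 0.1 x3^4 sin(x1)
    x1, x2, x3 = x
    return math.sin(x1) + 7.0 * math.sin(x2) ** 2 + 0.1 * x3 ** 4 * math.sin(x1)

def run(in_path, out_path):
    """Evaluate the model for one sample point and write the result."""
    y = model(read_inputs(in_path))
    with open(out_path, "w") as f:
        f.write(f"{y:.16e}\n")
    return y

# As an executable driver script, finish with:
#     import sys; run(sys.argv[1], sys.argv[2])
```

Keeping the model in its own function means the same code can be tested directly, independently of how the sampling tool invokes the script.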
PSUADE Capabilities
Can study first-order sensitivities of individual input
parameters (main effects)
Can construct a relationship between selected input
parameters and the model output (response surface
modeling)
Can quantify the impact of a subset of parameters on
the output (global sensitivity analysis)
Can identify the subset of parameters that accounts for
output variability (parameter screening)
PSUADE Capabilities
Sampling methods: Monte Carlo, quasi-Monte Carlo, Latin hypercube
and variants, factorial, Morris method, Fourier
Amplitude Sensitivity Test (FAST), etc.
Simulator execution environment
Markov chain Monte Carlo for parameter estimation
and basic statistical analysis
Many different types of response surfaces
Many methods for main, second-order, and total-order effect analyses
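Of the sampling methods listed above, Latin hypercube sampling is easy to sketch. The example below is a generic NumPy illustration on the unit cube, not PSUADE's implementation; the function name `latin_hypercube` and its parameters are assumptions. Each input range is split into as many equal-width strata as there are samples, and each stratum is used exactly once per input.

```python
import numpy as np

def latin_hypercube(n_samples, n_inputs, seed=0):
    """Latin hypercube sample on [0, 1)^n_inputs."""
    rng = np.random.default_rng(seed)
    # One random point inside each of n_samples equal-width strata.
    strata = np.arange(n_samples)[:, None]
    u = (strata + rng.random((n_samples, n_inputs))) / n_samples
    # Shuffle the strata independently for every input.
    for j in range(n_inputs):
        u[:, j] = rng.permutation(u[:, j])
    return u

samples = latin_hypercube(10, 3)
print(samples.shape)
```

To sample a box with other bounds, the unit-cube points can be rescaled as `lower + samples * (upper - lower)`.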
\[ y = \sin(x_1) + 7\sin^2(x_2) + 0.1\,x_3^4 \sin(x_1) \]
[Figure: y versus x1. Panels: scatter plot; linear regression; quadratic regression; MARS]
[Figure: y versus x2. Panels: scatter plot; linear regression; quadratic regression; MARS]
[Figure: y versus x3. Panels: scatter plot; linear regression; quadratic regression; MARS]
Sensitivity Analysis
\[ y = \sin(x_1) + 7\sin^2(x_2) + 0.1\,x_3^4 \sin(x_1) \]
MARS screening rankings :
* Rank 1 : Input = 1 (score = 100.0)
* Rank 2 : Input = 3 (score = 0.0)
* Rank 3 : Input = 2 (score = 0.0)
MOAT Analysis (ordered):
Input 1 (mu*, sigma, dof) = 1.1011e-04 6.9425e-05 17
Input 3 (mu*, sigma, dof) = 0.0000e+00 0.0000e+00 -1
Input 2 (mu*, sigma, dof) = 0.0000e+00 0.0000e+00 -1
delta_test: perform Delta test:
Order of importance (based on 20 best configurations):
(D)Rank 1 : input 1 (score = 80 )
(D)Rank 2 : input 3 (score = 48 )
(D)Rank 3 : input 2 (score = 38 )
Sensitivity Analysis
\[ y = \sin(x_1) + 7\sin^2(x_2) + 0.1\,x_3^4 \sin(x_1) \]
Gaussian process-based sensitivity analysis:
* Rank 1 : Input = 1 (score = 100.0)
* Rank 2 : Input = 2 (score = 75.9)
* Rank 3 : Input = 3 (score = 5.9)
Sum-of-trees-based sensitivity analysis:
* SumOfTrees screening rankings (with bootstrapping)
* Minimum points per node = 10
* Rank 1 : Input = 1 (score = 100.0)
* Rank 2 : Input = 3 (score = 0.9)
* Rank 3 : Input = 2 (score = 0.0)
Correlation Analysis
\[ y = \sin(x_1) + 7\sin^2(x_2) + 0.1\,x_3^4 \sin(x_1) \]
Pearson correlation coefficients (PEAR) - linear relationship which gives a measure of relationship between X_i's & Y.
* Pearson Correlation coeff. (Input 1) = -8.526593e-01
* Pearson Correlation coeff. (Input 2) = -3.777038e-18
* Pearson Correlation coeff. (Input 3) = -2.356118e-18
Spearman coefficients (SPEA) - nonlinear relationship which gives a measure of relationship between X_i's & Y.
* Spearman coefficient(ordered) (Input 1 ) = 8.833944e-01
* Spearman coefficient(ordered) (Input 2 ) = 6.837607e-02
* Spearman coefficient(ordered) (Input 3 ) = 5.189255e-02
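The difference between the two coefficients can be shown with a small NumPy sketch. Pearson's coefficient measures linear association, while Spearman's coefficient is the Pearson coefficient of the ranks, so it captures any monotonic relationship. The helper names and the toy relation y = exp(x) are assumptions for illustration and are not the test function used on this slide.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient of two 1-D samples."""
    return np.corrcoef(x, y)[0, 1]

def ranks(v):
    # Ranks 0..N-1; ties are not averaged, which is fine for continuous data.
    r = np.empty(len(v))
    r[np.argsort(v)] = np.arange(len(v))
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Toy data (hypothetical): a monotonic but strongly nonlinear relation.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 3.0, size=1000)
y = np.exp(x)

print(f"Pearson  = {pearson(x, y):.3f}")
print(f"Spearman = {spearman(x, y):.3f}")
```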
Main Effect Analysis
\[ y = \sin(x_1) + 7\sin^2(x_2) + 0.1\,x_3^4 \sin(x_1) \]
RS-based 1-input Sobol' decomposition:
RSMSobol1: Normalized VCE (ordered) for input 1 = 1.003211e+00
RSMSobol1: Normalized VCE (ordered) for input 2 = 9.395314e-32
RSMSobol1: Normalized VCE (ordered) for input 3 = 4.440130e-33
McKay's correlation ratio:
INPUT 1 = 7.27e-01 (raw = 2.02e-09)
INPUT 2 = 1.14e-11 (raw = 3.17e-20)
INPUT 3 = 1.77e-35 (raw = 4.92e-44)
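A main effect can be read as the fraction of output variance explained by one input alone, Var(E[y | x_i]) / Var(y). The sketch below estimates it by sorting the samples on x_i, splitting them into equal-count bins, and taking the variance of the bin means. It is a crude binning estimator, not the RS-based Sobol' or McKay procedures reported above. The input range [-pi, pi] for all three inputs is an assumption (the slides do not state the ranges), so the numbers need not match the output shown here. The names `slide_function` and `main_effect` are hypothetical.

```python
import numpy as np

def slide_function(x):
    # y = sin(x1) + 7 sin^2(x2) + 0.1 x3^4 sin(x1)
    return (np.sin(x[:, 0]) + 7.0 * np.sin(x[:, 1]) ** 2
            + 0.1 * x[:, 2] ** 4 * np.sin(x[:, 0]))

def main_effect(xi, y, n_bins=50):
    """Estimate Var(E[y | x_i]) / Var(y) with equal-count bins of x_i."""
    order = np.argsort(xi)
    bins = np.array_split(y[order], n_bins)
    bin_means = np.array([b.mean() for b in bins])
    bin_sizes = np.array([b.size for b in bins])
    # Size-weighted variance of the conditional means.
    vce = np.sum(bin_sizes * (bin_means - y.mean()) ** 2) / y.size
    return vce / y.var()

rng = np.random.default_rng(0)
x = rng.uniform(-np.pi, np.pi, size=(20000, 3))  # assumed input ranges
y = slide_function(x)
effects = [main_effect(x[:, i], y) for i in range(3)]
for i, e in enumerate(effects, start=1):
    print(f"input {i}: normalized main effect = {e:.3f}")
```

With symmetric ranges the third input has almost no main effect on its own, because it only acts through its interaction with the first input.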
\[ y = 100\,(x_2 - x_1^2)^2 + (1 - x_1)^2, \quad x_1, x_2 \in [-2, 2] \]
[Figure: response surfaces for y. Panels: MARS; linear regression; quadratic; cubic; sum-of-trees; quartic]
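The polynomial response surfaces compared above can be reproduced in outline with NumPy. The sketch fits full polynomials of total degree 1 to 4 in (x1, x2) by least squares and reports the training error of each. Because the function is itself a polynomial of total degree 4, the quartic surface should fit it essentially exactly. The sample size, random sampling, and helper names are assumptions for illustration, not PSUADE's procedure.

```python
import numpy as np

def rosenbrock(x1, x2):
    return 100.0 * (x2 - x1 ** 2) ** 2 + (1.0 - x1) ** 2

def design_matrix(x1, x2, degree):
    # All monomials x1^p * x2^q with p + q <= degree.
    cols = [x1 ** p * x2 ** q
            for p in range(degree + 1)
            for q in range(degree + 1 - p)]
    return np.column_stack(cols)

def fit_rmse(x1, x2, y, degree):
    """Least-squares polynomial fit; returns the training RMSE."""
    A = design_matrix(x1, x2, degree)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.sqrt(np.mean((A @ coef - y) ** 2))

rng = np.random.default_rng(0)
x1, x2 = rng.uniform(-2.0, 2.0, size=(2, 400))
y = rosenbrock(x1, x2)

rmse = {d: fit_rmse(x1, x2, y, d) for d in (1, 2, 3, 4)}
for d, err in rmse.items():
    print(f"degree {d}: RMSE = {err:.3e}")
```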
Future Research Directions
Resolving the curse of dimensionality
Representation of uncertainty
Bayesian computation & machine learning
techniques (e.g. for stochastic multi-scale systems) for
model selection, classification & decision making
Visualization in high-dimensional spaces