Uncertainty Quantification & the PSUADE Software

Mahmoud Khademi
Supervisor: Prof. Ned Nedialkov
Department of Computing and Software
McMaster University, Hamilton, Ontario, Canada
2012

Outline
  Introduction to Uncertainty Quantification (UQ)
  Identification
  Characterization
  Propagation
  Analysis
  Common algorithms and methods
  PSUADE: UQ software library and environment (https://computation.llnl.gov/casc/uncertainty_quantification/)
  Conclusions & future research directions

Introduction to UQ
  UQ is the quantitative characterization and reduction of uncertainty
  It estimates the probability of certain outcomes when some aspects of the system are unknown
  Advances in simulation-based scientific discovery led to the emergence of verification and validation (V&V) and UQ
  Many problems in the natural sciences and engineering involve uncertainty

Identification
  Model structure: models are only approximations of reality
  Numerical approximation: numerical methods are not exact
  Input and model parameters may be known only approximately
  Inputs and model parameters vary between instances of the same object
  Noise, measurement errors, and lack of data

Characterization
  Aleatoric (statistical) uncertainties: results differ each time the same experiment is run
    Monte Carlo methods are used; a probability density function (PDF) can be represented by its moments
  Epistemic (systematic) uncertainties: due to things we could in principle know but do not in practice
    Fuzzy logic or generalizations of Bayesian theory are used

Propagation
  How does uncertainty evolve?
  Analyzing the impact that parameter uncertainties have on outputs
  Finding the major sources of uncertainty (sensitivity analysis)
  Exploring "interesting" regions in parameter space (model exploration)

Analysis
  Assessing "anomalous" regions in parameter space (risk analysis)
  Establishing the integrity of a simulation model (validation)
  Providing information on which additional physical experiments are needed to improve understanding of the system (experimental guidance)

Selecting Proper Methods
  Is the relationship between the uncertain and output variables nonlinear?
  Is the uncertain parameter space high-dimensional?
  Are there model-form uncertainties?
  What is the computational cost per simulation?
  Which experimental data are available?

Monte Carlo Algorithms
  Rely on repeated random sampling to compute their results
  Used when it is not feasible to compute an exact result with a deterministic algorithm
  Useful for simulating systems with many degrees of freedom, e.g. cellular structures

Monte Carlo Method: Outline
  Define a domain of possible inputs
  Generate inputs randomly from a probability density function over the domain
  Perform a deterministic computation on the inputs
  Aggregate the results

Polynomial Regression
  Input data: $(x_i, y_i)$, $i = 1, \dots, n$
  Unknown parameters: $a_i$, $i = 0, \dots, m$
  $\varepsilon$: random error with mean zero conditioned on $x$

  \[ y_i = a_0 + a_1 x_i + a_2 x_i^2 + \dots + a_m x_i^m + \varepsilon_i, \quad i = 1, \dots, n \]

  \[
  \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
  =
  \begin{bmatrix}
  1 & x_1 & x_1^2 & \cdots & x_1^m \\
  1 & x_2 & x_2^2 & \cdots & x_2^m \\
  \vdots & \vdots & \vdots & & \vdots \\
  1 & x_n & x_n^2 & \cdots & x_n^m
  \end{bmatrix}
  \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix}
  +
  \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}
  \]

  \[ \vec{Y} = X \vec{a} + \vec{\varepsilon} \;\Rightarrow\; \hat{\vec{a}} = (X^T X)^{-1} X^T \vec{Y} \]

MARS
  MARS (multivariate adaptive regression splines) models the response as a weighted sum of basis functions: $\hat{f}(x) = \sum_i c_i B_i(x)$
  Each basis function is the constant 1, a hinge function $\max(0, x - c)$ or $\max(0, c - x)$, or a product of hinge functions
  Each step of the forward pass finds the pair of basis functions that gives the maximum reduction in error
  The backward pass prunes the model

MARS Versus Linear Regression
  MARS: $\hat{y} = 25 + 6.1 \max(0, x - 13) - 3.1 \max(0, 13 - x)$
  Linear regression: $\hat{y} = -37 + 5.1 x$

Principal Component Analysis
  Consider a set of $N$ points in $n$-dimensional space: $\{x_1, x_2, \dots, x_N\}$
  Principal component analysis (PCA) looks for an $n \times m$ linear transformation matrix $W$ that maps the original $n$-dimensional space into an $m$-dimensional feature space, where $m < n$: $y_k = W^T x_k$, $k = 1, \dots, N$
  High variance is associated with more information

Principal Component Analysis
  The scatter matrix of the transformed feature vectors is $S_y = \sum_{k=1}^{N} (y_k - m_y)(y_k - m_y)^T = W^T S_x W$
  $S_x$ is the scatter matrix of the input vectors and $m_y$ is the mean of the $y_k$
  The projection is chosen to maximize the determinant of the total scatter matrix of the projected samples: $W_{opt} = \arg\max_W \det(W^T S_x W) = [w_1 \; w_2 \; \dots \; w_m]$
  $\{w_i : i = 1, \dots, m\}$ is the set of eigenvectors corresponding to the $m$ largest eigenvalues of the scatter matrix of the input vectors

PSUADE: How Does It Work?
  The input section lets users specify the number of inputs and their names, ranges, distributions, etc.
  The driver program can be written in any language, provided that it is executable
  Run PSUADE with: [Linux] psuade psuade.in
  On completion of the runs, information is displayed and a data file is created for further analysis

PSUADE Capabilities
  Can study first-order sensitivities of individual input parameters (main effect)
  Can construct a relationship between some input parameters of the model and the output (response surface modeling)
  Can quantify the impact of a subset of parameters on the output (global sensitivity analysis)
  Can identify the subset of parameters that accounts for output variability (parameter screening)

PSUADE Capabilities
  Monte Carlo, quasi-Monte Carlo, Latin hypercube and variants, factorial, Morris method, Fourier Amplitude Sensitivity Test (FAST), etc.
  Simulator execution environment
  Markov chain Monte Carlo for parameter estimation, and basic statistical analysis
  Many different types of response surfaces
  Many methods for main, second-order, and total-order effect analyses

$y = \sin(x_1) + 7 \sin^2(x_2) + 0.1 x_3^4 \sin(x_1)$
  [Figure panels: scatter plot of x1 and y; linear regression (y with respect to x1); quadratic regression (y with respect to x1); MARS (y with respect to x1)]

$y = \sin(x_1) + 7 \sin^2(x_2) + 0.1 x_3^4 \sin(x_1)$
  [Figure panels: scatter plot of x2 and y; linear regression (y with respect to x2); quadratic regression (y with respect to x2); MARS (y with respect to x2)]

$y = \sin(x_1) + 7 \sin^2(x_2) + 0.1 x_3^4 \sin(x_1)$
  [Figure panels: scatter plot of x3 and y; linear regression (y with respect to x3); quadratic regression (y with respect to x3); MARS (y with respect to x3)]

Sensitivity Analysis
  $y = \sin(x_1) + 7 \sin^2(x_2) + 0.1 x_3^4 \sin(x_1)$

  MARS screening rankings:
    * Rank 1 : Input = 1 (score = 100.0)
    * Rank 2 : Input = 3 (score = 0.0)
    * Rank 3 : Input = 2 (score = 0.0)

  MOAT Analysis (ordered):
    Input 1 (mu*, sigma, dof) = 1.1011e-04 6.9425e-05 17
    Input 3 (mu*, sigma, dof) = 0.0000e+00 0.0000e+00 -1
    Input 2 (mu*, sigma, dof) = 0.0000e+00 0.0000e+00 -1

  delta_test: perform Delta test:
    Order of importance (based on 20 best configurations):
    (D)Rank 1 : input 1 (score = 80 )
    (D)Rank 2 : input 3 (score = 48 )
    (D)Rank 3 : input 2 (score = 38 )

Sensitivity Analysis
  $y = \sin(x_1) + 7 \sin^2(x_2) + 0.1 x_3^4 \sin(x_1)$

  Gaussian process-based sensitivity analysis:
    * Rank 1 : Input = 1 (score = 100.0)
    * Rank 2 : Input = 2 (score = 75.9)
    * Rank 3 : Input = 3 (score = 5.9)

  Sum-of-trees-based sensitivity analysis:
    * SumOfTrees screening rankings (with bootstrapping)
    * Minimum points per node = 10
    * Rank 1 : Input = 1 (score = 100.0)
    * Rank 2 : Input = 3 (score = 0.9)
    * Rank 3 : Input = 2 (score = 0.0)

Correlation Analysis
  $y = \sin(x_1) + 7 \sin^2(x_2) + 0.1 x_3^4 \sin(x_1)$

  Pearson correlation coefficients (PEAR): measure the linear relationship between the X_i's and Y
    * Pearson Correlation coeff. (Input 1) = -8.526593e-01
    * Pearson Correlation coeff. (Input 2) = -3.777038e-18
    * Pearson Correlation coeff. (Input 3) = -2.356118e-18

  Spearman rank coefficients (SPEA): measure the monotonic, possibly nonlinear, relationship between the X_i's and Y
    * Spearman coefficient(ordered) (Input 1 ) = 8.833944e-01
    * Spearman coefficient(ordered) (Input 2 ) = 6.837607e-02
    * Spearman coefficient(ordered) (Input 3 ) = 5.189255e-02

Main Effect Analysis
  $y = \sin(x_1) + 7 \sin^2(x_2) + 0.1 x_3^4 \sin(x_1)$

  RS-based 1-input Sobol' decomposition:
    RSMSobol1: Normalized VCE (ordered) for input 1 = 1.003211e+00
    RSMSobol1: Normalized VCE (ordered) for input 2 = 9.395314e-32
    RSMSobol1: Normalized VCE (ordered) for input 3 = 4.440130e-33

  McKay's correlation ratio:
    INPUT 1 = 7.27e-01 (raw = 2.02e-09)
    INPUT 2 = 1.14e-11 (raw = 3.17e-20)
    INPUT 3 = 1.77e-35 (raw = 4.92e-44)

$y = 100 (x_2 - x_1^2)^2 + (1 - x_1)^2$, $x_1, x_2 \in [-2, 2]$
  [Figure panels: response surface analysis (MARS); response surface analysis (linear regression)]

$y = 100 (x_2 - x_1^2)^2 + (1 - x_1)^2$, $x_1, x_2 \in [-2, 2]$
  [Figure panels: response surface analysis (quadratic); response surface analysis (cubic)]

$y = 100 (x_2 - x_1^2)^2 + (1 - x_1)^2$, $x_1, x_2 \in [-2, 2]$
  [Figure panels: response surface analysis (sum-of-trees); response surface analysis (quartic)]

Future Research Directions
  Resolving the curse of dimensionality
  Representation of uncertainty
  Bayesian computation & machine learning techniques, e.g. stochastic multi-scale systems, for model selection, classification & decision making
  Visualization in high-dimensional spaces
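
As a closing illustration, the four steps of the Monte Carlo outline can be sketched in plain Python for the three-input test function used in the sensitivity-analysis slides. This is a minimal sketch and not PSUADE code. The uniform input range [-pi, pi], the sample size, the seed, and the helper names (model, pearson, monte_carlo) are all assumptions, because the slides do not state the input ranges, so the numbers it produces need not match the PSUADE output shown earlier.

```python
import math
import random


def model(x1, x2, x3):
    """Three-input test function from the sensitivity-analysis slides."""
    return math.sin(x1) + 7.0 * math.sin(x2) ** 2 + 0.1 * x3 ** 4 * math.sin(x1)


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def monte_carlo(n_samples=20000, seed=0):
    """Propagate input uncertainty through `model` by random sampling."""
    rng = random.Random(seed)
    # Steps 1-2: define the domain (ASSUMED here to be [-pi, pi]^3) and
    # draw inputs from a uniform probability density over it.
    inputs = [
        [rng.uniform(-math.pi, math.pi) for _ in range(3)]
        for _ in range(n_samples)
    ]
    # Step 3: run the deterministic computation on every sampled input.
    outputs = [model(*x) for x in inputs]
    # Step 4: aggregate the results into summary statistics.
    mean = sum(outputs) / n_samples
    var = sum((y - mean) ** 2 for y in outputs) / (n_samples - 1)
    corr = [pearson([x[i] for x in inputs], outputs) for i in range(3)]
    return mean, var, corr


if __name__ == "__main__":
    mean, var, corr = monte_carlo()
    print("output mean:", mean)
    print("output variance:", var)
    for i, c in enumerate(corr, start=1):
        print("Pearson correlation, input", i, ":", c)
```

Under these assumed ranges, only input 1 shows a clear linear correlation with the output. Inputs 2 and 3 affect the output symmetrically about zero, so a linear measure such as Pearson's coefficient does not detect them, which is one reason the slides also report rank-based and variance-based measures.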