V&V Issues

Timothy G. Trucano
Optimization and Uncertainty Estimation Department, Org. 9211
Sandia National Laboratories, Albuquerque, NM 87185
Phone: 844-8812, FAX: 844-0918, Email: tgtruca@sandia.gov

Workshop on Error Estimation and Uncertainty Quantification
Johns Hopkins University, November 13-14, 2003

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract DE-AC04-94AL85000.

Outline of talk.
• The problem.
• What is validation?
• What is verification?
• Coupling is required.
• Walking through V&V.
• A few research issues.

Useful quotes to keep in mind.
• Hamming – "The purpose of computing is insight…" (?)
• ASCI – the purpose of computing is to provide "high-performance, full-system, high-fidelity-physics predictive codes to support weapon assessments, renewal process analyses, accident analyses, and certification." (DOE/DP99-000010592)
• Philip Holmes – "…a huge simulation of the 'exact' equations…may be no more enlightening than the experiments that led to those equations…Solving the equations leads to a deeper understanding of the model itself. Solving is not the same as simulating." (SIAM News, June 2002)

"Validation" is a process of comparing calculations with experimental data and drawing inferences about the scientific fidelity of the code for a particular application. For example:
[Figure: p_r/p_inc vs. incident angle – experiment with error bar, analytic solution, and ALEGRA calculation, annotated "This is physics." and "This is math."]
• Validation is a "physics problem."
• Verification is a "math problem."

Some of the questions that occur to us as a result of this comparison:
• What do the error bars mean?
• What is the numerical accuracy of the code?
• Is the comparison good, bad, or indifferent? In what context?
• Why did we choose this means to compare the data and the calculation? Is there something better?
• Why did we choose this problem to begin with?
• What does the work rest on (such as previous knowledge)?
• Where is the work going (e.g., what next)?

What is validation?
• Validation of computational science software is the process of answering the following question: Are the equations correct?
• It is convenient to recognize that validation is also the process of answering the following question: Are the software requirements correct?
• It goes without saying that validation is "hard," but it is sometimes forgotten that the latter definition of validation takes precedence (it applies to ANY software).
  – Focus on the former, but remember the latter.

What is verification?
• Verification of computational science software is the process of answering the following question: Are the equations solved correctly?
• It is convenient to recognize that verification is also the process of answering the following question: Are the software requirements correctly implemented?
• A strict definition of verification is: prove that calculations converge to the correct solution of the equations. (A toy convergence check against an exact solution is sketched below.)
• This latter definition is HARD (impossible)!! Use it as a mission statement.
• Provably correct error estimation is essentially an equivalent problem.
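The "strict" definition can at least be exercised on problems with known solutions. The following is a minimal, hypothetical sketch of such an order-of-accuracy convergence check; the ODE, the forward-Euler scheme, and all names are illustrative choices, not anything drawn from ALEGRA or the ASCI codes.

```python
# Minimal sketch of a code-verification convergence study (illustrative only):
# integrate du/dt = -u, u(0) = 1, with forward Euler and compare against the
# exact solution exp(-t) at t = 1 on a sequence of refined time steps.
import math

def forward_euler(dt, t_final=1.0):
    """Return the forward-Euler approximation of u(t_final) for du/dt = -u, u(0) = 1."""
    n_steps = int(round(t_final / dt))
    u = 1.0
    for _ in range(n_steps):
        u += dt * (-u)
    return u

exact = math.exp(-1.0)
dts = [0.1 / 2**k for k in range(5)]          # successively halved step sizes
errors = [abs(forward_euler(dt) - exact) for dt in dts]

# Observed order of accuracy between consecutive refinements:
# p ~ log(e_coarse / e_fine) / log(2).  For forward Euler it should approach 1.
for e_coarse, e_fine in zip(errors, errors[1:]):
    print("observed order:", math.log(e_coarse / e_fine) / math.log(2.0))
```

Demonstrating the expected order on a suite of such problems is, of course, evidence rather than proof, which is exactly why the strict definition is best treated as a mission statement.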
V&V are processes that accumulate information – "evidence."
• What evidence is required – requires V&V plans.
• How evidence is accumulated – requires V&V tasks.
• How evidence is "accredited" – requires V&V assessment.
• How evidence is applied – intersects computing.

Why do we choose specific validation tasks?
[Figure: 2-D shock wave experiment. This is a single-material, simple-EOS, strong-shock, multidimensional hydrodynamics validation problem that develops validation evidence for ALEGRA-HEDP capabilities. It will be used in a "validation grind" for ALEGRA. (Chen-Trucano, 2002)]
We have defined and implemented a planning framework for code application validation at Sandia that reflects the hierarchical nature of validation.
• We have defined formal and documented planning guidance. (Trucano, et al., Planning Guidance Ver. 1 and 2, SAND99-3098, SAND2000-3101)
• A core concept is the PIRT (Phenomena Identification and Ranking Table), a "Quality Function Deployment" tool.
• We have defined a formal and documented assessment methodology for the planning component.

What do the experimental error bars mean?
[Figure: p_r/p_inc vs. incident angle – experiment with error bars (annotated "Instrument Fidelity"), analytic solution, and ALEGRA calculation. This study relied upon existing experimental data that did not characterize uncertainty properly. Our interpretation of the error bars is that they reflect only instrument fidelity. (From this, we might make a strong assumption about "uniform distributions.")]
All serious discussion of validation metrics begins with uncertainty in the experimental data.
• A difficult problem is to characterize the uncertainty embedded in |Calc – Expt|.
• For the short term, validation metrics are driven by the assumption that this uncertainty can be characterized probabilistically.
• An important component in doing this right is to execute dedicated experimental validation.
• A rigorous methodology for experimental validation addresses experimental data requirements. (Trucano, et al., "General Concepts for Experimental Validation of ASCI Code Applications," SAND2002-0341)

What are the "Validation Metrics" |Calc – Expt|?
[Figure: the same p_r/p_inc vs. incident angle comparison, labeled the "viewgraph norm." The main metric was reproduction of qualitative trends in shock reflection. The secondary goal was quantitative pointwise comparison of specific Mach No./angle pairs.]
Our key R&D project is the Validation Metrics Project (origin ~1998). (Trucano, et al., "Description of the Sandia Validation Metrics Project," SAND2002-0121)
• The focus of the project is to answer the questions:
  1. What metrics and why?
  2. What are relevant pass/fail criteria?
  3. What are the implications for calculation prediction confidence?
• Critical impact on current V&V milestones.
• Uncertainty Quantification is an enabling technology.
• Current themes are thermal analysis, solid mechanics, and structural dynamics.

It's obvious that there are better metrics than the viewgraph norm.
• Probabilistic sophistication in development and application of these metrics is a great challenge. For example, Bayes' theorem applied to candidate models M_i given an observation:

  P(M_i | observation) = P(observation | M_i) P(M_i) / [ Σ_j P(observation | M_j) P(M_j) ]

  (A minimal sketch of this update follows below.)
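As a concrete and entirely hypothetical illustration of the Bayes relation above, the sketch below updates the probabilities of a few candidate models given a single observation with assumed Gaussian measurement error; the model predictions, prior weights, observed value, and error bar are invented numbers, not Sandia data or an established metric.

```python
# Hypothetical illustration of the Bayes update shown above:
# P(M_i | obs) = P(obs | M_i) P(M_i) / sum_j P(obs | M_j) P(M_j).
# Model predictions, the observed value, and its uncertainty are invented numbers.
import math

def gaussian_likelihood(obs, prediction, sigma):
    """P(obs | M_i), assuming Gaussian measurement error with standard deviation sigma."""
    return math.exp(-0.5 * ((obs - prediction) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

predictions = {"model_A": 3.2, "model_B": 3.9, "model_C": 5.1}   # assumed model outputs
priors      = {"model_A": 1/3, "model_B": 1/3, "model_C": 1/3}   # uniform prior P(M_i)
obs, sigma  = 3.6, 0.4                                           # assumed experiment + error bar

unnormalized = {m: gaussian_likelihood(obs, p, sigma) * priors[m]
                for m, p in predictions.items()}
evidence = sum(unnormalized.values())                            # sum_j P(obs | M_j) P(M_j)
posterior = {m: v / evidence for m, v in unnormalized.items()}   # P(M_i | obs)

for m, prob in posterior.items():
    print(f"{m}: posterior probability {prob:.3f}")
```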
What do probabilistic metrics mean?
[Figure: statistical error description – empirical histogram of regression standardized residuals. A study by Hills investigating statistical methodologies for very simple validation data (Hugoniot data for aluminum) has been influential for the entire validation metrics project. This is the simplest starting point for validation of shock wave calculations.]
"Error" = |Calc – Expt| is probabilistic:
• Expt = random field; Calc = random field.
• These fields depend on many variables – geometry, initial and boundary condition specifications, and numerical parameters in the case of the calculation. Hopefully only a "few" of these variables are important. (Hills – SAND99-1256, SAND2001-0312, SAND2001-1783)
• Predictive confidence results from understanding of the error field; it depends on the quantity and quality of data.
• Additional complexities arise from the hierarchical nature of validation and the intended applications. This is an important subject for current research (Hills & Leslie – multivariate statistical approaches; Mahadevan – Bayesian net reliability).

What are the calculation error bars?
[Figure: these calculations are not converged. When we performed the original work we could not converge the calculations because of hardware limitations. ALEGRA-HEDP has a growing set of verification problems that increase our confidence in the numerical accuracy.]
It is critical to "verify" calculations used in validation studies. (Verification guidance is currently missing but in progress.)
• This requires convergence studies and error estimation techniques.
• Because it is unlikely that this will be fully definitive, our confidence in the numerical accuracy of validation calculations also rests upon:
  – Code verification processes and results. This includes attention to software engineering (SE).
  – Careful design and application of verification test suites where convergence to the right answer can be demonstrated.
• DOE has demanded formal attention to SE (and Sandia has responded).

Is there also uncertainty in the calculation beyond numerical accuracy?
[Figure: is uncertainty in the Grüneisen parameter important? (Part of an ensemble of 120,000 calculations, compared with experiment.) To study the probabilistic content of the error field when we compare calculated and experimental Hugoniot data, we studied the influence of uncertainty in certain computational parameters. (L. Lehoucq, using the DDACE UQ tool. This type of study can now be accomplished using DAKOTA.)]
Calculations have uncertainties that are composed both of numerical accuracy questions and uncertainties arising from problem specifications.
• Numerical accuracy uncertainties fundamentally reside in lack of convergence for a fixed problem specification; one extreme point is known under-resolution of the grid.
• There is uncertainty in translating experimental specs into calculation specs.
• There is uncertainty in specifying a variety of numerical parameters; hence calibration of uncertain models becomes an important question. (A minimal forward-propagation sketch follows below.)
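To make the ensemble idea concrete, here is a minimal forward-propagation sketch in the spirit of such a study, using neither DDACE nor DAKOTA: an uncertain parameter is sampled and pushed through a cheap stand-in model, yielding a distribution of the predicted output. The "model," the parameter range, and the sample size are illustrative assumptions, not the actual EOS calculation.

```python
# Minimal forward-UQ sketch (illustrative; not DDACE or DAKOTA, and not a real EOS):
# sample an uncertain parameter, push each sample through a cheap stand-in model,
# and look at the spread of the predicted output.
import random
import statistics

def toy_model(gamma, strain=0.1):
    """Stand-in for an expensive calculation: a made-up pressure-like response."""
    return (1.0 + gamma) * strain / (1.0 - strain) ** 2

random.seed(0)
samples = [random.uniform(1.5, 2.5) for _ in range(10_000)]   # assumed parameter range
outputs = [toy_model(g) for g in samples]

print("mean prediction:", statistics.mean(outputs))
print("std deviation  :", statistics.stdev(outputs))
# A validation metric would now compare this *distribution*, not a single number,
# against the experimental value and its error bar.
```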
Uncertainty dominates real V&V.
[Diagram: uncertainty enters throughout the chain Input → Codes → Output → Application → Decisions, feeding Quantitative Margins and Uncertainty. Labeled sources include specification, calibrations, structural (model) uncertainty, validation data, algorithm lack of rigor, under-resolution, code reliability, human reliability, and infrastructure reliability.]

What is the intended application and are we accumulating predictive confidence?
[Figure: predictive modeling for Z-pinch physics – ALEGRA-HEDP validation calculation for Imperial College 4x4 arrays. Success or failure of predictive M&S may have an important influence on the future of the Pulsed Power program.]
There is a very important link between V&V and the intended application of modeling and simulation.
• Rigorous assessment of predictive confidence resulting from V&V is important. This is demanded by the experimental validation methodology and the validation metrics project.
• There are technical problems, such as how to quantify the benefit gained by doing additional validation experiments, and how to quantify the risk associated with not having validation experiments.
• We have also devoted significant attention to the issue of stockpile computing – how to make proper use of the investment in V&V. A document is in progress.
• "Quantitative Margins and Uncertainty"(!)

WIPP and NUREG-1150 precedents.
High-consequence regulatory issues in the national interest, addressed primarily through modeling and simulation.
[Figure: WIPP data.]
Lessons learned: (1) seek best estimate (BE) + uncertainty; (2) it takes more than one shot to get it right.

Example research question: Is Probabilistic Software Reliability (PSR) useful for computational science software?
[Figure: hypothetical reliability model for ASCI codes – failure rate versus number of users through development and test, first use/validation, and application decisions, repeated for Capability I, Capability II, etc. Note the important implication here that the software is NEVER FROZEN! This affects reliability methods.]
A general-purpose computational physics code such as ALEGRA-HEDP has a complex software lifecycle and reliability history. A fundamental complexity is the constant evolution of the software capability. PSR methodologies may deepen our ability to express operational confidence in our codes as software products.
• A vigorous area of research is the expansion and limits of statistical testing techniques. "Based on the software developer and user surveys, the national annual costs of an inadequate infrastructure for software testing is estimated to range from $22.2 to $59.5 billion." ("The Economic Impacts of Inadequate Infrastructure for Software Testing," NIST report, 2002)
• Can PSR be extended to include "failures" defined by unacceptable algorithm performance? By inadequate resolution?
• Can interesting code acceptance criteria be devised based on statistical software reliability ideas? (A toy statistical-testing bound is sketched below.)
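One toy way to give the last question concrete form is the classical zero-failure ("rule of three") bound from statistical testing: if N tests drawn at random from the operational profile all pass, the per-test failure probability is bounded by roughly 3/N at 95% confidence. The sketch below computes that bound; it is offered only as an illustration of the kind of acceptance criterion meant, not as an established ASCI or PSR practice.

```python
# Toy statistical-testing acceptance criterion (zero-failure bound):
# if N independent tests drawn from the usage profile all pass, then with
# confidence (1 - alpha) the per-test failure probability p satisfies
# (1 - p)^N >= alpha  =>  p <= 1 - alpha**(1/N)   (~ 3/N for alpha = 0.05).
def zero_failure_bound(n_passing_tests: int, alpha: float = 0.05) -> float:
    """Upper confidence bound on per-test failure probability after n passing tests."""
    return 1.0 - alpha ** (1.0 / n_passing_tests)

for n in (100, 1_000, 10_000):
    print(f"{n} passing tests -> p < {zero_failure_bound(n):.4%} at 95% confidence")
```

Extending such bounds to "failures" defined by unacceptable algorithm performance or inadequate resolution is exactly the open question posed above.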
Example research question: Validation Metric Research.
[Figure: real data from the Z machine – load implosion and load stagnation. The most relevant data on the Z machine tends to be complicated, integral, and spatio-temporally correlated. The uncertainty is currently not well characterized.]
Uncertainty quantification remains a critical enabling technology for validation:
• Forward uncertainty propagation is computationally demanding (the ideal would be stochastic PDEs).
• UQ needs new ideas in experimental design for simulations and coupled simulation-experiment validation tasks.
• UQ needs tools and the expertise to use them properly. DAKOTA is the platform of choice for current and future evolution of UQ tool capability at Sandia.
• The "backward" UQ problem – model improvement – is an even harder formal challenge. This is related to Optimization Under Uncertainty (OUU).

Example research question: OUU (Optimization Under Uncertainty).
[Figure: high-gain capsule design for a Z-pinch driver (r-z schematic): high-Z dense plasma (design, uncertain, unstable); foam + ? (design); capsule (design); robust (reliable?) pulse shaping; robust (reliable?) pulse compensation; wire initiation – 3D ALEGRA MHD; conversion – 1,2,3D ALEGRA rad-MHD; drive and implosion – 1,2D ALEGRA radhydro; Lagrangian, SMALE, MMALE, and Eulerian frames. Fusion capsule design for a Z-pinch driver is an interesting and extreme problem in OUU. We are currently using ALEGRA-HEDP and DAKOTA to study features of this problem. VERY COMPLEX computation underlies this work.]
Using computational models in reliability-based or robust design is an important goal.
• V&V are the source of the confidence that we have in the modeling component of these activities.
• Model improvement derived from V&V is related to OUU – for example, calibration under uncertainty.
• It is important to couple research on OUU with research threads in Validation Metrics.
• We are just beginning this work.

Giunta has been working on the use of multifidelity surrogates, which will surely be crucial for use of OUU in such complex problems.
Multifidelity surrogate models:
• The low-fidelity surrogate model retains many of the important features of the high-fidelity "truth" model, but is simplified in some way:
  – decreased physical resolution
  – decreased FE mesh resolution
  – simplified physics
• Independent of the number of design parameters.
• The low-fidelity model may still have nonsmooth response trends.
• Works well when low-fidelity trends match high-fidelity trends.
[Figure: finite element models of the same component – low fidelity (30,000 DOF) and high fidelity (800,000 DOF).]

Combining uncertainty and multi-fidelity runs us head-on into probabilistic error models.
[Figure: log(r) vs. x for Material #1 and Material #2; error[log(r)] and mean error in the contact location; empirical histogram of the error in shock arrival time at the wall.]
• The problem is a simple shock problem involving shock transmission and reflection from a contact discontinuity.
• Key features are the various wave space-time trajectories.
• This is also a common verification test problem and has an analytic solution.

Probabilistic Error Models (PEM) are useful for computational science software and necessary for risk-informed decisions.
• Suppose that we can neither "verify codes" nor "verify calculations."
  – "When quantifying uncertainty, one cannot make errors small and then neglect them, as is the goal of classical numerical analysis; rather we must of necessity study and model these errors."
  – "…most simulations of key problems will continue to be under resolved, and consequently useful models of solution errors must be applicable in such circumstances."
  – "…an uncertain input parameter will lead not only to an uncertain solution but to an uncertain solution error as well."
• These quotes reflect a new view of "numerical error" expressed in B. DeVolder, J. Glimm, et al. (2001), "Uncertainty Quantification for Multiscale Simulations," Los Alamos National Laboratory, LAUR01-4022. (A toy numerical illustration of the last quote follows below.)
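As a toy numerical analogue of the last quote – an uncertain input parameter leading to an uncertain solution error – the sketch below runs a deliberately under-resolved forward-Euler solve while sampling the input, and histograms the resulting solution error against the exact solution. The decay model, parameter range, and step size are assumptions chosen only to make the point runnable; they have nothing to do with the shock problem shown earlier.

```python
# Toy probabilistic error model: an uncertain input gives an uncertain solution error.
# The step size is kept deliberately coarse (under-resolved) and the error of forward
# Euler against the exact solution exp(-k) at t = 1 is histogrammed.  Illustrative only.
import math
import random
from collections import Counter

def euler_solution(k, dt=0.25, t_final=1.0):
    """Coarse forward-Euler solve of du/dt = -k*u, u(0) = 1."""
    u = 1.0
    for _ in range(int(round(t_final / dt))):
        u += dt * (-k * u)
    return u

random.seed(1)
errors = []
for _ in range(5_000):
    k = random.uniform(0.5, 2.0)                      # assumed uncertain input parameter
    errors.append(euler_solution(k) - math.exp(-k))   # signed solution error at t = 1

# Crude empirical histogram of the error (bin width 0.01):
hist = Counter(round(e, 2) for e in errors)
for bin_center in sorted(hist):
    print(f"{bin_center:+.2f}: {'#' * (hist[bin_center] // 50)}")
```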
Conclusion:
"We make no warranties, express or implied, that the programs contained in this volume are FREE OF ERROR, or are consistent with any particular merchantability, or that they will meet your requirements for any particular application. THEY SHOULD NOT BE RELIED UPON FOR SOLVING A PROBLEM WHOSE SOLUTION COULD RESULT IN INJURY TO A PERSON OR LOSS OF PROPERTY…" [emphasis mine] (from Numerical Recipes in Fortran, Press, Teukolsky, Vetterling, and Flannery)
Will we be able to seriously claim that ASCI codes are any better than this?!

How absurd would the following be? "We make no warranties, express or implied, that the bridge you are about to drive on is free of error…"

How much more absurd would the following be? "We make no warranties, express or implied, that the book you are about to read is free of error…"