How has Computational Geometry helped Visualization and Analysis of Massive Scientific Data: from the Design of Clean Fuels to Understanding Climate Change Valerio Pascucci Director, Center for Extreme Data Management Analysis and Visualization Professor, SCI institute and School of Computing, University of Utah Laboratory Fellow, Pacific Northwest National Laboratory Pascucci-1 Massive Scientific Models are Source of Great Challenges and Opportunities Subsurface Modeling BlueGene/L Hydrodynamic Instabilities Combustion Simulations Molecular Dynamics Climate Modeling Pascucci-2 Material Sciences Given Three Burning Hydrogen Flames, Rank Them by Level of Turbulence Pascucci-3 Count the Number of Bubbles in a Rayleigh–Taylor Instability Rayleigh-Taylor instabilities arise in fusion, super-novae, and other fundamental phenomena: – start: heavy fluid above, light fluid below – gravity drives the mixing process – the mixing region lies between the upper envelope surface (red) and the lower envelope surface (blue) – 25 to 40 TB of data from simulations Pascucci-4 Are these LES and DNS simulation codes expressing equivalent fenomena? LES Large Eddy Simulation DNS Direct Numerical Simulation Pascucci-5 What is the Long Term Impact of Human Activities on the Global Climate? Pascucci-6 What Conditions Cause Extinction and Reignition in Premixed Hydrogen Flames Pascucci-7 We Develop General Purpose Tools for Efficient and Reliable Data Understanding Real functions are ubiquitous in the representation of scientific information. Pascucci-8 Traditional Data Analysis Tools are Often Ineffective for Massive Models • Massive models are challenging: Rayleigh–Taylor instability (Miranda) – Sheer volume of information – Complexity of the information represented – Complexity of presentation • Tools do not scale with the data sizes • Difficult to capture multiple scales • Numerical methods unstable and sensitive to noise • Difficulty in providing error bounds associated with the coarse scale analysis • Need new abstractions and metaphors to covey information reliably and efficiently Pascucci-9 We Develop New Techniques for Efficient Data Management and Presentation • Advanced data storage techniques: – Data re-organization. – Compression. • Advanced algorithmic techniques: – Streaming. – Progressive multi-resolution. – Out of core computations. • Scalability across a wide range of running conditions: – From laptop, to office desktop, to cluster of PC, to BG/L. – Memory, to disk, to remote data access. Pascucci-10 We Demonstrated Performance and Scalability in a Variety of Applications Pascucci-11 Our Framework is Based on Robust Topological Computations for Quantitative Data Analysis • Provably robust computation • Provably complete feature extraction and quantification • Hierarchical topological structures used to capture multiple scales • Error-bounded approximations associated with each scale • Formal mathematical definition associated with each analysis • Scalable performance in association with streaming techniques Pascucci-12 Hierarchical topology of a 2D Miranda vorticity field Coarse scale: blue = molecules Medium scale: red-blue = dipoles Molecular dynamics simulation (left) with abstract graph representation of its features at two scales (right) Motivation • The presence of computational errors in streamline computation using integration techniques can lead to inconsistent results Pascucci-13 We Adopted a Combinatorial Approach to Morse Theory for Provably Correct Computations f ( x ) : D F ( x ) : S Classical mathematical definitions domain function critical point D f S smooth manifold infinitely differentiable f( p ) 0 numerical 1D Simulation of differentiability simplicial complex F(x) PL-extension of f (xi) d 1 Low ( p ) Be combinatorial 2D 3D Independent local computation yield globally consistent results Pascucci-14 We Use Robust Techniques for Critical Point Detection and Classification type index The Morse Lemma Minimum 0 There are d+1 types of critical points Saddle d-1 Maximum d Minimum f( p ) 0 d k d 2 2 i j i 1 j d k 1 Minimum Maximum Saddle Maximum f ( x ) | f ( p ) x x p Minimum numerical 1-saddle 2-saddle combinatorial Pascucci-15 Maximum We Introduced the Morse–Smale Complex for Complete Data Analysis • The Morse–Smale complex partitions the domain of f in regions of uniform gradient • Generalizes the notion of monotonic interval • Dimension of a region equal index difference of source and destination • Remove inconsistency of local gradient evaluations 1D 2D 3D Pascucci-16 Minimum Saddle Maximum We Use Cancellations to Create a Multi-scale Representation of the Trends in the Data Cancellations: Approximation: Multi-scale: critical points can be created or destroyed in pairs that are connected 1-manifolds error = persistence/2 (proven lower bound) consistent gradient segmentation at all scales persistence p 1D: cancellation=contraction 3D 2D: cancellation=contraction + edge removal Time Pascucci-17 Contraction of extrema with saddles allows to simplify the MS-complex. 0.5% persistence filter 1.0% persistence filter Pascucci-18 20% persistence filter Contraction of extrema with saddles allows to simplify the MS-complex. 0.5% persistence filter Pascucci-19 Contraction of extrema with saddles allows to simplify the MS-complex. 1.0% persistence filter Pascucci-20 Contraction of extrema with saddles allows to simplify the MS-complex. 20% persistence filter Pascucci-21 Topological Constructs Allow Building Effective and Succinct Shape Descriptors The Reeb graph is the graph obtained by continuous contraction of all the contours in a scalar field, where each contour is collapsed to a distinct point. Pascucci-24 We Use Streaming Techniques to Achieve High Performance Analysis of Shapes Pascucci-28 We Exploit the Existing Locality of the Input Mesh to Avoid Costly Reordering • Computation of the Reeb graph for two layouts of the same model Pascucci-29 We Develop Shape Signatures to Find Defects in Large Scale Geometric Models Pascucci-30 1.E+05 1.E+04 Practical Tests Confirm Robustness and Show Virtually Linear Scalability 1.E+03 100Ks 1.E+05 100MB 1.E+05 1.E+02 10MB 1.E+04 10Ks 1.E+04 1hour 1MB 1.E+01 1.E+03 1ks 1.E+03 100KB 1.E+02 1.E+00 100s 1.E+02 1min 10KB 1.E+01 1.E-01 1.E+00 1KB 10s 1.E+01 1.E-02 1.E-01 100B 1s 1.E+00 1.E-02 10B 1.E-03 0.1s 1.E-01 1.E-03 1B 1.E-04 1.E-04 1.E-02 1.E+031Kt 1.E+04 1.E+05 1.E+06 1.E+07 1.E+08 10Kt 100Kt 1Mt 100Mt 1Gt 10Mt 1.E+03 1.E+04 1.E+05 1.E+06 1.E+07 1.E+08 1.E+09 Min Max Time Lin. Lin. Scale Avg Run Run Time Max Run Run Time Time Read T MinMem Mem Max Mem Mem Time Scale Avg Time Max Pascucci-31 Demo C4H4 Pascucci-32 Demo S3D Combustion Simulation Pascucci-33 Data Analysis Examples Pascucci-34 We Analyze High-Resolution Rayleigh–Taylor Instability Simulations • Large eddy simulation run on Linux cluster: 1152 x 1152 x 1152 maximum bubble – ~ 40 G / dump – 759 dumps, about 25 TB • Direct numerical simulation run on BlueGene/L: 3072 x 3072 x Z – Z depends on width of mixing layer – More than 40 TB • Bubble-like structures are observed in laboratory and simulations • Bubble dynamics are considered an important way to characterize the mixing process – Mixing rate = (# bubbles ) / t. • There is no prevalent formal definition of bubbles Pascucci-35 We Compute the Morse–Smale Complex of the Upper Envelope Surface Maximum F(x) = z F(x) on the surface is aligned against the direction of gravity which drives the flow Morse complex cells drawn in distinct colors Pascucci-36 In each Morse complex cell, all steepest ascending lines converge to one maximum A Hierarchal Model is Generated by Simplification of Critical Points • Persistence is varied to annihilate pairs of critical points and produce coarser segmentations • Critical points with higher persistence are preserved at the coarser scales Saddle p1 p2 Pascucci-37 The Segmentation Method is Robust From Early Mixing to Late Turbulence T=100 T=700 T=353 Pascucci-38 We Evaluated Our Quantitative Analysis at Multiple Scales Under-segmented Coarse scale Medium scale 100000 Persistence Threshold 00.93% 02.39% 04.84% slope -0.65 slope -1.72 10000 Number of bubbles Over-segmented slope -0.75 1000 100 10 53.0 Timestep 100 353 400 503 700 1000 Fine scale Pascucci-39 We Characterize Events that Occur in the Mixing Process death birth merge split Comp AD 07-DRC Pascucci-17 Pascucci-40 First Robust Bubble Tracking From Beginning to Late Turbulent Stages Pascucci-41 First Time Scientists Can Quantify Robustly Mixing Rates by Bubble Count slope 1.536 slope 1.949 -0.5 -1 -1.5 -2 -2.5 -3 1 Bubble Count Derivative 2 0.1 slope 2.575 Derivative of Bubble Count 0 3 4 5 6 7 8 9 10 t/ (log scale) Pascucci-42 Mode-Normalized Bubble Count (log scale) 1.0 (9524 bubbles) 0.5 0.01 20 30 40 We Provide the First Quantification of Known Stages of the Mixing Process mixing 1.0 transition 0.5 slope 1.536 slope 1.949 -0.5 -1 -1.5 -2 -2.5 weak -3 1 turbulence Bubble Count Derivative 2 3 0.1 slope 2.575 Derivative of Bubble Count 0 linear growth 4 5 6 7 8 9 10 t/ (log scale) Pascucci-43 Mode-Normalized Bubble Count (log scale) (9524 bubbles) 0.01 20 30 40 strong turbulence We Provided the First Feature-Based Validation of a LES with Respect to a DNS LES (Large Eddy Simulation) Mode-Normalized Bubble Count DNS (Direct Numerical Simulation) Normalized Time Pascucci-44 Quantitative Analysis of the Impact of a Micrometeoroid in a Porous Medium • Many possible applications: – – – – NASA’s Stardust Spacecraft National Ignition Facility Targets Light and Robust Materials many more... Micrometeoroid impact direction Pascucci-45 The Topological Reconstruction Method is Validated with a Controlled Test Shape Challenge: robust reconstruction of the structure of a porous medium Preparation: we develop control test data to validate the approach Pascucci-46 We Report the Distribution of Topological Features in the Full Resolution Data saddle p Pascucci-47 maximum q (f(p), f(q)) The Hierarchical Morse-Smale Complex Has Very Good Reconstruction Properties Pascucci-48 We Compute the Complete Morse-Smale Complex for the Porous Medium Pascucci-49 Need to Find Proper Threshold Values and Characterize the Stability of the Solution Pascucci-50 Need to Find Proper Threshold Values and Characterize the Stability of the Solution Pascucci-51 We Obtain a Robust Reconstruction of the Filament Structures in the Material Pascucci-52 We Track the Evolution of the Filament Structure of the Material Under Impact Time comparison of the reconstructions Pascucci-53 Demo Porous Medium Pascucci-54 The Extracted Structures Allow to Quantify the Change in Porosity of the Material Density profiles Decay in porosity of the material Pascucci-55 Understanding Turbulence for Low Emission, High Efficiency Combustion • Lean premixed H2 flames • Low Swirl Combustion (LSC) Burners • Low pollution in energy production • High Efficiency in fuel consumption • Scalable from residential to industrial use • Each variable 3.9-4.5 TB Experiment Simulation Pascucci-56 We Take on the Challenge of Developing a Quantitative Analysis Detecting Turbulence Understanding combustion processes over a broad range of burning conditions is an important problem for designing engines and power plants. – Simulation with AMR mesh. – Simulations of lean premixed hydrogen flames with three degrees of turbulence. – Can we identify precisely and track in time burning regions? – Can we discriminate the degree of turbulence from a quantitative analysis? Pascucci-57 We Build a Reduced Topological Model of H2 Consumption on an Isotermal Surfaces domain hierarchy Pascucci-58 We Build a Reduced Topological Model of H2 Consumption on an Isotermal Surfaces domain hierarchy Pascucci-59 We Build a Reduced Topological Model of H2 Consumption on an Isotermal Surfaces domain hierarchy Pascucci-60 We Build a Reduced Topological Model of H2 Consumption on an Isotermal Surfaces domain hierarchy Pascucci-61 We Build a Reduced Topological Model of H2 Consumption on an Isotermal Surfaces domain hierarchy Pascucci-62 We Build a Reduced Topological Model of H2 Consumption on an Isotermal Surfaces domain hierarchy Pascucci-63 We Build a Reduced Topological Model of H2 Consumption on an Isotermal Surfaces domain hierarchy Pascucci-64 We Build a Reduced Topological Model of H2 Consumption on an Isotermal Surfaces domain hierarchy Pascucci-65 We track in time by interpolation in 4D and contraction to a Reeb Graph Pascucci-66 We track in time by interpolation in 4D and contraction to a Reeb Graph Pascucci-67 We track in time by interpolation in 4D and contraction to a Reeb Graph Pascucci-68 We track in time by interpolation in 4D and contraction to a Reeb Graph Pascucci-69 We track in time by interpolation in 4D and contraction to a Reeb Graph Pascucci-70 We track in time by interpolation in 4D and contraction to a Reeb Graph Pascucci-71 We track in time by interpolation in 4D and contraction to a Reeb Graph Pascucci-72 Each Set of Parameters Results in a Robust Segmentation and Tracking of Burning Cells Pascucci-73 We Allow Exploration of the Full Space of Parameters Defining the Features • interactive exploration • comparison of statistics Pascucci-74 Topological Segmentation Allows to Quantify Turbulence as Slope of the Area Distributions • Combustion & flame Pascucci-75 Exploration of High Dimensional Functions for Sensitivity Analysis Integrated presentation of statistics and topology Pascucci-76 Exploration of High Dimensional Functions for Sensitivity Analysis Pascucci-77 Analysis of Combustion Simulations Pascucci-78 Pure fuel The Framework Allows Detailed Visualization and Analysis of High Dimensional Functions 10 dimensional data set describing the heat release wrt. to various chemical species in a combustion simulation Pascucci-79 Pure oxidizer The Framework Allows Detailed Visualization and Analysis of High Dimensional Functions 10 dimensional data set describing the heat release wrt. to various chemical species in a combustion simulation Pascucci-80 Local extinction The Framework Allows Detailed Visualization and Analysis of High Dimensional Functions 10 dimensional data set describing the heat release wrt. to various chemical species in a combustion simulation Pascucci-81 Pascucci-82 Analysis of Climate Data Pascucci-83 The Framework Reveals Relationship Between Convection and Global Long Wave Flux Pascucci-84 Pascucci-85 Discrete Analysis of Vector Field Structures is Enabling New Robust Analysis of Climate Data How many eddies in the ocean transport energy of a global climate model? 1 2 Pascucci-86 Encoding Must Explicitly Capture Flow’s Topological Features H. Theisel, T. Weinkauf, H.-C. Hege, H.-P. Seidel. Gridindependent Detection of Closed Stream Lines in 2D Vector Fields. VMV 2004. J. L. Helman and L. Hesselink. Representation and Display of Vector Field Topology in Fluid Flow Data Sets. IEEE Computer, 1989. G. Scheuermann, X. Tricoche. Topological Methods in Flow Visualization. Visualization Handbook, 2004 Pascucci-87 Detecting Features Robustly Is Critical Data Credit: Mathew Maltrud from the Climate, Ocean and Sea Ice Modelling program at Los Alamos National Laboratory (LANL) and the BER Office of Science UVCDAT team Pascucci-88 Classification of Closed Orbits Attracting Neutral Repelling Pascucci-89 Quantized Features: Topological Decompositions Stable Manifolds Topological Skeletons Pascucci-90 Comparison to Other Discretizations Morse Sets G. Chen, Q. Deng, A. Szymczak, R. S. Laramee, E. Zhang. Morse set classification and hierarchical refinement using the Conley index. IEEE TVCG, 2011. PC Morse Sets A. Szymczak. Stable Morse Decompositions for piecewise constant vector fields on surfaces. CGF 2011. Pascucci-91 Stable Manifolds of Ocean Data time depth data: 3600x2400 mesh ~25k critical points ~39k cycles Pascucci-92 CG Enabled Fundamental Advances in Scientific Data Analysis and Visualization • Tight cycle of : – basic research, – software deployment – user support • Plenty of Open Problems: – Combinatorial Methods for: • Vector Fields • Tensor Fields • Dynamic models First time scientists can quantify robustly mixing rates by bubble count slope 1.949 -0.5 -1.5 -2 -3 1 Comp AD 07-DRC Bubble Count Derivative 2 3 4 5 6 7 8 9 10 t/ (log scale) 0.1 slope 2.575 -1 Mode-Normalized Bubble Count (log scale) slope 1.536 0 -2.5 Pascucci-93 1.0 (9524 bubbles) 0.5 Derivative of Bubble Count – Efficient approaches for high dimensional data – Simple data structure and algorithms (fast in practice and easier to validate) 0.01 20 30 40 Pascucci-37