Constructing a Performance Database for Large-Scale Quantum Chemistry Packages

Meng-Shiou Wu (1), Hirotoshi Mori (2), Jonathan Bentz (3), Theresa Windus (2), Heather Netzloff (2), Masha Sosonkina (1), Mark S. Gordon (1,2)
(1) Scalable Computing Laboratory, Ames Laboratory, US DOE
(2) Department of Chemistry, Iowa State University
(3) Cray Inc.

Introduction

Why Quantum Chemistry (QC)?
– Applications to surface science, environmental science, biological science, nanotechnology, and much more
– Provides a way to treat problems with greater accuracy
– With the advent of more powerful computational tools, QC can be applied to more and more problems

Introduction: Three Major QC Software Packages

Program   Development History   Primary Language   Parallel Computations
GAMESS    25+ years             F77                ‘Homegrown’ DDI
NWChem    10+ years             F77**              Global Arrays + ARMCI
MPQC      10+ years             C++                MPI

All three provide various computations to calculate physical properties of molecules, including energies, geometries, reaction pathways, etc.
– Some calculation types are the same, but some are unique to a specific package

Introduction

We have used the Common Component Architecture (CCA) framework to enable interoperability between GAMESS, NWChem, and MPQC
– Obtain a palette of components that exposes functionality from each package

Insight into CCA: Example – Core Interfaces for Integral Computations

[Diagram: the IntegralEvaluatorFactoryInterface and IntegralEvaluator1–4 interfaces sit above the corresponding integral evaluator component classes, each backed by a chemistry package (NWChem, GAMESS, or MPQC).]
– IntegralEvaluatorNInterface: interface for N-center integrals, N = 1, 2, 3, 4
– IntegralEvaluatorFactoryInterface: provides references to the integral evaluators from a chemistry program

Background

Questions to ponder:
– How to easily select the components with the best efficiency
– How to find compromises between efficiency and accuracy

We rely on Computational Quality of Service (CQoS) to address these issues:
– Automatic selection and configuration of components to suit a particular computational purpose
– Special requirements for scientific components: high efficiency and parallel scalability, plus functional qualities such as the level of accuracy achieved by a particular algorithm

Core question in our CQoS research:
– Need an approach that works for large-scale applications

Motivation for CQoS

Three basic steps for performance analysis:
– code instrumentation
– performance evaluation
– data analysis

This is straightforward IF applied to smaller packages
– It must be augmented to deal with large-scale applications
– Mechanisms are needed to handle the performance data

Issues to be Addressed

A large number of quantum chemistry computations are available in each QC package
– Individual chemists may not know all of the available computations or the algorithmic details of these computations
– Construction of the database must allow/encourage the participation of many chemists in order to obtain a ‘big picture’ view

The construction process should be as automated as possible
– Chemists should not need to select the right performance tools
– Chemists should not need to manually handle the large amounts of performance data
– Management of the performance data and the database should be transparent to chemists
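Before moving on to the development stages, here is a minimal Python sketch of the integral-interface idea introduced above. The real interfaces are CCA components rather than Python classes, and the class and method names below (get_evaluator, compute, and the NWChem-specific names) are hypothetical illustrations, not the project's actual API.

```python
from abc import ABC, abstractmethod

class IntegralEvaluatorFactoryInterface(ABC):
    """Stand-in for the CCA integral evaluator factory interface."""

    @abstractmethod
    def get_evaluator(self, n_centers: int):
        """Return an evaluator for N-center integrals, N = 1..4."""

class NWChemIntegralEvaluator:
    """Hypothetical wrapper around one package's N-center integral code."""

    def __init__(self, n_centers: int):
        self.n_centers = n_centers

    def compute(self, shell_indices):
        # A real evaluator would call into the package's integral engine here.
        print(f"{self.n_centers}-center integral block for shells {shell_indices}")

class NWChemEvaluatorFactory(IntegralEvaluatorFactoryInterface):
    """One factory per chemistry package (GAMESS, NWChem, MPQC)."""

    def get_evaluator(self, n_centers: int):
        if n_centers not in (1, 2, 3, 4):
            raise ValueError("only 1- to 4-center integrals are supported")
        return NWChemIntegralEvaluator(n_centers)

# A client component sees only the interfaces, never the package internals,
# so evaluators from different packages can be swapped behind the factory.
factory: IntegralEvaluatorFactoryInterface = NWChemEvaluatorFactory()
eri = factory.get_evaluator(4)   # two-electron (4-center) integrals
eri.compute((0, 0, 1, 1))
```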
Stages in the Development of CQoS Tuning Rules

Stage 1 – Develop/Collect Capable Implementations: interoperating mechanisms; minimizing overhead
Stage 2 – Performance Evaluation: source code instrumentation; data collection; performance database
Stage 3 – Data Analysis: develop analysis procedures; explore important parameters; analyze the relationships between parameters
Stage 4 – Tuning Rules Development: performance modeling; heuristic tuning rules; mechanisms to incorporate tuning rules

Stage 1: Collect/Develop/Incorporate Implementations

Goal:
– Develop, collect, and integrate capable methods/implementations
For packages integrated with CCA, the goal is to develop interoperating mechanisms
– Overhead introduced by the interoperating mechanisms must be minimized

Stage 2: Performance Evaluation

Goal:
– Generate useful performance data and explore methods to easily manipulate large amounts of data
This is our current area of focus

Stage 3: Performance Analysis

Goal:
– Use the collected performance data and the constructed database to conduct analysis
A very complex process for quantum chemistry computations
– Example to follow…

Stage 4: Tuning Rules Development

Goal:
– Develop mechanisms to select one method among a pool of methods
This depends on the results of performance analysis. Two common approaches:
– develop performance models
– use heuristic tuning rules
Tuning rules must be integrated into the original interoperating mechanisms to be useful

Performance Evaluation: Source Code Instrumentation

Insert performance evaluation functions into the application source code
– Straightforward with the automatic instrumentation capability provided by existing performance tools
– Limitations exist when only part of the overall performance data is needed: the expected performance data may not be generated

Source Code Instrumentation (CQoS with GAMESS)

[Diagram: a whole computation broken into Computation 1, Computation 2, and Computation 3, with communications among them.]
In many cases, manual instrumentation is unavoidable in GAMESS performance evaluation.

GAMESS Performance Tools Integrator (GPERFI)

– Built on top of existing performance tools such as TAU (Tuning and Analysis Utilities) and PAPI
– Allows more flexible instrumentation mechanisms specific to GAMESS
– Provides mechanisms to extract I/O, communication, or partial profiling data from the overall wall-clock time (see the sketch below)

Performance Evaluation: Data Collection

Workflow:
– Chemists provide input files for experiments through an input-file uploading / results-viewing interface
– Format verification: if an input file fails, it is discarded and its related data are removed from the database
– Run the test on the experimental cluster
– Post-process the TAU performance data
– Generate application metadata from the input file and output log files
– Upload the data to the database through PerfDMF
– Decide whether the input file should be tested on other platforms; if yes, test it on the different platforms
– Conduct performance analysis with PerfExplorer and output the performance analysis results
– Chemists view the results and provide comments

Performance Evaluation: Performance Database

Simply conducting many performance evaluations and putting the performance data into the database is not going to work!!!
– Need to properly collect metadata that are related to the performance data
– Need to build relationships between metadata
– Need the participation of chemists!!!
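Two short sketches illustrate pieces of this evaluation stage. The first is a toy Python version of the partial-profiling idea behind GPERFI: tag regions of a computation so that I/O and communication time can be separated out of the total wall-clock time afterwards. It is only an illustration of the concept, not the GPERFI or TAU API.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated time per named region (a real tool would track this per process).
timings = defaultdict(float)

@contextmanager
def region(name):
    """Time a named region of the computation."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] += time.perf_counter() - start

with region("total"):
    with region("integrals"):
        time.sleep(0.05)   # stands in for integral computation
    with region("io"):
        time.sleep(0.02)   # stands in for writing integrals to disk
    with region("communication"):
        time.sleep(0.01)   # stands in for a global sum

print(f"I/O fraction:           {timings['io'] / timings['total']:.1%}")
print(f"Communication fraction: {timings['communication'] / timings['total']:.1%}")
```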
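The second sketch shows the kind of application metadata the data-collection step can pull out of a GAMESS input deck before a trial is uploaded through PerfDMF. The keyword set handled here is deliberately tiny; a real collector would also read the output log and many more input groups, and the PerfDMF upload itself is not shown.

```python
import re

def gamess_metadata(input_text: str) -> dict:
    """Extract a few keywords from the $CONTRL, $BASIS, and $SCF groups of a
    GAMESS input deck to use as application metadata for a trial."""
    groups = {"CONTRL": ("SCFTYP", "RUNTYP"),
              "BASIS": ("GBASIS", "NGAUSS"),
              "SCF": ("DIRSCF",)}
    meta = {}
    for group, keys in groups.items():
        match = re.search(rf"\${group}(.*?)\$END", input_text,
                          re.IGNORECASE | re.DOTALL)
        if not match:
            continue
        for key in keys:
            kv = re.search(rf"{key}\s*=\s*(\S+)", match.group(1), re.IGNORECASE)
            if kv:
                meta[key] = kv.group(1)
    return meta

deck = """
 $CONTRL SCFTYP=RHF RUNTYP=ENERGY $END
 $BASIS  GBASIS=N31 NGAUSS=6 $END
 $SCF    DIRSCF=.TRUE. $END
"""
print(gamess_metadata(deck))
# {'SCFTYP': 'RHF', 'RUNTYP': 'ENERGY', 'GBASIS': 'N31', 'NGAUSS': '6', 'DIRSCF': '.TRUE.'}
```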
Performance Evaluation: Example of Construction Sequence

[Diagram: benzene (bz) and its substituted analogues hydroxyl-bz (OH), methyl-bz (CH3), and amino-bz (NH2).]
1. A chemist defines ‘molecule similarity’ between a set of molecules.
2. Conduct a performance evaluation for the base molecule.
3. Use the performance data collected from the base molecule to conduct performance analysis and prediction for the other molecules with similar structure.

Performance Evaluation: Performance Database

– Use PerfDMF (Performance Data Management Framework), included in the TAU toolkit, for data manipulation
– Use MySQL as the database engine underneath PerfDMF
– Pros: data manipulation is simplified
– Cons: data manipulation is restricted by the schema defined by PerfDMF

Metadata Mapping

PerfDMF hierarchy as used for GAMESS:
– Application → GAMESS
– Experiment → computations (e.g., Energy)
– Trial → experimental runs with different system settings
Application characteristics are recorded as experiment metadata (e.g., conv-SCF vs. direct-SCF for an Energy computation); system characteristics are recorded as metadata on each experiment set (Platform 1, 2, 3, … with CPU, cache, etc.).

Hierarchy of Quantum Chemical Calculations in GAMESS

1. Molecule ($DATA): number of atoms and electrons; structure (linearity/planarity of the molecule)
2. Property ($CONTRL): Energy, Gradient, Hessian
3. Level of Theory ($CONTRL): HF and post-HF (CI, CC, MPn), DFT (BLYP, PBE, B3LYP, etc.)
4. Basis Set ($BASIS): number of basis functions, angular momentum
5. Integral Algorithm ($SCF): conventional (integrals stored on disk) vs. direct (calculated ‘on the fly’)

Detail of the Performance Evaluation Stage

– Source code instrumentation: GPERFI for GAMESS, TAU/PDT for NWChem
– Data collection (Python programs): application metadata, system metadata, and performance data are uploaded through PerfDMF into the MySQL performance database
– Performance analysis (PerfExplorer): explore important parameters; analyze the relationships between parameters
– Tuning rules development: performance modeling; heuristic tuning rules

Complexity in Performance Analysis: A Case Study (Conventional Integral Calculation)

[Plots: runtime (sec) vs. number of nodes / number of processors per node on Bassi and Seaborg.]
(1) On Seaborg, parallel write (PWRT) is not an issue until 16 processors per node are used; on Bassi, we did not have a problem with parallel write.
(2) On Bassi, the latency of the global sum (GSUM) grows to a non-negligible fraction of the runtime as the number of processors used per node increases; on Seaborg, the global-sum fraction is not very important.

What Needs to be Analyzed?

– Why does the cost of parallel write (PWRT) jump dramatically when using 16 processors per node?
– When does the global sum (GSUM) become the performance bottleneck?
– Investigation of many subroutines to study the relationships among CPU speed, I/O bandwidth/latency, data size, communication bandwidth/latency, and the number of processors used per node
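As a concrete illustration of the kind of analysis this case study calls for, the sketch below tabulates the fraction of wall-clock time spent in PWRT and GSUM across node configurations. The timings in the table are placeholders, not the measured Seaborg/Bassi data; only the tabulation pattern is the point.

```python
# (nodes, procs per node) -> per-event inclusive time in seconds (placeholder values)
profiles = {
    (4, 8):  {"WALL": 820.0, "PWRT": 35.0,  "GSUM": 22.0},
    (4, 16): {"WALL": 610.0, "PWRT": 190.0, "GSUM": 41.0},
    (8, 16): {"WALL": 380.0, "PWRT": 150.0, "GSUM": 55.0},
}

print(f"{'nodes':>5} {'ppn':>4} {'PWRT %':>7} {'GSUM %':>7}")
for (nodes, ppn), events in sorted(profiles.items()):
    wall = events["WALL"]
    print(f"{nodes:>5} {ppn:>4} "
          f"{100.0 * events['PWRT'] / wall:>7.1f} "
          f"{100.0 * events['GSUM'] / wall:>7.1f}")
```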
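Once trials and their metadata are in the database, questions such as "which SCF integral algorithm is faster on which platform?" become queries. The sketch below is modelled on the Application → Experiment → Trial mapping described earlier, but the table and column names, and the timings, are assumptions for illustration only, not the actual PerfDMF schema; SQLite stands in for MySQL so the example runs by itself.

```python
import sqlite3

# Hypothetical, simplified schema (NOT the real PerfDMF tables) with placeholder data.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE experiment (id INTEGER PRIMARY KEY, computation TEXT, scf_algorithm TEXT);
CREATE TABLE trial (id INTEGER PRIMARY KEY, experiment_id INTEGER,
                    platform TEXT, nprocs INTEGER, wall_seconds REAL);
INSERT INTO experiment VALUES (1, 'energy', 'conventional'), (2, 'energy', 'direct');
INSERT INTO trial VALUES (1, 1, 'Seaborg', 64, 910.0), (2, 1, 'Bassi', 64, 305.0),
                         (3, 2, 'Seaborg', 64, 780.0), (4, 2, 'Bassi', 64, 290.0);
""")

# Best observed wall time per platform and integral algorithm for one computation type.
query = """
SELECT t.platform, e.scf_algorithm, MIN(t.wall_seconds)
FROM trial t JOIN experiment e ON t.experiment_id = e.id
WHERE e.computation = 'energy' AND t.nprocs = 64
GROUP BY t.platform, e.scf_algorithm
ORDER BY t.platform, e.scf_algorithm;
"""
for platform, algorithm, seconds in con.execute(query):
    print(f"{platform:8s} {algorithm:12s} {seconds:7.1f} s")
```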
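Finally, a sketch of step 3 of the construction-sequence example: using the measured runtime of the base molecule (benzene) to estimate runtimes for its substituted analogues. The quartic scaling in the number of basis functions is a crude textbook estimate for conventional two-electron integral work, and both the runtime and the basis-function counts below are illustrative placeholders, not project data or the project's actual prediction model.

```python
def predict_runtime(t_base, nbf_base, nbf_target, exponent=4.0):
    """Scale the base molecule's runtime by (Nbf_target / Nbf_base)**exponent."""
    return t_base * (nbf_target / nbf_base) ** exponent

t_benzene = 120.0          # measured runtime of the base molecule (placeholder), seconds
nbf = {                    # illustrative basis-function counts
    "benzene": 102,
    "hydroxyl-benzene": 117,
    "methyl-benzene": 121,
    "amino-benzene": 119,
}

for molecule, n in nbf.items():
    if molecule == "benzene":
        continue
    estimate = predict_runtime(t_benzene, nbf["benzene"], n)
    print(f"{molecule:18s} ~{estimate:6.1f} s (estimated from benzene)")
```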
Conclusions

– Initial work to construct a performance database that can ease the burden of manipulating performance data for quantum chemists
  Data manipulation is mostly transparent to chemists; they only need to provide their knowledge of chemistry
– The infrastructure can be used with other applications
  We are working on incorporating NWChem

Acknowledgements

Funding:
– US DOE SciDAC initiative
Resources:
– Lawrence Berkeley National Lab (NERSC)

Questions/Comments?