HPC Performance Metrics for the SCEC CyberShake Application

The SCEC Community Modeling Environment (CME) performed a CyberShake production run in February 2014 using Blue Waters. This CyberShake 14.2 Study calculated four new Los Angeles hazard models. Our CME HPC group is funded through NSF and USGS research grants to perform this large-scale research calculation. Funding agencies ask us to define and report metrics that measure the productivity of our HPC research group. When reporting to research funding sources, we want to report metrics in terms that are meaningful to the funding organizations. For NSF and USGS funded research, our goal is to deliver new research results, so we select metrics that measure the ability of our computational research group to deliver computational research results.

Our selection of HPC productivity metrics was influenced by the 2004 HECRTF advisory report [1], which says the emphasis should be placed on time to solution, the major metric of value to high-end computing users. It states, “The real purpose of having supercomputers is to solve problems. Just as purchase cost does not equal total cost of ownership, execution time does not equal time to solution. Time to solution encompasses the time required to code the algorithms, load data and offload results, analyze output, and validate the software, as well as the execution time itself.” As defined by the HECRTF [1], time to solution measures the wall-clock duration required to set up, run, and analyze a research calculation. For this reason, computing resource providers are interested in reporting time to solution as a competitive advantage. Time to solution improvements may result from an improved computing environment as well as from the improved skill and efficiency of the research group performing the calculation.

We define the CyberShake hazard model calculation as our “problem” to be solved for several reasons. CyberShake is a large-scale, long-running HPC calculation with a well-defined starting point and a well-defined completion point. CyberShake also provides a useful reference calculation for measuring our improvements over time, because researchers want to re-run this calculation with alternative input parameters, and alternative inputs have little impact on computational requirements.

In an earlier HPC metrics publication [2], we discuss the concept of application-level metrics. Application-level metrics are measurements of the complete end-to-end research calculation, considering all processing stages, even if these processing stages are conducted with alternative codes and technologies and on different computing systems. The application is the problem to be solved.

We use our CyberShake application as the research problem to be solved. In our current metrics, we define the CyberShake application to be solved as a SCEC CyberShake Los Angeles area, 1144-site, 0.5 Hz hazard model calculation. This calculation starts with the first SGT mesh job, and it completes when the final site-specific hazard curve values are committed to the CyberShake database. When comparing proposed Blue Waters hazard calculations against earlier, smaller calculations, we scaled up the earlier measurements to the current 1144-site hazard model for comparison purposes. These are high-level performance measures; we collect and use additional, more detailed performance measures, not discussed here, for diagnostics and other purposes.
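As context for the metric definitions that follow, the sketch below is a hypothetical Python record layout, not SCEC’s actual bookkeeping; all field names are assumptions. It illustrates the kind of per-stage information an application-level metric aggregates: every processing stage counts toward the application, whichever code or machine it ran on.

```python
# A minimal sketch (hypothetical, not SCEC's tooling) of a per-stage record
# that an application-level metric aggregates over.

from dataclasses import dataclass
from datetime import datetime

@dataclass
class StageRecord:
    name: str             # e.g. "SGT mesh generation" or "hazard curve insert"
    system: str           # machine the stage ran on, e.g. "Blue Waters" or "Stampede"
    node_hours: float     # node-hours charged for the stage
    cores_per_node: int   # used to convert node-hours to core hours
    start: datetime       # wall-clock start (queue waits count toward makespan)
    end: datetime         # wall-clock completion

# The CyberShake "application" spans all stages from the first SGT mesh job
# through the final hazard-curve commit to the CyberShake database.
application = []          # populated with StageRecord entries as stages complete
```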
In Table 1, we present three selected performance metrics for previous CyberShake hazard calculations. The 2014 column in this table includes our estimated performance metrics for our planned Blue Waters run. The following descriptions define each of these three metrics; a short computational sketch of all three follows the list.

1. Application Core Hours (Hours): The application core hours metric provides an indication of the scale of the HPC calculation needed to solve the problem of interest, in our case a CyberShake seismic hazard model calculation. We define application core hours as the sum of all the core hours used to complete the application; for example, 1,000 cores running for 10 hours use 10,000 core hours. A reduction in the number of application core hours required to perform a hazard model calculation will be considered a performance improvement. Improvements may come from improved codes, improved computational techniques, more efficient computers, and other sources. For Blue Waters, we multiply XE node-hours by 32 and XK node-hours by 16 to obtain core hours. We count core hours as measured on the machine at the time we ran the calculation; no conversion from 2008 core hours to 2013 core hours, to account for differences in core performance, is made. Our metrics include only core hours for simplicity. Data I/O is another type of computational metric that might be added in the future.

2. Application Makespan (Hours): The application makespan metric describes how long it takes to run the calculation and solve the problem. We define our application makespan as the total execution duration of our application [3]. We measure our application makespan as the wall-clock time (in hours) from the start of the calculation to its completion. This measure includes time spent waiting in queues and any processing stops and delays. A reduction in the application makespan will be considered a performance improvement. Improvements may come from many sources, including improved codes, simplified techniques, use of larger or faster computers, automation of the calculation, and other sources.

3. Application Time to Solution (Hours): The application time to solution metric includes the time to set up the problem, the time to run the calculation, and the time to analyze the results. In research terms, we might describe this metric in terms of testing a hypothesis: starting from a hypothesis, how long does it take to set up a computation to test it, run the calculation, and analyze the results far enough to confirm or refute the hypothesis? The HECRTF definition of time to solution [1] includes defining a computational problem, coding a solution, setting up and checking the calculation on a computing resource, running the full-scale calculation, and analyzing the results. A reduction in time to solution will be considered a performance improvement. Because our application has working codes, we define our application time to solution as the application setup time plus the application makespan plus the application analysis time. Setup time and analysis time are vaguely defined and harder to measure in a repeatable and reliable way. For our CyberShake calculation, we roughly define the start of the setup time as the time we gain access to a new supercomputer. We define CyberShake analysis time to include standard processing such as hazard maps, disaggregation, and averaging-based factorization. Setup time measurements are particularly vulnerable to uncertainties introduced by the multitasking of a group.
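The sketch below is a minimal Python illustration of how these three metrics could be computed from per-job accounting data; it is not part of the CyberShake toolchain, and every number in it (job node-hours, timestamps, setup and analysis estimates) is a placeholder rather than a measurement. It uses the Blue Waters conversions of 32 cores per XE node and 16 cores per XK node described above.

```python
# A minimal sketch, assuming hypothetical job records, of the three
# application-level metrics defined above.  All values are illustrative.

from datetime import datetime

# (node_hours, cores_per_node) for each job in the application.
jobs = [
    (1000.0, 32),   # an XE job: 1,000 node-hours -> 32,000 core hours
    (500.0, 16),    # an XK job: 500 node-hours -> 8,000 core hours
]

# 1. Application core hours: sum of core hours over every job in the run.
application_core_hours = sum(nh * cores for nh, cores in jobs)

# 2. Application makespan: wall-clock hours from the start of the first job
#    (the first SGT mesh job) to the completion of the last one (the final
#    hazard-curve database commit), including queue waits and delays.
start = datetime(2014, 2, 1, 0, 0)    # illustrative timestamp, not a study date
end = datetime(2014, 2, 20, 12, 0)    # illustrative timestamp, not a study date
application_makespan_hours = (end - start).total_seconds() / 3600.0

# 3. Application time to solution: setup time + makespan + analysis time,
#    with setup and analysis supplied as operator estimates (in hours).
setup_hours = 1000.0      # placeholder operator estimate
analysis_hours = 1500.0   # placeholder operator estimate
time_to_solution_hours = setup_hours + application_makespan_hours + analysis_hours

print(application_core_hours, application_makespan_hours, time_to_solution_hours)
```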
Until we have good measurements, we will use operator estimates for setup and analysis times. Furthermore, we will assume the same setup and analysis time for all machines. If these estimates introduce uncertainties that are too large for a particular use, the makespan metric, which focuses on the execution time of the calculation, may be preferable. Although we do not have historical measurements of setup and analysis time, we include them in our time to solution calculations as a reminder that setup and analysis times are important contributions to an important metric.

Conclusions: We have identified three high-level computational performance metrics and use them to track our ability to complete a CyberShake hazard calculation over the last six years. Our metrics are application core hours, as an indicator of computational efficiency; application makespan, as an indicator of execution efficiency; and application time to solution, as an indicator of computational research efficiency. Several HPC advisory reports have identified time to solution [1][4][5] as the main figure of merit for computational scientists [6]. In these reports, time to solution includes getting a new application up and running (the programming time), waiting for it to run (the execution time), and, finally, interpreting the results (the interpretation time). We use our CyberShake hazard model calculation as the computational problem to be solved, and we report the time to solution for three previous solutions of this problem. Our current measurements show continued improvement each time we calculate a new solution. Our improvements have come from many sources, including access to faster supercomputers, improved computational methods, and increased computational efficiency gained through workflow automation.

Table 1: Measurements from four previous CyberShake hazard calculations, with the values of the two earliest calculations scaled up to the current 1144-site scale of the most recent and planned calculations. The 2014 results are based on measurements taken after we completed our CyberShake 14.2 Study.

CyberShake Application Metrics (Hours) | PSHA                    | 2008 (Mercury, normalized) | 2009 (Ranger, normalized) | 2013 (Blue Waters/Stampede) | 2014 (Blue Waters)
Application Core Hours                 | 0.69                    | 19,448,000 (CPU)           | 16,130,400 (CPU)          | 12,200,000 (CPU)            | 10,032,704 (CPU+GPU)
Application Makespan                   | 98 seconds              | 70,165                     | 6,191                     | 1,467                       | 342
Application Time to Solution           | 7 days + 7 days + 1 min | 72,493                     | 8,519                     | 3,795                       | 2,670
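Under this definition, the time to solution values in Table 1 differ from the corresponding makespans by the same amount for every calculation, reflecting the assumption of a common setup and analysis estimate across machines. A minimal check of that arithmetic, sketched in Python from the Table 1 values:

```python
# Verify that each Table 1 time to solution equals its makespan plus the same
# setup-plus-analysis estimate (values in hours, taken from Table 1).
makespan = {"2008": 70165, "2009": 6191, "2013": 1467, "2014": 342}
time_to_solution = {"2008": 72493, "2009": 8519, "2013": 3795, "2014": 2670}

for year, hours in makespan.items():
    setup_plus_analysis = time_to_solution[year] - hours
    print(year, setup_plus_analysis)   # prints 2328 hours for every year
```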
References:
[1] Federal Plan for High-End Computing: Report of the High-End Computing Revitalization Task Force (HECRTF), Executive Office of the President, Office of Science and Technology Policy, 2004.
[2] Metrics for heterogeneous scientific workflows: A case study of an earthquake science application, Scott Callaghan, Philip Maechling, Patrick Small, Kevin Milner, Gideon Juve, Thomas H. Jordan, Ewa Deelman, Gaurang Mehta, Karan Vahi, Dan Gunter, Keith Beattie, and Christopher Brooks, International Journal of High Performance Computing Applications, 25: 274, 2011. DOI: 10.1177/1094342011414743.
[3] Scientific Workflow Makespan Reduction Through Cloud Augmented Desktop Grids, Christopher J. Reynolds, Stephen Winter, Gabor Z. Terstyanszky, and Tamas Kiss, Third IEEE International Conference on Cloud Computing Technology and Science, 2011.
[4] The Opportunities and Challenges of Exascale Computing: Summary Report of the Advanced Scientific Computing Advisory Committee (ASCAC) Subcommittee on Exascale Computing, Department of Energy, Office of Science, Fall 2010.
[5] Getting Up to Speed: The Future of Supercomputing, National Research Council, The National Academies Press, Washington, DC, 2004.
[6] Computational Science: Ensuring America’s Competitiveness, The President’s Information Technology Advisory Committee (PITAC), 2005.

[Figures: CyberShake Los Angeles Region Hazard Model Calculation Core Hours (Hours), Makespan (Hours), and Time to Solution (Hours), each plotted against SCEC Project Year (2007-2015); each metric appears in two charts with different axis ranges.]