Data Processing and Analysis Center with Burst-Capability on Flagship/Leadership
Computing Facility
Participants: Kenneth Read (ORNL), Galen Shipman (ORNL), Adam Simpson (OLCF),
Alexei Klimentov (BNL), Sergey Panitkin (BNL), Eli Dart (ESnet), Shane Canon
(NERSC)
Facilities: OLCF (ASCR), BNL (ASCR), NERSC (ASCR), ESnet (ASCR)
The Worldwide LHC Computing Grid (which includes the Open Science Grid) facilitates the
processing of high priority science in fields such as high energy physics and high energy
nuclear physics by connecting computing centers of varying size across the globe. This science
includes understanding various decay modes of the recently discovered Higgs candidate and
understanding properties of the Quark-Gluon Plasma. Virtually all of this workflow is event-based, with no capability-class scale requirements other than the necessary time-to-completion.
Heretofore, such event-based projects in High Energy and Nuclear Physics have exploited the largest Flagship/Leadership Computing Facilities only in an experimental, cycle-scavenging mode, as opposed to through major awards of dedicated time. Meanwhile, the dramatically increasing data rates
and escalating computational needs of these fields are projected to exceed the available
resources unless Flagship/Leadership supercomputing resources are incorporated as part of
the solution moving forward. The ASCR-funded BigPanDA workflow management software
coordinates much of the job and data availability requirements in this field. The ALICE
Online/Offline Computing Project includes the future need for advanced HPC resources. The
new ORNL CADES (Compute and Data Environment for Science) HPC provides the unique
flexibility of Tier1-level resources with in-house, high-bandwidth access to the Titan Leadership
supercomputer and 30 PB Atlas file system.
We propose a real-time demonstration that marries the flexible data handling and processing
capabilities at ORNL CADES, providing resources comparable to a typical Tier1 LHC/FAIR
Computing Center, with coordinated opportunistic burst-capability running at-scale on Titan at
the OLCF. To demonstrate the promise of federated, real-time distributed computing operations with Flagship/Leadership burst-capability, we propose the following multi-laboratory, multi-project demonstration. Its components will include:
1. Remote federated job coordination from a distant central server located at BNL or CERN
(Geneva).
2. Local job coordination and data availability workflow managed on ORNL CADES by a
PanDA workflow management system, temporarily replicating the full flexibility, predefined and controlled remote connectivity, distributed data management, and promised
quality-of-service of a Tier1 center (with 97% uptime).
3. Real-time burst-mode job submission to OLCF Titan, driven by Titan backfill availability and relying on high-bandwidth, in-house data transfer and visibility (see the first sketch following this list).
4. High statistics, multi-threaded simulated data production using Geant4 and Monte Carlo
simulation using alpgen, PYTHIA, and sherpa generators.
5. Dynamic establishment of GridFTP connections between NERSC, OLCF, and BNL/CERN to transfer generated/processed data from Titan to NERSC (see the second sketch following this list).
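As an illustration of component 3, the following sketch shows one way a submitter could size jobs to idle backfill windows on Titan. It is a minimal sketch, assuming a Moab/PBS environment in which showbf reports available node counts and durations and qsub submits jobs; the partition name, output parsing, and job script (run_geant4.pbs) are hypothetical placeholders rather than the demo's actual mechanism.

#!/usr/bin/env python
# Hedged sketch of backfill-aware burst submission (component 3).
# Assumes a Moab/PBS system where `showbf` lists idle node windows and
# `qsub` submits jobs; the partition name, parsing, and job script are
# illustrative placeholders.
import re
import subprocess

def largest_backfill_window(partition="titan"):
    """Return (nodes, minutes) for the largest reported backfill window."""
    out = subprocess.check_output(["showbf", "-p", partition])
    best = None
    for line in out.decode().splitlines():
        # Illustrative parse; real showbf output formats vary by site.
        m = re.search(r"(\d+)\s+procs.*?(\d+):(\d{2}):\d{2}", line)
        if m:
            nodes = int(m.group(1))
            minutes = int(m.group(2)) * 60 + int(m.group(3))
            if best is None or nodes > best[0]:
                best = (nodes, minutes)
    return best

def submit_burst(nodes, minutes, script="run_geant4.pbs"):
    """Submit a PBS job sized to fit inside the backfill window."""
    walltime = "%02d:%02d:00" % (minutes // 60, minutes % 60)
    subprocess.check_call(["qsub", "-l", "nodes=%d" % nodes,
                           "-l", "walltime=%s" % walltime, script])

if __name__ == "__main__":
    window = largest_backfill_window()
    if window:
        submit_burst(*window)

Sizing submissions to reported backfill windows lets burst jobs consume otherwise-idle cycles without delaying Titan's scheduled capability jobs.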
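For component 5, the sketch below drives a third-party GridFTP transfer with the standard globus-url-copy client, wrapped in Python for consistency with the sketch above. It assumes globus-url-copy is installed and a valid grid credential is in place; the endpoint host names and paths are placeholders, not the demo's actual data transfer nodes.

# Hedged sketch of the GridFTP transfer in component 5.
# Assumes `globus-url-copy` is installed and a valid grid proxy exists;
# endpoint host names and paths are placeholders.
import subprocess

SRC = "gsiftp://dtn.olcf.example.gov/lustre/atlas/proj-shared/demo/output/"
DST = "gsiftp://dtn.nersc.example.gov/project/hep/demo/output/"

subprocess.check_call([
    "globus-url-copy",
    "-p", "8",   # eight parallel TCP streams per transfer
    "-fast",     # reuse data channels between files
    "-r",        # recurse into the source directory
    "-cd",       # create missing destination directories
    SRC, DST,
])

Parallel streams and data-channel reuse are the standard GridFTP levers for filling high-bandwidth wide-area paths such as those ESnet provides between the participating facilities.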