PL-Grid: Polish Infrastructure for Supporting Computational Science in the European Research Space
Distributed Computing Infrastructure as a Tool for e-Science

Jacek Kitowski, Kazimierz Wiatr, Łukasz Dutka, Maciej Twardy, Tomasz Szepieniec, Mariusz Sterzel, Renata Słota and Robert Pająk
ACK Cyfronet AGH, PL-Grid Consortium
PPAM 2015, 7-9.09.2015, Kraków

Outline
- National e-Infrastructure: assumptions and foundations; a tool for e-Science
- e-Infrastructure creation: motivation, background and issues; conceptualization and implementation; the PL-Grid case study
- Enhancement of achievements: platforms and environments, selected examples
- Conclusions

e-Infrastructure Creation: Motivation and Background
- Increasing importance of Computational Science and Big Data analysis
- Needs: shielding users from technical problems, expert support for doing science, growth of resources, openness to future paradigms
- Numerically intensive computing and experiments in silico, following the four paradigms of science: 1st theory, 2nd experiment, 3rd simulation, and the 4th paradigm of data-intensive scientific discovery (data-intensive computing)
- Computing and data for Open Science; international collaboration
- User- and platform-driven e-infrastructure innovation (interaction between e-Science and e-Infrastructure)
- Computational Science problems: algorithms, environments and deployment
- Future and emerging technologies: the 4th paradigm, distributed, grid and cloud computing, Data Farming
- Activity initiated by Cyfronet

e-Infrastructure Creation: Issues
A synergistic effort in several dimensions:
- Meeting user demands in the field of grand-challenge applications, an activity supported by users with scientific achievements and well-defined requirements
- Organizational: a horizontal perspective (a federation of computer centres supporting the e-infrastructure with different kinds of resources and competences) and a vertical perspective (involvement of computer, computational and domain-specific experts in e-infrastructure operations)
- Technological: different computing hardware and software, various middleware environments
- Energy awareness: optimal strategies for scheduling computing jobs among the federation's resources so as to minimize energy consumption as a whole

PL-Grid Consortium
- Consortium created in 2007, as a response to the requirements of Polish scientists and to ongoing e-Science activities in Europe and worldwide
- Aim: a significant extension of the computing resources and solutions provided to the scientific community (the PL-Grid Programme)
- Development based on (following a SWOT analysis):
  - projects funded by the European Regional Development Fund as part of the Innovative Economy Programme
  - close international collaboration (EGI, …)
  - previous projects (FP5, FP6, FP7, EDA…)
  - the available national network infrastructure: the PIONIER national project
  - computing resources: the TOP500 list
  - Polish scientific communities: ~75% of highly rated Polish publications come from 5 communities
- PL-Grid Consortium members: 5 Polish High Performance Computing centres representing these communities, coordinated by ACC Cyfronet AGH

ACK Cyfronet AGH
- 42 years of expertise; a centre of competence in high performance computing and high performance networking
- Infrastructure, computing, network and human resources; social networking
- Cyfronet systems on the TOP500 list (July 2015):

Rank | Site | System | Cores | Rmax (TFlop/s) | Rpeak (TFlop/s)
49 | Cyfronet, Poland | Prometheus (HP Apollo 8000) | 41,472 | 1,262.4 | 1,658.9
269 | Cyfronet, Poland | Zeus (Hewlett-Packard Cluster Platform) | 25,468 | 266.9 | 373.9

The Most Powerful HPC Asset in Poland
Prometheus cluster (2014/2015), in operation since April 2015:
- HP Apollo 8000, Rpeak = 1,658.9 TFlops
- 1,728 servers, 41,472 Haswell cores, 216 TB RAM (DDR4)
- 10 PB of disks at 180 GB/s
Q4 2015 extensions:
- 504 servers with 12,096 Haswell cores, Rpeak = 483.8 TFlops
- 144 NVIDIA K40 XL GPUs, Rpeak (NVIDIA) = 256.3 TFlops
In summary: 2.4 PFlops (with GPUs); 49th position on the July 2015 edition of the TOP500 list.

TOP500, July 2015: Polish Sites

Rank | Site | System | Cores | Rmax (TFlop/s) | Rpeak (TFlop/s)
49 | Cyfronet, Kraków | Prometheus: HP Apollo 8000, Xeon E5-2680v3 12C 2.5 GHz, InfiniBand FDR (Hewlett-Packard) | 41,472 | 1,262.4 | 1,658.9
126 | TASK, Gdańsk | Tryton: HP ProLiant XL230a Gen9, Xeon E5-2670v3 12C 2.3 GHz, InfiniBand (Megatel/Action) | 17,280 | 530.5 | 635.9
135 | WCSS, Wrocław | BEM: Actina Solar 820 S6, Xeon E5-2670v3 12C 2.3 GHz, InfiniBand FDR (ACTION) | 17,280 | 480.1 | 635.9
155 | NCNR, Świerk | Świerk Computing Centre: Supermicro TwinBlade SBI-7227R / Bull DLC B720, Xeon E5-2680v2 / E5-2650v3 10C 2.8 GHz, InfiniBand QDR/FDR (Format, Bull, Atos Group) | 17,960 | 423.2 | 490.4
269 | Cyfronet, Kraków | Zeus: Cluster Platform SL390/BL2x220, Xeon X5650 6C 2.66 GHz, InfiniBand QDR, NVIDIA 2090 (Hewlett-Packard) | 25,468 | 266.9 | 373.9
380 | NGSC & ICM | ORION: Dell PowerEdge R730, Xeon E5-2680v3 12C 2.5 GHz, InfiniBand FDR, AMD FirePro S9150 (Dell) | 16,800 | 198.8 | 903.0
418 | University of Warsaw | BlueGene/Q: Power BQC 16C 1.6 GHz, custom interconnect (IBM) | 16,384 | 189.0 | 209.7

Family of PL-Grid Projects (coordinated by Cyfronet)
- PL-Grid (2009-2012), 230 Tflops; outcome: the common base infrastructure
- PLGrid PLUS (2011-2015), +500 Tflops; outcome: focus on users (training, helpdesk…) and 13 domain-specific solutions
- PLGrid NG (2014-2015), +8 Tflops; outcome: optimization of resource usage, training, and extension of the domain-specific solutions by 14 more
- PLGrid CORE (2014-2015), +1,500 Tflops; outcome: a Competence Center for the Open Science paradigm (large workflow applications, mass data-farming computations, …) and end-user services
[Chart: assumed performance growth vs. real users across the successive projects]

Summary of Projects Results (up-to-date)
- Close collaboration between the Partners and research communities
- Development of tools, environments and middleware services: cloud integration, HPC, data-intensive computing, instruments
- Development of 27 domain-specific solutions
- Development of the PL-Grid IT infrastructure and its ecosystem

Summary of Projects Results (up-to-date)
- 26 papers on PL-Grid Project results
- Facilitation of community participation in international collaboration: the EGI Council and EGI Executive Board; FP7 projects (VPH-Share, VirtROLL…); EGI-InSPIRE, FedSM, …; EGI-Engage, INDIGO-DataCloud, EPOS, CTA, PRACE, H2020…
Publications
- 36 papers on PLGrid Plus Project results; 147 authors, 76 reviewers
[Chart: number of publications over time]

Journal Publications (subjective selection)

Journal | IF
J.Chem.Theor.Phys.Appl. | 5.31
J.Chem.Phys. | 3.122
Macromolecules | 5.927
Phys.Lett. B | 6.019
J.Phys.Chem.Lett. | 6.687
Astrophys.J.Lett. | 5.602
J.High Energy Phys. | 6.22
Phys.Chem.Chem.Phys. | 4.638
Phys.Rev.Letters | 7.728
Astronomy & Astrophys. | 4.479
Fuel Processing Techn. | 3.019
J.Chem.Theor.Appl. | 5.31
Inorganic Chem. | 4.794
J.Magn. & Magn. Mat. | 2.002
Astrophys.J. | 6.28
J.Org.Chem. | 4.638
Eur.J.Inorg.Chem. | 2.965
Chem.Physics | 2.028
Optics Lett. | 3.179
Chem.Phys.Lett. | 1.991
Molec.Pharmaceutics | 4.787
Appl.Phys.Lett. | 3.515
Phys.Rev.B | 3.664
Eur.J.Pharmacology | 2.684
J.Comput.Chem. | 3.601
Eur.Phys.J. | 2.421
Energy | 4.159
J.Phys.Chem. B | 3.377
Future Gen.Comp.Syst. | 2.639
Carbon | 6.16
Soft Matter | 4.151
J.Phys.Chem. C | 4.835
J.Biogeography | 4.969
Int.J.Hydrogen Energy | 2.93
Crystal Growth & Design | 4.558
Electrochem.Comm. | 4.287
Physica B | 1.133
J.Magn.&Magn.Mat. | 1.892

Conferences:
- Cracow Grid Workshop (since 2001)
- KU KDM (since 2008)

Summary of Projects Results (up-to-date)
[Charts: number of users' active grants, and number of users (all registered users, infrastructure users, employees), January 2013 - September 2015]

Summary of Projects Results (up-to-date)
Examples of active grants:
- PROTMD (18.09.2015-18.09.2016), Cyfronet: research on proteins using MD; 25 million hours (2,800 cores)
- PCJ2015GA (26.08.2015-31.12.2015), ICM: research on the connectome of nematodes using GA; 15 million hours (6,000 cores)
- PSB (1.03.2015-1.03.2016), TASK, Cyfronet, ICMM, WCSS: new characteristics of DNA in the context of tumour therapy; 11 million hours (1,200 cores)

Deployed PLGrid IT Platforms and Tools: Selected Examples (by Cyfronet)

GridSpace: a platform for e-Science applications
- Experiment: an e-science application composed of code fragments (snippets) expressed in general-purpose scripting languages, domain-specific languages or purpose-specific notations; each snippet is evaluated by a corresponding interpreter.
- GridSpace2 Experiment Workbench: a web application that serves as the entry point to GridSpace2; it facilitates exploratory development, execution and management of e-science experiments.
- Embedded Experiment: a published experiment embedded in a web site.
- GridSpace2 Core: a Java library providing an API for the development, storage, management and execution of experiments; it records all available interpreters and their installations on the underlying computational resources.
- Computational Resources: the servers, clusters, grids, clouds and e-infrastructures where the experiments are computed.
Contact: E. Ciepiela, D. Harężlak, M. Bubak
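The snippet/interpreter model above maps naturally onto a small runner: each snippet carries a language tag and is handed to the matching interpreter registered on the resource. The following is a minimal sketch of that idea only; all names are hypothetical and this is not the GridSpace2 Core API.

```python
# Minimal sketch of the snippet/interpreter idea behind a GridSpace-style
# experiment. All names are hypothetical; this is NOT the GridSpace2 API.
import os
import subprocess
import tempfile

# Interpreters and their installations, as registered on a resource.
INTERPRETERS = {"python": "python3", "bash": "bash"}

class Snippet:
    """One code fragment of an experiment, bound to an interpreter."""
    def __init__(self, interpreter, code):
        self.interpreter = interpreter
        self.code = code

def run_experiment(snippets):
    """Evaluate the snippets in order, each with its own interpreter."""
    for snippet in snippets:
        exe = INTERPRETERS[snippet.interpreter]
        with tempfile.NamedTemporaryFile("w", suffix=".snippet",
                                         delete=False) as f:
            f.write(snippet.code)
            path = f.name
        try:
            # Each snippet runs as a separate process of its interpreter.
            subprocess.run([exe, path], check=True)
        finally:
            os.remove(path)

# An "experiment" mixing two languages, as on the Experiment Workbench.
run_experiment([
    Snippet("bash", "echo 'preprocessing input...'"),
    Snippet("python", "print('running simulation step')"),
])
```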
InSilicoLab: a science gateway framework
Goals:
- Complex computations done in a non-complex way: separating users from the concept of jobs and from the infrastructure, and modelling computation scenarios in an intuitive way
- Different granularity of computations, the interactive nature of applications, and dependencies between applications
Architecture of the InSilicoLab framework:
- a Domain Layer
- a Mediation Layer with its Core Services
- a Resource Layer with different kinds of workers
The framework natively supports multiple types of computational resources, including private ones (e.g. private clouds), different kinds of users with different kinds of resources, and various types of computations.
Summary: the framework proved to be an easy way to integrate new domain-specific scenarios, even when this is done by external teams.
Contact: J. Kocot, M. Sterzel, T. Szepieniec

DataNet: collaborative metadata management
Objectives:
- Provide means for ad-hoc metadata model creation and for deployment of the corresponding storage facilities
- Create a research space for metadata model exchange and discovery, with associated data repositories and with access restrictions in place
- Support different types of storage sites and data transfer protocols
- Support the exploratory paradigm by letting the models evolve together with the data
Architecture:
- The web interface is used to create, extend and discover metadata models
- Model repositories are deployed in the PaaS cloud layer for scalable and reliable access from computing nodes through REST interfaces
- Data items from storage sites are linked from the model repositories
Contact: E. Ciepiela, D. Harężlak, M. Bubak

Onedata: transparent access to data
A system that provides unified and efficient access to data stored in organizationally distributed environments:
- provides a uniform and coherent view of all data stored on the storage systems distributed across the infrastructure (the Onedata Global Registry)
- supports working in groups by creating an easy-to-use shared workspace for each group
- serves data efficiently
Contact: Ł. Dutka

Scalarm: data farming experiments
Overview: a self-scalable platform for parametric studies.
Problems addressed by Scalarm:
- data farming experiments conducted with an exploratory approach
- adapting to the experiment size and simulation type
- parameter space generation with support for design-of-experiment methods
- access to heterogeneous computational infrastructures: Scalarm integrates with clusters, grids and clouds
- online analysis of partial experiment results
- self-scalability of both the management and execution parts
[Figure: Scalarm graphical user interface]
Contact: R. Słota

Rimrock: access to resources
A service that simplifies the management of processes and tasks executed in the PLGrid infrastructure. Its features (see the usage sketch below):
- simplicity: uncomplicated integration with other applications, scripts and services
- interactivity: a user can modify running processes based on intermediate results
- universality: it can be used from many programming languages
- versatility: it allows executing an application in batch mode or starting an interactive application
- user friendliness: it does not require advanced knowledge; basic familiarity with the Bash shell and the curl command is sufficient to start using it
[Figure: Rimrock architecture]
Contact: D. Harężlak
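Since the slide stresses that Bash and curl basics suffice, Rimrock is presumably exposed as a plain REST service. The sketch below shows what a job submission could look like from Python; the endpoint path, the payload fields and the proxy-certificate header are assumptions made for illustration, not the documented Rimrock API (consult the PLGrid documentation for the real interface).

```python
# Hypothetical sketch of submitting a batch job through a Rimrock-style
# REST service. The endpoint, fields and auth header are assumptions,
# not the documented API; see the PLGrid user manual for the real one.
import base64
import requests

RIMROCK_URL = "https://rimrock.plgrid.pl/api/jobs"  # assumed endpoint

# A grid proxy certificate is assumed to be the credential,
# sent base64-encoded in a header.
with open("proxy.pem", "rb") as f:
    proxy = base64.b64encode(f.read()).decode()

job = {
    "host": "prometheus.cyfronet.pl",      # target cluster login node
    "script": "#!/bin/bash\necho hello",   # batch script to run
}

resp = requests.post(RIMROCK_URL, json=job, headers={"PROXY": proxy})
resp.raise_for_status()
print(resp.json())  # e.g. a job identifier and status from the service
```

The same call would be a one-liner with curl, which matches the level of knowledge the slide promises to be sufficient.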
Cloud Computing
- The cloud increases the elasticity of research, as scientists can tune virtual machines to their specific needs; the catalogue of VMs offered by PL-Grid covers many operating systems.
- A cloud platform is also the best, and in many cases the only, solution for running jobs with legacy software packages.
- OpenNebula, with a migration to OpenStack, …
- IaaS, PaaS, STaaS, …
- Cloud platform for VPH-Share applications: the portal and the Atmosphere environment
Contact: J. Meizner, T. Szepieniec, M. Radecki

Applications Catalog service
Objective: to present in one place, in a uniform manner, the current offer of software available in the PLGrid infrastructure, broken down by supercomputing centres and clusters as well as by categories and areas of application.
The Applications Catalog is a system that collects and provides information on the applications, development tools and libraries offered in the PLGrid infrastructure. It allows users to search for applications, check their operational status and obtain information about changes and updates, and it provides documentation and usage examples. It is intended for anyone interested in using the applications available in the PLGrid infrastructure.

Map-Reduce service
Apache Spark 1.5.0 functionality:
- API: RDD, DataFrame, SQL
- backend execution: DataFrame and SQL
- integrations: data sources, Hive, Hadoop, Mesos and cluster management
- the R language
- machine learning and advanced analytics
- Spark Streaming

Summary and Conclusions
- Three dimensions of development: HPC/grid/clouds; the data and knowledge layer; the network and the Future Internet
- Deployments have a national scope, but with close European links
- Development is oriented towards end users and research projects
- Synergy between research projects and e-infrastructures is achieved through close cooperation and by offering relevant services
- Durability for at least 5 years after the projects finish, confirmed in the contracts
- Future plans: continuation of the development; a Center of Excellence
- CGW and KU KDM as places to exchange experience and to foster collaboration between e-Science centres in Europe

More Information
Please visit our web pages:
http://www.plgrid.pl/en
http://www.plgrid.pl

Credits
ACC Cyfronet AGH: Michał Turała, Marian Bubak, Krzysztof Zieliński, Karol Krawentek, Agnieszka Szymańska, Maciej Twardy, Angelika Zaleska-Walterbach, Andrzej Oziębło, Zofia Mosurska, Marcin Radecki, Renata Słota, Tomasz Gubała, Darin Nikolow, Aleksandra Pałuk, Patryk Lasoń, Marek Magryś, Łukasz Flis, … and many others
ICM: Marek Niezgódka, Piotr Bała, Maciej Filocha
PCSS: Maciej Stroiński, Norbert Meyer, Krzysztof Kurowski, Tomasz Piontek, Paweł Wolniewicz
WCSS: Jacek Oko, Józef Janyszek, Mateusz Tykierko, Paweł Dziekoński, Bartłomiej Balcerek
TASK: Rafał Tylman, Mścisław Nakonieczny, Jarosław Rybicki
Special thanks to many domain experts!