Leibniz Supercomputing Centre
Garching/Munich
March 16
Matthias Brehm, HPC Group

Leibniz Supercomputing Centre (LRZ)

Bavarian Academy of Sciences and Humanities
Computing Centre (~175 employees) for Munich Universities
Regional Computing Centre for all Bavarian Universities
  All kinds of IT services and support
  Capacity computing, (virtual) servers
  Backup and Archiving Centre (more than 7 petabytes, 5.5 billion files)
  Competence Centre (Networks, IT Management)
National Supercomputing Centre
  Integrated into Gauss Centre for Supercomputing = (JSC, HLRS + LRZ)
    • Legal entity for acting in Europe
  High End System (62 TF, 9726 cores)
  Linux Cluster (45 TF, 5000 cores)
  Grid Computing
  Active in DEISA and PRACE (1IP)
    • WP8 (WP9) Leadership: Future Technologies
  Current Procurement: Multi-PetaFlop System: End of 2011
    • Contract in 2010
    • General Purpose System (Intel or AMD based) of Thin and Fat Shared Memory Nodes
    • Doubling of Computer Cube, Cave & Visualization, new office space

LRZ, High Performance Computing Group, Matthias Brehm, March 16

HPC research activities

IT Management (Methods, Architectures, Tools)
  Service Management: Impact Analysis, Customer Service Mgmt, SLA Mgmt, Monitoring, Process Refinement
  Virtualization
  Operational Strategies for Petaflop Systems
Grids
  Middleware (IGE, Initiative for Globus in Europe, Project Leader): services, coordination, provisioning
  Grid Monitoring (D-MON, Resources of gLite, Globus, Unicore)
  Security and Intrusion Detection, Meta-Scheduling, SLA
Computational Science
  Munich Computational Sciences Centre (MCSC) & Munich Centre of Advanced Computing (MAC): TU Munich, Univ. Munich, Max-Planck-Society Garching
  New Programming Paradigms for Petaflop Systems
Energy efficiency
    • (Hot water) Cooling & Reuse (heating of buildings)
    • Scheduling, sleep mode of idle processors etc.
Automatic performance analysis and system-wide performance monitoring
Network Technologies & Network Monitoring
Long-Term Archiving

Talks/Activities with Russia
  LSU Moscow: Coop. Competence Network of HPC & Bavarian Graduate School of Comp. Engin.: joint courses, applications in physics, climatology, quantum chemistry, drug design
  Steklov Inst. / State Univ. St. Petersburg: Joint Advanced Student School (JASS): Modelling and Simulation
  T-Platforms: Cooling technology, energy efficiency

Specific research ideas for collaboration

Programming models and runtime support
  PGAS (partitioned global address space) – Coarray Fortran CAF (or UPC)
    • Re-implement an essential infrastructure library
      – e.g. ARPACK in CAF
    • sparse might be a good candidate for load balancing
    • Implement a microbenchmark set
      – measure quality of implementation (QoImpl.) vs. OpenMP / MPI
      – measure QoImpl. for message optimization (message aggregation etc.)
    • Investigate potential of interoperability between CAF and UPC, CAF and OpenMP, CAF and MPI
      – what is feasible? what isn't?
      – standards don't mention this anywhere (yet)
    • Develop Fortran class libraries for parallel patterns
      – presently the only "OO" and simultaneously parallel language
    • User Training
Scalable Visualisation Infrastructure
    • Highly scalable visualisation service for HPC
    • Remote visualization, virtualization
      – location-independent, instant, and cost-effective framework for the analysis of HPC simulation results
      – resource allocation, account management, data transfer and data compression, advance reservation and quality of service

Specific research ideas for collaboration

Energy Efficiency
    • Scheduling
    • Dynamic clock adjustment of CPU (and Memory)
    • Monitoring and Tuning of energy fluxes
    • Cooling technologies, energy reuse
Performance analysis tools for HPC
  Automatic performance monitoring and analysis
    • System-wide background monitoring
    • Hardware performance counters, communication behaviour, I/O
    • Automatic bottleneck detection
    • (System) Monitoring
      – By Using Map-Reduce-Techniques
Optimisation, scalability and porting of codes
  Scalable and dynamical mesh generation & load balancing
    • More than parMetis
    • Application areas: geophysics, cosmology, CFD, multi-physics
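
To illustrate the idea of system monitoring by map-reduce techniques listed under the performance analysis tools, here is a minimal single-process sketch in Python. It is not taken from any LRZ tool. The sample records, metric names (`flops_gf`, `io_mb_s`), and function names are hypothetical, and a real deployment would distribute the map and reduce phases across many nodes.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical monitoring samples: (node name, metric name, measured value).
SAMPLES = [
    ("node01", "flops_gf", 4.0),
    ("node01", "flops_gf", 6.0),
    ("node02", "flops_gf", 1.0),
    ("node02", "flops_gf", 3.0),
    ("node01", "io_mb_s", 120.0),
]


def map_sample(sample):
    """Map phase: emit a key (node, metric) and a partial (sum, count)."""
    node, metric, value = sample
    return (node, metric), (value, 1)


def combine(a, b):
    """Reduce phase: merge two partial (sum, count) pairs."""
    return (a[0] + b[0], a[1] + b[1])


def map_reduce_mean(samples):
    """Compute the mean value per (node, metric) key."""
    groups = defaultdict(list)
    # Shuffle step: group the mapped partial results by key.
    for key, partial in map(map_sample, samples):
        groups[key].append(partial)
    means = {}
    for key, partials in groups.items():
        total, count = reduce(combine, partials)
        means[key] = total / count
    return means


if __name__ == "__main__":
    for (node, metric), mean in sorted(map_reduce_mean(SAMPLES).items()):
        print(f"{node} {metric} mean={mean:.1f}")
```

Because `combine` is associative and commutative, partial sums could be merged in any order or on any node. This property is what makes this style of aggregation a candidate for system-wide background monitoring.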