Leibniz Supercomputing Centre Garching/Munich

March 16
Matthias Brehm
HPC Group
Leibniz Supercomputing Centre (LRZ)
Bavarian Academy of Sciences and Humanities

Computing Centre (~175 employees) for Munich Universities
 All kinds of IT services and support
 Capacity computing, (virtual) servers

Regional Computing Centre for all Bavarian Universities
 Capacity computing
 Backup and Archiving Centre (more than 7 petabytes, 5.5 billion files)
 Competence Centre (Networks, IT Management)
National Supercomputing Centre
 Integrated into the Gauss Centre for Supercomputing (GCS = JSC + HLRS + LRZ)
  • Legal entity for acting in Europe
 High-End System (62 TF, 9726 cores)
 Linux Cluster (45 TF, 5000 cores)
 Grid Computing
  • Active in DEISA and PRACE (1IP)
  • WP8 (WP9) leadership: Future Technologies
 Current procurement: multi-petaflop system by end of 2011
  • Contract in 2010
  • General-purpose system (Intel- or AMD-based) of thin and fat shared-memory nodes
  • Doubling of the computer cube, cave & visualization, new office space
HPC research activities

 IT Management (Methods, Architectures, Tools)
  Service Management: Impact Analysis, Customer Service Mgmt, SLA Mgmt, Monitoring, Process Refinement
  Virtualization
  Operational Strategies for Petaflop Systems
 Grids
  Middleware (IGE, Initiative for Globus in Europe; project leader): services, coordination, provisioning
  Grid Monitoring (D-MON; resources of gLite, Globus, UNICORE)
  Security and Intrusion Detection, Meta-Scheduling, SLAs
 Computational Science
  Munich Computational Sciences Centre (MCSC) & Munich Centre of Advanced Computing (MAC): TU Munich, Univ. Munich, Max Planck Society Garching
  New Programming Paradigms for Petaflop Systems
 Energy efficiency
  • (Hot-water) cooling & reuse (heating of buildings)
  • Scheduling, sleep mode of idle processors, etc.
 Automatic performance analysis and system-wide performance monitoring
 Network Technologies & Network Monitoring
 Long-Term Archiving
 Talks/Activities with Russia
  LSU Moscow: Coop. Competence Network of HPC & Bavarian Graduate School of Computational Engineering: joint courses, applications in physics, climatology, quantum chemistry, drug design
  Steklov Institute / State University St. Petersburg: Joint Advanced Student School (JASS): Modelling and Simulation
  T-Platforms: cooling technology, energy efficiency
Specific research ideas for collaboration
 Programming models and runtime support
  PGAS (partitioned global address space): Coarray Fortran (CAF) or UPC
  • Re-implement an essential infrastructure library, e.g. ARPACK, in CAF
  • Sparse problems might be a good candidate for load balancing
  • Implement a microbenchmark set (a sketch follows below)
   – measure quality of implementation vs. OpenMP / MPI
   – measure quality of implementation for message optimization (message aggregation etc.)
  • Investigate the potential for interoperability between CAF and UPC, CAF and OpenMP, and CAF and MPI
   – what is feasible? what isn't?
   – the standards don't mention this anywhere (yet)
  • Develop Fortran class libraries for parallel patterns
   – presently the only language that is both object-oriented ("OO") and parallel
  • User training
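
As a minimal sketch of what one such microbenchmark could look like (hypothetical code, not an existing LRZ benchmark; program and variable names are invented), the Coarray Fortran fragment below times element-wise remote puts against a single aggregated put between two images; the OpenMP/MPI counterparts would reuse the same buffers and timing harness:

    ! Hypothetical CAF microbenchmark: many one-element remote puts vs. one
    ! aggregated put. Run with at least 2 images; an OpenCoarrays-style
    ! toolchain (e.g. cafrun -n 2 ./bench) is an assumption, any Fortran 2008
    ! compiler with coarray support should work.
    program caf_aggregation_bench
      implicit none
      integer, parameter :: n = 4096
      real(8) :: buf(n)[*]              ! coarray: one copy on every image
      real(8) :: src(n)
      integer :: i, dest, t0, t1, rate
      real(8) :: t_small, t_agg

      src = real(this_image(), 8)
      dest = merge(2, 1, this_image() == 1)   ! images 1 and 2 exchange data

      sync all
      if (this_image() == 1) then
        call system_clock(t0, rate)
        do i = 1, n
          buf(i)[dest] = src(i)         ! n separate one-element remote puts
        end do
        call system_clock(t1)
        t_small = real(t1 - t0, 8) / rate

        call system_clock(t0)
        buf(:)[dest] = src              ! one aggregated remote put
        call system_clock(t1)
        t_agg = real(t1 - t0, 8) / rate

        print *, 'element-wise:', t_small, 's   aggregated:', t_agg, 's'
      end if
      sync all
    end program caf_aggregation_bench

The ratio of the two timings is one "quality of implementation" figure for message aggregation: a good compiler/runtime should bring the element-wise loop close to the aggregated transfer.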
 Scalable Visualisation Infrastructure
  • Highly scalable visualisation service for HPC
  • Remote visualization, virtualization
   – location-independent, instant, and cost-effective framework for the analysis of HPC simulation results
   – resource allocation, account management, data transfer and data compression, advance reservation and quality of service
Specific research ideas for collaboration
 Energy Efficiency
  • Scheduling
  • Dynamic clock adjustment of CPU (and memory)
  • Monitoring and tuning of energy fluxes (a sketch follows below)
  • Cooling technologies, energy reuse
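
One concrete starting point for monitoring energy fluxes at the node level is to sample the energy counters modern CPUs expose. The hypothetical Fortran probe below assumes a Linux kernel that provides the powercap/RAPL interface at /sys/class/powercap/intel-rapl:0/energy_uj (microjoule counts that eventually wrap); it brackets a stand-in kernel to estimate energy-to-solution:

    ! Hedged sketch: estimate energy-to-solution of a code region by
    ! sampling the Linux powercap/RAPL counter. Assumption: the sysfs file
    ! below exists and is readable on the target node.
    program energy_probe
      implicit none
      integer(8) :: e0, e1
      real(8), allocatable :: a(:)
      real(8) :: s
      integer :: i

      allocate (a(10000000))
      e0 = read_uj()
      s = 0
      do i = 1, size(a)               ! stand-in compute kernel
        a(i) = sqrt(real(i, 8))
        s = s + a(i)
      end do
      e1 = read_uj()
      print *, 'energy [J]:', real(e1 - e0, 8) * 1.0d-6, '  checksum:', s
    contains
      integer(8) function read_uj()
        integer :: u, ios
        read_uj = 0
        open (newunit=u, file='/sys/class/powercap/intel-rapl:0/energy_uj', &
              status='old', action='read', iostat=ios)
        if (ios /= 0) return          ! interface not available on this node
        read (u, *) read_uj
        close (u)
      end function read_uj
    end program energy_probe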
 Performance analysis tools for HPC
  Automatic performance monitoring and analysis
  • System-wide background monitoring
  • Hardware performance counters, communication behaviour, I/O
  • Automatic bottleneck detection (a sketch follows below)
  • (System) monitoring
   – using map-reduce techniques
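
The flavour of rule-based bottleneck detection can be illustrated without a counter library: time a kernel with known traffic and classify it against a machine model. In the hypothetical probe below, the 30 GB/s peak-bandwidth figure is a placeholder to be calibrated per system; a production tool would instead read hardware performance counters system-wide:

    ! Hedged sketch of rule-based bottleneck detection: time a triad kernel
    ! and classify it against a machine model. The peak figure is a
    ! placeholder; calibrate it for the target system.
    program bottleneck_probe
      implicit none
      integer, parameter :: n = 20000000
      real(8), parameter :: peak_bw = 30.0d9  ! placeholder bytes/s
      real(8), allocatable :: a(:), b(:), c(:)
      real(8) :: t, bw
      integer :: i, t0, t1, rate

      allocate (a(n), b(n), c(n))
      b = 1.0d0
      c = 2.0d0
      call system_clock(t0, rate)
      do i = 1, n
        a(i) = b(i) + 3.0d0 * c(i)            ! triad: 24 bytes/iteration
      end do
      call system_clock(t1)
      t = real(t1 - t0, 8) / rate
      bw = 24.0d0 * n / t
      if (bw > 0.8d0 * peak_bw) then
        print *, 'likely memory-bound:', bw / 1.0d9, 'GB/s'
      else
        print *, 'not bandwidth-limited:', bw / 1.0d9, 'GB/s   a(1)=', a(1)
      end if
    end program bottleneck_probe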
 Optimisation, scalability and porting of codes
 Scalable and dynamic mesh generation & load balancing (a sketch follows below)
  • More than ParMETIS
  • Application areas: geophysics, cosmology, CFD, multi-physics
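
One building block that goes beyond graph partitioners like ParMETIS is repartitioning along a space-filling-curve (SFC) ordering: elements carry weights, and a prefix sum over the curve order yields contiguous, balanced chunks. The serial sketch below uses invented element weights and assumes the SFC ordering is already given; a parallel version would distribute the prefix sum, e.g. across coarray images:

    ! Hedged sketch: partition elements ordered along a space-filling curve
    ! into contiguous, weight-balanced chunks via a prefix sum. Weights are
    ! invented for illustration.
    program sfc_partition
      implicit none
      integer, parameter :: nelem = 12, nparts = 3
      real(8) :: w(nelem), cum, tgt
      integer :: owner(nelem), i, p

      w = [(1.0d0 + mod(i, 4), i = 1, nelem)]  ! hypothetical element weights
      tgt = sum(w) / nparts                    ! ideal weight per partition
      cum = 0
      do i = 1, nelem
        cum = cum + w(i)
        owner(i) = min(nparts, ceiling(cum / tgt))  ! clamp rounding overflow
      end do
      do p = 1, nparts
        print *, 'part', p, 'weight:', sum(w, mask=(owner == p))
      end do
    end program sfc_partition

Because ownership depends only on the running weight sum, repartitioning after mesh adaptation is cheap, which is what makes SFC schemes attractive for dynamic load balancing.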