IM&T Vacation Program
Virtualisation and Hyper-Threading in Scientific Computing
Benjamin Meyer

HPC Clusters
• A large set of connected computers
• Used for computation-intensive workloads rather than I/O-oriented operations
• Each node runs its own instance of the OS
[Image source: http://www.redbooks.ibm.com/redbooks/pdfs/sg247287.pdf]

CSIRO's Bragg Cluster
• 128 compute nodes with 16 CPU cores each (2,048 cores in total)
• 128 GB of RAM per node
• 384 Fermi-based Tesla M2050 GPUs (172,032 streaming cores)

What is Virtualisation?

Hypervisor
• Software which allows multiple, different operating systems to run on the same underlying hardware
• Ensures all privileged operations are appropriately handled to maintain system integrity
• Invisible to the operating system: the OS thinks it is running natively (though x86 exposes a "hypervisor present" CPUID bit; see the sketch in the appendix below)
• The VMware ESXi hypervisor was used for this project

Benefits: Heterogeneous Clusters

Benefits: Live Migration
• Running jobs can be moved to other hardware
• Allows dynamic scheduling
• Pre-emptive evasion of failures and downtime

Benefits
• Checkpointing
  – The state of the OS, application and memory is saved at intervals
  – Allows for easy failure recovery
  – Useful for software debugging
• Clean compute
  – Security
  – Run-time/failure isolation
  – Clean start

Performance Comparison: High Performance LINPACK
Floating-point operations per second; the appendix below sketches how HPL's GFLOPS figure is derived.
[Chart: HPL performance in GFLOPS (190–270) versus problem size (15,000–115,000), comparing the native maximum, average and minimum against the virtualised result.]

Performance Comparison: Random Access
Updates to random memory locations per second (GUPs); the benchmark's update kernel is sketched in the appendix below.
[Bar chart: GUPs for three modes of the Random Access benchmark: MPI (message passing), parallel processes, and a single process. With native performance as the 100% baseline in each mode, the virtualised runs reached 87.1%, 54.8% and 4.43% of native.]

Performance Comparison: MPI Latency
MPI (message passing) latency in µs; a minimal ping-pong measurement is sketched in the appendix below.
[Chart: MPI latency in µs (0–90) versus message size (0–60,000 bytes), native versus virtualised.]

Hyper-Threading

Hyper-Threading Example
[Diagram: four execution resources plotted over time. Without Hyper-Threading, threads 1 and 2 each occupy their own physical core, leaving resources idle. With Hyper-Threading, the OS sees two logical cores per physical core, and the two threads interleave to fill otherwise idle resources.]
The appendix below sketches how the OS-visible logical core count can be queried.

Performance Comparison: High Performance LINPACK
Floating-point operations per second.
[Chart: HPL performance in GFLOPS (120–280) versus problem size (0–120,000), comparing non-Hyper-Threaded, Hyper-Threaded with 16 processes, and Hyper-Threaded with 32 processes.]

Performance Comparison: Random Accesses to Memory
Updates to random memory locations per second (GUPs).
[Bar chart: GUPs for MPI (message passing) Random Access and parallel-process Random Access. With non-Hyper-Threaded performance as the 100% baseline in each mode, the Hyper-Threaded runs reached 119.2% and 104.7%.]

References
• T. Ho. (2012, Nov.). CSIRO Advanced Scientific Computing User Manual [Online]. Available: https://wiki.csiro.au/display/ASC/ASC+Homepage
• (2013, Jan.). Top500 HPC Statistics [Online]. Available: http://www.top500.org/statistics/overtime/
• (2012, Oct.). IBM Blue Gene #1 in Supercomputing [Online]. Available: http://www-03.ibm.com/systems/technicalcomputing/solutions/bluegene/index.html
• (2013). Virtualize for Efficiency, Higher Availability and Lower Costs [Online]. Available: http://www.vmware.com/virtualization/virtualization-basics/virtualization-benefits.html
• (2012). Tuning a Linux HPC Cluster: HPC Challenge [Online]. Available: http://www.ibm.com/support/publications/us/library/
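Appendix: Code Sketches

The hypervisor is largely invisible to the guest OS, but x86 does give one architectural hint: CPUID leaf 1 sets the "hypervisor present" bit (ECX bit 31), and hypervisors such as VMware ESXi publish a vendor signature at leaf 0x40000000. A minimal detection sketch in C, assuming GCC or Clang on x86 (the <cpuid.h> intrinsics are compiler-specific):

    #include <stdio.h>
    #include <string.h>
    #include <cpuid.h>  /* GCC/Clang x86 intrinsics */

    int main(void) {
        unsigned int eax, ebx, ecx, edx;

        /* CPUID leaf 1: ECX bit 31 is the "hypervisor present" flag */
        if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
            return 1;
        if (!(ecx & (1u << 31))) {
            puts("no hypervisor bit set: likely running natively");
            return 0;
        }

        /* Leaf 0x40000000 returns the hypervisor vendor signature in
           EBX:ECX:EDX, e.g. "VMwareVMware" under ESXi, "KVMKVMKVM" under KVM */
        char sig[13] = {0};
        __cpuid(0x40000000, eax, ebx, ecx, edx);
        memcpy(sig + 0, &ebx, 4);
        memcpy(sig + 4, &ecx, 4);
        memcpy(sig + 8, &edx, 4);
        printf("hypervisor signature: %s\n", sig);
        return 0;
    }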
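HPL's GFLOPS figure comes from the operation count of an N×N LU solve: the benchmark credits roughly (2/3)N³ + (3/2)N² floating-point operations (the N² term is negligible at these problem sizes) and divides by the wall-clock time. A sketch with a hypothetical run time; the 2,860 s figure is invented for illustration:

    #include <stdio.h>

    /* HPL credits ~(2/3)N^3 + (3/2)N^2 flops for an N x N solve */
    static double hpl_gflops(double n, double seconds) {
        return (2.0 / 3.0 * n * n * n + 1.5 * n * n) / seconds / 1.0e9;
    }

    int main(void) {
        /* hypothetical: N = 105000 solved in 2860 s -> ~270 GFLOPS,
           the scale seen at the top of the native curve above */
        printf("%.1f GFLOPS\n", hpl_gflops(105000.0, 2860.0));
        return 0;
    }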
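The Random Access score counts XOR updates to pseudo-random table locations per second (GUPs: giga-updates per second). A serial sketch of the HPCC-style kernel, simplified from the real benchmark (which runs several update streams and verifies the table afterwards):

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define POLY 0x0000000000000007ULL  /* feedback polynomial for the random stream */

    int main(void) {
        const uint64_t table_size = 1ULL << 24;    /* must be a power of two (128 MB here) */
        const uint64_t n_updates = 4 * table_size; /* HPCC convention: 4 updates per word */
        uint64_t *table = malloc(table_size * sizeof *table);
        if (!table) return 1;
        for (uint64_t i = 0; i < table_size; i++)
            table[i] = i;

        uint64_t ran = 1;
        clock_t t0 = clock();
        for (uint64_t i = 0; i < n_updates; i++) {
            /* advance the 64-bit shift-register generator */
            ran = (ran << 1) ^ ((int64_t)ran < 0 ? POLY : 0);
            /* XOR-update a pseudo-random table location */
            table[ran & (table_size - 1)] ^= ran;
        }
        double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
        printf("%.4f GUPs\n", (double)n_updates / secs / 1.0e9);
        free(table);
        return 0;
    }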
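MPI latency curves like the one above are typically measured with a ping-pong test: rank 0 sends a message to rank 1, rank 1 echoes it back, and half the average round-trip time is reported as the one-way latency. A minimal sketch (run with at least two ranks, e.g. mpirun -np 2):

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        const int iters = 1000;
        const int max_size = 65536;
        int rank, nranks;
        char *buf = malloc(max_size);

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);
        if (nranks < 2) {
            if (rank == 0) fprintf(stderr, "needs at least 2 ranks\n");
            MPI_Finalize();
            return 1;
        }

        for (int msg = 1; msg <= max_size; msg *= 2) {
            MPI_Barrier(MPI_COMM_WORLD);
            double t0 = MPI_Wtime();
            for (int i = 0; i < iters; i++) {
                if (rank == 0) {
                    MPI_Send(buf, msg, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                    MPI_Recv(buf, msg, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                } else if (rank == 1) {
                    MPI_Recv(buf, msg, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                    MPI_Send(buf, msg, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
                }
            }
            double t1 = MPI_Wtime();
            if (rank == 0)  /* one-way latency = half the round trip */
                printf("%6d bytes: %8.2f us\n", msg,
                       (t1 - t0) / iters / 2.0 * 1.0e6);
        }

        MPI_Finalize();
        free(buf);
        return 0;
    }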
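With Hyper-Threading enabled, the OS schedules onto logical cores, so the CPU count it reports doubles relative to the physical core count. A quick way to query the OS-visible count on Linux and most Unix systems:

    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        /* logical CPUs the scheduler can use; with Hyper-Threading this is
           typically twice the physical core count (32 vs 16 on a Bragg node) */
        long logical = sysconf(_SC_NPROCESSORS_ONLN);
        printf("logical CPUs online: %ld\n", logical);
        return 0;
    }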