Virtualisation and Hyper-Threading in Scientific Computing

IM&T Vacation Program
Benjamin Meyer
HPC Clusters
• A large set of connected computers
• Used for computation-intensive workloads, rather than I/O-oriented operations
• Each node runs its own instance of the operating system
http://www.redbooks.ibm.com/redbooks/pdfs/sg247287.pdf
CSIRO’s Bragg Cluster
• 128 compute nodes with 16 CPUs each
• 2048 cores in total
• 128GB of RAM per node
• 384 Fermi Tesla M2050 GPUs
• 172,032 streaming cores
What is Virtualisation?
Hypervisor
• Software that allows multiple, different operating systems to run on the underlying hardware
• Ensures all privileged operations are appropriately handled to maintain system integrity
• Invisible to the operating system
  • The OS thinks it is running natively
• VMware ESXi Hypervisor used for this project
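
Although the hypervisor is invisible to the guest OS, a Linux guest can often still infer its presence. As an illustrative sketch (not part of this project's setup), x86 hypervisors set a CPUID "hypervisor" flag, which Linux surfaces in /proc/cpuinfo:

```python
# Sketch only: infer whether a Linux guest is virtualised by looking for
# the "hypervisor" CPUID flag that Linux exposes in /proc/cpuinfo.
def running_under_hypervisor(cpuinfo_path="/proc/cpuinfo"):
    try:
        with open(cpuinfo_path) as f:
            for line in f:
                if line.startswith("flags") and "hypervisor" in line.split():
                    return True
    except OSError:
        pass  # non-Linux or unreadable: treat as native/unknown
    return False

if __name__ == "__main__":
    print("virtualised" if running_under_hypervisor() else "native (or unknown)")
```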
Benefits: Heterogeneous Clusters
Benefits: Live Migration
• Running jobs can be moved to other hardware
• Allows dynamic scheduling
• Pre-emptive avoidance of failure and downtime
Benefits
• Checkpointing
  • Status of the OS, application and memory saved at intervals
  • Allows for easy failure recovery
  • Software debugging
• Clean compute
  • Security
  • Run-time/failure isolation
  • Clean start
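
The checkpointing idea above can be sketched at application level in Python (a hypothetical example, not the project's mechanism; a hypervisor checkpoint additionally captures the whole OS and memory image):

```python
import os
import pickle
import tempfile

# Sketch: application-level checkpointing. State is saved at intervals so
# that after a failure the job restarts from the last checkpoint rather
# than from the beginning.
def save_checkpoint(path, state):
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:   # write to a temp file first so a crash
        pickle.dump(state, f)    # mid-write never corrupts the previous
    os.replace(tmp, path)        # good checkpoint

def load_checkpoint(path, default):
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return default

if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "job.ckpt")
    state = load_checkpoint(ckpt, {"step": 0, "total": 0})
    for step in range(state["step"], 10):
        state = {"step": step + 1, "total": state["total"] + step}
        if step % 3 == 0:        # checkpoint "at intervals"
            save_checkpoint(ckpt, state)
    save_checkpoint(ckpt, state)
    print(load_checkpoint(ckpt, None))  # {'step': 10, 'total': 45}
```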
Performance Comparison
Floating-point operations per second, measured with High Performance LINPACK.
[Figure: GFLOPS (190–270) vs problem size (15,000–115,000) for Native (max, average, min) and Virtualised runs.]
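
HPL solves a dense n×n linear system, and its GFLOPS figure is derived from the nominal operation count of LU factorisation plus the triangular solves, (2/3)n³ + 2n². A small sketch of that conversion (the numbers are illustrative, not the Bragg results):

```python
def hpl_gflops(n, seconds):
    # HPL's nominal flop count: LU factorisation + triangular solves.
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
    return flops / seconds / 1e9

# Illustrative only: a 50,000-unknown run finishing in 320 s.
if __name__ == "__main__":
    print(round(hpl_gflops(50_000, 320.0), 1))  # 260.4
```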
Performance Comparison
Updates to random memory locations per second (Random Access benchmark, GUPs).
[Figure: GUPs (0–0.14), Native = 100% baseline in each mode; Virtualised reached 87.1% with MPI (message passing), 54.8% with parallel processes, and 4.43% with a single process.]
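
The Random Access benchmark measures GUPs (giga-updates per second): read-modify-write operations on random table entries, which stresses memory latency rather than bandwidth. A simplified single-process sketch of the kernel (not the HPC Challenge code, which uses a specific pseudo-random stream and a table far larger than cache):

```python
import random
import time

def random_access_gups(table_size=1 << 16, n_updates=1 << 18):
    # Simplified RandomAccess kernel: XOR a pseudo-random value into a
    # pseudo-random table slot. table_size must be a power of two so the
    # bit-mask below selects a valid index.
    table = list(range(table_size))
    rng = random.Random(1)
    start = time.perf_counter()
    for _ in range(n_updates):
        r = rng.getrandbits(64)
        table[r & (table_size - 1)] ^= r
    elapsed = time.perf_counter() - start
    return n_updates / elapsed / 1e9   # updates per second, in billions

if __name__ == "__main__":
    print(f"{random_access_gups():.4f} GUPs")
```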
Performance Comparison
MPI (message passing) latency.
[Figure: latency in µs (0–90) vs message size (0–60,000 bytes) for Native and Virtualised runs.]
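
MPI latency is typically measured with a ping-pong: one rank sends a message of a given size, the other echoes it back, and half the average round-trip time is reported as the latency. A hedged sketch of the same idea using Python multiprocessing pipes in place of MPI (illustrative of the method, not the benchmark used here):

```python
import time
from multiprocessing import Pipe, Process

def echo(conn, n_iters):
    # "Rank 1": bounce every message straight back to the sender.
    for _ in range(n_iters):
        conn.send_bytes(conn.recv_bytes())

def pingpong_latency_us(msg_size, n_iters=200):
    # "Rank 0": time n_iters round trips; one-way latency is half the
    # mean round-trip time.
    parent, child = Pipe()
    p = Process(target=echo, args=(child, n_iters))
    p.start()
    payload = b"x" * msg_size
    start = time.perf_counter()
    for _ in range(n_iters):
        parent.send_bytes(payload)
        parent.recv_bytes()
    elapsed = time.perf_counter() - start
    p.join()
    return elapsed / n_iters / 2 * 1e6   # microseconds

if __name__ == "__main__":
    for size in (8, 1024, 65536):
        print(f"{size:6d} bytes: {pingpong_latency_us(size):8.1f} us")
```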
Hyper-Threading
Hyper-Threading Example
[Diagram: without Hyper-Threading, threads 1 and 2 run one after the other on the physical cores, each occupying execution resources 1–4 over time. With Hyper-Threading, each physical core is presented to the OS as two logical cores, and the two threads are interleaved across the same execution resources, filling otherwise idle slots.]
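
The diagram's point can be captured in a toy model (illustrative numbers, not measurements): without Hyper-Threading, two threads time-share the core serially and its resources sit idle during stalls; with Hyper-Threading, one thread's work fills the other's stall cycles, up to the point where the shared resources saturate.

```python
# Toy model: each thread needs `busy` cycles of execution resources and
# stalls for `stall` cycles after every busy cycle.
def non_hyperthreaded_time(busy, stall, n_threads=2):
    # Threads run one after another on the single physical core;
    # stall cycles leave the core's resources idle.
    return n_threads * busy * (1 + stall)

def hyperthreaded_time(busy, stall, n_threads=2):
    # Two logical cores interleave: one thread's busy cycles fill the
    # other's stalls, bounded below by total busy work (resource limit)
    # and by a single thread's stand-alone latency.
    total_busy = n_threads * busy
    per_thread = busy * (1 + stall)
    return max(total_busy, per_thread)

if __name__ == "__main__":
    print(non_hyperthreaded_time(100, 1))  # 400 cycles
    print(hyperthreaded_time(100, 1))      # 200 cycles
```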
Performance Comparison
Floating-point operations per second, measured with High Performance LINPACK.
[Figure: GFLOPS (120–280) vs problem size (0–120,000) for non-Hyper-Threaded, Hyper-Threaded with 16 processes, and Hyper-Threaded with 32 processes.]
Performance Comparison
Updates to random memory locations per second (Random Access benchmark, GUPs).
[Figure: GUPs (0–0.16), non-Hyper-Threaded = 100% baseline; Hyper-Threaded reached 119.2% for MPI Random Access (message passing) and 104.7% for parallel-process Random Access.]
References
• Tim Ho (2012, Nov.). CSIRO Advanced Scientific Computing User Manual [Online]. Available: https://wiki.csiro.au/display/ASC/ASC+Homepage
• (2013, Jan.). Top500 HPC Statistics [Online]. Available: http://www.top500.org/statistics/overtime/
• (2012, Oct.). IBM Blue Gene #1 in Supercomputing [Online]. Available: http://www-03.ibm.com/systems/technicalcomputing/solutions/bluegene/index.html
• (2013). Virtualize for Efficiency, Higher Availability and Lower Costs [Online]. Available: http://www.vmware.com/virtualization/virtualization-basics/virtualization-benefits.html
• (2012). Tuning a Linux HPC Cluster: HPC Challenge [Online]. Available: http://www.ibm.com/support/publications/us/library/