MoreCharmTauSlides

More Charm++/TAU examples
Applications:
 NAMD
 Parallel Framework for Unstructured Meshing (ParFUM)
Features:
• Profile snapshots:
  • Capture the runtime behavior of the application by dividing the run into user-specified intervals
• CUDA profiling:
  • Tracks time spent in CUDA kernel routines
  • Shows scaling behavior for an experiment that varies the number of devices used
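A minimal sketch of the arithmetic behind a snapshot profile, assuming each processor reports a cumulative exclusive-time reading for one routine at every snapshot. The function names and the input layout are hypothetical and are not part of the TAU or Charm++ APIs. Differencing consecutive readings gives the time spent in each interval, and the mean and standard deviation are then taken across processors.

```python
from statistics import mean, pstdev


def interval_times(cumulative):
    """Turn cumulative timer readings (one per snapshot) into per-interval times."""
    previous = [0.0] + list(cumulative[:-1])
    return [now - before for before, now in zip(previous, cumulative)]


def snapshot_stats(per_processor_cumulative):
    """Return (mean, population standard deviation) across processors per interval.

    per_processor_cumulative: one list of cumulative readings per processor,
    all of the same length (hypothetical layout, assumed for illustration).
    """
    per_processor_intervals = [interval_times(c) for c in per_processor_cumulative]
    # zip(*...) regroups the values so that each tuple holds one interval
    # as seen by every processor.
    return [(mean(values), pstdev(values)) for values in zip(*per_processor_intervals)]


if __name__ == "__main__":
    # Two processors, three snapshots; values are made up for illustration.
    readings = [[1.0, 3.0, 6.0], [1.0, 5.0, 6.0]]
    for index, (avg, spread) in enumerate(snapshot_stats(readings)):
        print(f"interval {index}: mean={avg:.2f} stddev={spread:.2f}")
```

A large standard deviation in an interval means the routine's time is unevenly distributed across processors in that part of the run.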
[Figure: NAMD snapshot profile of over 800 seconds on 2048 processors. Panels: Mean Exclusive Time, Standard Deviation. Annotation: Load Balancing Phases. Legend: enqueneSelfB, enqueneSelfA, Main, enqueneWorkB, enqueneWorkA, Idle.]
NAMD CUDA events
[Figure: CUDA events for Device #0, annotated "~50% efficiency" and "~100% efficiency".]
GPU efficiency gained by doubling the number of GPUs from 16 to 32. The events are broken down by routine and by device number.
NAMD CUDA scaling
[Figure: Scaling Efficiency versus Number of Devices for Non-Bonded Calculations and Sum Forces Calculations.]
Scaling by event and device number. The Non-Bonded Calculations scale well. The Sum Forces Calculations scale less well, but their overall time is only a few microseconds.
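One way to read figures such as "~50%" and "~100%" efficiency is the usual strong-scaling definition, sketched below. The slides do not state how efficiency was computed, so this formula is an assumption, and the timings in the example are made up.

```python
def scaling_efficiency(base_devices, base_time, devices, time):
    """Strong-scaling efficiency relative to a baseline run.

    Assumed definition: (base_time * base_devices) / (time * devices).
    1.0 means perfect scaling. 0.5 after doubling the devices means the
    time did not improve at all.
    """
    if min(base_devices, devices) <= 0 or min(base_time, time) <= 0:
        raise ValueError("device counts and times must be positive")
    return (base_time * base_devices) / (time * devices)


if __name__ == "__main__":
    # Hypothetical per-event timings when going from 16 to 32 GPUs.
    print(scaling_efficiency(16, 10.0, 32, 5.0))   # time halved
    print(scaling_efficiency(16, 10.0, 32, 10.0))  # time unchanged
```

Computing this per event and per device, as the figure does, shows which routines limit scaling rather than only reporting a whole-application number.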
ParFUM CUDA speedup
[Figure: bar chart for a 128x8x8 mesh, vertical axis from 0 to 250. Bars: Total time using only a CPU, Total time with CUDA acceleration, Time spent in CUDA kernel.]
Single CPU or GPU performance on a 128x8x8 mesh. When run with GPU acceleration enabled, ParFUM spent 9 seconds in the CUDA kernel routines.