MPI and C-Language Seminars 2010

Seminar Plan
- Week 1 – Introduction, Data Types, Control Flow, Pointers
- Week 2 – Arrays, Structures, Enums, I/O, Memory
- Week 3 – Compiler Options and Debugging
- Week 4 – MPI in C and Using the HPSG Cluster
- Week 5 – “How to Build a Performance Model”
- Weeks 6-9 – Coursework Troubleshooting (seminar tutors available in their office)

Performance Models
- Aim to predict the runtime of an application.
- Allow you to predict for larger grid sizes and for unknown hardware.
- Gauge the scalability of the algorithm / code.
- Help analyse the parallel efficiency of code.
- Show where the bottlenecks are, in both hardware and software.

Factors of a Model
- Computation – Active Processing:
    - The time spent doing actual work.
    - More processors => less work per processor.
    - Overall computation time should fall as the number of processors increases.
- Communication – Message Passing:
    - Communication between processors.
    - Overall communication (I/O) time will increase with processor count.
    - More network contention means slower communication.

Getting the Balance (1 / 2)
[Figure: Communication vs Computation. Time (s) against number of processors (1 to 61); series: Comm Time, Comp Time, Total Time.]

Getting the Balance (2 / 2)
- Fixed Costs:
    - The work done by all processors.
    - More processors will not reduce this time.
- Variable Costs:
    - The portion of work which varies with processor count.
    - Generally based on problem size decomposition.

Timers
- Lots of different timers:
    - CPU Time – Actual time spent on the CPU.
    - Wall Time – Total time since program start.
- Different timers have different overheads.
- Try to avoid timing the timers themselves.
- Recommended: the C timer, which is called with two double pointers:
    double cpuStart, wallStart;
    Timers(&wallStart, &cpuStart);

Where is the Expense?
- Need to establish what the expensive operations are:
    - Functions which are called frequently.
    - Functions which take a long time.
- Work out a percentage breakdown of total time.
- Is it a communication or a computation expense?
- Is it a fixed cost or a variable cost?

Computational Model (1 / 2)
- How will the number of processors affect the amount of work done by each processor?
- Will they all do the same amount of work?
    - Even decomposition.
- Are loops dependent on the problem size?
- Need to look at:
    - How long operations take.
    - How many times they are performed.

Computational Model (2 / 2)
- A basic model:
    - Time how long each different operation takes.
    - Calculate how many times those operations are used.
    - Add them all up.
- Inaccuracy:
    - When using timers, consider their overhead.
    - It is always more accurate to time many repetitions of an operation and divide through.
- Note: Communication will show in wall time.

Communication Model (1 / 2)
- Many different types of communication:
    - Sends and receives.
    - Collective operations.
    - Barriers.
- Need to build a model of the network:
    - Can use existing programs: PingPong / SKaMPI.
    - Or write your own.
- How much data is being sent?

Communication Model (2 / 2)
- Communication times are based on packet size.
    - There is an initial cost of a send – Handshake.
    - Then a variable cost – Payload.
[Figure: Send Time for Message Size. Time (µs) against message size (0 to 8192 bytes); series: Ethernet.]
- Where is the data being sent?
    - Are the source and destination on the same node?

Bringing it all Together
- What you need:
    - Computation benchmark application.
    - Communication benchmark application.
    - Spreadsheet model.
- Run the benchmarks on the cluster and plug the data into the model.
- Make predictions for different processor configurations.
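
The pieces above (fixed and variable computation cost, handshake plus payload cost per send, and timing many repetitions then dividing through) can be sketched in C. This is a minimal sketch under stated assumptions, not the seminar's own model: ModelParams and every function name below are hypothetical, and the assumption that each extra processor adds a constant number of messages is purely illustrative. A real application's communication pattern has to be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical model inputs; in practice these come from benchmark runs. */
typedef struct {
    double fixed_comp_s;        /* work repeated by every processor (s)     */
    double variable_comp_s;     /* divisible work, timed on 1 processor (s) */
    double latency_s;           /* initial cost of a send: handshake (s)    */
    double per_byte_s;          /* payload cost per byte (s)                */
    double msgs_per_extra_proc; /* ASSUMPTION: messages added per extra processor */
    double bytes_per_msg;       /* payload size of each message (bytes)     */
} ModelParams;

/* Cost of one send: initial handshake plus size-dependent payload. */
double send_time(const ModelParams *m, double bytes) {
    return m->latency_s + m->per_byte_s * bytes;
}

/* Computation: the fixed cost stays, the variable cost is split evenly. */
double comp_time(const ModelParams *m, int procs) {
    assert(procs >= 1);
    return m->fixed_comp_s + m->variable_comp_s / procs;
}

/* Communication: none on one processor; ASSUMED linear growth after that. */
double comm_time(const ModelParams *m, int procs) {
    assert(procs >= 1);
    return (procs - 1) * m->msgs_per_extra_proc
           * send_time(m, m->bytes_per_msg);
}

/* Predicted runtime is computation plus communication. */
double total_time(const ModelParams *m, int procs) {
    return comp_time(m, procs) + comm_time(m, procs);
}

/* Processor count in [1, max_procs] with the lowest predicted runtime. */
int best_procs(const ModelParams *m, int max_procs) {
    assert(max_procs >= 1);
    int best = 1;
    for (int p = 2; p <= max_procs; ++p) {
        if (total_time(m, p) < total_time(m, best)) {
            best = p;
        }
    }
    return best;
}

/* Time `reps` calls of `op` and divide through, so that timer overhead
 * is spread over many repetitions. Uses clock(), which reports CPU time,
 * so time spent waiting on communication is not captured here. */
double time_per_call(void (*op)(void), long reps) {
    assert(op != NULL && reps > 0);
    clock_t start = clock();
    for (long i = 0; i < reps; ++i) {
        op();
    }
    clock_t end = clock();
    return ((double)(end - start) / CLOCKS_PER_SEC) / (double)reps;
}
```

With deliberately round, made-up inputs (fixed_comp_s = 10, variable_comp_s = 1000, latency_s = 1, per_byte_s = 0.25, msgs_per_extra_proc = 2, bytes_per_msg = 8), total_time predicts 278 s on 4 processors and best_procs over 1 to 64 processors returns 13. Past that point the growing communication cost outweighs the shrinking computation cost, which is the balance the first figure illustrates.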