HPC ISE-2 AY 23-24

Two questions each batch: B1: 1, 8   B2: 2, 7   B3: 3, 6   B4: 4, 5   B5: 2, 8   B6: 3, 7

Students must include a snapshot of their system configuration in the document and must comment on the analysis based on that configuration. The document must include graphs along with tables of the analysis.

1. Execute the all-to-all broadcast operation (Program 3.2.1.c) with varying message sizes. Plot the performance of the operation for message sizes from 1K to 10K words (with a constant number of processes, 8). Explain the performance observed.

2. Execute the all-reduce operation (Program 3.2.2.c) with a varying number of processors (1 to 16) and a fixed message size of 10K words. Plot the performance of the operation as a function of the number of processors (with constant message size). Explain the performance observed.

3. Consider the implementation of the gather operation given to you (Program 3.3.1.c) -- in this implementation, each processor sends its message to processor 0, which gathers the messages. Compare this code to the one that uses the MPI gather operation (Program 3.3.1b.c). Compare the performance for a fixed message size (1K words at each processor) with a varying number of processors (1 to 16). Which implementation is better? Why?

4. Run the scatter operation (Program 3.3.2.c) with varying message sizes (10K to 100K words) and a fixed number of processors (8). Plot the runtime as a function of the message size. Explain the observed performance.

5. Consider the implementation of the all-to-all personalized operation given to you (Program 3.4.1.c) -- in this implementation, each processor simply sends messages to all other processors. Plot the time for a fixed message size (8K words at each processor, divided equally among all processors) with a varying number of processors (1, 2, 4, 8).

6. Run the all-to-all personalized operation (Program 3.4.2.c) with a fixed message size (8K words at each processor) and a varying number of processors (1, 2, 4, and 8). Plot the runtime as a function of the number of processors. Compare the performance to that of Program 3.4.1.c above. Which implementation is better? Why?

7. Consider two implementations of one-to-all broadcast. The first uses the MPI implementation (Program 3.5.1.c). The second splits the message and executes the broadcast in two steps (Program 3.5.1b.c). Plot the runtime of the two implementations with a varying number of processors (1, 2, 4, 8) and a constant message size of 100K words. Explain the observed performance of the two implementations.

8. Execute the MPI program (Program 3.1.1.c) with varying-sized broadcasts. Plot the performance of the broadcast for message sizes from 1K to 100K words (with a constant number of processes, 8). Explain the performance observed.
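The referenced course programs (Program 3.x.x.c) are not reproduced here, but every question above requires timing a collective over many message sizes or process counts. A minimal sketch of how such a measurement is typically structured -- timing MPI_Bcast with MPI_Wtime, as in questions 7 and 8 -- is given below. The repetition count, buffer contents, and output format are assumptions for illustration; the actual course programs may differ in detail.

```c
/* Hedged sketch: timing an MPI collective (MPI_Bcast) over many
 * repetitions. This is NOT Program 3.1.1.c itself, only a plausible
 * structure for such a measurement.
 * Compile: mpicc -o bcast_time bcast_time.c
 * Run:     mpirun -np 8 ./bcast_time 100000
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Message size in words (ints); vary this from 1K to 100K. */
    int nwords = (argc > 1) ? atoi(argv[1]) : 1024;
    int *buf = malloc((size_t)nwords * sizeof(int));
    if (rank == 0)
        for (int i = 0; i < nwords; i++) buf[i] = i;

    const int reps = 100;              /* average over many runs */
    MPI_Barrier(MPI_COMM_WORLD);       /* synchronize before timing */
    double t0 = MPI_Wtime();
    for (int r = 0; r < reps; r++)
        MPI_Bcast(buf, nwords, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Barrier(MPI_COMM_WORLD);       /* wait for the slowest rank */
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("%d procs, %d words: %.6f s per broadcast\n",
               nprocs, nwords, (t1 - t0) / reps);

    free(buf);
    MPI_Finalize();
    return 0;
}
```

Averaging over repetitions and bracketing the loop with barriers reduces noise from process skew; the same harness applies to the other collectives by swapping the call inside the loop.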
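Question 3 contrasts a hand-written gather against the library collective. A hedged sketch of the two variants follows, assuming the hand-written version works as the question describes (every processor sends its block to processor 0, which posts one receive per sender); the actual Programs 3.3.1.c and 3.3.1b.c may differ in detail.

```c
/* Hedged sketch of the two gather variants from question 3.
 * Not the course code itself -- an illustration of the contrast. */
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

#define NWORDS 1024  /* 1K words at each processor, as in question 3 */

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int sendbuf[NWORDS];
    for (int i = 0; i < NWORDS; i++) sendbuf[i] = rank;
    int *recvbuf = NULL;
    if (rank == 0)
        recvbuf = malloc((size_t)nprocs * NWORDS * sizeof(int));

    /* Variant 1: hand-written gather -- processor 0 receives one
     * message from every other processor, sequentially. */
    if (rank == 0) {
        memcpy(recvbuf, sendbuf, NWORDS * sizeof(int));
        for (int src = 1; src < nprocs; src++)
            MPI_Recv(recvbuf + (size_t)src * NWORDS, NWORDS, MPI_INT,
                     src, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else {
        MPI_Send(sendbuf, NWORDS, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    /* Variant 2: the library collective, which the MPI
     * implementation is free to realize with a tree-based
     * (logarithmic-depth) algorithm internally. */
    MPI_Gather(sendbuf, NWORDS, MPI_INT,
               recvbuf, NWORDS, MPI_INT, 0, MPI_COMM_WORLD);

    free(recvbuf);
    MPI_Finalize();
    return 0;
}
```

The contrast to look for in the plots: the sequential receive loop at processor 0 grows linearly with the number of processors, while the collective may scale better depending on the algorithm the MPI library selects.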