With RTAI, MPICH2, MPE, Jumpshot,
Sar and hopefully soon OProfile or VTune
Dawn Nelson
We already know that parallel programming makes
good use of multiple processors, but we don’t
always know if our programs are making the most
efficient use of MPI.
 Knowing how to use something doesn’t necessarily
mean you know how it works.
 There are many different ways to get data from
point A to point B.
 If you don’t know which way is better, try it, time
it, profile it.
 When adding in calls to the RTAI API, you can find
out if you are adding overhead or benefit.
 You can determine other bottlenecks or
 Within your code
For overall timing, add gettimeofday() calls at the
beginning and end of your program.
Use MPI_Wtime() to time MPI calls.
Add MPE tags for user programmed events. MPE
already keeps track of send, receive, wait,
barrier and other MPI calls.
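The overall-timing approach above can be sketched as follows. This is a minimal illustration, not the code used for the measurements in these slides: wall_seconds(), time_section() and busy_work() are hypothetical names, and busy_work() is only a stand-in for the real computation. Inside an MPI program, MPI_Wtime() also returns wall-clock seconds as a double and can be used in the same start/stop pattern.

```c
#define _XOPEN_SOURCE 700
#include <stddef.h>
#include <sys/time.h>

/* Wall-clock time in seconds, built on gettimeofday(). */
double wall_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Hypothetical stand-in for the code being timed. */
void busy_work(void)
{
    volatile double sum = 0.0;
    for (long i = 0; i < 1000000; i++)
        sum += (double)i;
}

/* Time one call of work() and return the elapsed seconds. */
double time_section(void (*work)(void))
{
    double start = wall_seconds();
    work();
    return wall_seconds() - start;
}
```

Calling time_section(busy_work) returns the elapsed wall-clock time for one run; wrapping the whole of main() in the same start/stop pair gives the overall time.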
 Without changing code
Use the system command sar (yum install sysstat).
Use an open source profiler such as OProfile.
Use a purchased profiler such as VTune.
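MPE tags in your code look like this:
#include <mpe.h>
int event1a, event1b;
//get a pair of event IDs marking the start and end of a state
MPE_Log_get_state_eventIDs( &event1a, &event1b );
//name and color shown in Jumpshot (example values)
MPE_Describe_state(event1a, event1b, "slave_compute", "red");
//start of code section to profile
MPE_Log_event(event1a, 0, "slave_compute_start");
…… //your code block here
MPE_Log_event(event1b, 0, "slave_compute_end");
//logging environment ends with MPI_Finalize();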
When the program runs, a clog2 file is generated.
 Use Jumpshot to graphically display the file:
>jumpshot filename.clog2
Matrix Multiply, using 4 nodes, master-slave scenario
Matrix Multiply from the midterm with MPI_Barrier() call
 Real Time computing is done at the highest
priority (SCHED_FIFO 99)
 Real Time computing is non-preemptible.
 Real Time computing blocks interrupts and
interference from the OS and from users.
 Reduces Jitter and Latency: processes finish
at the same time.
 For high compute to IO ratios, Real Time
improves overall processing speed.
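As a rough illustration of the priority setting mentioned above, the sketch below requests SCHED_FIFO scheduling through the standard Linux sched_setscheduler() call. It does not use the RTAI LXRT API and is not the implementation behind these slides; request_fifo_priority() is a hypothetical helper name. The call normally needs root privileges and fails with EPERM otherwise.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Ask the kernel to run the calling process under SCHED_FIFO at the
 * given priority. Returns 0 on success, -1 on failure (for example
 * EPERM when not running as root). */
int request_fifo_priority(int priority)
{
    struct sched_param param;
    memset(&param, 0, sizeof param);
    param.sched_priority = priority;

    if (sched_setscheduler(0, SCHED_FIFO, &param) == -1) {
        fprintf(stderr, "sched_setscheduler: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}
```

Passing sched_get_priority_max(SCHED_FIFO) as the argument requests the highest FIFO priority the system offers, which is 99 on Linux.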
Soft LXRT Real Time implementation.
Matrix Multiply without RTAI
 Jitter is the variation in completion times among
MPI workers, which affects the timing of the
overall process.
 Jitter is caused by interference and interrupts
from the operating system, other processes,
and other users.
 Jitter can be measured by running a process 100
times and recording the overall time of each run.
 Jitter was measured using an MPI application
with and without injected interference from
a cpu hog.
 Jitter was measured using an RTAI-MPI
application with and without injected
interference from a cpu hog.
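One way to summarize the collected run times is sketched below. The slides do not state a formula for jitter, so treating it as the spread between the slowest and fastest run is an assumption made here for illustration; struct jitter_stats and compute_jitter() are hypothetical names.

```c
#include <stddef.h>

/* Summary of a set of overall run times, in seconds. */
struct jitter_stats {
    double min;     /* fastest run */
    double max;     /* slowest run */
    double mean;    /* average run time */
    double spread;  /* max - min, used here as the jitter figure */
};

/* Compute min, max, mean and spread over n recorded run times.
 * Returns all zeros when n is 0. */
struct jitter_stats compute_jitter(const double *times, size_t n)
{
    struct jitter_stats s = {0.0, 0.0, 0.0, 0.0};
    if (n == 0)
        return s;

    double sum = 0.0;
    s.min = times[0];
    s.max = times[0];
    for (size_t i = 0; i < n; i++) {
        if (times[i] < s.min) s.min = times[i];
        if (times[i] > s.max) s.max = times[i];
        sum += times[i];
    }
    s.mean = sum / (double)n;
    s.spread = s.max - s.min;
    return s;
}
```

Filling an array with the 100 overall times from each configuration (with and without the cpu hog, with and without RTAI) and comparing the resulting spread values gives a simple side-by-side view of jitter.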
The MPE/Jumpshot view shows chronologically when
MPI calls are made and completed, but not the
effects of programs on the cpu, memory, interrupts,
swap, I/O, or tcp.
 The sysstat command sar will collect
information about the above listed items.
 The open source kSar java program will parse the
text file generated by sar and graph it.
RTAI lxrt-soft real time
Notice the big drop in the number
of interrupts while the lxrt-soft
program is running.
Finish testing the I/O hog program and implement it.
Current kSar graph data was obtained from my home
system and not the C107 cluster. The kSar program
could not import a text file and did not work
remotely via ssh from home to the cluster.
Implement OProfile or VTune for additional profiling
on master and one or more worker nodes.
Obtain sar and kSar data for master and one or more
worker nodes on cluster.
Obtain MPE data while using a cpu hog and an
interrupt hog.
Obtain sar and kSar data while using a cpu hog and an
interrupt hog.
Determine what the interrupt levels mean: i014 etc.
 Perform system benchmarking
 Write the corresponding paper for this
 Determine the scalability of a real time and
parallel application. There is a little bit of
overhead using high priority and RTAI.