Concurrent computing, lecture 2
Some performance issues
Previously we said that the individual nodes must be
fast (i.e., powerful).
What is the potential for increased computational speed
with cluster computing as opposed to a single computer?
Observe:
- a parallel program is composed of some number of
processes, say N.
- in a message passing computer, some time will be
spent by the processes communicating with each other
tcomp = computation time
tcomm = communication time
fine granularity - each process executes only a few
instructions before communicating
coarse granularity - each process executes many
instructions before communicating
The computation/communication ratio = tcomp/tcomm
is one measure of granularity.
One measure of the relative performance between a
multiprocessor system and a single processor system is
the speedup factor, S(n)
S(n) = (execution time on one processor) / (execution time
on n processors)
     = t1/tn
For example, suppose my application runs in one hour
on my PC.
Suppose that it runs in 10 minutes on my 8-node cluster.
Then I would say that the speedup is
S(8) = 60min/10min = 6.
Speedup can be calculated
- experimentally (by actual measurement)
- theoretically, by counting computation steps.
Use the best sequential algorithm/program for the single
processor case.
linear speedup: S(n) = n
superlinear speedup: S(n) > n
(can happen when, with the data split across more
processors, each processor's share fits into its cache)
Usually, S(n) < n because:
1) some parts of the problem cannot be parallelized, so
during those periods only one or a few processors are busy
(e.g., setup time, input time)
2) extra computation may be needed in the parallel
version, for example to calculate local values
3) communication takes time that is not spent processing
Example parallel program: master/slave
Timeline, from start (top) to end (bottom):
- master reads setup data (only the master is busy: serial part)
- master broadcasts the data to the slaves
- slaves compute in parallel, then send their results to the master
- master gets the results (serial part)
- master outputs the results (serial part)
ts = time to do serial part
tp = time to do parallel part if it were done
by one process
Amdahl's Law helps us calculate the effect of the serial part.
Let f = ts/(ts+tp) = fraction that is serial
S(n) = (ts + tp) / (ts + tp/n)      /* Amdahl's Law */
     = (f + (1-f)) / (f + (1-f)/n)
     = 1 / (f + (1-f)/n)
As n goes to infinity, S(n) -> 1/f.
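Because the serial part runs on one processor, it caps the speedup at 1/f
however many processors are added.
==> high performance computing is only possible on parallel computers
whose individual nodes have some level of "high performance"!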
For example, if 5% of the application is serial, then the maximum speedup is 1/.05 = 20.
On n=10 processors, S(10) = 1/(.05+.95/10) = 6.9
Efficiency, E = (execution time on one processor) /
(execution time on n processors * number of processors)
E = t1/(tn*n) = S(n)/n, usually quoted as a percentage (S(n)/n * 100%)
In my example with a 1 hour application, E = 6/8 = 75% (not that great)
A system/algorithm is scalable if, when you increase the problem size,
you can increase the system size and still compute with "good"
efficiency.
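Gustafson observed that the serial part may not grow as the problem gets
bigger: with more processors we usually solve a bigger problem in about the same time.
Let f be the serial fraction of the execution time on the n-processor system, so the
parallel part takes (1-f) there. One processor would need f + n*(1-f) for the same
scaled problem, so
S(n) = (f + n*(1-f)) / (f + (1-f))
     = f + n*(1-f)      /* Gustafson's Law */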
If 5% of the execution time on the parallel system is serial, then
S(n) = .05 + n*(.95); for 10 processors,
S(10) = 9.55 (much more optimistic than Amdahl's 6.9, because the problem size grows with n)
Try this on your own:
Suppose that I get a speedup of 8 when I run my application
on 10 processors. According to Amdahl's Law, what portion is
serial? What is the speedup on 20 processors? What is the
best speedup that I could hope for?
Suppose that 4% of my application is serial. What is my predicted
speedup according to Gustafson's Law on 5 processors?
How to Build A Beowulf Cluster
You need:
- some commodity cluster nodes
- an OS
- a network
- a mechanism for passing messages between application processes
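Dedicated cluster with a master node
[Figure: dedicated cluster. The compute nodes and the master node connect to a switch; the master node's 2nd Ethernet interface provides the uplink to the external network, through which the user reaches the cluster.]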
What is in a commodity cluster node?
Basically, it is a computer!
- Processor
- Cache
- Memory
- Disk controller
- Disks
- Motherboard
- Bus
- Network Interface Controller (NIC)
  - e.g., Gigabit Ethernet, Fast Ethernet
- Power Supply
You can build it from parts, or you can get it all in one package from Dell, Gateway, …
Software: Linux or Windows?
OS and software building blocks
Linux - a free, open source OS, started as a project by Linus Torvalds,
that has progressed into a robust, multitasking operating system
We use the Red Hat distribution on our machines, but many distributions exist.
Strictly speaking, Linux is the kernel;
a distribution includes the kernel plus a windowing system, configuration tools, etc.
Review of some basic OS concepts:
- process state diagram
- processes run, wait to be scheduled, block for I/O
- a scheduling discipline is used to schedule the next process
Process state diagram:
- Ready -> Running: scheduler/dispatch
- Running -> Ready: time out/context switch
- Running -> Wait: I/O request
- Wait -> Ready: I/O completion
processes can be created, suspended, killed
Try these in Linux:
ps
ps aux
vmstat
top
pstree
Unix/Linux fork() is used to create processes
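/* Example of use of the fork system call */
#include <stdio.h>
#include <stdlib.h>     /* exit() */
#include <unistd.h>     /* fork(), execlp(), getpid() */
#include <sys/types.h>  /* pid_t */
#include <sys/wait.h>   /* wait() */

int main(void)
{
    pid_t pid;

    pid = fork();
    if (pid < 0) {                  /* error occurred */
        fprintf(stderr, "Fork failed!\n");
        exit(1);
    }
    else if (pid == 0) {            /* child process: fork() returned 0 */
        printf("I am the child, pid=%d\n", (int)getpid());
        fflush(stdout);             /* exec discards unflushed output */
        execlp("/bin/ps", "ps", (char *)NULL);
        perror("execlp");           /* reached only if exec fails */
        exit(1);
    }
    else {                          /* parent process: pid is the child's pid */
        printf("I am the parent, child pid=%d\n", (int)pid);
        wait(NULL);                 /* block until the child finishes */
        printf("Child complete!\n");
        exit(0);
    }
}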
Quick network tutorial:
http://csce.uark.edu/~aapon/courses/concurrent/notes/shortnetworktutorial.ppt

Protocol stack, top to bottom:
Application
TCP
IP
Data link layer (Ethernet CSMA/CD, …)
Physical layer (Manchester encoding, …)
App: Uses MPI to solve my parallel programming problem.
MPI: Message Passing Interface - a standard message-passing library,
sometimes implemented over TCP/IP, that understands data types such as
integers, handles endian conversion, and has special routines
for scientific programming.
TCP: Transmission Control Protocol - reliable, in-order,
connection-oriented, point-to-point delivery of a data stream across
a network
A typical client:
open connection to an IP address
send/receive until done
close connection
TCP does not recognize data types, the byte ordering of numeric
values, or the ends of messages; it delivers only a stream of bytes.
IP: Internet Protocol - unreliable, connectionless (datagram)
delivery of packets
Every machine has an IP address; most also have an IP name. The
Domain Name System (DNS) converts names to addresses and back.
IP is a protocol for sending IP packets from one IP address to
another; IP routing makes sure the packets get to the right place.
data link layer: error-free delivery of a frame on the same
network.
In Fast Ethernet, the maximum frame size is about 1500 bytes and the
transmission rate is 100 Mbps. Special networks built for clusters
may transmit at over 10 Gbps.
physical layer: delivery of raw bits over a communication
channel