DISTRIBUTED MULTIPROCESSOR ENVIRONMENTS

Timothy Rolfe
Computer Science Department
Eastern Washington University
202 Computer Sciences Building
Cheney, WA 99004-2412
(509) 358-2065
Timothy.Rolfe@mail.ewu.edu
http://penguin.ewu.edu/~trolfe/

The slide set from the presentation at CCSC-NW 2002 is available through this link.

ABSTRACT

This paper discusses the material included in a seminar for seniors and graduate students on distributed parallel processing offered as an experimental course at Eastern Washington University in the Fall 2000 and Winter 2002 quarters, along with the difficulties discovered as the courses progressed.

INTRODUCTION

Since the 1991 public release of PVM (Parallel Virtual Machine) [1] the teaching of parallel processing has been an option, even in schools with a limited budget, provided that the educational institution supports Unix or a Unix variant. [2] Recent developments in PC workstations and freeware operating systems like Linux have greatly facilitated offering such courses, as has the availability of freeware implementations of the more recent MPI (Message Passing Interface) such as MPICH and LAM/MPI.

This paper will present the resources gathered together for a four-credit seminar course on parallel processing offered at Eastern Washington University in the Fall 2000 and Winter 2002 quarters, in the hopes that they may be of use to others developing similar courses. The Fall 2000 class had one undergraduate senior, five registered graduate students, and one graduate student auditor; the Winter 2002 class had two graduate and five undergraduate students.

The computers used all had x86/Pentium processors running under the Linux operating system. The author (while at Dakota State University in Madison, SD) has also used similar machines under the FreeBSD operating system as well as a Digital Equipment Corporation Alpha computer running under Ultrix. Some of the EWU computers were dual-processor machines, making speed-ups under simple fork-based parallelism possible at the beginning of the term before any of the message-passing systems were covered. (They could also have allowed exploration of threads under Java or POSIX, though that was not done.) The programming language used was C, with C++ elements added as convenient.

The Fall 2000 course used the following message-passing environments for distributed parallel processing: PVM (version 3.4.3) and MPI as MPICH (version 1.2.1). The Winter 2002 course used MPI as LAM/MPI (version 6.4-a3) and PVM (version 3.4.4). The instructor had hoped to include some explicit socket programming to show message passing under direct programmer control. Due, however, to his own inexperience, he was not able to develop that component of the course within the time available. Highlighting the difficulties of socket programming, a highly competent graduate student developed a socket example program for demonstration in the Winter 2002 course, but the application failed to port from his computer’s version of the Linux operating system to Linux as installed on the course computers.

The PVM Users’ Guide is available on-line in HTML format [3], as well as PostScript format, [4] while there is no comparable free source for the MPI Users’ Guide [5] (at least that the author is aware of). Consequently, the MPI Users’ Guide was assigned as the required text for the Fall 2000 course.
Towards the end of that quarter the instructor discovered Peter S. Pacheco’s book Parallel Programming with MPI [6], and that was used as the text for the Winter 2002 course. Now another book has come to light that looks even more attractive: Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers, by Barry Wilkinson and Michael Allen. [7]

PARALLEL PROCESSING BACKGROUND

Since the students could not be expected to have a background in parallel computing, the course began with an overview of parallel programming based on journal and web articles:

- Michael Flynn’s original papers proposing the SISD/SIMD/MISD/MIMD taxonomy for high-speed computers. [8]
- A very useful survey of parallel computing from the [IEEE] Computer magazine. [9]
- A more recent survey of parallel computing (1997) discovered on the World-Wide Web. [10]
- Several papers on Beowulf Clusters — commodity off-the-shelf computers running a Unix variant and networked for distributed parallel processing. One of these is from an internal NASA publication, made available by one of the graduate students who had worked the previous summer within NASA. [11] Among the articles referenced is one from LinuxWorld that makes mention of the “Stone Souper Computer” — a name conflating the folk tale of “stone soup” with the idea of cooperatively assembling computing resources to generate a powerful parallel computational tool.
- “Queens on a Chessboard: Making the Best of a Bad Situation”, [12] the instructor’s own paper that includes a discussion of parallel processing based on the Unix fork, shared-memory processing on a Silicon Graphics multiprocessor, and distributed processing based on message passing under PVM.
- Examples of massively distributed and extremely loosely coupled processing, which can be found in the on-going SETI-at-home project, as well as in analogous projects for prime number discovery and configurational studies of potential cancer medicines. [13]

As the quarter progressed, further web articles were encountered on “grid computing” and passed along to the students.

PROGRAMMING ENVIRONMENTS

The initial programming environment for the Fall 2000 course was the departmental Linux computer lab, in which several computers have dual processors. These machines run with “rsh” disabled, requiring access through “ssh.” This restriction caused some problems as the quarter progressed. One of the class members noticed some surplus 486 computers stacked to one side of a lab awaiting disposal, and suggested that the class build its own “Stone Souper Computer.” Thus was launched the “Boat Anchor Armada” [14]: five 486-66 computers on a local network running under Linux, one in which it was possible to allow use of “rsh” without security problems. The generation of this system was addressed in a paper presented at the CCSC-NW conference in 2001 by Stuart Steiner, one of the graduate students in the class. [15]

Thanks to a grant from the Washington State Higher Education Coordinating Board for the Eastern Washington University Center for Distributed Computing Studies, four high-speed dual-processor computers were added to the EWU Armada (and nicknamed the “hydroplanes”), along with a somewhat slower administrative computer to provide an external connection — the administrative machine is the only one with access to the Internet. This expanded network was used for the Winter 2002 course.
In the Fall 2000 course, the students generated their own message-passing environments by installing first PVM and then MPICH in their own accounts. For the Winter 2002 course the LAM/MPI environment was already installed on the EWU Armada computers. The environment definitions for PVM, however, were removed, and students installed their own copies of PVM in their own accounts. This allowed each student to have the experience of bringing up a message-passing environment as a totally unprivileged user.

FIRST ENVIRONMENT: UNIX FORK

The simplest parallel programming is for what are called “embarrassingly parallel problems” [16] (requiring minimal interprocess communications) done on a computer with more than one processor. On such a system, one may use the simple Unix “fork” to generate multiple copies of the same program, all of them sharing the files open at the time of the “fork,” which are then usable as a communications channel. The actual parallel processing is then handled by the operating system’s assignment of processes to available processors.

A paper on the “NOW Sort” [17] suggested to the instructor a particularly simple problem that might be used as a teaching example — one so simple that the bulk of the code developed would actually be related to the parallel processing rather than the problem solution. The NOW sort partitions the data to be sorted into k segments (where, for all j from 1 to k–1, all data found in segment j are less than any data found in segment j+1), after which those segments are sorted in parallel. This suggests a preliminary problem: determination of the sizes of the partitions, a problem that amounts to determining the values for a histogram.

In the context of a fork-based approach, the data are provided to the child process or processes by the fork itself. The loop logic generating child processes and the values returned by the fork function itself allow each process to determine which instance it is, and thus which array segment it is responsible for. The child process or processes send their data back by means of a shared binary output file in the /tmp directory (to avoid network overhead, a local disk is preferable to a shared NFS disk). Once all child processes have terminated, the original parent moves to the front of that file, and from it accumulates the sums for the k segments in the histogram.

For the Winter 2002 course, Pacheco’s Chapter 4 example of numerical integration [18] suggested to the instructor another embarrassingly parallel problem to exemplify fork-based parallelism with a shared file as the data communication medium: something nicknamed the “world’s worst way” to calculate the natural logarithm of N, namely the Monte Carlo integration of the function “f(t) = 1/t” from 1 to N — and this program quite naturally leads to the “world’s worst way” to calculate π, namely a Monte Carlo run counting the number of (x, y) pairs in a uniform random distribution in the range (0..1, 0..1) that meet the constraint “x² + y² < 1.” For both of these, if the child processes take steps to ensure using different seeds for the random number generator, their Monte Carlo runs are presumed to be independent of each other’s and the parent’s runs for the purposes of this class. Data flow is extremely minimal, since all that is required is the communication of the number of (x, y) points falling under the curve during the integration.
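As an illustration of this fork-and-shared-file pattern, here is a minimal hypothetical sketch (not the course code, which is available at the link in [20]) of the Monte Carlo π calculation: the output file in /tmp is opened before the fork so every child shares it, each child writes its hit count as one binary record, and the parent waits for the children, seeks back to the front of the file, and accumulates the counts. The child count, sample size, and file name are illustrative values.

    /* Hypothetical sketch: fork-based Monte Carlo estimate of pi, using a
     * shared binary file in /tmp as the only data channel between the
     * children and the parent.  NCHILD and NPOINTS are arbitrary. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <sys/wait.h>
    #include <time.h>

    #define NCHILD  4
    #define NPOINTS 1000000L

    int main(void)
    {
        /* Opened before the fork, so all children share the descriptor;
           O_APPEND lets each child's single fixed-size write land safely. */
        int fd = open("/tmp/pi_counts.bin",
                      O_RDWR | O_CREAT | O_TRUNC | O_APPEND, 0600);
        if (fd < 0) { perror("open"); return 1; }

        for (int i = 0; i < NCHILD; i++) {
            if (fork() == 0) {                      /* child process */
                long hits = 0;
                srand((unsigned)(time(NULL) ^ getpid()));  /* distinct seed */
                for (long j = 0; j < NPOINTS; j++) {
                    double x = rand() / (RAND_MAX + 1.0);
                    double y = rand() / (RAND_MAX + 1.0);
                    if (x*x + y*y < 1.0) hits++;
                }
                write(fd, &hits, sizeof hits);      /* one record per child */
                _exit(0);
            }
        }

        while (wait(NULL) > 0)                      /* wait for all children */
            ;

        lseek(fd, 0, SEEK_SET);                     /* back to the front */
        long hits, total = 0;
        while (read(fd, &hits, sizeof hits) == sizeof hits)
            total += hits;
        close(fd);

        printf("pi is approximately %f\n",
               4.0 * total / (NCHILD * (double)NPOINTS));
        return 0;
    }

Compiled with a C99 compiler, this runs on a single multiprocessor machine with no message-passing library at all; the operating system’s scheduling of the child processes onto the available processors supplies the parallelism.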
(Wilkinson and Allen provide a discussion of parallel random number generation to guarantee independent sequences of numbers. [19]) If there is any interest, these simple examples are available on the World-Wide Web. [20] That same link also provides an example of realistic use of fork-based parallelism: a program to characterize two algorithms for one-time balancing of Binary Search Trees as compared with AVL trees. It samples varying sizes, and uses a fork to have two processors simultaneously generating search trees and accumulating statistics, each for a different tree size. The shared output file then accumulates these results.

MESSAGE-PASSING ENVIRONMENTS (PVM [21] AND MPI [22])

In the Fall 2000 course, the first PVM program covered was the analog to the standard “Hello, World” program — the master program starts the slave programs under PVM, sends each one a message, and receives back a message from each one. It does, however, show all the PVM components needed to do significant parallel programming. This was also used as the first PVM program in the Winter 2002 course, following the consideration of MPI. In the Winter 2002 course, Pacheco’s book provided the first specimen MPI programs, which also show the MPI components needed to do significant parallel programming.

The histogram program developed earlier under “fork” was brought into the message-passing environments. This allowed modeling the passing of arrays as messages, since the master/root process needs to send the data segments for processing to the other processes, and they need to return the frequency-count arrays for their portions of the data array. The natural development under PVM is as a pair of programs running in MIMD mode, while under MPI the natural development is as a single program running in SPMD mode. Of course, an environment allowing MIMD applications necessarily supports SPMD applications, and so the histogram program was also developed as an SPMD application under PVM. It also provided a means of exemplifying the use of “groups” under PVM to approximate the environment provided by the MPI “communicator.” These various parallel implementations of the histogram calculation are available on the World-Wide Web. [23]
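To show what “arrays as messages” looks like in the SPMD style, the following is a hypothetical MPI sketch (not one of the course programs at [23]). The data values, the number of bins K, the assumption that the data size divides evenly among the processes, and the choice of MPI_Scatter and MPI_Reduce for the two transfers are all illustrative.

    /* Hypothetical SPMD sketch of the histogram idea under MPI: the root
     * scatters equal data segments, every rank counts its segment into a
     * local frequency array, and MPI_Reduce sums those arrays on the root. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N 1000000      /* total data items (assumed divisible by nproc) */
    #define K 8            /* number of histogram bins / partitions         */

    int main(int argc, char *argv[])
    {
        int rank, nproc;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nproc);

        int seg = N / nproc;
        double *data = NULL, *mine = malloc(seg * sizeof(double));
        if (rank == 0) {                        /* root generates the data */
            data = malloc(N * sizeof(double));
            for (int i = 0; i < N; i++)
                data[i] = rand() / (RAND_MAX + 1.0);   /* values in [0,1) */
        }

        /* An array passed as a message: each rank receives its segment. */
        MPI_Scatter(data, seg, MPI_DOUBLE, mine, seg, MPI_DOUBLE,
                    0, MPI_COMM_WORLD);

        long local[K] = {0}, global[K];
        for (int i = 0; i < seg; i++)           /* bin the local segment */
            local[(int)(mine[i] * K)]++;

        /* Frequency-count arrays returned and summed on the root. */
        MPI_Reduce(local, global, K, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            for (int b = 0; b < K; b++)
                printf("bin %d: %ld\n", b, global[b]);

        free(mine); free(data);
        MPI_Finalize();
        return 0;
    }

A PVM version in MIMD mode would split the same logic into a master program (distributing the segments with pvm_send and gathering the counts) and a slave program, which is exactly the contrast the course used the histogram to illustrate.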
To further exemplify cooperating processes, the instructor programmed an implementation of the classical “Bakery algorithm” in the Winter 2002 course. While the MPI version was initially developed as an SPMD application, it was then transformed into a MIMD application, since the LAM/MPI environment provides the “application schema” as a way of programming in that fashion. As implemented, the application has four categories of cooperating programs: the user interface (notifying the number server and clerk processes of the first number to be dispensed and served, and then passing along pastry information to the customer processes), the number server process, the clerk process, and then the various customer processes. For each purchase, a customer process receives a number from the number server, and then uses that number in its interaction with the clerk process to obtain its pastry. Since the MPI version has a static number of processes, the MPI application was developed with a fixed number of customer processes — one could say, a limited lobby area in which the customers can wait. PVM, with its dynamic creation of processes, does not have this restriction, and customer processes are immediately created for each pastry request. These programs are also available on the World-Wide Web. [24]

STUDENT PARALLEL PROGRAMMING EXERCISE

The problem assigned to the students for their own programming was generated by taking Dijkstra’s railroad problem of two-way travel on a single track (transformed by Tanenbaum into baboons crossing a rope, [25] but transformed back to a railroad problem) and formulating it for solution based on message passing rather than semaphores. The architecture suggested to the students for the solution had three types of cooperating processes:

- The user interface, which under PVM acts as master process spawning the slave processes. The user interface also determines passage of time (since a user “synchronization” command denotes the end of each time unit).
- A single track controller that communicates with the user interface (receiving commands and returning information) and with the individual train processes (receiving track entry requests and track exit notifications, and sending track entry permission).
- A variable number of train processes, activated by the user interface and communicating with the track controller to cross the guarded track segment. Once the train enters that segment, it requires three time units to clear it. The train process is finished once it has cleared the track segment and has notified the track controller. In the PVM environment, the process is created at need, and can simply terminate. In the MPI environment, however, the number of train processes is fixed — one is tempted to call them the “switch engine” processes. In that case, the train process notifies the user interface that it has completed its current assignment and that it is available for the next train request.

Some students in the Winter 2002 course chose different allocations of process responsibilities. For instance, one chose to consolidate the user interface and the track controller functionalities into one process, and to have a second cooperating process handle all of the train interactions.

The instructor’s implementations in both environments are available on the World-Wide Web. [26] The MPI version is specific to the LAM/MPI environment since it takes advantage of the “application schema” to develop a MIMD application. For the MPICH environment, there is code available to use a small driver to start up the several processes, with the minor changes to make the three processes callable from that driver.
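To make the message traffic of the exercise concrete, here is a hypothetical PVM sketch of one train process. The message tags, the packing of the track controller’s task id into a start-up message from the spawning user interface, and the omission of the time-unit bookkeeping are all assumptions made for illustration; they are not details of the instructor’s implementation [26].

    /* Hypothetical sketch of a train slave process under PVM: request entry,
     * wait for permission, cross, then notify the controller of the exit. */
    #include "pvm3.h"

    #define TAG_START   1   /* from user interface: controller tid, train id */
    #define TAG_REQUEST 2   /* to controller: request to enter the segment   */
    #define TAG_PERMIT  3   /* from controller: entry granted                */
    #define TAG_EXIT    4   /* to controller: the segment has been cleared   */

    int main(void)
    {
        int mytid = pvm_mytid(), parent = pvm_parent();
        int controller, train_id;

        /* Start-up message from the spawning user interface (assumed layout). */
        pvm_recv(parent, TAG_START);
        pvm_upkint(&controller, 1, 1);
        pvm_upkint(&train_id, 1, 1);

        /* Ask the track controller for permission to enter the segment. */
        pvm_initsend(PvmDataDefault);
        pvm_pkint(&train_id, 1, 1);
        pvm_pkint(&mytid, 1, 1);
        pvm_send(controller, TAG_REQUEST);

        /* Block until the controller grants entry. */
        pvm_recv(controller, TAG_PERMIT);

        /* ... crossing takes three time units, driven in the real exercise
               by the user interface's synchronization commands ... */

        /* Notify the controller that the segment is clear, then finish. */
        pvm_initsend(PvmDataDefault);
        pvm_pkint(&train_id, 1, 1);
        pvm_send(controller, TAG_EXIT);

        pvm_exit();
        return 0;
    }

Because PVM creates processes dynamically, this program can simply exit after one crossing; the fixed-process MPI version instead sends one further message back to the user interface to report that it is free for the next train request.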
SELF-CHOSEN STUDENT PROJECTS

To finish the quarter, the students formed programming teams to develop applications of their own choosing as parallel applications, within the message-passing environment of their own selection. In the Fall 2000 course both a three-person team and a two-person team chose to do the graphical problem of ray tracing in parallel, while the remaining single person chose to begin the development of an interactive combat simulation. The two ray-tracing teams chose to develop their programs under PVM, while the combat simulation was based on explicit socket communications among the cooperating processes. A mandated intermediate design presentation did provide some useful cross-fertilization in the two ray-tracing projects.

The Winter 2002 student projects were all lost when the Armada system administrator accidentally destroyed all user data from the Winter-quarter /home directory. It is only by chance that the instructor’s own files were copied to a different location before the disk partitioning that destroyed the earlier information.

SECURITY CONFLICT PROBLEM: RSH (REMOTE SHELL)

The PVM and MPI message-passing systems accomplish their tasks in part by remotely initiating processes on other computers. The classical Unix command for this is “rsh” — and that is a major security hole within Unix; consequently it is commonly disabled on computers connected to the Internet. (Since the Armada has computers communicating only on a local network, there was no problem using “rsh” under that environment.) While an alternative is available in “ssh” (secure shell), in 2000 that utility was not initially implemented on the available computers. Even when “ssh” became available on the Internet-accessible computers used for the course, we have never found a way of avoiding the requirement for a password when issuing an “ssh” command — the documentation available from several sources does not in fact work as indicated on the Linux lab machines as they are currently configured.

There is, however, a work-around for PVM, allowing use of those machines for PVM applications. The PVM system is based on daemons running on each computer within the virtual machine. Though these are typically started by “rsh” or “ssh” commands from the computer starting the virtual machine, it is possible to use a maintenance option for starting the virtual machine: the “manual start” option, whereby the user expressly starts each of the daemons in the virtual machine based on a command string provided by the initial daemon.

MPI, at least under MPICH, did have a significant problem. MPICH initializes the cooperating processes (without any intervening daemons) by issuing “rsh” or “ssh” commands to the computers in use. Consequently the Armada can be used easily (since “rsh” is available there), but the departmental networked computers need to be accessed through “ssh.” As mentioned above, these connections require that the user provide passwords for each of the “ssh” commands issued. A significant side effect of this is that C’s stdin and C++’s cin are not available as communication channels, even for instance zero of the processes running under MPICH. This was confirmed by developing a program attempting keyboard input that runs without any problem in the Armada environment, but fails on the departmental networked computers.

RECOMMENDATION

The rsh/ssh problem makes it highly desirable to teach the course using a private network within which rsh is available. Those wishing to develop such a system may find Stuart Steiner’s paper useful. [15] In addition, there is an entire book available on constructing a Beowulf cluster, which can be used to supplement Stuart Steiner’s comments. [27] Such a cluster can be constructed with surplus computers (as was the original Boat Anchor Armada) and still provide a usable instructional environment. Parallel applications do not need to be computationally intensive to allow students to learn how to generate them. Further, the private network can provide a useful debugging environment. Once the parallel application has been debugged, it can be moved to publicly networked computers using one of the ssh work-arounds (either the “manual start” option under PVM, or avoiding dependence on keyboard input under MPI).
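One possible shape for the “no keyboard input” work-around under MPI (a hypothetical sketch, not necessarily how the course programs handled it) is to pass run-time parameters on the command line, have rank 0 parse them, and distribute them with MPI_Bcast, so that no process ever touches stdin. The parameter name “trials” and its default value are invented for the illustration.

    /* Hypothetical sketch: obtain a run-time parameter without stdin by
     * reading argv on rank 0 and broadcasting the value to all ranks. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        int rank;
        long trials = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0)                    /* only rank 0 looks at argv */
            trials = (argc > 1) ? atol(argv[1]) : 100000L;

        /* Every other rank receives the value; no scanf or cin anywhere. */
        MPI_Bcast(&trials, 1, MPI_LONG, 0, MPI_COMM_WORLD);

        printf("process %d will perform %ld trials\n", rank, trials);

        MPI_Finalize();
        return 0;
    }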
Programs developed by the instructor for this course are available through the author’s web site at Eastern Washington University. [28]

NOTES

Web copies of this paper, with hyperlinks for all URLs, are available:
MS Word: http://penguin.ewu.edu/~trolfe/CCSC2002/Distrib.doc
HTML: http://penguin.ewu.edu/~trolfe/CCSC2002/Distrib.html
RTF: http://penguin.ewu.edu/~trolfe/CCSC2002/Distrib.rtf

[1] Al Geist and others, PVM: Parallel Virtual Machine — a Users’ Guide and Tutorial for Networked Parallel Computing (MIT Press, 1994), p. xiv.
[2] Timothy Rolfe, “PVM: an affordable parallel processing environment,” SCCS Proceedings: 27th Annual Small College Computing Symposium (SCCS, 1994), pp. 118-125. Available through http://penguin.ewu.edu/~trolfe/SCCS-94/SCCS-94.html
[3] http://www.netlib.org/pvm3/book/pvm-book.html
[4] http://www.netlib.org/pvm3/book/pvm-book.ps
[5] William Gropp, Ewing Lusk, and Anthony Skjellum, Using MPI — Portable Parallel Programming with the Message-Passing Interface (2nd edition; MIT Press, 1999).
[6] Peter S. Pacheco, Parallel Programming with MPI (Morgan Kaufmann Publishers, Inc., 1997). It is discussed in http://fawlty.cs.usfca.edu/mpi/ — and is an extensive revision and expansion of A User’s Guide to MPI, available through ftp://math.usfca.edu/pub/MPI/mpi.guide.ps.Z
[7] Barry Wilkinson and Michael Allen, Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers (Prentice-Hall, Inc., 1999).
[8] Michael J. Flynn, “Very High-Speed Computing Systems,” Proceedings of the IEEE, Vol. 54, No. 12 (December 1966), pp. 1901-1909. Michael J. Flynn, “Some Computer Organizations and Their Effectiveness,” IEEE Transactions on Computers, Vol. C-21, No. 9 (September 1972), pp. 948-960.
[9] Ralph Duncan, “A Survey of Parallel Computer Architectures,” [IEEE] Computer, Vol. 23, No. 2 (February 1990), pp. 5-16.
[10] Thuy Trong Le and Tri Cao Huu, “Advances in Parallel Computing For the Year 2000 and Beyond,” available at http://www.vacets.org/vtic97/ttle.htm — with the HTML running title “A Survey of Parallel Computing: From the Past to the Future”
[11] Thomas Sterling, Donald Becker, Daniel Savarese, et al., “BEOWULF: A Parallel Workstation for Scientific Computation,” Proceedings of the 1995 International Conference on Parallel Processing, Vol. 1 (August 1995), pp. 11-14. Available in PostScript at http://www.beowulf.org/papers/ICPP95/icpp95.ps — the HTML version of the paper at the same location appears unable to access the .gif files for its figures. Jarrett Cohen, “Beowulf Lives On — As a Build-It-Yourself Computer,” [NASA] InSights, November 1998, pp. 2-9. Rick Cook, “Fast and Cheap,” LinuxWorld, April 2000 — http://www.linuxworld.com/linuxworld/lw-2000-04/lw-04-parallel_p.html The principal web site for the Beowulf Project is at http://www.beowulf.org/, and many more resources are available through that site.
[12] Timothy Rolfe, “Queens on a Chessboard: Making the Best of a Bad Situation,” SCCS: Proceedings of the 28th Annual Small College Computing Symposium (SCCS, 1995), pp. 201-210.
     Available through http://penguin.ewu.edu/~trolfe/SCCS-95/SCCS-95.html
[13] SETI-at-home:
       articles: http://www.computer.org/cise/articles/seti.htm
                 http://www.discovery.com/news/features/setiathome/setiathome.html
       home page: http://setiathome.ssl.berkeley.edu/
     GIMPS (prime number search):
       articles: http://www.utm.edu/research/primes/mersenne/index.html
                 http://www.utm.edu/research/primes/notes/13466917/
       home page: http://www.mersenne.org/
     Cancer drug configurational studies:
       articles: http://www.the-scientist.com/yr2001/may/hand_p1_010514.html
                 http://more.abcnews.go.com/sections/scitech/DailyNews/screensaver010524.html
       home page: http://www.chem.ox.ac.uk/curecancer.html
[14] The instructor referred to the 486-66 computers as “boat anchors”; when the network was generated, Prof. Steve Simmons suggested extending the nautical reference by using “armada” as the private network name.
[15] Stuart Steiner, “Building and Installing a Beowulf Cluster,” The Journal of Computing in Small Colleges [Proceedings of the Third Annual CCSC Northwestern Conference], Vol. 17, No. 2 (December 2001), pp. 75-83.
[16] A complete chapter is devoted to “embarrassingly parallel computations” in Wilkinson and Allen, op. cit., pp. 82-106.
[17] Home page for NOW Sort: http://now.cs.berkeley.edu/NowSort/
     Conference presentation at SIGMOD ’97 (Tucson, Arizona, May 1997): Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, David E. Culler, Joseph M. Hellerstein, and David A. Patterson, “High-Performance Sorting on Networks of Workstations,” http://now.cs.berkeley.edu/NowSort/nowSort.ps (PostScript).
[18] Pacheco, op. cit., pp. 53 ff.
[19] Wilkinson and Allen, op. cit., pp. 99-100.
[20] http://penguin.ewu.edu/~trolfe/CCSC2002/ForkBased/index.html
[21] The main web page for PVM is at http://www.epm.ornl.gov/pvm/. The current version is available through http://www.netlib.org/pvm3/index.html
[22] Information on MPI itself is available through http://www-unix.mcs.anl.gov/mpi/. Information on MPICH is available through ftp://ftp.mcs.anl.gov/pub/mpi. Information on LAM/MPI is available through http://www.mpi.nd.edu/lam/. MPI — The Complete Reference is available in HTML format through http://www.netlib.org/utk/papers/mpi-book/mpi-book.html, and can also be downloaded in PostScript format through http://www.netlib.org/utk/papers/mpi-book/mpi-book.ps. The Joint Institute for Computational Science at the University of Tennessee (Knoxville) has a very useful “Beginners Guide to MPI” in 22 pages. The PostScript version is available at http://www-jics.cs.utk.edu/MPI/MPIguide/MPIguide.ps, while the entry point for an HTML version is at http://www-jics.cs.utk.edu/MPI/MPIguide/MPIguide.html (though there seem to be some problems with its figure files).
[23] http://penguin.ewu.edu/~trolfe/CCSC2002/Histogram.html
[24] http://penguin.ewu.edu/~trolfe/CCSC2002/Bakery.html
[25] Andrew S. Tanenbaum, Modern Operating Systems (Prentice-Hall, Inc., 1992), p. 264.
[26] http://penguin.ewu.edu/~trolfe/CCSC2002/Train.html
[27] Thomas Sterling, John Salmon, Donald J. Becker, and Daniel F. Savarese, How to Build a Beowulf (MIT Press, 1999), 261 pp.
[28] http://penguin.ewu.edu/~trolfe/CCSC2002/