3600ProjectV2

PROJECT LAYOUT
CSCE 3600 SYSTEMS PROGRAMMING
NOT TOO PARALLEL VIRTUAL MACHINE
1. INTRODUCTION
A PVM (Parallel Virtual Machine) is a software tool that allows multiple
interconnected, heterogeneous computers to be used as a single, more powerful
parallel computational resource. Thus, large problems can be solved more cost
effectively.
PVM is a set of tools consisting of a run-time environment and a library for
message passing, task and resource management, and fault notification.
PVM lets the user manually partition a computation among all the resources
that make up the parallel virtual machine.
2. A LITTLE BACKGROUND
PVM allows grouping heterogeneous, interconnected machines to provide more
computation power and better parallelism by providing a cross-platform,
message-passing facility with higher-level services built on top. PVM creates the
illusion of a single large parallel resource that is transparent to the user. The
basic unit of PVM is the task. Tasks communicate with each other by passing
messages. The networking details are hidden from the user. A task is equivalent
to a UNIX process. Tasks that cooperate, either
through communication or synchronization, are organized into groups called
computations. PVM supports direct communication, broadcast and barriers within
a computation.
In PVM, the user is responsible for creating the tasks and groups. All the
tasks within a group can communicate with each other. In addition, PVM has a
dedicated task to handle output and user display.
PVM handles all the interaction among tasks and the user’s console with a
control daemon. To send input to a particular task, PVM sends the data to the
pvmd daemon on the destination host, which then forwards it to the appropriate
task. Similarly, a task produces output by sending a message to its pvmd, which,
in turn, forwards it to the console's pvmd and on to the application's output task.
If you are interested in more detailed information about PVM, visit its web page:
http://www.csm.ornl.gov/pvm/
3. ORGANIZATION
This project, with minor modifications, comes from the textbook “Unix Systems
Programming: Communication, Concurrency and Threads” by Kay A. Robbins and
Steven Robbins, published in 2003. Much of this description comes from that
text.
The project is divided into four parts. Each part adds new features to the
previous one. The final program will be a simplified PVM called the Not Too
Parallel Virtual Machine, NTPVM.
The project consists of four major assignments; each assignment corresponds to
one part of the project. All the details of the implementation, as well as a
detailed description of the project, are included in later sections of this
document.
Note that this project is based upon one included in the Robbins and Robbins
text mentioned above. The project is slightly modified from its original form for
use in this course. Additional material, such as source code, is available on
the Robbins and Robbins book’s web page, http://usp.cs.utsa.edu/usp/. We are
allowed to use the code provided at this web site, BUT we must acknowledge
that web site as the source in our program(s) and in any descriptive documents
we use.
4. DESCRIPTION
4.1. Overview
The project consists of creating a dispatcher that shares some characteristics with
the PVM control daemon described in section 2. The main difference from PVM is
that this will be a standalone implementation running on a single machine.
Therefore, it will not be too parallel, since it will be limited to the resources of
the particular machine on which it is implemented.
The dispatcher will have the following main responsibilities:
• Receive requests through its standard input.
• Respond through its standard output.
• Manage tasks (forking and killing processes).
• Handle threads.
• Synchronize processes.
• Pass messages.
The tasks are independent processes grouped into units called
computations. A task is just a process that executes a specified program. Each
task is identified by a computation ID and a task ID. When the dispatcher is
requested to create a new task, it needs a computation ID and a task ID for
that task. In addition, the dispatcher needs to create a pair of communication
pipes, fork a child process and execute the task in the child. The pipes provide
a communication channel in each direction between the dispatcher and the child
process:
• writefd - dispatcher to child.
• readfd - child to dispatcher.
The dispatcher communicates with the outside world by reading packets from its
standard input and writing packets to its standard output. The dispatcher might
receive a packet requesting that it create a new task, or it might receive a data
packet intended for a task under its control. For this project, the tasks send ASCII
data and the dispatcher wraps the data in a packet. The next section will give
more details about the packets.
4.2. Packets
Along with this document, you will be provided with some source code to start the
project. The file ntpvm.h contains two structs:
• taskpacket_t – the fixed-length packet header. A packet includes a computation
ID, a task ID, a packet type, a packet length and the packet information. The
first four items make up the header, which is stored in a structure of type
taskpacket_t.
• ntpvm_task_t – contains information about each active task in a global
tasks array.
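As a rough illustration, the two structs might look something like the sketch below. The member names compid, taskid, type, length, writefd, readfd, sentpackets, sentbytes, recvpackets, recvbytes and endinput come from this document; the taskpid member, the constant values, the enum order and the task_entry_init helper are assumptions made for the sketch. The ntpvm.h file you are given is the authoritative definition.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

#define MAX_PACK_SIZE 1024   /* assumed limit on packet data bytes */
#define MAX_TASKS 10         /* assumed size of the tasks array */

/* The four packet types covered by this project (numeric values assumed). */
typedef enum { NEWTASK, DATA, DONE, TERMINATE } packet_t;

/* Fixed-length header that precedes the packet data. */
typedef struct {
    int compid;       /* computation ID */
    int taskid;       /* task ID */
    packet_t type;    /* packet type */
    int length;       /* number of data bytes following the header */
} taskpacket_t;

/* One entry of the global tasks array. */
typedef struct {
    int compid;       /* computation ID; -1 marks a free entry */
    int taskid;       /* task ID */
    int writefd;      /* pipe end: dispatcher to child */
    int readfd;       /* pipe end: child to dispatcher */
    int recvpackets;  /* DATA packets delivered to the task */
    int recvbytes;    /* data bytes delivered to the task */
    int sentpackets;  /* DATA packets produced from the task's output */
    int sentbytes;    /* data bytes produced by the task */
    int endinput;     /* nonzero once the task will get no more input */
    pid_t taskpid;    /* assumed member: process ID of the forked child */
} ntpvm_task_t;

/* Hypothetical helper: prepare an entry for a newly created task. */
void task_entry_init(ntpvm_task_t *t, int compid, int taskid) {
    assert(t != NULL);
    t->compid = compid;
    t->taskid = taskid;
    t->writefd = -1;              /* no pipes attached yet */
    t->readfd = -1;
    t->recvpackets = t->recvbytes = 0;
    t->sentpackets = t->sentbytes = 0;
    t->endinput = 0;
    t->taskpid = (pid_t)-1;       /* no child forked yet */
}
```

A dispatcher built on this sketch would call task_entry_init when it accepts a NEWTASK packet and then fill in the descriptors and the PID once the pipes and the child exist.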
There are four types of dispatcher packets in the project:
a. NEWTASK – When the dispatcher receives this packet, it initiates a new
task with the information given in the packet. It creates the
communication pipes (readfd, writefd), forks the child, stores the child's
PID and executes the task. The dispatcher then closes unused pipe file
descriptors and waits for I/O either from its standard input or from the
readfd descriptors of its tasks. The child task forked by the dispatcher
redirects its standard input and output to the pipes and closes the unused
file descriptors. The child then calls execvp to execute the command string.
Example:
Computation ID: 3
Task ID: 2
Packet Type: NEWTASK
Packet Data Length: 5
Packet Information: ls -l
The dispatcher discards the packet and reports an error if it detects that a
task with the same computation and task IDs is already in the tasks array.
The new tasks array entry has its sentpackets, sentbytes, recvpackets,
recvbytes and endinput members set to 0. Barriers are outside the scope of
this project.
NOTE: The data in the packet of the example is not null-terminated.
b. DATA – carries input data for a task; the packet header identifies the task
by its computation ID and task ID. The dispatcher sends the data to the task
through the writefd pipe. Conversely, when a task writes data to its
standard output, the dispatcher wraps the data in a DATA packet and
sends it to its own standard output.
Example:
Computation ID: 3
Task ID: 2
Packet Type: DATA
Packet Data Length: 15
Packet Information: This is my data
When the dispatcher reads a DATA packet from standard input, it asks the
tasks object to determine whether the packet's task ID and computation
ID match those of any entry in the tasks array. The dispatcher discards the
packet if no entry matches. Otherwise, the dispatcher updates the
recvpackets and recvbytes members of the task's entry in the tasks array.
c. DONE – When the dispatcher receives this packet, it closes the writefd file
descriptor for the task identified by the
computation ID and task ID members of the packet header. The
corresponding task then detects end-of-file on its standard input. When
the dispatcher detects end-of-file on the readfd descriptor of a task, it
performs the appropriate cleanup and sends a DONE packet on standard
output to signify that the task has completed.
Example:
Computation ID: 3
Task ID: 2
Packet Type: DONE
Packet Data Length: 0
Packet Information:
If the writefd descriptor for the task is still open, the dispatcher closes it.
The dispatcher must eventually call wait on the child task process and
set the compid member of the tasks array entry to -1 so that the array
entry can be reused.
d. TERMINATE – When the dispatcher receives this packet, it kills the task
identified by the packet's computation ID and task ID. If the task ID is -1,
the dispatcher kills all tasks in the specified computation. The dispatcher
handles a TERMINATE packet received from readfd in a similar way. However,
if no task ID matches the packet or if the task ID is -1, the dispatcher also
writes the TERMINATE packet to standard output.
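The rule for choosing which tasks a TERMINATE packet affects can be sketched as follows. This is only an illustration under stated assumptions: the trimmed-down task_entry_t type, its taskpid member, both function names and the choice of SIGTERM as the signal are hypothetical and are not taken from the provided code.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>

/* Trimmed-down tasks array entry holding only what this sketch needs. */
typedef struct {
    int compid;     /* computation ID; -1 marks a free entry */
    int taskid;     /* task ID */
    pid_t taskpid;  /* assumed member: process ID of the child task */
} task_entry_t;

/* Does a TERMINATE packet for (compid, taskid) select this entry?
   A packet task ID of -1 selects every task in the computation. */
int terminate_matches(const task_entry_t *t, int compid, int taskid) {
    assert(t != NULL);
    if (t->compid == -1 || t->compid != compid)
        return 0;                 /* free entry or different computation */
    return taskid == -1 || t->taskid == taskid;
}

/* Signal every selected task and return how many were signalled. */
int terminate_tasks(const task_entry_t *tasks, int ntasks,
                    int compid, int taskid) {
    int i, count = 0;

    assert(tasks != NULL && ntasks >= 0);
    for (i = 0; i < ntasks; i++) {
        if (!terminate_matches(&tasks[i], compid, taskid))
            continue;
        /* Guard the PID: kill(0, ...) and kill(-1, ...) reach other processes. */
        if (tasks[i].taskpid > 0 && kill(tasks[i].taskpid, SIGTERM) == 0)
            count++;
    }
    return count;
}
```

Sending the signal does not remove the task. The dispatcher would still have to wait on each child and free its tasks array entry afterwards.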
5. IMPLEMENTATION
5.1. PART I
The objective of this first part is to create and test I/O for the dispatcher. It also
covers a simple debugging layout. The scope of this part is simple: you will
feed data to the dispatcher’s standard input, and the dispatcher will write the
same data to its standard output.
To do that, you need to define two functions, getpacket and putpacket. By calling
getpacket, the dispatcher receives data from its standard input. Similarly,
putpacket sends data to the standard output. The data is transferred in two
steps:
• Transfer the header of type taskpacket_t.
• Use the length in the header to determine how many bytes of data to transfer.
The getpacket function has the following prototype.
int getpacket(int fd, int *compidp, int *taskidp, packet_t *typep, int *lenp,
              unsigned char *buf);
The getpacket function reads a taskpacket_t header from fd and then reads into
buf the number of bytes specified by the length member. If successful, getpacket
returns 0. If unsuccessful, getpacket returns -1 and sets errno. The getpacket
function sets *compidp, *taskidp, *typep and *lenp from the compid, taskid, type
and length members of the packet header, respectively. If getpacket receives an
end-of-file while trying to read a packet, it returns -1 and sets errno. Since errno
will not automatically be set, you must pick an appropriate value. There is no
standard error number to represent end-of-file.
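One possible shape for getpacket is sketched below. It is not the code from the book's web site: the readblock helper is hypothetical, the types are repeated only to make the sketch self-contained, MAX_PACK_SIZE is an assumed buffer capacity, and EINVAL is simply the errno value chosen here to stand for end-of-file.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>

#define MAX_PACK_SIZE 1024   /* assumed capacity of the caller's buffer */

typedef enum { NEWTASK, DATA, DONE, TERMINATE } packet_t;
typedef struct {
    int compid;
    int taskid;
    packet_t type;
    int length;
} taskpacket_t;

/* Hypothetical helper: read exactly size bytes, restarting after EINTR.
   Returns 1 on success, 0 on end-of-file before the first byte, and
   -1 with errno set on an error or on end-of-file in mid-block. */
static int readblock(int fd, void *buf, size_t size) {
    unsigned char *p = buf;
    size_t done = 0;

    while (done < size) {
        ssize_t n = read(fd, p + done, size - done);
        if (n == -1 && errno == EINTR)
            continue;
        if (n == -1)
            return -1;
        if (n == 0) {
            if (done == 0)
                return 0;
            errno = EINVAL;           /* truncated block */
            return -1;
        }
        done += (size_t)n;
    }
    return 1;
}

int getpacket(int fd, int *compidp, int *taskidp, packet_t *typep,
              int *lenp, unsigned char *buf) {
    taskpacket_t header;
    int rc;

    assert(compidp && taskidp && typep && lenp && buf);
    rc = readblock(fd, &header, sizeof(header));
    if (rc == 0)
        errno = EINVAL;               /* end-of-file: value chosen by this sketch */
    if (rc <= 0)
        return -1;
    if (header.length < 0 || header.length > MAX_PACK_SIZE) {
        errno = EINVAL;               /* data would not fit in buf */
        return -1;
    }
    if (header.length > 0) {
        rc = readblock(fd, buf, (size_t)header.length);
        if (rc == 0)
            errno = EINVAL;           /* end-of-file right after the header */
        if (rc <= 0)
            return -1;
    }
    *compidp = header.compid;
    *taskidp = header.taskid;
    *typep = header.type;
    *lenp = header.length;
    return 0;
}
```

Reading in a loop matters because a single read on a pipe may return fewer bytes than requested. The caller is assumed to pass a buffer of at least MAX_PACK_SIZE bytes.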
The putpacket function has the following prototype.
int putpacket(int fd, int compid, int taskid, packet_t type, int len, unsigned char *buf);
The putpacket function assembles a taskpacket_t header from compid, taskid,
type and len. It then writes the packet header to fd followed by len bytes from buf.
If successful, putpacket returns 0. If unsuccessful, putpacket returns -1 and sets
errno.
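A matching sketch of putpacket might look like the following. Again this is an illustration rather than the provided code: the writeall helper is hypothetical, and the types are repeated only so that the block stands on its own.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>

typedef enum { NEWTASK, DATA, DONE, TERMINATE } packet_t;
typedef struct {
    int compid;
    int taskid;
    packet_t type;
    int length;
} taskpacket_t;

/* Hypothetical helper: write all size bytes, continuing after short
   writes and restarting after EINTR. Returns 0, or -1 with errno set. */
static int writeall(int fd, const void *buf, size_t size) {
    const unsigned char *p = buf;
    size_t done = 0;

    while (done < size) {
        ssize_t n = write(fd, p + done, size - done);
        if (n == -1 && errno == EINTR)
            continue;
        if (n == -1)
            return -1;
        done += (size_t)n;
    }
    return 0;
}

int putpacket(int fd, int compid, int taskid, packet_t type, int len,
              unsigned char *buf) {
    taskpacket_t header;

    assert(fd >= 0);
    if (len < 0 || (len > 0 && buf == NULL)) {
        errno = EINVAL;               /* nothing sensible to send */
        return -1;
    }
    header.compid = compid;
    header.taskid = taskid;
    header.type = type;
    header.length = len;
    if (writeall(fd, &header, sizeof(header)) == -1)
        return -1;
    if (len > 0 && writeall(fd, buf, (size_t)len) == -1)
        return -1;
    return 0;
}
```

Because the header and the data go out in separate write calls, two threads calling this function on the same descriptor at the same time could interleave their packets; a single-threaded dispatcher does not have that problem.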
For this part of the project you will be provided source code. The source code is
available at http://usp.cs.utsa.edu/us. It will also be available locally in the
directory ~sweany/public/3600/Project/Part1. You must include an
acknowledgement of the web site where we got the code in each file of your
project code. Three main components are important:
• ntpvm.h, which contains the data structures taskpacket_t and ntpvm_task_t
that hold the packet header and the task execution information.
• a2ts.c, a filter that reads ASCII characters from standard input, constructs a
task packet, and writes it as a binary packet to standard output. The a2ts
program writes all prompt messages to standard error, so it can be run either
with interactive prompts or with standard input redirected from a file.
• ts2a.c, a filter that reads binary packets from its standard input and outputs
them in ASCII format to its standard output. For testing, the standard output
of a2ts is piped into the standard input of ts2a. Input entered to a2ts will be
interleaved with output from ts2a, but this should not be a problem since ts2a
will not produce any output until a2ts has received an entire packet.
To complete PART I, you need to:
a. Write the getpacket and putpacket functions (also provided in the source
code).
b. Compile and run the program to make sure that there are no syntax
errors.
c. Test the program. (You can test the program using a shell pipe, |, or
using multiple terminal windows.)
d. Add debugging messages to the loop of the main program to show what
values are being read and written. All debugging messages should go to
standard error.
e. Replace the ts2a program with the ts2log program.
f. Test the program.
g. Document all tested execution.
5.2. PART II
In PART II, you need to add more features to the program started in PART I. The
objective of this part is to use the dispatcher with a single task. This task will
have no input. The only purpose of this part is to be able to set up the
communication pipes.
When the dispatcher receives a NEWTASK packet from standard input, it
creates the pipes, forks the child and executes the task. The dispatcher
monitors the readfd pipe for output from the task and forwards what it reads
as DATA packets to standard output. When the dispatcher encounters an
end-of-file on readfd, it waits for the child task to exit and then exits.
To complete PART II, you need to:
a. Read a packet from standard input, using getpacket. If the packet is not a
NEWTASK packet, then exit after outputting an error message.
b. Create a pipe for communication with a child task.
c. Fork a child to execute the command given in the NEWTASK packet of
step a. The child should redirect standard input and output to the pipe and
close all pipe file descriptors before executing the command. Use the
makeargv function to build the argument array (see the online resources for
this). If an error occurs, the child just exits after printing an informative
message.
d. Have the parent close all unneeded pipe descriptors so that the parent
can detect end-of-file on readfd.
e. Wait for output from the child on readfd. For this part of the assignment,
the child will be executing standard UNIX commands. Assume that the
child outputs only text. The dispatcher reads the child task's output from
readfd, wraps this output in a DATA packet, and sends the packet to
standard output by calling putpacket.
f. If getpacket returns an error, assume that this is an end-of-file. Close the
readfd and writefd descriptors for the task. Send a DONE packet to
standard output identifying the task and exit.
Logging the process will be rewarded with extra credit.
Test the program by using ls -l.
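Steps b through d above can be sketched roughly as follows. The spawn_task and reap_task names are hypothetical, the sketch uses two pipes (one per direction, as described in section 4.1), and it takes an argument array that is already built instead of calling makeargv.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: run argv[0] as a child task whose standard input
   and output are pipes. On success returns the child's PID and stores the
   dispatcher's ends of the pipes; on failure returns -1. */
pid_t spawn_task(char *const argv[], int *writefdp, int *readfdp) {
    int to_child[2];      /* dispatcher writes [1], child reads [0] */
    int from_child[2];    /* child writes [1], dispatcher reads [0] */
    pid_t pid;

    assert(argv != NULL && argv[0] != NULL);
    assert(writefdp != NULL && readfdp != NULL);
    if (pipe(to_child) == -1)
        return -1;
    if (pipe(from_child) == -1) {
        close(to_child[0]);
        close(to_child[1]);
        return -1;
    }
    pid = fork();
    if (pid == -1) {
        close(to_child[0]);
        close(to_child[1]);
        close(from_child[0]);
        close(from_child[1]);
        return -1;
    }
    if (pid == 0) {                       /* child */
        if (dup2(to_child[0], STDIN_FILENO) == -1 ||
            dup2(from_child[1], STDOUT_FILENO) == -1) {
            perror("dup2");
            _exit(1);
        }
        close(to_child[0]);               /* the copies on 0 and 1 remain */
        close(to_child[1]);
        close(from_child[0]);
        close(from_child[1]);
        execvp(argv[0], argv);
        perror("execvp");                 /* reached only if execvp failed */
        _exit(1);
    }
    close(to_child[0]);                   /* parent keeps only its own ends */
    close(from_child[1]);
    *writefdp = to_child[1];
    *readfdp = from_child[0];
    return pid;
}

/* Hypothetical helper: wait for the child, restarting after EINTR. */
int reap_task(pid_t pid, int *statusp) {
    while (waitpid(pid, statusp, 0) == -1)
        if (errno != EINTR)
            return -1;
    return 0;
}
```

The parent must close its copy of the child's write end. Otherwise a read on readfd would never report end-of-file, because a writer for that pipe would still exist in the dispatcher itself.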
5.3. PART III
In PART III, the dispatcher will have to handle not only output from the child
process but also input to it. You need to implement an array of ntpvm_task_t
entries in which all the information about the tasks is stored. Implement the
tasks array as an object with appropriate access functions in a different file.
For this part, we only allow one task at a time.
In addition, you need to implement input and output threads and create
mechanisms to synchronize their access to this structure. Use POSIX threads
for this. Mutex locks are the recommended synchronization method.
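A tasks object of this kind might be sketched as below. The function names, the single lock protecting the whole array and the trimmed-down entry type are assumptions made for the illustration; a per-entry lock would be another reasonable design.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_TASKS 10   /* assumed capacity of the tasks array */

typedef struct {
    int compid;        /* computation ID; -1 marks a free entry */
    int taskid;
    int writefd, readfd;
    int recvpackets, recvbytes;
    int sentpackets, sentbytes;
    int endinput;
} ntpvm_task_t;

/* The array and its lock are private to this file; other files are
   expected to go through the access functions below. */
static ntpvm_task_t tasks[MAX_TASKS];
static pthread_mutex_t taskslock = PTHREAD_MUTEX_INITIALIZER;

/* Mark every entry free; call once before any thread is started. */
void tasks_init(void) {
    int i;
    pthread_mutex_lock(&taskslock);
    for (i = 0; i < MAX_TASKS; i++)
        tasks[i].compid = -1;
    pthread_mutex_unlock(&taskslock);
}

/* Caller must hold taskslock. Returns the key of the matching entry or -1. */
static int find_locked(int compid, int taskid) {
    int i;
    for (i = 0; i < MAX_TASKS; i++)
        if (tasks[i].compid != -1 && tasks[i].compid == compid &&
            tasks[i].taskid == taskid)
            return i;
    return -1;
}

/* Look up a task; returns its key (array index) or -1 if there is none. */
int tasks_find(int compid, int taskid) {
    int key;
    pthread_mutex_lock(&taskslock);
    key = find_locked(compid, taskid);
    pthread_mutex_unlock(&taskslock);
    return key;
}

/* Add a task; returns its key, or -1 for a duplicate or a full array. */
int tasks_add(int compid, int taskid, int writefd, int readfd) {
    int i, key = -1;

    assert(compid >= 0);
    pthread_mutex_lock(&taskslock);
    if (find_locked(compid, taskid) == -1) {
        for (i = 0; i < MAX_TASKS && key == -1; i++)
            if (tasks[i].compid == -1)
                key = i;
        if (key != -1) {
            ntpvm_task_t fresh = { compid, taskid, writefd, readfd,
                                   0, 0, 0, 0, 0 };
            tasks[key] = fresh;
        }
    }
    pthread_mutex_unlock(&taskslock);
    return key;
}

/* Record one DATA packet of nbytes delivered to the task. */
int tasks_count_recv(int key, int nbytes) {
    int rc = -1;

    if (key < 0 || key >= MAX_TASKS)
        return -1;
    pthread_mutex_lock(&taskslock);
    if (tasks[key].compid != -1) {
        tasks[key].recvpackets++;
        tasks[key].recvbytes += nbytes;
        rc = 0;
    }
    pthread_mutex_unlock(&taskslock);
    return rc;
}

/* Free an entry so that it can be reused. */
void tasks_remove(int key) {
    if (key < 0 || key >= MAX_TASKS)
        return;
    pthread_mutex_lock(&taskslock);
    tasks[key].compid = -1;
    pthread_mutex_unlock(&taskslock);
}
```

Doing the duplicate check and the slot search under one lock acquisition keeps two threads from claiming the same slot or adding the same task twice.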
The input thread monitors standard input and takes action according to the input
it receives. You must write an input function that executes the following steps in a
loop until it encounters an end-of-file on standard input.
a. Read a packet from standard input by using getpacket.
b. Process the packet.
After falling through the loop, close writefd and call pthread_exit.
Processing a packet depends on the packet type.
NEWTASK
• If a child task is already executing, discard the packet and output an error
message.
• Otherwise, if no child task exists, create two pipes to handle the task's
input and output.
• Update the tasks object, and fork a child. The child should redirect its
standard input and output to the pipes and use the makeargv function to
construct the argument array before calling execvp to execute the
command given in the packet.
• Create a detached output thread by calling pthread_create. Pass a key for
the tasks entry of this task as an argument to the output thread. The key is
just the index of the appropriate tasks array entry.
DATA
• If the packet's computation and task IDs don't match those of the
executing task or if the task's endinput is true, output an error message
and discard the packet.
• Otherwise, copy the data portion to writefd.
• Update the recvpackets and recvbytes members of the appropriate task
entry of the tasks object.
DONE
• If the packet's computation and task IDs do not match those of the
executing task, output an error message and discard the packet.
• Otherwise, close the writefd descriptor if it is still open.
• Set the endinput member for this task entry.
The output thread handles input from the readfd descriptor of a particular task.
The output thread receives a tasks object key to the task it monitors as a
parameter. Write an output function that executes the following steps in a loop
until it encounters an end-of-file on readfd.
a. Read data from readfd.
b. Call putpacket to construct a DATA packet and send it to standard output.
c. Update the sentpackets and sentbytes members of the appropriate task
entry in the tasks object.
After falling through the loop because of an end-of-file or an error on readfd, the
output thread does the following.
• Close the readfd and writefd descriptors for the task.
• Execute wait for the child task.
• Send a DONE packet with the appropriate computation and task IDs to
standard output.
• Output information about the finished task to standard error or to the
remote logger. Include the computation ID, the task ID, the total bytes sent
by the task, the total packets sent by the task, the total bytes received by
the task and the total packets received by the task.
• Deactivate the task entry by setting the computation ID to -1.
• Call pthread_exit.
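The output thread described above could be organized along the lines of the sketch below. It makes several simplifying assumptions: the tasks array is a plain global protected by one lock, send_packet is a cut-down stand-in for putpacket without retries, outfd is a hypothetical variable naming the descriptor that receives packets, and taskpid is an assumed member. The sketch also shows one way of passing the key to a detached thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

#define MAX_TASKS 10
typedef enum { NEWTASK, DATA, DONE, TERMINATE } packet_t;
typedef struct { int compid, taskid; packet_t type; int length; } taskpacket_t;
typedef struct {
    int compid, taskid, readfd, writefd;
    int sentpackets, sentbytes, recvpackets, recvbytes;
    pid_t taskpid;                 /* assumed member: PID of the child */
} ntpvm_task_t;

static ntpvm_task_t tasks[MAX_TASKS];
static pthread_mutex_t taskslock = PTHREAD_MUTEX_INITIALIZER;
static int outfd = STDOUT_FILENO;  /* hypothetical: where packets are sent */

/* Simplified stand-in for putpacket: no retry after short writes or EINTR. */
static int send_packet(int compid, int taskid, packet_t type, int len,
                       const void *data) {
    taskpacket_t h;
    h.compid = compid; h.taskid = taskid; h.type = type; h.length = len;
    if (write(outfd, &h, sizeof(h)) != (ssize_t)sizeof(h))
        return -1;
    if (len > 0 && write(outfd, data, (size_t)len) != (ssize_t)len)
        return -1;
    return 0;
}

static void *output_thread(void *arg) {
    int key = (int)(intptr_t)arg;      /* index of this task's entry */
    ntpvm_task_t *t = &tasks[key];
    ntpvm_task_t done;
    unsigned char buf[1024];
    ssize_t n;

    /* readfd, compid and taskid are changed only by this thread. */
    while ((n = read(t->readfd, buf, sizeof(buf))) > 0) {
        if (send_packet(t->compid, t->taskid, DATA, (int)n, buf) == -1)
            break;
        pthread_mutex_lock(&taskslock);
        t->sentpackets++;
        t->sentbytes += (int)n;
        pthread_mutex_unlock(&taskslock);
    }
    pthread_mutex_lock(&taskslock);
    close(t->readfd);
    if (t->writefd != -1) {            /* the input thread may have closed it */
        close(t->writefd);
        t->writefd = -1;
    }
    done = *t;                         /* snapshot for the final report */
    pthread_mutex_unlock(&taskslock);
    if (done.taskpid > 0)
        waitpid(done.taskpid, NULL, 0);
    send_packet(done.compid, done.taskid, DONE, 0, NULL);
    fprintf(stderr, "computation %d task %d: sent %d bytes in %d packets, "
            "received %d bytes in %d packets\n", done.compid, done.taskid,
            done.sentbytes, done.sentpackets, done.recvbytes, done.recvpackets);
    pthread_mutex_lock(&taskslock);
    t->compid = -1;                    /* entry may now be reused */
    pthread_mutex_unlock(&taskslock);
    pthread_exit(NULL);
    return NULL;                       /* not reached */
}

/* Start a detached output thread for tasks[key]; returns 0 or an error number. */
int start_output_thread(int key) {
    pthread_attr_t attr;
    pthread_t tid;
    int error;

    assert(key >= 0 && key < MAX_TASKS);
    if ((error = pthread_attr_init(&attr)) != 0)
        return error;
    error = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (error == 0)
        error = pthread_create(&tid, &attr, output_thread,
                               (void *)(intptr_t)key);
    pthread_attr_destroy(&attr);
    return error;
}
```

Passing the key by value inside the pointer argument avoids sharing a variable that the input thread might overwrite before the new thread has read it.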
Test the program by starting tasks to execute various cat and ls -l commands.
Try other filters such as sort to test the command-line parsing. For this part you
should not enter a new command until the previous command has completed.
To complete PART III, you need to:
a. Use pthreads in NTPVM.
b. Apply synchronization techniques to the input and output threads.
c. Test executing various Linux commands one by one.
5.4. PART IV
In PART IV, you should modify PART III in such a way that multiple computations
and tasks can be active simultaneously. You should restrict the number of
tasks by setting MAX_TASKS. In this phase, NTPVM will allow a NEWTASK
packet to be read by the dispatcher before previous tasks have terminated.
When a new NEWTASK packet comes in, find an available slot in the tasks
object, create a new set of pipes, and fork a new child to execute the command.
Don't enter any duplicates in the tasks array.
When another request comes in, the input thread creates a new output thread.
Since multiple output threads write to standard output, define an additional mutex
lock to synchronize output on the dispatcher's standard output.
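The idea of the additional lock can be sketched as follows. The write_packet_locked wrapper, the toy two-byte header and the demo_writer thread are all hypothetical and exist only to show that a header and its data stay together when every writer takes the same mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>
#include <sys/types.h>

/* One lock shared by every thread that writes packets to the same output. */
static pthread_mutex_t outlock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical wrapper: write a header and its data as one unit.
   Returns 0 on success and -1 on failure. Short writes are treated as
   failures to keep the sketch small. */
int write_packet_locked(int fd, const void *header, size_t hlen,
                        const void *data, size_t dlen) {
    int rc = 0;

    assert(header != NULL && hlen > 0);
    if (pthread_mutex_lock(&outlock) != 0)
        return -1;
    if (write(fd, header, hlen) != (ssize_t)hlen)
        rc = -1;
    else if (dlen > 0 && write(fd, data, dlen) != (ssize_t)dlen)
        rc = -1;
    pthread_mutex_unlock(&outlock);
    return rc;
}

/* Demo only: arguments for one writer thread. */
typedef struct {
    int fd;              /* descriptor shared by all writers */
    unsigned char id;    /* value that fills this writer's packets */
    int count;           /* number of packets to send */
} writer_arg_t;

/* Demo only: send count packets, each a toy header (id, data length)
   followed by 8 data bytes that all equal id. */
void *demo_writer(void *arg) {
    writer_arg_t *w = arg;
    unsigned char header[2];
    unsigned char data[8];
    int i;

    header[0] = w->id;
    header[1] = (unsigned char)sizeof(data);
    for (i = 0; i < (int)sizeof(data); i++)
        data[i] = w->id;
    for (i = 0; i < w->count; i++)
        if (write_packet_locked(w->fd, header, sizeof(header),
                                data, sizeof(data)) == -1)
            break;
    return NULL;
}
```

Without the lock, one thread's header could be followed by another thread's data, and the program reading the dispatcher's standard output would misparse the stream. In the dispatcher, the same lock would be held around each call that writes a complete packet.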
The dispatcher should kill the process when it receives a TERMINATE packet
and report the termination by writing a TERMINATE message to its standard
output.
To complete PART IV, you need to:
a. Allow multiple simultaneous computations.
b. Allow multiple tasks, each with its own set of communication pipes.
c. Handle duplicate tasks.
d. Create multiple output threads and handle them with synchronization
techniques.
e. Handle TERMINATE packets.
6. GRADING
The project is divided into four parts. Each part is assigned as one of the four
major assignments for the course.
• Major Assignment 1: Part I is due on 02/20/2012 at 11:59pm
• Major Assignment 2: Part II is due on 03/16/2012 at 11:59pm
• Major Assignment 3: Part III is due on 04/13/2012 at 11:59pm
• Major Assignment 4: Part IV is due on 05/02/2012 at 11:59pm
Each assignment will be weighted as 25% of your grade assigned to Major
Assignments.
All assignments must be submitted in a package with the following elements:
• Source code of the programs with descriptive comments.
• A Makefile to compile the programs. Add a clean directive.
• A readme file that describes:
  o Which CSP machine was used to run/test the code.
  o The compilation process.
  o The execution process (it will be good if you add the actual
    execution output from every PART of the project).
  o The expected results.
Each part of the project will be graded based upon how well it satisfies all the
explicit instructions given for each section of the project. Note that every part of
the project requires that the previous part be completed and running properly.