A Model and API for Communication between BOINC Workunits

Nagarajan Kanna (nkanna@cs.uh.edu)
Jaspal Subhlok (jaspal@uh.edu)
Department of Computer Science
University of Houston, Houston, TX 77006
(DRAFT)

1. Motivation

The current BOINC platform supports execution of parallel applications composed of independent work units. The work units cannot communicate with each other. This document presents a method and API for data communication between work units.

2. Synopsis

A parallel application consists of communicating processes. In BOINC, processes are executed as work units. Our basic model for communication between work units is based on reading and writing data objects to/from a logical "dataspace". Data objects in the dataspace are identified by unique tags. The fundamental concept is similar to the LINDA model and some "publish-subscribe" systems.

In order to enable some forms of communication between work units, it is essential that work units be identifiable from user code. We introduce the notion of a user-managed workunit Id. (This is analogous to a process Id in other parallel systems.) A workunit Id is a parameter that is provided when a work unit is created. An executing work unit can find its own Id using an API call. Multiple work units can have the same Id if they represent the same computation. This can happen when redundant computation is employed to improve robustness or when a new work unit is created to replace an existing work unit that may have failed.

Work units execute read/write operations with identification tags for communication. A write operation will store a data object in an abstract shared dataspace with an identification tag. Once a data object is written, any process can retrieve the data object with a read operation with a matching tag. The data object will continue to exist in the dataspace after a read operation. Thus, multiple work units can receive a data object written by a single sending work unit.
An explicit clear operation deletes a data object from the dataspace. Since there is a single shared dataspace for the entire application and identification tags are unique, communication works correctly with redundant work units.

The workunit Id is typically a component of the identification tag when one work unit needs to communicate with another. As a trivial example, if each workunit has only one data object to share, it can use its own workunit Id as the tag to index the data object in the dataspace.

This document focuses on the model and API for communication, not on the implementation. The simplest implementation of such a dataspace is on a single data server node (which can be the BOINC server), although distributed implementations are also planned.

3. API

We propose the following API for discovery of workunit Ids and communication between workunits:

    int boinc_write(int tag, int dataSize, byte *buffer)
    int boinc_read(int tag, int dataSize, byte *buffer)
    int boinc_getWorkunitId()
    int boinc_getNumWorkunits()
    int boinc_clear(int tag)
    int boinc_clearall()

The parameters are:

    tag       - Identifies (or indexes) each data object in the dataspace for read/write.
    dataSize  - Number of bytes to be written to or read from the dataspace.
    buffer    - Pointer to the data being written to or read from the dataspace.

boinc_write
Writes the given data object (buffer), indexed by the tag, into the dataspace. It is a non-blocking operation. An ERROR return implies the operation cannot be completed.

boinc_read
Reads a data object (with size less than or equal to dataSize) with the given tag from the dataspace. It is a blocking operation that will wait until data is available. A successful operation returns the actual size of the data object that is read. An ERROR return implies the operation cannot be completed.

boinc_getWorkunitId
Returns the Id of the current workunit within the application. If the workunit cannot be identified, it returns -1.
boinc_getNumWorkunits
Returns the number of workunits in the given application. If the number of workunits cannot be determined, it returns -1.

boinc_clear
Deletes the given tag along with the corresponding data object.

boinc_clearall
Clears the dataspace for the given application.

3.1 Work Unit naming convention

In BOINC, a work unit name can be specified during workunit creation. To enable communication, the workunit Id and the number of workunits have to be encoded in the workunit name as follows:

    <WorkUnitName>_<WorkunitId>_<NumWorkunits>

The runtime library will automatically extract the required information from the text string. For example, a work unit named 'vip_run1_0_16' is workunit Id 0 among 16 workunits.

4. Example: Solving a 2-D Laplace equation

4.1 Problem Description

The 2-D Laplace equation is

\[ \frac{\partial^2 u(x,y)}{\partial x^2} + \frac{\partial^2 u(x,y)}{\partial y^2} = 0 \qquad \text{(Equation 1)} \]

Central discretization leads to

\[ \frac{u_{i+1,j} - 2u_{i,j} + u_{i-1,j}}{h^2} + \frac{u_{i,j+1} - 2u_{i,j} + u_{i,j-1}}{h^2} = 0 \qquad \text{(Equation 2)} \]

This leads to a set of linear equations which can be solved by the BiConjugate Gradient Stabilized method (Bi-CGSTAB). It is an iterative method, which creates an approximate solution and improves it on successive iterations. In a parallel implementation, boundary rows/columns (ghost cells) are exchanged among processes between iterations, as illustrated below.

[Figure: Ghost cells (copies of a row/column of data from a neighbor process) are exchanged between iterations at process boundaries; refer to Equation 2.]

4.2 Using BOINC

This computation can be implemented in a straightforward way in BOINC with the API discussed in this paper, as illustrated below.

Creating Work Units

    $ create_work -appname laplace -wu_name laplace_run1_0_12 -wu_template templates/laplace_wu -result_template templates/laplace_result -min_quorum 1 -target_nresults 1 inputfile

Similarly, 11 more work units (Ids 1 through 11) have to be created, for a total of 12.
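The name passed with -wu_name follows the convention of Section 3.1. As a minimal sketch of how a runtime library might extract the workunit Id and the number of workunits from such a name, consider the following. The function names parse_field and parse_workunit_name are hypothetical and are not part of BOINC or of the proposed API; the sketch also assumes that the two trailing fields are plain non-negative decimal integers.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parses the decimal field [start, end) into *out.
 * Returns 0 on success, -1 if the field is empty, contains anything other
 * than digits, or does not fit in an int. */
static int parse_field(const char *start, const char *end, int *out)
{
    char *stop;
    long value;

    if (start == end || !isdigit((unsigned char)*start))
        return -1;
    errno = 0;
    value = strtol(start, &stop, 10);
    if (errno != 0 || stop != end || value > INT_MAX)
        return -1;
    *out = (int)value;
    return 0;
}

/* Hypothetical helper: splits <WorkUnitName>_<WorkunitId>_<NumWorkunits>.
 * Returns 0 and fills *id and *count on success. Returns -1 if the name
 * does not follow the convention; the outputs are then unspecified. */
int parse_workunit_name(const char *name, int *id, int *count)
{
    const char *last = strrchr(name, '_');  /* before <NumWorkunits> */
    const char *prev;

    if (last == NULL || last == name)
        return -1;

    /* Walk backwards to the separator before <WorkunitId>. */
    prev = last - 1;
    while (prev > name && *prev != '_')
        prev--;
    if (prev == name)
        return -1;  /* no second separator, or empty <WorkUnitName> */

    if (parse_field(prev + 1, last, id) != 0 ||
        parse_field(last + 1, name + strlen(name), count) != 0)
        return -1;

    /* Ids are assumed to run from 0 to NumWorkunits - 1. */
    if (*count < 1 || *id >= *count)
        return -1;
    return 0;
}
```

Splitting from the right lets the <WorkUnitName> part itself contain underscores, as in 'vip_run1_0_16', which this sketch parses as Id 0 among 16 workunits.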
Process mapping and determining neighbor processes

The workunits are assigned parts of the global array as shown here:

[Figure: assignment of parts of the global array to workunits]

Code Snippet

    …
    boinc_init();
    my_workunitId = boinc_getWorkunitId();
    num_workunits = boinc_getNumWorkunits();
    …
    /* The neighboring workunit Ids nup, ndown, nleft and nright are
       computed based on the distribution illustrated above. */
    for (iterationCount = 1; iterationCount <= N; iterationCount++) {
        /* Communication in the Y direction. The getTag function returns a
           unique tag for a given list of arguments, which gives a
           well-defined index in the dataspace for exchanging data with the
           up/down workunits in each iteration. */
        sendTagUp   = getTag(my_workunitId, nup, iterationCount);
        recvTagDown = getTag(ndown, my_workunitId, iterationCount);

        /* Send data to the workunit above and receive from the workunit
           below. sendBuffer and recvBuffer are separate byte arrays, so the
           incoming ghost row does not overwrite the outgoing boundary row. */
        boinc_write(sendTagUp, sizeof(sendBuffer), sendBuffer);
        boinc_read(recvTagDown, sizeof(recvBuffer), recvBuffer);

        /* The above steps are repeated in each direction. */
    }
    ……

Normal Program Execution

The values of ghost cells are exchanged with neighboring workunits in each iteration. We have to make sure that the tag values used in successive iterations and for different communication pairs are unique. We also have to construct the data objects to be transmitted, as the underlying data may not be contiguous.

Failure scenario

Suppose a workunit (say B) fails to complete its operation. Then other workunits (say A) may continue to wait for data that will never become available in the dataspace. When work units are created in the BOINC server, an expiry time is set for each work unit. Ultimately workunit B will be considered to have failed and will be reassigned. Workunit A may also fail because of the wait and have to be reassigned. This may lead to repeated writes to the same dataspace location, but does not impact the final results. The application will ultimately complete when a work unit associated with each process completes.
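The uniqueness requirement on tags stated under Normal Program Execution can be met in several ways. One possible encoding, given here only as a sketch, treats the sender Id, the receiver Id and the iteration count as digits of a mixed-radix number. The function get_tag below is hypothetical: it is not the getTag helper used in the code snippet, and its fourth parameter (the number of workunits) is an assumption of this sketch.

```c
#include <limits.h>

/* Hypothetical tag encoding, not part of the proposed API.
 * Maps (sender Id, receiver Id, iteration) to one non-negative int:
 *     tag = iteration * n * n + sender * n + receiver
 * where n is the number of workunits. Distinct triples with Ids in
 * [0, n) give distinct tags. Returns -1 if an argument is out of range
 * or the tag would not fit in an int. */
int get_tag(int sender, int receiver, int iteration, int num_workunits)
{
    long long n = num_workunits;
    long long pairs;
    long long tag;

    if (num_workunits <= 0 || iteration < 0 ||
        sender < 0 || sender >= num_workunits ||
        receiver < 0 || receiver >= num_workunits)
        return -1;

    pairs = n * n;               /* number of (sender, receiver) pairs */
    if (pairs > INT_MAX)
        return -1;               /* too many workunits for an int tag */

    tag = (long long)iteration * pairs + (long long)sender * n + receiver;
    if (tag > INT_MAX)
        return -1;               /* iteration count too large */
    return (int)tag;
}
```

With 12 workunits, get_tag(0, 1, 1, 12) is 145 and get_tag(1, 0, 1, 12) is 156, so the two directions of one exchange never collide. Because the tag depends only on workunit Ids and the iteration count, a reassigned or redundant work unit recomputes the same tag and overwrites the same dataspace location, which matches the behavior described in the failure scenario.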
Redundant computation

Suppose two identical work units are created for each process, say A1, A2, B1, B2, C1, C2. Each of B1/B2 (and C1/C2) will compute the same result and write (or overwrite) it to the dataspace with the same tag. Work units A1/A2 are able to receive the data objects from the dataspace once B1 or B2, and C1 or C2, have written into the dataspace. The computation will terminate successfully as long as one copy of each of A1/A2, B1/B2 and C1/C2 completes normally.

5. Discussion

This document is a basic draft, and many issues are not addressed explicitly. Some of these are listed here.

5.1 Verification API

    int boinc_write_verify(int tag, int dataSize, byte *buffer, int (*checkData)())

The objective of boinc_write_verify is to check whether redundant work units are computing identical data objects with the same tag. The basic boinc_write writes the given data directly to the shared dataspace. boinc_write_verify is an extension to the basic API that requires a user-created verification function, passed as a function pointer (checkData). This function performs the actual data comparison or verification.

5.2 Helper API

The basic API that we have listed will be sufficient to accomplish our goals. In addition, we will provide a set of helper functions that will enhance the productivity of application developers. Some examples:

- Conversion functions between primitive data types and byte arrays.
- Conversion of local indexes and process Ids to tag values. This function will take an n-tuple of indexing integers and create a unique global tag. As an example, the programmer provides the row and column of an array element, along with its own process Id, and the helper function converts them to a single unique tag (similar to getTag in the example).
- A command to initiate the execution of parallel applications (say 'boincrun'). The input will be the number of processes to be used and the input files to be used for starting the application.
This program (boincrun) is responsible for creating the various work units to be executed in BOINC clients. (boincrun is similar to mpirun in MPI.)

All of these functions are simply shortcuts for activities that a programmer can also implement directly. They do not require any changes to the execution infrastructure.

5.4 Implementation

This document focuses only on the API. We list some of the higher level implementation issues.

The basic implementation will have the BOINC server or a designated node as the communication server. However, future implementations may employ direct client-to-client communication.

The API intentionally does not specify how long an object remains in the dataspace after it is written and read but never cleared. An implementation may choose to automatically clear outdated data items, loosely analogous to garbage collection in programming language runtimes.

5.5 Extensions

This basic communication framework is designed to increase the flexibility of BOINC applications. It can, however, also be used to implement a message passing framework like MPI. An implementation could also conceivably support inter-application communication in addition to inter-process communication within an application.
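To make the dataspace semantics and the message passing remark concrete, the following is a minimal single-process sketch, not an implementation of the proposed API. All names (mock_write, mock_read, mock_clear, msg_send, msg_recv) and the fixed capacities are hypothetical. Unlike the blocking boinc_read, mock_read returns -1 immediately when no object with the tag exists, which keeps the sketch free of threads and networking.

```c
#include <stddef.h>
#include <string.h>

#define MAX_OBJECTS 64   /* illustrative capacity of the mock dataspace */
#define MAX_BYTES   256  /* illustrative maximum size of one object */

/* One tagged data object in the mock dataspace. */
struct object {
    int used;
    int tag;
    int size;
    unsigned char data[MAX_BYTES];
};

static struct object space[MAX_OBJECTS];

/* Returns the stored object with the given tag, or NULL if absent. */
static struct object *find(int tag)
{
    int i;
    for (i = 0; i < MAX_OBJECTS; i++)
        if (space[i].used && space[i].tag == tag)
            return &space[i];
    return NULL;
}

/* Stores the object, overwriting any object with the same tag.
 * Returns 0 on success, -1 on a bad size or a full dataspace. */
int mock_write(int tag, int size, const unsigned char *buf)
{
    struct object *obj = find(tag);
    int i;

    if (size < 0 || size > MAX_BYTES)
        return -1;
    for (i = 0; obj == NULL && i < MAX_OBJECTS; i++)
        if (!space[i].used)
            obj = &space[i];
    if (obj == NULL)
        return -1;
    obj->used = 1;
    obj->tag = tag;
    obj->size = size;
    if (size > 0)
        memcpy(obj->data, buf, (size_t)size);
    return 0;
}

/* Copies the object into buf and returns its size. The object stays in
 * the dataspace. Returns -1 if the tag is absent or the object is larger
 * than max_size. */
int mock_read(int tag, int max_size, unsigned char *buf)
{
    struct object *obj = find(tag);

    if (obj == NULL || obj->size > max_size)
        return -1;
    if (obj->size > 0)
        memcpy(buf, obj->data, (size_t)obj->size);
    return obj->size;
}

/* Deletes the tag and its object. Returns 0, or -1 if the tag is absent. */
int mock_clear(int tag)
{
    struct object *obj = find(tag);

    if (obj == NULL)
        return -1;
    obj->used = 0;
    return 0;
}

/* Message passing on top of the dataspace: a send is a write. */
int msg_send(int tag, int size, const unsigned char *buf)
{
    return mock_write(tag, size, buf);
}

/* A receive is a read followed by a clear, so each message is consumed
 * exactly once. */
int msg_recv(int tag, int max_size, unsigned char *buf)
{
    int n = mock_read(tag, max_size, buf);

    if (n >= 0)
        mock_clear(tag);
    return n;
}
```

A plain mock_read can be repeated any number of times, which is what lets redundant work units all receive the same object. The consuming msg_recv gives up that property: once one copy of a receiver has cleared the tag, a redundant copy finds nothing to read. A message passing layer built this way would therefore need a different clearing policy whenever redundant computation is used.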