Scalability of Data in Parallel Computations
Final Report - June 3, 2002
Occhio Orsini
COM3620

The last section (7) explains the project status and completion state; the other sections are the same as the interim report, with updates.

1. Problem Overview

Large parallel systems often need large data sets. To run efficiently, the data needs to be in memory. However, all systems have a limited amount of memory. This project resolves that problem by allowing parallelized programs to also distribute their data access in a structured way, independent of the algorithm parallelization but within the same infrastructure.

2. Solution Overview

The solution adds the concept of distributed data nodes to the parallel environment, along with a new programming API specifically for managing and accessing the data nodes. A single configuration file will describe the entire distributed data node system. This will allow any client node to access any data node, as all data nodes will be identified in the configuration file. Data nodes have a very different role to play than traditional execution nodes in a parallel system. As such, they will often need to be tuned for individual machines, so part of the configuration process will allow data to be explicitly located on a particular server.

An additional benefit of this solution is that data no longer needs to be duplicated for execution: if you have 100 parallel processes, you now need only 1/100 as much data memory to execute the program, as the data is loaded only once in the data nodes.

A future enhancement project could be to support generic data node partitioning, where the user has no control over how the data is partitioned and it is dynamically load balanced at initialization time.

Throughout the rest of this document I will abbreviate Distributed Data Nodes as DDN.

3. Architecture and Design

The DDN architecture has no direct correlation to the TOP-C architecture, but it will interoperate effectively with TOP-C. The DDN system will consist of a server executable, a server-side dynamic library provided by the user, a client static library, and a configuration file. Each of these system pieces is reviewed below. The related functions are covered in section 4.

The configuration file specifies which hosts will manage which portions of the data set and where the user-provided dynamic library is located. More details on the configuration file are provided in section 4.

The server executable is started by the user on each host system with the config file location passed as an argument. The process parses the config file and, if its host name appears, it loads the portion of the data set it is supposed to service. The functions to load and look up values in the data set are provided by the user-specified dynamic library. All DDN servers must be started before any clients are initialized. The servers will run indefinitely, or until the user kills the process.

The client executable that will actually use the data is built and links in the client DDN library. An exact copy of the configuration file must be available to the client and must be provided when the DDN connection is initialized. The client calls the DDN get functions wherever data from the DDN is needed as part of its program. When these functions are executed, a socket call is made to the appropriate DDN server and the requested data value is returned to the client program.
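To make this client-side flow concrete, here is a minimal sketch of what a client program might look like, using the function signatures described in section 4. The handle and keyValueStruct type names, field names, and error handling are my illustration and not a fixed part of the design; the authoritative declarations live in ddatanode.h.

#include "ddatanode.h"   /* public client header, per the deliverables list */

int main(int argc, char **argv)
{
    char errorText[256];
    DDN_handle handle;          /* assumed handle type; the real typedef is in ddatanode.h */
    DDN_keyValueStruct kv;      /* assumed struct holding one key and room for its value */

    /* Parse the config file and connect to every DDN server listed in it. */
    handle = DDN_init("ddnconfig.cfg", errorText);

    /* Request the value stored under key 12345. Passing server number 1
       avoids the (expensive) DDN_key_mapper lookup, because the config
       file already tells us which server owns keys 1..999999. */
    kv.key = 12345;
    DDN_get_data(handle, 1, &kv);

    /* ... use kv.value in the parallel computation ... */

    /* Disconnect from all servers and release resources. */
    DDN_finalize(handle);
    return 0;
}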
Here is a diagram that shows how a sample parallelized system might be constructed using the DDN infrastructure.

[Figure: Sample System Layout -- several parallel instances connected to data node servers, which load from a data source.]

The current design provides an interface that can support a single key lookup or a multiple key lookup. If time permits, the multiple key lookup will also be implemented, as it will provide additional performance benefits by providing multiple data lookups per socket call.

The current design specifies that the user start each individual DDN server. In the future, a DDN manager could be written that takes care of starting and stopping the individual DDN servers.

4. API

The DDN API contains functions for initializing and loading data sets, looking up data in the data sets, and determining which data set contains a piece of data. Some of this functionality is implemented with callback functions. The user will need to write all the callbacks, and that is the only API interaction they will have on the server side. On the client side the user will call a few additional functions, and there are others that they normally will not use. I have split the functions into client side and server side.

4.1. Client Side Functions

4.1.1. DDN_init(configFile, errorText)
The DDN_init function parses the configuration file, builds a table of data node servers, and initializes a connection to each one. It returns a handle that is used to access the DDN servers.

4.1.2. DDN_finalize(handle)
The DDN_finalize function disconnects from all data node servers and frees all allocated resources.

4.1.3. DDN_get_data(handle, server_number, *keyValueStruct)
The DDN_get_data function takes a server number and a data key and requests the data value corresponding to the key from the appropriate data node. If the server number is blank, the server is looked up with the DDN_key_mapper function; otherwise the specified server is used. The keyValueStruct can hold one or more keys, so multiple key lookups can be supported in the future. Using the DDN_key_mapper function adds a lot of overhead to the DDN_get_data call, so it is recommended that when you know which server contains the hash value you are looking for, you specify the server number.

4.1.4. DDN_key_mapper(handle, key, *server)
The DDN_key_mapper function takes an arbitrary data value key and looks through the list of data node servers specified in the configuration file to determine which server contains the specified data value. This function is only called if a server number is not known ahead of time.

4.2. Server Side Functions

4.2.1. DDN_server_init(configFile)
The DDN_server_init function parses the configuration file and figures out which data node server it is and what part of the data set it needs to load.

4.2.2. DDN_server_finalize(handle)
The DDN_server_finalize function frees all allocated resources.

4.2.3. DDN_server_load(handle, *LoadDataByKey())
The DDN_server_load function loads its portion of the data set by calling the user-provided LoadDataByKey function.

4.2.4. DDN_server_serve(handle, *LookupDataByKey())
The DDN_server_serve function is an infinite loop that waits on a socket to receive data key lookup requests and returns the corresponding data value.
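Taken together, these four functions suggest a very small server main program. The sketch below shows how they might be tied together with the user callbacks described in sections 4.2.5 through 4.2.7. The callback prototypes and the assumption that DDN_server_init returns the handle used by the other calls are mine, inferred from the descriptions above.

#include "ddatanode.h"

/* User-provided callbacks (see 4.2.5-4.2.7); these prototypes are assumed. */
extern int LoadDataByKey(char *dataSource, long begKey, long endKey, void **userHandle);
extern int LookupDataByKey(void *userHandle, DDN_keyValueStruct *kv);

int main(int argc, char **argv)
{
    /* argv[1] is the config file location passed by the user at startup. */
    DDN_handle handle = DDN_server_init(argv[1]);

    /* Load this host's slice of the data set via the user callback. */
    DDN_server_load(handle, LoadDataByKey);

    /* Serve key lookup requests forever, or until the process is killed. */
    DDN_server_serve(handle, LookupDataByKey);

    /* Only reached if the serve loop ever exits. */
    DDN_server_finalize(handle);
    return 0;
}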
4.2.5. LoadDataByKey(dataSource, begKey, endKey, *userHandle)
The LoadDataByKey callback is provided by the user and loads all data key values from begKey to endKey from the data source and stores them under userHandle.

4.2.6. LookupDataByKey(userHandle, *keyValueStruct)
The LookupDataByKey callback is provided by the user and finds an individual data value in the data image specified by userHandle.

4.2.7. FreeUserHandle(userHandle)
The FreeUserHandle callback is provided by the user and cleans up all memory and resources allocated by LoadDataByKey.

The configuration file that controls which nodes manage which piece of the data set is straightforward. Below is a sample of the configuration file, with the syntax explained in its comments. Lines starting with "#" are comments and are ignored. Lines starting with "DN" describe an individual data node. Lines starting with "DNP" specify a dynamic library, which contains the user-provided callbacks, including the LoadDataByKey and LookupDataByKey functions.

# The format is the following:
# DN server_number server_name first_key last_key data_source
# Each part is defined as:
# DN -- config parser key to signify this line defines a data
#       node
# server_number -- the unique number of the host that is going
#       to serve the following key range
# server_name -- the host name that is going to serve the
#       following key range
# first_key -- the lowest logical key according to application
#       logic for the data set.
# last_key -- the highest logical key according to application
#       logic for the data set.
# data_source -- the local path to or location of a data source
#       to load the reference data. This is required and can
#       contain no spaces, but other than that there are no
#       restrictions.
# Sample Hash program configuration
PORT 3333
DN 1 rastaban.ccs.neu.edu 1       999999  dynamic
DN 2 ruchbah.ccs.neu.edu  1000000 1999999 dynamic
DN 3 rotanev.ccs.neu.edu  2000000 2999999 dynamic
DN 4 rukbat.ccs.neu.edu   3000000 3999999 dynamic
DNP ./libddnpluginhash.so

5. Implementation

An infrastructure needs to be built to support the data node architecture. The notion of a data client caller and a data server implementer is needed. This project is going to use a TCP/IP sockets implementation, but the architecture will be such that any kind of connection or connectionless data client/data server interaction is possible. A config file parser needs to be written to parse the config file and manage the list of data node servers. Dynamic load library system calls are needed to load the user's library on the server side.

Initially I looked into using the ACE platform-independent communication layer, which was mentioned in class during my presentation by one of the other students. This layer would definitely meet all of my needs. However, it is a very large software package that can be complex to build. Just the source code package is 40MB, which won't even fit in my disk quota, so I expect it would take over 100MB to build, and I really only need about 6 functions and 2 sub-libraries of the whole package. So, I decided that I would roll my own network layer and probably reuse most of the infrastructure from TOP-C in that area.

6. Testing

Like any project, we need a simple test program that will demonstrate the effectiveness of the technology and solution but at the same time be simple to construct and understand. The program I have chosen to build simply allocates a very large sequential hash array filled with dummy data and then queries all of that data. Because the purpose of Distributed Data Nodes is to allow a data set that would not otherwise fit in memory to fit in memory, there is really no reference case.
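A minimal sketch of that test program's local mode might look like the following. The array size, dummy-data generator, and variable names are my illustration; the real disthash.c described in section 7 is the authoritative version.

#include <stdio.h>
#include <stdlib.h>

#define NUM_KEYS   1000000L  /* illustrative size; the real tests went to 1GB and beyond */
#define VALUE_SIZE 1024      /* 1K string values, as in the sample hash program */

int main(void)
{
    /* Allocate the sequential hash array: one VALUE_SIZE slot per key. */
    char *hash = malloc((size_t)NUM_KEYS * VALUE_SIZE);
    if (hash == NULL) {
        fprintf(stderr, "allocation failed -- data set too big for this host\n");
        return 1;
    }

    /* Fill every slot with dummy data derived from its key. */
    for (long key = 0; key < NUM_KEYS; key++)
        snprintf(hash + key * VALUE_SIZE, VALUE_SIZE, "dummy value for key %ld", key);

    /* Query all of the data back, as the test program does. */
    for (long key = 0; key < NUM_KEYS; key++) {
        char *value = hash + key * VALUE_SIZE;
        (void)value; /* the real test times and checks these lookups */
    }

    free(hash);
    return 0;
}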
The proof that the solution is effective is to choose a data set that is larger than the virtual memory size of any of the test machines. Such a data set would cause a memory allocation failure if loaded on a single machine. The data set will then be distributed over 4 or more data nodes, and the test program will be able to run to completion. There is no explicit size required for the hash array; for testing the effective scalability of the solution, a large hash array, potentially approaching 1 GB, will likely be used.

Once the system is operational, additional performance testing can be done on how the performance may improve or degrade with small data sets. In addition, the people working on super-linear speedup could then use this infrastructure to test the possibility that this system will exhibit super-linear speedup with small data sets that fit in cache when distributed.

7. Project Status

The end of the quarter has come, and this section summarizes what is working, what has been tested, what was not completed due to time constraints, and features that could be added in the future.

I wrote almost all the code for this project. I borrowed my network read/write wrappers from a previous project, but the rest of the code is new. I also borrowed the TOP-C way of timing, which I used to time the ddntester program. I did occasionally look at the TOP-C code for reference, as well as other code.

Implementation notes

The most significant implementation note is that the system really only supports one simultaneous client. This is due to multiple implementation issues. First, in order for performance to be acceptable, the client needs to maintain its server connections in between server requests. Second, I didn't plan on implementing a multithreaded server, which means that the server is locked waiting for the next client request until the client is finished. Third, as part of the DDN client library implementation, I connect to all DDN servers in DDN_init and do not disconnect until DDN_finalize. So, one client may lock down every single server indefinitely. In the future, multithreaded support needs to be added to the server, which will eliminate this problem.

The sample hash program uses int hash keys and 1K strings as hash values. This can be changed by changing the define VALUE_SIZE in ddatanode.h and recompiling (a sketch of the presumed layout appears after the task list below).

The following significant tasks were achieved as part of this project:

Implemented all the basic components so that the system is operational, including the client library, server executable, test application, test plug-in for the server, makefiles, and testing.

Implemented a hash lookup test program that supports local and distributed modes. When run in local mode it uses the hash lookup plug-in directly; when run in distributed mode it calls the DDN server, which accesses the hash lookup plug-in. The hash test program generates all data dynamically, so it requires no external data source.

Tested with a 1GB hash array, then accessed a few thousand hash values in that set. I was able to load the 1GB memory set in local mode, and that was the maximum. In distributed mode I was able to load a 9GB hash array distributed across nine systems, thus verifying that my solution did indeed provide distributed data set capability exceeding local capability.

Implemented almost all the client library functionality as designed. Both client and server must use the same configuration file, as specified. With this release, a maximum of 9 DDN servers are supported per config file and client library.
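For reference, here is a sketch of how the record layout mentioned above might look in ddatanode.h. The struct and field names are assumptions for illustration; only the int keys, the 1K VALUE_SIZE define, and the recompile-to-change behavior come from this report.

/* ddatanode.h (excerpt, illustrative): changing VALUE_SIZE and
   recompiling changes the size of every hash value in the system. */
#define VALUE_SIZE 1024            /* 1K string values in the sample program */

typedef struct DDN_keyValueStruct {
    int  key;                      /* int hash keys, per the sample program */
    char value[VALUE_SIZE];        /* value returned by the DDN server */
} DDN_keyValueStruct;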
The following pieces of the project that were included in the design did not get completed.

While I hoped to provide a dynamic link library infrastructure on the server side, I did not have time to complete this. So, currently the DDN server is statically linked with the user's plug-in. What this means is that when a new application is desired, the user must recompile the server.

Almost all of the functionality on the server side is implemented; however, I never got a chance to modularize it as I had specified in the design, so all the code is just inlined in the main function.

The server map routine DDN_key_mapper, which figures out which server a piece of data is located on when the user doesn't know, was never implemented. Although a great feature, for performance reasons you would never want to use it, so I left it for last and never got to it.

The user is supposed to be able to specify any config file they want, but currently they must use the ddnconfig.cfg file name, and the config file must be placed in the same directory as the client library and server.

Only a debug makefile is provided.

These are important things that I learned along the way, and ideas for future enhancement. The most important thing I saw was that without adding parallelism to your application, or asynchronous support to DDN, you lose much of the benefit of DDN. Although DDN allows effectively unlimited in-memory data sets, without rewriting your application they are still accessed sequentially, so they perform much as if you did have one large data set on your one system. If parallelism were added to the user application, performance would likely improve.

The performance of the solution was much lower than I expected. I tested on the campus network of Solaris systems. I consistently saw a roughly two-orders-of-magnitude decrease in performance compared with the non-distributed version of the hash test program. For example, one test that took 3 seconds in the local version took 101 seconds in the distributed version (wall-clock time). This makes sense when you think it through: the local version does a single array reference to get the hash value, while the distributed version must go through probably 100 lines of code and transmit the hash value across the network. I tried both small hash values and large ones, but neither performed well. The large ones take a long time to send across the network; the small ones run much faster, but there is proportionally more network overhead per byte moved. Running on a high-speed network would improve things. A specially tuned test program could probably narrow the performance gap; my test simply retrieved a block of the hash values, which favors the local version.

Only a single simultaneously running client is currently supported. The details are above, but adding multithreaded support will address this issue.

Deliverable details

The project directory that is being handed in contains the following.

The src directory, with 9 significant files:
ddatanode.h is the public include file for libddnclient.a.
ddnclient.c is the source for libddnclient.a.
ddnconfig.cfg is a sample configuration for both the client library and the server.
ddnphash.c is the sample hash plug-in implementation and gets built into libddnphash.a.
ddnserver.c is the source for the server and gets built into ddnserver.
ddnshared.c and ddnshared.h are used by both the client library and the server executable for common things like network communication and config file parsing.
disthash.c is the sample hash application, in both distributed and local modes. It is built into the ddntester executable.
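The demo setup described below uses 2 DDN servers; a matching ddnconfig.cfg, following the format from section 4, might look like this (the host names and key ranges are illustrative):

# Two-server demo configuration (illustrative host names)
PORT 3333
DN 1 rastaban.ccs.neu.edu 1       999999  dynamic
DN 2 ruchbah.ccs.neu.edu  1000000 1999999 dynamic
DNP ./libddnpluginhash.so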
Both disthash.c and ddnconfig.cfg are set up for 2 DDN servers, as most of my testing was done this way, and this is how it should be demoed for simplicity; it just takes longer to set up 9 servers for testing.

The makefile builds all source files with the all rule, or individually if desired. It also has clean and install rules. The install rule generates the other directories described below, except the doc directory.

The bin directory contains ddnconfig.cfg, ddnserver, and ddntester, which are described above and are simply copied to bin by the make install rule.

The include directory contains ddatanode.h, which is the header file for the libddnclient.a library.

The lib directory contains libddnclient.a, which is the library you link into your application if you want to use the DDN server.

The doc directory includes a copy of this document.

Simple execution instructions

To run the executable, first edit ddnconfig.cfg and replace the server names with the two systems that you want to use as test servers; they must be different. Then run ./ddnserver on those systems. When both servers say they are ready to accept connections, run ./ddntester on another system, with the config file in the same directory. It will run, and you will see how long it took to run, along with some other output.