
Scalability of data in parallel computations
Final Report - June 3, 2002
Occhio Orsini
The last section (7) explains the project status and completion state. The other
sections are the same as in the interim report, with updates.
1. Problem Overview
Large parallel systems often need large data sets. To run efficiently, the data must be in
memory. However, all systems have a limited amount of memory. This project will
resolve that problem by allowing parallelized programs to also distribute their data access
in a structured way independent of the algorithm parallelization but within the same
2. Solution Overview
Adding the concept of distributed data nodes to the parallel environment and adding a
new programming API specifically for managing and accessing the data nodes provide
the solution to this problem. A single configuration file will describe the entire distributed
data node system. This will allow any client node to access any data node, as all data
nodes will be identified in the configuration file. Data nodes have a very different role to
play than traditional execution nodes in a parallel system. As such, they will often need to
be tuned for individual machines. So, part of the configuration process will allow data to
be explicitly located on a particular server.
An additional benefit of this solution is that data no longer needs to be duplicated for
execution. If you have 100 parallel processes, you now need only 1/100 as much data
memory to execute the program, because the data is loaded only once, in the data nodes.
A future enhancement project could be to support generic data node partitioning where
the user has no control of how the data is partitioned and it is dynamically load balanced
at initialization time. Throughout the rest of this document I will abbreviate Distributed
Data Nodes as DDN.
3. Architecture and Design
The DDN architecture has no direct correlation to the TOPC architecture but it will
interoperate effectively with TOPC. The DDN system will consist of a server executable,
a server side dynamic library provided by the user, a client static library, and a
configuration file. Each of these system pieces is reviewed below. The related functions
are covered in section 4.
The configuration file specifies which hosts will manage which portions of the data-set
and where the user provided dynamic library is located. More details on the configuration
file are provided in section 4.
The server executable is started by the user on each host system with the config file
location passed as an argument. The process parses the config file and, if its host name
appears, it loads the portion of the data-set it is supposed to service. The functions to load
and lookup values in the data set are provided by the user specified DLL. All DDN
servers must be started before any clients are initialized. The servers will run indefinitely
or until the user kills the process.
The client executable that will actually use the data is built with the client DDN library
linked in. An exact copy of the configuration file must be available to the client and must
be provided when the DDN connection is initialized. The client program calls the DDN get
functions wherever it needs data from the DDN. When one of these functions is executed, a
socket call is made to the appropriate DDN server and the requested data value is
returned to the client program.
Here is a diagram that shows how a sample parallelized system might be constructed
using the DDN infrastructure.
[Figure: Sample System Layout, showing parallel instances (Par. Instance) and data nodes (Data Node)]
The current design provides an interface that can support a single key lookup or a
multiple key lookup. If time permits the multiple key lookup will also be implemented, as
it will provide additional performance benefits by providing multiple data lookups per
socket call.
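As a minimal sketch of why a multiple key lookup saves socket calls, the fragment below packs several keys into one request buffer. The wire layout (a count followed by the keys, each a 32-bit integer in network byte order), the names ddn_pack_keys and ddn_unpack_key, and the DDN_MAX_KEYS limit are all assumptions made for illustration; the report does not define the format of keyValueStruct.

```c
#include <arpa/inet.h>  /* htonl, ntohl */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DDN_MAX_KEYS 64  /* hypothetical per-request limit */

/* Pack `count` keys into `buf` as [count][key 0][key 1]..., each a
 * 32-bit integer in network byte order. Returns the number of bytes
 * written, or 0 if the keys do not fit. */
size_t ddn_pack_keys(const uint32_t *keys, uint32_t count,
                     unsigned char *buf, size_t buflen)
{
    size_t need = ((size_t)count + 1) * sizeof(uint32_t);
    uint32_t word;
    uint32_t i;

    assert(keys != NULL && buf != NULL);
    if (count == 0 || count > DDN_MAX_KEYS || buflen < need)
        return 0;

    word = htonl(count);
    memcpy(buf, &word, sizeof word);
    for (i = 0; i < count; i++) {
        word = htonl(keys[i]);
        memcpy(buf + ((size_t)i + 1) * sizeof word, &word, sizeof word);
    }
    return need;
}

/* Read the key at position `index` back out of a packed request. */
uint32_t ddn_unpack_key(const unsigned char *buf, uint32_t index)
{
    uint32_t word;

    assert(buf != NULL);
    memcpy(&word, buf + ((size_t)index + 1) * sizeof word, sizeof word);
    return ntohl(word);
}
```

With a layout like this, a client that needs three values would send one 16-byte request instead of making three separate socket calls.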
The current design specifies that the user start each individual DDN server. In the future a
DDN manager could be written that takes care of starting and stopping the individual
DDN servers.
4. API
The DDN API contains functions for initializing and loading data sets, looking up data in
the data sets and determining which dataset contains a piece of data. Some of this
functionality is implemented with callback functions. The user will need to write all the
callbacks and that is the only API interaction they will have on the server side. On the
client side the user will call a few additional functions and there will be others that they
normally will not use. I have split up the functions into client side and server side.
4.1. Client Side Functions
4.1.1. DDN_init(configFile, errorText)
The DDN_init function parses the configuration file, builds a table of data node
servers and initializes a connection to each one. It returns a handle that is used to
access the DDN servers.
4.1.2. DDN_finalize(handle)
The DDN_finalize function disconnects from all data node servers and frees all
allocated resources.
4.1.3. DDN_get_data(handle, server_number, *keyValueStruct)
The DDN_get_data function takes a server number and a data key and requests the
data value corresponding to the key from the appropriate data node. If the server
number is blank then the server is looked up with the DDN_key_mapper function,
otherwise the specified server is used. The keyValueStruct can hold one or more keys
so multiple key lookups can be supported in the future. Using the DDN_key_mapper
function will add a lot of overhead to the DDN_get_data call, so it is recommended
that when you know which server contains the hash value you are looking for, you
specify the server number.
4.1.4. DDN_key_mapper(handle, key, *server)
The DDN_key_mapper function takes an arbitrary data value key and looks through
the list of data node servers specified in the configuration file to determine which
server contains the specified data value. This function is only called if a server
number is not known ahead of time.
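The lookup that DDN_key_mapper describes can be sketched as a scan over the key ranges read from the configuration file. This is only an illustration under stated assumptions: the struct ddn_node layout and the function name ddn_map_key are hypothetical, and the real function takes a handle rather than a node table.

```c
#include <assert.h>
#include <stddef.h>

/* One entry per "DN" line in the configuration file.
 * Field names are hypothetical. */
struct ddn_node {
    int  server_number;
    long first_key;
    long last_key;
};

/* Return the server_number whose [first_key, last_key] range contains
 * `key`, or -1 if no configured data node serves it. */
int ddn_map_key(const struct ddn_node *nodes, size_t count, long key)
{
    size_t i;

    assert(nodes != NULL || count == 0);
    for (i = 0; i < count; i++) {
        if (key >= nodes[i].first_key && key <= nodes[i].last_key)
            return nodes[i].server_number;
    }
    return -1;
}
```

A caller that already knows how its keys are partitioned can skip this step and pass the server number to DDN_get_data directly, as recommended above.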
4.2. Server Side Functions
4.2.1. DDN_server_init(configFile)
The DDN_server_init function parses the configuration file, figures out which data
node server it is and what part of the data-set it needs to load.
4.2.2. DDN_server_finalize(handle)
The DDN_server_finalize function frees all allocated resources.
4.2.3. DDN_server_load(handle,*LoadDataByKey())
The DDN_server_load function loads its portion of the data-set by calling the user
provided LoadDataByKey function.
4.2.4. DDN_server_serve(handle,*LookupDataByKey())
The DDN_server_serve function is an infinite loop that waits on a socket to receive
data key lookup requests and returns the corresponding data value.
4.2.5. LoadDataByKey(dataSource, begKey, endKey, *userHandle)
The LoadDataByKey callback is provided by the user and loads all data key values
from begKey to endKey from the data source and stores them under userHandle.
4.2.6. LookupDataByKey(userHandle, *keyValueStruct)
The LookupDataByKey callback is provided by the user and finds an individual data
value under the userHandle specified data image.
4.2.7. FreeUserHandle(userHandle)
The FreeUserHandle callback is provided by the user and cleans up all memory and
resources allocated by LoadDataByKey.
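To make the three user callbacks concrete, here is a minimal sketch of a plug-in that generates its values in memory, in the spirit of the sample hash program. The exact C signatures, the return codes, and the hash_image and key_value structs are assumptions, since the report lists only the parameter names. VALUE_SIZE is kept small here; the report's sample uses 1K values.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define VALUE_SIZE 64  /* small for illustration; the sample uses 1K */

/* Hypothetical in-memory image for one key range. */
struct hash_image {
    long  beg_key;
    long  end_key;
    char *values;  /* (end_key - beg_key + 1) slots of VALUE_SIZE bytes */
};

/* Hypothetical single-key form of keyValueStruct. */
struct key_value {
    long key;
    char value[VALUE_SIZE];
};

/* Build values for begKey..endKey and store the image under *userHandle.
 * dataSource is ignored because the data is generated dynamically. */
int LoadDataByKey(const char *dataSource, long begKey, long endKey,
                  void **userHandle)
{
    struct hash_image *img;
    long n, i;

    (void)dataSource;
    if (userHandle == NULL || endKey < begKey)
        return -1;
    n = endKey - begKey + 1;
    img = malloc(sizeof *img);
    if (img == NULL)
        return -1;
    img->values = malloc((size_t)n * VALUE_SIZE);
    if (img->values == NULL) {
        free(img);
        return -1;
    }
    img->beg_key = begKey;
    img->end_key = endKey;
    for (i = 0; i < n; i++)
        snprintf(img->values + (size_t)i * VALUE_SIZE, VALUE_SIZE,
                 "value-%ld", begKey + i);
    *userHandle = img;
    return 0;
}

/* Copy the value for kv->key into kv->value; -1 if the key is not here. */
int LookupDataByKey(void *userHandle, struct key_value *kv)
{
    const struct hash_image *img = userHandle;

    assert(img != NULL && kv != NULL);
    if (kv->key < img->beg_key || kv->key > img->end_key)
        return -1;
    memcpy(kv->value,
           img->values + (size_t)(kv->key - img->beg_key) * VALUE_SIZE,
           VALUE_SIZE);
    return 0;
}

/* Release everything allocated by LoadDataByKey. */
void FreeUserHandle(void *userHandle)
{
    struct hash_image *img = userHandle;

    if (img != NULL) {
        free(img->values);
        free(img);
    }
}
```

Because the image is a flat array indexed by key offset, a lookup is a single bounds check plus a copy, which matches the "single array reference" cost attributed to the local version of the hash test.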
The configuration file that controls which nodes manage which piece of the data set is
straightforward. A sample configuration file is shown below; its syntax is as follows.
Lines starting with “#” are comments and are ignored. Lines starting with “DN” describe
an individual data node. Lines starting with “DNP” specify the dynamic library that
contains the user provided LoadDataByKey and LookupDataByKey callbacks.
# The format is the following:
# DN server_number server_name first_key last_key data_source
# Each part is defined as:
# DN -- config parser key to signify this line defines a data node
# server_number -- the unique number of the host that is going
#                  to serve the following key range
# server_name -- the host name that is going to serve the
#                following key range
# first_key -- the lowest logical key according to application
#              logic for the data set.
# last_key -- the highest logical key according to application
#             logic for the data set.
# data_source -- the local path to or location of a data source
#                to load the reference data.
#                This is required and can contain no spaces but
#                other than that there are no restrictions.
# Sample Hash program configuration
PORT 3333
DN 1 1 999999 dynamic
DN 2 1000000 1999999 dynamic
DN 3 2000000 2999999 dynamic
DN 4 3000000 3999999 dynamic
DNP ./
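Parsing one line of this format could look roughly like the sketch below. It follows the documented field order (server_number, server_name, first_key, last_key, data_source); the struct, the function name ddn_parse_dn_line, the buffer sizes and the return codes are assumptions, and this is not the parser in ddnshared.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DDN_NAME_LEN 256  /* hypothetical maximum field length */

struct ddn_config_node {
    int  server_number;
    char server_name[DDN_NAME_LEN];
    long first_key;
    long last_key;
    char data_source[DDN_NAME_LEN];
};

/* Parse one configuration line.
 * Returns 1 if a DN line was parsed into *out,
 *         0 if the line is blank, a comment, or another keyword,
 *        -1 if the line starts with DN but is malformed. */
int ddn_parse_dn_line(const char *line, struct ddn_config_node *out)
{
    char keyword[8];
    int fields;

    assert(line != NULL && out != NULL);
    if (sscanf(line, "%7s", keyword) != 1 || keyword[0] == '#')
        return 0;
    if (strcmp(keyword, "DN") != 0)
        return 0;  /* PORT and DNP lines are handled elsewhere */

    fields = sscanf(line, "%*s %d %255s %ld %ld %255s",
                    &out->server_number, out->server_name,
                    &out->first_key, &out->last_key, out->data_source);
    return fields == 5 ? 1 : -1;
}
```

For example, a line such as "DN 2 hostB 1000000 1999999 dynamic" (hostB is a made-up host name) would yield server number 2 with keys 1000000 through 1999999, while a DN line with a field missing would be rejected with -1.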
5. Implementation
An infrastructure needs to be built to support the data node architecture. The notion of a
data client call and a data server implementer is needed. This project is going to use a
TCP/IP sockets implementation but the architecture will be such that any kind of
connection or connectionless data client/data server interaction is possible. A config file
parser needs to be written to parse the config file and manage the list of data node
servers. Dynamic load library system calls are needed to load the user’s library on the
server side.
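A TCP/IP implementation of this kind usually needs small wrappers that keep reading or writing until a whole message has been transferred, because a single read or write on a socket may move fewer bytes than requested. The sketch below shows one common form using only POSIX read and write; the names ddn_write_all and ddn_read_all are hypothetical and this is not the project's actual network layer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>  /* ssize_t */
#include <unistd.h>     /* read, write */

/* Write exactly `len` bytes to fd. Returns 0 on success, -1 on error. */
int ddn_write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;  /* interrupted by a signal: retry */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly `len` bytes from fd. Returns 0 on success, -1 on error
 * or if the peer closed the connection before `len` bytes arrived. */
int ddn_read_all(int fd, void *buf, size_t len)
{
    char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1;  /* end of stream before the full message */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

Since these functions work on any file descriptor, the same code serves both the client library and the server, and would also work over a different stream transport.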
Initially I looked into using the ACE platform independent communication layer that was
mentioned in class during my presentation by one of the other students. This layer would
definitely meet all of my needs. However, it is a very large software package that can be
complex to build. Just the source code package is 40MB, which won’t even fit in my disk
quota. So, I expect it would take over 100MB to build and I really only need about 6
functions and 2 sub libraries of the whole package. So, I decided that I would roll my own
network layer and probably use most of the infrastructure from TOP-C in that area.
6. Testing
Like any project, we need a simple test program that will demonstrate the effectiveness of
the technology and solution but at the same time be simple to construct and understand.
The program I have chosen to build simply allocates a very large sequential hash array
filled with dummy data and then queries all of that data. Because the purpose of Distributed
Data Nodes is to allow a data-set that would not otherwise fit in memory to fit in
memory, there is really no reference case. The proof that the solution is effective is to
choose a data-set that is larger than the virtual memory size of any of the test machines.
This would result in a memory allocation failure if tested on a single machine. Then the
data-set will be distributed over 4 or more data nodes and the test program will be able
to run to completion. There is no explicit size required for the hash array. For testing the
effective scalability of the solution, a large hash array potentially approaching 1 GB
will likely be used.
Once the system is operational, additional performance testing can be done on how
the performance may improve or degrade with small data sets. In addition, the people
working on super-linear speedup could then use this infrastructure to test the possibility
that this system will exhibit super-linear speedup with small data sets that fit in cache
when distributed.
7. Project Status
The end of the quarter has come, and this section summarizes what is working, what
has been tested, what was not completed due to time constraints, and which features
could be added in the future.
I wrote almost all of the code for this project. I borrowed my network read/write wrappers
from a previous project, but the rest of the code is new. I also borrowed the TOPC way of
timing, which I used to time the ddntester program. I did occasionally look at the TOPC
code, as well as other code, for reference.
Implementation notes
• The most significant implementation note is that the system really only supports
one client at a time. This is due to several implementation issues. First, in
order for performance to be acceptable, the client needs to keep its server
connection open between server requests. Second, I didn’t plan on implementing a
multithreaded server, which means that the server is locked waiting for the next
request from the current client until that client is finished. Third, the DDN client
library connects to all DDN servers in DDN_init and does not
disconnect until DDN_finalize. So, one client may lock down every single server
indefinitely. In the future, multi-threaded support needs to be added to the server,
which will eliminate this problem.
• The sample hash program uses int hash keys and 1K strings as hash values. This
can be changed by changing the define VALUE_SIZE in ddatanode.h and
The following significant tasks were achieved as part of this project:
• Implemented all the basic components so that the system is operational, including the
client library, server executable, test application, test plug-in for the server,
makefiles, and testing.
• Implemented a hash lookup test program that supports local and distributed
modes. When run in local mode it uses the hash lookup plug-in directly; when run
in distributed mode it calls the DDN server, which accesses the hash lookup plug-in.
The hash test program generates all data dynamically, so it requires no external
data source.
• I tested with a 1GB hash array and then accessed a few thousand hash values in
that set. I was able to load the 1GB memory set in local mode, and that was the
maximum. In distributed mode I was able to load a 9GB hash array distributed
across nine systems, thus verifying that my solution did indeed provide distributed
data set capability that exceeded local capability.
• Almost all the client library functionality was implemented as designed.
• Both client and server must use the same configuration file, as specified.
• With this release a maximum of 9 DDN servers are supported per config file and
client library.
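The sizes quoted above follow from simple arithmetic on the key ranges. As a rough sketch, assuming that 1K means 1024 bytes, that the values dominate memory use, and that each data node serves 1,000,000 keys as in the sample configuration, the memory needed per node can be estimated as follows (the function name is hypothetical):

```c
#include <assert.h>

/* Bytes of value storage needed by one data node that serves the
 * inclusive key range [first_key, last_key]. */
unsigned long long ddn_node_bytes(unsigned long long first_key,
                                  unsigned long long last_key,
                                  unsigned long long value_size)
{
    assert(last_key >= first_key);
    return (last_key - first_key + 1ULL) * value_size;
}
```

For the range 1000000 to 1999999 with 1024-byte values this gives 1,024,000,000 bytes, or about 1 GB per data node, which is consistent with nine nodes holding a hash array of about 9 GB.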
The following pieces of the project that were included in the design did not get completed:
• While I hoped to provide a dynamic link library infrastructure on the server side, I
did not have time to complete this. So, currently the DDN server is statically
linked with the user’s plug-in. This means that when a new application is
desired, the user must recompile the server.
• Almost all of the functionality on the server side is implemented; however, I never
got a chance to modularize it as I had specified in the design. So, all the code is
just inlined in the main function.
• The server map routine DDN_key_mapper, which figures out which server data is
located on if the user doesn’t know, was never implemented. Although it is a great
feature, for performance reasons you would never want to use it, so I left it for last
and never got to it.
• The user is supposed to be able to specify any config file they want, but currently
they must use the ddnconfig.cfg file name and the config file must be placed in
the same directory as the client library and server.
• Only a debug makefile is provided.
These are important things that I learned along the way and ideas for future work:
• The most important thing I saw was that without adding parallelism to your
application or asynchronous support to DDN you lose much of the benefit of
DDN. Although DDN allows in-memory datasets larger than any one machine
can hold, without rewriting your application they are still accessed sequentially, so
they perform in a similar way as if you did have one large dataset on your one
system. If parallelism were added to the user app, performance would likely improve.
• The performance of the solution was much lower than I expected. I tested on the
campus network of Solaris systems. I consistently saw a decrease in performance of
between 1 and 2 orders of magnitude compared with the non-distributed version of
the hash test program. For example, one test that took 3 seconds in the local version
took 101 seconds in the distributed version (wall clock time), roughly 34 times
slower. This makes sense when you think it through. The local version does a single
array reference to get the hash value, while the distributed version must go through
probably 100 lines of code and transmit the hash value across the network. I tried
both small hash values and large ones, but neither performed well. The large ones
take a long time to send across the network. The small ones run much faster, but
there is proportionally more network overhead per byte moved. Running on a
high-speed network would improve things. A specially tuned test program could
probably narrow the performance gap. My test simply retrieved a block of the hash
values, which favors the local version.
• Only a single client at a time is currently supported. The details are above,
but adding multi-threaded support will address this issue.
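The trade-off between small and large hash values can be illustrated with a simple cost model: each request pays a fixed round-trip cost, and each byte pays a transfer cost. The sketch below is only a model under assumed parameters, not a measurement; the function name and all of its inputs are hypothetical.

```c
#include <assert.h>

/* Estimated wall clock seconds to fetch n_values values of value_bytes
 * each when `batch` values share one request.
 *   round_trip_s : assumed fixed cost per request, in seconds
 *   bytes_per_s  : assumed network throughput, in bytes per second */
double ddn_estimate_seconds(unsigned long n_values, unsigned long value_bytes,
                            unsigned long batch, double round_trip_s,
                            double bytes_per_s)
{
    unsigned long requests;

    assert(batch >= 1 && bytes_per_s > 0.0);
    requests = (n_values + batch - 1) / batch;  /* round up */
    return (double)requests * round_trip_s
         + ((double)n_values * (double)value_bytes) / bytes_per_s;
}
```

In this model, small values make the per-request term dominate, which is where a multiple key lookup or a parallel client would help most, while large values make the transfer term dominate, which only a faster network can reduce.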
Deliverable details.
The project directory that is being handed in contains the following:
• Src directory with 9 significant files. The public include file for
libddnclient.a is ddatanode.h, and the source for libddnclient.a is ddnclient.c. The
sample configuration for both the client library and the server is ddnconfig.cfg. The
sample hash plug-in implementation is ddnphash.c, which gets built into libddnphash.a.
The source for the server is ddnserver.c, which gets built into ddnserver. Both the
client library and the server executable use ddnshared.c and ddnshared.h for common
things like network communication and config file parsing. The sample hash
application, in both distributed and local modes, is Disthash.c; it is built into the
ddntester executable. Both Disthash.c and ddnconfig.cfg are set up for 2 DDN servers, as
most of my testing was done this way and this is how it should be demoed for
simplicity; it just takes longer to set up 9 servers for testing. The makefile builds
all source files with the all rule, or individually if desired. It also has a clean rule
and an install rule. The install rule generates the other directories described below,
except the doc directory.
• Bin directory contains ddnconfig.cfg, ddnserver and ddntester, which are
described above and are just copied to bin by the make install rule.
• Include directory contains ddatanode.h, which is the header file for the
libddnclient.a library.
• Lib directory contains libddnclient.a, which is the library you link into your
application if you want to use the DDN server.
• Doc directory includes a copy of this document.
Simple execution instructions.
To run the executable, edit ddnconfig.cfg and replace the server names with the two
systems that you want to use as test servers. They must be different. Then run ./ddnserver
on each of those systems. When both servers say they are ready to accept connections, run
./ddntester on another system with the config file in the same directory. It will run and
you will see how long it took, along with some other output.