DISTRIBUTED SERVER
FRAMEWORK IN JAVA
Yue Dong
Roy Laurens
Atip Asvanund
Parag Manihar
Saowanee Saewong
Information Networking Institute
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh PA 15213
Distributed Server Framework in Java
Quickware
1. Introduction
In large organizations, the computation resources provided by workstations are
generally left under-utilized. The main problem associated with utilizing the idle CPU
time on these workstations is the lack of an infrastructure offering flexible
computation support that is easy to use. Remote execution is usually done by sending
only the parameters, with the code itself already installed on the remote workstation.
This approach limits flexibility: we cannot execute a new object on an idle workstation
unless the code for that object is already in place on that workstation. Another
problem in managing idle resources is the need for a centralized controller that can
distribute and monitor the remote execution of all objects on the workstations.
In this project, we propose to build a broker-based distributed computing
environment. The main idea is to create a distributed infrastructure to be used by a
client (application) that needs to perform intensive computations. Clients that want
to use our infrastructure simply define the metrics, the computation function and its
parameters, and pass them to our distributed infrastructure. The broker in our
distributed infrastructure returns a ticket containing the session duration and the
address of a server that is both available and suitable (according to a few metrics
described later) to process the request. The computation function is then sent to
the corresponding server, and once the computation is completed, the result is
returned to the application. We will also incorporate multicasting to make our
solution highly available and scalable, and use class loading for remote execution.
Many real-world applications that need to perform many intensive calculations will
benefit from this infrastructure. Backtracking algorithms in particular will find the
framework highly useful. One example of such an application is solving checkmate
problems in chess, in which the best possible move needs to be evaluated at each
level. The calculation typically consists of moving each piece on the chess board to
each possible legal square and determining whether the move ends the game in
checkmate. Compared to their performance in centralized systems, such
kinds of computation generally will be performed in a more efficient manner in
distributed environments, since the problem can be split up and calculated separately
by each computational node.
2. Architecture
To provide a working and stable infrastructure, our architecture consists of three main
components and two major communication parts.
[Figure: Overall architecture, showing the Client (Client Application and Client API), the Primary Broker, two Backup Brokers, and the Server(s).]
2.1. Components
There are three components in our infrastructure, each with different functions to
perform:
2.1.1. Broker
The Broker provides information regarding which server is to be used by
the client. There are two kinds of brokers in the system: a primary and one
or more backups. The Primary Broker is the functional broker that
responds to every request generated by clients, while keeping track of the
server status and jobs submitted to each server. The Backup Broker is a
passive standby broker that continuously monitors the status of the primary broker;
when the primary broker goes down for some reason, one of the backup brokers
replaces it and becomes the new primary.
Every time a client wants to use our infrastructure, it has to contact the
primary broker. The primary broker then returns the server that it considers
appropriate to serve this client’s request. The primary broker also
keeps track of each server’s alive-messages to monitor whether the server is
alive or dead. Whenever the primary broker updates its information, it
propagates the update to the backup brokers, if any exist.
2.1.2. Server
A server is the entity that performs the actual computation of the client's
request. Typically it is an idle workstation with spare CPU time that clients
can use. When a server is first started, it tries to find the primary broker in
the system and registers itself with it. This registration makes the server’s
service available to any client that sends a request to the broker.
During registration, the server also sends its current status, including CPU
type, memory size, disk size, etc. The broker uses this information when a
client has a request that requires specific metrics.
The server sends periodic alive-messages to the broker so that its status is
always monitored. Each alive-message contains the server's current status,
such as CPU utilization or available memory.
Once a client learns which server it can use, it opens a connection to that
server and sends the object it wants to execute there.
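The registration-then-heartbeat cycle described above can be sketched with a scheduled task. This is a minimal illustration under stated assumptions, not the framework's code. ServerStatus, AliveSender and the plain-text message format are hypothetical, and the connection to the broker is abstracted as a Consumer<String>.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical snapshot of the status a server reports to the broker. */
final class ServerStatus {
    final String host;
    final int freeMemoryMb;
    final double cpuUtilization;

    ServerStatus(String host, int freeMemoryMb, double cpuUtilization) {
        this.host = host;
        this.freeMemoryMb = freeMemoryMb;
        this.cpuUtilization = cpuUtilization;
    }

    /** Assumed text encoding: "<TYPE> <host> mem=<MB> cpu=<fraction>". */
    String encode(String type) {
        return type + " " + host + " mem=" + freeMemoryMb + " cpu=" + cpuUtilization;
    }
}

/** Registers once, then sends an alive-message to the broker at a fixed interval. */
final class AliveSender {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private final Consumer<String> toBroker;   // stands in for the real broker connection

    AliveSender(Consumer<String> toBroker) {
        this.toBroker = toBroker;
    }

    void start(ServerStatus initial, long periodMillis) {
        toBroker.accept(initial.encode("REGISTER"));   // one-time registration
        timer.scheduleAtFixedRate(
                () -> toBroker.accept(currentStatus(initial).encode("ALIVE")),
                periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** Placeholder sampling: reports the JVM's free memory, keeps the initial CPU figure. */
    private ServerStatus currentStatus(ServerStatus base) {
        int freeMb = (int) (Runtime.getRuntime().freeMemory() / (1024 * 1024));
        return new ServerStatus(base.host, freeMb, base.cpuUtilization);
    }

    void stop() {
        timer.shutdownNow();
    }
}
```

A real server would replace the placeholder sampling with actual load measurements and send the strings over its broker socket.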
2.1.3. Client
A client is a node that wants to utilize the infrastructure. When a client has
an intensive computation that will benefit from our distributed
infrastructure, it tries to find the primary broker in the system. Once the
primary broker is found, the client contacts it and sends its request along
with some metrics that describe the nature of the job. Upon receiving the
answer, the client contacts the server named by the broker, sends its
computation job there and waits for the result.
The client is divided into two parts: the Client API and the client application.
The client application is the application that wants to use the infrastructure. To
do so, it uses the services provided by the Client API.
The Client API is the part that actually contacts the broker and sends
computations to the server. In other words, the Client API is the interface
between the application and the infrastructure.
2.2. Communication
The communication involved can be divided into two parts:
2.2.1. Initialization
The Initialization part deals with the registration of a new server with the
primary broker. The primary broker then propagates the new server's
information to the other (backup) brokers.
1. The broker has information about all the servers and the clients' public keys (through
a look-up file). Every time a server is turned on, it notifies the broker that it is
alive, and then sends a keep-alive message at a predetermined interval.
2. The primary broker also exchanges the current server status with the backup brokers
every time a server is added or removed.
2.2.2. Request Handling
This is a typical communication that occurs whenever a client wants to
request a server from the broker.
1. The client sends a request message to the broker, containing the authenticator string
and the preferences (metrics).
2. The broker responds by allocating a suitable server and returning a ticket.
The ticket contains the duration of the session, the servers to be used, etc.
3. The client sends the request to the server, specifying the object location and the
ticket.
3. Detailed Design and Implementation
3.1. Metric
The client can define the following metrics, which the broker will then try to
match with those of the servers in its server list:
1. Number of computations (NOC)
2. Maximum memory required (MEM)
3. Approximate CPU time required (CPU)
4. Etc. (this list can be extended to accommodate more metrics in the future)
On the Server side, each Server sends its current metric values during
registration. The Broker stores them in its server list and, on receiving a client
request, performs a look-up to find the best server.
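The broker's look-up can be pictured as a filter followed by a ranking. This is a minimal sketch. Only the MEM metric from the list above is used, and the ranking rule (prefer the least-utilized qualifying server) is an illustrative assumption rather than the broker's documented policy. ServerEntry and ServerSelector are hypothetical names.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical entry in the broker's server list (values reported by the server). */
final class ServerEntry {
    final String address;
    final int memoryMb;
    final double cpuUtilization;   // 0.0 = idle, 1.0 = fully loaded

    ServerEntry(String address, int memoryMb, double cpuUtilization) {
        this.address = address;
        this.memoryMb = memoryMb;
        this.cpuUtilization = cpuUtilization;
    }
}

final class ServerSelector {
    /**
     * Keeps only servers that satisfy the client's MEM metric, then picks the
     * least-utilized one. Returns empty if no server qualifies.
     */
    static Optional<ServerEntry> choose(List<ServerEntry> servers, int requiredMemoryMb) {
        return servers.stream()
                .filter(s -> s.memoryMb >= requiredMemoryMb)
                .min(Comparator.comparingDouble((ServerEntry s) -> s.cpuUtilization));
    }
}
```

Further metrics such as NOC or CPU would add more filters, or feed a weighted score, in the same pipeline.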
3.2. Dynamic port allocation
Since we want our infrastructure to be scalable, our system avoids hard-coded values such as port numbers. We also use a uniform model for message
passing between the various entities (Broker, Server and Client). This model
allows new messages to be added without any change to the underlying
communication mechanism.
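One way to get a message model in which new message types need no change to the transport is to make every message a serializable object and dispatch on its class. The sketch below assumes that design. Message, AliveMessage, PrimaryRequest and Dispatcher are hypothetical names, not the framework's actual classes.

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Base type for everything sent between Broker, Server and Client. */
abstract class Message implements Serializable {
    private static final long serialVersionUID = 1L;
}

final class AliveMessage extends Message {
    final String sender;
    AliveMessage(String sender) { this.sender = sender; }
}

final class PrimaryRequest extends Message {}

/** Routes each received message to the handler registered for its class. */
final class Dispatcher {
    private final Map<Class<? extends Message>, Consumer<Message>> handlers = new HashMap<>();

    /** Adding a new message type only requires registering a handler here. */
    <T extends Message> void on(Class<T> type, Consumer<T> handler) {
        handlers.put(type, m -> handler.accept(type.cast(m)));
    }

    /** Returns false if no handler is registered for this message type. */
    boolean dispatch(Message m) {
        Consumer<Message> h = handlers.get(m.getClass());
        if (h == null) return false;
        h.accept(m);
        return true;
    }

    /** The transport only ever sees Message objects, whatever their concrete type. */
    static byte[] toBytes(Message m) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(m);
        }
        return bytes.toByteArray();
    }

    static Message fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Message) in.readObject();
        }
    }
}
```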
3.3. Swing-based GUI for each Server
Swing is a new cross-platform Java GUI component set whose components take on
precisely the “look and feel” we specify. When we create a program using Swing
and run it under Windows, it can have the appearance and behavior of a program
written specifically for Windows. When we run the same program on a UNIX
workstation, it behaves like any program written for UNIX, and when we run it on
an Apple Macintosh, it looks and behaves like any program written specifically
for the Mac. This conforms to the platform independence of our architecture.
3.4. Class Loading Design and Implementation
Class Loading allows an object to be shipped over to a host that may not know
the implementation details of the object. Upon receiving the object, the host
reconstructs it, executes the specified method and returns the results to the
original sender.
We use this technique instead of RMI because RMI requires the server to have
the interface and hence the implementation details of a class that the client wants
to execute. One of the objectives of our project is to give the client flexibility in
defining its own class and the method to execute.
Note that class loading with Java's ClassLoader class differs from a
conventional Remote Procedure Call or Remote Method Invocation in that the
code of the object to be executed must be sent to the host (server) that will
execute it. In RPC or RMI, that code is normally already on the host.
Architecture: A conventional object serialization scheme does not suffice in our
situation, because the server may not have the implementation information
needed to deserialize and reconstruct the object. Therefore, we combine Object
Serialization with Class Loading. While deserializing and reconstructing the
object, the host loads the object's class information from the object's original
location on an as-needed basis: it first tries to resolve each class locally, and
only then remotely if required. Thus, a class that has already been loaded need
not be loaded remotely again.
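In standard Java this combination is usually expressed by subclassing java.io.ObjectInputStream and overriding its resolveClass method. Class names read from the stream are then resolved through a chosen ClassLoader. This is a minimal sketch, assuming the per-session loader is supplied by the caller. The class name SessionObjectInputStream is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectStreamClass;

/** Deserializes objects, resolving their classes through a session-specific loader. */
class SessionObjectInputStream extends ObjectInputStream {
    private final ClassLoader sessionLoader;

    SessionObjectInputStream(InputStream in, ClassLoader sessionLoader) throws IOException {
        super(in);
        this.sessionLoader = sessionLoader;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc) throws IOException, ClassNotFoundException {
        try {
            // The session loader delegates to its parent first (local classes) and
            // only uses its own lookup (e.g. a remote fetch) when that fails.
            return Class.forName(desc.getName(), false, sessionLoader);
        } catch (ClassNotFoundException e) {
            // The default implementation also handles primitive type descriptors.
            return super.resolveClass(desc);
        }
    }
}
```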
A client will be given the API to assist in utilizing our architecture. By using the
API, the client need not be concerned with much of the underlying
implementation details. A basic understanding of our architecture will suffice.
To use the API, the clients can simply instantiate the class "ClassServer", ship
the object and wait for the result. The API, in consultation with the broker, picks
the appropriate host based on the metrics.
The “ClassServer” running on the client side assists the host in deserializing
and reconstructing the object. The object deserializer on the host requests class
information from this class server, but only for classes whose implementation
details are not already available on the host.
The object is shipped using object serialization. The object deserializer on the
host may use the “ClassServer” on the client side to deserialize the object.
The client then specifies the name of the method to be executed. The host
looks up the method on that object and executes it.
The client then waits for the result of the execution. Because the client has the
information needed to deserialize the result, it can easily reconstruct the result
shipped back from the host.
Initially, the client API needs to instantiate a "ClassServer" on the client
side. This class returns the class information, in the form of bytecodes, to the
requestor. The bytecodes are taken from the "requested_class_name.class" file,
which the Java compiler generates automatically on compilation.
To instantiate the class server on the given port:
ClassServer cs = new ClassServer(classServerPort);
To instantiate the remote execution engine:
ClientAPI ca0 = new ClientAPI(classServerPort);
The object to be executed remotely is shipped to the server using object
serialization. The deserializer on the host contacts the class server to obtain the
implementation details of the class. This is done by a custom ClassLoader, one of
which is created per session. This ClassLoader first checks whether it can resolve a
class locally; if it cannot, it issues calls to the ClassServer described above. The
ClassLoader has a cache, so it never needs to resolve the same class remotely more
than once per session. The ClassLoader turns the bytecodes received from the
ClassServer into a Class object, which is then used to help the deserializer
reconstruct the object. Since the shipped object may refer to other classes, the
ClassLoader must recursively resolve all of the referenced classes.
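The per-session loader described above maps naturally onto java.lang.ClassLoader's parent-first delegation: loadClass asks the parent (local classes) first and calls findClass only when that fails. This is a minimal sketch under that assumption. RemoteClassLoader is a hypothetical name, and the network call to the client's ClassServer is abstracted as a Function<String, byte[]> that returns null when the class is unknown.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** One instance per session: resolve locally first, otherwise fetch bytecode remotely. */
class RemoteClassLoader extends ClassLoader {
    private final Function<String, byte[]> fetcher;   // stands in for the ClassServer call
    private final Map<String, Class<?>> cache = new HashMap<>();
    private int remoteFetches = 0;

    RemoteClassLoader(ClassLoader parent, Function<String, byte[]> fetcher) {
        super(parent);
        this.fetcher = fetcher;
    }

    @Override
    protected synchronized Class<?> findClass(String name) throws ClassNotFoundException {
        // Only reached after the parent (local) lookup has failed.
        Class<?> cached = cache.get(name);
        if (cached != null) return cached;
        remoteFetches++;
        byte[] bytecode = fetcher.apply(name);
        if (bytecode == null) throw new ClassNotFoundException(name);
        Class<?> defined = defineClass(name, bytecode, 0, bytecode.length);
        cache.put(name, defined);
        return defined;
    }

    synchronized int remoteFetches() {
        return remoteFetches;
    }
}
```

Two details of this design matter. ClassLoader's own findLoadedClass already prevents a class from being defined twice by the same loader, so the explicit map mainly makes the per-session cache visible. When a remotely defined class refers to other classes, the JVM asks this same loader for them, which yields the recursive resolution described above.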
The Server will then execute the specified method.
To do this, the Server
generates a Method object representing the specified method on the shipped
object. This method is then invoked, and the result is shipped back to the client.
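The Method-object step can be illustrated with standard reflection. This is a minimal sketch, assuming that the shipped method is public and takes no arguments, since the API passes only a method name. MethodRunner is a hypothetical helper name.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

final class MethodRunner {
    /** Invokes the public no-argument method methodName on target and returns its result. */
    static Object run(Object target, String methodName)
            throws NoSuchMethodException, IllegalAccessException, InvocationTargetException {
        Method m = target.getClass().getMethod(methodName);   // public, no parameters
        return m.invoke(target);                               // primitive results come back boxed
    }
}
```

Because invoke returns Object, the result can be serialized back to the client whatever its concrete type. This is also why the client has to cast it afterwards.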
To create a new instance of the class Object_name:
Object_name myobject = new Object_name(parameters);
To ship myobject to the remote Server and execute the method “method_name”
on it:
ca0.execute(myobject, "method_name");
The API also provides the function to block wait until the result is returned.
To block and wait for the result:
ca0.waitForResult();
To get the result:
int[][] reply0 = (int[][])ca0.getResult();
A cast is needed to get the result back as int[][], because getResult() returns
an Object.
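The execute / waitForResult / getResult sequence implies that the Client API runs the job in the background and later blocks until it completes. This is a minimal sketch of that hand-off using java.util.concurrent.CompletableFuture, with the remote call replaced by a local Supplier. AsyncJob is a hypothetical class, not the ClientAPI itself.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.function.Supplier;

final class AsyncJob {
    private CompletableFuture<Object> pending;

    /** Starts the job without blocking the caller (compare ca0.execute). */
    void execute(Supplier<Object> remoteCall) {
        pending = CompletableFuture.supplyAsync(remoteCall);
    }

    /** Blocks until the job has finished (compare ca0.waitForResult). */
    void waitForResult() throws InterruptedException {
        try {
            pending.get();
        } catch (ExecutionException e) {
            // Failures are reported by getResult(); here we only wait.
        }
    }

    /** Returns the result as Object, so callers must cast (compare ca0.getResult). */
    Object getResult() throws InterruptedException, ExecutionException {
        return pending.get();
    }
}
```

For example, after job.execute(...) and job.waitForResult(), the call (int[][]) job.getResult() mirrors the cast shown above.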
3.5. Failure Handling
Our implementation can cope with two types of failures: Broker failures and
Server failures.
We use the primary-backup mechanism to deal with Broker failures. At any
given time, we have one Broker acting as Primary Broker which grants keys to
the clients and stores some states relevant to the servers. We may have one or
more Brokers that act as the Backup Brokers. The Primary Broker makes its
states known to the Backup Brokers after its state gets updated. The Primary
Broker exchanges alive-messages with its backups. If the Primary Broker times
out, the first Backup Broker in line takes over and becomes the Primary Broker.
To detect failures on the servers, we use alive-messages that are exchanged
between the Primary Broker and the servers. If a server times out several times
in a row, the Primary Broker assumes that the server is down and records in its
knowledge base that the server has failed. If a server times out while it is
performing computation for a client, the Broker notifies the client of the
server’s failure. The stub on the client automatically restarts the session on
another server assigned by the Broker.
In the unlikely case that a “live” server loses several alive-messages in a row,
the broker will still consider it dead. The Broker will update the client, and the
client will act accordingly. If the client restarts the computation on a different
server and the answer from the “dead” server arrives later, the client has the
option of dropping this answer or accepting it.
The broker, on the other hand, will receive a normal alive-message from the
server that it considered dead. This will make the broker send a message back to
the server, saying that it doesn't recognize the server and the server should start
the re-registration process.
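The broker-side bookkeeping for these two paragraphs can be sketched as a counter of consecutive missed intervals per server. This is a minimal illustration under stated assumptions. FailureDetector is a hypothetical name, the miss threshold is a parameter because the text says only "several", and "unknown server" is signalled by a false return that would trigger re-registration.

```java
import java.util.HashMap;
import java.util.Map;

/** Declares a server failed after maxMisses consecutive intervals without an alive-message. */
final class FailureDetector {
    private final int maxMisses;
    private final Map<String, Integer> misses = new HashMap<>();

    FailureDetector(int maxMisses) {
        this.maxMisses = maxMisses;
    }

    void register(String server) {
        misses.put(server, 0);
    }

    /** Alive-message received. Returns false for an unknown server, which must re-register. */
    boolean onAlive(String server) {
        if (!misses.containsKey(server)) return false;
        misses.put(server, 0);   // any message resets the count of consecutive misses
        return true;
    }

    /** An interval passed with no message. Returns true if the server is now considered dead. */
    boolean onTimeout(String server) {
        Integer count = misses.get(server);
        if (count == null) return false;   // already removed
        if (count + 1 >= maxMisses) {
            misses.remove(server);         // drop it from the knowledge base
            return true;
        }
        misses.put(server, count + 1);
        return false;
    }
}
```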
3.6. Broker Election
In our system, when the Primary Broker is dead, a single Backup Broker must be
chosen to take over its role. An Election Algorithm is employed to choose a new
Primary among the backups. This allows us to scale to a large number of backups
(at the cost of additional message passing). The algorithm selects, from the
surviving Backup Brokers, the one with the highest priority as the new Primary
Broker.
The following nine kinds of messages are passed in the algorithm:
• New_Broker_Request: when a new Broker is started, it sends this message so
that the Primary Broker registers it.
• Primary_Alive: the Primary periodically sends this “alive” message to all Backup
Brokers.
• Election: when a Backup Broker detects that the Primary Broker is dead, it
multicasts this message to the other backups to announce an election.
• New_Primary: this message is sent to all Brokers to announce that the sender is
the new Primary Broker.
• Answer: each Backup Broker with a higher priority than the sender replies with
this message in response to an Election message, and then starts a new election
session itself.
• New_Primary_Ack: after receiving an Answer message, a Backup Broker may reply
with this message to accept the new Primary Broker.
• New_Primary_Refuse: after receiving an Answer message, a Backup Broker may
reply with this message to reject the new Primary Broker.
• New_Primary_Confirm: if all response messages returned to the new “Primary”
Broker are New_Primary_Acks, it announces this message to confirm itself as the
new Primary Broker, and the election is finished.
• New_Primary_Cancel: if any response message returned to the new “Primary”
Broker is a New_Primary_Refuse, or a timeout occurs, it announces this message to
withdraw itself as the new Primary Broker and starts another election.
If the failed Primary Broker was perceived as “dead” only because of a temporary
failure, it may resume after some time. To avoid having two Primary Brokers in
the system, it then starts a new election session. If it has the highest priority,
it declares itself the Primary Broker and “bully-sends” the New_Primary
message to all brokers.
In the best case, the broker with the second-highest priority notices the
Primary Broker’s failure. It can then immediately elect itself and send (n-2)
messages. The algorithm requires O(n^2) messages in the worst case, that is, when
the broker with the lowest priority is the first to detect the Primary Broker’s
failure. Then (n-1) brokers together take part in the election, each sending
messages to all brokers with higher priority.
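The best-case and worst-case counts can be checked with a small model. It counts only the Election, Answer and New_Primary messages of a classic bully election. This is a simplifying assumption: the Ack/Confirm voting messages are ignored. Priorities are 1..n, broker n is the failed primary, and no Election messages are sent to the primary already known to be dead, which reproduces the (n-2) best case above. BullyModel is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Message-count model of a bully election among brokers 1..n, where broker n has failed. */
final class BullyModel {
    /** detector: priority of the first live broker to notice the failure (1 <= detector <= n-1). */
    static int messages(int n, int detector) {
        boolean[] started = new boolean[n + 1];
        Deque<Integer> toStart = new ArrayDeque<>();
        toStart.add(detector);
        started[detector] = true;
        int count = 0;
        while (!toStart.isEmpty()) {
            int k = toStart.poll();
            // Election to every live broker with higher priority; each one answers.
            for (int j = k + 1; j <= n - 1; j++) {
                count += 2;                 // one Election + one Answer
                if (!started[j]) {
                    started[j] = true;
                    toStart.add(j);         // the answering broker starts its own election
                }
            }
        }
        return count + (n - 2);             // winner (n-1) sends New_Primary to the others
    }
}
```

With detector = n-1, the model gives n-2 messages. With detector = 1, it gives (n-1)(n-2) + (n-2) = n(n-2) messages, for example 15 when n = 5. This is the O(n^2) worst case.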
3.7. Multicasting
To achieve high availability of the primary broker, we use multicasting instead
of the Naming Service of RMI or CORBA. Together with the election algorithm,
this lets a broker, server or client that is starting up multicast a
Primary_Request message to find the Primary broker in the system. The primary
broker replies with a Primary_Return message identifying its address and port.
This approach gives the system high flexibility, because no one needs to know the
primary broker's location beforehand, and the system keeps working as long as at
least one broker remains. This is an advantage compared to the
Naming Service of RMI and CORBA, which will break down when the host
running the naming service is down.
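The client or server side of this discovery step can be sketched with plain UDP sockets. The request is sent to a multicast group, and the reply arrives as an ordinary datagram from the broker. This is a minimal sketch. The group address, port and the "Primary_Return host port" wire format are all hypothetical, and the broker side (a java.net.MulticastSocket that joins the group and replies to the sender's address) is not shown.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

final class PrimaryDiscovery {
    static final String GROUP = "230.0.0.1";   // hypothetical multicast group
    static final int PORT = 4446;              // hypothetical port

    /** Multicasts Primary_Request and waits for a Primary_Return; returns null on timeout. */
    static InetSocketAddress findPrimary(int timeoutMillis) throws IOException {
        try (DatagramSocket socket = new DatagramSocket()) {
            byte[] req = "Primary_Request".getBytes(StandardCharsets.UTF_8);
            socket.send(new DatagramPacket(req, req.length, InetAddress.getByName(GROUP), PORT));
            socket.setSoTimeout(timeoutMillis);
            byte[] buf = new byte[256];
            DatagramPacket reply = new DatagramPacket(buf, buf.length);
            try {
                socket.receive(reply);
            } catch (SocketTimeoutException e) {
                return null;                   // no broker answered
            }
            String text = new String(reply.getData(), 0, reply.getLength(), StandardCharsets.UTF_8);
            return parseReturn(text);
        }
    }

    /** Parses the assumed format "Primary_Return <host> <port>". */
    static InetSocketAddress parseReturn(String msg) {
        String[] parts = msg.trim().split("\\s+");
        if (parts.length != 3 || !parts[0].equals("Primary_Return")) {
            throw new IllegalArgumentException("not a Primary_Return: " + msg);
        }
        return InetSocketAddress.createUnresolved(parts[1], Integer.parseInt(parts[2]));
    }
}
```

Because the request goes to a group address rather than to a fixed host, any broker that currently holds the primary role can answer. This is the property the paragraph above contrasts with a centralized naming service.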
3.8. Load Balancing
Our second means of load balancing is triggered when the load on a server
exceeds a predetermined threshold. When this happens, the server sends a message
to the broker saying that it wants to transfer its current object elsewhere.
The broker then tries to find another suitable server and responds by telling the
original server where to transfer the object. After the new server receives the
object, it notifies the Broker that the transfer has finished successfully. The
Broker can either withhold this notification from the affected clients (making
the process transparent) or send them an explicit notification.
3.9. Sample Applications
Since this project only provides the infrastructure for a distributed system, it is
the client application’s responsibility to use the infrastructure efficiently. Our
infrastructure only provides the notion of having several “processing nodes”. The
client application is responsible for splitting one large task into several
processes that can work independently.
To fully exploit the power of our infrastructure, the task that the client wants to
solve must be suitable for distributed computation. Several types of tasks that can
benefit greatly from our system are:
• Tasks that require numerous independent computations
Such a task can be split simply by executing each independent computation on a
different server. Because the computations are executed in parallel, the whole
task can finish significantly faster than on a single node. However, since all of
the independent computations must be finished, the task cannot be considered
complete until the result from the last server is returned.
Tasks of this type also cannot easily be split into several parts without careful
consideration of the load that each part will place on its server. To achieve the
maximum distributed effect, each part should be of equal complexity.
An example of this type of task is the evaluation of mathematical expressions.
• Tasks that consist of iterative executions
Repeated execution of the same code with different sets of data is among the
easiest types of application to distribute. If the code has roughly similar
execution complexity across the various data sets, a client can simply send each
iteration as a separate computation problem to be processed by the distributed
infrastructure.
As with independent computations, this task must also wait for all servers to
finish their respective calculations before a final result can be obtained.
An example of this type of task is vector multiplication.
• Tasks that involve trial-and-error checking for some value
These tasks can exploit our distributed framework most fully. Their purpose is to
find a solution to a problem for which no known algorithm exists other than an
exhaustive search, so the task has to search the solution space and test each
candidate to see whether it is the right solution. Since such a task typically stops
the moment it finds the first solution, the ability to check several candidates
simultaneously is very appealing. After the task is split into several smaller
computations, each working on a different set of candidates, it simply waits until
any computation returns a solution. So the task does not have to wait for all the
servers to finish their calculations.
Examples of this type of task are backtracking algorithms, one of which is
explained in Section 4.5.
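The difference between the first two task types (wait for every part) and the third (stop at the first solution) corresponds to ExecutorService.invokeAll and ExecutorService.invokeAny in java.util.concurrent. This is a minimal sketch in which a local thread pool stands in for the servers. TaskPatterns is a hypothetical name, and a search part that finds nothing signals this by throwing, so that invokeAny ignores it.

```java
import java.util.List;
import java.util.concurrent.*;

final class TaskPatterns {
    /** Independent or iterative parts: the answer is ready only when every part has returned. */
    static long sumAll(ExecutorService pool, List<Callable<Long>> parts)
            throws InterruptedException, ExecutionException {
        long total = 0;
        for (Future<Long> f : pool.invokeAll(parts)) {
            total += f.get();
        }
        return total;
    }

    /** Trial-and-error search: the first part to succeed wins; the rest are cancelled. */
    static <T> T firstSolution(ExecutorService pool, List<Callable<T>> candidates)
            throws InterruptedException, ExecutionException {
        return pool.invokeAny(candidates);
    }
}
```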
4. Discussion
After designing and implementing our infrastructure, we came up with some
interesting issues to discuss.
4.1. Load Balancing
After a thorough examination of class loading, we found that although stopping
the execution of an object is possible, resuming execution from its current state
afterwards is almost impossible. This is because object loading does not
preserve the local execution context: the contents of local variables are lost.
Therefore, to be able to resume and continue execution, the object itself must
implement a scheme for bookkeeping its local variables. This bookkeeping, which
the client application would have to do, is not trivial, and it carries a
significant risk of introducing insidious bugs that occur only during object
transfer. So, even though the implementation in our infrastructure is quite
straightforward, the substantial code that would have to be added to the client
application led us to drop this feature. Instead, we offer a pause and delete
feature that requires only minor modifications to the client and still offers a
reasonable solution when the load increases.
The pause and delete options let the client decide whether to stop the object’s
execution temporarily (pause) or to delete the object when the server experiences
heavy load. This feature assumes that the client knows, or can approximate, the
object’s execution time. Therefore, when the server becomes overloaded and
informs the client, the client can decide whether it is better to pause or to
delete the object.
Pausing is appropriate if the object’s execution is nearly finished. Server
overload could be a transient condition, and it would be better to wait until the
load subsides than to restart the object on another server. When the server load
drops, the server informs the client, and the object can resume its execution.
Deleting, on the other hand, is preferable when the object execution is still in its
infancy. When heavy load is experienced during this early stage, it is better to
delete the object, send it to another server and restart it there. Little
computation time is lost, because the discarded execution time is negligible
compared to the total execution time.
It is clear that only the client can determine whether object execution time up to
the moment of heavy load is considered short (better to be deleted) or long
(better to be paused). An elapsed time of five minutes is short if the total
execution takes two hours, but if the total execution takes five and a half minutes, it is better to pause.
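The client's judgement can be expressed as a rule on estimated progress. This is a minimal sketch. OverloadPolicy and OverloadAction are hypothetical names, and the 50% cut-off used in the example is purely an assumption, because the text deliberately leaves the threshold to the client.

```java
/** What the client asks an overloaded server to do with its object. */
enum OverloadAction { PAUSE, DELETE }

final class OverloadPolicy {
    private final double pauseFraction;   // assumed cut-off; the client chooses it

    OverloadPolicy(double pauseFraction) {
        this.pauseFraction = pauseFraction;
    }

    /** Pause if the object is far enough along; otherwise delete it and restart elsewhere. */
    OverloadAction decide(double elapsedMinutes, double estimatedTotalMinutes) {
        double progress = elapsedMinutes / estimatedTotalMinutes;
        return progress >= pauseFraction ? OverloadAction.PAUSE : OverloadAction.DELETE;
    }
}
```

With a cut-off of 0.5, the two cases from the text come out as described. Five minutes of a two-hour job is about 4% progress, so the object is deleted. Five minutes of a five-and-a-half-minute job is about 91% progress, so it is paused.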
4.2. Election Algorithm Advantages
Our primary-backup broker algorithm has the following advantages:
• Flexible registration
When a broker starts up, it uses multicasting, as described in Section 3.7, to
find the primary broker in the system. The primary broker adds the new broker to
its broker list and transfers the updated broker list to the backup brokers.
• High availability
The primary broker is responsible for sending Alive messages to the backup
brokers. Therefore, the first backup broker to notice that the primary broker is
down starts the Election algorithm to find the best candidate (the one with the
highest priority) to become the new primary broker.
• Intelligent discovery of backup brokers
The Election algorithm lets the backup broker that finds the primary broker down
send an Election message to the other backup brokers. Only backup brokers with
higher priority send an Answer message back; backups with lower or equal priority
send an Election_Ack message back. The broker that started the election updates
the status of each broker in its broker list, and if it does not receive any Answer
message, it claims to be the best candidate and starts the New_Primary_Voting
algorithm.
• High consistency
We implement the New_Primary_Voting algorithm along the lines of a Distributed
Mutual Exclusion algorithm. The broker that claims to be the best candidate for
the new primary broker starts the New_Primary_Voting algorithm by multicasting the
New_Primary_Voting message. A broker that receives this message, has lower or
equal priority, and has not yet voted for another broker sends
New_Primary_Voting_Ack and locks itself. Otherwise, it sends
New_Primary_Voting_Refuse back. To avoid deadlock, whenever a broker receives a
New_Primary_Voting message from a broker with higher priority than the one it has
already voted for, it sends New_Primary_Voting_Refuse to the old candidate and
waits for the New_Primary_Cancel that unlocks its vote. It also sends a
New_Primary_Voting_Postpone message to the new candidate to postpone the voting.
If the voting broker collects all the votes, it sends the New_Primary_Confirm
message to all brokers. Otherwise, it sends the New_Primary_Cancel message to all
brokers, which unlocks their votes.
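The voting rules above can be captured as a small state machine at each backup broker. This is a minimal sketch under stated assumptions. Brokers are identified by their priority, the replies are returned as values instead of being sent over the network, and Voter and Reply are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reply produced by a backup broker during New_Primary_Voting. */
final class Reply {
    final String type;   // "Ack", "Refuse" or "Postpone"
    final int to;        // priority of the candidate that receives the reply

    Reply(String type, int to) {
        this.type = type;
        this.to = to;
    }

    @Override
    public String toString() {
        return type + "->" + to;
    }
}

/** Voting state of one backup broker. */
final class Voter {
    private final int myPriority;
    private Integer votedFor = null;   // null means the vote is not locked

    Voter(int myPriority) {
        this.myPriority = myPriority;
    }

    List<Reply> onVotingRequest(int candidate) {
        List<Reply> out = new ArrayList<>();
        if (candidate < myPriority) {
            out.add(new Reply("Refuse", candidate));    // this broker outranks the candidate
        } else if (votedFor == null) {
            votedFor = candidate;                       // lock the vote
            out.add(new Reply("Ack", candidate));
        } else if (candidate > votedFor) {
            out.add(new Reply("Refuse", votedFor));     // withdraw from the weaker candidate
            out.add(new Reply("Postpone", candidate));  // the stronger candidate must wait
        } else {
            out.add(new Reply("Refuse", candidate));
        }
        return out;
    }

    /** New_Primary_Cancel from the candidate we voted for unlocks the vote. */
    void onCancel(int candidate) {
        if (votedFor != null && votedFor == candidate) votedFor = null;
    }

    Integer votedFor() {
        return votedFor;
    }
}
```

After the weaker candidate cancels, the stronger one can repeat its request and receive an Ack. This is how the Refuse/Postpone pair avoids two candidates each waiting forever for a vote the other holds.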
4.3. Java SecurityManager
Our initial idea for getting around the security problem was simply to write the
server as an applet and run it in an applet viewer. Running it in a browser was
ruled out because of the extra overhead that would add to our server. However, we
found that an applet viewer imposes many restrictions that we cannot turn off,
and these prevent us from using our multicast protocols. Therefore, we went
back to implementing the server as an application and using the SecurityManager
class provided by Java to make it more secure.
The Java SecurityManager is tightly integrated with the core of Java, so we did
not need many workarounds to provide the sandbox model. We simply created a
custom security manager and installed it on every server.
To do this, we created a subclass of SecurityManager and overrode the checkX()
methods in it. Any check method that we do not override keeps SecurityManager's
default behavior, which disallows the operation unless the security policy grants it.
public class AtipSecurityManager extends SecurityManager {
    public AtipSecurityManager() {
        super();
    }
    …
    // Allow all types of connections from and to the Java VM.
    @Override
    public void checkConnect(String host, int port) {
    }

    // Disallow any file deletion from the Java VM.
    @Override
    public void checkDelete(String file) {
        throw new SecurityException("Thou may not access the file system");
    }
    …
}
As a security feature, once a SecurityManager has been installed in a Java Virtual
Machine, it can be replaced only if it grants permission to do so, so in practice the
first one installed stays in effect. During the final demonstration, we showed the
security functionality of the Server.
4.4. Java Multi Threading problem in Solaris
We were experiencing context-switching inconsistencies that we could not
explain from our Java code. We then realized that Java on Solaris does not
perform proper time slicing, although it does on the Microsoft Windows 95/NT
platforms. On Solaris, a Java thread occupies the entire Java VM unless it calls
the java.lang.Thread.sleep() method; otherwise, the thread is non-preemptive. This
affects only the Java VM: other applications running outside the Java VM appear
to be time-sliced normally. On Windows, by contrast, Java threads are preempted
even if they never call Thread.sleep().
This was a significant problem, because it means that all application code shipped
to run remotely (e.g., the chess engine code) would need to call Thread.sleep() at
regular intervals. Otherwise, the engine would not be preempted, and the Java VM
would be occupied solely by the engine until it finished its calculation, which can
take a significant time.
So, despite the security features discussed in Section 4.3, an attacker could
effectively take over a Server running on Solaris by sending code that occupies the
whole Java VM.
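On a VM without time slicing, shipped code has to give up the processor voluntarily, as described above. This is a minimal sketch of a compute loop that calls Thread.sleep() at a fixed interval. CooperativeEngine is a hypothetical name, and the interval of 100,000 iterations is an arbitrary assumption that would need tuning for real engine code.

```java
final class CooperativeEngine {
    private static final int YIELD_EVERY = 100_000;   // arbitrary interval between pauses

    /** A long-running computation that periodically lets other Java threads run. */
    static long sumOfSquares(int n) throws InterruptedException {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += (long) i * i;
            if (i % YIELD_EVERY == 0) {
                Thread.sleep(1);   // give the scheduler a chance to switch threads
            }
        }
        return total;
    }
}
```

The weakness noted in the text remains: this relies on the shipped code cooperating, and a malicious object can simply omit the call.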
4.5. Chess Checkmate Application
The decision-making part of calculating the next move in a computer game is a
typical example of a problem requiring a backtracking algorithm. Usually, to
determine whether a particular move is a good one, the computer has to compute
the opponent’s entire space of possible responses and see whether the computer
will then be in a better position than the current one.
In chess, a typical board configuration might have twenty to thirty possible
moves. The opponent might also have more than twenty possible replies to each
of these moves. A good player will go even further and try to compute the reply
to each of the opponent's replies, and so on. Calculating a single move could
easily mean evaluating hundreds or thousands of board positions, depending on
how deep the calculation goes.
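To make this growth concrete, here is a rough worked estimate (an idealization, not a measurement from this project): if every position offered the same branching factor b, a search d plies deep, where a ply is a single move by one side, would visit N(d) positions. Taking b = 25, the middle of the twenty-to-thirty range above:

```latex
N(d) = \sum_{k=1}^{d} b^{k} = \frac{b^{d+1} - b}{b - 1}, \qquad
N(2) = 25 + 625 = 650, \qquad
N(3) = 650 + 15\,625 = 16\,275
```

One move plus every reply already means hundreds of positions, and one more ply means thousands, which matches the estimate in the text.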
In our project, we implemented a chess engine that, given a board
configuration, tries to find the best move in that configuration. Since it is
almost impossible to know whether a particular move is the best one until much
later in the game, we limit our application to finding a move that leads to
checkmate. In this case, a solution can easily be checked, and the computer has
a well-defined criterion for judging the move.
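The search for a checkmating move can be sketched as a depth-limited AND/OR search: the attacker needs one move that works, and that move must work against every defender reply. This is a minimal, hypothetical sketch over an explicit toy game tree; Node, findMate, and everyDefenceLoses are illustrative names, not the project's Engine or BoardInfo classes, and a real engine would generate moves from the board instead of storing them.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a "mate in n" search over an explicit game tree.
 * Node stands in for a board position; it is not the project's BoardInfo.
 */
class MateSearch {

    /** Position reached by `move`; `checkmate` means the side to move is mated. */
    static final class Node {
        final String move;
        final boolean checkmate;
        final List<Node> children = new ArrayList<>();

        Node(String move, boolean checkmate) {
            this.move = move;
            this.checkmate = checkmate;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /**
     * Returns an attacker move from `pos` that forces checkmate within
     * `n` attacker moves, or null if no such move exists at this depth.
     */
    static String findMate(Node pos, int n) {
        if (n <= 0) {
            return null;
        }
        for (Node afterMove : pos.children) {
            if (afterMove.checkmate) {
                return afterMove.move;                   // mate on the spot
            }
            if (n > 1 && everyDefenceLoses(afterMove, n - 1)) {
                return afterMove.move;                   // mate follows whatever the defender does
            }
        }
        return null;
    }

    /** True if the defender, to move at `pos`, cannot avoid mate in `n`. */
    private static boolean everyDefenceLoses(Node pos, int n) {
        if (pos.children.isEmpty()) {
            return false;                                // no legal reply but not mated: stalemate
        }
        for (Node defence : pos.children) {
            if (findMate(defence, n) == null) {
                return false;                            // this reply escapes
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Toy tree: after "a" the defender's only reply "x" allows "m" (mate);
        // after "b" the defender's reply "y" leaves the attacker with no moves.
        Node root = new Node("start", false)
            .add(new Node("a", false)
                .add(new Node("x", false).add(new Node("m", true))))
            .add(new Node("b", false).add(new Node("y", false)));

        System.out.println(findMate(root, 1)); // prints null
        System.out.println(findMate(root, 2)); // prints a
    }
}
```

Because a mate either exists within n moves or it does not, the answer is easy to verify, which is exactly why the text restricts the engine to checkmate rather than "best move".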
First, the Chess application splits the required computation into several parts. In
our particular implementation, we chose to split it by the depth level to be
searched, although this could easily be changed to split by subsets of the possible
moves. We chose depth level because we fine-tuned our engine to intelligently
pick the best candidate at each level and try it first. This makes splitting by
possible moves of little benefit, because in all but the most extreme board
positions, the first move chosen leads to the solution.
However, we still have to check whether this move actually leads to
checkmate. Since we are trying to find a checkmate as soon as possible, we start
with the smallest depth level and increase it if no checkmate is found at the
current search depth.
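The depth-splitting strategy can be sketched with a local thread pool standing in for the remote servers. Reading "split by depth level" as one part per depth is our interpretation: each part searches one depth, and results are inspected shallowest-first so that the quickest mate wins while deeper searches run in parallel. This is a hypothetical sketch; DepthSplit, Found, and the searchAtDepth function are illustrative names, and java.util.concurrent and lambdas postdate the Java 1.1.4 used in the project. A real deployment would ship each part to a server through the Client API instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntFunction;

/**
 * Hypothetical sketch of splitting a checkmate search by depth level.
 * A local thread pool stands in for the remote servers.
 */
class DepthSplit {

    /** Depth at which a mate was found, and the move that starts it. */
    static final class Found {
        final int depth;
        final String move;

        Found(int depth, String move) {
            this.depth = depth;
            this.move = move;
        }
    }

    /**
     * Runs searchAtDepth(1..maxDepth) concurrently on `workers` threads and
     * returns the shallowest depth that produced a move, or null if none did.
     * searchAtDepth must return null when it finds no mate at that depth.
     */
    static Found search(IntFunction<String> searchAtDepth, int maxDepth, int workers)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<String>> parts = new ArrayList<>();
            for (int depth = 1; depth <= maxDepth; depth++) {
                final int d = depth;
                Callable<String> part = () -> searchAtDepth.apply(d); // one part per depth
                parts.add(pool.submit(part));
            }
            // Inspect results shallowest-first, so the quickest mate wins.
            for (int i = 0; i < parts.size(); i++) {
                String move = parts.get(i).get();
                if (move != null) {
                    return new Found(i + 1, move);
                }
            }
            return null;
        } finally {
            pool.shutdownNow(); // abandon deeper searches that are no longer needed
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in search: pretend a mate exists only from depth 3 onward.
        IntFunction<String> fakeSearch = depth -> depth >= 3 ? "Qh7#" : null;
        Found found = search(fakeSearch, 5, 2);
        System.out.println(found.depth + " " + found.move); // prints 3 Qh7#
    }
}
```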
4.6. Performance Analysis
We developed the following scenarios to test the performance of our system.
Although the benchmarks cover only a narrow part of our distributed system, they
serve our purpose of providing some performance guidance. The tests were
performed using Java 1.1.4 under Windows NT 4.0.
For the initial setup scenario, we measured the setup time of one broker (i.e., the
elapsed time from when the broker starts until it becomes fully configured as the
primary broker). We then measured the setup time when the first, second, and
third backup brokers are brought up.
Timeout: 2,000 ms (all times in seconds)
Case                      Time 1  Time 2  Time 3  Time 4  Time 5  Avg.
One broker setup alone      8       7       8       8       8     7.8
1st backup setup            2       2       2       2       3     2.2
2nd backup setup            3       2       2       2       3     2.4
3rd backup setup            4       3       3       3       3     3.2
The setup time in the first case is longer than in the subsequent ones, since it takes
much more time to set up the broker environment, to check whether other
(backup) brokers are alive, and to start our election algorithm. In the other cases,
the setup times are similarly short, much less than in the first case: after detecting
that a primary broker already exists, a backup broker simply obtains the information
from the primary broker. The transfer time of this information is almost the same
in the three backup cases, although the third backup takes a little longer than the
first and second, probably due to the slight increase in message passing.
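The start-up behavior described above can be sketched as a single decision: a new broker waits up to one timeout interval for an alive message from a primary, becomes a backup if one arrives, and otherwise takes the primary role. In this hypothetical sketch a BlockingQueue stands in for the packets that the ListenThread would deliver; the Role enum, the "ALIVE" message prefix, and the method name decideRole are assumptions, not the project's actual protocol. The sketch also hints at why the timeout value matters so much for the first broker, which has to wait out the full interval before concluding that no primary exists.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch of how a starting broker could decide its role.
 * The queue stands in for alive packets delivered by a listener thread.
 */
class BrokerStartup {

    enum Role { PRIMARY, BACKUP }

    /**
     * Waits up to timeoutMs for an alive message from an existing primary.
     * If one arrives, the broker becomes a backup (and would then copy the
     * primary's state); if none arrives, it assumes the primary role.
     */
    static Role decideRole(BlockingQueue<String> incoming, long timeoutMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                return Role.PRIMARY;                 // nobody answered in time
            }
            String msg = incoming.poll(remaining, TimeUnit.NANOSECONDS);
            if (msg == null) {
                return Role.PRIMARY;                 // timed out waiting
            }
            if (msg.startsWith("ALIVE")) {
                return Role.BACKUP;                  // an existing primary is running
            }
            // Unrelated packet: ignore it and keep waiting until the deadline.
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> quiet = new LinkedBlockingQueue<>();
        System.out.println(decideRole(quiet, 200));  // prints PRIMARY after about 200 ms

        BlockingQueue<String> busy = new LinkedBlockingQueue<>();
        busy.add("ALIVE primary-1");
        System.out.println(decideRole(busy, 200));   // prints BACKUP immediately
    }
}
```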
We also found that the timeout interval plays a very important role in the
benchmark. The data above were collected with an interval of 2,000 ms. If we
change the interval to 10,000 ms (which was necessary in the Solaris environment),
the corresponding average times are as follows (these are the results shown in the
final demo):
Timeout: 10,000 ms (all times in seconds)
Case                          Average
One broker setup alone          37.6
1st backup setup                 7.4
2nd backup setup                 7.6
3rd backup setup                 8
Next, we set up the following scenarios for measuring the time taken by the
election algorithm:
1. Election time when there is only one backup broker, after the primary broker
goes down.
2. Election time between two backup brokers, when the primary broker is dead.
3. Election time among three backup brokers, when the primary broker is dead.
Election time (all times in seconds)
Case   Time 1  Time 2  Time 3  Time 4  Time 5  Average
1.       10       9      10      10      10      9.8
2.       12      11      12      12      12     11.8
3.       31      30      29      23      28     28.2
As expected, the election time depends strongly on the number of backup
brokers in the system: fewer backup brokers lead to shorter elections. With more
backup brokers, they need to find each other and check each other's status (such
as priority and whether each is alive or dead) across a larger group. The amount
of message passing in a larger group of backup brokers also grows considerably
compared with an environment where there are only one or two backup brokers.
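The election rule itself is not spelled out in this section, so the following is only a hypothetical sketch assuming a highest-priority-wins rule in the spirit of the bully algorithm: among the live backups, the one with the highest priority becomes primary, with ties broken by the lower ID. It also counts the messages of one all-to-all status exchange, n(n - 1) for n backups, as one possible illustration of why message traffic grows faster than the number of brokers. ElectionSketch, BrokerInfo, elect, and statusMessages are illustrative names, not the project's BrokerPriority or Election classes.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical sketch of a priority-based election among backup brokers.
 */
class ElectionSketch {

    /** Minimal view of a backup broker as seen during an election. */
    static final class BrokerInfo {
        final int id;
        final int priority;
        final boolean alive;

        BrokerInfo(int id, int priority, boolean alive) {
            this.id = id;
            this.priority = priority;
            this.alive = alive;
        }
    }

    /** Picks the live broker with the highest priority; the lower id wins ties. */
    static Optional<BrokerInfo> elect(List<BrokerInfo> backups) {
        Comparator<BrokerInfo> byPriority = Comparator.comparingInt((BrokerInfo b) -> b.priority);
        Comparator<BrokerInfo> lowerIdFirst = Comparator.comparingInt((BrokerInfo b) -> b.id).reversed();
        return backups.stream()
                .filter(b -> b.alive)                 // dead brokers cannot be elected
                .max(byPriority.thenComparing(lowerIdFirst));
    }

    /** Messages in one round where every backup queries every other backup. */
    static int statusMessages(int backups) {
        return backups * (backups - 1);
    }

    public static void main(String[] args) {
        List<BrokerInfo> backups = Arrays.asList(
                new BrokerInfo(1, 3, true),
                new BrokerInfo(2, 5, false),          // highest priority, but dead
                new BrokerInfo(3, 4, true));
        System.out.println(elect(backups).map(b -> b.id).orElse(-1)); // prints 3
        for (int n = 1; n <= 3; n++) {
            System.out.println(n + " backups: " + statusMessages(n) + " messages");
        }
    }
}
```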
Finally, we measured the time required for class loading and for executing an
actual application. We measured the time needed to run the checkmate engine
without distributing the computation, and then the time taken when it is
submitted to the distributed infrastructure with two servers. We also measured
the class loading time by executing an empty class (i.e., one that contains no
code). Class loading is one of the major overheads of our infrastructure.
All times in seconds
Case                                                    Average
Chess application without our infrastructure             193.5
Chess application with our infrastructure (2 servers)    104.8
Class loading overhead (one class)                         0.643
Class loading overhead (two classes)                       1.870
As predicted, the improvement in the distributed environment is significant. This is
due not only to the splitting of the work, but also to the fact that the chess
application needs only one server response to complete the calculation. Class
loading imposes only a small overhead on our system, which means that a
computation that takes more than a couple of seconds can easily benefit from it.
We also observed that loading two classes costs more than twice as much as
loading one class. This might be because loading more than one class places a
heavier burden on the thread infrastructure: even though the communication cost
probably only doubles, the computation cost may grow by more than that.
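For reference, the averages in the table above translate into the following derived figures, using the usual definitions of speedup S and efficiency E with p = 2 servers (computed from the table, not separately measured):

```latex
S = \frac{T_{\text{local}}}{T_{\text{distributed}}} = \frac{193.5\,\text{s}}{104.8\,\text{s}} \approx 1.85, \qquad
E = \frac{S}{p} = \frac{1.85}{2} \approx 0.92, \qquad
\frac{T_{\text{load}}(2)}{T_{\text{load}}(1)} = \frac{1.870\,\text{s}}{0.643\,\text{s}} \approx 2.91
```

The efficiency figure should be read loosely because, as noted above, the chess application finishes as soon as one server responds, so the gain does not come from an even division of work alone. The single-class overhead of 0.643 s is about 0.6% of the 104.8 s distributed run.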
4.7. Future Work
We would like to make billing an integral part of our system. Clients could be
billed on a range of metrics such as CPU time, memory requirements, and leasing
time. This feature can easily be implemented on the Broker without affecting most
of the other components.
Billing would also require more attention to security aspects that fell outside the
scope of this course project but will be essential for any future commercialization.
We could use Kerberos ticket granting, which would greatly enhance the security of
our system by providing authentication between the different components.
Currently, the system only simulates the metrics at the server end; mechanisms are
required to measure metrics (such as load) dynamically, in real time.
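As one possible direction, not something the project implemented, later Java releases (Java 6 onward, long after the Java 1.1.4 used here) expose the operating system's load average through java.lang.management. The sketch below shows how a Server-side metric could read it instead of simulating a value; the LoadMetric name and the per-processor normalization are assumptions. getSystemLoadAverage() returns a negative value on platforms that do not provide a load average, so the sketch reports -1.0 in that case.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/**
 * Hypothetical replacement for a simulated load metric: reads the OS load
 * average through the standard management API (Java 6 and later).
 */
class LoadMetric {

    /**
     * Returns the system load average for the last minute divided by the
     * number of available processors, or -1.0 if the platform does not report it.
     */
    static double loadPerProcessor() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double load = os.getSystemLoadAverage();         // negative if unavailable
        if (load < 0) {
            return -1.0;
        }
        int cpus = Runtime.getRuntime().availableProcessors();
        return load / cpus;
    }

    public static void main(String[] args) {
        double metric = loadPerProcessor();
        if (metric < 0) {
            System.out.println("load average not available on this platform");
        } else {
            System.out.printf("load per processor: %.2f%n", metric);
        }
    }
}
```

Normalizing by processor count is one way to make loads comparable across workstations of different sizes; the Broker could then prefer the server with the lowest value.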
5. Summary
In this course project, we have demonstrated the use of a broker-based distributed
framework to utilize the idle capacity of workstations within an intranet environment.
The system has three functional entities, namely the broker, the servers, and the clients.
The clients define the metrics when requesting a server from the broker for their
intensive computations. The clients then establish a connection with the most suitable
server specified by the broker, and class loading is used to execute the object remotely
by shipping it to the server and specifying the method to execute.
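The server-side step of running a shipped object by method name can be sketched with Java reflection. This is a minimal, hypothetical sketch: RemoteExecutor, runMethod, and the PrimeCount task are illustrative names. The actual framework additionally loads the class bytes through its own ClassLoader (AtipClassLoader) and runs the code under its SecurityManager (AtipSecManager), neither of which is shown here.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

/**
 * Hypothetical sketch: invoke a public no-argument method, chosen by name,
 * on an object the server has received from a client.
 */
class RemoteExecutor {

    /** Calls `methodName()` on `task` and returns whatever it returns. */
    static Object runMethod(Object task, String methodName) throws Exception {
        Method method = task.getClass().getMethod(methodName); // public, no parameters
        try {
            return method.invoke(task);
        } catch (InvocationTargetException e) {
            Throwable cause = e.getCause();      // unwrap the task's own failure
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            if (cause instanceof Error) {
                throw (Error) cause;
            }
            throw e;
        }
    }

    /** Example client task; its class would normally be shipped to the server. */
    public static class PrimeCount {
        private final int limit;

        public PrimeCount(int limit) {
            this.limit = limit;
        }

        /** Counts primes below `limit` by trial division. */
        public Integer compute() {
            int count = 0;
            for (int n = 2; n < limit; n++) {
                boolean prime = true;
                for (int d = 2; (long) d * d <= n; d++) {
                    if (n % d == 0) {
                        prime = false;
                        break;
                    }
                }
                if (prime) {
                    count++;
                }
            }
            return count;
        }
    }

    public static void main(String[] args) throws Exception {
        Object result = runMethod(new PrimeCount(100), "compute");
        System.out.println(result); // prints 25
    }
}
```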
We handle both broker and server failures. Broker failures are sensed by the backup
brokers, which then start an election algorithm to replace the primary with a backup.
Server failures cause the object execution to be restarted at another available
server. Load balancing is achieved by allowing object execution at a server to be
paused or deleted. Java's SecurityManager class is used to make the server
more robust and secure. Finally, a sample application solving the checkmate
problem in chess was built to demonstrate the benefits of our framework.
6. Bibliography
[1] Courtois, Todd, "Java networking & communication", 1998, Prentice Hall
[2] Cornell, Gary, "Core Java", 1996, Prentice Hall
[3] Sun's web site, http://www.javasoft.com
7. Appendix: Code Summary
7.1. Common Classes within Broker, Server and the Client directories
Connection
General Thread for handling TCP messages
Data
Base Class for all types of communication messages
Data_*
Specification of each type of communication message
Packet
General format of all packets, including the communication messages.
7.2. Main Server Classes
AtipClassLoader
Extension of Java's ClassLoader that loads client classes
onto the Server.
AtipSecManager
Our implementation of the SecurityManager.
HelloTXThread
Thread responsible for creating alive packets at a
predetermined interval to be sent to the primary
broker.
Metric
Returns the current metric; currently this is only a simulation.
Registration*
Handles server registration with primary broker.
RoyServer
Handles all communication parts of the Server.
ServerControlFrame
Handles the GUI part of the Server.
TCPHandler
Listens at the TCP port and upon receiving incoming
TCP connection will notify appropriate thread.
7.3. Main Broker Classes
AliveThread
Monitors alive messages from the Primary Broker and starts
the ElectionThread if it detects that the Primary Broker is
dead.
Broker
Main class that executes all Broker threads and
functionality.
BrokerList
Handles manipulation of the Broker list.
BrokerPriority
Returns and compares the priority among Brokers.
Election
Class that implements the election algorithm.
HelloRXThread
Thread that handles the incoming alive messages and
updates Broker or Server status if necessary.
HelloTXThread
Thread responsible for creating alive packets at a
predetermined interval to be sent to the backup
brokers.
ListenThread
Listens at UDP port and upon receiving incoming packet
notifies the appropriate thread.
MulticastServer
Handles the multicast packets received from the Client
and Server.
ServerList
Handles manipulation of the Server list.
TCPListenThread
Listens at TCP port and upon receiving incoming TCP
connection notifies the appropriate thread.
TransactionList
Handles manipulation of the Transaction list.
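The alive-message mechanism behind HelloTXThread, HelloRXThread, and AliveThread can be sketched as a table of last-seen times: each alive message refreshes an entry, and an entry older than the timeout is treated as dead, which for the primary broker would trigger an election. This is a hypothetical sketch; AliveTable and its methods are illustrative names, and timestamps are passed in explicitly rather than read from a clock so the logic is easy to test.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of alive-message bookkeeping for brokers or servers.
 * Times are in milliseconds and supplied by the caller.
 */
class AliveTable {
    private final long timeoutMs;
    private final Map<String, Long> lastSeen = new HashMap<>();

    AliveTable(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** Records an alive message from `node` received at time `nowMs`. */
    synchronized void heard(String node, long nowMs) {
        lastSeen.put(node, nowMs);
    }

    /** True if `node` has been heard from within the timeout window. */
    synchronized boolean isAlive(String node, long nowMs) {
        Long t = lastSeen.get(node);
        return t != null && nowMs - t <= timeoutMs;
    }

    /** Removes and returns every node whose last alive message is too old. */
    synchronized List<String> expire(long nowMs) {
        List<String> dead = new ArrayList<>();
        lastSeen.entrySet().removeIf(e -> {
            boolean stale = nowMs - e.getValue() > timeoutMs;
            if (stale) {
                dead.add(e.getKey());
            }
            return stale;
        });
        return dead;
    }

    public static void main(String[] args) {
        // 2,000 ms chosen to match the benchmark timeout in Section 4.6 (assumption).
        AliveTable table = new AliveTable(2000);
        table.heard("primary", 0);
        table.heard("server-1", 1500);
        System.out.println(table.isAlive("primary", 1900)); // prints true
        System.out.println(table.expire(2600));             // prints [primary]
    }
}
```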
7.4. Client API Classes
ClassConnection
Handles connection between Client and Server to transfer
the object.
ClassServer
Handles requests from the Server when the classes needed
for a computation cannot be resolved locally.
ClientAPI
Main interface used by the application to communicate
with our infrastructure.
TCPHandler
Listens at TCP port and upon receiving incoming TCP
connections notifies the appropriate thread.
7.5. Main Chess Engine Methods
Engine(…)
Constructor that receives initial board configuration
(BoardInfo type), depth level, and distributed parameter.
GetChessStatus()
Returns the current chess board status, i.e., whether it is in
check or not.
GetResult()
Returns the move found by StartMove().
StartMove()
Starts the search for a checkmate move.