DISTRIBUTED SERVER FRAMEWORK IN JAVA

Yue Dong, Roy Laurens, Atip Asvanund, Parag Manihar, Saowanee Saewong
Information Networking Institute
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213

Distributed Server Framework in Java Quickware

1. Introduction

In large organizations, the computation resources provided by workstations are generally under-utilized. The main obstacle to using the idle CPU time on these workstations is the lack of an infrastructure that offers flexible computation support and is easy to use. Remote execution is usually done by sending only the parameters, while the code itself must already reside on the remote workstation. This limits flexibility: we cannot execute a new object on an idle workstation unless the code for that object is already in place there. Another problem in managing idle resources is the need for a centralized controller that can distribute and monitor the remote execution of all objects on the workstations.

In this project, we propose to build a broker-based distributed computing environment. The main idea is to create a distributed infrastructure for client applications that perform intensive computations. A client that wants to use our infrastructure simply defines the metrics, the computation function, and its parameters, and passes them to the infrastructure. The broker returns a ticket containing the session duration and the address of a server that is both available and suitable (according to the metrics described in Section 3.1) to process the request. The computation function is then sent to that server, and once the computation completes, the result is returned to the application.
We also incorporate multicasting to make the solution highly available and scalable, and we use class loading for remote execution.

Many real-world applications that perform intensive calculations will benefit from this infrastructure; backtracking algorithms in particular are a good fit. One example is checkmate solving in chess, where the best possible move must be evaluated at each level. The calculation typically consists of moving each piece on the board to each legal square and determining whether the move ends the game in checkmate. Compared with a centralized system, such computations can generally be performed more efficiently in a distributed environment, since the problem can be split up and each part calculated separately by a different computational node.

2. Architecture

To provide a working and stable infrastructure, our architecture consists of three main components and two major communication parts.

[Figure: architecture overview showing the Primary Broker and Backup Brokers, the Server(s), and the Client (Client Application and Client API)]

2.1. Component

There are three components in our infrastructure, each with a different function.

2.1.1. Broker

The Broker tells the client which server to use. There are two kinds of brokers in the system: a primary and one or more backups. The Primary Broker is the functional broker: it responds to every request generated by clients while keeping track of server status and of the jobs submitted to each server. A Backup Broker is a passive standby that continuously monitors the status of the primary; when the primary goes down for any reason, one of the backups replaces it and becomes the new primary. Every time a client wants to use our infrastructure, it must contact the primary broker.
The Primary Broker then returns the server that it considers appropriate for the client's request. The Primary Broker also tracks each server's alive-messages to determine whether the server is alive or dead. Whenever the Primary Broker updates its information, it propagates the update to the Backup Brokers, if any exist.

2.1.2. Server

The Server is the entity that performs the actual computation of the client's request. Typically it is an idle workstation with spare CPU time that clients can use. When a server starts, it locates the primary broker in the system and registers itself there. Registration makes the server's service available to any client that sends a request to the broker. During registration, the server also reports its current status, including CPU type, memory size, and disk size. The broker uses this information when a client request specifies particular metrics. The server then sends periodic alive-messages to the broker so that its status is always monitored; each alive-message carries the current status, such as CPU utilization and available memory. Once a client learns which server it can use, it opens a connection to that server and sends the object it wants to execute there.

2.1.3. Client

The Client is a node that wants to use the infrastructure. When a client has an intensive computation that would benefit from our distributed infrastructure, it locates the primary broker in the system, contacts it, and sends its request along with metrics that describe the nature of the job. On receiving the answer, the client contacts the server named by the broker, sends its computation job there, and waits for the result.
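The broker's server list and metric look-up described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: the class names ServerInfo, Ticket, and ServerRegistry, the use of available memory (MEM) as the only matched metric, the least-CPU-load selection policy, and the millisecond ticket duration are all hypothetical.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical status record a server reports at registration and in each alive-message.
final class ServerInfo {
    final String address;   // host:port of the server
    final int memoryMb;     // available memory (the MEM metric)
    final double cpuLoad;   // current CPU utilization, 0.0 to 1.0

    ServerInfo(String address, int memoryMb, double cpuLoad) {
        this.address = address;
        this.memoryMb = memoryMb;
        this.cpuLoad = cpuLoad;
    }
}

// Hypothetical ticket: which server to use and until when the session is valid.
final class Ticket {
    final String serverAddress;
    final long expiresAtMillis;

    Ticket(String serverAddress, long expiresAtMillis) {
        this.serverAddress = serverAddress;
        this.expiresAtMillis = expiresAtMillis;
    }

    boolean isValidAt(long nowMillis) {
        return nowMillis < expiresAtMillis;
    }
}

// Hypothetical broker-side server list.
final class ServerRegistry {
    private final Map<String, ServerInfo> servers = new LinkedHashMap<>();

    // Registration and alive-message updates both overwrite the latest status.
    synchronized void update(ServerInfo info) {
        servers.put(info.address, info);
    }

    synchronized void remove(String address) {
        servers.remove(address);
    }

    // Among servers with enough memory, pick the least loaded one and issue a ticket.
    synchronized Optional<Ticket> issueTicket(int requiredMemoryMb, long durationMillis, long nowMillis) {
        return servers.values().stream()
                .filter(s -> s.memoryMb >= requiredMemoryMb)
                .min(Comparator.comparingDouble((ServerInfo s) -> s.cpuLoad))
                .map(s -> new Ticket(s.address, nowMillis + durationMillis));
    }
}
```

Keeping registration and alive-message updates in a single update method means the broker always matches requests against the most recent status a server has reported.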
The client is divided into two parts: the Client API and the client application. The client application is the application that wants to use the infrastructure; to do so, it uses the services provided by the Client API. The Client API is the part that actually contacts the broker and sends the computation to the server. In other words, the Client API is the interface between the application and the infrastructure.

2.2. Communication

The communication involved can be divided into two parts.

2.2.1. Initialization

The initialization part deals with registering a new server with the primary broker, which then propagates the new server's information to the other (backup) brokers.

1. The broker has information about all the servers and the clients' public keys (through a look-up file). Every time a server is turned on, it notifies the broker that it is alive, and it then sends a keep-alive message at a predetermined interval.
2. The primary broker also exchanges the current server status with the backup brokers every time a server is added or removed.

2.2.2. Request Handling

This exchange occurs whenever a client requests a server from the broker.

1. The client sends a request message to the broker containing its authenticator string and its preferences (metrics).
2. The broker allocates a corresponding server and responds with a ticket. The ticket contains the duration of the session, the servers to be used, and so on.
3. The client sends the request to the server, specifying the object location and the ticket.

3. Detailed Design and Implementation

3.1. Metric

The client can define the following metrics, which the broker then tries to match against those of the servers in its server list:

1. Number of computations (NOC)
2. Maximum memory required (MEM)
3. CPU time required, approximately (CPU)
4. Etc.
(This list can be extended to accommodate more metrics in the future.)

On the server side, each server sends its current metric values during registration. The broker stores them in its server list and, on receiving a client request, performs a look-up to find the best server.

3.2. Dynamic port allocation

Since we want our infrastructure to be scalable, we keep the system free of hard-coded values such as port numbers. We also use a uniform message-passing model among the entities (Broker, Server, and Client): new message types can be added without any change to the underlying communication mechanism.

3.3. Swing-based GUI for each Server

Swing is a new Java cross-platform component set whose GUI components take on whatever "look and feel" we specify. A Swing program run under Windows can have the appearance and behavior of a program written specifically for Windows; the same program run on a UNIX workstation can behave like any UNIX program, and on an Apple Macintosh it can look and behave like a native Mac program. This fits the platform independence of our architecture.

3.4. Class Loading Design and Implementation

Class loading allows an object to be shipped to a host that may not know the object's implementation. On receiving the object, the host reconstructs it, executes the specified method, and returns the result to the original sender. We use this technique instead of RMI because RMI requires the server to have the interface, and hence the implementation, of any class the client wants to execute. One of the objectives of our project is to give the client the flexibility to define its own class and the method to execute.
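The idea in the last paragraph, that the client chooses both the class and the method to run, can be illustrated with Java reflection. The sketch below assumes the chosen method is public and takes no arguments; SquareTask and MethodRunner are hypothetical names, not the project's classes.

```java
import java.io.Serializable;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical client-defined task: the server never needs this class at compile time.
class SquareTask implements Serializable {
    private static final long serialVersionUID = 1L;
    private final int[] values;

    SquareTask(int[] values) {
        this.values = values.clone();
    }

    public int[] compute() {
        int[] result = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            result[i] = values[i] * values[i];
        }
        return result;
    }
}

// Hypothetical server-side helper: run a public no-argument method chosen by name.
final class MethodRunner {
    static Object run(Object target, String methodName) throws Exception {
        Method method = target.getClass().getMethod(methodName); // public methods only
        try {
            return method.invoke(target);
        } catch (InvocationTargetException e) {
            // Unwrap so the caller sees the task's own exception.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        }
    }
}
```

The server only needs the method name as a string; everything else about the task is discovered at run time from the shipped object.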
Note that class loading with the Java ClassLoader differs from a conventional Remote Procedure Call (RPC) or Remote Method Invocation (RMI) in that the object to be executed must be sent to the host (server) that will execute it. In RPC or RMI, the object is normally already on the host.

Architecture: A conventional object serialization scheme does not suffice in our situation, because the server may not have the class implementation needed to deserialize and reconstruct the object. Therefore, we combine object serialization with class loading. While deserializing and reconstructing the object, the host loads the object's class information from the object's original location on an as-needed basis: it first tries to resolve a class locally, and only then remotely. Thus a class that has already been loaded need not be loaded remotely again.

The client is given an API to help it use our architecture. With the API, the client need not be concerned with most of the underlying implementation details; a basic understanding of the architecture suffices. To use the API, the client simply instantiates the class "ClassServer", ships the object, and waits for the result. The API, in consultation with the broker, picks the appropriate host based on the metrics.

The "ClassServer" on the client side assists the host in deserializing and reconstructing the object. The object is shipped using object serialization. The object deserializer on the host requests class information from the ClassServer, but only for classes whose implementation is not already available on the host.
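The combination of object serialization and class loading described above hinges on one standard hook, ObjectInputStream.resolveClass, which decides which ClassLoader supplies each class named in the serialized stream. A minimal sketch, assuming the session's class loader (local first, then remote) is passed in; LoaderAwareObjectInputStream and its deserialize helper are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectStreamClass;

// Hypothetical deserializer that resolves classes through a per-session ClassLoader.
class LoaderAwareObjectInputStream extends ObjectInputStream {
    private final ClassLoader sessionLoader;

    LoaderAwareObjectInputStream(InputStream in, ClassLoader sessionLoader) throws IOException {
        super(in);
        this.sessionLoader = sessionLoader;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc) throws IOException, ClassNotFoundException {
        try {
            // The session loader tries local classes first and may fetch the rest remotely.
            return Class.forName(desc.getName(), false, sessionLoader);
        } catch (ClassNotFoundException e) {
            // Fall back to the default lookup, which also handles primitive type names.
            return super.resolveClass(desc);
        }
    }

    // Convenience helper: deserialize one object from a byte array.
    static Object deserialize(byte[] data, ClassLoader sessionLoader)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                new LoaderAwareObjectInputStream(new ByteArrayInputStream(data), sessionLoader)) {
            return in.readObject();
        }
    }
}
```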
The client then specifies the name of the method to be executed; the host looks up that method on the object and executes it. The client waits for the result of the execution. Since the client has the information needed to deserialize the result, it can easily reconstruct the result shipped back from the host.

Initially, the client API instantiates a "ClassServer" on the client side. This class returns class information, in the form of bytecodes, to the requester. The bytecodes are taken from the "requested_class_name.class" file, which the Java compiler generates automatically on compilation.

To instantiate the class server on the given port:

    ClassServer cs = new ClassServer(classServerPort);

To instantiate the remote execution engine:

    ClientAPI ca0 = new ClientAPI(classServerPort);

The object to be executed remotely is shipped to the server with object serialization. The deserializer on the host contacts the class server to obtain the implementation of the class. This is done through a custom ClassLoader, one instance per session. The ClassLoader first checks whether it can resolve a class locally; if it cannot, it issues a request to the ClassServer described above. The ClassLoader keeps a cache, so it never needs to resolve the same class remotely more than once per session. It turns the bytecodes received from the ClassServer into a Class object, which is then used to help the deserializer reconstruct the object. Since the shipped object may refer to other classes, the ClassLoader must recursively resolve all referenced classes.

The Server then executes the specified method. To do this, it obtains a Method object representing the specified method of the shipped object.
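The per-session ClassLoader described above can be sketched by overriding findClass. The standard ClassLoader.loadClass already asks the parent (local) loader first and remembers every class this loader has defined, so findClass is reached only for classes that are neither local nor already fetched in the session; that gives both the local-first lookup and the per-session cache. The fetchBytecode function, which stands in for the network request to the client's ClassServer, and the name SessionClassLoader are assumptions.

```java
import java.util.function.Function;

// Hypothetical per-session loader: local classes first, then bytecode from the client's ClassServer.
class SessionClassLoader extends ClassLoader {
    // Stand-in for the network call to the ClassServer; returns null if the class is unknown.
    private final Function<String, byte[]> fetchBytecode;
    private int remoteFetches = 0;

    SessionClassLoader(ClassLoader localParent, Function<String, byte[]> fetchBytecode) {
        super(localParent);
        this.fetchBytecode = fetchBytecode;
    }

    // Invoked by loadClass() only after the parent lookup failed and no class
    // with this name has been defined by this loader yet.
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] bytecode = fetchBytecode.apply(name);
        if (bytecode == null) {
            throw new ClassNotFoundException(name);
        }
        synchronized (this) {
            remoteFetches++;
        }
        // Classes referenced by this one are later resolved through this same loader.
        return defineClass(name, bytecode, 0, bytecode.length);
    }

    synchronized int remoteFetches() {
        return remoteFetches;
    }
}
```

Creating a new loader for each session keeps classes from different clients apart, since two loaders can each define a class with the same name.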
This method is then invoked, and the result is shipped back to the client.

To create a new instance of the object to be shipped:

    Object_name myobject = new Object_name(parameters);

To ship myobject to the remote Server and execute the method "method_name" on it:

    ca0.execute(myobject, "method_name");

The API also provides a function that blocks until the result is returned:

    ca0.waitForResult();

To get the result:

    int[][] reply0 = (int[][]) ca0.getResult();

A cast is needed to get the result back as an int[][] because getResult() returns an Object.

3.5. Failure Handling

Our implementation copes with two types of failures: Broker failures and Server failures.

We use a primary-backup mechanism to deal with Broker failures. At any given time, one Broker acts as the Primary Broker, which grants keys to the clients and stores state relevant to the servers. One or more other Brokers act as Backup Brokers. The Primary Broker makes its state known to the Backup Brokers whenever that state is updated, and it exchanges alive-messages with its backups. If the Primary Broker times out, the first Backup Broker in line, as determined by the election described in Section 3.6, takes over and becomes the Primary Broker.

To detect server failures, we use alive-messages exchanged between the Primary Broker and the servers. If a server times out several times consecutively, the Primary Broker assumes that the server is down and records the failure in its knowledge base. If a server times out while performing a computation for a client, the Broker notifies the client of the server's failure, and the stub on the client automatically restarts the session on another server assigned by the Broker.

In the unlikely case that a live server loses several alive-messages in a row, the broker will still consider it dead.
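The consecutive-timeout rule above can be sketched as a small failure detector. The sketch assumes a fixed alive-message interval and declares a server dead after maxMissed silent intervals; the class name AliveMonitor and the explicit clock arguments (which keep the sketch deterministic) are assumptions. Returning false from aliveReceived for a server that is no longer known is the point at which a broker would ask that server to re-register.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical broker-side failure detector based on consecutive missed alive-messages.
final class AliveMonitor {
    private final long intervalMillis;  // expected time between alive-messages
    private final int maxMissed;        // consecutive misses before a server is declared dead
    private final Map<String, Long> lastHeard = new HashMap<>();

    AliveMonitor(long intervalMillis, int maxMissed) {
        this.intervalMillis = intervalMillis;
        this.maxMissed = maxMissed;
    }

    synchronized void register(String server, long nowMillis) {
        lastHeard.put(server, nowMillis);
    }

    // Returns false for an unknown (for example, previously declared dead) server,
    // which the broker would then ask to re-register.
    synchronized boolean aliveReceived(String server, long nowMillis) {
        if (!lastHeard.containsKey(server)) {
            return false;
        }
        lastHeard.put(server, nowMillis);
        return true;
    }

    // Removes and returns every server that has missed maxMissed intervals in a row.
    synchronized List<String> sweep(long nowMillis) {
        List<String> dead = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastHeard.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            long missed = (nowMillis - entry.getValue()) / intervalMillis;
            if (missed >= maxMissed) {
                dead.add(entry.getKey());
                it.remove();
            }
        }
        return dead;
    }
}
```

Raising maxMissed makes false "dead" verdicts rarer at the cost of slower detection of real failures.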
The Broker updates the client, and the client acts accordingly. If the client restarts the computation on a different server and the answer from the "dead" server arrives later, the client has the option of dropping or accepting that answer. The broker, for its part, will eventually receive a normal alive-message from the server it considered dead. It then replies that it does not recognize the server, and the server starts the re-registration process.

3.6. Broker Election

In our system, when the Primary Broker dies, a unique Backup Broker must be chosen to take over that role. An election algorithm chooses the new Primary among the backups. This allows us to scale to a large number of backups, at the cost of added message-passing overhead. The algorithm selects, from the surviving Backup Brokers, the one with the highest priority as the new Primary Broker.

The following are the nine kinds of messages passed in the algorithm:

New_Broker_Request: when a new Broker starts up, it sends this message so that the Primary Broker registers it.
Primary_Alive: the Primary periodically sends this alive-message to all Backup Brokers.
Election: a Backup Broker multicasts this message to the other backups to announce an election when it detects that the Primary Broker is dead.
New_Primary: sent to all Brokers to announce that the sender is the new Primary Broker.
Answer: each Backup Broker with a higher priority than the initiator sends this message in response to an election, and then starts a new election session itself.
New_Primary_Ack: after receiving an Answer message, a Backup Broker may send this message to agree to the new Primary Broker.
New_Primary_Refuse: after receiving an Answer message, a Backup Broker may send this message to refuse the new Primary Broker.
New_Primary_Confirm: if all response messages returned to the new "Primary" Broker are New_Primary_Acks, it announces this message to confirm itself as the new Primary Broker, and the election is finished.
New_Primary_Cancel: if any response message returned to the new "Primary" Broker is a New_Primary_Refuse, or if a response times out, it announces this message to withdraw as the new Primary Broker and starts another election.

If the Primary Broker was perceived as "dead" only because of a temporary failure, it will probably come back after some time. To avoid having two Primary Brokers in the system, the recovered broker starts a new election session. If it has the highest priority, it decides that it is the Primary Broker and "bully-sends" the New_Primary message to all brokers.

In the best case, the process with the second-highest priority is the one that notices the Primary Broker's failure. It can then immediately elect itself and send (n-2) messages. The algorithm requires O(n^2) messages in the worst case, that is, when the process with the lowest priority is the first to detect the Primary Broker's failure: the (n-1) surviving processes then all begin the election, each sending messages to the brokers with higher priorities.

3.7. Multicasting

To make the primary broker highly available, we use multicasting instead of the naming service of RMI or CORBA. Together with the election algorithm, this lets a broker, server, or client that is starting up multicast a Primary_Request message to find the Primary Broker in the system. The Primary Broker replies with a Primary_Return message that identifies its address and port. This approach makes the system very flexible: no one needs to know the primary broker's location beforehand, and the system keeps working as long as at least one broker is left.
This is an advantage over the naming service of RMI and CORBA, which breaks down when the host running the naming service goes down.

3.8. Load Balancing

Our second means of load balancing is triggered when the load on a server exceeds a predetermined threshold. When this happens, the server sends a message to the broker saying that it wants to transfer its current object elsewhere. The broker tries to find another suitable server and replies to tell the original server where to transfer the object. After the new server receives the object, it notifies the Broker that the transfer has completed successfully. The Broker may either withhold this notification from the affected clients, making the process transparent, or send them an explicit notification.

3.9. Sample Applications

Since this project provides only the infrastructure for a distributed system, it is the client application's responsibility to use that infrastructure efficiently. Our infrastructure only provides the notion of several "processing nodes"; the client application is responsible for dividing one large task into several processes that can work independently. To fully exploit the infrastructure, the task must be suited to distributed computation. Several types of tasks can benefit greatly from our system:

Tasks that require numerous independent computations
Such a task can be split simply by executing each independent computation on a different server. Because the computations execute in parallel, the whole task can complete significantly faster than on a single node. However, since all of the independent computations must finish, the task is not complete until the result from the last server is returned.
Tasks of this type also cannot easily be split into parts without careful consideration of the load each part will place on its server: for the maximum benefit from distribution, each part should be of roughly equal complexity. An example of this type of task is the evaluation of mathematical expressions.

Tasks that consist of iterative executions
Repeated execution of the same code over different sets of data is among the easiest types of application to distribute. If the code has roughly similar execution complexity across the data sets, a client can simply send each iteration as a separate computation to be processed by the distributed infrastructure. As with independent computations, this task must wait for all servers to finish their calculations before the final result can be obtained. An example of this type of task is vector multiplication.

Tasks that involve trial-and-error checking for some value
These are the tasks that can fully exploit our distributed framework. Their purpose is to find a solution to a problem for which no known algorithm exists other than exhaustive search, so the task must search the solution space, trying each candidate to see whether it is a solution. Since such a task typically stops the moment it finds the first solution, the ability to check several candidates simultaneously is very appealing. After the task is split into smaller computations, each working on a different set of candidates, it simply waits until any computation returns a solution; it does not have to wait for all servers to finish. Backtracking algorithms are an example of this type of task; one of them is explained in Section 4.5.

4. Discussion

After designing and implementing our infrastructure, we came across several interesting issues.

4.1. Load Balancing

After a thorough examination of class loading, we found that although stopping the execution of an object is possible, resuming execution afterwards from its current state is almost impossible. This is because object shipping does not preserve the local execution context: the contents of local variables are not conserved. To be able to resume execution, the object itself would have to implement bookkeeping for its local variables. This task falls to the client application, is not trivial, and carries a significant risk of introducing insidious bugs that only appear during object transfer. So, even though the implementation in our infrastructure is quite straightforward, the substantial code that would have to be added to each client application led us to drop this feature.

Instead, we offer pause and delete operations that require only minor modifications on the client while still offering a reasonable response to increased load. These options let the client decide whether to stop the object's execution temporarily (pause) or to delete the object when the server experiences heavy load. This feature assumes that the client knows, at least approximately, the object's execution time. Therefore, when the server is overloaded and informs the client, the client can decide whether the object should be paused or deleted.

Pausing is appropriate if the object's execution is about to finish. Server overload may be transitory, and it is better to wait until the load subsides than to restart the object on another server. When the server load drops, the server informs the client and the object resumes execution.
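The pause-or-delete choice is made by the client from its own estimate of the total execution time. A minimal sketch of such a rule, assuming the client compares the fraction of work already done against a fixed threshold; the 50% threshold and the names OverloadPolicy and Action are hypothetical, and a real client might weigh other factors.

```java
// Hypothetical client-side rule for reacting to a server-overload notification.
final class OverloadPolicy {
    enum Action { PAUSE, DELETE }

    private final double pauseThreshold; // fraction of estimated work done above which we pause

    OverloadPolicy(double pauseThreshold) {
        this.pauseThreshold = pauseThreshold;
    }

    // Nearly finished: pause and wait out the overload.
    // Early on: delete and restart elsewhere, since little work is lost.
    Action decide(double elapsedMinutes, double estimatedTotalMinutes) {
        double fractionDone = elapsedMinutes / estimatedTotalMinutes;
        return fractionDone >= pauseThreshold ? Action.PAUSE : Action.DELETE;
    }
}
```

With a 0.5 threshold, the five-minute example in this section comes out as the text suggests: 5 of 120 minutes is about 4% done, so the object is deleted, while 5 of 5.5 minutes is about 91% done, so it is paused.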
Deleting, on the other hand, is preferable when the object's execution is still in its infancy. When heavy load occurs at this early stage, it is better to delete the object, send it to another server, and restart it there. Little computation time is lost, because the discarded execution time is negligible compared with the total execution time.

Only the client can determine whether the execution time up to the moment of heavy load counts as short (better to delete) or long (better to pause). Five minutes of elapsed time is short if the total execution will take two hours, but if the total execution takes five and a half minutes, it is better to pause.

4.2. Election Algorithm Advantages

Our primary-backup broker algorithm has the following advantages:

Flexible registration
When a broker starts up, it uses multicasting, as described in Section 3.7, to find the primary broker in the system. The primary broker adds the new broker to its broker list and transfers the updated broker list to the backup brokers.

High availability
The primary broker is responsible for sending Alive messages to the backup brokers. Therefore, the first backup broker to notice that the primary broker is down starts the election algorithm to find the best candidate (the one with the highest priority) to become the new primary broker.

Intelligent discovery of backup-broker
The election algorithm lets the backup broker that finds the primary broker down send an Election message to the other backup brokers. Only backup brokers with a higher priority send an Answer message back; backups with a lower or equal priority send an Election_Ack message back.
The broker that started the election updates the status of each broker in its broker list; if it does not receive any Answer message, it claims to be the best candidate and starts the New_Primary_Voting algorithm.

High consistency
We implement the New_Primary_Voting algorithm in the style of a distributed mutual exclusion algorithm. The broker that claims to be the best candidate for the new primary starts the New_Primary_Voting algorithm by multicasting a New_Primary_Voting message. A broker that receives this message, has a lower or equal priority, and has not yet voted for another broker replies with New_Primary_Voting_Ack and locks its vote; otherwise it replies with New_Primary_Voting_Refuse. To avoid deadlock, when a broker that has already voted receives a New_Primary_Voting message from a broker with a higher priority than the one it voted for, it sends New_Primary_Voting_Refuse to the old candidate and waits for that candidate's New_Primary_Cancel to unlock its vote, while sending a New_Primary_Voting_Postpone message to the new candidate to postpone the vote. If the candidate collects all the votes, it sends the New_Primary_Confirm message to all brokers; otherwise it sends the New_Primary_Cancel message to all brokers, which unlocks their votes.

4.3. Java SecurityManager

Our initial idea for getting around the security problem was simply to write the server as an applet and run it in an applet viewer. Running it in a browser was ruled out because of the extra overhead that would add to our server. However, we found that the applet viewer imposes many restrictions that we cannot turn off, and these prevent us from using our multicast protocols. We therefore reverted to implementing the server as an application and using Java's SecurityManager class to make it more secure. The Java SecurityManager is tightly integrated with the core of Java.
So we did not need much work to provide the sandbox model: we simply wrote our own security manager and installed it on every server. To do this, we created a subclass of SecurityManager and overrode the checkX() methods for the operations we wanted to allow or restrict. Any check that we do not override is disallowed.

    public class AtipSecurityManager extends SecurityManager {
        public AtipSecurityManager() {
            super();
        }

        // …

        @Override
        public void checkConnect(String host, int port) {
            // Empty body: allow all connections from and to the Java VM.
        }

        @Override
        public void checkDelete(String file) {
            // Disallow any file deletion from the Java VM.
            throw new SecurityException("Thou may not access the file system");
        }

        // …
    }

As a security feature, only one SecurityManager can be installed in each Java Virtual Machine, and it is the first one installed. During the final demonstration, we showed the security functionality of the Server.

4.4. Java Multi Threading problem in Solaris

We were experiencing context-switching inconsistencies that we could not explain from our Java code. We then realized that Java on Solaris does not perform proper time slicing, although it does on Microsoft Windows 95/NT. On Solaris, a Java thread occupies the entire Java VM unless it calls the Thread.sleep() method; otherwise the thread is non-preemptive. This only affects the Java VM; other applications running outside it appear to be time-sliced normally. On Windows, by contrast, threads are preempted even if they never call Thread.sleep(). This was a significant problem, because it means that every application shipped to run remotely (for example, the chess engine code) would need to call Thread.sleep() at regular intervals. Otherwise, the engine will not be preempted, and it will occupy the Java VM alone until it finishes its calculation, which can take a significant time.
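On a JVM without preemptive time slicing, a shipped computation has to give up the CPU voluntarily. A minimal sketch of that discipline, assuming a search loop over numbered candidates; CooperativeSearch, the LongPredicate candidate test, and the YIELD_EVERY interval are hypothetical. Because Thread.sleep also throws InterruptedException when the thread has been interrupted, the same calls give the server a point at which it can stop the computation.

```java
import java.util.function.LongPredicate;

// Hypothetical long-running search that periodically sleeps so that other
// threads in the same JVM get CPU time even without preemptive scheduling.
final class CooperativeSearch {
    static final int YIELD_EVERY = 10_000; // candidates checked between sleeps (assumption)

    // Counts the candidates in [0, limit) accepted by isSolution.
    static long countSolutions(long limit, LongPredicate isSolution) throws InterruptedException {
        long found = 0;
        for (long candidate = 0; candidate < limit; candidate++) {
            if (isSolution.test(candidate)) {
                found++;
            }
            if (candidate % YIELD_EVERY == YIELD_EVERY - 1) {
                Thread.sleep(1); // give up the CPU briefly; throws if the thread was interrupted
            }
        }
        return found;
    }
}
```

A smaller YIELD_EVERY gives other threads more frequent turns but adds more sleep overhead to the computation itself.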
So, despite the security features discussed in Section 4.3, someone could attack our architecture by sending code that monopolizes the entire Java VM if the Server runs on Solaris.

4.5. Chess Checkmate Application

Deciding the next move in a computer game is a typical example of a problem requiring a backtracking algorithm. Usually, to determine whether a particular move is good, the computer has to compute the opponent's entire space of possible responses and see whether the computer will end up in a better position than the current one. In chess, a typical board configuration might have twenty to thirty possible moves, and the opponent might have more than twenty possible replies to each of them. A good player goes even further, computing the reply to each of the opponent's replies, and so on. Evaluating a single move can easily mean evaluating hundreds or thousands of board positions, depending on how deep the calculation goes.

In our project, we build a chess engine that, given a board configuration, tries to find the best move in that configuration. Since it is almost impossible to know whether a particular move is the best until much later, we limit the application to finding a move that leads to checkmate. In this case a solution can easily be checked, and the computer has well-defined criteria for measuring the feasibility of a move.

First, the chess application splits the required computation into several parts. In our implementation, we split it by the depth level to be searched, although it could easily be changed to split by subsets of the possible moves. We chose depth level because we have tuned our engine to pick the most promising candidate at each level and try it first.
This makes splitting based on possible moves unnecessary, because in all but the most extreme board positions, the first move chosen leads to the solution. However, we still have to check whether this move actually leads to a checkmate. Since we are trying to find a checkmate as soon as possible, we set the depth level to the smallest value and increase it if we cannot find a checkmate at the current search depth.

4.6. Performance Analysis

We developed the following scenarios to test the performance of our system. Although the benchmarks cover only a narrow part of our distributed system, they serve our purpose of providing some guidance on its performance. The tests were performed using Java 1.1.4 under Windows NT 4.0.

For the initial setup scenario, we measured the setup time of one broker (i.e., the elapsed time from when the broker starts until it is fully configured as the primary broker). We then measured the setup time when the first, second and third backup brokers are brought up.

Timeout: 2000 ms (unit: seconds)
Case                      Time 1  Time 2  Time 3  Time 4  Time 5  Avg.
One broker setup alone       8       7       8       8       8     7.8
The 1st backup setup         2       2       2       2       3     2.2
The 2nd backup setup         3       2       2       2       3     2.4
The 3rd backup setup         4       3       3       3       3     3.2

The setup time in the first case is longer than in the subsequent ones, since it takes much more time to set up the broker environment, to check whether other (backup) brokers are alive, and to start our election algorithm. In the other cases, the setup time is much shorter and almost the same from case to case: after detecting that a primary broker already exists, the backup brokers simply obtain the information from the primary broker.
The transfer time of this information is almost the same in the later three cases, although it is a little longer for the 3rd backup than for the 1st and 2nd, probably due to the slight increase in message passing.

Meanwhile, we found that the timeout interval plays a very important role in the benchmark. An interval of 2,000 ms was used for the data above. If we change the interval to 10,000 ms (which was necessary in the Solaris environment), the corresponding times are as follows; these are the results shown in the final demo.

Timeout: 10,000 ms (unit: seconds)
Case                                        Average
One broker setup alone                        37.6
The 1st backup setup (brought up first)        7.4
The 2nd backup setup (brought up next)         7.6
The 3rd backup setup (brought up last)         8

Next, we set up the following scenarios to measure the time taken by the election algorithm:
1. Election time when there is only one backup broker, after the primary broker goes down.
2. Election time between two backup brokers, when the primary broker is dead.
3. Election time among three backup brokers, when the primary broker is dead.

Unit: seconds
Case   Time 1  Time 2  Time 3  Time 4  Time 5  Average
1.       10       9      10      10      10      9.8
2.       12      11      12      12      12     11.8
3.       31      30      29      23      28     28.2

As expected, the election time depends strongly on the number of backup brokers in the system: fewer backup brokers lead to a shorter election, and going from two to three backups more than doubles the average (11.8 s versus 28.2 s). With more backup brokers, they must find each other and check each other's status (such as priority and active/dead) across a larger group. Message passing also increases greatly in a larger group of backup brokers compared with an environment where there are only one or two.

Finally, we also measured the time required to perform class loading and execute an actual application. We measured the time needed to run the checkmate engine without distributing the computation, and then the time taken when it is submitted to the distributed infrastructure with two servers.
We also measured the time associated with loading a class by executing an empty class (i.e., one that contains no code). This is one of the major overheads of our infrastructure.

Unit: seconds
Case                                                      Average
Chess application without our infrastructure               193.5
Chess application with our infrastructure (2 servers)      104.8
Class loading overhead (one class)                          0.643
Class loading overhead (two classes)                        1.870

As predicted, the improvement in the distributed environment is significant: the run time drops from 193.5 s to 104.8 s. This is due not just to the splitting of the work, but also to the fact that the chess application needs only one server's response to complete the calculation. Class loading imposes only a small overhead on our system, so a computation that needs more than a couple of seconds can easily benefit from it. We also observed that loading two classes costs more than twice as much as loading one class (1.870 s versus 2 × 0.643 = 1.286 s). This might happen because loading more than one class places a heavier burden on the thread infrastructure: even though the communication cost probably only doubles, the computation cost may grow by more than that.

4.7. Future Work

We would like billing to be an integral part of our system. Clients could be billed on a range of metrics such as CPU time, memory requirements and leasing time. This feature can easily be implemented in the Broker without significantly affecting the other components. For billing, we need to focus more on the security aspects that fell outside the scope of this course project but will be essential for any future commercialization. We could use Kerberos ticket granting, greatly enhancing the security of our system by providing authentication between the different components. Currently, the system only simulates the metrics at the server end; mechanisms are needed to measure metrics (such as load) dynamically in real time.

5. Summary

In this course project, we have demonstrated the use of a broker-based distributed framework to utilize the idle capacity of workstations within an intranet environment. The system has three functional entities, namely the broker, the server and the clients. The clients define the metrics when requesting a server from the broker to perform their intensive computations. The clients then establish a connection with the most suitable server specified by the broker, and class loading is used to execute the object remotely by shipping it to the server and specifying the method to execute. We handle both broker and server failures. Broker failures are sensed by the backup brokers, which then start an election algorithm to replace the primary with a backup. A server failure causes the object's execution to be restarted on another available server. Load balancing is achieved by allowing object execution to be paused and deleted at the server. Java's SecurityManager class is used to make the server more robust and secure. Finally, a sample application that solves the checkmate problem in chess was built to demonstrate the benefits of our framework.

6. Bibliography

[1] Courtois, Todd, "Java Networking & Communication", Prentice Hall, 1998.
[2] Cornell, Gary, "Core Java", Prentice Hall, 1996.
[3] Sun's web site, http://www.javasoft.com

7. Appendix: Code Summary

7.1. Common Classes within the Broker, Server and Client Directories
Connection: General thread for handling TCP messages.
Data: Base class for all types of communication messages.
Data_*: Specification of each type of communication message.
Packet: General format of all communication message packets including the

7.2. Main Server Classes
AtipClassLoader: Extension of Java's ClassLoader that loads client classes into the Server.
AtipSecManager: Our implementation of the SecurityManager.
HelloTXThread: Thread responsible for creating alive packets at a predetermined interval, to be sent to the primary broker.
Metric: Returns the current metric; at present this is only a simulation.
Registration*: Handles server registration with the primary broker.
RoyServer: Handles all communication parts of the Server.
ServerControlFrame: Handles the GUI part of the Server.
TCPHandler: Listens at the TCP port and, upon receiving an incoming TCP connection, notifies the appropriate thread.

7.3. Main Broker Classes
AliveThread: Monitors alive messages from the Primary Broker and starts ElectionThread if it detects that the Primary Broker is dead.
Broker: Main class that executes all Broker threads and functionality.
BrokerList: Handles manipulation of the Broker list.
BrokerPriority: Returns and compares the priority among Brokers.
Election: Class that implements the election algorithm.
HelloRXThread: Thread that handles incoming alive messages and updates the Broker or Server status if necessary.
HelloTXThread: Thread responsible for creating alive packets at a predetermined interval, to be sent to the backup brokers.
ListenThread: Listens at the UDP port and, upon receiving an incoming packet, notifies the appropriate thread.
MulticastServer: Handles the multicast packets received from the Client and Server.
ServerList: Handles manipulation of the Server list.
TCPListenThread: Listens at the TCP port and, upon receiving an incoming TCP connection, notifies the appropriate thread.
TransactionList: Handles manipulation of the Transaction list.

7.4. Client API Classes
ClassConnection: Handles the connection between Client and Server to transfer the object.
ClassServer: Handles requests from the Server when the classes needed to perform the computation cannot be resolved locally.
ClientAPI: Main interface used by the application to communicate with our infrastructure.
TCPHandler: Listens at the TCP port and, upon receiving an incoming TCP connection, notifies the appropriate thread.
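The appendix states what AliveThread, BrokerPriority and Election do, but not how. As an illustration only, the sketch below (a hypothetical BrokerFailover class, not the project's code) shows a heartbeat-timeout test of the kind AliveThread performs, plus one plausible rule for choosing the new primary: the alive backup with the highest priority wins, with ties broken by broker id. It covers just these two decisions, not the message exchange through which the backups discover each other. Times are passed in explicitly so the logic can be tested without waiting.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of two Broker duties from Section 7.3:
 *  - presuming the Primary Broker dead after a heartbeat timeout, and
 *  - choosing a new primary among the surviving backups by priority.
 */
public class BrokerFailover {
    private final long timeoutMillis;
    private long lastAliveMillis;

    public BrokerFailover(long timeoutMillis, long startMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastAliveMillis = startMillis; // treat start-up as a fresh heartbeat
    }

    /** Record an alive message from the primary (e.g., called by a receive thread). */
    public synchronized void onAlive(long nowMillis) {
        lastAliveMillis = nowMillis;
    }

    /** The primary is presumed dead once it has been silent longer than the timeout. */
    public synchronized boolean primaryPresumedDead(long nowMillis) {
        return nowMillis - lastAliveMillis > timeoutMillis;
    }

    /**
     * Pick the alive backup with the highest priority. Ties go to the
     * lexicographically smallest id, so every backup computes the same winner.
     * Returns null when no backup is alive.
     */
    public static String electPrimary(Map<String, Integer> alivePriorities) {
        String winner = null;
        int best = Integer.MIN_VALUE;
        // TreeMap iterates ids in sorted order; the strict '>' keeps the smallest id on ties.
        for (Map.Entry<String, Integer> e : new TreeMap<>(alivePriorities).entrySet()) {
            if (winner == null || e.getValue() > best) {
                winner = e.getKey();
                best = e.getValue();
            }
        }
        return winner;
    }
}
```

A longer timeout makes false suspicions less likely on a slow or busy host, at the cost of slower detection; Section 4.6 shows how strongly the timeout interval (2,000 ms versus 10,000 ms) affects the measured times.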
7.5. Main Chess Engine Methods
Engine(…): Constructor that receives the initial board configuration (BoardInfo type), the depth level, and a distributed parameter.
GetChessStatus(): Returns the current chess board status, i.e., whether it is in check or not.
GetResult(): Returns the move found by StartMove().
StartMove(): Starts the search for a checkmate move.
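StartMove() is described only by its purpose, so the following is a hedged sketch of the strategy from Section 4.5 (search for a checkmate at the smallest depth and increase the depth until one is found), written as a standard mate-in-n AND-OR search with iterative deepening. GameNode and TreeNode are hypothetical stand-ins for the engine's BoardInfo type. The real engine also orders candidates so the most promising move is tried first, which this sketch does not do, and it splits the depth levels across servers, whereas here they are tried one after another on a single machine. The size of the problem explains why pruning matters: with, say, 25 moves per side, looking two moves ahead for each side already means 25^4 = 390,625 positions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch of a depth-limited checkmate search with increasing depth. */
public class IterativeMateSearch {

    /** Hypothetical position abstraction; children() are positions reachable by one legal move. */
    public interface GameNode {
        List<GameNode> children();
        boolean isCheckmate(); // true if the side to move is checkmated
    }

    /** True if the side to move can force checkmate within `moves` of its own moves. */
    public static boolean forcesMate(GameNode pos, int moves) {
        if (moves <= 0) {
            return false;
        }
        for (GameNode afterOurMove : pos.children()) {
            if (afterOurMove.isCheckmate()) {
                return true; // this move mates immediately
            }
            List<GameNode> replies = afterOurMove.children();
            // No replies without mate is stalemate, not a win, so such moves are skipped.
            if (moves > 1 && !replies.isEmpty()) {
                boolean everyReplyLoses = true;
                for (GameNode afterReply : replies) {
                    if (!forcesMate(afterReply, moves - 1)) {
                        everyReplyLoses = false; // the opponent has an escape
                        break;
                    }
                }
                if (everyReplyLoses) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Try depth 1, 2, ... up to maxMoves; return the first depth that forces mate, or -1. */
    public static int shallowestMate(GameNode root, int maxMoves) {
        for (int depth = 1; depth <= maxMoves; depth++) {
            if (forcesMate(root, depth)) {
                return depth;
            }
        }
        return -1;
    }

    /** Toy explicit game tree, used only to exercise the search. */
    public static final class TreeNode implements GameNode {
        private final List<GameNode> kids = new ArrayList<>();
        private final boolean mate;

        public TreeNode(boolean mate) { this.mate = mate; }

        public TreeNode add(GameNode... next) {
            kids.addAll(Arrays.asList(next));
            return this;
        }

        @Override public List<GameNode> children() { return kids; }
        @Override public boolean isCheckmate() { return mate; }
    }
}
```

Iterative deepening repeats the shallower searches at every step, but because the tree grows by roughly the branching factor per ply, the repeated shallow work is small compared with the deepest search.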