Distributed Systems – 236351
Exercise 2 – Fault Tolerant Encryption Support Service
Due date: 24.3.2008 at midnight

System Overview

In this exercise, you are to implement a fault tolerant factorization service. As in the previous assignment, clients use this service to factorize given decimals. The factorization service publishes itself by registering with a Naming Service server, which you will need to implement as well.

Because factorization is computationally very costly, multiple factorizations must be supported in parallel: the actual factorizations are done by several back end factorization servers, and the service load is balanced among them. You also have to support multiple front end factorization servers, which are load balanced and back each other up in case of failures. The service should support adding and removing both back end and front end factorization servers, load balancing among them, and recovery from a crash of any back end or front end factorization server.

To save computation time, the back end factorization servers keep and share among them a set of numbers they have already recognized as primes, so that they only attempt to factorize numbers that are not already known to be prime. You also have to implement group checkpointing: each partial factorization result (for example, 24 partially factorizes to 6 and 4) is shared among the back end factorization servers, which saves a lot of computation time if a back end server crashes in the middle of a long factorization.

All group communication among the back end factorization servers is done using Ensemble, and likewise all communication among the front end servers is done using Ensemble.

System Components

The system is composed of the following components, as shown in Figure 1 and described below.

Figure 1 (each of the back end servers communicates with the Naming Service)

Client

The system supports several clients. Each client is a console application that supports the following command:

FACTORIZE <decimal d> - returns the unique set of prime numbers (of decimal type) whose product is the given decimal number d.

Clients communicate with the encryption support service and the naming service over the Internet. Clients should support recovery from front end factorization server failures and load balancing among the front end servers, using two custom sinks.

Naming Service

The naming service should expose methods for the factorization service to publish its URI, and a way for clients and back end factorization servers to learn the location of the factorization front end servers. In particular, the naming service should support multiple front end servers registering under the same service name. All other entities communicate with the naming service over the Internet. Its URI is predefined and publicly known.

Factorization Server – Front End

The factorization server front end should expose methods to support clients' factorization requests and the addition and removal of factorization back end servers (using different interfaces). Each factorization request should be forwarded to a different back end server, maintaining some sort of load balancing. The load balancing and crash recovery logic for the back end servers should be implemented via two custom sinks. The front end cluster is maintained using Ensemble.

Factorization Server – Back End

Factorization servers learn the URIs of the front end servers by querying the naming service over the Internet. However, each should register itself at only one front end server, since the front end servers share the list of back end servers using group communication. The back end factorization servers should expose a method for factorizing decimal numbers.
They should maintain their cluster using Ensemble and use it to share a set of already discovered prime numbers, which saves computation in future factorizations. In addition, they use their cluster for group checkpointing: after each partial factorization, the information (for example, 24 partially factorizes to 6 and 4) should be broadcast to and saved on all of the back end servers until the factorization completes successfully, at which point it should be deleted. This way, if a back end server crashes before completing its factorization, the back end server that takes over does not have to start the factorization from scratch.

Both the front end and the back end factorization servers are on the same Local Area Network (LAN).

Additional Details

- Intra-cluster communication should be done only with Ensemble. All other communication should be done via .NET Remoting.
- Though you may implement the components as processes on a single machine, you should design for the specified architecture.
- You can ignore the possibility of network partitions (inside the clusters).
- It is up to you to complete the design and protocol details of the system.
- No sophisticated factorization algorithms are required.
- Maximum code reuse is required.
- Make your console applications “user friendly”. Print messages to show progress and give feedback to the user.
- All of the arguments to the remote methods should be Decimals, which can support very large numbers with high precision.
- You should try to make your custom sinks as service independent as possible.
- Your code should be reasonably documented and understandable.
- Detailed external documentation should describe how you solved the exercise. Please take the external documentation seriously; it will constitute a substantial part of your grade.
- No prime number generation service is required.
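
To make the shared prime set and the group checkpointing of the back end servers more concrete, here is a minimal single-process sketch. It is written in Python for brevity and is not the required .NET implementation. The names `factorize` and `smallest_divisor`, the `broadcast` callback (a stand-in for an Ensemble multicast to the back end group), and the layout of the `checkpoints` dictionary are all assumptions made for illustration. Python integers are used in place of Decimals, and the input is assumed to be at least 2.

```python
from typing import Callable, Dict, List, Set


def smallest_divisor(n: int) -> int:
    """Return the smallest divisor of n greater than 1 (n itself if n is prime)."""
    if n % 2 == 0:
        return 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return d
        d += 2
    return n


def factorize(n: int,
              known_primes: Set[int],
              checkpoints: Dict[int, List[int]],
              broadcast: Callable[[int, List[int]], None]) -> List[int]:
    """Factorize n (n >= 2), resuming from a saved checkpoint if one exists."""
    # Resume from the last partial result saved for n, if any
    # (e.g. [6, 4] for 24); otherwise start from n itself.
    pending = list(checkpoints.get(n, [n]))
    primes: List[int] = []
    while pending:
        m = pending.pop()
        if m in known_primes:
            primes.append(m)        # already recognized as prime: no work needed
            continue
        d = smallest_divisor(m)
        if d == m:
            known_primes.add(m)     # newly discovered prime (would be shared with the group)
            primes.append(m)
        else:
            pending.extend([d, m // d])
            # Checkpoint: n currently factors as primes + pending.
            broadcast(n, primes + pending)
    # Finished: the checkpoint is no longer needed. In a real system this
    # deletion would also have to be announced to the other servers.
    checkpoints.pop(n, None)
    return sorted(primes)
```

For example, with a `broadcast` callback that simply stores the partial result in `checkpoints`, calling `factorize(24, set(), {24: [6, 4]}, broadcast)` resumes from the saved factors 6 and 4 instead of starting from 24, and returns `[2, 2, 2, 3]`. In this sketch a checkpoint lists every factor found so far, so a server that takes over only repeats work on factors that are not yet in its copy of the known prime set.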

Submission

Submission is in pairs only, through the electronic submission system. The submitted file should be named dsex2.zip. The zip file should contain:

1. A text file named submitters.txt containing the names, IDs and emails of the submitters.
2. Another zip file with your entire Visual Studio 2005 solution.
3. A doc/pdf/ppt file with your external documentation.
4. A zipped folder named run. The folder should contain 4 executables named (according to their roles) NamingService.exe, Client.exe, FactorizeFront.exe and FactorizeBack.exe, along with whatever dlls they require.

Your solution will be automatically tested after unzipping this zip and executing the attached batch file (run-ex2.bat), which uses the following execution syntax:

NamingService.exe <port>
    e.g. NamingService.exe 7000
Client.exe <naming service uri>
    e.g. Client.exe http://localhost:7000
FactorizeFront.exe <front-port> <back-port> <naming service uri>
    e.g. FactorizeFront.exe 8000 8100 http://localhost:7000
FactorizeBack.exe <port> <naming service uri>
    e.g. FactorizeBack.exe 8200 http://localhost:7000

The execution batch file that will be used for checking your exercises is attached. Make sure you test your “run” zip folder using that batch file before submitting. You are advised to test the executables on more than one machine, to make sure your code does not depend on any dlls located outside the “run” zip folder or on any predefined IP address.

Submit a printout of your external documentation to the course cell.

Good Luck