Co-Alloation, Fault Tolerane and Grid Computing 1 2 Jon MaLaren,1 Mark M Keown,2 and Stephen Pikles2 Center for Computation and Tehnology, Louisiana State University, Baton Rouge, Louisiana 70803, United States. Manhester Computing, The University of Manhester, Oxford Road, Manhester M13 9PL. Experiene gained from the TeraGyroid and SPICE projets has shown that o-alloation and fault tolerane are important requirements for Grid omputing. Co-alloation is neessary for distributed appliations that require multiple resoures that must be reserved before use. Fault tolerane is important beause a omputational Grid will always have faulty omponents and some of those faults may be Byzantine. We present HARC, Highly-Available Robust Co-alloator, an approah to building fault tolerant o-alloation servies. HARC is designed aording to the REST arhitetural style and uses the Paxos and Paxos Commit algorithms to provide fault tolerane. HARC/1, an implementation of HARC using HTTP and XML, has been demonstrated at SuperComputing 2005 and iGrid 2005. I. INTRODUCTION In this paper we disuss the importane of oalloation to Grid omputing. We provide some bakground on the diulties assoiated with oalloation before presenting HARC, Highly-Available Robust Co-alloator, a fault tolerant approah to oalloation. The paper also inludes a disussion on the problem of fault tolerane and Grid omputing. We make the ase that a omputational Grid will always have faulty omponents and that some of those faults will be Byzantine [26, 27℄. However, we also demonstrate that with suitable approahes it is still possible to make a omputational Grid a fault tolerant system. There are many denitions of Grid omputing but reurring themes are: large sale or internet sale distributed omputing and sharing resoures aross multiple administrative domains. The goal of sharing resoures between organizations dates bak to the ARPANET [33℄ projet and progress towards that goal an be seen in the development of the Internet, the World Wide Web [22℄ and now omputational Grids. While the goals of the ARPANET projet are still relevant today the underlying infrastruture has hanged, powerful omputers have beome heap and plentiful while high performane networks have beome pervasive, presenting developers with a different set of hallenges and opportunities. We believe that o-alloation is a new hallenge, while the falling ost of omponents makes fault tolerane a new opportunity. Parallel to the evolution of ARPANET through to Grid omputing has been the development of the theory of distributed systems providing us with a deeper understanding of distributed systems and a set of algorithms for building fault tolerant systems. Representational State Transfer [12℄, REST, is an arhitetural style for building large sale distributed systems that was used to develop the protools that make up the World Wide Web. Paxos [28℄ is a fault tolerant onsensus algorithm that an be used to build highly available systems [31℄. Paxos Commit [18℄ is Paxos applied to the distributed transation ommit problem. HARC uses REST and Paxos to provide a system that is fault tolerant and suitable for a large sale distributed system suh as a omputational Grid. HARC's fous on fault tolerane is unique among approahes to designing o-alloation servies [3, 9, 24, 34, 39℄. II. CO-ALLOCATION Running distributed appliations on a omputational Grid often requires that the resoures needed by the appliation are available at the same time. The resoures may need to be booked (eg Aess Grid nodes) or they may use a bath submission system (eg HPC systems). We dene o-alloation as the provision of a set of resoures at the same time or at some o-ordinated set of times. Co-alloation an be ahieved by making a set of reservations for the required resoures with the respetive resoure providers. Experiene from the award winning TeraGyroid [7℄ and SPICE [23℄ projets has shown that support for o-alloation on urrent prodution Grids [37, 38℄ is ad-ho and often requires time onsuming diret negotiation with the resoure administrators. The resoures required by TeraGyroid and SPICE inluded HPC systems, speial high performane networks, high end visualization systems, Aess Grid nodes, hapti devies and people. The lak of onvenient o-alloation servies prevents the routine use of the tehniques pioneered by TeraGyroid and SPICE. Without o-alloation omputational Grids are limited in the type of appliations they an support, and so are limited in their potential. Co-alloation's im- 2 portane to Grid omputing means that it must be a reliable servie. III. FAULT TOLERANCE AND GRID COMPUTING Whatever denition of Grid omputing is used we are lead to two inesapable onlusions: a omputational Grid will always have faulty omponents and some of those faults will be Byzantine. Computational Grids are internet sale distributed systems, implying large numbers of omponents and wide area networks. At this sale there will always be faulty omponents. The ase that a omputational Grid will always have faulty omponents is illustrated by the fat that on average over ten perent of the daily operational monitoring tests, GITS [4℄, run on the UK National Grid Servie, UK NGS [38℄, report failures. Sine the start of the UK NGS a number of the ore nodes have been unavailable for days at a time due to planned and unplanned maintenane. Crossing administrative boundaries raises issues of trust: Can a user trust that a resoure provider has ongured and administrates the resoure properly? Can a resoure provider trust that a user will use the resoure orretly? Distributed transations are rarely used between organizations beause of a lak of trust; one organization will not allow another organization to hold a lok on its internal databases. Without trust users, resoure providers and middleware developers must be prepared for Byzantine fault behaviour: when a omponent faults but ontinues to operate in an unpreditable and potentially maliious way. Although Grid omputing supports the reation of limited trust relationships between organizations through the onept of the Virtual Organization [14℄ our experiene has been that Grids exhibit Byzantine fault behaviour. To illustrate the point we provide three examples of Byzantine behaviour whih we have enountered. The rst ase involved the UK eSiene Grid's MDS2 [2, 10℄ hierarhy. A GRIS [10℄ at a site was rewalled preventing GIIS [10℄ higher up in the MDS2 hierarhy from querying it. The GRIS ontinued to report that it was publishing information but whenever a GIIS attempted to query it the rewall would blok the onnetion. The GIIS would blok waiting for the GRIS to respond ausing the whole MDS2 hierarhy to blok. The rewall was raised by the site's network administrators. The loal rewall on the GRIS server and the GRIS servie itself were ongured orretly. The seond ase involved a job submission node on the TeraGrid [37℄. The node onsisted of two servers and utilized a DNS round robin to alloate requests to a node. Unfortunately the DNS entries were mis-ongured and reported the wrong hostname for one of the nodes ausing job submissions to fail randomly. Loal testing at the site did not reveal the problem. The third ase involved a resoure broker that assigned jobs to omputational resoures. Oasionally a omputational resoure would fail in a Byzantine way: it would aept jobs from the resoure broker, fail to exeute the job but report to the resoure broker that the job had ompleted suessfully. The resoure broker would ontinue assigning jobs to the faulty resoure until it was drained of jobs. The examples illustrate the importane of end-toend arguments in system design error reovery at the appliation level is absolutely neessary for a reliable system, and any other error detetion or reovery is not logially neessary but is stritly for performane [35℄. In eah ase making the individual omponents more reliable would not have prevented the problem. Retrotting reliability to an existing design is very diult [30℄. For o-alloation a very real example of Byzantine fault behaviour ours when a resoure provider aepts a reservation for a resoure but at the sheduled time the user, and possibly the resoure provider, disover the resoure is unavailable. IV. REST REST is an arhitetural style for building large sale distributed systems. It onsists of a set of priniples and design onstraints whih were used in designing the protools that make up the World Wide Web. We provide a brief desription of REST and refer the reader to the thesis [12℄ for a omplete desription. REST is based on a lient-server model whih supports ahing and where interations between lient and server are stateless, all interation state is stored on the lient for server salability. The onept of a resoure is entral to REST, resoures have identity and anything that an have an identity an be a resoure. Resoures are manipulated through their representations and are networked together through linking hypermedia is the engine of appliation state. Together the last set of onstraints ombine to make up the priniple of uniform interfae. REST also has an optional onstraint for the support of mobile ode. HTTP [11℄ is an example of a protool that has been designed aording to REST. On the World Wide Web a resoure is identied by a URI [6℄ and lients an retrieve a representation of the resoure using a HTTP GET. HTTP an also supply ahing information along with the representation to allow 3 intermediaries to ahe the representation. The representation may ontain links to other resoures reating a network of resoures. Clients an hange the representation of a resoure by replaing the existing representation with a new one, for example using a HTTP PUT. All resoures on the World Wide Web have a uniform interfae allowing generi piees of software suh as web browsers to interat with them. Web servers are also able to send ode, for example JavaSript, to the lient to be exeuted in the browser. V. PROBLEMS OF CO-ALLOCATION To illustrate the problems assoiated with oalloation we present two possible approahes and disuss their shortomings. Most existing solutions [3, 9, 24, 34, 39℄ for o-alloation are based on variations of these approahes. A. One Phase Approah In this approah a suessful reservation is made in a single step. The user sends a booking request to eah of the Resoure Managers (RM) requesting a reservation. The RMs either aept or rejet the booking. If one or more of the RMs rejets the booking the user must anel any other bookings that have been made. This approah has the advantage of being simple but a potential drawbak is that the user may be harged eah time he anels a reservation. Supporting reservations prevents a RM from running the optimal workload on a resoure as it has to shedule around the reservations. Even if a reservation is anelled before the sheduled time it may already have delayed exeution of some jobs. RMs may harge for the use of the resoure and may harge more for jobs that are submitted via reservation to ompensate for the loss in throughput. They may also harge for reservations that are anelled. Any harging poliy is the prerogative of the RM. Not all RMs may harge and it is unreasonable to make any assumptions about or try to mandate a harging poliy. A o-alloation protool should aommodate the issues assoiated with harging but annot depend on RMs supporting harging. Another potential problem with this approah arises if one of the RMs rejets a booking but the user does not anel the other reservations that have been made. For example the user may fail before he has had a hane to anel the reservations. Even if the user reovers he may not have stored suient state before failing to anel the reservations. This Initial State Abort Booking Held Commit Booking Committed Abort Booking Rejected FIG. 1: The statetransition diagram for a RM in the two phase approah. The RM reeives the booking request message in the initial state. It an either rejet the request and move to the booking rejeted state or aept the request and move to the Booking Held state. The RM waits in the Booking Held state until it reeives the Commit or Abort message from the TM. is related to the question of trust the RMs must trust the user to anel any unwanted reservations. A partial solution is to add an intermediary servie that is trusted by the user and RM to orretly make and anel reservations as neessary. The user sends the set of bookings he requires to the intermediary whih handles all the interations with the RMs, anelling all the reservations if the purposed shedule is unaeptable to any of the RMs. The intermediary is still a single point of failure unless it an be repliated. B. Two Phase Approah In this approah a suessful reservation is made in two steps. To resolve the problem of RMs harging for anelling a reservation we introdue a new state for the RM, Booking Held. The lient an anel a reservation in the Booking Held state without being harged. The approah is similar to the two phase ommit [17℄ algorithm for distributed transations but does not neessarily have the ACID (Atomiity, Consisteny, Isolation and Durability [19℄) properties assoiated with two phase ommit. A proess, the Transation Manager (TM), tries to get a group of RMs to aept or rejet a set of reservations. In the rst phase the TM sends the booking requests to the RMs, the RMs reply with a Booking Held message if the request is aeptable or an Abort message if the request is unaeptable. If all the RMs reply with Booking Held the TM sends a Commit message to the RMs and the reservations are ommitted. Otherwise the TM sends an Abort message to the RMs and no reservations are made. 4 Fig. 1 shows the state transitions for the RM in the two phase approah. Unfortunately the two phase approah an blok. If the TM fails after the rst phase but before starting the seond phase, before sending the Commit or Abort message, the RMs are left in the Booking Held state until the TM reovers. While in the Booking Held state the RMs may be unable to aept reservations for the sheduled time slot and the resoure sheduler may be unable to run the optimal work load on the resoure. The bloking nature of the two phase approah may be aeptable for ertain senarios depending on how long the TM is unavailable and providing it reovers orretly. To overome the bloking nature of the two phase approah requires a three phase protool [36℄ suh as Paxos. Another potential problem with the two phase approah is that RMs must support the Booking Held state. A solution is for RMs that do not support the Booking Held state to move straight to the Booking Committed state if the booking request is aeptable in the rst phase. For the seond phase they an ignore a Commit message and treat an Abort message as a anellation of the reservation for whih the user may be harged. This a relaxation of the onsisteny property of two phase ommit. It is also possible to relax the isolation property of two phase ommit. If a RM reeives a booking request while in the Booking Held state it an rejet the request, advising the lient that it is holding an unommitted booking whih may be released in the future. This prevents deadlok [19℄, when two users request the same resoures at the same time. The role of the TM an be played by the user but should be arried out by a trusted intermediary servie. VI. CO-ALLOCATION AND CONSENSUS Co-alloation is a onsensus problem. Consensus is onerned with how to get a set of proesses to agree a value [26℄. Distributed transation ommit, of whih two phase ommit is one approah, is a speial ase of onsensus were all the proesses must agree on either ommit or abort for a transation. In the ase of o-alloation the RMs must agree to either aept or rejet a purposed shedule. Consensus is a well understood problem with over twenty ve years of researh [13, 2628℄. For example it has been shown that distributed onsensus is impossible in an asynhronous system with a perfet network and just one faulty proessor [13℄. In an asynhronous system it is impossible to dierentiate between a proessor that is very slow and one that has failed. To handle faults requires either fault de- tetors or partial synhrony. Consensus is an important problem beause it an be used to build replias: start a set of deterministi state mahines in the same state and use onsensus to make them agree on the order of messages to proess [25, 31℄. Paxos is a well known fault tolerant onsensus algorithm. We present only a desription of its properties and refer the reader to the literature [28, 29, 31℄ for a full desription of the algorithm. In Paxos a leader proess tries to guide a set of aeptor proesses to agree a value. Paxos will reah onsensus even if messages are lost, delayed or dupliated. It an tolerate multiple simultaneous leaders and any of the proesses, leader or aeptor, an fail and reover multiple times. Consensus is reahed if there is a single leader for a long enough time during whih the leader an talk to a majority of the aeptor proesses twie. It may not terminate if there are always too many leaders. There is also a Byzantine version of Paxos [8℄ to handle the ase were aeptors may have Byzantine faults. Paxos Commit [18℄ is the Paxos algorithm applied to the distributed transation ommit problem. Effetively the transation manager of two phase ommit is replaed by a group of aeptors if a majority of aeptors are available for long enough the transation will omplete. Gray and Lamport [18℄ showed that Paxos Commit is eient and has the same message delay as two phase ommit for the fault free ase. They also showed that two phase ommit is Paxos Commit with only one aeptor. Though it may be possible to apply Paxos diretly to the o-alloation problem by having the RMs at as aeptors we hose to use Paxos Commit instead. The advantage Paxos Commit has over using Paxos diretly for o-alloation is that the role played by the RMs is no more omplex than in the two phase approah. Limiting the role of the RM makes it more aeptable to resoure providers and redues the possibilities for faulty behaviour from the RM. VII. HARC The HARC approah to o-alloation is similar to the two phase approah desribed in Setion II exept the TM is replaed with a set of Paxos aeptors. The user sends a booking request to an aeptor who rst repliates the message to the other aeptors using Paxos and then it uses Paxos Commit to make the reservations with the RMs. HARC terminates one it has deided to ommit or abort a set of reservations. Users and RMs may modify or anel reservations after HARC has terminated, but this is outside the sope of HARC. The aims of HARC are deliberately limited. It does not address the issues of how to hoose the op- 5 timal shedule; how to manage a set of reservations one they have been made; how to negotiate quality of servie with the resoure provider or how to support ompensation mehanisms when a resoure fails to full a reservation or when a user anels a reservation. HARC is designed so that other servies and protools an be ombined with it to solve these problems. A. Choosing a Shedule The RM advertises the shedule when a resoure may be available through a URI. The user retrieves the shedule using a HTTP GET. The information retrieved is for guidane only, the RM is under no obligation by advertising it. It is the RM's prerogative as to how muh information it makes available, it may hoose not to advertise a shedule at all. The RM may supply ahing information along with the shedule using the ahe support failities of HTTP. The ahing information an indiate when the shedule was last modied or how long the shedule is good for. The RM may also support onditional GET [11℄ so that lients do not have to retrieve and parse the shedule if it hasn't hanged sine the last retrieval. From the shedules for all the resoures the user hooses a suitable time when the resoures he requires might be free. A o-sheduling servie for HARC ould use HTTP onditional GET to maintain a ahe of resoure shedules from whih to reate o-shedules for the user. The o-sheduling servie would have to store only soft state making it easy to repliate. B. Submitting the Booking Request The user onstruts a booking request whih ontains a sub-request for eah resoure that he wants to reserve. The booking request also ontains an identier hosen by the user, the UID. The ombination of the identity of the user and the UID should be globally unique. The user sends the booking request to an aeptor using a HTTP POST. This aeptor will at as the leader for the whole booking proess unless it fails in whih ase another aeptor will take over. The leader piks a transation identier, the TID, for the booking request from a set of TIDs that it has been initially assigned. Eah aeptor has a different range of TIDs to hoose from to prevent two aeptors trying to use the same TID. The aeptor eetively repliates the message to the other aeptors by having Paxos agree the TID for the message. This instane of Paxos agrees a TID for the ombination of the user's identity and the user hosen UID. If the user resends the message to the aeptor or to any other aeptor he will reeive the same TID. The user should ontinue submitting the request to any of the aeptors until he reeives a TID the submission of the booking request is idempotent aross all the aeptors. If a user reuses a UID he will get the TID of the previous booking request assoiated with that UID. To avoid reusing a UID the user an reord all the UIDs he has previously used, however a more pratial approah is to randomly hoose a UID from a large namespae. Two users an use the same UID as it is the ombination of the user's identity and the UID that is relevant. The TID is a RequestURI [11℄. The user an use a HTTP GET with the TID to retrieve the outome of the booking request from any of the aeptors. The TID is returned to user using the HTTP Loation header in the response to the POST ontaining the booking request. The HTTP response ode is 201 indiating that a new resoure has been reated. The aeptor should return a 303 HTTP response ode if a TID has already been hosen for the request to allow the user to detet the ase when a UID may have been reused. C. Making the Reservations After the TID has been hosen the leader uses Paxos Commit to make the bookings with the RMs. The booking request is broken down into the subrequests and the sub-requests are sent to the appropriate RM using HTTP POST. The TID is also sent to the RM as the Referer HTTP header in the POST message. The RM responds with a URI that will represent the booking loal to the RM and whether the booking is being held or has been rejeted. It also broadasts this message to all the other aeptors. As in the Paxos Commit algorithm the aeptors deide whether the shedule hosen by the user is aeptable to all the RMs or has been rejeted by any of the RMs. The leader informs eah RM whether to ommit or abort the booking using a HTTP PUT on the URI provided by the RM. It is the RM's obligation to disover the outome of the booking. If the RM does not reeive the ommit or abort message it should use a HTTP GET with the TID on any of the aeptors to disover the outome of the booking. One the aeptors have made a deision it an be ahed by intermediaries. A HTTP GET on the TID returns the outome of the booking request and a set of links to the reservations loal to eah RM. Detailed information on a reservation at a partiular RM an be retrieved by 6 using a HTTP GET on the link assoiated with that reservation. The user an anel the reservation through the URI provided by the RM and the RM an advertise that it has anelled the reservation using the same URI. The user should poll the URI to monitor the status of the reservation, HTTP HEAD or onditional GET an be used to optimize the polling. There is a question of what happens if a RM deides to unilaterally abort from the Booking Held state whih is not allowed in two phase ommit or Paxos Commit. Sine HARC terminates when it deides whether a shedule should be ommitted or aborted we an say that the RM anelled the reservation after HARC terminated. This is possible beause there is only a partial ordering of events in a distributed system [25℄. The user an only nd out that the RM anelled the reservation after HARC terminated. The onus is on the user to monitor the reservations one they have been made and to deal with any eventualities that may arise. D. HARC Ations HARC supports a set of ations for making and manipulating reservations: Make, Modify, Move and Canel. Make is used to reate a new reservation, Modify to hange an existing reservation (eg to hange the number of CPUs requested), Move to hange the time of a reservation and Canel to anel a reservation. A booking request an ontain a mixture of ations and HARC will deide whether to ommit or abort all of the ations. The HARC ations provide the user with exibility for dealing with the ase of a RM anelling a reservation, for example he ould make a new reservation on another resoure at a dierent time and move the existing reservations to the new time. A third party servie ould monitor reservations on the behalf of the user and deal with any eventualities that arise using the HARC ations in aordane to some poliy provided by the user. E. Seurity The omplete booking request sent to the aeptor is digitally signed [5℄ by the user with a X.509 [1℄ ertiate. The aeptors use the signature to disover the identity of the user. Individual sub-requests are also digitally signed by the user so that the RMs an verify that the sub-request ame from an authorized user. The aeptors also digitally sign the sub-requests before passing them to the RMs so that the RM an verify that the sub-request ame from a trusted aeptor. If required a sub-request an be digitally enrypted [21℄ by the user so that the aeptors annot read it. A HTTP GET using the TID returns only the outome of a booking request and a set of links to the individual bookings so there is no requirement for aeptors to provide aess ontrol to this information. The individual RMs ontrol aess to any detailed information on reservations they hold. VIII. HARC AVAILABILITY HARC has the same fault tolerane properties as Paxos Commit whih means it will progress if a majority of aeptors are available. If a majority of aeptors are not available HARC will blok until a majority is restored, sine bloking is unaeptable we dene this as HARC failing. To alulate the MTTF for HARC we use the notation and denitions from [19℄. The probability that an aeptor fails is 1/M T T F were MTTF is the Mean Time To Failure of the aeptor. The probability that an aeptor is in a failed state is approximately M T T R/M T T F , were MTTR is the Mean Time To Repair of the aeptor. Given 2F + 1 aeptors, the probability that HARC bloks is the probability that F aeptors are in the failed state and another aeptors fails. The probability that F out of the 2F + 1 aeptors are in the failed state is: F (2F + 1)! M T T R (1) F !(F + 1)! M T T F The probability that one of the remaining F + 1 aeptors fails is: 1 (F + 1) (2) MT T F The probability that HARC will blok is the produt of (1) and (2): M T T RF (2F + 1)! (3) (F !)2 M T T F F +1 The MTTF for HARC is the reiproal of (3): (F !)2 M T T F F +1 M T T FHARC ≈ (2F + 1)! M T T RF (4) Using 5 aeptors, F = 2, eah with a MTTF of 120 days and a MTTR of 1 day, M T T FHARC is approximately 57,600 days, or 160 years. IX. BYZANTINE FAULTS AND HARC HARC is based on Paxos Commit whih assumes the aeptors do not have Byzantine faults but whih 7 an happen in a Grid environment. We believe that aeptors an be implemented as failstop [19℄ servies that are provided as part of a Virtual Organization's role of reating trust between organizations. Provision of trusted servies is one way of realizing the Virtual Organization onept. If Byzantine fault tolerane is neessary then Byzantine Paxos [8℄ ould be used in HARC. If a HARC aeptor does have a Byzantine fault it annot make a reservation without a signed request from a user. HARC has been designed to deal with Byzantine fault behaviour from users and RMs. No o-alloation protool an guarantee that the reserved resoures will atually be available at the sheduled time. HARC is a fault tolerant protool for making a set of reservations, it does not attempt to make any guarantees one the reservations have been made. The user may be able to deal with the situation were a resoure is unavailable at the sheduled time by booking extra resoures that an at as bakup. This is an appliation spei solution and again illustrates the importane of end-to-end arguments [35℄. HARC provides a fault tolerant servie to the user but fault tolerane is still neessary at the appliation level. X. HARC/1 HARC/1, an implementation of HARC, has been suessfully demonstrated at SuperComputing 2005 and iGrid 2005 [20℄ where it was used to o-shedule ten ompute and two network resoures. HARC/1 is implemented using Java with sample RMs implemented in Perl. HARC/1 is available at http:// www.t.lsu.edu/personal/malaren/CoShed. mean that the reliability of the overall system improves. Just as Beowulf systems built out of ommodity omponents have displaed expensive superomputers so lusters of PCs are displaing expensive mainframe type systems [15℄. HARC demonstrates how important servies an be made fault tolerant to reate a fault tolerant Grid. HARC has been demonstrated to be a seure, fault tolerant approah to building o-alloation servies. It has also been shown that the user and RM roles in HARC are simple. The statetransitions for the RM in HARC are the same as for the two phase approah illustrated in Fig. 1. The only extra requirement for the RM over the two phase approah is that it must broadast a opy of its Booking Held or Abort message to all aeptors. The use of HTTP ontributes to the simpliity of HARC. HTTP is a well understood appliation protool with strong library support in many programming languages. HARC demonstrates through its use of HTTP and URIs the eetiveness of REST as an approah to developing Grid servies. HARC's funtionality an be extended by adding other servies, for example to support o-sheduling and reservation monitoring. Extending funtionality by adding servies, rather than modifying existing servies, indiates good design and a salable system in aordane to REST. The opaqueness of the sub-requests to the aeptors means that HARC has the potential to be used for something other than o-alloation. For example a HARC implementation ould be used as an implementation of Paxos Commit to support distributed transations. XII. XI. ACKNOWLEDGEMENTS CONCLUSION As the size of a distributed system inreases so to does the probablity that some omponent in the system is faulty. However, if a fault tolerant approah is applied then inreasing the size of the system an The authors would like to thank Jim Gray, Savas Parastatidis and Dean Kuo for disussion and enouragement. The work is supported in part through NSF Award #0509465, "EnLIGHTened Computing". [1℄ C. Adams and S. Farrell. Internet X.509 Publi Key Infrastruture Certiate Management Protools. IETF RFC 2510, 1999. [2℄ R. Allan et al. Building the e-Siene Grid in the UK: Grid Information Servies. Proeedings of UK e-Siene All Hands Meeting. 2003. [3℄ A. Andrieux et al. Web Servies Agreement. Draft GGF Reommendation, September 2005. [4℄ D. Baker and M. M Keown. Building the e-Siene Grid in the UK: Providing a software toolkit to en- able operational monitoring and Grid integration. Proeedings of UK e-Siene All Hands Meeting. 2003. [5℄ M. Bartel et al. XML-Signature Syntax and Proessing. W3C Reommendation, February 2002. [6℄ T. Berners-Lee, R. Fielding and L. Masinter. Uniform Resoure Identier (URI): Generi Syntax. IETF RFC 3986, 2005. [7℄ R. Blake et al. The teragyroid experiment superomputing 2003. Sienti Computing, 8 13(1):1-17, 2005. [8℄ M. Castro and B. Liskov. Pratial Byzantine fault tolerane. Proeedings of 3rd OSDI, New Orleans, 1999. [9℄ K. Czajkowski et al. A protool for negotiating servie level agreements and oordinating resoure management on distributed systems. Proeedings of 8th International Workshop on Job Sheduling Strategies for Parallel Proessing, eds D Feitelson, L. Rudolph and U. Shwiegelshohn, Leture Notes in Computer Siene, 2537, Springer Verlag, 2002. [10℄ K. Czajkowski et al. Grid Information Servies for Distributed Resoure Sharing. Proeedings of the Tenth IEEE International Symposium on HighPerformane Distributed Computing (HPDC-10), IEEE Press, 2001. [11℄ R. Fielding et al. Hypertext Transfer Protool HTTP/1.1, IETF RFC 2616, 1999. [12℄ R. Fielding. Arhitetural Styles and the Design of Network-based Software Arhitetures. PhD thesis. University of California, Irvine, 2000. [13℄ M. Fisher, N. Lynh, and M. Paterson. Impossibility of distributed onsensus with one faulty proess. Journal of the ACM, 32(2), 1985. [14℄ I. Foster, C. Kesselman and S. Tueke. The Anatomy of the Grid: Enabling Salable Virtual Organizations. International J. Superomputer Appliations, 15(3), 2001. [15℄ S. Ghemawat, H. Gobio, and S. Leung. The Google Filesystem. 19th ACM Symposium on Operating Systems Priniples, Lake George, NY, Otober, 2003. [16℄ M. Gudgin et al. SOAP Version 1.2 Part 1: Messaging Framework. W3C Reommendation, June 2003. [17℄ J. Gray. Notes on data base operating systems. In R. Bayer, R. Graham, and G. Seegmuller, editors, Operating Systems: An Advaned Course, volume 60 of Leture Notes in Computer Siene. SpringerVerlag, Berlin, Heidelberg, New York, 1978. [18℄ J. Gray and L. Lamport. Consensus on Transation Commit. Mirosoft Researh Tehnial Report MSR-TR-2003-96, 2005. [19℄ J. Gray and A. Reuter. Transation Proessing: Conepts and Tehniques. Morgan Kaufmann Publishers In., San Franiso, CA, USA, 1992. [20℄ A. Hutanu et al. Distributed and ollaborative visualization of large data sets using high-speed networks. Submitted to proeedings of iGrid 2005, Future Generation Computer Systems. The International Journal of Grid Computing: Theory, Methods and Appliations, 2005. [21℄ T. Imamura, B. Dillaway and E. Simon. XML Enryption Syntax and Proessing. W3C Reommendation, Deember 2002. [22℄ I. Jaobs and N. Walsh. Arhiteture of the World Wide Web, Volume One. W3C Reommendation, Deember 2004. [23℄ S. Jha et al. Spie: Simulated pore interative omputing environment using grid omputing to understand dna transloation aross protein nanopores embedded in lipid membranes. Proeedings of the UK e-Siene All Hands Meetings, 2005. [24℄ D. Kuo and M. M Keown. Advane reservation and o-alloation for Grid Computing. In First International Conferene on e-Siene and Grid Computing, volume e-siene, IEEE Computer Soiety Press, 2005. [25℄ L. Lamport. Time, Cloks and the Ordering of Events in a Distributed System. Communiations of the ACM 21, 7, 1978. [26℄ L. Lamport, M. Pease and R. Shostak. Reahing Agreement in the Presene of Faults. Journal of the Assoiation for Computing Mahinery 27, 2, 1980. [27℄ L. Lamport, M. Pease and R. Shostak. The Byzantine Generals Problem. ACM Transations on Programming Languages and Systems 4, 3, 382-401, 1982. [28℄ L. Lamport. The Part-Time Parliament. ACM Transations on Computer Systems 16, 2, 133-169, 1998. [29℄ L. Lamport. Paxos Made Simple. ACM SIGACT News (Distributed Computing Column) 32, 4 (Whole Number 121, Deember 2001) 18-25, 2001. [30℄ B. Lampson. Hints for omputer system design. ACM Operating Systems Rev. 17, 5, pp 33-48, 1983. [31℄ B. Lampson. How to build a highly available system using onsensus. In Distributed Algorithms, ed. Babaoglu and Marzullo, Leture Notes in Computer Siene 1151, Springer, 1996 [32℄ E. Resorla. HTTP Over TLS. IETF RFC 2818, 2000. [33℄ L. Roberts. Resoure Sharing Computer Networks. IEEE International Conferene, New York City. 1968. [34℄ A. Roy. End-to-End Quality of Servie for High-end Appliations. PhD thesis. University Of Chiago, Illinois, 2001. [35℄ J. Saltzer, D. Reed and D. Clark. End-to-end arguments in system design. Proeedings of the 2nd International Conferene Distributed Computing Systems, Paris, 1981. [36℄ D. Skeen. Nonbloking ommit protools. In SIGMOD '81: Proeedings of the 1981 ACM SIGMOD International Conferene on Management of Data. ACM Press, 1981. [37℄ TeraGrid. http://www.teragrid.org/. [38℄ UK National Grid Servie. http://www.ngs.a.uk/. [39℄ K. Yoshimoto, P. Kovath and P. Andrews. Cosheduling with usersettable reservations. In Job Sheduling Strategies for Parallel Proessing, eds E. Frahtenberg, L. Rudolph and U. Shwiegelshohn, Leture Notes in Computing Siene, 3834, Springer, 2005.