Ch. 9 Agreement Protocols

9.1 Introduction

In distributed systems, where sites (or processors) often compete as well as cooperate to achieve a common goal, it is often required that sites reach mutual agreement.
- Ex) In distributed database systems, data managers at sites must agree on whether to commit or to abort a transaction.

The formal setting for a distributed agreement protocol is the following:
- There are M processors P = {p_1, ..., p_M} that are trying to reach agreement.
- A subset F of the processors are faulty, and the remaining processors are nonfaulty.
- Each processor p_i ∈ P stores a value V_i.
- During the agreement protocol, each processor p_i calculates an agreement value A_i.

After the protocol ends, the following two conditions should hold:
① For every pair p_i and p_j of nonfaulty processors, A_i = A_j. This value is the agreement value.
② The agreement value is a function of the initial values {V_i} of the nonfaulty processors (P - F).

9.2 The System Model

Agreement problems have been studied under the following system model:
- There are n processors in the system and at most m of the processors can be faulty.
- The processors can directly communicate with other processors by message passing.
- A receiver processor always knows the identity of the sender processor of the message.
- The communication medium is reliable (i.e., it delivers all messages without introducing any errors) and only processors are prone to failure.

9.2.1 Synchronous vs. Asynchronous Computations

Synchronous computation
A process receives messages (1st round), performs a computation (2nd round), and sends messages to other processes (3rd round).

Asynchronous computation
The computation at processes does not proceed in lock steps.
A process can send and receive messages and perform computation at any time.

9.2.2 Model of Processor Failures

A processor can fail in three modes:
- Crash fault: a processor stops functioning and never resumes operation.
- Omission fault: a processor "omits" to send messages to some processors.
- Malicious fault (Byzantine fault): a processor behaves randomly and arbitrarily.

9.2.3 Authenticated vs. Non-Authenticated Messages

Authenticated message system
- A (faulty) processor cannot forge a message or change the contents of a received message.
- A processor can verify the authenticity of a received message.
- An authenticated message is also called a signed message.

Non-authenticated message system
- A (faulty) processor can forge a message and claim to have received it from another processor, or change the contents of a received message before it relays the message to other processors.
- A processor has no way of verifying the authenticity of a received message.
- A non-authenticated message is also called an oral message.

9.2.4 Performance Aspects

Performance of agreement protocols
- time: the number of rounds
- message traffic: the number of messages exchanged to reach an agreement
- storage overhead: the amount of information that needs to be stored at processors during the execution of a protocol
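The three failure modes of Section 9.2.2 can be made concrete by looking at what a single processor sends in one round. The following is a minimal sketch, not part of the original notes: the names `Fault` and `messages_sent`, the binary value domain {0, 1}, and the 50% drop probability for omission faults are all assumptions chosen for illustration.

```python
import random
from enum import Enum


class Fault(Enum):
    """Hypothetical labels for the failure modes of Section 9.2.2."""
    NONE = "nonfaulty"
    CRASH = "crash"
    OMISSION = "omission"
    BYZANTINE = "byzantine"


def messages_sent(value, recipients, fault, rng):
    """Return {recipient: value} for the messages one processor sends in a round.

    A recipient that is missing from the result received no message.
    """
    if fault is Fault.NONE:
        return {r: value for r in recipients}
    if fault is Fault.CRASH:
        # The processor has stopped for good, so nothing is sent.
        return {}
    if fault is Fault.OMISSION:
        # Contents are correct, but some recipients are skipped
        # (the 0.5 drop probability is an arbitrary assumption).
        return {r: value for r in recipients if rng.random() < 0.5}
    # Byzantine: each recipient may be told something different.
    # (A real Byzantine processor could also stay silent; this sketch
    # only models conflicting values.)
    return {r: rng.choice([0, 1]) for r in recipients}


rng = random.Random(0)  # fixed seed so that runs are repeatable
for fault in Fault:
    print(fault.value, messages_sent(1, ["p1", "p2", "p3"], fault, rng))
```

Counting the entries of the returned dictionaries over all processors and rounds gives the message traffic measure listed in Section 9.2.4.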
9.3 A Classification of Agreement Protocols

Three well-known problems
① Byzantine agreement problem
② Consensus problem
③ Interactive consistency problem

Problem                   Who initiates the value   Final agreement
Byzantine agreement       One processor             Single value
Consensus                 All processors            Single value
Interactive consistency   All processors            A vector of values

Table: The three agreement problems

Byzantine agreement problem
A single value, which is to be agreed on, is initialized by an arbitrary processor and all nonfaulty processors have to agree on that value.

Consensus problem
Every processor has its own initial value and all nonfaulty processors must agree on a single common value.

Interactive consistency problem
Every processor has its own initial value and all nonfaulty processors must agree on a set of common values.

9.3.1 The Byzantine Agreement Problem

An arbitrarily chosen processor, called the source processor, broadcasts its initial value to all other processors.

A solution to the Byzantine agreement problem should meet the following two objectives:
- Agreement: All nonfaulty processors agree on the same value.
- Validity: If the source processor is nonfaulty, then the common value agreed upon by all nonfaulty processors should be the initial value of the source.

Two points should be noted:
① If the source processor is faulty, then all nonfaulty processors can agree on any common value.
② It is irrelevant what value faulty processors agree on or whether they agree on a value at all.

[Figure: Three generals cannot reach Byzantine agreement]

9.3.2 The Consensus Problem

Every processor broadcasts its initial value to all other processors. The initial values of the processors may be different.
A protocol for reaching consensus should meet the following conditions:
- Agreement: All nonfaulty processors agree on the same single value.
- Validity: If the initial value of every nonfaulty processor is v, then the common value agreed upon by all nonfaulty processors must be v.

Note that if the initial values of nonfaulty processors are different, then all nonfaulty processors can agree on any common value.

9.3.3 The Interactive Consistency Problem

Every processor broadcasts its initial value to all other processors. The initial values of the processors may be different.

A protocol for the interactive consistency problem should meet the following conditions:
- Agreement: All nonfaulty processors agree on the same vector, (v_1, v_2, ..., v_n).
- Validity: If the i-th processor is nonfaulty and its initial value is v_i, then the i-th value to be agreed on by all nonfaulty processors must be v_i.

Note that if the j-th processor is faulty, then all nonfaulty processors can agree on any common value for v_j.

9.3.4 Relations among the Agreement Problems

The Byzantine agreement problem is primitive to the other two agreement problems, i.e., a solution to it can be used to construct solutions to the other two.

9.4 Solutions to the Byzantine Agreement Problem

The Byzantine agreement problem is also referred to as the Byzantine generals problem.

9.4.1 An Impossibility Result

We now show that a Byzantine agreement cannot be reached among three processors, where one processor is faulty.

Consider a system with three processors, p_0, p_1, and p_2. For simplicity, we assume that there are only two values, 0 and 1, on which processors agree, and that processor p_0 initiates the initial value.

Case Ⅰ: p_0 is not faulty.
Suppose p_0 sends 1 to both p_1 and p_2, and the faulty processor p_2 tells p_1 that it received 0. Since p_0 is nonfaulty, processor p_1 must accept 1 as the agreed upon value if condition 2 (validity) is to be satisfied.
Case Ⅱ: p_0 is faulty.
Suppose p_0 sends 1 to p_1 and 0 to p_2, and p_2 correctly relays 0 to p_1. To p_1 this is indistinguishable from a run in which p_0 is nonfaulty with value 1 and a faulty p_2 lies about what it received, so p_1 must agree on 1 (Case Ⅰ). By the symmetric argument, p_2 must agree on 0. Thus p_1 will agree on a value of 1 and p_2 will agree on a value of 0, which violates condition 1 (agreement) of the solution.

9.4.2 Lamport-Shostak-Pease Algorithm

Lamport et al.'s algorithm, referred to as the Oral Message algorithm OM(m), m > 0, solves the Byzantine agreement problem for 3m+1 or more processors in the presence of at most m faulty processors. Let n denote the total number of processors (clearly, n ≥ 3m+1).

Algorithm OM(0)
1. The source processor sends its value to every processor.
2. Each processor uses the value it receives from the source. (If it receives no value, then it uses a default value of 0.)

Algorithm OM(m), m > 0
1. The source processor sends its value to every processor.
2. For each i, let v_i be the value processor i receives from the source. (If it receives no value, then it uses a default value of 0.) Processor i acts as the new source and initiates Algorithm OM(m-1), wherein it sends the value v_i to each of the n-2 other processors.
3. For each i and each j (≠ i), let v_j be the value processor i received from processor j in Step 2 using Algorithm OM(m-1). Processor i uses the value majority(v_1, v_2, ..., v_{n-1}).

The message complexity of the algorithm is O(n^m).

Example 1
[Figure: An execution of BG(2) on seven generals. O_i represents the command sent to L_i, and L_i : O_i is L_i's rebroadcast of its command. L_j : L_i : O_i is L_j's rebroadcast of what L_i said his order was.]

9.4.3 Dolev et al.'s Algorithm

The algorithm requires up to 2m+3 rounds to reach an agreement.

Data Structure

The algorithm uses two thresholds, LOW and HIGH, where LOW := m+1 and HIGH := 2m+1. The basic idea is that any subset of processors of size LOW will have at least one nonfaulty processor.
Any subset of processors of size HIGH includes at least m+1 nonfaulty processors, that is, a majority of its members are nonfaulty.

The algorithm uses two types of messages: a "*" message and a message consisting of the name of a processor.
- The "*" denotes the fact that the sender of the message is sending a value of 1, and the name in a message denotes the fact that the sender of the message received a "*" from the named processor.

W_i^x: the set of processors that have sent message x to processor i. (Note that x is either a "*" or a processor name.)
- Each processor maintains n+1 W sets (one for "*" and one for each of the n processor names).
- W_i^x is the set of witnesses of message x for processor i.
- A processor j is a direct supporter for a processor k if j directly receives "*" from k.
- When processor i receives the message "k" from processor j, it adds j to W_i^k because j is a witness to message "k".
- A processor j is an indirect supporter for processor k if |W_j^k| ≥ LOW.
- A processor j confirms processor k if |W_j^k| ≥ HIGH.
- A processor i maintains a set, C_i, of confirmed processors.

The Algorithm

First round: the source processor sends a "*" message to all processors (including itself) if its value is 1. If its value is 0, it sends nothing in the first round. If the processors finally agree on "*", then the agreed upon value is 1. Otherwise, the agreed upon value is 0.

Subsequent rounds: a processor sends its messages to all processors, receives messages from other processors, and then decides what messages to send in the next round.

Initiation
A processor initiates (i.e., broadcasts "*") as follows:
- It initiates in the second round if it receives a "*" from the source in round 1.
- It initiates in the (K+1)st round if, at the end of the Kth round, the cardinality of the set of confirmed processors (not including the source) is at least LOW + max(0, ⌊K/2⌋ - 2) (referred to as the condition of initiation).

Four rules
① In the first round, the source broadcasts its value to all other processors.
② In a round k > 1, a processor broadcasts the names of all processors for which it is either a direct or an indirect supporter and which it has not previously broadcast. If the condition of initiation was true at the end of the previous round, it also broadcasts the "*" message unless it has previously done so.
③ If a processor confirms HIGH processors, it commits to a value of 1.
④ After round 2m+3, if the value 1 is committed, the processors agree on 1; otherwise, they agree on 0.

Example
- Number of processors: 3m+1
- Number of faulty processors: m
- The source is nonfaulty.

The source processor broadcasts a "*" in the first round. In the second round, 2m nonfaulty processors will initiate (i.e., broadcast "*"). In the third round, 2m+1 nonfaulty processors (including the source) will broadcast messages containing the names of the processors, informing the others that they have witnessed a "*" from 2m other nonfaulty processors. Thus, in the fourth round, the witness sets of all 2m+1 nonfaulty processors will contain all 2m+1 nonfaulty processors, and they will all commit to a value of 1 in the fourth round.

9.5 Applications of Agreement Algorithms

- Fault-Tolerant Clock Synchronization
- Atomic Commit in DDBS
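
The recursive structure of the Oral Message algorithm OM(m) from Section 9.4.2 can be sketched in a few lines. This is an illustrative simulation under stated assumptions, not the notes' own code: the names `om`, `majority`, and `lie` are hypothetical, the behaviour of a faulty processor is supplied by the caller as a function `lie(sender, receiver, value)`, a tie in the majority vote falls back to the default value 0, and every message is assumed to arrive (so the "no value received" default is never needed).

```python
from collections import Counter

DEFAULT = 0  # default value, as in the OM description


def majority(values):
    """Most frequent value; DEFAULT on a tie or on empty input (an assumption)."""
    counts = Counter(values).most_common()
    if not counts:
        return DEFAULT
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return DEFAULT
    return counts[0][0]


def om(m, source, value, receivers, faulty, lie):
    """Simulate OM(m); return {receiver: value that receiver ends up using}."""
    # Step 1: the source sends its value to every receiver.
    # A faulty source may send a different value to each receiver.
    received = {
        r: lie(source, r, value) if source in faulty else value
        for r in receivers
    }
    if m == 0:
        return received

    # Step 2: each receiver j acts as the new source in OM(m-1)
    # and relays the value it received to the other receivers.
    relayed = {
        j: om(m - 1, j, received[j], [r for r in receivers if r != j], faulty, lie)
        for j in receivers
    }

    # Step 3: receiver i takes the majority of its own value and the
    # values relayed to it by every other receiver.
    return {
        i: majority([received[i]] + [relayed[j][i] for j in receivers if j != i])
        for i in receivers
    }


def flip(sender, receiver, value):
    """Hypothetical faulty behaviour: always send the opposite bit."""
    return 1 - value


# n = 4, m = 1: source 0 is nonfaulty with value 1, processor 2 is faulty.
decided = om(1, 0, 1, [1, 2, 3], {2}, flip)
print({p: v for p, v in decided.items() if p != 2})

# n = 3, m = 1: the same faulty relay, but too few processors.
print(om(1, 0, 1, [1, 2], {2}, flip)[1])
```

In the first run the two nonfaulty receivers both use 1, the source's value, as agreement and validity require. In the second run, with only three processors, the nonfaulty receiver sees one vote for each value and falls back to 0, which illustrates why n ≥ 3m+1 is needed. The simulation stores each receiver's view in dictionaries rather than exchanging real messages, so it shows the voting logic only and says nothing about timing or message loss.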