GraphBolt: Dependency-Driven Synchronous Processing of Streaming Graphs

Mugilan Mariappan and Keval Vora
Simon Fraser University
EuroSys’19 – Dresden, Germany
March 27, 2019

Graph Processing

Dynamic Graph Processing
Real-time Processing
• Low Latency
Real-time Batch Processing
• High Throughput
"Alipay payments unit of Chinese retailer Alibaba [..] has 120 billion nodes and over 1 trillion relationships [..]; this graph has 2 billion updates each day and was running at 250,000 transactions per second on Singles Day [..]"
The Graph Database Poised to Pounce on The Mainstream. Timothy Prickett Morgan, The Next Platform. September 19, 2018.

Streaming Graph Processing
Incremental Processing
• Adjust results incrementally
• Reuse work that has already been done
[Figure: a query is evaluated over a graph that receives a stream of graph mutations]
Monotonic Graph Algorithms
Tornado [SIGMOD’16], GraphIn [EuroPar’16], KineoGraph [EuroSys’12], KickStarter [ASPLOS’17]
Tag propagation upon mutation
• Over 75% values get thrown out
Maintain value dependences and incrementally refine results
• Less than 0.0005% values thrown out
KickStarter: Fast and Accurate Computations on Streaming Graph via Trimmed Approximations. Vora et al., ASPLOS 2017.

Streaming Graph Processing
• Belief Propagation
• Co-Training Expectation Maximization
• Collaborative Filtering
• Label Propagation
• Triangle Counting
• …
[Figure: two plots of Incorrect Vertices against Updates (1 to 5). The y-axis of the first plot spans 0 to 3M; the second plot, labeled "Error > 10%", spans 0 to 6K.]
[Diagram, built up over several slides: I → R_G^k → R_G. When G mutates at k, continuing from R_G^k gives R^?, and R^? ≠ R_{G^M}. A second path is drawn as Z^s(R_G^k) → R^k_{G^M} → R_{G^M}.]

GraphBolt
• Dependency-Driven Incremental Processing of Streaming Graphs
• Guarantee Bulk Synchronous Parallel Semantics
• Lightweight dependence tracking
• Dependency-aware value refinement upon graph mutation

Bulk Synchronous Processing (BSP)
[Figure: vertices u, v, x, y of the streaming graph drawn once per iteration (k-1, k, ...)]

Upon Edge Addition
[Figure, built up over several slides: vertices u, v, x, y across iterations k-2 to k+1, shown side by side for the streaming graph and for the ideal scenario]

Upon Edge Deletion
[Figure: same layout as above, for an edge deletion]

GraphBolt: Correcting Dependencies
[Figure, built up over several slides: vertices u, v, x, y across iterations k-2 to k+1, shown side by side for the streaming graph and for the ideal scenario]
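The mismatch between the streaming graph and the ideal scenario can be reproduced with a small sketch. Everything in it is an assumption made for illustration and not taken from the talk: the toy update rule (each vertex becomes 1 plus the sum of its in-neighbors' previous values), the four-vertex cycle, and the function name `bsp`.

```python
def bsp(vertices, edges, iterations, values=None):
    """Synchronous iterations: each vertex reads only previous-iteration values."""
    values = dict(values) if values is not None else {v: 1 for v in vertices}
    for _ in range(iterations):
        agg = {v: 0 for v in vertices}
        for u, v in edges:
            agg[v] += values[u]                      # aggregate incoming values
        values = {v: 1 + agg[v] for v in vertices}   # toy vertex function (assumption)
    return values


vertices = ["u", "v", "x", "y"]
G = [("u", "v"), ("v", "x"), ("x", "y"), ("y", "u")]
GM = G + [("u", "x")]                                # G after one edge addition

k, L = 2, 4
partial = bsp(vertices, G, k)                        # intermediate results on G at iteration k
naive = bsp(vertices, GM, L - k, values=partial)     # keep going on the mutated graph
scratch = bsp(vertices, GM, L)                       # ideal: all L iterations on the mutated graph

print("naive  :", naive)
print("scratch:", scratch)
```

In this sketch the two results disagree on three of the four vertices, because the values computed in the first k iterations still reflect the old graph structure.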
GraphBolt: Dependency Tracking
• Dependency relations translate across aggregation points
• Structure of dependencies inferred from input graphs
[Figure: vertices u, v, x, y at iterations k-1 and k]

[Paper excerpt shown on the slide; […] marks text that is cut off]
[…] such tracking of value dependencies leads to $O(|E| \cdot t)$ amount of information to be maintained for $t$ iterations, which significantly increases the memory footprint, making the entire process memory-bound.
To reduce the amount of dependency information that must be tracked, we first carefully analyze how values flowing through dependencies participate in computing $C_L$.

Tracking Value Dependencies as Value Aggregations. Given a vertex $v$, its value is computed based on values from its incoming neighbors in two sub-steps: first, the incoming neighbors' values from the previous iteration are aggregated into a single value; and then, the aggregated value is used to compute the vertex value for the current iteration. This computation can be formulated as:

$$c_i(v) = \oint\Big(\bigoplus_{\forall e=(u,v)\in E} c_{i-1}(u)\Big)$$

where $\bigoplus$ indicates the aggregation operator and $\oint$ indicates the function applied on the aggregated value to produce the final vertex value. For example, in Algorithm 1, $\bigoplus$ is ATOMICADD on line 6 while $\oint$ is the computation on line 9. Since values flowing through edges are effectively combined into aggregated values at vertices, we can track these aggregated values instead of individual dependency information. By doing so, value dependencies can be corrected upon graph mutation by incrementally correcting the aggregated values and propagating corrections across subsequent aggregations throughout the graph.

Let $g_i(v)$ be the aggregated value for vertex $v$ for iteration $i$, i.e., $g_i(v) = \bigoplus_{\forall e=(u,v)\in E} c_{i-1}(u)$. We define $A_G = (V_A, E_A)$ as the dependency graph in terms of aggregation values at the end of iteration $k$:

$$V_A = \bigcup_{i\in[0,k]} g_i(v)$$
$$E_A = \{\, (g_{i-1}(u),\, g_i(v)) : i \in [0,k] \wedge (u,v) \in E \,\}$$
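Tracking one aggregated value per vertex per iteration, instead of one value per edge, can be sketched as follows. This is a minimal illustration under stated assumptions: the aggregation is a plain sum, the vertex function `f` is a toy (1 plus the aggregate), iterations are stored from 1 onward, and the names `f` and `run_and_track` are hypothetical, not GraphBolt's API.

```python
def f(g):
    """Hypothetical vertex function: final value from the aggregated value."""
    return 1 + g


def run_and_track(vertices, edges, iterations):
    """Run synchronous iterations and keep g_i(v) for every iteration i >= 1."""
    values = {v: 1 for v in vertices}              # c_0(v)
    aggregates = []                                # aggregates[i - 1][v] == g_i(v)
    for _ in range(iterations):
        g = {v: 0 for v in vertices}
        for u, v in edges:
            g[v] += values[u]                      # aggregate over incoming edges (sum)
        aggregates.append(g)
        values = {v: f(g[v]) for v in vertices}    # c_i(v) = f(g_i(v))
    return values, aggregates


vertices = ["u", "v", "x", "y"]
edges = [("u", "v"), ("v", "x"), ("x", "y"), ("y", "u"), ("u", "x")]
values, aggregates = run_and_track(vertices, edges, iterations=3)

tracked = sum(len(g) for g in aggregates)          # |V| * t stored entries
per_edge = len(edges) * len(aggregates)            # |E| * t if each edge value were stored
print(values, tracked, per_edge)
```

Because each vertex value can be recomputed from its aggregate through `f`, the aggregates alone are enough to recover every intermediate value in this sketch.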
GraphBolt: Incremental Refinement
[Figure, built up over several slides: vertices u, v, x, y at iterations 0, 1 and 2, annotated with edge additions, edge deletions, direct changes and transitive changes]

[Paper excerpt shown on the slide; […] marks text that is cut off]
[…] recomputed to be able to refine values upon graph mutation. While aggressive pruning can be performed (e.g., dropping certain vertices altogether), it would require backpropagation from values that get changed during refinement to recompute the correct old values for incremental computation.

3.3 Dependency-Driven Value Refinement
Let $E_a$ and $E_d$ be the sets of edges to be added to $G$ and deleted from $G$, respectively, to transform it to $G^T$. Hence, $G^T = G \cup E_a \setminus E_d$. Given $E_a$, $E_d$ and the dependence graph $A_G$, we ask two questions that help us transform $C_L$ to $C_L^T$.

(A) What to Refine?
We dynamically transform aggregation values in $A_G$ to make them consistent with $G^T$ under synchronous semantics. To do so, we start with aggregation values in the first iteration, i.e., $g_0(v) \in V_A$, and progress forward iteration by iteration. At each iteration $i$, we refine $g_i(v) \in V_A$ that fall under […]

Refinement
Update $g_i(v) = \bigoplus_{\forall e=(u,v)\in E} c_{i-1}(u)$ to $g_i^T(v) = \bigoplus_{\forall e=(u,v)\in E^T} c_{i-1}^T(u)$ for $0 \le i \le L$. This is incrementally achieved as:

$$g_i^T(v) = g_i(v) \;\biguplus_{\forall e=(u,v)\in E_a} c_{i-1}^T(u) \;\mathop{\ominus}\limits_{\forall e=(u,v)\in E_d} c_{i-1}(u) \;\mathop{\triangle}\limits_{\substack{\forall e=(u,v)\in E^T \\ \text{s.t. } c_{i-1}(u) \neq c_{i-1}^T(u)}} c_{i-1}^T(u)$$

where $\uplus$, $\ominus$ and $\triangle$ are incremental aggregation operators that add new contributions (for edge additions), remove old contributions (for edge deletions), and update existing contributions (for transitive effects of mutations), respectively. While $\uplus$ often is similar to $\oplus$, $\ominus$ and $\triangle$ require undoing aggregation to eliminate or update previous contributions. We focus our discussion on incremental $\triangle$ since its logic subsumes that for $\ominus$. We generalize $\triangle$ by modeling it as: […]

Refinement: Transitive Changes
[Figure: vertices v, x, y at iterations k and k+1; vertex aggregation is drawn with two steps labeled Transform and Combine]

[Paper excerpt shown on the slide; the right-hand part of every line is cut off, and […] marks each cut]
Pruning Value Aggregations. The skewed nature of real-[…] synchronous graph algorit[…] vertex values keep on chang[…] and then the number of cha[…]ations progress. For example, Figure 4 […] values change across iterations in Lab[…] Wiki graph (graph details in Table 2) fo[…]dow. As we can see, the color density is […] iterations indicating that majority of ver[…] those iterations; after 5 iterations, values […] the color density decreases sharply. As v[…] corresponding aggregated values also st[…] a useful opportunity to limit the amount […] that must be tracked during execution.
We conservatively prune the depen[…] balance the memory requirements for […] values with recomputation cost during […] particular, we incorporate horizontal p[…] pruning over the dependence graph tha[…] different dimensions. As values start […]tal pruning is achieved by directly sto[…] of aggregated values after certain iter[…] the horizontal red line in Figure 4 indic[…] which aggregated values won't be track[…] on the other hand, operates at vertex-le[…] by not saving aggregated values that h[…] eliminates the white regions above the h[…]

[Algorithm excerpt shown on the slide, annotated with the labels "Vertex aggregation", "Incremental refinement", "Aggregation" and "Complex"; […] marks identifiers that are not recoverable]
 8:     ATOMICADD(&sum[v][i + 1], new_degree[u] […])
 9: end function
10: function P[…]( )
11:     for i ∈ [0...k] do
12:         […](E_add, […], i)
13:         […](E_delete, […], i)
14:     end for
15:     V_updated = […](E_add ∪ E_delete)
16:     V_change = […](E_add ∪ E_delete)
17:     for i ∈ [0...k] do
18: E_update into = {(u, ) : u 2We V _updated} values �owing through edges are e�ectively combined conservatively prune the depen • Retract : Old value 19: ��M��(E_update, ��, i) aggregated values at vertices, we can) track these aggregated balance the memory requirements for ( ).( ) ( ).( 20: ��M��(E_update, ��, i) : New value • Propagate y y values instead of individual dependency information. By values with recomputation cost during 21: V _dest = ��T��(E_update) doing so, value dependencies can be corrected upon graph particular, we incorporate horizontal p 22: V _chan e = V _chan e [ V _dest mutation by incrementally correcting the aggregated values pruning over the dependence graph tha 23: V _updated = ��M��(V _chan e, ��, i) and propagating corrections across subsequent aggregations di�erent dimensions. As values start ( ) - Transform 24: end for Œ Õtal pruning throughout graph. achieved by sdirectly sto 0 ) ⇥ is 0, s) ⇥ c(u, 0) ) ( ).( ) the - Combine 25: 8s 2 S : ( (u, s (u, , s Belief Propagation Let i ( ) be the 2S aggregated values after certain iter 8e=(u, )2E 8s 0 of … aggregated value for vertex for iteration 0) Œ Õ c(u,s i, i.e., i ( ) = (c i 1 (u)). We de�ne26:AG 8s = (V , E ) as 0 0 A A the horizontal red line in Figure 2S : ( (u, s ) ⇥ (u, , s , s) ⇥ c( ,s 0 ) ) 4 indic 8e=(u, )2E 2S 8e=(u, )2E 8s 0which aggregated values won’t be track dependency graph in terms of aggregation values at the end 27: end function on the other hand, operates at vertex-le of iteration k: Aggregation Vertexvalues Value that h by notTransform saving aggregated – E A = { ( i 1 (u), i ( )) : 36 the h VA = i( ) eliminates the white regions above i 2[0,k ] i 2 [0, k] ^ (u, ) 2 E } Iteration Refinement: Transitive Changes k k+1 8: ��A��(&sum[ ][i ations + 1], new_de ) example, Figure 4 progress. For ree[u] pute vertex value for the current iteration. 
This computation values change across iterations in Lab 9: end function can be formulated as 2 : 10: function P��( ) Wiki graph (graph details in Table 2) fo º dow. As we can see, the color density is 11: ) for i 2 [0...k] do ci ( ) = ( (c i 1 (u)) 12: ��M��(E_add, ��, i) iterations indicating that majority of ver 8e=(u, )2E 13: ��M��(E_delete, ��, i) those iterations; after 5 iterations, values ≤ … Vertex aggregation where indicates the aggregation operator and Incremental refinement 14: end forindicates the color density decreases sharply. As v Aggregation the function applied on the aggregated15: valueVto produce= the _updated ��S��(E_add [ E_delete) corresponding aggregated values also st … 16: V1,_chanise�� = ��T��(E_add [ E_delete)to limit the amount �nal vertex value. For ≤example in Algorithm a useful opportunity Complex 17: forxline i 2 [0...k] do v x �A�� on line 6 while is thevcomputation on 9. Since that must be tracked during execution. E_update into = {(u, ) : u 2We V _updated} )( 18: ) values �owing through edges are (e�ectively combined conservatively prune the depen • Retract : Old value 19: ��M��(E_update, ��, i) aggregated values at vertices, we can) track these aggregated balance the memory requirements for ( ).( ) ( ).( 20: ��M��(E_update, ��, i) : New value • Propagate y y values instead of individual dependency information. By values with recomputation cost during 21: V _dest = ��T��(E_update) doing so, value dependencies can be corrected upon graph particular, we incorporate horizontal p 22: V _chan e = V _chan e [ V _dest mutation by incrementally correcting the aggregated values pruning over the dependence graph tha 23: V _updated = ��M��(V _chan e, ��, i) and propagating corrections across subsequent aggregations di�erent dimensions. As values start ( ) - Transform 24: end for Œ Õtal pruning throughout graph. 
achieved by sdirectly sto 0 ) ⇥ is 0, s) ⇥ c(u, 0) ) ( ).( ) the - Combine 25: 8s 2 S : ( (u, s (u, , s Belief Propagation Let i ( ) be the 2S aggregated values after certain iter 8e=(u, )2E 8s 0 of … aggregated value for vertex for iteration 0) Œ Õ c(u,s i, i.e., i ( ) = (c i 1 (u)). We de�ne26:AG 8s = (V , E ) as 0 0 A A the horizontal red line in Figure 2S : ( (u, s ) ⇥ (u, , s , s) ⇥ c( ,s 0 ) ) 4 indic 8e=(u, )2E 2S 8e=(u, )2E 8s 0which aggregated values won’t be track dependency graph in terms of aggregation values at the end 27: end function on the other hand, operates at vertex-le of iteration k: Aggregation Vertexvalues Value that h by notTransform saving aggregated – E A = { ( i 1 (u), i ( )) : 37 the h VA = i( ) eliminates the white regions above i 2[0,k ] i 2 [0, k] ^ (u, ) 2 E } Iteration Refinement: Transitive Changes k k+1 ] [7] 1] [8] 14] ] ations progress. For example, Figure 4 values change across iterations in Lab Wiki graph (graph details in Table 2) fo dow. As we can see, the color density is iterations indicating that majority of ver 8e=(u, )2E those iterations; after 5 iterations, values ≤ … Vertex aggregation where indicates the aggregation operator and indicates Incremental refinement the color density decreases sharply. As v Aggregation the function applied on the aggregated value to… produce the corresponding aggregated values also st �nal vertex value. For ≤example in Algorithm 1, is �� a useful opportunity to limit the amount Simple Complex v x k �A�� on line 6 while is thevcomputation onxline 9. Since that must be tracked during execution. values �owing through edges are e�ectively combined into We conservatively prune Propagate : Change in vertex valuethe depen aggregated values at vertices, we can) track these aggregated balance the memory requirements for ( ).( ) ( ).( y y values instead of individual dependency information. 
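The idea of tracking aggregated values rather than per-edge dependencies can be sketched as follows. This is a minimal illustration and not GraphBolt's implementation: it assumes a PageRank-style sum as the combine step, a damping factor of 0.85, an initial value of 1.0 per vertex, and iterations numbered 1 to k. The names `track_aggregations` and `dependency_edges` are hypothetical.

```python
def track_aggregations(num_vertices, edges, k, damping=0.85):
    """Run k synchronous iterations and record g_i(v) for every iteration i."""
    out_degree = [0] * num_vertices
    for u, _ in edges:
        out_degree[u] += 1

    values = [1.0] * num_vertices      # c_0(v): assumed initial vertex value
    aggregations = []                  # aggregations[i - 1][v] holds g_i(v)
    for _ in range(k):
        g = [0.0] * num_vertices
        for u, v in edges:             # combine: sum contributions over in-edges
            g[v] += values[u] / out_degree[u]
        aggregations.append(g)
        # transform: aggregated value -> vertex value (assumed PageRank form)
        values = [(1 - damping) + damping * x for x in g]
    return aggregations, values


def dependency_edges(edges, k):
    """Edges of the dependency graph as ((i - 1, u), (i, v)) pairs.

    Each pair records that the aggregation of v in iteration i depends on
    the state of u after iteration i - 1.
    """
    return {((i - 1, u), (i, v)) for i in range(1, k + 1) for u, v in edges}
```

The storage cost is one aggregated value per vertex per tracked iteration, independent of the number of edges, which is what makes pruning by iteration or by vertex straightforward.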
Refinement: Aggregation Types

[Figure: vertices v, x and y across iterations k and k+1, showing vertex aggregation and incremental refinement.]

Incremental refinement, by aggregation type: Simple, Complex
• Propagate: Change in vertex value

Algorithm: Aggregation ($\bigoplus$)
• PageRank (PR): $\sum_{\forall e=(u,v) \in E} \frac{c(u)}{\text{out\_degree}(u)}$
• Belief Propagation (BP): $\forall s \in S : \prod_{\forall e=(u,v) \in E} \big( \sum_{\forall s' \in S} \phi(u,s') \times \psi(u,v,s',s) \times c(u,s') \big)$
• Label Propagation (LP): $\forall f \in F : \sum_{\forall e=(u,v) \in E} c(u,f) \times \text{weight}(u,v)$
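The difference between the two refinement styles can be illustrated with a small sketch. It is only an illustration under stated assumptions: a sum stands in for a simple aggregation (one update carrying the change in the contribution), and a product stands in for a complex aggregation (the old contribution is retracted, then the new one is propagated). The function names are hypothetical, and the product case assumes the old contribution is non-zero.

```python
def refine_simple(aggregate, old_contrib, new_contrib):
    """Sum-style aggregation: apply only the change in the contribution."""
    return aggregate + (new_contrib - old_contrib)


def refine_complex(aggregate, old_contrib, new_contrib):
    """Product-style aggregation: retract the old value, then propagate the new one."""
    retracted = aggregate / old_contrib   # retract: undo the old contribution
    return retracted * new_contrib        # propagate: apply the new contribution


# One in-neighbor's contribution changes from 3.0 to 4.0.
contributions = [2.0, 3.0, 5.0]
total = sum(contributions)                                       # 10.0
product = contributions[0] * contributions[1] * contributions[2]  # 30.0

new_total = refine_simple(total, 3.0, 4.0)
new_product = refine_complex(product, 3.0, 4.0)
```

In both cases the aggregate is corrected without revisiting the unchanged contributions 2.0 and 5.0.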
GraphBolt: Programming Model

Complex aggregations

Algorithm 2 PageRank - Complex Aggregation
function REPROPAGATE(e = (u, v), i)
    ATOMICADD(&sum[v][i + 1], oldpr[u][i] / old_degree[u])
end function
function RETRACT(e = (u, v), i)
    ATOMICSUB(&sum[v][i + 1], oldpr[u][i] / old_degree[u])
end function
function PROPAGATE(e = (u, v), i)
    ATOMICADD(&sum[v][i + 1], newpr[u][i] / new_degree[u])
end function
function P…( )
    // Direct changes
    for i ∈ [0...k] do
        EDGEMAP(E_add, REPROPAGATE, i)
        EDGEMAP(E_delete, RETRACT, i)
    end for
    V_updated = GETSOURCES(E_add ∪ E_delete)
    V_change = GETTARGETS(E_add ∪ E_delete)
    // Transitive changes
    for i ∈ [0...k] do
        E_update = {(u, v) : u ∈ V_updated}
        EDGEMAP(E_update, RETRACT, i)
        EDGEMAP(E_update, PROPAGATE, i)
        V_dest = GETTARGETS(E_update)
        V_change = V_change ∪ V_dest
        V_updated = VERTEXMAP(V_change, COMPUTE, i)
    end for
end function

Simple aggregations

Algorithm 1 PageRank - Simple Aggregation
function REPROPAGATE(e = (u, v), i)
    ATOMICADD(&sum[v][i + 1], oldpr[u][i] / old_degree[u])
end function
function RETRACT(e = (u, v), i)
    ATOMICSUB(&sum[v][i + 1], oldpr[u][i] / old_degree[u])
end function
function PROPAGATEDELTA(e = (u, v), i)
    ATOMICADD(&sum[v][i + 1], newpr[u][i] / new_degree[u] − oldpr[u][i] / old_degree[u])
end function
function P…( )
    // Direct changes
    for i ∈ [0...k] do
        EDGEMAP(E_add, REPROPAGATE, i)
        EDGEMAP(E_delete, RETRACT, i)
    end for
    V_updated = GETSOURCES(E_add ∪ E_delete)
    V_change = GETTARGETS(E_add ∪ E_delete)
    // Transitive changes
    for i ∈ [0...k] do
        E_update = {(u, v) : u ∈ V_updated}
        EDGEMAP(E_update, PROPAGATEDELTA, i)
        V_dest = GETTARGETS(E_update)
        V_change = V_change ∪ V_dest
        V_updated = VERTEXMAP(V_change, COMPUTE, i)
    end for
end function
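The retract-and-propagate refinement can be tried end to end with a small sequential sketch. It is a simplified illustration, not GraphBolt's API: it folds direct and transitive changes into a single loop, assumes a simple graph with no duplicate edges, a damping factor of 0.85 and initial values of 1.0, and uses hypothetical names (`run_from_scratch`, `refine`). The refined result can be compared against recomputing from scratch on the mutated graph.

```python
from collections import defaultdict

DAMPING = 0.85  # assumed damping factor


def out_adjacency(edges):
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    return adj


def compute(g):
    """Transform an aggregated value into a vertex value."""
    return (1 - DAMPING) + DAMPING * g


def run_from_scratch(n, edges, k):
    """Synchronous PageRank; val[i][v] = c_i(v), agg[i][v] = g_i(v) for i >= 1."""
    adj = out_adjacency(edges)
    val, agg = [[1.0] * n], [[0.0] * n]   # agg[0] is an unused placeholder
    for i in range(1, k + 1):
        g = [0.0] * n
        for u, targets in adj.items():
            for v in targets:
                g[v] += val[i - 1][u] / len(targets)
        agg.append(g)
        val.append([compute(x) for x in g])
    return val, agg


def refine(old_edges, val, agg, e_add, e_del, k):
    """Correct tracked aggregations after a batch of edge additions and deletions."""
    removed = set(e_del)
    new_edges = [e for e in old_edges if e not in removed] + list(e_add)
    old_adj, new_adj = out_adjacency(old_edges), out_adjacency(new_edges)
    new_val = [row[:] for row in val]
    new_agg = [row[:] for row in agg]
    # Sources of mutated edges change their out-degree, so every one of their
    # contributions differs in every iteration.
    mutated_sources = {u for u, _ in list(e_add) + list(e_del)}
    changed = set(mutated_sources)
    for i in range(1, k + 1):
        touched = set()
        for u in changed:
            for v in old_adj[u]:          # retract the old contribution
                new_agg[i][v] -= val[i - 1][u] / len(old_adj[u])
                touched.add(v)
            for v in new_adj[u]:          # propagate the new contribution
                new_agg[i][v] += new_val[i - 1][u] / len(new_adj[u])
                touched.add(v)
        changed = set(mutated_sources)
        for v in touched:                 # recompute only affected vertices
            new_val[i][v] = compute(new_agg[i][v])
            if new_val[i][v] != val[i][v]:
                changed.add(v)            # the change spreads to the next iteration
    return new_edges, new_val, new_agg


old_edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 0)]
val, agg = run_from_scratch(4, old_edges, 5)
new_edges, new_val, new_agg = refine(old_edges, val, agg, [(0, 2)], [(2, 3)], 5)
expected_val, expected_agg = run_from_scratch(4, new_edges, 5)
```

Because every iteration is refined against the tracked aggregation of that same iteration, the refined values match the synchronous from-scratch result up to floating-point rounding.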
GraphBolt Details …

• Aggregation properties & non-decomposable aggregations
• Pruning dependencies for light-weight tracking
    • Vertical pruning
    • Horizontal pruning
• Hybrid incremental execution
    • With & without dependency information
• Graph data structure & parallelization model

Experimental Setup

Server: 32-core / 2 GHz / 231 GB

Algorithm: Aggregation ($\bigoplus$)
• PageRank (PR): $\sum_{\forall e=(u,v) \in E} \frac{c(u)}{\text{out\_degree}(u)}$
• Belief Propagation (BP): $\forall s \in S : \prod_{\forall e=(u,v) \in E} \big( \sum_{\forall s' \in S} \phi(u,s') \times \psi(u,v,s',s) \times c(u,s') \big)$
• Label Propagation (LP): $\forall f \in F : \sum_{\forall e=(u,v) \in E} c(u,f) \times \text{weight}(u,v)$
• Co-Training Expectation Maximization (CoEM): $\sum_{\forall e=(u,v) \in E} \frac{c(u) \times \text{weight}(u,v)}{\sum_{\forall e=(w,v) \in E} \text{weight}(w,v)}$
• Collaborative Filtering (CF): $\big\langle \sum_{\forall e=(u,v) \in E} c_i(u).c_i(u)^{tr} ,\; \sum_{\forall e=(u,v) \in E} c_i(u).\text{weight}(u,v) \big\rangle$
• Triangle Counting (TC): $\sum_{\forall e=(u,v) \in E} |\text{in\_neighbors}(u) \cap \text{out\_neighbors}(v)|$
Table 4. Graph algorithms used in evaluation and their aggregation functions.

Co-Training Expectation Maximization (CoEM) [28] is a semi-supervised learning algorithm for named entity recognition.

To ensure a fair comparison among the above versions, experiments were run such that each algorithm version had the same number of pending edge mutations to be processed (similar to the methodology in [44]). Unless otherwise stated, 100K edge mutations were applied before the processing of each version. While Theorem 4.1 guarantees correctness of results via synchronous processing semantics, we validated correctness for each run by comparing final results.

Graphs                  Edges   Vertices
Wiki (WK) [47]          378M    12M
UKDomain (UK) [7]       1.0B    39.5M
Twitter (TW) [21]       1.5B    41.7M
TwitterMPI (TT) [8]     2.0B    52.6M
Friendster (FT) [14]    2.5B    68.3M
Yahoo (YH) [49]         6.6B    1.4B
Table 2. Input graphs used in evaluation.

                    System A       System B
Core Count          32 (1 × 32)    96 (2 × 48)
Core Speed          2GHz           2.5GHz
Memory Capacity     231GB          748GB
Memory Speed        9.75 GB/sec    7.94 GB/sec
Table 3. Systems used in evaluation.
Collaborativ Computation time Execution time (seconds) 0 0 PR on TT( 30 20 10 1K 200 1K 10K 10K 100K COEM on FT 150 100 50 100K 200 0 0 150 100 50 1K 300 1K 10K 10K 100K 225 150 75 100K Normalized Computation time time Execution time computation (seconds) CF on TT Computation time Execution time (seconds) 40 Execution time Computation time (seconds) 50 Computation time Execution time (seconds) Computation time Execution time (seconds) GraphBolt Performance Ligra GraphBolt-Reset GraphBolt 500 0 LP on FT 0 BP on TT 400 1 300 200 100 1K 80 1K 10K 10K 100K 0 TC on FT 60 40 20 100K 45 50 25 50 1K 10K COEM on FT (28.5x to 32.3x) 40 30 20 10 0 20 100 15 10 5 0 1K 10K 100K 14 7 0 20 100 0 10 5 100 1K 10K BP on TT (8.6x to 23.1x) 60 1 40 20 LP on FT (5.9x to 24.7x) 50 25 0 20 100 15 10 5 1K 10K 100K 75 500 25 0 100K 75 0 80 0 100 15 0 100K 1 % Edge Computations 75 Execution time Computation time (seconds) Computation time Execution time (seconds) % Edge Computations 0 100 21 Computation time Execution time (seconds) 4 CF on TT (3.8x to 8.6x) % Edge Computations 8 28 GB-Reset GraphBolt-Reset GraphBolt GraphBolt Normalized computation time Execution time Computation time Normalized (seconds) computation time 12 0 % Edge Computations PR on TT(1.4x to 1.6x) Computation time Execution time (seconds) 16 % Edge Computations % Edge Computations Computation time Execution time (seconds) GraphBolt Performance 2 1K 10K 100K TC on FT (279x to 722.5x) 1.5 1 0.5 0 0.1 100 0.075 0.05 0.025 0 1K 10K 100K 46 50 25 50 1K 10K COEM on FT (28.5x to 32.3x) 40 30 20 10 0 20 100 15 10 5 0 1K 10K 100K 14 7 0 20 100 0 10 5 100 1K 10K BP on TT (8.6x to 23.1x) 60 1 40 20 LP on FT (5.9x to 24.7x) 50 25 0 20 100 15 9.3% 10 3.6% 5 0.2% 1K 10K 100K 75 500 25 0 100K 75 0 80 0 100 15 0 100K 1 % Edge Computations 75 Execution time Computation time (seconds) Computation time Execution time (seconds) % Edge Computations 0 100 21 Computation time Execution time (seconds) 4 CF on TT (3.8x to 8.6x) % Edge 
Computations 8 28 GB-Reset GraphBolt-Reset GraphBolt GraphBolt Normalized computation time Execution time Computation time Normalized (seconds) computation time 12 0 % Edge Computations PR on TT(1.4x to 1.6x) Computation time Execution time (seconds) 16 % Edge Computations % Edge Computations Computation time Execution time (seconds) GraphBolt Performance 2 1K 10K 100K TC on FT (279x to 722.5x) 1.5 1 0.5 0 0.1 100 0.075 0.05 0.025 0 1K 10K 100K 47 50 25 50 1K 10K COEM on FT (28.5x to 32.3x) 40 30 20 10 0 20 100 15 10 5 0 1K 10K 100K 14 7 0 20 100 100 0 10 5 100 1K 10K BP on TT (8.6x to 23.1x) 60 1 40 20 LP on FT (5.9x to 24.7x) 50 25 0 20 100 15 10 5 1K 10K 100K 75 500 25 0 100K 75 0 80 0 100 15 0 100K 1 % Edge Computations 75 Execution time Computation time (seconds) Computation time Execution time (seconds) % Edge Computations 0 100 21 Computation time Execution time (seconds) 4 CF on TT (3.8x to 8.6x) % Edge Computations 8 28 GB-Reset GraphBolt-Reset GraphBolt GraphBolt Normalized computation time Execution time Computation time Normalized (seconds) computation time 12 0 % Edge Computations PR on TT(1.4x to 1.6x) Computation time Execution time (seconds) 16 % Edge Computations % Edge Computations Computation time Execution time (seconds) GraphBolt Performance 2 0.4% 1K 2.1% 10K 7.8% 100K TC on FT (279x to 722.5x) 1.5 1 0.5 0 0.1 100 0.075 0.05 0.025 0 1K 10K 100K 48 Detailed Experiments … • Varying workload type • Mutations affecting majority of the values v/s affecting small values • Varying graph mutation rate • Single edge update (reactiveness) v/s 1 million edge updates (throughput) • Scaling to large graph (Yahoo) over large system (96 core/748 GB) 49 375 175 375 250 125 0 250 1 10 100 Batch Size 1K 125 1 10 100 Batch Size 1K PR on TT 10K 41.7x 10K 150 125 100 Execution Time (sec) 500 625 Diferential Dataow GraphBolt 500 Execution Time (seconds) 625 0 7.6x Execution Time (seconds) Execution Time (seconds) GraphBolt v/s Differential Dataflow 175 150 125 100 
PR on TT Dierential Dataow GraphBolt 75 50 25 0 75 20 40 60 80 Single Edge Mutations 100 50 25 0 20 40 60 80 Single Edge Mutations 100 PR on TT (100 single edge mutations) Differential Dataflow. McSherry et al., CIDR 2013. 50 Summary • Efficient incremental processing of streaming graphs • Guarantee Bulk Synchronous Parallel semantics • Lightweight dependence tracking at aggregation level • Dependency-aware value refinement upon graph mutation • Programming model to support incremental complex aggregation types Acknowledgements 51
