IFLOW: Self-managing distributed information flows
Brian Cooper
Yahoo! Research

Joint work with colleagues at Georgia Tech: Vibhore Kumar, Zhongtang Cai, Sangeetha Seshadri, Greg Eisenhauer, Karsten Schwan, and others
Overview
- Motivation
- Case study: inTransit
- Architecture
- Flow graph deployment/reconfiguration
- Experiments
- Other aspects of the system
Motivation
- Lots of data produced in lots of places
  - Examples: operational information systems, scientific collaborations, end-user systems, web traffic data
Airline example
[Diagram: data flows in an airline operational information system. Events and sources (customers check in, bags scanned, flights arriving, flights departing, weather updates, catering updates, FAA updates, shop for flights, check seats, rebook missed connections) feed displays (concourse display, gate display, baggage display, home user display).]
Previous solutions
- Tools for managing distributed updates
  - Pub/sub middlewares
  - Transaction Processing Facilities
  - In-house solutions
- Times have changed
  - How to handle larger data volumes?
  - How to seamlessly incorporate new functionality?
  - How to effectively prioritize service?
  - How to avoid hand-tuning the system?
Approach
- Provide a self-managing distributed data flow graph
[Diagram: an example flow graph. Weather data, flight data, and check-in data feed operators (select ATL data, correlate flights and reservations, predict delays, generate customer messages) whose output is delivered to a terminal or web display.]
Approach
- Deploy operators in a network overlay
- Middleware should self-manage this deployment
  - Provide the necessary performance and availability
  - Respond to business-level needs
IFLOW
[Diagram: two example flow graphs running on the IFLOW middleware. An airline flow graph joins FLIGHTS, WEATHER, and COUNTERS streams for an overhead display; a molecular dynamics collaboration flow graph computes coordinates, distances and bonds, and radial distance for an IPaq client, an X-Window client, and an ImmersaDesk.]

AirlineFlowGraph {
  Sources -> {FLIGHTS, WEATHER, COUNTERS}
  Sinks -> {DISPLAY}
  Flow-Operators -> {JOIN-1, JOIN-2}
  Edges -> {(FLIGHTS, JOIN-1), (WEATHER, JOIN-1), (JOIN-1, JOIN-2), (COUNTERS, JOIN-2), (JOIN-2, DISPLAY)}
  Utility -> [Customer-Priority, Low Bandwidth Utilization]
}

CollaborationFlowGraph {
  Sources -> {Experiment}
  Sinks -> {IPaq, X-Window, ImmersaDesk}
  Flow-Operators -> {Coord, DistBond, RadDist, CoordBond}
  Edges -> {(Experiment, Coord), (Coord, DistBond), (DistBond, RadDist), (RadDist, IPaq), (CoordBond, ImmersaDesk), (CoordBond, X-Window)}
  Utility -> [Low-Delay, Synchronized-Delivery]
}

[ICAC ’06]
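A flow-graph description like the ones above can be captured in a small data structure. The sketch below is a minimal Python rendering of the AirlineFlowGraph specification, using an invented FlowGraph class rather than the actual IFLOW API:

from dataclasses import dataclass, field

@dataclass
class FlowGraph:
    # Illustrative container for an IFLOW-style flow graph description
    # (field names are assumptions, not IFLOW's real interface).
    sources: list                                  # event sources
    sinks: list                                    # data consumers
    operators: list                                # flow operators
    edges: list                                    # (from, to) connections
    utility: list = field(default_factory=list)    # business-level goals

# The AirlineFlowGraph from the slide, expressed with this structure.
airline = FlowGraph(
    sources=["FLIGHTS", "WEATHER", "COUNTERS"],
    sinks=["DISPLAY"],
    operators=["JOIN-1", "JOIN-2"],
    edges=[("FLIGHTS", "JOIN-1"), ("WEATHER", "JOIN-1"),
           ("JOIN-1", "JOIN-2"), ("COUNTERS", "JOIN-2"),
           ("JOIN-2", "DISPLAY")],
    utility=["Customer-Priority", "Low Bandwidth Utilization"],
)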
Case study
- inTransit
  - Query processing over distributed event streams
  - Operators are streaming versions of relational operators
Architecture
[Diagram: the inTransit distributed stream management infrastructure layered over the IFLOW middleware. The application layer hands queries to a data-flow parser; the middleware layer combines ECho pub-sub messaging, Stones, PDS, and flow-graph control; the underlay layer is IFLOW.]
[ICDCS ’05]
Application layer
- Applications specify data flow graphs
  - Can specify the graph directly
  - Can use an SQL-like declarative language (the compiled graph is sketched below)

STREAM N1.FLIGHTS.TIME, N7.COUNTERS.WAITLISTED, N2.WEATHER.TEMP
FROM N1.FLIGHTS, N7.COUNTERS, N2.WEATHER
WHEN N1.FLIGHTS.NUMBER='DL207'
AND N7.COUNTERS.FLIGHT_NUMBER=N1.FLIGHTS.NUMBER
AND N2.WEATHER.LOCATION=N1.FLIGHTS.DESTINATION;

[Diagram: the query compiles to a flow graph in which a selection on 'DL207' over N1.FLIGHTS is joined (⋈) with N7.COUNTERS and N2.WEATHER, and results are delivered to N10.]
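For reference, the declarative query above compiles to roughly the following flow graph; the structure mirrors the diagram, but the operator names and the dictionary layout are illustrative, not inTransit's internal representation:

# The STREAM query compiles to a selection plus two joins delivered to N10.
# Names such as SELECT_DL207 and JOIN_1 are invented for illustration.
dl207_flow = {
    "sources":   ["N1.FLIGHTS", "N7.COUNTERS", "N2.WEATHER"],
    "sinks":     ["N10"],
    "operators": ["SELECT_DL207", "JOIN_1", "JOIN_2"],
    "edges": [
        ("N1.FLIGHTS", "SELECT_DL207"),   # WHEN NUMBER='DL207'
        ("SELECT_DL207", "JOIN_1"),
        ("N7.COUNTERS", "JOIN_1"),        # FLIGHT_NUMBER = NUMBER
        ("JOIN_1", "JOIN_2"),
        ("N2.WEATHER", "JOIN_2"),         # LOCATION = DESTINATION
        ("JOIN_2", "N10"),
    ],
}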
Middleware layer
- ECho – pub/sub event delivery
  - Event channels for data streams
  - Native operators
    - E-code for most operators
    - Library functions for special cases
- Stones – operator containers
  - Queues and actions (a generic sketch follows below)
[Diagram: a join operator (⋈) hosted in a stone consumes events from Channel 1 and Channel 2 and publishes results on Channel 3.]
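Conceptually, a stone pairs an input queue with an action applied to each event, publishing results on an outgoing channel. The sketch below is a generic Python rendering of that idea, not the actual ECho/Stones API:

from collections import deque

class OperatorContainer:
    # Generic sketch of a stone-like container: a queue plus an action.
    def __init__(self, action, output):
        self.queue = deque()     # buffered incoming events
        self.action = action     # function applied to each event
        self.output = output     # downstream callable (e.g., publish to a channel)

    def enqueue(self, event):
        self.queue.append(event)

    def process(self):
        while self.queue:
            result = self.action(self.queue.popleft())
            if result is not None:   # filters may drop events
                self.output(result)

# Example: a selection operator that only forwards flight DL207 events.
select_dl207 = OperatorContainer(
    action=lambda ev: ev if ev.get("NUMBER") == "DL207" else None,
    output=lambda ev: print("publish on channel 3:", ev),
)
select_dl207.enqueue({"NUMBER": "DL207", "TIME": "18:05"})
select_dl207.process()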
Middleware layer
- PDS – resource monitoring
  - Nodes update PDS with resource info
  - inTransit notified when conditions change (a toy version is sketched below)
[Diagram: nodes report CPU status to PDS, which answers queries about CPU availability.]
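The monitoring pattern amounts to nodes publishing resource measurements and interested parties being called back when a watched value crosses a threshold. This is a toy stand-in for PDS with invented names, not its real interface:

class ResourceMonitor:
    # Toy stand-in for PDS: nodes publish resource info, subscribers get callbacks.
    def __init__(self):
        self.values = {}    # (node, metric) -> latest reported value
        self.watches = []   # (node, metric, threshold, callback)

    def watch(self, node, metric, threshold, callback):
        self.watches.append((node, metric, threshold, callback))

    def update(self, node, metric, value):
        self.values[(node, metric)] = value
        for (n, m, threshold, callback) in self.watches:
            if n == node and m == metric and value > threshold:
                callback(node, metric, value)

# inTransit would register interest in, say, CPU load on a node hosting a join.
pds = ResourceMonitor()
pds.watch("N7", "cpu", 0.8, lambda n, m, v: print(f"reconsider deployment: {m} on {n} = {v}"))
pds.update("N7", "cpu", 0.93)   # crosses the threshold, triggers the callback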
Flow graph deployment
- Where to place operators?
Flow graph deployment
- Where to place operators?
- Basic idea: cluster physical nodes (a greedy delay-based sketch follows below)
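One simple way to cluster physical nodes is by network delay: a node joins an existing cluster if it is close to that cluster's coordinator, otherwise it starts a new cluster. The greedy sketch below is illustrative; the threshold and the delay estimator are assumptions, not IFLOW's exact algorithm:

def cluster_nodes(nodes, delay, threshold_ms=20.0):
    # Greedy delay-based clustering: returns {coordinator: [member nodes]}.
    # delay(a, b) is an estimated round-trip delay in ms, supplied by the caller.
    clusters = {}
    for node in nodes:
        for coordinator in clusters:
            if delay(node, coordinator) <= threshold_ms:
                clusters[coordinator].append(node)   # join a nearby cluster
                break
        else:
            clusters[node] = [node]                  # otherwise become a coordinator
    return clusters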
Flow graph deployment
- Partition flow graph among coordinators
  - Coordinators represent their cluster
  - Exhaustive search among coordinators (sketched below)
[Diagram: the 'DL207' flow graph (a selection and two joins) being assigned across the clusters containing N1, N2, N7, and N10.]
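Partitioning boils down to an exhaustive search over assignments of operators to coordinators, scored by an estimated cost such as delay or bandwidth; each coordinator then runs the same kind of search inside its own cluster to pick actual host nodes. A brute-force version, with assumed names and a caller-supplied cost model, might look like this:

from itertools import product

def best_partition(operators, coordinators, cost):
    # Try every operator -> coordinator assignment and keep the cheapest.
    # cost(assignment) estimates e.g. end-to-end delay or bandwidth for a
    # mapping {operator: coordinator}; it is assumed to be provided by the caller.
    best, best_cost = None, float("inf")
    for choice in product(coordinators, repeat=len(operators)):
        assignment = dict(zip(operators, choice))
        c = cost(assignment)
        if c < best_cost:
            best, best_cost = assignment, c
    return best, best_cost

The search space is |coordinators| raised to the number of operators, which stays manageable precisely because the search sees a few coordinators (and, within a cluster, a few nodes) rather than the whole network.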
Flow graph deployment
- Coordinator deploys the subgraph in its cluster
  - Uses exhaustive search to find the best deployment within the cluster
[Diagram: a coordinator choosing which node in its cluster will host a join operator.]
Flow graph reconfiguration
- Resource or load changes trigger reconfiguration
  - Clusters reconfigure locally
  - Large changes require inter-cluster reconfiguration (a trigger sketch follows below)
[Diagram: a join operator migrating to a different node after conditions change.]
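A reconfiguration trigger can be sketched as comparing the utility of the current deployment against the best candidate and migrating only when the improvement outweighs the disruption. The threshold and function names below are assumptions for illustration:

def maybe_reconfigure(current, candidates, utility, migration_penalty=0.05):
    # Reconfigure only if some candidate deployment beats the current one by
    # more than an assumed migration penalty (illustrative, not IFLOW's policy).
    best = max(candidates, key=utility)
    if utility(best) - utility(current) > migration_penalty:
        return best      # move operators: locally, or via the parent coordinator
                         # if the better deployment spans another cluster
    return current       # ignore small fluctuations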
Hierarchical clusters
- Coordinators themselves are clustered
  - Coordinators form a hierarchy
  - May need to move operators between clusters
    - Handled by moving up a level in the hierarchy
What do we optimize?
- Basic metrics
  - Bandwidth used
  - End-to-end delay
- Autonomic metrics
  - Business value
  - Infrastructure cost
[Plot: business utility as a function of end-to-end delay (0–50 ms) and user priority (0–10); utility rises with user priority and falls as delay grows. An illustrative utility function is sketched below.]
[ICAC ’05]
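The utility surface in the plot can be approximated by a function that rewards high-priority users and decays with end-to-end delay, with infrastructure cost subtracted to give net utility. The exponential shape and constants below are assumptions for illustration, not the deployed utility function:

import math

def business_utility(priority, delay_ms, max_priority=10):
    # Illustrative: utility grows with user priority and decays with delay.
    return (priority / max_priority) * math.exp(-delay_ms / 20.0)

def net_utility(priority, delay_ms, cost_dollars_per_sec):
    # Net benefit = business value minus (scaled) infrastructure cost.
    return business_utility(priority, delay_ms) - 0.01 * cost_dollars_per_sec

# A priority-10 customer at 10 ms delay vs. a priority-2 customer at 40 ms.
print(business_utility(10, 10), business_utility(2, 40))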
Experiments
- Simulations
  - GT-ITM transit-stub Internet topology (128 nodes)
  - NS-2 to capture a trace of delay between nodes
  - Deployment simulator reacts to the delay trace
- OIS case study
  - Flight information from Delta Air Lines
  - Weather and news streams
  - Experiments on Emulab (13 nodes)
Approximation penalty
[Chart: end-to-end delay (ms) for centralized vs. decentralized deployment as the number of nodes in the flow graph grows from 4 to 14. Flow graphs on the simulator.]
Impact of reconfiguration
[Chart: end-to-end delay (ms) over 2000 seconds for dynamic vs. static deployment. 10-node flow graph on the simulator.]
Impact of reconfiguration
[Chart: end-to-end delay (ms) over 2000 seconds for dynamic vs. static deployment under injected network congestion and increased processor load. 2-node flow graph on Emulab.]
Different utility functions
[Chart: actual utility, cost (10^3 dollars/sec), and delay (ms) obtained when the optimization criterion is utility, cost, or delay. Simulator, 128-node network.]
Query planning
- We can optimize the structure of the query graph
  - A different join order may enable a better mapping
  - But there are too many plan/deployment possibilities to consider exhaustively
- Use the hierarchy for planning
  - Plus: stream advertisements to locate sources and deployed operators
  - Planning algorithms: top-down and bottom-up (a bottom-up sketch follows the diagrams below)
[IPDPS ’07]
Planning algorithms
- Top-down: start from the full query A⋈B⋈C⋈D and recursively split it into subplans (A⋈B and C⋈D) that are planned at lower levels of the hierarchy.
[Diagram: top-down decomposition of A⋈B⋈C⋈D into A⋈B and C⋈D, then into the base streams A, B, C, D.]
Planning algorithms
- Bottom-up: plan small subqueries such as A⋈B first, then combine them with C and D until the full query A⋈B⋈C⋈D is assembled.
[Diagram: bottom-up composition from A⋈B up to A⋈B⋈C⋈D.]
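The bottom-up strategy can be sketched as a greedy planner that repeatedly joins the cheapest pair of already-planned subplans until a single plan covers the query; the cost model and function names are assumptions, not the IPDPS ’07 algorithm verbatim:

def plan_bottom_up(streams, join_cost):
    # Greedy bottom-up join planning: repeatedly merge the cheapest pair.
    # streams is a list of source names; join_cost(left, right) estimates the
    # cost (e.g. bandwidth) of joining two subplans, which may be nested tuples.
    plans = list(streams)
    while len(plans) > 1:
        i, j = min(
            ((a, b) for a in range(len(plans)) for b in range(a + 1, len(plans))),
            key=lambda ab: join_cost(plans[ab[0]], plans[ab[1]]),
        )
        merged = (plans[i], plans[j])
        plans = [p for k, p in enumerate(plans) if k not in (i, j)] + [merged]
    return plans[0]   # e.g. ((('A', 'B'), 'C'), 'D')

Calling plan_bottom_up(['A', 'B', 'C', 'D'], join_cost) with a cost model that favors A⋈B first would reproduce the order shown in the diagram above.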
Query planning
[Chart: bandwidth cost per unit time (dollars) for phased vs. combined planning. 100 queries, each over 5 sources, 64-node network.]
Availability management
- Goal is to achieve both:
  - Performance
  - Reliability
- These goals often conflict!
  - Spend scarce resources on throughput or availability?
  - Manage the tradeoff using a utility function
Fault tolerance
- Basic approach: passive standby
  - Log of messages can be replayed
  - Periodic “soft-checkpoint” from active to standby
[Diagram: when the active join operator (⋈) fails, the standby replica takes over.]
- Performance versus availability (fast recovery)
  - More soft-checkpoints = faster recovery, but higher overhead
  - Choose a checkpoint frequency that maximizes utility (a sketch follows below)
[Middleware ’06]
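The passive-standby scheme can be sketched as an active operator that logs every message and periodically ships a soft checkpoint of its state to the standby; on failure, the standby installs the last checkpoint and replays only the messages logged after it. The classes and the checkpoint interval below are illustrative, not the Middleware ’06 implementation:

import copy

class Standby:
    def __init__(self):
        self.state = {}
    def install(self, state):
        self.state = state                  # receive a soft checkpoint
    def recover(self, pending_log):
        for message in pending_log:         # replay messages since the checkpoint
            self.state[message["key"]] = message
        return self.state

class ActiveOperator:
    def __init__(self, standby, checkpoint_every=100):
        self.state = {}                     # operator state (e.g., a join window)
        self.log = []                       # messages since the last soft checkpoint
        self.standby = standby
        self.checkpoint_every = checkpoint_every

    def process(self, message):
        self.log.append(message)
        self.state[message["key"]] = message                  # toy state update
        if len(self.log) >= self.checkpoint_every:
            self.standby.install(copy.deepcopy(self.state))   # soft checkpoint
            self.log.clear()                                  # shorter replay on failure

A smaller checkpoint_every means a shorter log to replay (faster recovery) but more checkpoint traffic; the checkpoint frequency is the knob the utility function tunes.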
Proactive fault tolerance
- Goal: predict system instability
[Two figure-only slides follow, covering proactive fault tolerance and mean time to recovery.]
IFLOW beyond inTransit
[Diagram: inTransit, pub/sub, science applications, and other services all run over the self-managing information flow layer, which hides the complex infrastructure beneath it.]
Related work
- Stream data processing engines
  - STREAM, Aurora, TelegraphCQ, NiagaraCQ, etc.
  - Borealis, TRAPP, Flux, TAG
- Content-based pub/sub
  - Gryphon, ARMADA, Hermes
- Overlay networks
  - P2P
  - Multicast (e.g. Bayeux)
  - Grid
- Other overlay toolkits
  - P2, MACEDON, GridKit
Conclusions
- IFLOW is a general information flow middleware
  - Self-configuring and self-managing
  - Based on application-specified performance and utility
- inTransit distributed event management infrastructure
  - Queries over streams of structured data
  - Resource-aware deployment of query graphs
  - IFLOW provides utility-driven deployment and reconfiguration
- Overall goal
  - Provide useful abstractions for distributed information systems
  - Make the implementation of those abstractions self-managing
  - Key to scalability, manageability, and flexibility
For more information
- http://www.brianfrankcooper.net
- cooperb@yahoo-inc.com