IFLOW: Self-managing distributed information flows
Brian Cooper, Yahoo! Research
Joint work with colleagues at Georgia Tech: Vibhore Kumar, Zhongtang Cai, Sangeetha Seshadri, Greg Eisenhauer, Karsten Schwan and others

Overview
- Motivation
- Case study: inTransit
- Architecture
- Flow graph deployment/reconfiguration
- Experiments
- Other aspects of the system

Motivation
- Lots of data produced in lots of places
- Examples: operational information systems, scientific collaborations, end-user systems, web traffic data

Airline example
[Diagram: an operational information system connecting event sources (flights arriving/departing, bags scanned, customer check-ins, weather updates, catering updates, FAA updates) to operations such as checking seats, rebooking missed connections and shopping for flights, and to concourse, gate, baggage and home-user displays]

Previous solutions
- Tools for managing distributed updates
  - Pub/sub middlewares
  - Transaction Processing Facilities
  - In-house solutions
- Times have changed
  - How to handle larger data volumes?
  - How to seamlessly incorporate new functionality?
  - How to effectively prioritize service?
  - How to avoid hand-tuning the system?

Approach
- Provide a self-managing distributed data flow graph
[Diagram: weather, flight and check-in data streams flow through operators (select ATL data, predict delays, correlate flights and reservations, generate customer messages) to a terminal or web display]

Approach
- Deploy operators in a network overlay
- Middleware should self-manage this deployment
  - Provide necessary performance, availability
  - Respond to business-level needs

IFLOW
[Figure: two example flow graphs running over the IFLOW middleware: an airline flow graph joining FLIGHTS, WEATHER and COUNTERS streams for an overhead display, and a molecular dynamics collaboration flow graph (coordinates, distances and bonds, radial distance) feeding an IPaq client, an X-Window client and an ImmersaDesk]

AirlineFlowGraph {
  Sources -> {FLIGHTS, WEATHER, COUNTERS}
  Sinks -> {DISPLAY}
  Flow-Operators -> {JOIN-1, JOIN-2}
  Edges -> {(FLIGHTS, JOIN-1), (WEATHER, JOIN-1), (JOIN-1, JOIN-2), (COUNTERS, JOIN-2), (JOIN-2, DISPLAY)}
  Utility -> [Customer-Priority, Low Bandwidth Utilization]
}

CollaborationFlowGraph {
  Sources -> {Experiment}
  Sinks -> {IPaq, X-Window, ImmersaDesk}
  Flow-Operators -> {Coord, DistBond, RadDist, CoordBond}
  Edges -> {(Experiment, Coord), (Coord, DistBond), (DistBond, RadDist), (DistBond, CoordBond), (RadDist, IPaq), (CoordBond, ImmersaDesk), (CoordBond, X-Window)}
  Utility -> [Low-Delay, Synchronized-Delivery]
}
[ICAC '06]

Case study: inTransit
- Query processing over distributed event streams
- Operators are streaming versions of relational operators

Architecture
[Figure: the inTransit distributed stream management infrastructure. An application layer (data-flow parser) sits above a middleware layer (ECho pub-sub, Stones, PDS messaging, flow-graph control), which runs over the IFLOW underlay layer] [ICDCS '05]

Application layer
- Applications specify data flow graphs
  - Can specify directly
  - Can use SQL-like declarative language

  STREAM N1.FLIGHTS.TIME, N7.COUNTERS.WAITLISTED, N2.WEATHER.TEMP
  FROM N1.FLIGHTS, N7.COUNTERS, N2.WEATHER
  WHEN N1.FLIGHTS.NUMBER='DL207'
    AND N7.COUNTERS.FLIGHT_NUMBER=N1.FLIGHTS.NUMBER
    AND N2.WEATHER.LOCATION=N1.FLIGHTS.DESTINATION;

[Diagram: the resulting query graph, with a select ('DL207') on the FLIGHTS stream at N1 and two joins combining the N1, N2 and N7 streams, delivering results to node N10]

Middleware layer
- ECho: pub/sub event delivery
  - Event channels for data streams
- Native operators
  - E-code for most operators
  - Library functions for special cases
- Stones: operator containers
  - Queues and actions
[Diagram: an operator stone joins events from Channel 1 and Channel 2 and publishes results on Channel 3]

Middleware layer
- PDS: resource monitoring
  - Nodes update PDS with resource info
  - inTransit notified when conditions change
[Diagram: nodes report CPU status to PDS]
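To make the flow-graph abstraction concrete, here is a minimal Python sketch of a graph description mirroring the AirlineFlowGraph specification shown earlier. It is illustrative only: the FlowGraph class and its field names are hypothetical and are not the IFLOW API, whose operators are actually expressed in E-code and deployed through the middleware.

    from dataclasses import dataclass, field

    @dataclass
    class FlowGraph:
        # Hypothetical in-memory description of an information flow graph:
        # sources, sinks, operators, edges and utility hints, mirroring the
        # AirlineFlowGraph specification on the IFLOW slide above.
        sources: list
        sinks: list
        operators: list
        edges: list                 # (producer, consumer) pairs
        utility: list = field(default_factory=list)

    # The airline example: two joins correlate the flight, weather and
    # check-in counter streams before the result reaches the display sink.
    airline = FlowGraph(
        sources=["FLIGHTS", "WEATHER", "COUNTERS"],
        sinks=["DISPLAY"],
        operators=["JOIN-1", "JOIN-2"],
        edges=[("FLIGHTS", "JOIN-1"), ("WEATHER", "JOIN-1"),
               ("JOIN-1", "JOIN-2"), ("COUNTERS", "JOIN-2"),
               ("JOIN-2", "DISPLAY")],
        utility=["Customer-Priority", "Low Bandwidth Utilization"],
    )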
Flow graph deployment
- Where to place operators?
- Basic idea: cluster physical nodes

Flow graph deployment
- Partition flow graph among coordinators
  - Coordinators represent their cluster
  - Exhaustive search among coordinators
[Diagram: the 'DL207' query graph partitioned across clusters, with source nodes N1, N2 and N7, sink N10, and join operators awaiting placement]

Flow graph deployment
- Coordinator deploys subgraph in its cluster
- Uses exhaustive search to find best deployment (a small sketch of such a search appears just before the related-work slide)

Flow graph reconfiguration
- Resource or load changes trigger reconfiguration
- Clusters reconfigure locally
- Large changes require inter-cluster reconfiguration

Hierarchical clusters
- Coordinators themselves are clustered
  - Coordinators form a hierarchy
- May need to move operators between clusters
  - Handled by moving up a level in the hierarchy

What do we optimize
- Basic metrics
  - Bandwidth used
  - End-to-end delay
- Autonomic metrics
  - Business value
  - Infrastructure cost
[Plot: business utility as a function of end-to-end delay and user priority] [ICAC '05]

Experiments
- Simulations
  - GT-ITM transit/stub Internet topology (128 nodes)
  - NS-2 to capture trace of delay between nodes
  - Deployment simulator reacts to delay
- OIS case study
  - Flight information from Delta Airlines
  - Weather and news streams
  - Experiments on Emulab (13 nodes)

Approximation penalty
[Plot: end-to-end delay (ms) vs. number of nodes in the flow graph, centralized vs. decentralized deployment; flow graphs on simulator]

Impact of reconfiguration
[Plot: end-to-end delay (ms) over time (seconds), dynamic vs. static deployment; 10 node flow graph on simulator]

Impact of reconfiguration
[Plot: end-to-end delay (ms) over time (seconds), dynamic vs. static deployment, with network congestion and increased processor load events marked; 2 node flow graph on Emulab]

Different utility functions
[Plot: achieved utility, cost (10^3 dollars/sec) and delay (msec) when optimizing for utility, cost or delay; simulator, 128 node network]

Query planning
- We can optimize the structure of the query graph
  - A different join order may enable a better mapping
  - But there are too many plan/deployment possibilities to consider
- Use the hierarchy for planning
  - Plus: stream advertisements to locate sources and deployed operators
- Planning algorithms: top-down, bottom-up [IPDPS '07]

Planning algorithms
- Top down
[Diagram: the plan for A⋈B⋈C⋈D is split into subplans A⋈B and C⋈D, which are in turn split down to the individual sources A, B, C and D]

Planning algorithms
- Bottom up
[Diagram: sources A and B are first joined into A⋈B, which is then extended with C and D until the full plan A⋈B⋈C⋈D is built]

Query planning
[Plot: bandwidth cost per unit time (dollars), phased vs. combined planning; 100 queries, each over 5 sources, 64 node network]

Availability management
- Goal is to achieve both:
  - Performance
  - Reliability
- These goals often conflict!
  - Spend scarce resources on throughput or availability?
- Manage tradeoff using utility function

Fault tolerance
- Basic approach: passive standby
  - Log of messages can be replayed
  - Periodic "soft-checkpoint" from active to standby
[Diagram: an active join operator fails and its passive standby takes over]
- Performance versus availability (fast recovery)
  - More soft-checkpoints = faster recovery, higher overhead
  - Choose a checkpoint frequency that maximizes utility
[Middleware '06]

Proactive fault tolerance
- Goal: predict system instability

Mean time to recovery

IFLOW beyond inTransit
[Diagram: inTransit, pub/sub, science applications and other uses built on the self-managing information flow layer, which in turn runs over a complex infrastructure]
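As promised in the deployment slides, here is a small sketch of the kind of exhaustive placement search a coordinator could run inside its cluster. It is illustrative only: the real system optimizes an application-specified utility that weighs business value against infrastructure cost, whereas this sketch, as a simplifying assumption, just minimizes the summed per-edge network delay; the function and variable names are hypothetical.

    from itertools import product

    def best_placement(operators, nodes, edges, pinned, delay):
        # Try every assignment of operators to cluster nodes and keep the
        # one that minimizes the summed network delay across graph edges.
        # 'pinned' fixes sources and sinks to the nodes where they already
        # live; delay[(a, b)] is the measured delay between nodes a and b.
        best_cost, best_map = float("inf"), None
        for assignment in product(nodes, repeat=len(operators)):
            placement = dict(pinned, **dict(zip(operators, assignment)))
            cost = sum(delay[(placement[u], placement[v])] for u, v in edges)
            if cost < best_cost:
                best_cost, best_map = cost, placement
        return best_map, best_cost

    # Tiny usage example: one join, two candidate nodes, pinned endpoints.
    nodes = ["n1", "n2"]
    delay = {(a, b): 0 if a == b else 5 for a in nodes for b in nodes}
    placement, cost = best_placement(
        operators=["JOIN-1"], nodes=nodes,
        edges=[("FLIGHTS", "JOIN-1"), ("JOIN-1", "DISPLAY")],
        pinned={"FLIGHTS": "n1", "DISPLAY": "n2"}, delay=delay)

Exhaustive search is only practical because each coordinator searches within its own small cluster; across clusters the hierarchy keeps the search space manageable.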
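In the same spirit, a tiny sketch of the checkpoint-frequency tradeoff from the fault tolerance slides: shorter soft-checkpoint intervals mean less log to replay on failure (faster recovery) but more checkpointing overhead, and the system picks the interval that maximizes utility. The cost model and constants below are made up purely for illustration and are not taken from the paper.

    def choose_checkpoint_interval(candidates, utility):
        # Pick the soft-checkpoint interval (seconds) with the highest utility;
        # the utility function encodes the performance vs. availability tradeoff.
        return max(candidates, key=utility)

    # Illustrative (made-up) model: a shorter interval means less log to replay
    # on failure (faster recovery) but more checkpointing overhead per second.
    def example_utility(interval):
        recovery_penalty = 2.0 * interval    # expected replay time grows with interval
        overhead_penalty = 50.0 / interval   # checkpoint overhead grows as 1/interval
        return -(recovery_penalty + overhead_penalty)

    best = choose_checkpoint_interval([1, 2, 5, 10, 30, 60], example_utility)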
Related work
- Stream data processing engines
  - STREAM, Aurora, TelegraphCQ, NiagaraCQ, Borealis, TRAPP, Flux, TAG, etc.
- Content-based pub/sub
  - Gryphon, ARMADA, Hermes
- Overlay networks
  - P2P
  - Multicast (e.g. Bayeux)
  - Grid
- Other overlay toolkits
  - P2, MACEDON, GridKit

Conclusions
- IFLOW is a general information flow middleware
  - Self-configuring and self-managing
  - Based on application-specified performance and utility
- inTransit distributed event management infrastructure
  - Queries over streams of structured data
  - Resource-aware deployment of query graphs
  - IFLOW provides utility-driven deployment and reconfiguration
- Overall goal
  - Provide useful abstractions for distributed information systems
  - Implementation of abstractions is self-managing
  - Key to scalability, manageability, flexibility

For more information
http://www.brianfrankcooper.net
cooperb@yahoo-inc.com