Wide-Area Service Composition: Evaluation of Availability and Scalability
Bhaskaran Raman
SAHARA, EECS, U.C. Berkeley

[Figure: example composed services spanning providers A, B, Q, and R: a video-on-demand server composed with a transcoder for a thin client, and an email repository composed with a text-to-audio service for a cellular phone]

Problem Statement and Goals

[Figure: a service-level path from a video-on-demand server (provider A) through a transcoder (provider B) to a thin client]

Goals:
– Performance: choose the set of service instances
– Availability: detect and handle failures quickly
– Scalability: Internet-scale operation

Problem statement:
– A composed path could stretch across multiple service providers and multiple network domains
– Inter-domain Internet paths have poor availability [Labovitz'99] and poor time-to-recovery [Labovitz'00]
– Take advantage of service replicas

Related Work
– TACC: composition within a cluster
– Web-server choice: SPAND, Harvest
– Routing around failures: Tapestry, RON
We address: wide-area network performance and failure issues for long-lived composed sessions

Is "quick" failure detection possible?
• What is a "failure" on an Internet path?
  – Outage periods happen for varying durations
• Study of outage periods using traces:
  – 12 pairs of hosts: Berkeley, Stanford, UIUC, UNSW (Australia), TU-Berlin (Germany)
    • Results could be skewed by the Internet2 backbone
  – Periodic UDP heart-beat, every 300 ms
  – Study "gaps" between receive times
• Results:
  – A short outage (1.2-1.8 sec) often indicates a long outage (> 30 sec); sometimes this holds over 50% of the time
  – False positives are rare: O(once an hour) at most
  – Similar results in a ping-based study using ping servers
  – Take-away: it is okay to react to short outage periods by switching the service-level path

UDP-based keep-alive stream:

HB destination   HB source    Total time (hh:mm:ss)   Num. false positives   Num. failures
Berkeley         UNSW         130:48:45               135                    55
UNSW             Berkeley     130:51:45                 9                     8
Berkeley         TU-Berlin    130:49:46                27                     8
TU-Berlin        Berkeley     130:50:11               174                     8
TU-Berlin        UNSW         130:48:11               218                     7
UNSW             TU-Berlin    130:46:38                24                     5
Berkeley         Stanford     124:21:55               258                     7
Stanford         Berkeley     124:21:19                 2                     6
Stanford         UIUC          89:53:17                 4                     1
UIUC             Stanford      76:39:10                74                     1
Berkeley         UIUC          89:54:11                 6                     5
UIUC             Berkeley      76:39:40                 3                     5

Acknowledgements: Mary Baker, Mema Roussopoulos, Jayant Mysore, Roberto Barnes, Venkatesh Pranesh, Vijaykumar Krishnaswamy, Holger Karl, Yun-Shen Chang, Sebastien Ardon, Binh Thai

Architecture

[Figure: architecture layers from source to destination across the Internet: application plane (composed services); logical platform (overlay network with peering relations; peering clusters exchange performance information); hardware platform (service clusters, the locations of service replicas)]

Service cluster: a compute cluster capable of running services

Functionalities at the cluster manager:
– Finding overlay entry/exit
– Service-level path creation, maintenance, and recovery
– Link-state propagation
– Performance measurement
– At-least-once UDP messaging
– Liveness detection

Evaluation
• What is the effect of the recovery mechanism on the application?
  – Text-to-speech application: text source → text-to-audio service → end client
  – Two possible places of failure: leg 1 (text source to text-to-audio service) and leg 2 (text-to-audio service to end client)
  – Mechanisms along the path (figure legend): request-response protocol; data (text, or RTP audio); keep-alive soft-state refresh; application soft-state (for restart on failure)
• Setup:
  – 20-node overlay network
  – One service instance for each service
  – Deterministic failure for 10 sec during the session
  – Metric: gap between the arrival of successive audio packets at the client
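The keep-alive soft-state refresh above relies on the same gap-based liveness detection characterized in the trace study. The following is a minimal sketch of that idea, assuming the 300 ms heartbeat period and the 1.8 sec / 30 sec thresholds from the slides; the function and variable names are illustrative assumptions, not the project's actual code.

```python
# Sketch of gap-based outage detection over a 300 ms UDP heartbeat stream.
# Heartbeat period and thresholds follow the trace study on these slides;
# all names here are illustrative, not the project's actual implementation.

HEARTBEAT_PERIOD = 0.3   # seconds between UDP heartbeats
SHORT_OUTAGE = 1.8       # detection timeout (slides suggest 1.2-1.8 sec)
LONG_OUTAGE = 30.0       # gap treated as a "long" outage

def classify_gaps(recv_times):
    """Given sorted heartbeat receive timestamps (in seconds), return the
    gaps that exceed the short-outage threshold, split into short and long."""
    short, long_ = [], []
    for prev, cur in zip(recv_times, recv_times[1:]):
        gap = cur - prev
        if gap >= LONG_OUTAGE:
            long_.append(gap)
        elif gap >= SHORT_OUTAGE:
            short.append(gap)
    return short, long_

# Example trace with one short and one long outage.
trace = [0.0, 0.3, 0.6, 2.5, 2.8, 35.0, 35.3]
short, long_ = classify_gaps(trace)
print(f"short outages: {short}, long outages: {long_}")
```

Since a short gap frequently signals a long outage, a cluster manager that reacts at the short-outage threshold by switching the service-level path rarely acts on a false positive.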
What is the scaling bottleneck?
• Parameter: number of client sessions across peering clusters
  – A measure of the instantaneous load when a failure occurs
• 5,000 client sessions in a 20-node overlay network
• Deterministic failure of 12 different links (12 data points in the graph)
• Metric: average time-to-recovery

Recovery of the application session (CDF of gaps > 100 ms):
• Recovery time after a leg-2 failure: 2,963 ms
• Recovery time after a leg-1 failure: 822 ms (quicker than leg 2 due to the buffer at the text-to-audio service)
• Recovery time without the recovery algorithm: 10,000 ms (the full failure duration)
• Jump at 350-400 ms: due to synchronous text-to-audio processing (an implementation artefact)

Average time-to-recovery vs. instantaneous load:
• Two services in each path, two replicas per service
• Each data point is a separate run
• End-to-end recovery algorithm
• High variance due to varying path length
• At a load of 1,480 paths on the failed link, the average path recovery time is 614 ms

Results: Discussion
• Recovery after a leg-2 failure: 2,963 ms = 1,800 + O(700) + O(450)
  – 1,800 ms: timeout to conclude failure
  – 700 ms: signaling to set up the alternate path
  – 450 ms: recovery of application soft-state (re-processing the current sentence)
• Without the recovery algorithm, the gap lasts as long as the failure itself
• O(3 sec) recovery:
  – Can be completely masked with buffering
  – For interactive apps: still much better than no recovery
• Quick recovery is possible because failure information does not have to propagate across the network
• The 12th data point (instantaneous load of 1,480) stresses the emulator's limits
  – 1,480 translates to about 700 simultaneous paths per cluster manager
  – In comparison, our text-to-speech implementation can support O(15) clients per machine
• Other scaling limits? Link-state floods? Graph computation?

Summary
• Service composition: flexible service creation
• We address performance, availability, and scalability
• Initial analysis: failure detection -- meaningful to time out in O(1.2-1.8 sec)
• Design: overlay network of service clusters
• Evaluation results so far:
  – Good recovery time for real-time applications: O(3 sec)
  – Good scalability -- minimal additional provisioning for cluster managers
• Ongoing work:
  – Overlay topology issues: how many nodes, peering
  – Stability issues

Feedback, Questions?
Presentation made using VMWare

Emulation Testbed

[Figure: four emulated nodes; each node runs the application (App) over an emulation library (Lib), and the emulator applies a rule per directed link, e.g., "Rule for 1→2", "Rule for 1→3", "Rule for 3→4", "Rule for 4→3"]

Operational limits of the emulator: 20,000 pkts/sec for up to 500-byte packets on a 1.5 GHz Pentium-4
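For illustration only, the sketch below shows one way the per-directed-link rules in the testbed figure might be represented: a (delay, loss) rule keyed by node pair, applied to each packet. The only aspect grounded in the slides is that there is one rule per directed link; the rule fields, values, and function names are assumptions, not the emulator's actual interface.

```python
# Hypothetical sketch of per-directed-link emulation rules, in the spirit of
# the "Rule for 1→2", "Rule for 3→4" boxes in the testbed figure.
# Rule fields, values, and the API are assumptions, not the real emulator.
import random

# (src, dst) -> (one-way delay in ms, drop probability); values are made up.
RULES = {
    (1, 2): (40.0, 0.01),
    (1, 3): (80.0, 0.02),
    (3, 4): (25.0, 0.00),
    (4, 3): (25.0, 0.00),
}

def emulate_link(src, dst, packet):
    """Return (delivered, delay_ms) after applying the rule for src -> dst."""
    delay_ms, drop_prob = RULES.get((src, dst), (0.0, 0.0))
    if random.random() < drop_prob:
        return False, delay_ms   # packet dropped on this emulated link
    return True, delay_ms        # packet delivered after delay_ms

delivered, delay = emulate_link(1, 2, b"heartbeat")
print(f"delivered={delivered}, delay={delay} ms")
```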