RMCAT - Shared Bottleneck Detection and Coupled Congestion Control vs. QoS for WebRTC
Michael Welzl
CAIA, Swinburne University, Melbourne
4 December 2014

Contributors
• Main people behind this work:
  – Shared bottleneck detection (algorithm development & evaluation): David Hayes
    Stuff available from: http://heim.ifi.uio.no/davihay/
    http://www.ietf.org/proceedings/91/slides/slides-91-rmcat-2.pdf
  – NorNet testing: Simone Ferlin (Simula Research Labs)
  – Coupled congestion control: Safiqul Islam
    Stuff available from: http://heim.ifi.uio.no/safiquli/coupled-cc/index.html
  – Other contributors: Stein Gjessing, Naeem Khademi

WebRTC in a nutshell
• Multimedia services in browsers, available everywhere, without plugins
  – P2P (browser-to-browser); major focus on traversing middleboxes (NATs etc.) → all traffic will use one UDP 5-tuple
  – Includes an SCTP data channel (e.g. for control traffic in a game)
  – Also a lot of IETF effort to agree on audio/video codecs
• Controlled by the web developer via a JavaScript API, standardized in the W3C
• Protocols standardized in IETF WGs: RTCWEB (main) + RMCAT (congestion control) + DART (QoS)
  – Note: the RMCAT charter is about media in general, not only WebRTC

WebRTC possibilities: games, toys... and talking to tech support in the browser

Prioritization
• WebRTC needs priorities
  – e.g., draft-ietf-rtcweb-use-cases-and-requirements includes the use case "Simple Video Communication Service, QoS"
  – Priorities are to be exposed via the JS API
• Just use QoS?
  – Not available everywhere; some people just want to set a DSCP and "hope for the best"
  – Needs special rules: different DSCP values for one UDP 5-tuple?! (draft-ietf-dart-dscp-rtp-10)
  – When a common shared bottleneck is known, priorities between flows from a single sender can be realized by coupled congestion control (QoS can still protect you from other users)

Coupled congestion control: background
• Having multiple congestion-controlled flows from the same sender compete on the same bottleneck is detrimental
  – Combining the congestion controllers in the sender → better control of fairness (with priorities), and less delay and loss
• Two elements: 1) shared bottleneck detection (SBD), 2) coupled congestion control
  – In WebRTC, 1) can sometimes be very easy: same 6-tuple. But measurement-based SBD enables broader application of 2) (same sender, different receivers)
  – Apparently, "same 6-tuple" turned this into "QoS vs. coupled CC" in the IETF

Shared bottleneck detection: scenarios
[Diagram: hosts H1 (sender), H2 (sender + receiver), H3 (receiver); Cases 1, 2 and 3 differ in which hosts the flows originate from and terminate at]
• Only case 1 can be supported by the 6-tuple
• Case 2 can be supported by measurement-based SBD
• Case 3 requires coordinating senders H1 and H2
  – Can be supported by measurement-based SBD, but needs more (H1 ↔ H2 coordination); not currently considered

SBD for RMCAT
[ David Hayes, Simone Ferlin-Oliveira, Michael Welzl: "Practical Passive Shared Bottleneck Detection Using Shape Summary Statistics", IEEE LCN 2014, 8-11 September 2014. ]
[ draft-hayes-rmcat-sbd-01.txt ]
• Requirements:
  – Little signalling (also true for RMCAT CC algorithms)
  – Passive: requires no changes to traffic
  – Reliable for long-lived data/multimedia streams (e.g. video)
    • A false positive is worse than a false negative
  – Realistic to implement as a real-time function in a browser (no expensive offline computation)

SBD overview
• Goal: determine the correlation caused by a common queue
• The receiver calculates summary statistics from OWD samples
  – To deal with noise and lag, and to limit feedback
  • Variance: Packet Delay Variation [RFC 5481]
  • Skewness: skew_est
  • Oscillation: freq_est (average number of significant mean crossings)
• The sender groups flows experiencing congestion (skew_est), then divides groups based on freq_est, PDV, and skew_est
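To make the receiver side concrete, here is a minimal sketch of such summary statistics over one measurement interval of OWD samples. It is an illustration only: the function name, the significance threshold p_v, and the simplified estimators are assumptions, not the exact definitions in draft-hayes-rmcat-sbd or the LCN paper.

```python
def summarize_owd(samples, p_v=0.2):
    """Illustrative per-interval summary statistics over one-way delay samples.
    Returns (skew_est, var_est, freq_est); simplified relative to the draft."""
    n = len(samples)
    mean = sum(samples) / n

    # Skewness proxy: proportion of samples below vs. above the mean.
    # Flips from positive towards negative as the queue approaches 100% load.
    skew_est = sum(1 if s < mean else -1 if s > mean else 0 for s in samples) / n

    # Variability proxy: mean absolute deviation from the interval mean
    # (stands in for the PDV-style statistic of RFC 5481).
    var_est = sum(abs(s - mean) for s in samples) / n

    # Oscillation proxy: count "significant" mean crossings, i.e. moves from
    # significantly below to significantly above the mean (or back), where
    # "significant" means more than p_v * var_est away from the mean.
    freq_est, side = 0, 0
    for s in samples:
        if side <= 0 and s > mean + p_v * var_est:
            freq_est, side = freq_est + 1, 1
        elif side >= 0 and s < mean - p_v * var_est:
            freq_est, side = freq_est + 1, -1
    return skew_est, var_est, freq_est
```

The sender would then collect these per-flow statistics and cluster flows whose skew_est indicates congestion, splitting clusters whose freq_est, PDV, or skew_est values differ too much.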
Why skewness?
• The change from positive to negative skewness near 100% load makes it a good congestion indicator
• Not good to use alone: ambiguous, and susceptible to extreme sample values
[Figure: skewness vs. load for an M/M/1/K queue]

Practical estimation of skewness
• Count the proportion of samples above and below the mean
• Removes the ambiguity

Simulation setup
• Background traffic based on real traffic traces
  – >90%
• Flows 1 & 2 send at twice the rate of flows 3 & 4
• Various combinations of bottlenecks activated

Real network tests with NorNet

Coupled congestion control
• When possible, this is best done by scheduling packet transmissions from different sources with a single congestion controller
  – Congestion Manager (CM)
• Disadvantages of this approach:
  – Hard to combine multiple applications (RMCAT is about RTP-based applications in general, not only WebRTC)
  – Difficult to switch on/off
• Hence, the goal of [draft-welzl-rmcat-coupled-cc-04]: achieve the benefits with minimal changes to existing congestion controllers
[ Safiqul Islam, Michael Welzl, Stein Gjessing, Naeem Khademi: "Coupled Congestion Control for RTP Media", ACM SIGCOMM Capacity Sharing Workshop (CSWS 2014), 18 August 2014, Chicago, USA. ]
Best paper award → to be published in ACM SIGCOMM CCR too.

"Flow State Exchange" (FSE)
• The result of searching for the minimum necessary standardization: only define what goes in / out and how the data are maintained
  – Could reside in a single app (e.g. a browser) and/or in the OS
[Diagram: a traditional CM, an FSE-based CM, and another possible implementation of flow coordination, each coordinating Stream 1 and Stream 2]

First version: only passive
• Goal: minimal change to existing CC
  – Each time it updates its sending rate (New_CR), the flow calls UPDATE(New_CR, New_DR) and gets its new rate back
  – This complicates the FSE algorithm and the resulting dynamics (e.g. dampening is needed to avoid overshoot caused by slowly-reacting flows)
[Diagram: flows 1..n call Update_rate(); the FSE stores information, calculates rates, and returns New_Rate to each flow]

Now: FSE - Active
• The FSE actively initiates communication with all the flows
  – Perhaps harder to use, but a simpler algorithm and "nicer" resulting dynamics
[Diagram: one flow calls Update_rate(); the FSE stores information, calculates rates, and pushes New_Rate to all flows]

Active algorithm: 1st version
• Every time the congestion controller of a flow determines a new sending rate CC_R, the flow calls UPDATE
  – This updates the sum of all rates, calculates the sending rates for all the flows, and distributes them to all registered flows:

    for all flows i in FG do
        FSE_R(i) = (P(i) * S_CR) / S_P
        send FSE_R(i) to flow i
    end for

• Designed to be as simple as possible
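As a concrete illustration of this UPDATE step, the following is a minimal sketch of an active FSE, assuming the FSE keeps a flow group with a priority P(i), a stored rate, and a rate callback per flow. The class and method names and the callback interface are assumptions for illustration; only FSE_R(i) = P(i) * S_CR / S_P comes from the slide.

```python
class ActiveFSE:
    """Illustrative sketch of the active FSE (1st version), not the draft's
    exact pseudocode: rates are redistributed to all flows on every update."""

    def __init__(self):
        self.flows = {}   # flow_id -> {"P": priority, "rate": stored CC rate, "cb": callback}
        self.s_cr = 0.0   # S_CR: sum of the rates calculated by the congestion controllers

    def register(self, flow_id, priority, rate_callback):
        self.flows[flow_id] = {"P": priority, "rate": 0.0, "cb": rate_callback}

    def update(self, flow_id, cc_r):
        """Called by a flow whenever its congestion controller determines a
        new sending rate cc_r (CC_R in the pseudocode above)."""
        flow = self.flows[flow_id]
        self.s_cr += cc_r - flow["rate"]   # update the sum of all rates
        flow["rate"] = cc_r

        # S_P: sum of the priorities of all flows in the flow group FG.
        s_p = sum(f["P"] for f in self.flows.values())

        # FSE_R(i) = P(i) * S_CR / S_P, sent to every registered flow.
        for f in self.flows.values():
            f["cb"](f["P"] * self.s_cr / s_p)
```

For example, two flows with priorities 1 and 0.2 that together reach an aggregate of 6 Mbps would be told to send at 5 Mbps and 1 Mbps, respectively.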
Evaluation
• Done: simulations, rate-based
  – AIMD: RAP
  – Multimedia-oriented: TFRC
• Ongoing: simulations, window-based
  – Delay-based: LEDBAT
  – TCP (SCTP-like, for the data channel)
• Planned: the currently proposed RMCAT CCs
  – Simulations
  – Real-life tests (put the code in a browser)

Simulation setup
[Diagram: flows F1..Fn crossing routers R1 and R2 (dumbbell topology)]
• Bottleneck: 10 Mbps; queue length: 62 packets (1/2 BDP); packet size: 1000 bytes; RTT: 100 ms
• All tests (except when the x-axis is time) ran for 300 seconds and were carried out 10 times with random start times picked from the first second; the standard deviation was consistently very small (<= 0.2%)

Dynamic behavior: Rate Adaptation Protocol RAP (= rate-based AIMD)
[Figure: sending rates of flows 1 and 2 between 20 and 25 s, with FSE and without FSE]

Dynamic behavior: TFRC
[Figure: sending rates of flows 1 and 2 between 20 and 25 s, with FSE and without FSE]

FSE goals
• Charter: "Develop a mechanism for identifying shared bottlenecks between groups of flows, and means to flexibly allocate their rates within the aggregate hitting the shared bottleneck." (requirement F34 in draft-ietf-rtcweb-use-cases-and-requirements-12)
  – This works perfectly
[Figure: sending rates of flows 1 and 2 over 300 s; the priority of flow 1 is increased over time]
• But: because this avoids competition between flows, we expected reduced queuing delay and loss as a side effect

Average queue length (RAP)
[Figure: average queue length vs. number of flows, FSE vs. without FSE]

Packet loss ratio (RAP)
[Figure: packet loss ratio vs. number of flows, FSE vs. without FSE]

What's going on?
[Figure: queue size between 15 and 19 s, with FSE and without FSE]
• The queue drains more often without the FSE
  – E.g.: 2 flows with rate X each; flow 1 halves its rate: 2X → 1.5X
  – When flows synchronize, both halve their rate on congestion, which really halves the aggregate rate: 2X → 1X

Fix: Conservative Active FSE algorithm
• No congestion (increase): proceed as before
• Congestion (decrease): proportionally reduce the total rate (like one flow would)
  – e.g. flow 1 goes from 1 to 1/2 => the total goes from X to X/2
• Need to prevent flows from ignoring congestion or over-reacting
  – A timer prevents rate changes immediately after the common rate reduction that follows a congestion event
  – The timer is set to 2 RTTs of the flow that experienced congestion
  – Reasoning: assume that a congestion event can persist for up to one RTT of that flow, with another RTT added to compensate for fluctuations in the measured RTT value
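A minimal sketch of this conservative variant, building on the ActiveFSE sketch shown earlier: on a congestion report, the aggregate S_CR is scaled down by the reporting flow's own relative reduction, and a timer of 2 RTTs of that flow suppresses further reductions. The congested/rtt parameters and the bookkeeping details are assumptions for illustration, not the draft's exact algorithm.

```python
import time

class ConservativeActiveFSE(ActiveFSE):
    """Illustrative sketch of the conservative active FSE variant."""

    def __init__(self):
        super().__init__()
        self.timer_expires = 0.0  # end of the "no further reduction" window

    def update(self, flow_id, cc_r, congested=False, rtt=0.0):
        flow = self.flows[flow_id]
        now = time.monotonic()

        if congested:
            # Rate reductions arriving while the timer runs are ignored: the
            # common reduction below already reacted to this congestion event.
            if now < self.timer_expires:
                return
            # Reduce the aggregate proportionally, like a single flow would:
            # e.g. the flow halves its rate -> S_CR goes from X to X/2.
            if flow["rate"] > 0:
                self.s_cr *= cc_r / flow["rate"]
            # 2 RTTs of the flow that saw congestion: 1 RTT for the event to
            # persist, 1 RTT to compensate for fluctuating RTT measurements.
            self.timer_expires = now + 2 * rtt
        else:
            # No congestion (increase): same bookkeeping as the plain active FSE.
            self.s_cr += cc_r - flow["rate"]

        flow["rate"] = cc_r
        s_p = sum(f["P"] for f in self.flows.values())
        for f in self.flows.values():
            f["cb"](f["P"] * self.s_cr / s_p)
```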
[Figures: for RAP and TFRC, average queue length, packet loss ratio, link utilization, and single-flow throughput vs. number of flows, FSE vs. without FSE]
• TFRC: the receiver makes assumptions about the sending rate (the expected length of a loss interval) → the loss event ratio p is calculated wrongly → the sender becomes too aggressive

Multiple LEDBAT flows

How to evaluate app-limited flows?
• Not easy: who is in control?
• The RMCAT codec model is not available yet
• From a transport point of view, the send buffer can either run empty or not, with variations in how quickly changes between these two states occur
  – We used a non-reacting video trace of a person talking in a video conference, encoded with a well-known H.264 encoder (x264), to steer the application sending rate
    • An I-frame at the beginning; the rest was mostly P-frames

1 app-limited flow, 1 greedy flow (RAP)
[Figure: sending rates of flows 1 and 2 over 0-25 s, with FSE and without FSE]
• FSE-controlled flows proportionally reduce their rate in case of congestion; without the FSE, synchronization causes the app-limited flow to over-react

Using priorities to "protect" the app-limited flow from the greedy flow (RAP)
[Figure: throughput of flows 1 and 2, link utilization, and capacity over time]
• The high-priority (1) application-limited flow 1 is hardly affected by the low-priority (0.2) flow 2 as long as there is enough capacity for flow 1

2 FSE-controlled flows competing with synthetic traffic (TFRC)
• TMIX synthetic traffic, taken from a 60-minute trace of campus traffic at the University of North Carolina [TCP evaluation suite]
• We used the preprocessed version of this traffic, adapted to provide an approximate load of 50%
[Figure: goodput of flows 1 and 2 vs. the priority of flow 2]
• Throughput ratios are very close to the theoretical values → FSE operation largely unaffected

Thank you! Questions?