Mobile Network Estimation
Minkyong Kim, Brian Noble
Mobile Software Systems
University of Michigan
MOBILITY

Adaptive distributed systems
  Many systems adapt to changes in network capacity
    media-rich applications: web browsers, video players, …
    performance enhancement: caching, prefetching, …
    distributed systems: query planning, agent migration, …
  All of these systems follow the same general form
    observe network traffic at one or both endpoints
    estimate the latency, bandwidth, loss rate, …
    react if anything changes in an “interesting” way
  All of this depends on estimating network capacity well
    turns out to be a difficult problem

Networks have variable performance
  Sources of variation in mobile, wireless networks
    nodes move, leading to unpredictable topology changes
    often more than one connection alternative
    physical layer subject to fading, shadowing, multi-path
  Sources of variation in wide-area networks
    bursty congestion over all time scales
    routing changes between autonomous systems (BGP)
  Typically, adaptive systems are evaluated very carefully with respect to clean, idealized network changes
    my own work in Odyssey is guilty as charged

Goals of a good estimator
  Estimate metrics that matter to the system
    many network estimators focus on physical capacities
    link capacity is like a “speed limit”
    try driving the speed limit in LA during rush hour
    instead: measure available capacities
  Provide three characteristics
    accuracy: gives correct estimates in steady state
    agility: detect a true shift in capacity rapidly
    stability: ignore short-lived transient changes

Current estimators: EWMA filters
  Most use exponentially weighted moving average filters
    at each time step, incorporate new observation (O_current) with old estimate (E_old)
    using a weighted linear combination:
      E_current = a * E_old + (1 - a) * O_current
  The term a is called the gain
    large gain: biases toward stability
    small gain: biases toward agility
    gain is set statically
  You can’t have your cake and eat it too

A tale of two estimators
  TCP: a stable filter that is too stable
    estimates round trip time (RTT): segment, ACK
    stable estimator: gain set to 7/8
    used to set retransmission timeout (RTO)
    under rapidly escalating congestion, RTO grows too slowly
    RTO adds “fudge factor” based on variance
  Odyssey: an agile filter that is too agile
    estimates latency and bandwidth for bulk transfers
    applications react to change by changing fidelity
    agile filter: gain set to 1/4 (latency) and 1/8 (bandwidth)
    transient changes lead to “tail-chasing” adaptations
    applications must add hysteresis to dampen transients

The rest of this talk
  Introduce a simple fluid flow network model
    used to derive spot observations that are fed to filters
  Describe three filters that adapt to prevailing conditions
    error-based: vary gain based on quality of estimate
    stability-based: vary gain based on observed noise
    flip-flop: use a control chart to select an agile or stable filter
  Evaluate the quality of these filters
    subject each to a variety of networking conditions
    compare agility and stability to TCP, Odyssey filters

A fluid-flow network model
  Our model is based on the packet-pair technique
    model network path as single, bottleneck link
    send two packets back to back from source to sink
    sink ACKs both packets as they are received
    spread between ACKs measures bandwidth along path
  We need both bandwidth and latency
    take two observations to solve for two unknowns
  Several subtle points
    depend only on passive traffic observations
    spot observations filter out self interference
    assumes symmetric network performance

The error-based filter
  Problem with EWMA filters comes from static gain
  Instead, vary gain based on predictive quality of estimates
    each estimate forms a prediction for next observation
    at each observation, compare prediction with actual value
  Scale gain with the accuracy of prediction
    predictions that are accurate deserve higher weight
    if inaccurate, should converge on observation quickly
  Tends to ignore small changes, follow large changes

Error-based filter in action
  [Figure: error-based filter trace, annotated “this is trouble”]

The stability-based filter
  The error-based filter will be “pulled” by large transients
    will tend towards instability during transient dips
  Instead, base gain on stability in recent observations
    moving range: difference between adjacent observations
    noisy observations lead to larger moving ranges
  Scale gain with the magnitude of the moving range
    when observations are noisy, each deserves less weight
    when observations are stable, changes are more significant
  Tends to ignore large changes, follow small ones

Stability-based filter in action
  [Figure: stability-based filter trace, annotated “this is trouble”]

Subtleties in variable-gain filters
  The gain in each is based on some source metric
  Gain must be in the range [0..1]
    need some way of scaling the source metric
    determine the maximum {error, instability} recently seen
    scale current {error, instability} relative to maximum
  Transient changes in source metric have drastic effects
    smooth observed source metrics by secondary filter
    secondary filter has static gain (!)
    rather than provide tertiary filter, tune empirically
  Sometimes, variable-gain filters are neither agile nor stable
    source metric places them somewhere in the middle

A short detour: statistical process control
  Suppose you had a machine that built widgets
    widgets specified to have some size, error tolerance
  How do you know your machine is building good widgets?
    idea: periodically grab k widgets, measure them
    if average size is about what you expect, things are OK
    if not, machine is probably out of control
  Formalizing this idea: the control chart
    population mean, m
    sample standard deviation, s
    control lines: m+3s, m-3s
    the 3s rule: stay inside the lines
  [Figure: control chart with center line m and control lines m+3s and m-3s]

The flip-flop filter
  Use a control chart to select for agility, stability
    run two static-gain EWMA filters in parallel
    maintain a control chart for each observation
    if within control limits, use agile filter (a = 0.1)
    otherwise, use stable filter (a = 0.9)
  Cannot apply simple control chart directly to this problem
    true mean is not known, and it changes over time
    sample standard deviation is not known
  Use approximations (individual x-chart)
    m follows simple smoothed estimate of observations
    s approximated with 2-element moving range

Flip-flop filter in action
  [Figure: flip-flop filter trace, annotated “switch to agile filter” and “switch to stable filter”]

Evaluating candidate filters
  Can these filters be as agile as the Odyssey filter…
    in recognizing a true change in link bandwidth?
    in reacting to the presence of cross traffic?
    in detecting a change in ad hoc topology?
    in detecting a wide-area route change?
  Can these filters be as stable as the TCP filter…
    in resisting a transient change in link bandwidth?
    in tolerating the presence of cross traffic?
    in tolerating retransmissions in ad hoc networks?
    in tolerating noise across a real wide-area network?
  Can they predict in an ad hoc network with cross traffic?
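
A rough Python sketch of the flip-flop idea from the slides above: two static-gain EWMA filters run in parallel, and an individuals-style control chart picks which one to report. Only the two gains (0.1 and 0.9) and the 3s rule come from the slides. The names `ewma`, `FlipFlop`, and `chart_gain`, the smoothing applied to the chart's center line and moving range, the first-observation initialization, and the choice to test each observation against limits computed before it is incorporated are all assumptions made for illustration. The constant 1.128 is the standard control-chart factor for converting a 2-element moving range into a standard deviation estimate.

```python
from dataclasses import dataclass
from typing import Optional

D2 = 1.128  # standard SPC constant for moving ranges of 2 observations


def ewma(old: float, obs: float, gain: float) -> float:
    """E_current = gain * E_old + (1 - gain) * O_current."""
    return gain * old + (1.0 - gain) * obs


@dataclass
class FlipFlop:
    agile_gain: float = 0.1    # from the slides
    stable_gain: float = 0.9   # from the slides
    chart_gain: float = 0.5    # assumption: smoothing for the control chart
    agile: Optional[float] = None
    stable: Optional[float] = None
    center: Optional[float] = None   # "m": smoothed estimate of observations
    mean_range: float = 0.0          # smoothed 2-element moving range
    last_obs: Optional[float] = None

    def update(self, obs: float) -> float:
        """Feed one spot observation and return the selected estimate."""
        if self.agile is None:
            # Assumption: seed every piece of state with the first observation.
            self.agile = self.stable = self.center = obs
            self.last_obs = obs
            return obs

        # Control limits come from the chart state *before* this observation.
        sigma = self.mean_range / D2          # "s" approximated from moving range
        in_control = abs(obs - self.center) <= 3.0 * sigma

        # Both static-gain filters always run in parallel.
        self.agile = ewma(self.agile, obs, self.agile_gain)
        self.stable = ewma(self.stable, obs, self.stable_gain)

        # Update the chart: center line and smoothed moving range.
        self.mean_range = ewma(self.mean_range, abs(obs - self.last_obs),
                               self.chart_gain)
        self.center = ewma(self.center, obs, self.chart_gain)
        self.last_obs = obs

        # Inside the limits: trust the agile filter; otherwise fall back.
        return self.agile if in_control else self.stable
```

Feeding the filter a steady stream followed by a step change shows the intended behavior under these assumptions: the first out-of-limits observation is reported through the stable filter, and once the moving range widens to cover the new level, the agile filter takes over.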

Experimental methodology
  All experiments in this talk used ns, a network simulator
    the wide-area set is based on live network traces
  Extensions to support variable-link experiments
    script controls base physical performance of a link
    can vary latency, bandwidth over time
  Ad hoc networking simulations include Monarch extensions
    collision-avoidance
    link-level ACK, retransmission
  In each experiment, filters converge to same value
    they do not differ in accuracy
    only differences in agility, stability

Link changes
  First set of experiments: impulse-response tests
    connect client, server with a single ns link
    vary link performance with a variant of a square wave
    persistent change: decrease from 10Mb/s to 1Mb/s
    transient change: dip from 10Mb/s to 1Mb/s and back
  Vary number of request/response pairs exposed to change
    Poisson request generator, random response size
  Agility: measured by settle time
    time to reach an estimate within 10% of nominal
  Stability: measured by mean squared error
    penalizes large, short disturbances more than small, long ones

Agility for step-down waveform
  [Figure: settle time (sec) vs. packets per second (avg); series FF, SF, EF, Ody, TCP]

Stability for impulse-down waveform
  [Figure: mean squared error vs. packets during transient; series FF, SF, EF, Ody, TCP]

Cross traffic experiments
  Start request/response traffic between client and server
    at 50 seconds, inject 5Mb/s cross traffic
  All filters slightly optimistic in estimates
    not all packets see full queue delays
  Agility: settle time
  Stability: coefficient of variation
  [Figure: topology with client, server, router A, router B, congestion source, and congestion sink]

Cross traffic results: agility
  [Figure: settle time (sec) for Traffic On and Traffic Off; series FF, SF, EF, Ody, TCP]

Cross traffic results: stability
  [Figure: coefficient of variation (%) for Traffic On and Traffic Off; series FF, SF, EF, Ody, TCP]

Simple ad hoc topology changes
  Place three server/router nodes in a line
    single client walks from server to end of line, and back
    topology changes at each stage
  Agility results do not add much new information
    similar to congestion: TCP is bad, rest are comparable
  Stability results are useful
    coefficient of variation after settle time
  [Figure: server, node A, and node B in a line; client positions labeled stage 1 through stage 5]

Stability results: topology changes
  [Figure: coefficient of variation (%) vs. position of mobile client (Stage 2 to Stage 5); series FF, SF, EF, Ody, TCP]

Summary of comparisons
  [Table: filters FF, SF, EF, Ody, TCP compared on agility (Step Up, Step Down, Congestion, Wide-Area, Mobile) and stability (Transient, Congestion, Wide-Area, Mobile)]

Acid test: predicting ad hoc performance
  Typical ad hoc simulation
    50 nodes in 1500x500 meter space
    initial locations randomly distributed throughout space
    nodes move in random waypoint model
  Nodes are formed into 25 pairs
    one pair is our test client/server: Poisson traffic
    remaining 24 pairs exchange CBR traffic
    vary rate of congestion traffic across experiments
  No filter does particularly well
    two static filters are worst performers
    flip-flop is best of the bunch

Ad hoc accuracy results
  [Figure: average estimated error (s) vs. size of CBR packets (bytes, 64 to 2048); series FF, SF, EF, Ody, TCP]

Related Work
  S. Keshav: introduced packet-pair, bottleneck bandwidth
    fuzzy estimator: similar to error-based estimator
    analysis for rate-allocating servers (not FCFS)
  Packet-pair extensions
    Paxson: receiver-based packet pair: time at both ends
    Lai: receiver-only packet pair: time at receiver
  Active probing: Bolot, Downey, Carter & Crovella, …
    measurement load can be substantial
  Lai’s general network model, packet tailgating technique
  Balakrishnan’s congestion manager: unified RTT observations
    can benefit from our filters for better estimates

Conclusions
  Adaptive systems depend on quality of measurement
    particularly hard to estimate network capacity
  Standard filtering techniques: agile or stable, but not both
  Adaptive filters: tune for prevailing network conditions
    agile when possible, stable when necessary
  Best alternative: flip-flop filter
    composition of two static-gain EWMA filters
    statistical process control used to select between them
    comparable to Odyssey’s agile filter in 4/5 scenarios
    comparable to TCP’s stable filter in 3/4 scenarios
    provides best predictions in complex ad hoc network

Questions?
  Further details: http://mobility.eecs.umich.edu/
  Preprint of the paper is available