Overview of Mesh Networking Research @ MSR
Jitendra Padhye
Microsoft Research
January 23, 2006

What are mesh networks?
• Multi-hop wireless networks
• Mostly static nodes
• Unplanned node placement
• Applications: disaster relief, backhaul for city-wide wireless networks, meeting mesh, neighborhood meshes, Internet connection sharing
• Many startups …

Three main problems in mesh networking
• Capacity
• Capacity
• Capacity

Why is capacity a problem?
[Figure: Source, Mesh Router, Destination]
• With a single radio, a node cannot transmit and receive simultaneously.
• A two-hop path has half the capacity of a one-hop path.
• Other interference patterns are also possible.
• Seminal result by Gupta and Kumar (2000): Capacity = O(1/sqrt(n))

MSR’s research on Mesh Network Capacity
• Capacity estimation
• Capacity improvement using multiple radios and other techniques
• Feasibility study using realistic traffic

Mesh Network Capacity Estimation
• New framework for estimating capacity of multi-hop wireless networks
  – Gupta-Kumar result is asymptotic
  – Our framework calculates the optimal capacity of a given mesh network for a given set of flows
  MobiCom 2003 (Jain, Padhye, Padmanabhan and Qiu)
• Our framework requires knowledge of which links interfere with one another
  – Problem of “conflict graph” estimation
  – N nodes, O(N^2) links, O(N^4) link pairs!
  – We developed an approximation technique that takes O(N^2) time
  IMC 2005 (Padhye, Agarwal, Padmanabhan, Qiu, Rao and Zill)

Key Insight: Multiple radios are necessary to improve capacity

Improving capacity using Multiple Radios
• Select the best radio to send each packet using locally available information
  – Multi-radio unification protocol
  IEEE BroadNets 2004 (Adya, Bahl, Padhye, Wolman and Zhou)
  – Problem: sub-optimal in many cases
• Optimize the entire path for a given flow
  – Take into account interference and link capacity along the entire path
  – Implemented in Mesh Connectivity Layer (MCL)
  MobiCom 2004 (Padhye, Draves, Zill)
• If the second radio has very low bandwidth, can we use it to offload signaling?
  – Simulation-based study of separating control and data into different frequency bands
  IEEE BroadNets 2005 (Kyasanur, Padhye, Bahl)

How do we know how much capacity is “enough”?

Feasibility study using realistic traffic
• Collect traffic traces from Microsoft’s wired network
• Replay on mesh testbed
• Study delay characteristics of replayed traffic
• Conclusions:
  – Factors such as specific card brands and placement of servers have significant impact; routing metrics have less impact.
  – A 2-radio mesh network is likely sufficient for supporting normal office traffic
  – Some large delay spikes.
• MobiSys 2006 (Eriksson, Agarwal, Bahl, Padhye)

Ongoing work related to capacity:
• Capacity improvement using network coding
• Use of directional antennas to reduce interference
• Use of spectrum etiquettes and cognitive radios to improve spectrum utilization

Other challenges:
• Self-management
  – Network without administrator: is it possible?
  – Engineering challenges such as automatic address assignment
• Security and Fairness
  – Freeloaders
  – Information leakage by observing traffic
  – Malicious nodes can disrupt routing

Backup slides

Mesh Connectivity Layer (MCL): Design & Implementation
Design Choice
  – Multi-hop networking at layer 2.5
Framework
  – NDIS miniport: provides virtual adapter on virtual link
  – NDIS protocol: binds to physical adapters that provide next-hop connectivity
  – Inserts a new L2.5 header
Why Layer 2.5?
  – Works over heterogeneous links (e.g. wireless, powerline)
  – Transparent to higher-layer protocols
    • Works equally well with IPv4 and IPv6
  – ARP etc. continue to work without any changes
Features
  – DSR-like routing with optimizations at virtual link layer: Link Quality Source Routing (LQSR)
  – Incorporates 5 different link selection metrics: hop count, RTT, Packet Pair, ETX, WCETT

Scope: Technical Problems we looked at
Range and Capacity
  – Off-the-shelf wireless hardware is severely range limited
  – Throughput of 802.11 MAC degrades rapidly with the number of hops
  Our Solution: multi-radio meshbox, directional ant., NLDP, Interference management, Capacity-cal
Routing
  – Network connectivity is highly dynamic
  – Classical single-path and shortest-path routing perform poorly in a dense network
  Our Solution: LQSR & MR-LQSR, WCETT (ETX, PacketPair, RTT, ..)
Security and Fairness
  – Mesh is susceptible to freeloaders and malicious users
  – Achieving “fairness” without topological and traffic information is difficult
  Our Solution: “Windows certificate”, greedy behavior detection, watchdog mechanism, intrusion detection
Self Management
  – End users are non-technical
  – A no-network-operator model is challenging
  Our Solution: M3, watchdog mechanism, data cleaning, liar detection, on-line network simulation, beacon stuffing, server placement
Spectrum Management
  – Tragedy of the commons
  – Exploit spectrum white space
  Our Solution: Control channel, dual-frequency meshes, 700-900 MHz, Spectrum etiquettes

Impact of path length on throughput
Experimental Setup
• 23 node testbed
• One IEEE 802.11a radio per node (NetGear card)
• Randomly selected 100 sender-receiver pairs (out of 23x22 = 506)
• 3-minute TCP transfer, only one connection at a time
[Figure: Throughput (Kbps) vs. Byte-Averaged Path Length (Hops)]
If a connection takes multiple paths over its lifetime, lengths are byte-averaged. Total 506 points.
Solution: Multi-Radio Meshes

Link Selection Metrics
Many metrics have been studied in the literature
  – Hop count
  – Round trip time (RTT)
  – Packet pair
  – Expected data transmission count incl. retransmission (ETX)
  – Weighted cumulative expected transmission time (WCETT)
  – Signal strength stability
  – Energy related
  – Link error rate
  – Location related
  – …
The ones in red are implemented in MCL (the first five above, per the MCL feature list)

Link Selection Metric for Single Radio: ETX
• Each node periodically broadcasts a probe
• The probe carries information about probes received from neighbors
• Each node can calculate the loss rate on the forward (Pf) and reverse (Pr) link to each neighbor
• Selects the path with least total ETX

$$ETX = \frac{1}{(1 - P_f)(1 - P_r)}$$

Advantages
  – Explicitly takes loss rate into account
  – Implicitly takes interference between successive hops into account
  – Low overhead
Disadvantages
  – PHY-layer loss rate of broadcast probe packets is not the same as PHY-layer loss rate of data packets
    • Broadcast probe packets are smaller
    • Broadcast packets are sent at a lower data rate
  – Does not take data rate or link load into account
Developed by De Couto et al @ MIT (2003)

Baseline comparison of Metrics: Single Radio Mesh
Experimental Setup
• 23 node testbed
• One IEEE 802.11a radio per node (NetGear card)
• Randomly selected 100 sender-receiver pairs (out of 23x22 = 506)
• 3-minute TCP transfer, only one connection at a time
Median path length: HOP: 2, ETX: 3.01, RTT: 3.43, PktPair: 3.46
[Figure: Median Throughput (Kbps) for HOP, ETX, RTT, PktPair]
ETX performs the best

Link Selection Metric for Multiple Radios: WCETT
State-of-the-art metrics (shortest path, Packet Pair, RTT, ETX) do not leverage channel, range, or data rate diversity
Multi-Radio Link Quality Source Routing (MR-LQSR)
  – Link metric: Expected Transmission Time (ETT)
    • Takes bandwidth and loss rate of the link into account
  – Path metric: Weighted Cumulative ETTs (WCETT)
    • Combines link ETTs of links along the path
    • Takes channel diversity into account
  – Incorporates into source routing
Developed by Draves, Padhye et al @ MSR (2004)

Expected Transmission Time (ETT)
Given:
  – Loss rate p
  – Bandwidth B
  – Mean packet size S
  – Min backoff window CWmin
Takes bandwidth
and loss rate of the link into account

$$ETT = ET_{xmit} + ET_{backoff}$$

where

$$ET_{xmit} = \frac{S}{B(1 - p)}, \qquad ET_{backoff} = \frac{CW_{min} \, f(p)}{2(1 - p)}$$

$$f(p) = 1 + \sum_{i=0}^{7} 2^{(i\,?\,1)} p^{i}$$ [the operator inside the exponent "(i 1)" is missing in the source]

WCETT = Combines link ETTs
Need to avoid unnecessarily long paths
  - bad for TCP performance
  - bad for global resources
All hops on a path on the same channel interfere
  – Add ETTs of hops that are on the same channel
  – Path throughput is dominated by the maximum of these sums
Given an n-hop path, where each hop can be on any one of k channels, and two tuning parameters, a and b:

$$WCETT = a \cdot \sum_{i=1}^{n} ETT_i + b \cdot \max_{1 \le j \le k} X_j$$

where

$$X_j = \sum_{\text{hop } i \text{ is on channel } j} ETT_i$$

Select the path with min WCETT

Baseline Comparison of Metrics: Two Radio Mesh
Experimental Setup
• 23 node testbed
• Randomly selected 100 sender-receiver pairs (out of 23x22 = 506)
• 3-minute TCP transfer
• Two scenarios:
  – Baseline (single radio): 802.11a NetGear cards
  – Two radios: 802.11a NetGear cards, 802.11g Proxim cards
Median path length: HOP: 2, ETX: 2.4, WCETT: 3
[Figure: Median Throughput (Kbps) of 100 transfers by metric (WCETT, ETX, Shortest Path), Single Radio vs. Two Radios; bar labels: 2989.5, 1601, 1379, 1508, 1155, 844]
WCETT utilizes the 2nd radio better than ETX or shortest path

Path Length and Throughput
Which metric is best?
Experimental Setup
• 23 node testbed
• Randomly selected 100 sender-receiver pairs (out of 23x22 = 506)
• 3-minute TCP transfer (transmit as many bytes as possible in 2 minutes, followed by 1 minute of silence)
[Figure: Hop Length by Testbed Configuration (A, C, D, E, F) for WCETT, ETX, HOP]
[Figure: Throughput (Kbps) by Testbed Configuration (A, C, D, E, F) for WCETT, ETX, HOP]
For 1 or 2 hops the choice of metric doesn’t matter

Comparison of Metrics: Wireless Office Scenario
23 node indoor testbed. Two radios (both 802.11a) per node. 11 active clients, 4 servers.
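As a brief aside on the metric definitions given earlier, the ETX and WCETT formulas can be sketched in a few lines of Python. This is only an illustrative sketch, not the MCL implementation: the function names, the (ETT, channel) tuple representation of a path, and the values a = b = 0.5 are assumptions made for the example. Link ETTs are taken as given inputs rather than computed from f(p).

```python
from collections import defaultdict


def etx(p_f, p_r):
    """Expected transmission count of a link: 1 / ((1 - Pf) * (1 - Pr))."""
    if not (0 <= p_f < 1 and 0 <= p_r < 1):
        raise ValueError("loss rates must lie in [0, 1)")
    return 1.0 / ((1.0 - p_f) * (1.0 - p_r))


def wcett(hops, a, b):
    """WCETT of a path.

    hops: list of (ett, channel) tuples, one per hop (hypothetical layout).
    a, b: the two tuning parameters from the slide.
    """
    total = sum(ett for ett, _ in hops)       # sum of ETT_i over all hops
    per_channel = defaultdict(float)          # X_j: sum of ETTs on channel j
    for ett, channel in hops:
        per_channel[channel] += ett
    bottleneck = max(per_channel.values(), default=0.0)
    return a * total + b * bottleneck


def best_path(paths, a=0.5, b=0.5):
    """Pick the path name with minimum WCETT (a, b defaults are illustrative)."""
    return min(paths, key=lambda name: wcett(paths[name], a, b))


# Two candidate paths: three hops on one channel vs. four hops alternating channels.
paths = {
    "same_channel": [(1.0, 1), (1.0, 1), (1.0, 1)],
    "diverse": [(1.0, 1), (1.0, 2), (1.0, 1), (0.5, 2)],
}
print(best_path(paths))
```

With these example numbers the longer but channel-diverse path wins, because its largest per-channel sum is smaller; the a term still penalizes its extra hop.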
Heavy Office Traffic: 1 hour, 308 sessions, 587.5 MB total
Light Office Traffic: 1 hour, 415 sessions, 19.72 MB total
[Figure, two panels (heavy and light traffic): Additional Delay (ms), log scale, for WCETT, ETX, HOP, PKTPAIR, RTT]
Relatively light traffic means performance is okay for all metrics.
WCETT does better under heavy load (worst case delay)

Management: Resiliency against Liars/Lossy Links
Problem
• Identify nodes that report incorrect information (liars)
• Detect lossy links
Assume
• Nodes monitor neighboring traffic, build traffic reports and periodically share info.
• Most nodes provide reliable information
Challenge
• Wireless links are error prone and unstable
Approach: Watchdogs
• Find the smallest number of lying nodes to explain inconsistency in traffic reports
• Use the consistent information to estimate link loss rates
Simulation Results
[Figure: Detect liars. Fraction of lying nodes identified (coverage, false positive) for NL = 1, 2, 5, 8, 10, 15, 20]
[Figure: Detect lossy links. Fraction of lossy links identified (coverage, false positive) for NL = 1, 2, 5, 8, 10, 15, 20]
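
The approach of finding the smallest number of lying nodes that explains the inconsistencies can be illustrated with a small brute-force sketch. Everything specific here is an assumption made for the example, not the method used in the simulations: the report format (per-node packet counts for observed links), the `tolerance` parameter that absorbs genuine wireless loss, and the exhaustive search, which only scales to small networks.

```python
from itertools import combinations


def inconsistent_pairs(reports, tolerance=0):
    """Find pairs of nodes whose traffic reports disagree.

    reports: {node: {link: packet_count}} (hypothetical report format).
    Two nodes conflict if their counts for a shared link differ by more
    than `tolerance`.
    """
    pairs = []
    for u, v in combinations(sorted(reports), 2):
        shared = reports[u].keys() & reports[v].keys()
        if any(abs(reports[u][link] - reports[v][link]) > tolerance
               for link in shared):
            pairs.append((u, v))
    return pairs


def smallest_liar_set(nodes, pairs):
    """Smallest set of nodes that touches every inconsistent pair.

    Tries candidate sets in order of increasing size, so the first hit is
    minimal. Exponential in the number of nodes: a sketch only.
    """
    nodes = sorted(nodes)
    for size in range(len(nodes) + 1):
        for candidate in combinations(nodes, size):
            suspects = set(candidate)
            if all(u in suspects or v in suspects for u, v in pairs):
                return suspects
    return set(nodes)  # reached only if a pair names a node outside `nodes`


# Three nodes agree on what they observed; n9 reports much lower counts.
reports = {
    "n1": {"n1->n2": 100, "n3->n4": 50},
    "n2": {"n1->n2": 100},
    "n3": {"n1->n2": 100, "n3->n4": 50},
    "n9": {"n1->n2": 10, "n3->n4": 5},
}
pairs = inconsistent_pairs(reports)
print(smallest_liar_set(reports, pairs))
```

When only two nodes disagree, either one alone explains the conflict, so the answer is not unique; the sketch relies on the stated assumption that most nodes provide reliable information.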