Delay-Tolerant Networks
Acknowledgements: Most materials presented in the slides are based on the tutorial slides made by Dr. Ling-Jyh Chen, Dr. Kevin Fall and Dr. Thrasyvoulos Spyropoulos.

“Legacy” Networks
• Internet, telephone network
• Wired or fixed links
• A SUCCESS STORY!

Wireless Networks: Cellular
• Cellular networks: wired backbone + wireless last link
[Figure: wireless last hop attached to a wired backbone]
• A SUCCESS STORY for voice/SMS!
• Internet? (GPRS): not really (low bandwidth + high price)

Wireless Networks: WiFi
• 802.11, WiMAX
• Still: only wireless local loop
• Higher bandwidth than cellular: 54 Mbps
• Much cheaper per KB

Wireless Networks: WiFi (2)
• Only partial coverage: HOTSPOTS
• No real “mobile computing”!

Wireless Networks: Ad-hoc and Sensor Networks
• Self-organized: no wired infrastructure
• Peer-to-peer: nodes are routers
• Examples: sensor nets; disaster recovery, etc.
[Figure: disaster recovery and target tracking scenarios]

Wireless Networks: Ad Hoc and Sensor Networks (2)
• The past approach: “apply the successful and well understood Internet paradigm to ad hoc networks also”
• Assume existence of explicit links (strong enough SINR)
• Establish end-to-end paths
[Figure: end-to-end path from S to D over nodes and links]
• Mobility might change these paths: re-establish them

Wireless Networks: Ad Hoc and Sensor Networks (3)
• Ad-hoc networks: a success story? NOT REALLY!
• No real ad hoc application (killer app) out there, except maybe some military networks
• Why? Most wireless networks are NOT like the Internet!

The “Internet” Assumptions
• E2E path doesn’t have really long delay
  • Reacting to flow control in ½-RTT effective
  • Reacting to congestion in 1-RTT effective
• E2E path doesn’t have really big, small, or asymmetric bandwidth
• Re-ordering might happen, but not much
• End stations don’t cheat
• Links not very lossy (<1%)
• Connectivity exists through some path
  • Even MANET routing usually assumes this

More Internet Assumptions
• Nodes don’t move around or change addresses
  • Easy to assign addresses in hierarchy
  • Thought to be important for scalability
• In-network storage is limited
  • Not appropriate to store things long-term in network
• End-to-end principle
  • Routers are “flakier” than end hosts

Non-Internet-Like Networks
• Random and predictable node mobility
  • Military/tactical networks (clusters meet clusters)
  • Mobile routers w/ disconnections
• Big delays, low bandwidth (high cost)
  • Satellites
  • Exotic links (deep space comms, underwater acoustics)
• Big delays, high bandwidth
  • Buses, mail trucks, delivery trucks, etc.

Challenged Networks
• Intermittent/scheduled/opportunistic links
• High error rates/low usable capacity
• Very large delays
• Different network architectures

Characteristics 1: Path and Link Characteristics
• High latency, low data rate
  • e.g. 10 kbps, 1-2 second latencies
  • Asymmetric data rates
• Disconnection
  • Non-faulty disconnections
    • Motion
    • Low-duty-cycle operation
  • Routing subsystem should not treat predictable disconnections as faults and can use this information to pre-schedule messages
• Long queueing times
  • Conventional networks: rarely greater than a second
  • Challenged networks: could be hours or days due to disconnection

Characteristics 2: Network Architectures
• Interoperability considerations
  • Networks may use application-specific framing formats, data packet size restrictions, limited node addressing and naming, etc.
• Security
  • End-to-end approach not attractive
    • Requires end-to-end exchanges of keys
    • Undesirable to carry traffic to destination before authentication/access control check

Characteristics 3: End System Characteristics
• Limited longevity
  • Round-trip time may exceed node’s lifetime, making ACK-based policies useless
• Low duty cycle operation
  • Disconnection affects routing protocols
• Limited resources
  • Affects ability to store and retransmit data due to limited memory

IP Routing May Not Work
• E2E path may not exist
  • Lack of many redundant links
  • Path may not be discoverable (e.g., fast oscillations)
  • Traditional routing assumes at least one path exists, fails otherwise
• Routing algorithm solves wrong problem
  • Wireless broadcast media is not an edge in a graph
  • Objective function does not match requirements
    • Different traffic types wish to optimize different criteria
    • Physical properties may be relevant (e.g., power)

Inter-Planetary Internet (IPN)

Networking in Space
• Existing satellite networks for deep space missions: proprietary, not that efficient, one for each mission
• NASA/JPL: “Extend the idea of Internet in outer space”
  • One reusable network for all missions
  • Gain from experience already acquired

Extending the Internet in Space: Long Propagation Delays vs. “Chatty” Internet Protocols
• Propagation delay is much larger than transmission time! (minutes around our solar system)
• Internet protocols are “chatty”
• TCP:
  S: “Hi! You want to talk?” (SYN) 20 min
  R: “Sure!
Let’s establish a session” (SYN+ACK) 20 min
  S: “Ok, let’s go for it!” (ACK) 20 min
  ….. (slow start phase)
  S: “Can you send me the pic of Mars?” …..

TCP Chattiness
• More than 3h for one 1MB pic!
• Transmission time (1MB/128Kbps) = 1min !!!

Idea: “Bundles”
• Bundle: application-meaningful message
  • Contains all necessary info packed inside one “bundle” (atomic message)
  • Next hop has immediate knowledge of storage and bandwidth requirements
• Optional ACKs
  • Depending on class of service
• Goal: avoid chattiness
  • Minimize number of propagation delays “paid”

Intermittent Connectivity
• No more links! Now we have “contacts”
  • Contact 1: “Dish A sees earth Sat B from 12:30h to 12:45h”
  • Contact 2: “Sat B sees rover C on mars from 17:30h to 18:30h”

Idea: Store-Carry-and-Forward
• Store a bundle for a long period of time
• Forward when the next contact is available
  • Hours or even days until appropriate contact
• Postal system: “move packages from one storage place to another (switch intersection), along a path that eventually reaches the destination”
• How is this different from Internet routers’ store-and-forward?
  1) Persistent storage (hard disk, days) vs. memory storage (few ms)
  2) Wait for next hop to appear vs. wait for table lookup and available outgoing routing port

Store-Carry-and-Forward (2)
[Figure: network of numbered nodes with source S and destination D]

Store-Carry-and-Forward (3)

Store-Carry-and-Forward (4)

DTN vs End-to-end Internet Operation

Networking in Space: Heterogeneity
• Heterogeneous networks to interconnect
  • Link delay, asymmetry, error rate, reliability mechanism
  • Different protocol stack + different node capabilities
• Examples:
  • Earth’s Internet: short delays, low error rate, TCP reliability
  • Sensor network at Mars: short delays, high error rate, data aggregation at sink(s)
  • Satellite backbone: long delays, high error rate, LTP (Licklider Transmission Protocol)

Bundles: A Store and Forward Overlay

What About Retransmission?
Custody Transfers
• Error rates can be high in wireless links
• What if a retransmission is needed?
  • Contact 1: “Dish A sees earth Sat B from 12:30h to 12:45h”
  • Contact 2: “Sat B sees rover C on mars from 17:30h to 18:30h”
  • Contact 3: “Dish A sees Sat B again in one week”
• It’s better that B takes “custody” of the message and retries sending it itself

Custody Transfer (2)

Custody Transfer (3)

Moving the Retransmission Point Closer
• Benefits of hop-by-hop vs. end-to-end error control
• For paths with many lossy links, re-Tx requirements are much higher for end-to-end (linear vs. exponential)
• E.g. 3 links, each with error probability 1-p (success probability p):
  • (hop-by-hop) 3/p extra bandwidth
  • (end-to-end) 3/(p^3) extra bandwidth
• Retransmission overhead is increased by long propagation delays

Regions and DTN Gateways
• DTN gateways are interconnection points between dissimilar network protocol and addressing families called regions
  • e.g. Internet-like, ad-hoc, mobile, etc.
• DTN gateways
  • Perform reliable message routing & security checks
  • Store messages for reliable delivery
  • Resolve globally-significant name tuples to locally-resolvable names for internally destined traffic
• Name tuples: two variable-length portions
  • Region name
    • Globally-unique, hierarchically structured region name
    • Used by DTN gateways for forwarding messages
  • Entity name
    • Resolvable within the specified region, need not be unique outside it
  • E.g.
{ internet.icann.int, http://www.ietf.org/ }

Naming

Delay Tolerant Networks (DTN)
• Kevin Fall (~2002): “maybe this idea is not only useful for deep space networks”

DTN: Very Brief History
• DTNRG chartered as IRTF research group (end of 2002)
  • Chair: Kevin Fall (Intel Research Berkeley)
• Architecture evolved from deep-space-focused Interplanetary Internet project
  • Funded by DARPA 1999-2002
  • IRTF group IPNRG retired when DTNRG formed
• DTN became a DARPA program in 2004
• 11+ Internet drafts
• Implementation: simulator (DTNSIM) and Linux code (DTN2)

Intermittent Connectivity: The Technical Argument
• Wireless links are not like wires!
[Figure: end-to-end path from S to D over nodes and links]

Intermittent Connectivity: The Technical Argument (2)
• Intermittent connectivity may appear because of:
  • Propagation effects: shadowing, deep fades
  • Mobility: paths change too fast; huge overhead for maintenance
  • Power: nodes shut down to save power or “hide”
    • Save power (e.g. sensor)
    • Low probability of detection (LPD) (e.g. army node)
[Figure: nodes A, B, C with a blocked link, a moved node, and a powered-down node]

Intermittent Connectivity: The Economical Argument
• Maybe it’s cheaper to not assume connectivity rather than enforce it
• Rural areas (countryside, freeways): overprovision of base stations? OR just live with a sparse network and “episodic” connectivity?
• Sensor networks (attached to animals):
  • Enough Tx power for connectivity? ($100/node)
  • Very low power nodes? (e.g. RFID, $1/node)

Wireless Connectivity: A Different View
[Figure: end-to-end paths from S to D, with path disruptions marked X]

Applications: Sensor Networks for Habitat Monitoring
• ZebraNet (Princeton)
• Biologists want to learn animal habits
  • Size of herds
  • Mobility patterns (running, sleeping, grazing)
  • Daily habits (watering)
• Attach “tracking collars” to animals
• Current technology surprisingly inefficient
  • Satellite trackers: high energy, low bit rate
  • GPS trackers: often have to retrieve collar for data
• Sensor nodes with wireless radios?

Applications: Sensor Networks for Habitat Monitoring (2)
[Figure: herds of zebras (range of a few meters) and a base station]
• Increase power for connectivity?
  • Considerably reduces lifetime of network! (power law)
  • What about obstacles?
• Live with a sparse network (connected clusters)
• Use DTN principles to carry traffic towards sink

Vehicular Networks: “Drive-Thru Internet”
• Vehicle-to-roadside (base station, sensors)

Vehicular Networks: “Drive-Thru Internet” (2)
[Figure: write email, send email and email reply exchanged with the Internet]
• Asynchronous operation: OK for e-mail!
• Web caching; local information; download news
• Enough bandwidth even at high speeds!

Vehicular Networks (VANETs)
• Vehicle-to-vehicle networks
  • Accident prevention
  • Traffic reports
• Can be combined with vehicle-to-roadside
• Why vehicular DTN networks?
  • Gradual deployment => initially sparse network
  • Even dense deployments: paths change too fast, before there is enough time to discover them

An Example: UCLA’s Vehicular Sensor Network

Internet to Remote Communities
• Internet to underdeveloped countries/remote villages
• Rural kiosks (shared among villagers)
  • Sell/buy agricultural products
  • Banking/transactions with government
  • Land titles (Hernando de Soto)
• Satellite: low bandwidth, expensive
• Microwave links: expensive, unreliable(?)
• Dial-up: low bandwidth, unreliable(?)
• Power network: UNRELIABLE!

Internet to Remote Communities (2)
• Email, cached/asynchronous services
• Use: village bus, postman’s vehicle, passing cars
  • Equip with radio, antenna, and storage
• Use: dial-up, satellite, microwave links when available

Internet to Nomadic Communities
• The Saami nomadic community of Lapland

Application: Underwater Networks
• Acoustic signal: short range; longer prop delays
• Environmental sensors: information collected by mobile base stations, or even animals equipped with transceivers (e.g. whales)

Tactical (Military) Networks
• Communicating beyond enemy lines
• Need to retain connectivity despite jamming, losses
• Powering down nodes (LPD/LPI)

Ad-Hoc Networks (revisited)
• DTN is not only for “extreme” networks
• Maybe it can be used to achieve real “mobile computing” without the need for a connected network
• Why?
  • Hotspots
    • Now we have to “look for” the hotspot
    • Mobile computing = the user moves until he can compute!!
    • Extend access point (WiFi) connectivity with ad-hoc subnetworks
  • Data may be available at local peers
    • Establish a peer-to-peer network between local nodes
    • Local news/info may be available at a node nearby

Peer-to-peer Wireless: Pocket Switched Networks
• HAGGLE project (www.haggleproject.org)
• Conference
• Campus

Summarizing: Delay Tolerant Architecture for Wireless
• A necessity:
  • Deep space communications, underwater networks
  • Remote, underdeveloped areas
• A choice:
  • Sensor networks
  • Vehicular networks
• Extension:
  • Peer-to-peer wireless

Protocol Design: A Paradigm Shift
• Current protocols are problematic for “challenged environments”
  • Too many assumptions do not hold
• Need new protocols that take the realities of these emerging wireless environments as starting points; no ad-hoc fixes

Security and Application Issues
• Security: avoid using infrastructure
  • Public key: need a connected server which will map name to public key
  • Reputation systems: revoking a certificate might take a very long time
• Application: must be delay tolerant
  • Network is delay tolerant; what about users??
  • Applications, interfaces with persistence

More about Security Issues
• Problems:
  • Secure opportunistic channel establishment
  • Mutual opportunistic authentication
  • Protection from overrun entities
  • PKI works poorly if connectivity is poor
• Approach using Hierarchical Identity Based Crypto (HIBC)
  • IBC: generate public key based on a string (e.g., address), but private key must be generated by a private key generator (PKG)
  • HIBC: cooperating hierarchy of PKGs
  • No lookup required to find disconnected node’s public key

More about Security Issues (2)
• Bootstrap
  • New user communicates w/ PKG over secure channel to get initial key pair
  • Can also use a tamper-resistant device
  • Reversal of accumulated source route used for PKG to reach new node
• Use of time
  • Adding a datestamp to public key IDs helps to minimize compromise time if device is lost
  • Time-based keys instead of CRLs (Certificate Revocation Lists)
    • Fail-safe vs. fail-insecure (CRLs)

Routing

Legacy Routing
• Graph: G = {V,E}
  • V: set of nodes
  • E: set of links
  • $w(e): E \to \mathbb{R}$ cost function (capacity, energy, queue size)
• Routing (S,D): path $P_{SD} = \{v_0,\dots,v_i,\dots,v_N : v_i \in V,\ v_0 = S,\ v_N = D\}$ such that $e_{i,i+1} \in E$ and
$$ \min_{P_{SD}} \sum_i w(e_{i,i+1}) $$

Legacy Routing: Proactive Protocols (table-driven)
• Link-state, distance vector
  • Obtain global topology information (topology updates)
• Dijkstra’s, Bellman-Ford algorithm
  • Calculate minimum cost paths
  • Distributed algorithms

Dijkstra’s Algorithm
• Shortest paths from A to V-{A}
• Initialization: cost C(A) = 0, C(v) = ∞; set Q = {empty}
• Loop: pick v ∉ Q such that C(v) is minimum; Q = Q + {v}; if C(v) + w_vj < C(j) => C(j) = C(v) + w_vj
• Terminate: when Q = V

Example of Dijkstra’s Algorithm
[Figure: four-step Dijkstra example on nodes A, B, C, D; labels start at L(A)=0, L(B)=4, L(C)=6, L(D)=∞ and end at L(A)=0, L(B)=4, L(C)=5, L(D)=6]

Legacy Routing: Reactive Protocols
• DSR, AODV
• Step 1) Flood Route Request message (RREQ)
• Step 2) Nodes that forward RREQ append their ID to the header
• Step 3) The path that reaches D
first = “shortest path”
• Step 4) Send back Route Reply (RREP) along the reverse of the path found in the header
• If path breaks
  • Repeat route discovery
  • Or fix locally if other available subpaths are known (route maintenance)

Legacy Routing for DTN
• Proactive routing (DSDV, OLSR)
  • Flood periodic topology updates (UPD)
  • S learns next hop to D
  • UPD reaches only same cluster as D!
• Reactive routing (DSR, AODV)
  • Flood Route Request (REQ)
  • S waits for reply from D
  • REQ reaches only same cluster as S!
[Figure: S and D in different clusters; UPD and REQ floods stay inside their own cluster]

DTN Routing
• Graph is disconnected and/or time-varying
• G(t) = {V, E(t)}
• G = {V,C}, C = set of contacts c_i
• c_i = {v_i, v_j, t_start, t_finish, bandwidth, prop. delay, …}

Types of Contacts
• Scheduled contacts
  • E.g. satellite links, message ferry
  • All info known
• Probabilistic contacts
  • Statistics about contacts known
  • E.g. mobility model, or past observation + prediction
  • Bus relay, sensors with random wake-up schedule
• Opportunistic contacts
  • Not known before they occur
  • E.g. a tourist car that happens to drive by the village

Routing: Scheduled Networks

DTN Routing for Scheduled Contacts
• Problem setting:
  • Set of contacts c_i
  • Set of storage capacities (one per node v_i ∈ V)
  • Set of messages m_i = {s,d,t,m}
  • Future traffic demand
• Evaluation metrics
  • Messages delivered
  • Average delay (why?)
    • Connected with message drops
    • Connected with throughput

Knowledge Oracles
• Problem 1) Assume we know data about the network (“oracles”)
  • Contacts Summary Oracle: statistics about all contacts (frequency, duration, capacity); e.g. contact c_ij occurs every T minutes
  • Contacts Oracle: specific info about all contacts; e.g. contacts c_ij(t1), c_ij(t2), …, c_ij(tn)
  • Queuing Oracle: info about all queue sizes Q(n_ij,t) (all nodes and all times)
  • Traffic Demand Oracle: info about all future traffic demand m1 = {s1,d1,t1,m1}, m2 = {s2,d2,t2,m2}, etc.
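
As a rough illustration of the contact tuple and of the two contact-related oracles listed above, the sketch below stores contacts as records and answers "when is the next contact on this edge?" (Contacts Oracle) and "how often does this edge come up?" (Contacts Summary Oracle). The names `Contact`, `ContactsOracle` and `mean_period`, the default bandwidth and delay values, and the sample intervals are assumptions made for the example, not part of any DTN implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Contact:
    """One contact c_i = {v_i, v_j, t_start, t_finish, bandwidth, prop. delay}."""
    src: str
    dst: str
    t_start: float
    t_finish: float
    bandwidth: float = 1.0   # data units per time unit (assumed unit)
    prop_delay: float = 0.0


class ContactsOracle:
    """Knows every individual contact (hypothetical helper class)."""

    def __init__(self, contacts: List[Contact]):
        self.contacts = sorted(contacts, key=lambda c: c.t_start)

    def next_contact(self, src: str, dst: str, t: float) -> Optional[Contact]:
        # First contact on (src, dst) that is still open after time t.
        for c in self.contacts:
            if c.src == src and c.dst == dst and c.t_finish > t:
                return c
        return None

    def wait_time(self, src: str, dst: str, t: float) -> Optional[float]:
        # Time a message arriving at time t must wait before the edge is up.
        c = self.next_contact(src, dst, t)
        return None if c is None else max(0.0, c.t_start - t)


def mean_period(contacts: List[Contact], src: str, dst: str) -> Optional[float]:
    """Summary-style statistic: average spacing between contact start times."""
    starts = sorted(c.t_start for c in contacts if c.src == src and c.dst == dst)
    if len(starts) < 2:
        return None
    return (starts[-1] - starts[0]) / (len(starts) - 1)


# Illustrative schedule for a single edge A -> B.
contacts = [Contact("A", "B", 5, 7), Contact("A", "B", 13, 16), Contact("A", "B", 20, 22)]
oracle = ContactsOracle(contacts)
```

With this schedule, a message that reaches A at time 0 waits until the first contact opens, one that arrives while a contact is open waits nothing, and one that arrives after the last contact gets `None`, which is the case where a router with only this knowledge has no scheduled way forward.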
• Problem 2) Implement each oracle (centralized/distributed)

Routing Algorithm Classes
• Zero Knowledge
  • No oracles used; only current/local view available
  • Worst-case performance (baseline)
• Complete Knowledge
  • All oracles used + buffer (resource) information
  • Optimal performance (for comparison only)
• Partial Knowledge
  • Explore tradeoffs of using only some of the available oracles

Routing with Zero Knowledge
• Oracles used: none
• Algorithm: First Contact
  • Look at currently available contacts
  • Choose one at random, or the first that comes up
• Performance: random routing
  • Random walk on time-varying connectivity graph
  • Cycles, oscillation between nodes, dead ends

Routing with Partial Knowledge
• Computing minimum cost (“shortest”) paths
• Delay:
  • Transmission
  • Propagation
  • Queuing = waiting for contact + waiting for queue to drain
• Link weight w(e,t): a message arriving at edge e at time t is predicted to arrive at the end of e at time t + w(e,t)
• Modify Dijkstra’s algorithm

Minimum Expected Delay (MED) Algorithm
• Oracles used: Contacts Summary
• Edge cost = average waiting time: average contact wait + transmission + propagation
• Regular routing => minimize average path delay
• Downsides:
  • No reaction to congestion
  • Ignores a good link even if it is available

Dijkstra with Time-varying Costs
[Figure: pseudo-code]

Dijkstra with Time-varying Costs (2)
• Message size = m
• Edge capacity = c(e,t)
• Edge propagation delay = d(e,t)
• Queue backlog = Q(e,t,s)
• w(e,t) = w’(e,t,m,s) = t’(e,t,m,s) - t + d(e,t’), where
$$ t'(e,t,m,s) = \min\left\{ t'' \;\middle|\; \int_{t}^{t''} c(e,x)\,dx \ge m + Q(e,t,s) \right\} $$

Dijkstra’s with Time-varying Costs: Example
• Contacts: c_AB = (5,7),(13,16),(20,22)…; c_AC = (9,10),(14,17),(25,26)…; c_BC = (7,10),(14,15),(26,30)…; c_BD = (3,4),(11,15),(26,28)…; c_CD = (6,7),(13,15),(23,25)…
• Step 1 (time = 0): L(A)=0; w_AB(0) = 5 and w_AC(0) = 9, so L(B)=5, L(C)=9, L(D)=∞
• Step 2 (time = 5, at B): w_BC(5) = 2, so L(C) drops from 9 to 7; the next c_BD contact starts at 11, so L(D) drops from ∞ to 11

Earliest Delivery (ED)
• Oracles used: Contacts
• Q(e,t,s) = 0
  • Ignores queuing info
  • Ignores buffer occupancy
• Source routing
• ED is optimal if:
  1. Low traffic rates (e.g. 1 msg)
  2. Or infinite bandwidth and buffer
• Problems
  • If an edge is missed due to lack of bandwidth => may result in disastrous behavior

Earliest Delivery with Local Queuing (EDLQ)
• Oracles used: Contacts
• PLUS: look at local queues for choosing paths:
  • if e = (s,*): Q(e,t,s) = data queued for e at time t
  • otherwise: Q(e,t,s) = 0
• Problems:
  • Buffer overflow
  • Potential loops (the topology view is not consistent between nodes when running Dijkstra)

Earliest Delivery with Global Queuing, i.e. All Queues (EDAQ)
• Oracles used: Contacts, Queuing
• Q(e,t,s) = data queued for e at time t at node s
• Source routing
• Requires bandwidth reservation (ensure that no later arrivals change the experienced queue size)
  • How is this to be implemented? Current queuing knowledge depends on reservations up to now
• Still no bandwidth

Variations and Practical Considerations
• Re-computing routes for ED (earliest delivery)
  • Message might miss contact due to queuing
  • If missed => re-compute remaining shortest path (at intermediate node)
• Implementing queuing oracle with local info
  • Local queuing keeps track of messages it forwards and their path
  • Extrapolate (expected) queue sizes at other nodes (based on capacity and traffic assumptions)
• Message/path splitting
  • Message fragmentation
  • Multi-path routing (e.g. for MED algorithm)

Routing with Complete Knowledge
• What are we missing??
  • Buffer constraints
  • Future traffic demand
• How do we solve this?
  • Multi-commodity flow problem: balance flows over links
  • Dynamic version: balance flows over contacts
  • We can formulate a Linear Program for the problem at hand
    • Note: variable space might grow exponentially

Routing with Complete Knowledge (2)
• Many ideas from graph theory and network flow problems
  • Optimize some metric (e.g. average path cost)
  • While abiding by constraints (e.g.
link/buffer capacities)
• Transport networks with time-varying graphs
  • Quickest transshipment of cargo with time-varying links (e.g. a periodic cargo flight)
• Dynamic network flows
  • Rather difficult problems in general

Performance Comparison
• A network of (20) city buses with radios
• Varying traffic load
• Conclusion 1: ED(-,LQ,AQ) algorithms perform better
• Conclusion 2: ED algorithm optimal for small loads

Performance Comparison (2)
• Large bandwidth => ED is optimal
• Small bandwidth => ED closer to MED

Performance Comparison (3)
• Higher transmission range => more contacts => easier to route
• Smaller buffer space => ED* schemes perform better

Performance Comparison (4)

Practical Routing for DTNs
• How to implement oracles
• The contact oracle: no need to assume full knowledge
  • MED: expected contact delay (average over all future contacts)
  • MEED: estimate future contact delay, based on past contacts (sliding window)

MEED Algorithm (Minimum Estimated Expected Delay)
• Keep history of past contacts
  • Maintain running average
• Sliding window
  • Large window => slow reaction to changes
  • Small window => too many updates, oscillations
• Link-state epidemic dissemination
  • Whenever a contact estimate changes significantly (x% from previous estimate) => flood topology update packet

Link-state Topology => Epidemic Dissemination
• Message vector i: table with topology updates from nodes, NS_i
• Two nodes meet: exchange message vectors NS_A and NS_B
• Exchange topology updates not in common until NS_A = NS_B
• Flood new topology updates further

Calculating the Routing Path
• Eventually, topology updates from all nodes are received (global topology), but not all are equally “fresh”
• Source routing?
  • Intermediate hops might have more recent info than source
• Hop-by-hop routing?
  • What if an infrequent contact (large expected wait) arrives first?
• Per-contact routing = assign current contact cost 0

Example of MEED Routing
• Link AB (path ABD) is better on average than link AC (path ACD)
• But if at time t link AC is up, then ACD is better!
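
The ABD versus ACD example can be worked through in a few lines. The sketch below is only an illustration under stated assumptions: the estimator `expected_wait` uses the sum of squared down-times divided by twice the window length (the expected wait of a message arriving at a uniformly random time in the window), which is one plausible way to realise the "running average over a sliding window" and is not taken from the slides; the function names, the window length and the gap values are likewise made up for the example. Per-contact routing is modelled exactly as stated above, by giving any contact that is currently up a cost of 0 before running Dijkstra.

```python
import heapq


def expected_wait(gaps, window):
    """Estimated wait on a link from the down-times observed in a sliding window.

    Assumption: a message arrives at a uniformly random time in the window,
    so a down-time of length g is hit with probability g/window and then
    costs g/2 on average.
    """
    return sum(g * g for g in gaps) / (2.0 * window)


def best_path(costs, src, dst, up_now=frozenset()):
    """Dijkstra over undirected link estimates; links in up_now cost 0."""
    graph = {}
    for (u, v), w in costs.items():
        if (u, v) in up_now or (v, u) in up_now:
            w = 0.0  # per-contact routing: the current contact is free
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))

    best = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            break
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < best.get(v, float("inf")):
                best[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))

    if dst not in done:
        return None, float("inf")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], best[dst]


# Hypothetical observations over a window of 16 time units.
estimates = {
    ("A", "B"): expected_wait([4, 4], 16),  # frequent contact
    ("B", "D"): expected_wait([8], 16),
    ("A", "C"): expected_wait([12], 16),    # infrequent contact
    ("C", "D"): expected_wait([4], 16),
}

avg_path, avg_cost = best_path(estimates, "A", "D")
now_path, now_cost = best_path(estimates, "A", "D", up_now={("A", "C")})
```

On the averaged estimates the route through B wins, but once the rarely available A-C contact happens to be up, zeroing its cost flips the decision to the route through C, which is the behaviour the example slide describes.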
• (This is per-contact routing.)

Link-state DTN Routing: Conclusion
• Link-state overhead: O(N^2)
• If node mobility is not restricted, everyone sees everyone else eventually
• Can be an interesting approach IFF:
  • Nodes are static: e.g. sensors with a wake-up schedule
  • Topology changes infrequently/network is dense
• BUT: if the mobility pattern does not have enough structure (e.g. IID), then it degenerates to random forwarding
• Extensions?
  • How to extend to keep track of 1) average queuing, 2) average traffic requirements
  • Approximate other algorithms: EDLQ, EDAQ, LP?

Message Ferrying
• A sparse network of “production” nodes
  • Nodes may be static (e.g. sensors) => how to bridge partitions?
  • Nodes may be mobile, but slow => long delays, since waiting for a contact to occur may take time
• Solution: use specialized nodes (DataMules or Message Ferries) to carry traffic between production nodes
  • Ferries are always mobile
  • No energy considerations

Message Ferrying 1. Enforce Ferry Trajectory
• Robots, unmanned aerial vehicles (UAVs)
• Li et al. ’03, Zhao et al. ’04
[Figure: DataMules carrying traffic from S to D]
• The problem: design optimal trajectories

Message Ferrying 2. Use Existing Trajectories
• Scheduled mobility: uncontrolled but predictable mobile nodes (e.g. city buses)
• Jain et al. ’04
[Figure: production node trajectory between S and D]
• Predict ferry mobility
• Optimal use of available ferry bandwidth

Message Ferrying: The Problem Space
• Ferry mobility
  1. Designed for non-messaging reasons (buses)
  2. Optimized for message transfer (robots)
• Production node mobility: static vs. mobile
• Number of ferries: single vs. multiple ferries
• Ferry relaying: yes/no
• Node relaying: node-to-ferry vs. node clustering

Ferries for Non-messaging Reasons
• No explicit trajectory design + known schedules => could apply principles from the algorithms presented earlier (e.g. ED, MED, etc.)
• No trajectory design + no/limited knowledge of schedules => use opportunistic routing, e.g.
epidemic (later)
• Focus on trajectory design cases

Static Nodes + Single Ferry
• b_ij = traffic (rate) requirement from node i to j
• Ferry route L of length |L|
• Ferry speed f: ferry cycle T = |L|/f
• $d_{ij}^{L}$ = average delay for traffic from i to j
  • Wait for ferry: T/2 = |L|/(2f) on average
  • Upload data (queuing at node): f(ferry in range, upload rate)
  • Wait for destination (on ferry): T/2 = |L|/(2f) on average
  • Download data to recipient: f(ferry in range, download rate)
• Average delay for all traffic:
$$ d^{L} = \frac{\sum_{i,j} b_{ij}\, d_{ij}^{L}}{\sum_{i,j} b_{ij}} $$

Static Nodes + Single Ferry (2)
• Problem: find trajectory L, such that:
  • $\min_{L} d^{L}$ (Delay Problem)
  • while satisfying traffic matrix B = {b_ij} (Bandwidth Problem)

Delay Problem
• Assume infinite/enough bandwidth for b_ij
  • All data uploaded when encountered
• $\min_{L} d^{L}$, such that L passes by all nodes
• If b_ij = b_ji => $d^{L} = |L|/f$
• Delay Problem = Traveling Salesman Problem (NP-hard)
  • Step 1: TSP approximation algorithms
  • Step 2: Local optimization

Traveling Salesman Problem
• Given a (connected) weighted graph
• Find a path that:
  • Visits all nodes exactly once
  • And has minimum cost

Bandwidth Problem
• Increase route (local detour) to satisfy bandwidth requirement
• With Tx rate W, radio range r and path extension x_i for node i, the data node i can upload per cycle must cover its traffic demand per cycle:
$$ (x_i + 2r)\,W \ge s_i \Big( |L| + \sum_j x_j \Big) $$
• Minimize amount of increase (Linear Program):
$$ \text{minimize } \sum_i x_i \quad \text{subject to} \quad W x_i - s_i \sum_j x_j \ge s_i |L| - 2rW \ \ \forall i, \qquad x_i \ge 0 $$

Optimal Trajectory Design: The Online Problem
• Previous case: traffic requirements known in advance => offline, optimal solution
• What if traffic requests arrive on demand?
• Problem: design trajectory to optimally serve existing requests
  • Minimize message drop rate
  • Minimize expected delivery delay

Mobile Nodes + Single Ferry
• Ferry has a predefined route, which is known
• Nodes decide when to move close to the ferry to upload data (Node-Initiated Message Ferrying, NIMF)
[Figure: a node leaving its task (e.g. sensing) to meet the ferry; receiver]

Mobile Nodes + Single Ferry (2)
• Goal 1: minimize time not performing task
  • E.g.
time moving = time not sensing
• Goal 2: minimize message drop ratio
  • While “working”, outgoing messages accumulate in buffer => buffer overflow
  • While not going to ferry, incoming messages accumulate in ferry => buffer overflow
  • Messages have TTL => if not delivered in time they are dropped

When to Move Towards Ferry?
• Keep msg drop rate low:
  • D_i(t) = msg drop rate at i (outgoing)
  • D_f->i(t) = msg drop rate for i at ferry (incoming)
  • G_i = msg arrival rate at i
  • G_f->i = msg arrival rate at ferry for i
  • (D_i(t) + D_f->i(t))/(G_i + G_f->i) > threshold (condition 1)
• Keep fraction of time not performing task low:
  • (task time)/(total time) > w (condition 2)

Shortcomings of NIMF
• What if node task is correlated with message delivery?
  • e.g. task = sensing data that needs to be periodically transmitted to a sink
  • Conditions 1 and 2 may not be satisfiable at the same time! WHY?
• How are the nodes mobile? Robots? A person decides to move close to the bus?

Static Nodes + Multiple Ferries
• Case 1: No ferry interaction
• Case 2: Ferry relaying
  • Ferries can exchange data with each other
  • Synchronization between ferries
• Case 3: Node relaying
  • Node overhead for storing inter-relay traffic

Ferry Trajectory Design
• Phase 1: Assign nodes to ferries
• Phase 2: Choose path for each ferry
• Phase 3: Fine-tune route to meet traffic demand

Single-Route Algorithm (SIRA)
• All ferries follow the same route
  • Constant speed and distance
  • No interaction
• Phase 1: all nodes to all ferries
• Phase 2, 3: similar to single ferry
  • step 1: Traveling Salesman approximation
  • step 2: Local delay optimizations (wait_m = wait_1/m)
  • step 3: minimum route extension to satisfy traffic

Multi-Route Algorithm (MURA)
• Different routes + no relaying
• Algorithm:
  • Step 1: assume n ferries, assign one to each node
  • Step 2: estimate ED (expected delay) and reassign until m ferries and ED minimum
  • Step 3: refine assignment for end-to-end feasibility
  • Step 4: calculate optimal route for each ferry independently

Estimating ED (expected weighted delay)
• Calculate
weighted delay per route
• Say route with k relays
• Route delay is a tuple (E*,E’)
  • E* = excess capacity
  • E’ = expected delay if capacity is met
  • a = total data rate
  • μ = service rate of route = 0.5 k W
[Equation: piecewise definitions of E* (in terms of L, a and μ) and of E’ (in terms of L, k, a and μ - a); each is split on how a compares with μ and equals 0 in one of the two cases]

(Re)assigning Nodes to Routes
• Re-assign based on 4 operations; goal is to get m ferries and minimum ED
  • Op.1) overlap(i,j): extend one route to include a node of the other
  • Op.2) merge(i,j): combine routes i,j into one; ferries = k_i + k_j
  • Op.3) merge-(i,j): combine routes i,j into one; ferries = k_i + k_j - 1
  • Op.4) reduce(i): k_i = k_i - 1

(Re)assigning Nodes to Routes: The Algorithm
• Problem 1: sender and destination not in same route
• Problem 2: route traffic demand > route capacity
• Continue overlap/merge until assignment is feasible… OR

Node Relaying Algorithm (NRA)
• Multi-hop routing: node S → ferry f_i → node r → ferry f_j → node D
• Bound number of hops to maintain throughput (Gupta et al.)
• Overhead on relaying nodes

Node Relaying Algorithm (NRA) (2)
• For each S-D pair n_ij: geographic routing => path of cells (e.g. C2,C3,C4)
• Overlap operation between C_x, C_y => shared node is relay
• Assign ferries: 1 to each cell -> add extra ferry to highest EWD

Ferry Relaying Algorithm (FRA)
• Data is relayed between ferries => no node relaying
• Similar to NRA algorithm… until last step
• After routes are calculated per cell, need to synchronize between cells (not easy)

Performance Analysis with Multiple Ferries
• Some simulation results show that MURA (non-relaying) has the best performance
  • Is it because of the extra resources required by message relaying?
  • Is it because of the specific algorithms chosen for relaying (i.e. could we find better ones)?
  • Does it depend on traffic pattern? If uniform traffic, and no traffic weights, wouldn’t MURA routes need to cover ALL nodes??

Multiple Ferries with Independent but Known Routes
• Ferry mobility is not related to data delivery (e.g.
network of buses)
  • Hence, it cannot be changed
• Calculate inter-ferry contacts based on their mobility schedules
• Apply algorithms like MED, ED, etc.
• Maybe even MEED, or some opportunistic routing if schedules are not fully deterministic (e.g. traffic jam, etc.)

Summarizing: DTN Routing
• Scheduled/known contacts:
  • Modified Dijkstra algorithm (time-dependent weights)
  • Dynamic flow problems
• Enforced contacts with specialized nodes (ferries):
  • Design of optimal mobility paths (TSP)
  • Optimal assignment of ferries
• Opportunistic contacts?
  • Contacts not known in advance
  • No extra nodes; only the mobility of the nodes themselves is available

Routing: Opportunistic Networks

Routing with Scheduled Contacts
• Graph is disconnected and/or time-varying
• Set of contacts C: known
• Set of nodes V: known
[Figure: nodes A, B, C, D with their Tx ranges; contacts (B,D) = {10,12},{19,21} and (C,D) = {8,10},{15,17}]

Routing with Unknown Contacts: Opportunistic Routing
• Graph is disconnected and/or time-varying
• Set of contacts C: unknown!
• Set of nodes V: often unknown too!
[Figure: nodes A, B, C asking “WHERE IS D?”; contacts (B,D) = ?? and (C,D) = ??]

Epidemic Routing
• Give a message copy to every node encountered
• Essentially: flooding in a disconnected context
[Figure: copies of a message for D spreading among nodes A to F]

Epidemic Routing (2): Message Vectors
• Node A encounters node B; entries are (Dest ID, Seq. Num.)
  • Message vector of A: (D,0), (G,1), (F,0)
  • Message vector of B: (D,0), (E,0), (F,0), (F,1)
  • A sends (G,1) to B; B sends (E,0), (F,1) to A
• After message exchange
  • Message vector of A: (D,0), (E,0), (F,0), (F,1), (G,1)
  • Message vector of B: (D,0), (E,0), (F,0), (F,1), (G,1)

Encounters
• Two nodes “encounter” each other when they are inside transmission range
• How do they know?
  • Beacons: periodically transmit a “HELLO” message to discover neighbors
  • e.g. Bluetooth association
• Implications:
  1. Some encounters might be missed
  2.
Encounter not immediately when in range
Encounter => MSG vector exchange (+ other info)

Delay of Epidemic Routing (a coloring problem analog)
$ED = \frac{1}{M-1} \sum_{K=1}^{M-1} \sum_{i=1}^{K} T_i$
  T1 = 1 red → any blue
  T2 = any of 2 red → any blue
[Figure: source S and destination D among M nodes with I.I.D. mobility]

Epidemic Routing Performance
Redundant copies reduce delay
But: too much redundancy is wasteful and often disastrous (due to contention)
[Figure: two bar charts, epidemic vs. optimal, under increasing traffic. Panel "Transmissions for Epidemic Routing" (total transmissions): too many transmissions. Panel "Delay for Epidemic Routing" (delivery delay, time units): plagued by contention]

Randomized Flooding (Gossiping)
“Spread” the message with a probability p ≤ 1
  p = 1) epidemic
  p = 0) direct transmission
  Outcome < p) Give a copy
  Outcome > p) Don’t give copy

K-neighbor Epidemic
Each node receiving a copy can copy it again up to K times
[Figure: K = 2; node E has already given 2 copies and cannot forward more]

Flooding-based Schemes
Can reduce the transmissions of epidemic
With some penalty on delay!
Given long enough time, all nodes receive a copy
Still flooding-based!
Let’s re-think the problem. Must we flood everyone (or almost everyone)?

Single-copy vs.
Multi-copy routing strategies
“Single-copy”: only a single copy of each message exists in the network at any time
“Multi-copy”: multiple copies of a message may exist concurrently in the network

  Single-copy                                 Multi-copy
  + lower number of transmissions             + lower delivery delay
  + lower contention for shared resources     + higher robustness

Choosing A Next Hop
A local and intuitive criterion:
  A forwarding step is efficient if it reduces the expected distance from destination
  Usually: reduction of expected distance => reduction of expected hitting time
[Figure: nodes A, B, C and the Destination]
Efficient Routing: ensure that each forwarding step on the average reduces distance or hitting time with destination

Direct transmission
Forward message only to its destination
  simplest strategy
  minimizes transmissions
[Figure: source S holds the message for D among nodes B, C, E, F]

The Delay of Direct Transmission
EM: expected meeting time; 2 nodes starting from stationary distribution
EM > ED: EM is an upper bound on delay!
ET: expected hitting time; 1 node is static (with position drawn from the uniform distribution)

Randomized routing
A node forwards message to a new node with probability p
NO Duplication! It’s Hand-over!
[Figure: the single copy for D handed over among nodes A, B, C, E, F]

Why Transmitting is Faster Than Not!
[Figure: randomized hand-over of the message among nodes A, B, C, F]
Transmission Speed is Faster than Node’s Speed!

Why Transmitting is Faster Than Not!
Randomized:
  $E_A T_D = ET(d)$
  $E_B T_D = \frac{ET(d-1) + ET(d+1)}{2}$
[Figure: node A at distance d from D; neighbor B at distance d-1 (P_BA = ½) or d+1 (P_AB = ½)]

Utility-based Routing
Last encounter timers
  tX(Y): time since X last saw Y
  Indirect location information, diffused with node mobility
  For most mobility models: smaller timer => closer distance
Utility UX(Y) = f(tX(Y))
Policy: forward to B iff UB(D) > UA(D) + Uth
[Figure: nodes around D with timers t(D) = 0, 26, 68, 218; tB(D) = 100, tA(D) = 138]

Utility-based Routing (cont’d)
Randomized (P_BA = P_AB = ½):
  $E_A T_D = ET(d)$
  $E_B T_D = \frac{ET(d-1) + ET(d+1)}{2}$
Utility-based (P_BA > ½, P_AB < ½):
  $E_B T_D = P_{BA}\,ET(d-1) + (1 - P_{BA})\,ET(d+1)$
Result 1: Utility-based routing has a larger expected delay reduction than the simple randomized policy

Problems with Utility Routing
[Figure: three nodes with timer value 20 for D, and node A with tA(D) = 200]
Timer values are good indicators of proximity only if their value is small.
Timers/utility updated only when destination is found
If source’s (relay’s) neighbors happen to have larger timers, message gets stuck for a long time

Transitivity
Idea: If A sees B, and B has recently seen D, then A is probably close to D too.
update tA(D) when A encounters B
• cache of most fresh entries for scalability
τ(dAB): expected time to cover distance dAB
$t_A(D) = t_B(D) + \tau(d_{AB})$
• $\tau(d_{AB}) = (d_{AB})^2$ (random walk)
• $\tau(d_{AB}) = d_{AB}$ (random waypoint)
[Figure: PDF of timer value of A for D, when A is far from D; panels: No transitivity, Transitivity]

Seek and Focus: A hybrid routing strategy
Set of node utility values: a time-varying, probabilistic utility-field with the global maximum at destination
Utility-based routing is a greedy search of the field
Issue: message often gets stuck at local maxima

Seek and Focus
Seek phase: If current utility is below Uf, perform randomized forwarding (quickly look for a “good lead”)
Focus phase: If current utility is above Uf, perform utility-based routing for at most Tf time units (follow the lead)
Re-seek phase: If no better relay is found for Tf, perform randomized routing for at most Tseek or until a better relay is found (if stuck at local maximum, do “perimeter search”)

Oracle-based optimal algorithm
Assume all future movements are known
Then, the algorithm picks the sequence of forwarding decisions that minimizes delay
Note that flooding (multi-copy strategy) has the same delay as this algorithm when there is no contention

Effect of Connectivity: Random Walk (“local” model)
[Figure: panels "Transmissions (Random Walk)" and "Delivery Delay (Random Walk)" for randomized, utility (no trans), utility (trans), seek&focus (trans) and optimal. X-axis: Tx Range (connectivity %): 40 (8.6%), 50 (14.8%), 60 (27.7%), 70 (52.9%), 80 (79.2%), increasing connectivity. Y-axis: transmissions per msg; delivery delay in time units (LOG SCALE)]
Randomized has smallest delay
  But, with order(s) of magnitude more transmissions
Utility-based with transitivity performs very few transmissions
  But, with up to 10x worse delay than randomized
  Without transitivity things are even worse
Seek & Focus achieves both low delays (close to randomized) and low transmissions (slightly higher than utility-based)

Effect of Connectivity: Random Waypoint (non-local)
[Figure: panels "Transmissions (Random Waypoint)" and "Delivery Delay (Random Waypoint)" for randomized, utility (no trans), utility (trans), seek&focus and optimal. X-axis: Tx Range (connectivity %): 30 (5.7%), 40 (8.6%), 50 (14.8%), 60 (27.7%), 70 (52.9%). Y-axis: transmissions per msg; delivery delay in time units (LOG SCALE)]
Randomized not fast for non-local mobility models
  A bad forwarding decision is costly
  Still high transmissions
Utility-based has good delays and low transmissions
  Choice of the right transitivity function is important!
  No transitivity, or wrong transitivity (e.g. random walk), is really bad.
Seek & Focus achieves even better delays
  Yet, with slightly more transmissions

Single-copy Strategies: Lessons Learned
Utility-based forwarding can be a good routing primitive
  ONLY IF utility function is correctly designed! (transitivity + mobility model stats)
Seek and Focus (hybrid) is the best candidate if a single-copy routing scheme has to be used
  can fix some of the utility-based routing shortcomings
BUT, best single-copy strategy is still an order of magnitude slower than optimal!

2-hop Scheme
Source gives a copy to any relay encountered
Relays can only give copy to destination
[Figure: Src has given copies to relays; Relay C cannot FWD to B; Relay C can FWD to Dst]

2-hop Scheme Performance
How many transmissions? (M-1)/2
Delay?
  T1 = time until source meets any node (M-1)
  T2 = time until source meets any node (M-2)
  epidemic: time until 2 red meet any of M-2 (smaller)
$ED(n) = ET_n + \frac{M-n-1}{M-n}\,ED(n+1)$
  ED(n): remaining delay after n copies
  (M-n-1)/(M-n): Prob{next node not DST}
BUT: a relay node may meet destination in the meantime!
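The recursion for the 2-hop scheme can be evaluated numerically. The following is a minimal sketch, not the slides' own method. It assumes exponential meeting times, so that the source meets one of the M-n nodes without a copy after ET_n = EM/(M-n) on average, and it reads the recursion as ED(n) = ET_n + (M-n-1)/(M-n) · ED(n+1). The function and argument names are hypothetical.

```python
def two_hop_recursion_delay(num_nodes, mean_meeting_time):
    """Evaluate ED(1) from ED(n) = ET_n + (M-n-1)/(M-n) * ED(n+1).

    Assumptions (not from the slides): exponential meeting times, so
    ET_n = EM / (M - n); relay-to-destination meetings are ignored,
    exactly as in the recursion itself.
    """
    remaining = 0.0  # ED(M): every node, including the destination, has a copy
    for n in range(num_nodes - 1, 0, -1):
        without_copy = num_nodes - n                 # nodes the source can still infect
        wait = mean_meeting_time / without_copy      # ET_n
        prob_not_dst = (without_copy - 1) / without_copy
        remaining = wait + prob_not_dst * remaining
    return remaining


if __name__ == "__main__":
    print(two_hop_recursion_delay(num_nodes=100, mean_meeting_time=50.0))
```

Under these assumptions the recursion returns EM for every M, i.e. the delay of direct transmission. The extra copies only pay off once relay-to-destination meetings are counted, which is the point of the closing remark on the slide.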
Controlled Replication (“Spraying”)
2-hop scheme uses (M-1)/2 copies
  Still a lot! Only half of epidemic
Limit number of copies to L (small, fixed)
  Transmissions = L!
L = 2) Achieves O(1) per node capacity and deals with Kumar’s and Gupta’s conjecture (capacity → 0) (Grossglauser et al. ‘01)
L > 2 and L = O(1): (constant L)
  Still capacity gain
  Transmissions << M
  Multi-path diversity to reduce delay (Spray & Wait)

Source Spraying
Only source can give a copy (like 2-hop)
Start with L copies; give one to each of the first L-1 relays met
Delay (Src Spray) > Delay (2-hop)
  Assuming no contention!

Tree-based Spraying
Use forwarding tokens; SRC starts with L tokens
When L = 1, can only forward to DST
[Figure: token counts in the spraying tree: L=4 at Src, then L=2 and L=2, then L=1 at four nodes; Dst]

Tree-based Spraying (2)
[Figure: a node with L tokens hands over n1 and keeps L-n1; a node with j tokens hands over nj and keeps j-nj]
I.I.D. movement => Binary is optimal (nj = j/2)
Heterogeneous => high complexity

Binary Spraying = Time-limited Epidemic
Do epidemic spreading until time T
After T, switch to direct transmission
If T = ETL then the same as token-based (on average)
Remember: ETL = time until epidemic “covers” L nodes

Replication Method Matters
[Figure: "Delay of Spray and Wait": delay (time units) vs. L (# of copies, 5 to 20) for source spray and wait, binary spray and wait (analysis) and optimal; 100x100 network with 100 nodes]
1. Efficient spraying becomes more important for large L
2. Few copies suffice to achieve a delay only 2x the optimal!

Effect of Traffic Load (Rand. Way.
- 500x500 grid, 100 nodes, Tx Range = 10)
[Figure: two bar charts under increasing traffic, "Total Transmissions" and "Delivery Delay (time units)", for epidemic, random-flood, utility-flood, seek&focus, spray&wait(L=10) and spray&wait(L=16)]

                 Transmissions                            Delay
  Low traffic    >10x epidemic, 3-4x other multi-copy     same as epidemic, 1.4-2.2x other schemes
  High traffic   1.8-3.3x; same as above

Spray and Wait: A good scenario
[Figure: nodes 1 to 16 with S and D; regions "Covered by Relay 1" and "Covered by Relay 2"]
Relays are highly mobile
Relays’ routes are uncorrelated

Spray and Wait: A bad scenario
[Figure: nodes 1 to 16 with S and D; "Node S’s community" and "Node D’s community"]
Relays move slowly
Relays move locally and are correlated

Spray & Wait Performance
Spray and Wait has desirable performance IF nodes move frequently around the network (e.g. VANETs, a mesh network over city buses, etc.)
But, Spray and Wait may get in trouble if
  nodes’ mobility is restricted inside a local area
  nodes’ mobility is extremely slow (e.g. human mobility)

Spray & Focus
1st Phase: Binary Spraying, like Spray & Wait
2nd Phase: Utility-based routing with transitivity for each copy
Advantages:
  still: few transmissions + redundant copies
  plus: take advantage of good transmission opportunities
  copies don’t get stuck in local neighborhood

Effect of Connectivity: Random Walk (500x500 square, 100 nodes)
[Figure: panel "Transmissions" (thousands) for utility flood, random flood, spray&wait and spray&focus; panel "Delivery Delay" (time units) for epidemic, random flood, utility flood, spray&wait ("slow!") and spray&focus ("fastest"). Series: K = 15 (7.8%), K = 20 (14.9%), K = 25 (35.9%), K = 30 (68%), K = 35 (92.5%)]
Transmissions: still ~10x improvement for both protocols
Spray & Wait is slow: suffers from locality of movement
Spray & Focus is the fastest: takes advantage of locality
  Close-to-optimal (unless very low transmission range)

Heterogeneous Scenarios
[Figure: Base Stations (pstatic); Community (local) Nodes that stay inside community with pL(i), leave with 1-pL(i), and roam around the network infrequently (pR(i), 1-pR(i)); Fast/Mobile Nodes (pfast)]

Effect of Connectivity: Community-based Mobility (cont’d)
Scenario 1: Homogeneous
  Community nodes (100%)
Scenario 2: Two types of nodes
  Community nodes (90%)
  Roaming nodes (10%)
Scenario 3: Four types of nodes
  Community nodes (40%)
  Local nodes (40%)
  Roaming nodes (10%)
  Static nodes (10%)
[Figure: "Delay Improvement by Spray and Focus": Delay(SW) / Delay(SF) for Scenarios 1, 2 and 3 vs. Transmission Range (Connectivity %): 40 (8.6%), 50 (14.8%), 60 (27.7%), 70 (79.2%)]

Spray Routing: Summarizing
“Non-local” mobility models: Spray and Wait
  10x fewer transmissions AND smaller delay!
  Spray and Focus has similar performance; but we don’t really need it
“Local” mobility models: Spray and Focus
  Spray and Wait is slow
  Spray and Focus has close-to-optimal performance
Why does spraying work?
  Law of diminishing returns for number of copies used

Improvements
Smart Replication
  Who should get the copies?
Other Utility Functions
  Energy
  Mobility
  Trustworthiness
  GPS location
  Queue Size
  Hybrid

An Analytical Framework
Why do we need it?
  Confirm our previous observations
  Predict performance under a larger range of settings
  Use this theory for system-design e.g.
choose the right number of copies for Spraying approaches
An analytical framework for “mobility-assisted routing”
  Component 1) Hitting and Meeting Times: the basic building block; depends on mobility model; calculated for: random walk, random direction, random waypoint, and a new model
  Component 2) Multiple copies
  Component 3) Forwarding a message
“Plug n’ calculate”: calculate the delay of any scheme by combining the right components

Performance Analysis: An Analytical Framework
Assumptions
  Network area:
  • Random walk: grid (torus), discrete movement
  • Waypoint-based models: square (torus), continuous movement
  Infinite bandwidth, infinite buffers
  Calculate delivery delay
Notation:
  M: number of nodes
  N: network area
  K: transmission range (small enough to have partial connectivity)
  EATB: expected hitting time from A to B
  ET: expected hitting time starting from stationary distribution
  EM: expected meeting time between two nodes starting from stationary distribution

Random Walk Hitting Time (Tx Range K ≥ 0)
Hitting time ET = EXTA (EM still equal to ET/2)
[Figure: random walk on the grid from X towards Y, p = 0.25; A(K) marks the Tx range around Y, K = 3]
1) $E_X T_A = E_X T_Y - E_A T_Y$
2) $E_X T_Y = cN \log N$
3) $E_A T_Y = N\,\frac{2^{K+1} - K - 2}{2^K - 1}$
$ET = N\left(c \log N - \frac{2^{K+1} - K - 2}{2^K - 1}\right)$

Random Direction (Random Waypoint) Hitting Time
Movement is a set of “epochs”
[Figure: an epoch of length L from "epoch start" to "epoch finish" in an area N; source S, destination D with range K]
Method:
1. Probability that any given epoch hits the destination: $P_{hit} = \frac{2KL}{N}$
2. Expected number of such epochs (geometric): $N_e = \frac{1}{P_{hit}} = \frac{N}{2KL}$
3. Multiply by the expected duration of each epoch $T_e$: $ET = \frac{N}{2KL}\,T_e$
4.
EM: divide by (normalized) relative speed between S and D, $v_r = \frac{E[|v_S - v_D|]}{E[|v_S|]}$: $EM = \frac{ET}{v_r}$

Modeling Epidemic Spreading
Case Study: Epidemic Routing/Optimal
[Figure: M nodes, Tx Range = K; source S and destination D; step 1 takes $\frac{EM}{M-1}$, step 2 takes $\frac{EM}{2(M-2)}$]
$ED_{opt} = \frac{H_{M-1}}{M-1}\,EM$
where $H_{M-1}$ is the harmonic sum $H_{M-1} = \sum_{i=1}^{M-1} \frac{1}{i}$

Modeling Epidemic Spreading: Markov Chains (Probabilistic Model)
Prob(i → i+1, Δt) = λ (N-i) i Δt
  N+1: nodes
  1/λ: meeting time
  state i: i copies
  state A: DST found
[Figure: Markov chains for Epidemic Routing and for 2-hop Routing]

Modeling Epidemic Spreading: Fluid Models (Deterministic)
Assume large N (num. of nodes)
I(t) = average number of “infected” nodes at time t
  $I'(t) = \lambda (N - I)\,I$   (1)
P(t) = P(Td < t): CDF of delivery delay
  P(t+dt) - P(t) = Prob{t ≤ Td ≤ t + dt}
    = Prob{DST meets one of nI(t) infected nodes in [t,t+dt]} * Prob(Td > t)
    = E[Prob(DST meets nI(t) | nI(t))] * (1-P(t))
    = E[λ nI(t) dt] * (1-P(t))
    = λ I(t) (1-P(t)) dt
  => $P'(t) = \lambda I\,(1 - P)$   (2)

Modeling Epidemic Spreading (2): Fluid Models (Deterministic)
Ordinary Differential Equations (ODEs)
Or systems of ODEs
Sometimes PDEs, too.
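The two fluid equations, I'(t) = λ(N - I)I and P'(t) = λI(1 - P), can also be integrated numerically, which is useful for variants that have no closed-form solution. Below is a minimal forward-Euler sketch. The function name, the step size and the initial conditions I(0) = 1 and P(0) = 0 are assumptions made for illustration.

```python
def integrate_fluid_model(num_nodes, lam, t_end, dt=1e-3):
    """Forward-Euler integration of I' = lam*(N - I)*I and P' = lam*I*(1 - P).

    Returns (I(t_end), P(t_end), expected_delay), where expected_delay is
    the integral of 1 - P(t) over [0, t_end].
    Assumed initial conditions: I(0) = 1 (only the source is infected), P(0) = 0.
    """
    infected, delivered_cdf = 1.0, 0.0
    expected_delay = 0.0
    for _ in range(int(round(t_end / dt))):
        expected_delay += (1.0 - delivered_cdf) * dt   # left Riemann sum of 1 - P
        d_infected = lam * (num_nodes - infected) * infected
        d_cdf = lam * infected * (1.0 - delivered_cdf)
        infected += d_infected * dt
        delivered_cdf += d_cdf * dt
    return infected, delivered_cdf, expected_delay


if __name__ == "__main__":
    i_end, p_end, delay = integrate_fluid_model(num_nodes=50, lam=0.01, t_end=100.0)
    print(i_end, p_end, delay)
```

With t_end chosen large enough that P(t) has reached 1, the third return value approximates the expected delivery delay and can be compared against an analytical solution of the same two equations.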
Solve (1) for I(t): it’s a separable ODE
  $I(t) = \frac{N}{1 + (N-1)\,e^{-\lambda N t}}$
Replace I(t) in (2) and solve for P(t)
  $P(t) = 1 - \frac{N}{(N-1) + e^{\lambda N t}}$
Expected Delay
  $ET_d = \int_0^{\infty} (1 - P(t))\,dt = \frac{\ln N}{\lambda (N-1)}$

Modeling Message Forwarding
Case Study 2: Randomized algorithm
  q: probability of Tx jump
    q = p • P(at least one node within range)
  1-q: probability of random walk jump
  f(K): average transmission distance
  Average jump length: D = 1 - q + q f(K)

Message Forwarding (cont’d)
Case Study 2: Randomized algorithm
Approximate actual message movement with a random walk performing D independent 1-step jumps at each time slot
Note: This walk is slower than the actual walk => would reach destination later, on the average
Define an appropriate martingale to show that:
  $ED_{rnd} \le \frac{2}{D + 1}\,ED_{dt}$
  D: message movement; 1: destination movement
Note: D + 1 ≥ 2 => randomized is faster than direct transmission
Random Direction/Waypoint: Similar procedure gives exact result

Utility-based algorithms (no transitivity)
[Figure: Markov chain over the distance from D, with states 0, 1, 2, …, r-K, …, r-2, r-1, r, r+1, r+2, …, r+K, …, N]
  p: probability of no forwarding => random walk step
  ptx (towards D) = Prob{node with higher utility within range AND node is closer to D}
  ptx (away from D) = Prob{node with higher utility within range AND node is farther from D}
EDutil is simply the expected hitting time from stationarity to a state ≤ K
*Similar procedure for seek and focus without transitivity

Source Spray and Wait
Let ED(i) denote the expected remaining delay after i copies are spread
Clearly EDsw(src) = ED(1)
ED(1) can be calculated through a system of recursive equations
  $ED(i) = \frac{ED_{dt}}{i(M-i)} + \frac{M-i-1}{M-i}\left[\frac{1}{i}\,ED(i+1) + \frac{i-1}{i}\,ED(i)\right]$
  ED(i): expected remaining delay after i copies are spread
  ED_dt / (i(M-i)): time until a new node is found
  If destination, stop
  If not destination, add extra term; (M-i-1)/(M-i): P(not destination)
  If new node found by source (probability 1/i), forward another copy: ED(i+1)
  If found by relay (probability (i-1)/i), do nothing: ED(i)
A similar recursion procedure gives the delay of Optimal Spray and Wait

Case Study: Choose the Number of Copies for Spray and Wait
Exact delay not in closed form
Derive a bound
in closed form
This is an upper bound for any Spray and Wait algorithm
  $ED_{sw} \le ES + \frac{M-L}{M-1}\,EW$
  Spray Phase: $ES = \sum_{i=1}^{L-1} \frac{EM}{M-i}$
  Wait Phase: $EW = \frac{EM}{L}$
  (M-L)/(M-1): probability a wait phase is needed
Bound is tight for L << M

Choosing L to achieve specified delay
Suppose we want to achieve EDsw = α EDopt for some α > 1
We choose the minimum L that satisfies
  $\sum_{i=1}^{L-1} \frac{EM}{M-i} + \frac{M-L}{M-1}\,\frac{EM}{L} \le \alpha\,\frac{H_{M-1}}{M-1}\,EM$
  (left side: upper bound on EDsw; right side: α EDopt)
Some values (for M=100):

  α      1.5   2    3   4   6   8   10
  Lmin   21    13   8   6   4   3   2

What If Network Parameters Are Unknown?
To compute Lmin we need to know M
Use meeting times statistics and do online estimation
Method:
  $T_1 = \frac{EM}{M-1}$
  $T_2 = EM\left(\frac{1}{M-1} + \frac{1}{M-2}\right)$
Estimator: $\hat{M} = \frac{2T_2 - 3T_1}{T_2 - 2T_1}$
[Figure: "Estimation of M (200x200 grid)": estimated M value vs. number of samples (0 to 4000); actual M = 200]
Applies to any mobility model with exponential meeting times

Routing: Other Issues

Epidemic Routing: Wasted Resources
Epidemic routing hands over a copy to every node encountered…
Even after the message has been delivered!
After the destination is encountered by at least one relay, no need to keep other copies around anymore
  Unnecessary transmissions (energy, throughput)
  Valuable buffer space

Reducing Resource Waste: “Dis-infection Schemes”
After one copy has been delivered:
1. Inform other nodes to stop spreading more copies
2. No need to give extra copies to “non-infected” nodes
Remove copy from buffers
  Clear up buffer space of infected nodes

Full Erase
When encountering the destination => delete message from buffer
[Figure: node B meets the destination (dst) and deletes its local copy]
Node may get a copy again!

IMMUNE
Delete packet AND maintain an anti-packet
  msg id: e.g. (src,dst,seq)
  Implies that node is recovered
[Figure: node B meets dst, deletes its local copy and becomes a recovered node holding msg: (S,D,0)]
No new copy to recovered nodes

IMMUNE-TX
Propagate anti-packet to already infected nodes
[Figure: recovered node B (msg: (S,D,0)) passes the anti-packet to infected node C: "C recovered!"; a later transmission is marked "Avoided this Tx"]
No new copy to recovered nodes

VACCINE
Propagate anti-packet to ANY node encountered
  Vaccinate susceptible nodes
[Figure: recovered nodes B and C (msg: (S,D,0)); susceptible node E is vaccinated ("Vaccinate E"); "Avoid this Tx, too"]
No new copy to recovered nodes

SIR Model (Epidemiology)
I: infected nodes
  Nodes with a copy, and no anti-packet
R: recovered nodes
  Nodes with an anti-packet
S: susceptible nodes (S = N - I - R)
  Haven’t ever received a copy or anti-packet

SIR Model: ODEs
Immune:
  $I'(t) = \lambda (N - I - R)\,I - \lambda I$
  $R'(t) = \lambda I$
Immune-TX:
  $I'(t) = \lambda (N - I - R)\,I - \lambda I (R + 1)$
  $R'(t) = \lambda I (R + 1)$

Total number of transmissions
$E[Tx] = \lim_{t \to \infty} \{I(t) + R(t)\} - I(0)$
Immune:
  $\frac{dI}{dR} = N - I - R - 1$, I(0) = 1
  $I(R) = (-N + 1)\,e^{-R} - R + N$
  $\lim_{t \to \infty} I(t) = 0$
  $\lim_{t \to \infty} R(t) = \lim_{t \to \infty} [(-N + 1)\,e^{-R} + N] \approx N$ (N > 10)
  $E[Tx] = N - 1$

Total Number of Transmissions: IMMUNE-TX
  $I'(t) = \lambda (N - I - R)\,I - \lambda I (R + 1)$
  $R'(t) = \lambda I (R + 1)$
  $I(R) = \frac{-R^2 + (N-1)R + 1}{R + 1}$
  $E[Tx] = \frac{N - 3 + \sqrt{N^2 - 2N + 5}}{2}$

Performance of Buffer Management
The more aggressive the recovery scheme
1) the fewer the total transmissions (ignoring overhead of anti-packets)
2) the smaller the buffer occupancy

Queuing Policies
Limited buffer space
  Nodes with little memory (e.g. sensors)
  Nodes might offer only a small chunk of memory for 3rd party traffic
What if a message has to be dropped?
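One way to make the question concrete is a bounded buffer with a pluggable drop rule. The sketch below is a minimal illustration, not a reference implementation. It covers three common rules: discard the arriving packet (droptail), discard the oldest buffered packet (drophead), and a source-prioritized drophead that does not evict a node's own packets for an arriving relay packet. The class name, the packet representation, the use of arrival order as "oldest", and the tie-breaking when only source packets are buffered are all assumptions.

```python
from collections import deque


class BoundedBuffer:
    """Packet buffer that applies a drop policy when it is full.

    Packets are (packet_id, is_source) tuples with unique ids; the oldest
    packet (by arrival order, an assumption) sits at the left of the deque.
    """

    POLICIES = ("droptail", "drophead", "drophead-sp")

    def __init__(self, capacity, policy):
        if policy not in self.POLICIES:
            raise ValueError("unknown policy: " + policy)
        self.capacity = capacity
        self.policy = policy
        self.packets = deque()

    def add(self, packet_id, is_source=False):
        """Insert a packet; return the id of the dropped packet, or None."""
        if len(self.packets) < self.capacity:
            self.packets.append((packet_id, is_source))
            return None
        if self.policy == "droptail":
            return packet_id  # the arriving packet is discarded
        victim = self._pick_victim(is_source)
        if victim is None:
            return packet_id  # only source packets buffered: drop the arriving relay packet
        self.packets.remove(victim)
        self.packets.append((packet_id, is_source))
        return victim[0]

    def _pick_victim(self, arriving_is_source):
        if self.policy == "drophead":
            return self.packets[0]
        # drophead-sp: evict the oldest relay packet if there is one
        for packet in self.packets:
            if not packet[1]:
                return packet
        # Buffer holds only source packets (assumed tie-break): an arriving
        # source packet replaces the oldest one, a relay packet is refused.
        return self.packets[0] if arriving_is_source else None


if __name__ == "__main__":
    buf = BoundedBuffer(capacity=2, policy="drophead-sp")
    buf.add("own-1", is_source=True)
    buf.add("relay-1")
    print(buf.add("relay-2"))  # evicts the buffered relay packet, keeps own-1
```

Keeping the policy as a constructor argument makes it easy to run the same arrival sequence through each rule and compare which packets survive.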
Queuing Policies (2)
When a new packet arrives at a buffer and the buffer is full:
Droptail
  drop it if buffer is full
Drophead
  drop the oldest packet in buffer (most hops or least time to TTL expiration)
  rationale(?): large time in the network => little chance to be delivered before TTL expires
Drophead-sp (source-prioritized)
  Don’t drop a source packet for an arriving relay packet

Queuing Policies: Performance

  buffer   droptail   drophead   drophead (sp)
  5        0.97       0.22       0.05
  10       0.95       0.03       0.0
  20       0.90       0.002      0.0

Drophead: fast infection, high packet loss for small buffers
Drophead-sp: slower infection, higher delivery ratio

QoS Provision
Multi-type traffic: what about traffic of different priorities (e.g. emergency messages vs. advertisements)
Multiple queues?
Different forwarding policies
  E.g. never drop type A for type B
Different routing policies?

Reducing the overhead of epidemic: Network Coding
So far we were not changing packets’ content
  Replication
  Forwarding
  Drops
Coding may combine one or more packets
[Figure: Store-and-forward: packets x1, x2, x3 arrive on the incoming links and leave unchanged on the outgoing links]

Reducing the overhead of epidemic: Network Coding
So far we were not changing packets’ content
  Replication
  Forwarding
  Drops
Coding may combine one or more packets
[Figure: Network Coding: packets x1, x2, x3 arrive on the incoming links; f(x1,x2,x3) leaves on the outgoing links]

Coding Packets: A simple example
XOR: The simplest combination: f(x1, x2) = x1 ⊕ x2

  msg x1:     1 0 1 1
  msg x2:     0 1 1 0
  f(x1,x2):   1 1 0 1

De-coding Packets: A simple example
Assume the node that sent x1 receives the coded packet f(x1,x2)

  msg x1:     1 0 1 1
  f(x1,x2):   1 1 0 1
  msg x2:     0 1 1 0   (= x1 ⊕ f(x1,x2))

Butterfly Network: Store-and-Forward
Two sources: S1, S2
R1, R2: receive traffic from both S1 and S2
[Figure: butterfly network, Time 1 to Time 4; x1 and x2 cross the shared link one after the other. R1: 4 units: received x1,x2. R2: 3 units: received x1,x2]

Butterfly Network: Network Coding
Two sources: S1, S2
R1, R2: receive traffic from both S1 and S2
[Figure: butterfly network, Time 1 to Time 3; the shared link carries x1 ⊕ x2. R1: 3 units: received x1,x2. R2: 3 units: received x1,x2]

Network Coding for Wireless
Broadcast nature of medium: natural ground for network coding
[Figure: A and B exchange x1 and x2 through relay C, which forwards each packet separately. No coding: delay = 4]

Network Coding for Wireless
Broadcast nature of medium: natural ground for network coding
[Figure: A and B send x1 and x2 to relay C, which broadcasts x1 ⊕ x2 to both. Coding: delay = 3]

Linear Network Coding
m packets, n linear combinations
  b1 = a11x1 + a12x2 + … + a1mxm
  b2 = a21x1 + a22x2 + … + a2mxm
  …
  bn = an1x1 + an2x2 + … + anmxm
# independent linear combinations ≥ m => Decode!
  Centralized) choice of coefficients
  Distributed) ai random and independent => decode (prob ≈ 1)

Network Coding for Challenged Nets: The model
Set of nodes V
  N(u): {i ∈ V: i neighbor of u}
Set of sources S ⊆ V (m = |S|)
Messages: xi, i = 1,…,m
  xi = [xi1, xi2,…, xiM], M symbols ∈ F2^k = (0, 2^k - 1)
  k > 8 to ensure independence for random coding
Encoding vectors: gi = [gi1, gi2,…, gim], m symbols ∈ F2^k
Encoding matrix G:
  row i = $(g_{i1}, \ldots, g_{im} \mid \sum_{j=1}^{m} g_{ij} x_{j1}, \sum_{j=1}^{m} g_{ij} x_{j2}, \ldots, \sum_{j=1}^{m} g_{ij} x_{jM})$
  left part: encoding vector; right part: coded symbols gi * Gl (Gl = l-th symbols of all xj)

Encoding Matrix: Example
Encoding matrix G at node 1
m = 3 messages in total
Each message contains M = 4 symbols in F8

                   encoding vector | M = 4 symbols per message
  g1 = [1,0,0]     1 0 0 | 5 4 1 2
  g2 = [1,1,1]     1 1 1 | 6 3 2 2
                   - - - | - - - -

Encoding vectors: 2

Encoding Matrix: Example
Encoding matrix G at node 1
m = 3 messages in total
Each message contains M = 4 symbols in F8

  g1 = [1,0,0]     1 0 0 | 5 4 1 2
  g2 = [1,1,1]     1 1 1 | 6 3 2 2
  g3 = [0,1,1]     0 1 1 | 3 7 3 4

New encoded message arrived: increase rank of matrix G?
No! Linearly dependent with 1,2 (x3 = x1 XOR x2 (mod 8))

Encoding Matrix: Example
Encoding matrix G at node 1
m = 3 messages in total
Each message contains M = 4 symbols in F8

  g1 = [1,0,0]     1 0 0 | 5 4 1 2
  g2 = [1,1,1]     1 1 1 | 6 3 2 2
  g3 = [1,0,1]     1 0 1 | 2 4 1 0

New encoded message arrived: increase rank of matrix G?
Yes!
3 linearly independent vectors (Gaussian elimination)

Network Coding for Challenged Nets: Forwarding
At time t-dt node i receives an innovative message/vector
With probability d: send (gi(t), yi(t)) = ri(t) * Gi(t)
  ri(t) = random vector (in F2^k)
  Like gossiping: instead of forwarding the new message, forward a linear combination of all messages currently in buffer!
All nodes in N(i) receive (gi(t), yi(t))
  If not innovative, discard
  If innovative, add to matrix G and do same process
Need at most m innovative messages to decode
  Can probably decode some elements before that!

Performance of Network Coding
Increase Delivery Ratio: better utilize forwarding opportunities
Increase average delay (have to wait for multiple messages to be received)

Generation Management: Which messages to code together?
Assume infinitely large network with a percentage of nodes being sources
Do we code messages from all sources??
  Coding matrix G will be huge!
  Delay until all messages decoded
Code messages of subsets of sources together
  How do we choose subsets??
Code multiple messages of same source
  How many generations??

Network Coding Gains
Generation management:
  Larger generations
  Better coding gains (throughput, energy, delivery)
  Larger potential end-to-end delay, complexity
  Related nodes in same generation?
Types of traffic
  Multiple single-source single-destination messages
  One source-one destination, multiple messages
  Many sources-one destination
  Multiple one source-many destinations messages (multicast, broadcast)

End-to-end vs.
hop-by-hop decoding
1) Decoding of messages at end nodes
  This is what we were looking at so far
  Issues with generation management
  Potentially long/unbounded delays
2) Opportunistic Network (De-)Coding
  Keep track of neighbors’ messages
  Code only if next hop can decode
[Figure: a node holding x1, x2, x3 sends f(x1,x2,x3) to neighbors that each already hold two of the three messages and can decode the missing one]

Erasure Coding
Provide better fault-tolerance by adding redundancy without the overhead of strict replication (e.g., Reed-Solomon, Gallager, Tornado, and IRA codes)
Applications: P2P, overlay routing, WSN, data storage, etc.

Erasure Coding (r=2, n=4)
[Figure: blocks A, B, C, D are each encoded into four fragments (A-1 … A-4, B-1 … B-4, C-1 … C-4, D-1 … D-4); some fragments are lost on the lossy channel, and A, B, C, D are reconstructed from the fragments that arrive]

Layered Multiple Description Coding (LMDC)
Layered coding
Unequal erasure coding

LMDC Examples
Video
Web Document

Transport Layer Issues in DTNs
TCP offers:
  Ports: Still used by the overlay bundle layer
  Sequencing: Still there, but for bundles
  Connection: Impossible in most DTN cases
  Reliability: Late ACKs. Large RTT.
  Congestion Control: Very difficult to get up-to-date congestion info in partitioned environments

Reliability in DTNs: “Hop-by-Hop”
Each message copy forwarded is acknowledged by the next hop
  This holds also if multiple message copies are propagated (e.g. epidemic)
Hop-by-hop reliability has minimum delay
  No need to wait for end-to-end ack
BUT: Hop-by-hop reliability does not guarantee end-to-end reliability

Reliability in DTNs: “Active Receipt”
Intermediate node may: lie, shut down, break down.
Active receipt: generated by destination when it receives the message
Active receipt = new message
  Other nodes route it as a normal message
  Epidemic spreading of receipt to guarantee acknowledgement
  ACK size < MSG size => less overhead
  Vaccinates/Cures other nodes encountered in the meantime (essentially VACCINE)

Reliability in DTNs: “Passive Receipt”
Active receipt: floods two messages
  Often, most overhead is MAC access
“Passive Receipt”:
  - generated by destination when it receives the message
  - can only be passed to infected nodes (essentially IMMUNE-TX)
Plus: less overhead than active receipts
Minus: larger delay than active receipts

Reliability in DTNs: “Network-Bridged Receipt”
Assume complementary network: DTN + (low bandwidth, connected network)
  e.g. cellular network
DTN network: send bulky data (with delay tolerance; e.g. ftp)
Cellular network: send immediate small ACK
Could even be used for disinfection(?)

Reliability in DTNs
What else could we try?
Where is each approach applicable?
What is the penalty of late ACKs?
What about ACKing multiple messages?
Can we take advantage of mobility/social structure to improve?

Congestion Control in DTNs: Connected Network
[Figure: a relay’s buffer is full and a message is dropped; a Congestion Notification reaches source S: "Cut back send rate!"]

Congestion Control in DTNs: Disconnected Network
[Figure: a relay’s buffer is full and a message is dropped; the Congestion Notification reaches S late, when there is no congestion any more, and S cuts back its send rate]
Irrelevant Notification! Unnecessarily reduce throughput!
S may not see the Congestion Notification at all

Mobility Models

Random Walk
All nodes perform independent random walks
Move to any neighboring location with probability ¼ (p = 0.25)
Uniform stationary distribution
Torus: on boundary, reflect on other side
Brownian Motion as an extension
  Normal distribution increments

Random Waypoint
Choose a point in the network uniformly
Choose speed randomly
Pause for a random amount of time
Choose another point uniformly and repeat

Random Direction
Random Waypoint has some problems
  Non-uniform stationary distribution: concentration in center
  If not started from stationary distribution => convergence issues: slowly drifting from uniform to center
Random Direction
  1. Choose direction uniformly in 360°
  2. Move for exponential amount of time
  3. Reflect or turn around on boundary
  Uniform Stationary Distribution

Other Models
Manhattan Model
  All nodes move within restricted street borders
  Grid structure (vertical and horizontal streets, like Manhattan)
  Stop lights?
Freeway Model
  Nodes move on lanes of one line; lanes in both directions
  Potentially other crossing freeways
  Speed considerations between nodes in same lane
Group Mobility
  Subset of nodes associated with a leader
  Followers make move based on leader’s move

Impact of Mobility Model on Performance
A comparison study between DSR, AODV, TORA, and DSDV under Random Waypoint
  All routing protocols (proactive and reactive)
  Showed DSR was better overall
Comparison for different mobility models (Rand. Waypoint, Freeway, Manhattan, etc.)
  Winner depends on mobility model; AODV actually better in more cases

Some Common Assumptions of Synthetic Mobility Models
No location preference
  Uniform choice of destination
  Uniform stationary distribution
IID node mobility
  Every node is doing the same
  Statistically equivalent

Real-life Mobility
[Figure: Base Stations (pstatic); Community (local) Nodes that stay inside community with pL(i), leave with 1-pL(i), and roam around the network infrequently (pR(i), 1-pR(i)); Fast/Mobile Nodes (pfast)]

Common Mobility Models: What is Wrong?
Location preference? Nodes don’t visit all locations equally frequently Usually: spent most of the time in a small subset of locations (e.g. office, house, library, etc.) Identical node behavior? Different nodes; some more mobile than others Vehicles vs. pedestrians; first-year student vs. graduate student Does time play any role? Morning: commute to work Noon: lunch Weekend-vs-week What else? Social relationships Traces From Real Wireless Networks WLAN (WiFi) traces Collect logs from deployed WLANs in campuses Association(s) between user node and Access Point(s) (AP) Traces of contacts between different wireless nodes (ad hoc mode) PDAs carried around by users Logs of different encounters (e.g. PDA associations) DTN: We Care About Contacts Contact traces => we get this directly WLAN traces: translate Node-AP associations into Node-Node associations Same AP at the same time => contact Not always true What happens between APs? Public DTN Traces ZebraNet Bus trace (SF, Toronto, DieselNet) Campus trace (UCSD, Dartmouth, MIT) Conference trace (Infocom, SIGCOMM) Enterprise trace (Intel, IBM) http://crawdad.org Traces: What Have We Learned? Location/Node preference Tend to see specific locations/nodes, more often than other Node Heterogeneity Some nodes see all locations/nodes; others a small subset Behavior over time Different for different time of day, day of week, etc. Periodic behavior Community-based Mobility Capture Location Preference Roam outside community Rest of the network (Rand. Direction or Waypoint) 1-pL(i) pR(i) Continue roaming 1-pR(i) stay inside community local Ci pL(i) Community (e.g. 
house, campus)

Community-based Mobility (2)
• Capture node heterogeneity
• Each node may have a different community
• Figure: the local/roam two-state chain drawn separately for node i (pL(i), 1-pL(i), pR(i), 1-pR(i)) and node j (pL(j), 1-pL(j), pR(j), 1-pR(j))

Community-based Mobility (3)
• Multiple communities (house, office, library, cafeteria)
• Figure: communities House (C1), Office (C2), and Library (C3), plus the rest of the network, with transition probabilities p11(i), p12(i), p21(i), p23(i), p32(i)

Community-based Mobility (4)
• Multiple communities (house, office, library, cafeteria)
• Figure: communities C1, C2, C3, C4 with transition probabilities p11(i), p22(i), p12(i), p21(i), p24(i), p32(i), p43(i)
• Inter-community mobility?
• Intra-community mobility?

Community-based Mobility (5)
• Capture time-dependent behavior
• t ∈ {morning, noon, weekend, …}
• Figure: the same four communities with time-dependent transition probabilities p11(i)(t), p22(i)(t), p12(i)(t), p21(i)(t), p24(i)(t), p32(i)(t), p43(i)(t)

Mobility Profile
• Macroscopic view of mobility
• Node i: {π(i)(C1), π(i)(C2), …, π(i)(Cn)}
• Approach 1: route towards the most popular communities (e.g. geographic routing)
• Approach 2: treat {π(i)(C1), π(i)(C2), …, π(i)(Cn)} as coordinates in an n-dimensional space
  • Route to nodes whose distance is small in this n-dimensional space

Multi-tiered Community
• Roaming outside the local community is not uniform either!
• Nodes move further away from the local community with decreasing probability
• Figure: Tiers 1 to 4 around the local community, with probabilities p12(i)(t), p13(i)(t), p14(i)(t)

Inter-contact Times
• Time between subsequent encounters with the same node
• Consecutive transmission opportunities to a given node
• Contact-based trace measurements: what is the distribution of inter-contact times?
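A minimal sketch of how inter-contact times could be extracted from a contact trace and turned into an empirical CCDF. The record layout `(start, end, node_a, node_b)` and both function names are assumptions made for illustration, not the format of any particular public trace. The gap is measured here from the end of one contact to the start of the next contact of the same pair, which is one common convention.

```python
from bisect import bisect_right
from collections import defaultdict


def inter_contact_times(contacts):
    """Return {pair: [gap, ...]} from (start, end, node_a, node_b) records.

    Assumption: an inter-contact time is the gap between the end of one
    contact and the start of the next contact of the same unordered pair.
    """
    by_pair = defaultdict(list)
    for start, end, a, b in contacts:
        pair = tuple(sorted((a, b)))  # (a, b) and (b, a) are the same pair
        by_pair[pair].append((start, end))

    gaps = {}
    for pair, intervals in by_pair.items():
        intervals.sort()
        pair_gaps = []
        prev_end = intervals[0][1]
        for start, end in intervals[1:]:
            if start > prev_end:  # overlapping records produce no gap
                pair_gaps.append(start - prev_end)
            prev_end = max(prev_end, end)
        gaps[pair] = pair_gaps
    return gaps


def empirical_ccdf(samples):
    """Return [(x, P[X > x])] for each distinct sample value x."""
    xs = sorted(samples)
    n = len(xs)
    return [(x, (n - bisect_right(xs, x)) / n) for x in sorted(set(xs))]
```

Plotting the `empirical_ccdf` points on log-log axes is what allows a straight-line (power-law) region to be spotted by eye.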
• Trace sources: WLAN traces (Dartmouth, UCSD); inter-node (ad hoc mode) traces (Cambridge, Toronto)

CCDF for Inter-contact Times
• Log-log plots
• A straight line in a log-log plot => power law/heavy-tailed (slope = exponent)

CCDF for Inter-contact Times (2)
• WLAN traces: similar behavior

Power Law Distributions
• P[X > x] = x^{-a}
• a < 2: infinite variance; a < 1: infinite mean
• There is a high probability that some very large values will be drawn if X is sampled sequentially
• Contrast: exponentially decaying variables
  • Very large values are highly improbable
• Most of the (synthetic) mobility models presented so far have exponential tails

Power Law Distributions: Complications
• Theory: most analysis (Markov chains, ODEs, combinatorics) assumes exponential tails
  • Essentially, for X1, X2, …, Xn IID and exponential: E[min{X1, X2, …, Xn}] = E[X] / n
• Protocol performance
  • Opportunistic routing: give a copy randomly
  • Depending on the exponent a, any opportunistic protocol (e.g. direct transmission, 2-hop, spray & wait) may have infinite expected delay!

But is it REALLY Heavy-tailed?
• The power law holds only within a range of the CCDF
• What about the rest of the tail (an artifact of the experiments, or not really a power law)?

Lognormal Seems to Fit Better

Inevitable Censorship in Measurements
• Figure: survival curve P(T > t) for the UCSD trace, with the censored-data region marked

Self-Similarity Test
• Hurst values lie in [0.5, 1]
• Estimators: time-variance plot, R/S plot, periodogram plot, Whittle estimator

Social Networks
• Social network: who interacts with whom? Who is a "friend" of whom?
• Graph model: vertices = humans, weighted edges = strength of interaction

Social Network-based Mobility Model
1. Create (by simulation) or derive (from existing information, e.g. department affiliation) a social network among all nodes
2. Assign nodes to communities according to the social network
3. Assign communities to locations
4.
Induce mobility based on the social network

Communities in Social Networks
• Social networks have a high clustering coefficient
• Interaction matrix => connectivity matrix
  • Every weight above a threshold is treated as a connected link
• Identify communities: find the nodes that connect communities (intuition: shortest paths go through these)
• Figure: Community 1, Community 2, Community 3; node B connects Communities 1 and 2

Mapping Communities to Locations
• Assume a grid with different locations of interest
• Geographic considerations may give us the candidate locations

Mobility Between Communities
• pC(i) = attraction of node i to community/location C
• pC(i) = ( Σ_{j ∈ C} w_ij ) / |{j ∈ C}|, i.e. the average social-graph weight between node i and the members j of community C
• Figure: node B with attractions p1(B)(t), p2(B)(t), p3(B)(t) to three communities

Social Network-based Mobility Model
• Can reproduce behavior similar to (heavy-tailed) traces
  • Inter-contact times
  • Contact durations
• Some issues
  • Nodes move only between specific (community) locations
  • Different social graph weights depending on time of day
  • Evolve social graph weights

Social Networks for Information Dissemination
• Social networks are often better for finding information that is location-, community-, or time-specific!
• Small-world and scale-free properties
  • Separation/diameter is smaller than in random networks
  • A query can often be answered more quickly through peers
• Example: "Where is a good Thai restaurant in Nice?"
  • Approach 1: find a PC => Google => look at websites that rate restaurants => hope the one suggested IS actually good
  • Approach 2: ask a friend who lives in Nice (he might know, or have heard, or ask another friend)
• What if we could do this wirelessly also?
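The "ask a friend, who may ask another friend" approach can be sketched as a breadth-first search over the social graph. This is only an illustration: the `friends` dictionary, the `knows` set, and the `max_hops` limit are hypothetical names introduced here, and real dissemination over wireless contacts would also depend on when nodes actually meet.

```python
from collections import deque


def ask_friends(friends, knows, asker, max_hops=3):
    """Breadth-first search over a friendship graph.

    friends:  dict mapping each person to a list of friends (hypothetical data)
    knows:    set of people who can answer the query
    max_hops: how many friend-of-friend steps the query may travel
    Returns (person, hops) for the nearest person who knows, or None.
    """
    seen = {asker}
    queue = deque([(asker, 0)])
    while queue:
        person, hops = queue.popleft()
        if person in knows and person != asker:
            return person, hops
        if hops == max_hops:
            continue  # do not forward the query any further
        for friend in friends.get(person, []):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, hops + 1))
    return None
```

The returned hop count is the separation between the asker and the nearest person who can answer, which is the quantity the small-world property keeps low.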
PeopleNet Architecture
• Cellular networks (WiMAX) as the main infrastructure
• Bluetooth peer-to-peer networks (WiFi ad hoc)
• Users transmit queries
  • Request query: "who knows/has X?" (e.g. a ticket to the Monaco rally)
  • Offer query: "I have/know Y"
• Queries are tagged by subject (e.g. sports, finance, news)

PeopleNet Architecture (2)
• A query is sent to a subset of locations/base stations that have been assigned to the given query type
  • Geography might play a role: e.g. "where is the closest local bookstore?"
• A few users receive the query through the infrastructure and propagate it further using peer-to-peer messages
• If a "match" is found, the requesting user is notified (SMS, email)

Further Issues

Research Issues
• Routing
• Buffer management
• Power management
• Auto-configuration
• Network reliability
  • Free-riders
  • Black holes
  • Worm holes
• Information security
  • Data encryption
• Real-world applications (and killer applications)
  • Underwater networks, vehicular networks, people networks, scientific monitoring networks, etc.
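As a closing illustration of the people-network application, the request/offer matching described for PeopleNet can be sketched as follows. This is a simplified assumption, not the PeopleNet implementation: the `Query` fields and the `find_matches` function are hypothetical, and a match here simply means an offer and a request that share the same subject tag and item.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    kind: str  # "request" ("who knows/has X?") or "offer" ("I have/know Y")
    tag: str   # subject tag, e.g. "sports"
    item: str  # what is wanted or offered
    user: str  # who issued the query


def find_matches(queries):
    """Pair each request with offers that share its tag and item.

    Returns a list of (requesting_user, offering_user, item) tuples.
    """
    offers = {}
    for q in queries:
        if q.kind == "offer":
            offers.setdefault((q.tag, q.item), []).append(q)

    matches = []
    for q in queries:
        if q.kind == "request":
            for offer in offers.get((q.tag, q.item), []):
                if offer.user != q.user:  # ignore a user's own offer
                    matches.append((q.user, offer.user, q.item))
    return matches
```

In the architecture above, each match would trigger a notification (SMS, email) to the requesting user.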