Growth Codes: Maximizing Sensor Network Data Persistence
  Abhinav Kamra, Vishal Misra, Dan Rubenstein
    Department of Computer Science, Columbia University
  Jon Feldman
    Google Labs
  DNA Research Group
  ACM Sigcomm 2006

Outline
  - Problem Description
  - Solution Approach: Growth Codes
  - Experiments and Simulations
  - Conclusions and Ongoing work

Background: A generic sensor network
  [Figure: sensor nodes holding sensed data x1 ... x13, connected over multi-hop paths to the sink(s)]
  - Data follows multi-hop path to sink(s)
  - A few node failures can break the data flow
  - Generic Aim: Collect data from all nodes at sink(s)

Specific Context: Disaster Scenarios
  e.g., monitoring earthquakes, fires, floods, war zones
  Problems in this setting
  - Congestion near sink(s)
    - All nodes simultaneously forward data
    - Overwhelm sink(s) capacity
  [Figure: Virtual queue: congestion near sink]

Specific Context: Disaster Scenarios - 2
  Problems in this setting
  - Network collapsing: nodes failing rapidly
    - Pre-computed routes may fail
    - Data from failed nodes can be lost
  - Data recovery from a subset of nodes is acceptable

Challenges
  Networking Challenges:
  - Disaster scenarios: feedback often infeasible
  - Frequent disruptions to the routing tree, if one is set up
  - Difficult to predict node failures: sink locations unknown, surviving routes unknown
  - Difficult to synchronize nodes' clocks
  Coding Challenges:
  - Data source distributed (among all sensor nodes)
  - Prior approaches (Turbo codes, LDPC codes) aim at fast complete recovery
  - Sensor nodes have very limited memory, CPU, bandwidth

Objectives
  - Preserve data from failed sensor nodes
  - Deliver data to sink(s) as fast as possible
  = Maximize Data Persistence
  Data Persistence: fraction of data that eventually reaches the sink(s)
  [Figure: symbols x1 ... x12 travelling to the sink; 6 of 10 symbols reach the sink. Persistence = 60%]

Limitations of Previous Work
  - Channel Coding based (e.g.
    Turbo Codes [Anderson-ISIT94], LT Codes [Luby02])
    - Aim for complete recovery in minimum time
    - Difficult to implement with distributed sources
  - Routing-based (e.g. Directed Diffusion [Govindan00], Cougar [Yao-SIGMOD02])
    - Conjecture: Too fragile (disrupted easily) for disaster scenarios

Our Approach
  Two main ideas
  - Randomized routing and replication
    - Avoid actively maintaining routes
    - Replicate data to increase data survival
  - Distributed channel codes (Growth Codes)
    - Expedite data delivery & survivability
    - First (to our knowledge) distributed channel codes

Outline
  - Problem Description
  - Our Solution: Growth Codes
  - Experiments and Simulations
  - Conclusions and Ongoing work

Network Assumptions
  [Figure: sensor nodes 1 to 7 and sinks S]
  - N node sensor network
  - Limited storage: each node stores a small # of data units
  - Large storage at sink(s): sink receives codewords from random node(s)
  - All sensed data assumed independent (no source coding)

High Level View of the Protocol
  [Figure: nodes 1 to 4 exchanging data]
  - Nodes send data at random times
    (Current implementation: exponentially distributed timers)

High Level View of the Protocol (2)
  [Figure: symbols, degree 1 codewords and a degree 2 codeword at nodes 1 to 4; timeline marked 0, K1, K2, K3]
  - Degree 2 codeword: sender picks a random symbol and XORs it with its own symbol
  - Even if node 3 fails, node 3's data survives
  - After time K1, nodes start sending degree 2 codewords

High Level View of the Protocol (3)
  - After time K1, nodes start sending degree 2 codewords
  - After time K2, nodes start sending degree 3 codewords
    ...
  - After time Ki, nodes start sending degree i+1 codewords
  - What are good values for {Ki}?
    Please refer to our paper
  - Note: No need to tightly synchronize clocks
    (Times Ki can be out of sync at different nodes)
  [Figure: timeline marked 0, K1, K2, K3]

The Intuition behind Growth Codes
  [Figure: codewords arriving over time; set of symbols decoded at sink]
  - When very few symbols decoded
    - Easy to decode low degree codewords

The Intuition behind Growth Codes (2)
  [Figure: codewords arriving over time; set of symbols decoded at sink]
  - When significant number of symbols decoded
    - Low degree codewords often redundant
    - Higher degree codewords more likely to be useful

Outline
  - Problem Description
  - Growth Codes
  - Simulations and Experiments
  - Conclusions and Ongoing work

Simulations/Experiments: Compare data persistence of various approaches
  1. Simulations:
     - Centralized Setting: compare GC with other channel coding schemes
     - Distributed Simulation: assess large-scale performance of coding vs no coding
  2. Experiments on motes:
     - Compare time of complete recovery for GC vs routing
     - Measure resilience to node failures

Comparison with various coding schemes (N = 1500)
  Centralized Simulation (to compare with other channel coding schemes for which only centralized versions exist)
  [Figure: single source sending codewords to a single sink]
  - Single source, single sink
  - Source generates random codewords according to coding scheme (GC, Soliton)
  - Zero failure rate
  Results
  - No coding is fast in the beginning: slowdown explained via Coupon Collector's problem
  - Soliton/R-Soliton: poor partial recovery
    (reason: high degree codewords sent too early)
  - Growth Codes closest to theoretical upper bound
    (reason: right degree at the right time)

Growth Codes vs No Coding (Varying N)
  Distributed Simulation (to assess the performance gain of coding)
  - N sources, single sink
  - Random graph topology (avg degree 10)
  - Sink receives 1 codeword per time unit
  Complete recovery takes:
  - O(N log N) time without coding (Coupon Collector's effect)
  - Linear time with Growth Codes
  Soliton/R-Soliton: cannot compare in a distributed setup
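
Sketch: XOR codewords, sink-side decoding, and the no-coding baseline
  The XOR encoding from the protocol slides and the Coupon Collector slowdown noted above can be tried out with a small script. This is a minimal sketch under stated assumptions, not the authors' implementation: the names make_codeword, PeelingDecoder and no_coding_recovery_time are hypothetical, symbols are modelled as 16-bit integers, and each codeword draws uniformly from all N symbols (an idealization of the centralized setting). The degree is supplied by the caller; the Ki schedule from the paper is not reproduced here.

```python
import random


def make_codeword(symbols, degree, rng):
    """XOR `degree` distinct, randomly chosen symbols; return (ids, xor_value)."""
    ids = rng.sample(sorted(symbols), degree)
    value = 0
    for i in ids:
        value ^= symbols[i]
    return frozenset(ids), value


class PeelingDecoder:
    """Sink-side decoder (assumed design): a buffered codeword is solved
    as soon as all but one of its symbols are already known."""

    def __init__(self):
        self.known = {}    # symbol id -> recovered value
        self.pending = []  # buffered (unknown_ids, value) pairs

    def receive(self, ids, value):
        self.pending.append((set(ids), value))
        progress = True
        while progress:  # repeat until a full pass decodes nothing new
            progress = False
            still_pending = []
            for unknown, val in self.pending:
                # Remove (XOR out) every symbol that is already recovered.
                for i in [i for i in unknown if i in self.known]:
                    val ^= self.known[i]
                    unknown.discard(i)
                if len(unknown) == 1:
                    self.known[unknown.pop()] = val
                    progress = True
                elif len(unknown) > 1:
                    still_pending.append((unknown, val))
                # No unknowns left: the codeword is redundant and is dropped.
            self.pending = still_pending


def no_coding_recovery_time(n, rng):
    """Time units until the sink holds all n symbols when it receives one
    uniformly random degree 1 codeword (a plain symbol) per time unit."""
    symbols = {i: rng.getrandbits(16) for i in range(n)}
    sink = PeelingDecoder()
    t = 0
    while len(sink.known) < n:
        sink.receive(*make_codeword(symbols, 1, rng))
        t += 1
    return t


if __name__ == "__main__":
    n = 200
    harmonic = sum(1.0 / k for k in range(1, n + 1))
    measured = no_coding_recovery_time(n, random.Random(0))
    # Coupon Collector expectation is n * H_n, which grows as n log n.
    print("measured:", measured, "expected about:", round(n * harmonic))
```

  With degree 1 only, most late arrivals are symbols the sink already holds, which is the redundancy the intuition slides describe. Passing a larger degree to make_codeword exercises the buffering path of the decoder, where a codeword waits until enough of its symbols have been recovered.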

Experiments with (micaz) motes (to measure data persistence with time)
  [Figure: experimental topology with sink S]
  - GC vs TinyOS's "MultiHop" routing protocol
  - No routing state at time 0 (scenario where sensor nodes are deployed rapidly)
  Results
  - "MultiHop" for persistence: takes long time to complete route setup
  - Comparison with GC simulator validates simulator performance

Motes experiments: Resilience to node failures
  [Figure: experimental topology with sink S]
  - Nodes generate data every 300 seconds
  - 3 nodes fail just after 3rd data generation
  [Timeline, ticks at 0, 300, 600, 900: "MultiHop" sets up routing; nodes generate data; nodes send data to sink; 3 random nodes fail; "MultiHop" repairs routes]

Motes experiments: Resilience to node failures
  - 1st generation: GC faster, MH takes time to set up routes
  - 2nd generation: routing already set up, MH very fast
  - 3rd generation: MH needs to repair routes
  [Timeline as on the previous slide]

Other Results: Please refer to our paper
  - Good values for K1, K2, …
  - More simulations/experiments
    - Various topologies
    - Other failure scenarios
  - Implementation details:
    - Memory usage at sensor nodes: how it affects performance
    - How to handle periodic data generation
    - How to reduce overhead of coefficients

Conclusions
  - Data persistence in sensor networks:
    - First distributed channel codes (GC)
    - Protocol requires minimal configuration
    - Is robust to node failures
  - Simulations and experiments on micaz motes show (compared to prior coding and routing methods):
    - GC achieves complete recovery faster
    - GC recovers more partial data at any time

Ongoing Work
  - Adapt Growth Codes to scenarios where sensor data is correlated
  - Take advantage of any available routing information (e.g. before a disaster)
  - Estimate network size on the fly to use in Growth Codes

Thanks for your patience!
  For more information:
    DNA Research Lab, Columbia University
    http://dna-wsl.cs.columbia.edu/