FIND: Faulty Node Detection for Wireless Sensor Networks
Shuo Guo, Ziguo Zhong and Tian He
University of Minnesota, Twin Cities

Background
Importance of fault detection in WSNs
• Node failures => performance degradation
Two types of faults
• Function fault: crash of nodes, packet loss, routing failure or network partition
• Data fault: the node behaves normally except for its sensing results

Shuo Guo@University of Minnesota

Related Work
Function faults
• Q. Cao et al., Declarative Tracepoints, SenSys’08
• M. Khan et al., DustMiner, SenSys’08
• J. Yang et al., Clairvoyant, SenSys’07
• V. Krunic et al., NodeMD, MobiSys’07
• N. Ramanathan et al., Sympathy, SenSys’05
Data faults
• M. Ding et al., Localized Fault-Tolerant Event Boundary Detection in Sensor Networks, INFOCOM’05
• L. Balzano et al., Blind Calibration of Sensor Networks, IPSN’07
• V. Bychkovskiy et al., A Collaborative Approach to In-Place Sensor Calibration, IPSN’03
• J. Feng et al., Model-Based Calibration for Sensor Networks, IEEE Sensors, 2003

Related Work
Outlier detection
• Identify readings numerically distant from the rest
After-deployment calibration
• Find a mapping function that maps faulty readings into correct ones (Y = aX + b)
Limitations
• Assumptions on data distribution
• Mapping functions may not exist

Our Work
Objective: find a blacklist of possible faulty nodes, in order of their probability of being faulty
• Node locations are available
• Sensing readings are generally monotonic over distance; local violations can occur, but the general trend holds
• No mathematical model is assumed for the reading-distance relationship
• No function is assumed to map faulty readings into correct ones
• Detects both random and biased faulty readings

Node Sequences and Ranks
Node sequence
• A complete list of node IDs sorted by reading (e.g., RSS, Received Signal Strength) or by physical distance from the target
Ranking
• The position at which a node appears in a node sequence
Example (four nodes; nodes 2, 4 and 3 read -55 dBm, -60 dBm and -65 dBm)
• Physical distance-based sequence: 1243
• RSS-based sequence when node 1 reads -50 dBm: 1243, so node 1's ranking is 1
• RSS-based sequence when node 1 reads -62 dBm: 2413, so node 1's ranking is 3
• Ranking difference: the gap between a node's rankings in the two sequences (here |1 - 3| = 2 for node 1)

Main Idea
Find mismatches between RSS-based and physical distance-based rankings

                                   Ranking difference
Event   Distance   RSS     Node 1   Node 2   Node 3   Node 4
1       1243       2431    3        1        1        1
2       2314       2341    1        0        0        1
3       4123       4213    1        1        0        0
Total                      5        2        1        2

1. Unknown target locations? Estimate distance sequences from RSS-based sequences
2. Why ranking difference?
3. How many nodes are faulty, given the ranking differences?

Sequence Estimation
Estimate the physical distance-based sequence ŝ from the RSS-based sequence s
• Map division: find V = {s_1, s_2, ..., s_M}, the set of all possible distance-based sequences
• Maximum a posteriori (MAP) estimation
An N-node network has N! possible sequences, but a given topology admits only a small subset of size O(N^4)

Map Division
Divide the map into subareas, each identified by a unique node sequence that encodes distance information
• Two nodes: two subareas, with distance-based sequences 12 and 21
• Four-node example: V = {s_1, s_2, ..., s_M} = {1243, 2134, 1423, 2314, 4123, 3241, 4312, 3421}

MAP Estimation (1)
Estimate the physical distance-based sequence ŝ from the RSS-based sequence s: find the s_i in V that maximizes Pr(s | s_i) Pr(s_i)
$\hat{s} = \arg\max_{s_i \in V} \Pr(s \mid s_i)\,\Pr(s_i), \qquad \Pr(s_i) = A_i / A$
• A_i: the area of subarea i
• A: the total sensing area

Main Idea (progress)
1. Unknown target locations? DONE! Estimate distance sequences from RSS-based sequences
2. Why ranking difference?
3. How many nodes are faulty, given the ranking differences?

Why ranking difference?
Average ranking difference is a provable indicator of possible data faults
Theorem 1: A node is faulty if its average ranking difference is above a bound B [expression for B missing in source], which depends on
• N: total number of nodes
• N_e: total number of faulty nodes
• μ_e: average ranking difference of faulty nodes
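
The ranking differences in the Main Idea table can be reproduced with a few lines of Python. This is only a sketch: it assumes the ranking difference is the absolute difference between a node's positions in the two sequences (which is what the table values imply), and the function names `ranks` and `ranking_differences` are hypothetical, not from the slides.

```python
def ranks(sequence):
    """Map each node ID to its 1-based position in a node sequence."""
    return {node: pos for pos, node in enumerate(sequence, start=1)}


def ranking_differences(distance_seq, rss_seq):
    """Per-node |rank in distance sequence - rank in RSS sequence| (assumed definition)."""
    d, r = ranks(distance_seq), ranks(rss_seq)
    return {node: abs(d[node] - r[node]) for node in d}


# The three events from the Main Idea slide: (distance-based, RSS-based).
events = [("1243", "2431"), ("2314", "2341"), ("4123", "4213")]

totals = {}
for distance_seq, rss_seq in events:
    for node, diff in ranking_differences(distance_seq, rss_seq).items():
        totals[node] = totals.get(node, 0) + diff

# Average ranking difference per node, the quantity Theorems 1 and 2 refer to.
averages = {node: total / len(events) for node, total in totals.items()}
```

Sorting `averages` in descending order puts node 1 first, which matches the ordering used on the following slides.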
Theorem 2: Nodes with a larger average ranking difference have a higher probability of being faulty
• Sorting by ranking difference therefore orders nodes correctly by their likelihood of being faulty
• Example: the total ranking differences of nodes 1, 2, 3 and 4 are 5, 2, 1 and 2, so in descending order node 1 comes first (highest probability of being faulty), nodes 2 and 4 tie, and node 3 comes last (lowest probability)
• Theorem 2 gives the order, but the length of the blacklist depends on the defective rate α: with α = 25% the blacklist is {1}; with α = 75% it is {1, 2, 4}
• What if α is unknown?

Main Idea (progress)
1. Unknown target locations? DONE! Estimate distance sequences from RSS-based sequences
2. Why ranking difference? DONE!
3. How many nodes are faulty, given the ranking differences?

Detection Algorithm
[Flowchart: nodes 1-9 listed in descending order of ranking difference, from high to low probability of being faulty; the first K nodes form the blacklist]
• Check whether D(n_k) > B (Theorem 1)
• Yes: add the next node to the blacklist, update, and check again
• No: stop

Practical Issues
Detection in noisy environments
[Figure: node sequences for nodes 1-9: distance-based; RSS, noise free; RSS, with noise]
• Noise increases the ranking differences of normal nodes
• Remove nodes from the blacklist if their ranking difference is close to that of normal nodes

Practical Issues
Simultaneous events elimination
• Radio signal: the reading is the sum of the RSS from all events
• The detected sequence no longer matches any distance sequence
• Remove sequences with a short Longest Common Subsequence (LCS)

Practical Issues
Subsequence estimation
[Figure: node 1 at -50 dBm, node 2 at -55 dBm, node 4 at -60 dBm, node 3 unknown (???)]
• Complete RSS-based node sequences may be unavailable
• Also use truncated distance-based subsequences in the mapping process
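
The simultaneous-event filter from the Practical Issues slides ("remove sequences with a short LCS") could be sketched as follows. The slides do not say how short is too short, so the `min_fraction` threshold and the function names `lcs_length` and `keep_sequence` are assumptions made for illustration only.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (classic dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def keep_sequence(rss_seq, candidates, min_fraction=0.75):
    """Keep an RSS-based sequence only if it shares a long enough LCS with
    at least one candidate distance-based sequence.

    min_fraction is a hypothetical threshold, not a value from the slides.
    """
    best = max(lcs_length(rss_seq, c) for c in candidates)
    return best >= min_fraction * len(rss_seq)
```

For example, `keep_sequence("1243", ["1243", "2134"])` keeps the sequence because it matches a candidate exactly, while `keep_sequence("1243", ["3421"])` drops it because the two sequences share only a single node in common order.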

Evaluation
Two test-bed experiments
• Radio signal: 25 nodes recording RSS, 35m×35m map, R=25m; 29, 39 and 49 events
• Acoustic signal: 20 nodes recording timestamps, 5m×6m map, 18 events
[Figure: radio test-bed results; panel labels "49 events", "19 events" and "Avg."]

Evaluation
Simulations
• 100 nodes, 250m×250m map, 50 events, R=25m
• [Equation and its parameter definitions missing in source]
[Figure: simulation results; panels "With Noise" and "Noise Free"]

Summary
• FIND detects faulty nodes assuming only that sensing readings are monotonic over distance
• Ranking difference is a provable indicator of possible faulty nodes
• The detection algorithm finalizes the blacklist, given the node list ordered by ranking difference
• False positive and false negative rates of 5% can be achieved in most noisy environments

Preliminary Experiments
RSS vs. distance, 49 sensor nodes, 49 events
[Figure: RSS vs. distance for Event 1, Event 2, and the average of 49 events]
• Y = f(x)?? Large variance!
• The monotonicity assumption accommodates real-world environments better than an assumption based on a more specific model

Evaluation
Simulations
• 100 nodes, 250m×250m map, 50 events, R=25m
• α sensitivity: α is varied from 25% to 400% of its true value
[Figure: curves for 0.25α, 0.5α, α, 2α and 4α]

Detection Algorithm
Given a node list {n_1, n_2, ..., n_N} in descending order of ranking difference, find k such that {n_1, n_2, ..., n_k} is the best estimate (blacklist) of the faulty nodes
• Theorem 1: D(n_k) > B
• Theorem 2: Pr(n_i is faulty) ≥ Pr(n_{i+1} is faulty)
• Starting from n_1, add nodes to the blacklist one by one
• Update N_e, μ_e and B after adding each node
• Stop when D(n_k) > B no longer holds
• Uniqueness proved!

Map Division
• Four-node example: size of V = 8 << 4! = 24

MAP Estimation (2)
How to calculate Pr(s | s_i)
• Example: distance-based s_i = 1 2 3 4 5 6 7 8 9 and RSS-based s = 1 2 6 3 5 4 7 8 9; node 6 sits at position K in s and at position L in s_i
• Pr(s | s_i) is the sum of a "node 6 faulty" term containing Pr(s' | s'_i) and a "node 6 normal" term containing Pr(s'' | s''_i); the coefficients involve the defective rate α, N and the exponent L - K, but are not legible in the source
• Node 6 faulty: continue with s'_i = 1 2 3 4 5 7 8 9 and s' = 1 2 3 5 4 7 8 9, then ask whether node 5 or node 4 is faulty, and so on
• Node 6 normal: nodes 3, 4 and 5 are faulty, leaving s''_i = 1 2 7 8 9 and s'' = 1 2 7 8 9. Subsequence matched, stop!
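
The loop on the Detection Algorithm slide can be sketched in Python as below. This is one reading of the slide, not the authors' implementation: the expression for the bound B does not survive in this text, so the sketch takes it as a caller-supplied function `bound(n_total, n_faulty, mu_e)`, and the name `build_blacklist` is hypothetical. It also assumes that the node that first fails the check is left off the blacklist.

```python
def build_blacklist(avg_diff, bound):
    """Grow a blacklist from nodes sorted by average ranking difference.

    avg_diff: dict mapping node -> average ranking difference D(n).
    bound:    callable (n_total, n_faulty, mu_e) -> B. The real expression
              from Theorem 1 is not available here, so the caller supplies it.
    """
    # Theorem 2: descending ranking difference = descending fault likelihood.
    ordered = sorted(avg_diff, key=avg_diff.get, reverse=True)
    blacklist = []
    for node in ordered:
        candidate = blacklist + [node]
        n_e = len(candidate)                              # updated N_e
        mu_e = sum(avg_diff[n] for n in candidate) / n_e  # updated mu_e
        b = bound(len(ordered), n_e, mu_e)                # updated B
        if avg_diff[node] > b:   # Theorem 1 check: D(n_k) > B
            blacklist = candidate
        else:
            break                # check no longer holds: stop
    return blacklist


# Purely illustrative bound (NOT the bound from Theorem 1).
def toy_bound(n_total, n_faulty, mu_e):
    return 0.5 * mu_e


demo = build_blacklist({"a": 4.0, "b": 3.0, "c": 0.5, "d": 0.25}, toy_bound)
```

With the toy bound, the loop accepts `a` and `b` and stops at `c`; any real use would have to substitute the actual bound from the paper.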