UNIVERSITY OF SOUTHERN CALIFORNIA

Understanding and Utilizing Multi-Dimensional Correlations in Sensor Networks: A Protocol Design Perspective
Ahmed Helmy
Department of Electrical Engineering
USC Viterbi School of Engineering
University of Southern California
helmy@usc.edu
Web: ceng.usc.edu/~helmy, Lab: nile.usc.edu

Outline
• Classifying Correlations
• How to Utilize Correlations?
• Insights for Protocol Design
  – Gradient-based Routing (RUGGED)
  – Active Query Routing (ACQUIRE)
  – Abnormality Detection and Filtering Inserted Data
• WLANs as Sensor Networks (IMPACT)
  – Sensing access and usage patterns
  – Analyzing correlations in wireless users' behavior
• Issues

Correlation Classification
• Dimensions of correlation:
  – Spatial: between neighboring nodes
  – Temporal: across time (different samples) for the same node
  – Spatio-temporal: a moving target (e.g., a vehicle) or a moving phenomenon (e.g., a fire)
• What is correlated?
  – Sensor readings (e.g., temperature, light, gradients)
  – Communication channel (e.g., loss, fading)
  – Localization information, …

How Can We Utilize Correlations?
• In-network processing
  – Aggregation
  – Abstraction / adaptive fidelity / zoom-in
• Prediction (model-based), which enables:
  – Caching
  – Routing (gradients in time and space, etc.)
  – Abnormality detection (attacks, failures, mis-calibration)
• Equivalence
  – Sampling a smaller set of nodes (sleep/wake-up)
  – Topology control

RUGGED: RoUting on finGerprint Gradients in sEnsor Networks
Jabed Faruque, Ahmed Helmy
Department of Electrical Engineering, University of Southern California
faruque@usc.edu, helmy@usc.edu
URL: http://nile.usc.edu, http://ceng.usc.edu/~helmy
– Faruque, Psounis, Helmy, IEEE/ACM DCOSS 2005.
– Faruque, Helmy, IEEE ICPS 2004.
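As a small illustration of the spatial and temporal correlation dimensions and of the model-based prediction listed above, the Python sketch below measures spatial correlation as the Pearson coefficient between two neighboring nodes' reading series, temporal correlation as the lag-k autocorrelation of one node's series, and uses a simple last-reported-value model so that a node transmits only when a new reading deviates from its prediction by more than a tolerance. This is not taken from the original work: the function names, the tolerance, and the example readings are hypothetical.

```python
import statistics

# Illustrative sketch only: function names, tolerance and readings are hypothetical.

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spatial_correlation(node_a, node_b):
    """Spatial dimension: correlation between two neighboring nodes' readings."""
    return pearson(node_a, node_b)

def temporal_correlation(series, lag=1):
    """Temporal dimension: correlation of one node's readings with itself `lag` samples later."""
    return pearson(series[:-lag], series[lag:])

def reported_indices(series, tolerance):
    """Model-based prediction: the node predicts its last reported value and
    transmits only when a reading deviates from it by more than `tolerance`."""
    sent = [0]
    last = series[0]
    for i, value in enumerate(series[1:], start=1):
        if abs(value - last) > tolerance:
            sent.append(i)
            last = value
    return sent

if __name__ == "__main__":
    a = [20.1, 20.4, 21.0, 21.8, 22.5, 22.9]  # hypothetical temperature readings
    b = [19.8, 20.2, 20.7, 21.5, 22.4, 22.6]  # hypothetical neighbor readings
    print(round(spatial_correlation(a, b), 3))
    print(round(temporal_correlation(a), 3))
    print(reported_indices(a, tolerance=1.0))
```

The same prediction is what makes caching and abnormality detection possible: readings that stay close to the model need not be sent, and readings far from it are candidates for inspection.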
Introduction
• Sensor networks are envisioned to be widely used for habitat and environmental monitoring, among other applications
• Every physical event produces a fingerprint in the environment
• Diffusion laws are usually an inherent property of many physical phenomena:
  $f(d) \propto 1/d^{\alpha}$, where $d$ = distance from the source and $\alpha$ = diffusion parameter, which depends on the type of effect (e.g., $\alpha = 1$ for temperature, $\alpha = 2$ for light)

Example (of Diffusion): Isoseismal (Intensity) Maps
[Figure: isoseismal map of the North Palm Springs earthquake of July 8, 1986]
Ref.: Southern California Earthquake Center (http://www.scec.org)

Why Is the Natural Information Gradient Important?
• This natural information gradient is FREE
• Routing protocols can use it to forward query packets (greedily)
  – Locate event(s), e.g., fire, nuclear leakage
• The diffusion property is not limited to natural phenomena
  – Time gradient
• Existing approaches (flooding, expanding ring search, random walk, etc.) do not utilize this information gradient

Challenges
• Erroneous readings from malfunctioning sensors
  – Calibration error, obstacles
  – Cause local maxima/minima in the magnitude of the effect
• Environmental noise
• In real life, sensors are unable to measure below a certain threshold.
  So, the diffusion curve has a finite tail.
• Non-uniform sensor distribution (gaps)
[Figure: magnitude of the effect vs. distance from the source, annotated with a dip, a local maximum, and a gap]

Objective
Design an efficient algorithm to locate source(s) in sensor networks by utilizing the natural information gradient, i.e., the diffusion pattern of the event's effect. The algorithm should be:
  – Gradient-based
  – Fully distributed
  – Robust to node or sensor failure or malfunction
  – Capable of finding multiple sources

Environment Model
• The event's effect follows the diffusion law
• Discontinuities exist in the diffusion curve, which has a finite tail
• Environmental noise

Basic Protocol
• A node can be in one of two modes:
  – flat-region mode
  – gradient-region mode
• A node forwards the query to its neighbors along with its information level
• To forward the query, each node uses the following algorithm:
  1. In the information gradient region, follow a greedy approach
     – Forward the query to the neighbors if the information level about the event improves
  2. In an unsmooth gradient region, forward probabilistically, based on the Simulated Annealing concept
     – The probabilistic function is $f_p(x) = 1/x^{a}$, where $x$ = hop count in the information gradient region and $a$ depends on the diffusion parameter ($\alpha$)
  3. In the flat (i.e., zero) information region, use flooding
     – Decreases the latency to reach the gradient information region
     – Handles queries in the absence of an event
• A query ID prevents looping
• Once the query is resolved, the node uses the reverse path to reply

[Figure: propagation of the query (Q, Q') toward the event E around a local maximum (Mx) and a local minimum (Mn)]
• All neighbors (np) of Mx have less information, so they forward the query to their neighbors probabilistically
• All neighbors (ng) of Mn have more information, so they forward the query to their neighbors

Query Types
• I. Single-value query
  – Search for a specific value and have a single response
• II.
  Global maxima search
  – Search for the maximum value of information in the system
  – Intermediate nodes suppress non-promising replies
• III. Multiple-event detection (still presents a challenge)
  – Search for multiple events of the same type

Performance Metrics
• Reachability, i.e., success probability
  – Probability that the query will reach the source
• Overhead in terms of average energy dissipation
  – Number of transmissions to forward the query and to get the reply
• Reachability of ~98% is achievable in the presence of noise, gaps, and flat regions
• For the probabilistic function $f_p(x) = 1/x^{a}$, $a < \alpha$ is recommended, but $a$ close to $\alpha$ gives the optimal trade-off between reachability and overhead

Comparisons
• Existing gradient-based routing protocols can be categorized into two major approaches:
  – Single-path approach: CADR [Chu2002], Min-hop [Liu2003], …
  – Multiple-path approach: GRAB [Ye2003], RUGGED [Faruque2004]
• Which approach to choose?

Objective
• Analyze the performance of these general approaches to route a query
  – Model the query success rate and overhead using probability tools
  – For ideal and lossy wireless link conditions
• Simulate the protocols based on these approaches in more realistic scenarios
  – Also investigate a path quality metric
• Compare both approaches using analytical and simulation results

Brief Description of Routing Approaches
[Figure: single-path query forwarding with look-ahead = 1 (active node and candidate nodes) vs. multiple-path query forwarding (active nodes), from the query origin Q toward the source S over a grid of sensor readings]
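The per-node forwarding rule from the Basic Protocol slide and the two single-path next-hop policies can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the RUGGED implementation: the source strength, the detection threshold that produces the finite tail, and all function names are hypothetical, and a node is assumed to compare its own information level with the level carried in the query it received.

```python
import random

# Illustrative sketch only: names, source strength and detection threshold
# are assumptions, not values from the RUGGED papers.

def diffusion_level(d, alpha, strength=100.0, threshold=1.0):
    """Effect magnitude at distance d under f(d) ∝ 1/d**alpha.
    Readings below `threshold` are reported as 0 (finite tail)."""
    level = strength / max(d, 1.0) ** alpha
    return level if level >= threshold else 0.0

def forward_probability(x, a):
    """Probabilistic forwarding function fp(x) = 1 / x**a, for hop count x >= 1."""
    return 1.0 / x ** a

def should_forward(own_level, query_level, hops_in_gradient, a, rng=random):
    """Multiple-path (RUGGED-style) decision at a node that received a query
    carrying the previous node's information level."""
    if own_level == 0.0:
        return True  # flat (zero-information) region: flood
    if own_level > query_level:
        return True  # information improves: forward greedily
    # unsmooth gradient region: forward with probability fp(x)
    return rng.random() < forward_probability(hops_in_gradient, a)

def basic_single_path_next(active_level, candidates):
    """Basic single-path: pick the best candidate only if it beats the active
    node; returns None when stuck at a local maximum."""
    best = max(candidates, key=candidates.get)
    return best if candidates[best] > active_level else None

def improved_single_path_next(candidates):
    """Improved single-path: pick the best candidate even if its level is
    lower than the active node's (look-ahead = 1)."""
    return max(candidates, key=candidates.get)
```

For example, at an active node with level 15 whose candidates read 12, 10 and 7, `basic_single_path_next(15.0, {"n1": 12.0, "n2": 10.0, "n3": 7.0})` returns `None` (stuck at the local maximum), while `improved_single_path_next` still moves on to `"n1"`.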
Variations of the Single-path Approach
These depend on the policy for selecting the next active node:
1. Basic single-path approach
   – Selects the candidate node that has the maximum information, provided it is higher than that of the current active node
   – Sensitive to local maxima
2. Improved single-path approach
   – Selects the candidate node that has the maximum information
   – The information of the selected node can be less than that of the current active node
[Figure: example readings of candidate nodes and the active node under the two policies]

Comparisons – Query Success Rate (ideal and lossy link cases, $p_c = 0.05$)
[Figure: analytical query success rate for the ideal link case and for the lossy link case]
• The query success rate of the improved single-path approach drops drastically for lossy links, while the multiple-path approach is quite resilient
• ARQ may improve the success rate of the improved single-path approach

Comparisons – Overhead
[Figure: overhead of both approaches; energy saving of the multiple-path approach over the improved single-path approach]
• The multiple-path approach creates extra paths due to probabilistic forwarding, so its overhead increases
• The single-path approach uses a 1-hop look-ahead at every step to decide on the forwarder
• As the number of malfunctioning nodes increases, the overhead of the single-path approach increases
  – The length of the path increases

Results – Path Quality (ideal link case)
• Metric: ratio of the average path length produced by a routing approach to the shortest path length between a source and a sink
• The multiple-path approach results in shorter paths, which are close to the shortest path
• As the number of malfunctioning nodes increases, the path length of the single-path approach increases

Conclusions
• The multiple-path approach causes less overhead when a source is fewer than 20 hops from the sink
• The multiple-path approach yields shorter paths
• As the number of malfunctioning nodes increases, the query success rate of the multiple-path approach degrades gracefully
• With lossy links:
  – The query success rate of the single-path approach drops drastically
  – The multiple-path approach is quite resilient

Future Work
• Combine the benefits of both routing approaches in a hybrid routing approach
• Develop a more adaptive multiple-path approach to reduce the number of extra paths caused by probabilistic forwarding
• Implementation and evaluation in a test-bed
  – On-going: new 150-sensor-node test-bed at USC
  – Continued work under the NSF-funded ACQUIRE project

ACQUIRE: ACtive QUery Forwarding In Sensor Networks
Original team: Narayanan Sadagopan, Bhaskar Krishnamachari, Ahmed Helmy
Current team: Sundeep Pattem, Jabed Faruque, Rahul Orgaonkar, Yongjin Kim, Jung-Hyun Jun, Sapon Tanachaiwiwat, Shao-Cheng Wang
Funding: NSF NETS NOSS, Intel (equipment)
Department of Electrical Engineering, USC Viterbi School of Engineering, University of Southern California
URL: http://ceng.usc.edu/~acquire

Model-based Prediction and Caching
• Develop a model of variation over time (or space) using measurements
• Use the model to predict data/readings
• Only trigger updates or queries when data/readings deviate from the predicted value
• Depending on the data dynamics, we may be able to cache information collected earlier and answer queries without having to trigger new data collection

ACtive QUery forwarding In sensoR nEtworks (ACQUIRE)*
• A mechanism for answering one-shot, complex queries for replicated data in sensor networks:
  – One-shot (vs.
continuous): answers are given based on explicit queries about current readings
  – Complex (vs. simple): the query can contain several sub-queries, e.g., (x OR y) AND z
  – Replicated data: several sensors might have the answer to a sub-query
• Example: micro-climate data collection
  – Different sensor modalities
  – "Give a location where (Temp > 80 degrees OR Humidity > 40%) AND Wind speed > 20 mph"
* N. Sadagopan, B. Krishnamachari, A. Helmy, "Active Query Forwarding In Sensor Networks (ACQUIRE)", Ad Hoc Networks Journal, Elsevier, Jan. 2005 [earlier version in SNPA '03]

Flooding-Based Queries (Directed Diffusion)
[Figure: (a) flooding of the interest query [QA, QC] from the querier node (sink x*); (b) responses [RA], [RC] returned to the sink]
Flooding:
• Useful for long-standing (continuous) queries
• Replicated responses might make it very inefficient

ACQUIRE
[Figure: (d) sample trajectory of the active query (solid) and the response (dashed) in basic ACQUIRE (zero look-ahead); legend: active query, complete response, update messages]
• An active node "refreshes" data from its "neighborhood"
• The query is then forwarded to a node on the edge of the neighborhood

ACQUIRE
• Key features
  – In-network processing
  – Does not rely on geographic information or a unicast routing protocol
    • The existence of these may considerably improve performance
  – d helps us span the space from random walk (d = 0) to flooding (d = D, the network diameter)

ACQUIRE
• Look-ahead parameter, d
  – Determines the size of the "neighborhood" in hops
  – Effects a trade-off between the number of steps taken to resolve the query and the energy consumed
  – Optimal look-ahead, d*
    • Depends on the query rate, the refresh rate, and the data dynamics (captured by the amortization factor, c)
    • May be achieved by localized schemes
    • The higher the query rate and the lower the data dynamics, the higher the optimal look-ahead

Performance of ACQUIRE
[Figure: average energy per query vs. look-ahead parameter d (1 to 27), for c = 0.01 to 0.07; N = 1000, M = 200]
c is the refresh/query ratio (e.g., c = 0.01 means one refresh every 100 queries); the refresh overhead is amortized over the savings in queries.

ACQUIRE
• Efficiency
  – 60-75% energy savings over Expanding Ring Search (analytical results)
  – Order-of-magnitude savings over flooding
• Future work
  – Develop ACQUIRE into a full-fledged protocol that actively adapts the d parameter for optimal performance
  – Evaluation over an experimental sensor network test-bed
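The basic ACQUIRE trajectory described in these slides (an active node refreshes data from its d-hop neighborhood, resolves what it can, then hands the query to a node on the edge of that neighborhood) can be sketched as below. Assumptions, all hypothetical: the network is an adjacency dict, each node holds a set of sub-query labels it can answer, the query is resolved once every sub-query is covered, and the next active node is chosen uniformly at random among nodes exactly d hops away, falling back to a random neighbor (which gives the random walk at d = 0). Energy accounting, the amortization factor c, and update messages are not modeled.

```python
import random
from collections import deque

# Illustrative sketch of basic ACQUIRE; names and the data model are assumptions.

def neighborhood(graph, node, d):
    """Hop distance of every node within d hops of `node` (breadth-first search)."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        if dist[u] == d:
            continue  # do not expand beyond the look-ahead radius
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def acquire(graph, answers, start, subqueries, d, rng, max_steps=100):
    """Return the trajectory of active nodes until all sub-queries are resolved,
    or None if `max_steps` active steps are not enough."""
    remaining = set(subqueries)
    active = start
    trajectory = [active]
    for _ in range(max_steps):
        hood = neighborhood(graph, active, d)
        for node in hood:  # "refresh" data from the d-hop neighborhood
            remaining -= answers.get(node, set())
        if not remaining:
            return trajectory
        edge = [n for n, h in hood.items() if h == d and n != active]
        active = rng.choice(edge or list(graph[active]))
        trajectory.append(active)
    return None
```

On a chain 0-1-2-3-4 where node 2 answers sub-query C and node 4 answers A, a query from node 0 with d = 2 resolves after the trajectory [0, 2], while with d = 4 the start node's neighborhood already covers the whole chain; larger d means fewer steps but more refresh traffic per step, which is the trade-off the look-ahead slides describe.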
More information: ceng.usc.edu/~acquire

Correlations and Inserted Data
• Main purpose of sensor networks: collect data
• Sybil attacks may insert false data that affect the operation of sensor networks:
  – Impersonating multiple IDs (at the same or different times)
  – Outlier detection alone will not work
• Approach:
  – Understand normal correlations between data
  – Detect outliers by reference to normal behavior
  – Design a protocol robust to massive amounts of forged data

Single Attacker Scenario I
Data: X from location (x, y) – interesting events
MobiQuitous 2005

Single Attacker Scenario II
Data: X' from location (x, y) – normal events

Sybil Attack Scenario I
[Figure: attackers (Sybil nodes), source, source/forwarder, inactive nodes, aggregator, and sink]
Data: $W_i$ from location $(x_i, y_i)$ – interesting events

Sybil Attack Scenario II
[Figure: attackers (Sybil nodes), source, forwarder, inactive nodes, aggregator, and sink]
Data: $W_i'$ from location $(x_i, y_i)$ – normal events

Data Correlation (Great Duck Island)
Correlation coefficients between neighboring sensors, per modality (T: temperature, P: pressure, H: humidity; ID: sensor ID; only 4 neighboring sensors are shown):

ID  | 111 (T, P, H) | 116 (T, P, H) | 122 (T, P, H) | 126 (T, P, H)
111 | 1, 1, 1       |               |               |
116 | .74, .64, .74 | 1, 1, 1       |               |
122 | .83, .42, .91 | .84, .67, .80 | 1, 1, 1       |
126 | .67, .41, .56 | .55, .50, .64 | .70, .55, .77 | 1, 1, 1

Anomaly Relationship Test (ART) Architecture
• Statistical Analysis Module
  – Correlation-coefficient analysis
  – T*-test (outlier threshold)
• Authentication Module
  – Distributed Interactive Proof
S. Tanachaiwiwat, A. Helmy, MobiQuitous 2005

Anomaly Relationship Test (ART) Protocol
• Prover: the attacker
• The test is performed at verifiers only!
[Figure: ART protocol message flow among the source, the verifiers (aggregator and forwarder), a compromised/failed or Sybil node, and the sink]
  (1) Correlation/T*-test
  (2) Request a valid credential
  (3) Response: valid, invalid, or no response
  (4) Send report to the sink
  (5) Cross-verify

Summary
• Dynamic sliding-window correlation analysis and the T*-test can alleviate the attack effectively, even under a full-scale attack from Sybil nodes
• Remarks
  – Normal, abnormal, and malicious events are recognized based on statistical analysis
  – Malicious data insertion can cause problems for critical missions in WSNs
  – Error is reduced by using a dynamic sliding window and a careful choice of the correlation threshold

WLANs as Sensor Networks
• Total population: ~25,000 students
• Wireless users: ~6,000 students
• Access points: ~400

IMPACT: Investigation of Mobile-user Patterns Across University Campuses using WLAN Trace Analysis*
• Classes of future sensor networks will be attached to humans
• What kinds of correlations exist between users?
• Analyze measurements of wireless networks
  – Understand wireless users' behavior (individual and group)
  – Develop models to understand associations and friendship
• Study relationships and user behavior based on measurements of various university WLANs
* W. Hsu, A. Helmy, "IMPACT: Investigation of Mobile-user Patterns Across University Campuses using WLAN Trace Analysis", USC TR, July '05 (under submission)

Statistics of Studied Traces
• Four major campuses
• Month-long traces studied
• Total users in the study: over 12,000
• Total access points in the study: over 1,300

Observations: On-line Time
On-off behavior is very common for wireless users. This seems especially true for small handheld devices.
There are clear categories of heavy and light users; their distribution is skewed and depends heavily on the campus.

Observations: Visited Access Points (APs)
[Figure: percentage of visited APs]
• Individual users access only a very small portion of the APs in the network: less than 35% in all campuses.
• The long-term mobility of users is highly skewed in terms of the time associated with each AP. On average, a user spends more than 95% of its time at its five most visited APs.

Observations: Visited APs
• The majority of users experience low mobility while using the network. This is true even for portable devices such as PDAs. The actual handoff statistics depend heavily on the environment.

Observations: Similarity Index
• We observe clear repetitive patterns of association among wireless network users. Typically, user association patterns show the strongest repetition at a time gap of one day or one week.

Observations: Encounters
• In all the traces, mobile nodes (MNs) encounter a relatively small fraction of the user population: below 40% in most cases, and never above 60%. Except for the UCSD trace, on average an MN encounters only 1.88%-5.94% of the whole population. The number of total encounters per user follows a BiPareto distribution whose parameters depend on the campus.
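The per-user statistics behind the visited-AP observations (fraction of the network's APs a user visits, and the share of associated time spent at the user's top five APs) can be computed from association records along the lines of the sketch below. The record format, a flat list of (user, AP, associated seconds) tuples, and the function names are assumptions for illustration; real WLAN traces would first need to be preprocessed into such records.

```python
from collections import defaultdict

# Illustrative sketch; the (user, ap, seconds) record format is an assumption.

def time_per_ap(records):
    """Total associated time per user and AP."""
    usage = defaultdict(lambda: defaultdict(float))
    for user, ap, seconds in records:
        usage[user][ap] += seconds
    return usage

def top_k_time_share(records, k=5):
    """Fraction of each user's associated time spent at its k most-visited APs."""
    shares = {}
    for user, per_ap in time_per_ap(records).items():
        times = sorted(per_ap.values(), reverse=True)
        shares[user] = sum(times[:k]) / sum(times)
    return shares

def visited_ap_fraction(records, total_aps):
    """Fraction of the network's APs that each user ever associated with."""
    return {user: len(per_ap) / total_aps
            for user, per_ap in time_per_ap(records).items()}

if __name__ == "__main__":
    trace = [("u1", "ap1", 3600), ("u1", "ap2", 1800), ("u1", "ap7", 60),
             ("u2", "ap2", 7200)]  # hypothetical records
    print(top_k_time_share(trace, k=2))
    print(visited_ap_fraction(trace, total_aps=400))
```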
Encounter Graphs
• Definition
  – When two nodes access the same AP at the same time, we call this an "encounter"
  – The encounter graph has all the mobile nodes as vertices, and its edges link every pair of vertices that encounter each other

Small-World Graphs: Low Path Length, High Clustering
• Regular graph: high path length, high clustering
• Random graph: low path length, low clustering
[Figure: normalized clustering and path length vs. probability of re-wiring p (0.0001 to 1, log scale)]
• In small worlds, a few shortcuts contract the diameter (i.e., path length) of a regular graph to resemble the diameter of a random graph, without affecting the graph structure (i.e., clustering)

Encounter Graphs and Friendship
• Encounters link most of the MNs together in a connected graph:
  – Albeit each MN encounters only a small portion of the population
  – The encounter graph is a small-world graph
  – Even for a short time period (1 day), its clustering coefficient, average path length, and connectivity are all close to those for longer traces
• Friendship between MNs is highly asymmetric
  – The distribution of the friendship index is exponential for all the traces, regardless of the friendship definition (based on time, encounters, or location)
  – Among all node pairs, less than 5% have a friendship index larger than 0.01, and less than 1% have a friendship index larger than 0.4

Encounter Graphs Using Friends
• Top-ranked friends tend to form cliques, and low-ranked friends are the key to providing random links and reducing the degree of separation in the encounter graph

Encounter-based Information Diffusion
• Encounter patterns are rich enough to support information diffusion. Specifically, information can be delivered to more than 94% of users within two days.
The reachability and average delay do not degrade significantly until at least ~40% of the nodes are selfish.

Vision: Building a Community-wide Wireless/Mobility Library
• Library of measurements from WLANs, mobility, and associations from potential wireless societies (e.g., universities, vehicular networks)
• Library of realistic models of user behavior (e.g., mobility, traffic, friendship, and encounter models, …)
• Library of benchmarks and guidelines for simulation and evaluation
• How much insight can we get by analyzing the traces?
• Can we use the insight to "design" the protocols of the future (not only to evaluate them)?
• Currently, 20 major universities are willing to share their traces
• … more to come: http://nile.usc.edu/MobiLib (under heavy update)
• If you have traces: helmy@usc.edu!

Issues
• How can we model correlations accurately?
• How can we further utilize correlations?
• Context-aware protocols:
  – Phenomenon-aware protocols
  – Socially-aware protocols
• Other kinds of correlations:
  – Sensor network test-beds: correlation between radio connectivity and the phenomenon (e.g., rain)
  – …

Thank You!
• Related links
  – ACQUIRE: ceng.usc.edu/~acquire
  – Mobility Library: nile.usc.edu/MobiLib
  – Lab: nile.usc.edu
  – Homepage: ceng.usc.edu/~helmy
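The encounter definition and the encounter-based information diffusion results from the IMPACT slides can be explored with a sketch like the one below. It assumes sessions given as (user, AP, start, end) tuples, treats two sessions at the same AP with overlapping intervals as an encounter at the start of the overlap, builds the undirected encounter graph, and spreads a message epidemically over the time-ordered encounters; selfish nodes accept the message but never forward it. The data format, tie-breaking, and function names are hypothetical, and the sketch does not reproduce the IMPACT analysis itself.

```python
from collections import defaultdict

# Illustrative sketch; the session format and function names are assumptions.

def encounters(sessions):
    """Time-ordered (time, u, v) encounters: same AP, overlapping association."""
    by_ap = defaultdict(list)
    for user, ap, start, end in sessions:
        by_ap[ap].append((user, start, end))
    events = []
    for visits in by_ap.values():
        for i, (u, s1, e1) in enumerate(visits):
            for v, s2, e2 in visits[i + 1:]:
                if u != v and max(s1, s2) < min(e1, e2):
                    events.append((max(s1, s2), u, v))  # encounter at overlap start
    return sorted(events)

def encounter_graph(events):
    """Undirected encounter graph: vertices are mobile nodes, edges link nodes that met."""
    graph = defaultdict(set)
    for _, u, v in events:
        graph[u].add(v)
        graph[v].add(u)
    return graph

def diffuse(events, source, selfish=frozenset(), deadline=float("inf")):
    """Epidemic spreading over time-ordered encounters; returns {node: time informed}.
    Selfish nodes receive the message but never forward it."""
    informed = {source: 0.0}
    for t, u, v in events:
        if t > deadline:
            break
        for carrier, peer in ((u, v), (v, u)):
            if carrier in informed and carrier not in selfish and peer not in informed:
                informed[peer] = t
    return informed
```

Reachability within a deadline is then `len(diffuse(events, source, deadline=...))` divided by the number of users, and re-running with a growing `selfish` set gives a rough way to probe how sensitive delivery is to selfish nodes.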