Node Clustering in Wireless Sensor Networks by Considering Structural Characteristics of the Network Graph
Nikos Dimokas (1), Dimitrios Katsaros (1,2), Yannis Manolopoulos (1)
(1) Informatics Dept., Aristotle University, Thessaloniki, Greece
(2) Computer & Comm. Engineering Dept., University of Thessaly, Volos, Greece
4th ITNG Conference, Las Vegas, NV, 2-4 April 2007

Wireless Sensor Network (WSN)
Wireless Sensor Network features
• Homogeneous devices
• Stationary nodes
• Dispersed network
• Large network size
• Self-organized
• All nodes act as routers
• No wired infrastructure
• Potential multihop routes

Communication in WSN
• Communication between two nodes that are not directly connected is achieved through intermediate nodes.
• Every node that falls inside the communication range r of a node u is considered reachable from u.

WSN - Applications
• Applications
  • Habitat monitoring
  • Disaster relief
  • Target tracking
• Many of these applications require only simple and/or aggregate functions to be reported. Clustering allows aggregation and limits data transmissions.

What is Clustering
[Figure: clustered network; legend: cluster member, clusterhead, gateway node, intra-cluster link, cross-cluster link]
• Nodes are divided into virtual groups according to some rules
• Nodes belonging to a group can execute different functions from other nodes.
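The reachability rule from the "Communication in WSN" slide can be sketched as a unit-disk graph: two nodes are linked when their Euclidean distance is at most r, and a breadth-first search over those links finds multihop routes through intermediate nodes. The function names `build_neighbors` and `hop_count`, the coordinates, and the value of r are illustrative assumptions, not part of the original slides.

```python
import math
from collections import deque

def build_neighbors(positions, r):
    """Return {node: set of nodes within communication range r} (assumed unit-disk model)."""
    neighbors = {u: set() for u in positions}
    nodes = list(positions)
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            # Links are bidirectional: u reaches v iff v reaches u.
            if math.dist(positions[u], positions[v]) <= r:
                neighbors[u].add(v)
                neighbors[v].add(u)
    return neighbors

def hop_count(neighbors, src, dst):
    """Number of hops on a shortest multihop route, or None if dst is unreachable."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return dist[u]
        for v in neighbors[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None

# Hypothetical deployment: three nodes in a line plus one isolated node.
positions = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (2.0, 0.0), "d": (9.0, 9.0)}
neighbors = build_neighbors(positions, r=1.5)
```

Here `a` and `c` are out of each other's range, so traffic between them has to be relayed by `b`, while `d` cannot be reached at all.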

Clustering in WSN
• Involves grouping nodes into clusters and electing a clusterhead (CH)
• Members of a cluster can communicate with their CH directly
• A CH can forward the aggregated data to the central base station through other CHs
• Clustering objectives
  • Allows aggregation
  • Limits data transmission
  • Facilitates reuse of resources
  • CHs and gateway nodes can form a virtual backbone for intercluster routing
• The cluster structure gives the impression of a smaller and more stable network
• Improves network lifetime
• Reduces network traffic and contention for the channel
• Data aggregation and updates take place in CHs

Relevant work – Clustering
• Based on the construction of a Dominating Set (DS)
  • Nodes belonging to the DS carry out all communication
  • These nodes run out of energy very soon
• Based on the residual energy of each node
  • Proposed ways to rotate the role of CH among the nodes of a cluster
  • Can be easily combined with the algorithms of the first family
• Our proposal: the GESC protocol supports
  • dynamic estimation of CHs depending on the requester node, and thus improved network lifetime
  • a novel metric for characterizing node importance
  • localization
  • a minimum number of messages exchanged among the nodes

Relevant work – Topology Control
• Minimum Spanning Tree (MST) and Localized Minimum Spanning Tree (LMST): calculated with Dijkstra's algorithm and the algorithm of Li, Hou & Sha, respectively.
[Figure: sample graph with its MST and LMST]
• Relative Neighborhood Graph (RNG): an edge uv is included in the RNG iff it is not the longest edge in any triangle uvw.
[Figure: two triangles uvw, one where uv is not included and one where uv is included]
• Gabriel Graph (GG): an edge uv is included in the GG iff the disk with diameter uv contains no other node.
[Figure: two disks with diameter uv, one where uv is included and one where uv is not included]
• Delaunay Triangulation (DT), Partial Delaunay Triangulation (PDT), Yao graph (YG), etc.: many other (variants of) geometric structures
• Topology control: choosing a set of links from the possible ones. Not exactly our problem.
So we use graph-theoretic concepts rather than geometric ones.

Minimal Dominating Set
• A vertex set is a Dominating Set (DS) if every other vertex is connected to at least one DS vertex
• It is a Connected Dominating Set (CDS) if it is connected
• It is a Minimum CDS (MCDS) if its size is minimum among all CDSs
• Finding an MCDS of a graph is NP-hard (its decision version is NP-complete)
[Figure: example DS and CDS]

Motivation for new clustering protocol
• The protocol should:
  • be localized, and thus distributed
  • fully exploit the locally available information in making the best decisions
  • be computationally efficient
  • minimize the number of messages exchanged among the nodes
  • be energy efficient and thus extend network lifetime; this could be achieved by using different nodes for relaying messages
  • not make use of "variants", e.g., node IDs, because a (locally) best decision might not be reached (even if it does exist)

Well-known CDS algorithm: Wu and Li's algorithm
• Each node exchanges its neighborhood information with all of its one-hop neighbors
• Any node with two unconnected neighbors becomes a dominator (red)
• The set of all red nodes forms a CDS

Well-known CDS algorithm: Wu and Li's algorithm (Pruning Rules 1 & 2)
• Open neighbor set N(v) = {u | u is a neighbor of v}
• Closed neighbor set N[v] = N(v) ∪ {v}
• Rule 1: a node v can be taken out of the CDS if there exists a node u such that N[v] is a subset of N[u] and the ID of v is smaller than the ID of u
• Rule 2: a node u can be taken out of the CDS if u has two neighbors v and w such that N(u) is covered by N(v) ∪ N(w) and the ID of u is smaller than the IDs of both v and w
[Figure: example neighborhoods for Rule 1 (nodes u, v) and Rule 2 (nodes u, v, w)]

HEED protocol (1/2)
• Every sensor node has multiple power levels.
• Periodically selects CHs according to a hybrid of node residual energy and node degree.
• TCP is the clustering process duration and TNO is the network operation interval.
• Clustering is triggered every TCP + TNO seconds.
• The initial percentage of CHs among the nodes is Cprob.
• The probability of a node to become a CH is CHprob.
$$CH_{prob} = C_{prob} \cdot \frac{E_{residual}}{E_{max}}$$

HEED protocol (2/2)
• Intracluster – Intercluster communication
• Intracluster communication cost is proportional to:
  • Node degree (load distribution)
  • 1 / node degree (dense clusters)
• If variable power levels are allowed for intracluster communication, then select CHs using the average minimum reachability power (AMRP), where MinPwr_i is the minimum power level required by node i, one of the M nodes within range, to reach the CH:
$$AMRP = \frac{\sum_{i=1}^{M} MinPwr_i}{M}$$

LEACH protocol (1/2)
• All nodes can transmit with enough power to reach the base station (BS), and the nodes use power control.
• Cluster formation takes place during the set-up phase and data transfer during the steady-state phase.
• Each node elects itself as CH at the beginning of round r+1 with probability P_i(t), chosen so that the expected number of CHs equals k, the number of clusters:
$$\sum_{i=1}^{N} P_i(t) \cdot 1 = k$$
• All nodes are CHs the same number of times.
• All nodes have the same energy after N/k rounds.

LEACH protocol (2/2)
• Every node chooses as its CH the node that requires the least energy for communication.
• Every CH sets up a TDMA schedule and transmits it to its cluster members. Each node can then transmit data in its assigned time slot.
Weakness
• Limited scalability
• Could be complementary to clustering techniques based on the construction of a DS

Weakness of current approaches
• Some approaches cannot detect all possible eliminations because ordering based on node ID prevents this.
As a consequence, they incur significantly more retransmissions than necessary
• Others rely on a lot of "local" information, for instance knowledge of the k-hop neighborhood (k > 2), e.g., [WD04, WL04]
• Other methods are computationally expensive, incurring a cost of O(f^2) or O(f^3), where f is the maximum degree of a node of the ad hoc network, e.g., the methods reported in [WL01, WD03, DW04] and [SSZ02]
• Some methods (e.g., [QVL00, SSZ02]) do not fully exploit the compiled information; for instance, using the degree of a node as its priority when deciding its possible inclusion in the dominating set might not result in the best local decision

Terminology and assumptions
• The WSN is abstracted as a graph G(V,E)
• An edge e=(u,v) exists if and only if u is in the transmission range of v and vice versa. All links in the graph are bidirectional.
• The network is assumed to be connected
• N1(v): the set of one-hop neighbours of v
• N2(v): the set of two-hop neighbours of v
• N12(v): the combined set of N1(v) and N2(v)
• LNv: the induced subgraph of G associated with the vertices in N12(v)
• dG(v,u): the distance between v and u

A new measure of node importance
• Let σuw = σwu denote the number of shortest paths from u ∈ V to w ∈ V (by definition, σuu = 0).
• Let σuw(v) denote the number of shortest paths from u to w that some vertex v ∈ V lies on.
• We define the node importance index NI(v) of a vertex v as:
$$NI(v) = \sum_{u \neq v \neq w \in V} \frac{\sigma_{uw}(v)}{\sigma_{uw}}$$
• Large values of the NI index of a node v indicate that this node can reach others on relatively short paths, or that v lies on considerable fractions of the shortest paths connecting others. In the former case, it captures a possibly large degree of node v; in the latter case, it captures the fact that v might have one or more "isolated" neighbors

The NI index in sample graphs
[Figure: sample graphs with nodes labeled ID(NI)]
• In parentheses, the NI index of the respective node; e.g., 7(156): node with ID 7 has NI equal to 156.
• Nodes with large NI:
  • Articulation nodes (in bridges), e.g., 3, 4, 7, 16, 18
  • Nodes with large fanout, e.g., 14, 8, U
• Therefore: geodesic nodes

The NI index in a localized algorithm
• For any node v, the NI indexes of the nodes in N12(v), calculated only for the subgraph of the 2-hop (in general, k-hop) neighborhood, reveal the relative importance of the nodes in covering N12(v)
• For a node u (of the 2-hop neighbourhood of a node v), the NI index of u will be denoted as NIv(u)

NI computation
• At first glance, NI computation seems expensive, i.e., O(m·n^2) operations in total for a 2-hop neighbourhood consisting of n nodes and m links:
  • calculating the shortest path between a particular pair of vertices (assume for the moment that there exists only one) can be done using BFS in O(m) time, and there exist O(n^2) vertex pairs
• Fortunately, we can do better than this by making some smart observations. The improved algorithm (CalculateNodeImportanceIndex) is quite complicated and beyond the scope of this presentation
• THEOREM. The complexity of the algorithm CalculateNodeImportanceIndex is O(n·m) for a graph with n vertices and m edges

Pseudocode for CalculateNodeImportanceIndex (1/2) and (2/2)
[Figure: pseudocode listing, two slides]

Evaluation setting (1/2)
• We compare GESC to:
  • WL 1+2, Wu and Li's improved scheme incorporating Pruning Rules 1 and 2
  • MPR, the MultiPoint Relaying method described in [QVL00]
  • SSZ, reported in [SSZ02], which was selected as a Fast Breaking Paper for October 2003
• Implementation of the protocols using the J-Sim simulation library
• Sensor network topologies with 100, 300, and 500 nodes
• Each topology consists of square grid units
• Each sensor node is placed uniformly at random between the points (0,0) and (100,100)
• Two sensor nodes are neighbors if they are placed in the same or adjacent grid units.

Evaluation setting (2/2)
• Average node degree varied from 4 to 10
• Each protocol is run at least 100 times for each node degree.
Each time, a different node is selected to start broadcasting
• Performance metrics
  • Energy dissipation
  • Broadcast messages
  • Latency

Impact of the #nodes (1/2) and (2/2)
[Figure: result plots, two slides]

Impact of the average node degree
[Figure: result plot]

Impact of energy consumption
[Figure: result plot]

Conclusions and Future Work
• Defined and investigated a novel distributed clustering protocol for WSNs based on a novel localized metric
• The calculation of this metric is very efficient, linear in the number of nodes and linear in the number of links
• Proved that it is very efficient in terms of communication cost and in terms of prolonging network lifetime
• The protocol is able to reap significant performance gains, reducing the number of rebroadcasting nodes
• Simulated an environment to evaluate the performance of the protocol and of competing protocols using the J-Sim simulator
• Comparison with protocols based on residual energy (LEACH, HEED)
• GESC (GEodesic Sensor Clustering) has been shown to prevail
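
The node importance index from the "A new measure of node importance" and "NI computation" slides can be illustrated with a short sketch. It assumes that NI(v) sums, over ordered pairs (u, w) of other nodes, the fraction σuw(v)/σuw of shortest paths that pass through v, and it uses a Brandes-style BFS accumulation, which takes O(n·m) time on an unweighted graph. This is not the authors' CalculateNodeImportanceIndex pseudocode; the function names `node_importance` and `two_hop_subgraph` and the sample graph are hypothetical.

```python
from collections import deque

def node_importance(adj):
    """Assumed NI index: sum of sigma_uw(v) / sigma_uw over ordered pairs (u, w).

    adj maps each node to the set of its neighbours (bidirectional links).
    """
    ni = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        sigma = {v: 0 for v in adj}
        sigma[s] = 1  # seed value needed by the counting recurrence
        dist = {s: 0}
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path fractions.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                ni[w] += delta[w]
    return ni

def two_hop_subgraph(adj, v):
    """Induced subgraph on N12(v); assumes v itself is not a member of N12(v)."""
    n12 = set(adj[v])
    for u in adj[v]:
        n12 |= adj[u]
    n12.discard(v)
    return {u: adj[u] & n12 for u in n12}

# Hypothetical 4-node path 1 - 2 - 3 - 4.
adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
ni = node_importance(adj)  # inner nodes 2 and 3 score 4.0, endpoints 0.0
local = two_hop_subgraph(adj, 1)
```

In a localized setting, a node v would call `node_importance(two_hop_subgraph(adj, v))` to obtain the values NIv(u) for the nodes u in its 2-hop neighbourhood, using only information it can collect from its neighbours.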